Newsletter

May digest — what I'm reading

Curated reading list — links and notes from the past month.

May 14, 2026

May has been a dense month for reading. A few themes kept surfacing across different sources: the economics of inference, the organizational challenges of shipping AI features at scale, and a quieter but important thread on what "evaluation" actually means when your system's outputs are open-ended.

On the inference economics side, the piece that stuck with me most was a detailed breakdown of cost-per-token trends over the past eighteen months. The trajectory is steep enough that business models built on API calls today will look very different in two years, which is either an opportunity or a threat depending on how your margins are structured. Worth reading alongside the counterargument that cheap inference drives up volume faster than it drives down revenue per token.

I also read a long-form post-mortem from a team that spent eight months building an AI feature, launched it, and found that real users behaved almost nothing like their beta cohort. The lessons are not surprising in hindsight (synthetic testing environments filter out the tail cases that matter most), but the specificity of the failure modes was instructive. More teams should write these.

Finally, a short essay arguing that "evaluation" in the LLM context has been colonized by benchmark culture, and that most production systems need something closer to a human editorial process than an automated test suite. I do not fully agree, but it sharpened my thinking about where automated evals are load-bearing versus where they give false confidence.