Posts about LLMs
-
The dispatch layer was eating the quality, not the model
We blamed the LLM for a 54% task failure rate. The real culprit was seven layers of dispatch infrastructure between intent and execution.
-
Cross-Model Review: Why Model Diversity Beats Model Capability
When AI models review each other's work, independence matters more than intelligence. The same principle that makes external audit valuable makes cross-model review sharper than same-family review.
-
Stop Theorizing About Your Prompts
LLMs are the cheapest experimental subjects in history. Why aren't you testing?
-
Optimize for Routing, Not Tokens
With 1M context windows, token savings are a rounding error. The real metric is P(right tool | user intent) — does your agent reach for the right tool at the right moment?
-
Good Enough Parrots
The philosophical debate about whether LLMs understand is orthogonal to whether they're useful for knowledge extraction.
-
The Expert Illusion
Why 'you are an expert' is the most popular and least useful prompt engineering technique.
-
Don't Be Impressed by Fluency
AI can reproduce smart arguments on demand. I'm not sure that's different from thinking. But the uncertainty itself is worth sitting with.
-
The $1 Billion Bet Against LLMs
One of the architects of modern deep learning just raised $1B on the thesis that token prediction can't reach real reasoning. Here's what he's proposing instead — and why it matters even if he's wrong.
-
The First Datapoint
An AI agent ran unsupervised for two days and found twenty improvements to another model's training. Not an AGI claim. A rate claim.
-
Language Is the Medium, Not the Purpose
We called them language models and spent years confused about why they could reason. The name stuck to the interface, not the mechanism.
-
LLMs Are Better at Editing Than Writing
Ask an AI to write from scratch and you get the average of the corpus. Give it something rough and it amplifies what's already there. The workflow implications are significant.
-
The Silent Stall: Debugging GPT-5.4-Pro's Responses API
Three hours of debugging revealed two non-obvious behaviours about GPT-5.4-Pro that aren't in the docs: a minimum token budget requirement and a wall-clock timeout gap in Rust async code.
-
What it actually takes to run an AI agent in a bank
The resistance to AI agents in banking isn't mostly cultural. It's infrastructure — and the gap is more interesting than the politics.
-
AI Evals: Why Teams Build Metrics Before They've Read a Trace
Most teams build evaluators before reading a single trace. The sequence that actually works is the opposite: observe, categorise, then measure.