Posts about LLMs
-
The dispatch layer was eating the quality, not the model
We blamed the LLM for a 54% task failure rate. The real culprit was seven layers of dispatch infrastructure between intent and execution.
-
Cross-Model Review: Why Model Diversity Beats Model Capability
When AI models review each other's work, independence matters more than intelligence. The same principle that makes external audit valuable makes cross-model review sharper than same-family review.
-
Stop Theorizing About Your Prompts
LLMs are the cheapest experimental subjects in history. Why aren't you testing?
-
Optimize for Routing, Not Tokens
With 1M context windows, token savings are a rounding error. The real metric is P(right tool | user intent) — does your agent reach for the right tool at the right moment?
-
Good Enough Parrots
The philosophical debate about whether LLMs understand is orthogonal to whether they're useful for knowledge extraction.
-
The Expert Illusion
Why 'you are an expert' is the most popular and least useful prompt engineering technique.
-
Don't Be Impressed by Fluency
AI can reproduce smart arguments on demand. I'm not sure that's different from thinking. But the uncertainty itself is worth sitting with.
-
The $1 Billion Bet Against LLMs
One of the architects of modern deep learning just raised $1B on the thesis that token prediction can't reach real reasoning. Here's what he's proposing instead — and why it matters even if he's wrong.
-
The First Datapoint
An AI agent ran unsupervised for two days and found twenty improvements to another model's training. Not an AGI claim. A rate claim.
-
Language Is the Medium, Not the Purpose
We called them language models and spent years confused about why they could reason. The name stuck to the interface, not the mechanism.
-
LLMs Are Better at Editing Than Writing
Ask an AI to write from scratch and you get the average of the corpus. Give it something rough and it amplifies what's already there. The workflow implications are significant.
-
The Silent Stall: Debugging GPT-5.4-Pro's Responses API
Three hours of debugging revealed two non-obvious behaviours about GPT-5.4-Pro that aren't in the docs: a minimum token budget requirement and a wall-clock timeout gap in Rust async code.
-
What it actually takes to run an AI agent in a bank
The resistance to AI agents in banking isn't mostly cultural. It's infrastructure — and the gap is more interesting than the politics.
-
AI Evals: Why Teams Build Metrics Before They've Read a Trace
Most teams build evaluators before reading a single trace. The sequence that actually works is the opposite: observe, categorise, then measure.