The Template Is the Schema

12 Apr 2026 · 5 min read

I shipped seven PyPI releases of a CV generation tool in one afternoon. Each release unblocked one piece of my own CV landing correctly in the Capco global template. By the end, I had learned something I should have known sooner: in template-guided synthesis, the template is the schema. You cannot invent formatting the source file does not authorise. You can only preserve what is already there.

My tool recombinase populates a PowerPoint CV template from a YAML record. You write one YAML file per consultant; the tool walks the template’s shapes, overwrites the example text, and saves the output. Simple in theory. Brutal in practice, because every version fell over on a different piece of the template’s visual structure.

The first fix was a font flattening bug. The template had a 7pt font on certain cells, but my writer was silently flattening run-level formatting to defaults, so every populated cell rendered at PowerPoint’s default 18pt. The next was a missing sections primitive for the Key Competencies cell, which has four subheadings, each with a variable-length bullet list. I had been rendering it as a flat twelve-item list with the wrong visual structure. Then came the dual-run-br idiom, which surprised me most. A PowerPoint cell containing bold role text above italic years text can be authored in two completely different ways at the XML level. Press Enter and you get two separate paragraphs. Press Shift-Enter and you get one paragraph with two runs separated by a line break element, each carrying its own formatting. Visually identical. Structurally unrelated. My writer only handled the first case.
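To make the Enter versus Shift-Enter difference concrete, here is a minimal sketch using only Python’s standard library. The XML fragments are hand-written approximations of what PowerPoint stores in a cell’s text body (DrawingML `a:p`, `a:r` and `a:br` elements); the role and years text, the attribute values and the `shape_of` helper are illustrative assumptions, not taken from the real template or from recombinase.

```python
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
NS = {"a": A}

# Authored with Enter: two paragraphs, one run each.
HARD_ENTER = f"""
<a:txBody xmlns:a="{A}">
  <a:p><a:r><a:rPr sz="700" b="1"/><a:t>Principal Consultant</a:t></a:r></a:p>
  <a:p><a:r><a:rPr sz="700" i="1"/><a:t>2021 to 2024</a:t></a:r></a:p>
</a:txBody>
"""

# Authored with Shift-Enter: one paragraph, two runs, one <a:br/> between them.
SOFT_BREAK = f"""
<a:txBody xmlns:a="{A}">
  <a:p>
    <a:r><a:rPr sz="700" b="1"/><a:t>Principal Consultant</a:t></a:r>
    <a:br/>
    <a:r><a:rPr sz="700" i="1"/><a:t>2021 to 2024</a:t></a:r>
  </a:p>
</a:txBody>
"""


def shape_of(xml: str) -> tuple[int, int, int]:
    """Return (paragraph count, run count, break count) for a text body."""
    body = ET.fromstring(xml.strip())
    return (
        len(body.findall("a:p", NS)),
        len(body.findall("a:p/a:r", NS)),
        len(body.findall("a:p/a:br", NS)),
    )


print(shape_of(HARD_ENTER))  # (2, 2, 0)
print(shape_of(SOFT_BREAK))  # (1, 2, 1)
```

Both fragments render as bold text above italic text, yet a writer that iterates over paragraphs sees two targets in the first case and only one in the second.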

Then a default rename, then extending the dual-run handling to list shapes, then a diagnostic tool that reports the internal structure of every cell in a table so I could see why certain cells were not firing, and finally the hardest one: segmenting runs by line break boundary instead of walking them flat. The template had cells with three runs and one break because someone had edited the cell and inserted a space with slightly different formatting, splitting one bold run into two adjacent bold runs. My writer walked runs flat and assigned the wrong content to the wrong formatting. The fix was to group runs into segments separated by breaks, then assign one part per segment.
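The segmenting idea can be sketched in a few lines. This is an illustration of the technique under my own assumptions, not the code recombinase ships: `segment_runs` and `fill` are hypothetical names, the paragraph XML is a hand-written stand-in for a three-runs-one-break cell, and collapsing each segment onto its first run is one possible policy among several.

```python
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
R, BR, T, RPR = (f"{{{A}}}{name}" for name in ("r", "br", "t", "rPr"))

# Three runs, one break: a stray edit split the bold run into two bold runs.
PARA = f"""
<a:p xmlns:a="{A}">
  <a:r><a:rPr sz="700" b="1"/><a:t>Principal</a:t></a:r>
  <a:r><a:rPr sz="700" b="1" dirty="0"/><a:t> Consultant</a:t></a:r>
  <a:br/>
  <a:r><a:rPr sz="700" i="1"/><a:t>2021 to 2024</a:t></a:r>
</a:p>
"""


def segment_runs(p):
    """Group a paragraph's runs into segments separated by <a:br/>."""
    segments = [[]]
    for child in p:
        if child.tag == BR:
            segments.append([])  # a break starts a new segment
        elif child.tag == R:
            segments[-1].append(child)
    return segments


def fill(p, parts):
    """Assign one text part per segment, keeping each segment's first run."""
    segments = segment_runs(p)
    if len(segments) != len(parts) or not all(segments):
        raise ValueError(f"{len(parts)} parts do not fit {len(segments)} segments")
    for runs, text in zip(segments, parts):
        runs[0].find(T).text = text  # formatting in <a:rPr> is left untouched
        for extra in runs[1:]:
            p.remove(extra)  # drop the redundant sibling runs


p = ET.fromstring(PARA.strip())
fill(p, ["Senior Engineer", "2019 to 2021"])
for run in p.findall(R):
    print(run.find(RPR).attrib, run.find(T).text)
```

A flat walk would have paired the second part with the second bold run; grouping by break pairs it with the italic run after the break, which is what the template author intended.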

Seven releases. Each driven by a specific piece of visual structure the template already contained but my tool did not know how to preserve.

After the seventh release I sat back and looked at the pattern. Every release had the same shape: template has feature X, tool does not preserve X, user hits it, I ship a fix that adds tolerance for X. At no point did I ever invent formatting. The tool only ever preserved what was already in the XML of the source cell. The pptx file is a live prototype of the output deck. It carries the visual schema — shape positions, table dimensions, picture crops — and the formatting schema — fonts, weights, italics, bullets, indents, soft breaks, run structure. My tool does not generate from scratch. It clones the template slide, walks the cloned XML, and overwrites text content in place. Every visual property that ends up in the output must already exist somewhere in the source template.
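The difference between preserving and flattening is easy to show on a single paragraph. The sketch below is an assumption-laden illustration rather than recombinase’s implementation: `populate_in_place`, `populate_flat` and `font_size` are hypothetical helpers, and the template paragraph is hand-written. In DrawingML the `sz` attribute is in hundredths of a point, so `700` means 7pt.

```python
import copy
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
R, RPR, T = (f"{{{A}}}{name}" for name in ("r", "rPr", "t"))

TEMPLATE_P = (
    f'<a:p xmlns:a="{A}">'
    '<a:r><a:rPr sz="700" b="1"/><a:t>Example text</a:t></a:r>'
    "</a:p>"
)


def populate_in_place(template_p, text):
    """Clone the template paragraph and overwrite only the text node."""
    p = copy.deepcopy(template_p)
    p.find(R).find(T).text = text
    return p


def populate_flat(template_p, text):
    """Rebuild the paragraph from scratch: the run properties are lost."""
    p = ET.Element(template_p.tag)
    run = ET.SubElement(p, R)
    ET.SubElement(run, T).text = text
    return p


def font_size(p):
    """Return the run-level sz attribute, or None if the run has no <a:rPr>."""
    rpr = p.find(R).find(RPR)
    return None if rpr is None else rpr.get("sz")


template = ET.fromstring(TEMPLATE_P)
print(font_size(populate_in_place(template, "Jane Doe")))  # 700
print(font_size(populate_flat(template, "Jane Doe")))      # None
```

The rebuilt paragraph has no run-level size at all, so the renderer falls back to whatever the inherited default is. The only place the 7pt could have come from is the template’s own XML.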

This is backwards from how most templating tools work. Jinja into HTML, Handlebars into Markdown, string format into text — all of those take a plain skeleton and synthesise the output. The programmer controls formatting from the outside. PowerPoint templates cannot be controlled from the outside. If your template cell has a single run with no italic markup, no amount of cleverness in your code will render italic text there. You have to go into PowerPoint, add an italic run, save the template, and then your tool can preserve it. The fix is in the template file, not the code.

The wrong response to “my tool does not handle this template variation” is to add infinite tolerance to the code. You end up with a writer that handles every possible authoring mode and still produces mediocre output on edge cases. The right response is a two-sided contract. The writer gets tolerant to reasonable authoring variation — segment runs by soft break, handle two-run and three-run cells identically, clone paragraph prototypes for list items. But the template must still be disciplined. If the template’s cells are authored inconsistently — one with Shift-Enter, one with hard Enter, one with three adjacent bold runs, one with five paragraphs of layout-level styling — no writer can infer a consistent intent. The template is the intent. Writer tolerance reduces the need for template discipline, but it does not eliminate it. Template discipline reduces the need for writer tolerance, but it does not eliminate that either. They meet in the middle.

Every template-driven tool needs a diagnostic primitive. It is not a feature, it is the floor. If you cannot see the structure of the template, you cannot reason about why your tool is or is not preserving it. My inspect command reports per-cell paragraph counts, run counts, break counts, and flags cells matching the multi-run-br idiom. Before that tool existed I was debugging blind — regenerate, open PowerPoint, squint, guess. With it, the fix was obvious within thirty seconds.
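A diagnostic along these lines does not need much code. The following is a rough sketch of the idea, not the actual inspect command: `inspect_table` is a hypothetical function, the table XML is hand-written, and the rule used to flag a cell (at least one break and at least two runs in the same paragraph) is my assumption about what the multi-run-br idiom means.

```python
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
NS = {"a": A}

TABLE = f"""
<a:tbl xmlns:a="{A}">
  <a:tr h="370840">
    <a:tc><a:txBody><a:bodyPr/><a:lstStyle/>
      <a:p><a:r><a:rPr b="1"/><a:t>Role</a:t></a:r></a:p>
      <a:p><a:r><a:rPr i="1"/><a:t>Years</a:t></a:r></a:p>
    </a:txBody><a:tcPr/></a:tc>
    <a:tc><a:txBody><a:bodyPr/><a:lstStyle/>
      <a:p>
        <a:r><a:rPr b="1"/><a:t>Role</a:t></a:r>
        <a:r><a:rPr b="1"/><a:t> </a:t></a:r>
        <a:br/>
        <a:r><a:rPr i="1"/><a:t>Years</a:t></a:r>
      </a:p>
    </a:txBody><a:tcPr/></a:tc>
  </a:tr>
</a:tbl>
"""


def inspect_table(tbl):
    """Report paragraph, run and break counts for every cell in a table."""
    report = []
    for row_idx, tr in enumerate(tbl.findall("a:tr", NS)):
        for col_idx, tc in enumerate(tr.findall("a:tc", NS)):
            paras = tc.findall("a:txBody/a:p", NS)
            runs = [len(p.findall("a:r", NS)) for p in paras]
            breaks = [len(p.findall("a:br", NS)) for p in paras]
            report.append({
                "cell": (row_idx, col_idx),
                "paragraphs": len(paras),
                "runs": sum(runs),
                "breaks": sum(breaks),
                # Assumed rule: some paragraph has a break and 2+ runs.
                "multi_run_br": any(b >= 1 and r >= 2 for r, b in zip(runs, breaks)),
            })
    return report


for line in inspect_table(ET.fromstring(TABLE.strip())):
    print(line)
```

Two cells that look the same on the slide come out as 2 paragraphs, 2 runs, 0 breaks versus 1 paragraph, 3 runs, 1 break, which is exactly the kind of mismatch that is invisible in PowerPoint itself.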

The tool is recombinase on PyPI. Not a product — just an author who got tired of manually editing consultant CVs and built a template-guided pptx synthesiser that eventually learned to preserve bold-italic-soft-break cells without losing anyone’s font size.