The Template Is the Schema • Terry Li

I shipped seven PyPI releases of a CV generation tool in one afternoon. Each release unblocked one piece of my own CV landing correctly in the Capco global template. By the end, I’d learned something I should have known sooner: in template-guided synthesis, the template is the schema. You can’t invent formatting the source file doesn’t authorise. You can only preserve what’s already there.

Here’s the loop. I have a PowerPoint CV template Capco expects all its consultants to use. It has named shapes, a table of recent projects, a “Key Competencies” cell with four subheadings, and a background list with a very specific visual idiom: bold role + soft-break + italic years. Like this:

AGM & Head of Data Science, China CITIC Bank International (3 years, 2022–2026)

My tool, recombinase, populates this template from a YAML record. You write one YAML file per consultant; the tool walks the template’s shapes, overwrites the example text, and saves the output. Simple in theory. Brutal in practice, because every version of the tool fell over on a different piece of the template’s visual structure.

Version by version

v0.1.13 — fixed the Role column font. The template had a 7pt font on certain cells, but my writer was silently flattening run-level rPr to defaults, so every populated cell rendered at PowerPoint’s default 18pt. I’d hit this same bug on shape placeholders in v0.1.10 and had forgotten to apply the fix to table cells.
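To make the rPr point concrete, here is a minimal sketch using only Python’s standard library. It is not recombinase’s actual code: the cell XML, the function name set_text_preserving_rpr and the sample text are all assumptions for illustration. The idea is to touch only the <a:t> element, so the sibling <a:rPr> (and its sz attribute) survives.

```python
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
NS = {"a": A}

# Hypothetical cell body: one run whose <a:rPr> pins the size to 7pt.
# DrawingML stores sz in hundredths of a point, so sz="700" is 7pt.
CELL_XML = f"""
<a:txBody xmlns:a="{A}">
  <a:p>
    <a:r><a:rPr lang="en-GB" sz="700" b="1"/><a:t>Example role</a:t></a:r>
  </a:p>
</a:txBody>
"""


def set_text_preserving_rpr(tx_body, text):
    """Overwrite the first run's <a:t> and leave its <a:rPr> alone."""
    run = tx_body.find("a:p/a:r", NS)
    run.find("a:t", NS).text = text
    return run


body = ET.fromstring(CELL_XML)
run = set_text_preserving_rpr(body, "Head of Data Science")
print(run.find("a:rPr", NS).get("sz"))  # 700
```

A writer that rebuilds the run from scratch instead of editing <a:t> would drop that sz attribute, which is one way the flattening described above can happen.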

v0.1.14 — added sections: config for the Key Competencies cell. Four subheadings (FS Industry, Functional, Technical, Methodical), each with a variable-length list of bullets. Previously I’d been rendering it as a flat 12-item list. Wrong visual structure. I needed a new primitive: capture the header profile from paragraph 0, the bullet profile from paragraph 1, then emit alternating profiles per section.
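The sections primitive can be sketched roughly like this. Again, this is an illustration under assumptions, not the tool’s real implementation: the function name render_sections, the template XML and the bullet texts are made up, and the sketch assumes the cell holds at least two paragraphs with one run each.

```python
import copy
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
NS = {"a": A}

# Hypothetical cell: paragraph 0 is a bold header, paragraph 1 an indented bullet.
CELL_XML = f"""
<a:txBody xmlns:a="{A}">
  <a:p><a:pPr marL="0" indent="0"/><a:r><a:rPr sz="800" b="1"/><a:t>Header</a:t></a:r></a:p>
  <a:p><a:pPr marL="171450" indent="-171450"/><a:r><a:rPr sz="800"/><a:t>Bullet</a:t></a:r></a:p>
</a:txBody>
"""


def render_sections(tx_body, sections):
    """Replace the cell's paragraphs with header/bullet clones per section."""
    paras = tx_body.findall("a:p", NS)
    header_proto, bullet_proto = paras[0], paras[1]  # captured profiles
    for p in paras:
        tx_body.remove(p)
    for heading, bullets in sections:
        plan = [(header_proto, heading)] + [(bullet_proto, b) for b in bullets]
        for proto, text in plan:
            p = copy.deepcopy(proto)  # keeps pPr and rPr from the template
            p.find("a:r/a:t", NS).text = text
            tx_body.append(p)
    return tx_body


body = ET.fromstring(CELL_XML)
render_sections(body, [
    ("FS Industry", ["bullet one", "bullet two"]),  # bullet texts are placeholders
    ("Functional", ["bullet three"]),
])
texts = [t.text for t in body.findall("a:p/a:r/a:t", NS)]
bold = [r.get("b") for r in body.findall("a:p/a:r/a:rPr", NS)]
```

Every output paragraph is a copy of one of the two template paragraphs, so the header and bullet formatting come from the file rather than from the code.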

v0.1.15 — preserved the dual-run-br idiom. This one surprised me. A PowerPoint cell containing “Bold Role\n(Italic Years)” can be authored in two completely different ways at the XML level. If you press Enter, you get two separate <a:p> paragraphs. If you press Shift+Enter, you get ONE paragraph with <a:r> + <a:br/> + <a:r>, each run carrying its own <a:rPr>. Visually identical. Structurally unrelated. My writer only handled the first case.
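The two authoring modes are easier to tell apart in XML than on screen. The sketch below builds a simplified version of each and classifies them; the XML is hand-written for illustration (real PowerPoint output carries more attributes), and authoring_mode is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
NS = {"a": A}

# Enter: two separate <a:p> paragraphs.
HARD = f"""
<a:txBody xmlns:a="{A}">
  <a:p><a:r><a:rPr b="1"/><a:t>Bold Role</a:t></a:r></a:p>
  <a:p><a:r><a:rPr i="1"/><a:t>(Italic Years)</a:t></a:r></a:p>
</a:txBody>
"""

# Shift+Enter: one <a:p> holding run + <a:br/> + run.
SOFT = f"""
<a:txBody xmlns:a="{A}">
  <a:p>
    <a:r><a:rPr b="1"/><a:t>Bold Role</a:t></a:r>
    <a:br/>
    <a:r><a:rPr i="1"/><a:t>(Italic Years)</a:t></a:r>
  </a:p>
</a:txBody>
"""


def authoring_mode(tx_body):
    """Classify a text body as soft-break, hard-break or single-line."""
    paras = tx_body.findall("a:p", NS)
    if len(paras) == 1 and paras[0].find("a:br", NS) is not None:
        return "soft-break"
    if len(paras) > 1:
        return "hard-break"
    return "single-line"


print(authoring_mode(ET.fromstring(HARD)))  # hard-break
print(authoring_mode(ET.fromstring(SOFT)))  # soft-break
```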

v0.1.16 — renamed cv-data/ to data/. Pre-1.0 default rename with no backward compat. The “cv-” prefix was redundant once the parent directory (capco-cv/) carried the semantic, and data/ generalised to non-CV packs.

v0.1.17 — extended the dual-run-br handling to list shapes. The Background placeholder has three bullets, each itself a multi-run-br paragraph. I had to capture the template’s first paragraph as a prototype and clone it per item, overwriting <a:t> text in each run while preserving <a:rPr> and <a:br/>.
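A rough sketch of the prototype-and-clone step, assuming a clean two-run template paragraph (bold run, soft break, italic run). The names fill_list and LIST_XML are hypothetical, and the second and third items are placeholder data.

```python
import copy
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
NS = {"a": A}

# Hypothetical list shape body with a single example bullet.
LIST_XML = f"""
<a:txBody xmlns:a="{A}">
  <a:p>
    <a:r><a:rPr b="1"/><a:t>Example role</a:t></a:r>
    <a:br/>
    <a:r><a:rPr i="1"/><a:t>(example years)</a:t></a:r>
  </a:p>
</a:txBody>
"""


def fill_list(tx_body, items):
    """Clone the first paragraph once per item, overwriting only <a:t> text."""
    paras = tx_body.findall("a:p", NS)
    proto = copy.deepcopy(paras[0])  # prototype keeps rPr and <a:br/>
    for p in paras:
        tx_body.remove(p)
    for parts in items:
        p = copy.deepcopy(proto)
        for run, text in zip(p.findall("a:r", NS), parts):
            run.find("a:t", NS).text = text
        tx_body.append(p)
    return tx_body


body = ET.fromstring(LIST_XML)
fill_list(body, [
    ("AGM & Head of Data Science, China CITIC Bank International", "(3 years, 2022–2026)"),
    ("Placeholder role B", "(placeholder years)"),
    ("Placeholder role C", "(placeholder years)"),
])
out = body.findall("a:p", NS)
```

This flat zip over runs is only safe when every paragraph has exactly one run per line, which is the limitation v0.1.19 later had to address.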

v0.1.18 — made recombinase inspect drill into table cells, reporting per-cell paras=N runs=M brs=K and starring cells matching the multi-run-br idiom. This was a diagnostic tool. I needed it to see WHY certain cells weren’t firing.

v0.1.19 — segmented runs by <a:br/> boundary instead of walking them flat. This was the trickiest one. Terry’s template had cells with runs=3 brs=1: three runs around a single soft break, two before it and one after. The extra run came from an editing artifact: PowerPoint had split the bold primary line across two adjacent bold runs because someone had edited the cell and inserted a space with slightly different formatting. My v0.1.15 writer walked runs flat and assigned [primary, secondary, ""], which pushed “secondary” into the second bold run and left the italic run empty. Wrong output. The fix was to group runs into segments separated by <a:br/>, then assign one part per segment.
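Segmenting by break can be sketched in a few lines. This is an illustration, not the shipped code: segment_runs and assign_parts are hypothetical names, and putting the whole part into the first run of a segment while emptying the rest is one possible policy, assumed here for simplicity.

```python
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
NS = {"a": A}
BR = f"{{{A}}}br"
R = f"{{{A}}}r"

# Hypothetical runs=3 brs=1 paragraph: the bold line is split across two runs.
PARA_XML = f"""
<a:p xmlns:a="{A}">
  <a:r><a:rPr b="1"/><a:t>Old </a:t></a:r>
  <a:r><a:rPr b="1"/><a:t>Role</a:t></a:r>
  <a:br/>
  <a:r><a:rPr i="1"/><a:t>(old years)</a:t></a:r>
</a:p>
"""


def segment_runs(paragraph):
    """Group runs into segments, starting a new segment at each <a:br/>."""
    segments = [[]]
    for child in paragraph:
        if child.tag == BR:
            segments.append([])
        elif child.tag == R:
            segments[-1].append(child)
    return segments


def assign_parts(paragraph, parts):
    """Write one part per segment: first run gets the text, extras are emptied."""
    for runs, text in zip(segment_runs(paragraph), parts):
        if not runs:
            continue
        runs[0].find("a:t", NS).text = text
        for extra in runs[1:]:
            extra.find("a:t", NS).text = ""


para = ET.fromstring(PARA_XML)
assign_parts(para, ["New Role", "(2 years)"])
texts = [t.text for t in para.findall("a:r/a:t", NS)]
print(texts)  # ['New Role', '', '(2 years)']
```

The secondary text now lands in the italic run after the break, regardless of how many runs the bold line was split into.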

Seven releases. Each driven by a specific piece of visual structure Terry’s template already contained but my tool didn’t know how to preserve.

The reframe

After v0.1.19, I sat back and looked at the pattern. Every release had the same shape: template has feature X, tool doesn’t preserve X, user hits it, I ship a fix that adds tolerance for X. At no point did I ever invent formatting. The tool only ever preserved what was already in the XML of the source cell.

That’s when the phrase crystallised: the template is the schema. The pptx file is a live prototype of the output deck. It carries the visual schema (shape positions, table dimensions, picture crops) AND the formatting schema (fonts, weights, italics, bullets, indents, soft breaks, run structure). My tool doesn’t generate from scratch. It clones the template slide, walks the cloned XML, and overwrites text content in-place. Every visual property that ends up in the output must already exist somewhere in the source template.

This is backwards from how most templating tools work. Jinja-into-HTML, handlebars-into-Markdown, string-format-into-text — all of those take a plain skeleton and synthesise the output. The programmer controls formatting from the outside.

PowerPoint templates can’t be controlled from the outside. If your template cell has a single <a:r> run with no italic markup, no amount of cleverness in your code will render italic text there. You have to go into PowerPoint, add an italic run, save the template, and THEN your tool can preserve it. The fix is in the template file, not the code.

The corollary: push tolerance into the writer AND discipline into the template

Here’s where it gets subtle. The wrong response to “my tool doesn’t handle this template variation” is to add infinite tolerance to the code. You end up with a writer that handles every possible authoring mode — hard break, soft break, inline split runs, mixed pPr at different levels, autofit shrinking — and still produces mediocre output on edge cases.

The right response is a two-sided contract. The writer becomes tolerant of reasonable authoring variation (segment runs by soft break, handle 2-run and 3-run cells identically, clone paragraph prototypes for list items). But the template must still be disciplined. If the template’s column-0 cells in the recent-projects table are authored inconsistently (one with Shift+Enter, one with hard Enter, one with three adjacent bold runs, one with five paragraphs of layout-level styling), no writer can infer a consistent intent. The template IS the intent.

The fix for Terry’s CV session was actually in BOTH places. I shipped v0.1.19 to handle the 3-run case correctly. Terry opened PowerPoint and rebuilt the column-0 cells to use consistent Shift+Enter + bold + italic structure. Both were necessary. Writer tolerance reduces the need for template discipline, but it doesn’t eliminate it. Template discipline reduces the need for writer tolerance, but it doesn’t eliminate that either. They meet in the middle.

The diagnostic primitive

The thing that unblocked the final few iterations wasn’t a writer fix. It was a diagnostic tool. v0.1.18’s per-cell inspect output lets you see the XML structure of every cell in a table — paras=N runs=M brs=K. When Terry ran it on his template and pasted the output:

(1,0) paras=1 runs=2 brs=1 chars=65 ★
(2,0) paras=1 runs=2 brs=1 chars=84 ★
(3,0) paras=1 runs=3 brs=1 chars=82 ★
(4,0) paras=1 runs=2 brs=1 chars=56 ★
(5,0) paras=1 runs=3 brs=2 chars=64 ★

I could see immediately which cells would render correctly (rows 1, 2, 4) and which wouldn’t (rows 3, 5). The diagnostic exposed the structural variation that was invisible in the rendered output. Without it, I was debugging blind — regenerate, open PowerPoint, squint, guess. With it, the fix was obvious within thirty seconds.

Every template-driven tool needs this kind of diagnostic primitive. It’s not a feature, it’s the floor. If you can’t see the structure of the template, you can’t reason about why your tool is or isn’t preserving it.
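For anyone building their own, a diagnostic of this kind is small. The sketch below mimics the output format shown earlier but is not the real recombinase inspect: the table XML is hand-written, chars is assumed to mean the total length of <a:t> text, and the star rule (one paragraph, at least one break, at least two runs) is a guess at what “matching the idiom” means.

```python
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
NS = {"a": A}

# Hypothetical one-column table: clean soft break, split bold run, hard break.
TABLE_XML = f"""
<a:tbl xmlns:a="{A}">
  <a:tr><a:tc><a:txBody>
    <a:p><a:r><a:t>Role</a:t></a:r><a:br/><a:r><a:t>(2022)</a:t></a:r></a:p>
  </a:txBody></a:tc></a:tr>
  <a:tr><a:tc><a:txBody>
    <a:p><a:r><a:t>Ro</a:t></a:r><a:r><a:t>le</a:t></a:r><a:br/><a:r><a:t>(2022)</a:t></a:r></a:p>
  </a:txBody></a:tc></a:tr>
  <a:tr><a:tc><a:txBody>
    <a:p><a:r><a:t>Role</a:t></a:r></a:p>
    <a:p><a:r><a:t>(2022)</a:t></a:r></a:p>
  </a:txBody></a:tc></a:tr>
</a:tbl>
"""


def inspect_table(tbl):
    """Return one summary line per cell: paragraph, run, break and char counts."""
    lines = []
    for r, row in enumerate(tbl.findall("a:tr", NS)):
        for c, cell in enumerate(row.findall("a:tc", NS)):
            paras = cell.findall("a:txBody/a:p", NS)
            runs = cell.findall("a:txBody/a:p/a:r", NS)
            brs = cell.findall("a:txBody/a:p/a:br", NS)
            texts = cell.findall("a:txBody/a:p/a:r/a:t", NS)
            chars = sum(len(t.text or "") for t in texts)
            # Assumed star rule: single paragraph with a soft break and 2+ runs.
            star = " ★" if len(paras) == 1 and brs and len(runs) >= 2 else ""
            lines.append(
                f"({r},{c}) paras={len(paras)} runs={len(runs)} "
                f"brs={len(brs)} chars={chars}{star}"
            )
    return lines


report = inspect_table(ET.fromstring(TABLE_XML))
print("\n".join(report))
```

The three rows render almost identically in PowerPoint, yet the report separates them at a glance: the second row has the extra run, and the third has no soft break at all.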

Takeaway

Template-guided synthesis is a small, sharp design discipline. It’s tempting to fight the metaphor — to add knobs and config and “inject” operations until the tool feels like a generator. That path produces fragile tools that break on every new template.

The opposite discipline works better: trust the template, walk its structure, overwrite text content in-place, and surface diagnostics so the template’s structure is visible to the human doing the authoring. The template is the schema. The tool’s job is to preserve it, not replace it.

Seven PyPI releases in one afternoon is what it takes when you’re learning this in real time. The next template I hit, I’ll know to inspect the XML first, fix the template second, and only touch the tool code if the template’s already disciplined and something genuinely structural needs new primitives.

The tool: recombinase on PyPI. Public, MIT, personal infrastructure. Not a product — just an author who got tired of manually editing consultant CVs and built a template-guided pptx synthesiser that eventually learned to preserve bold+italic+soft-break cells without losing anyone’s font size to the default.