
Document as Code in the Age of AI - Part 3

Part 3: From Theory to Results

What happened when one team put Document as Code into practice



Setting the Stage

Last year I joined a consumer health and wellness startup as part of their AI transformation — about twenty people, moving fast, shipping constantly. The kind of product where getting the user experience right is existential, not aspirational.


Our setup was typical for the stage. PRDs in Confluence. Design mocks in Figma, linked in tickets but never exported or versioned. Code in GitHub. The engineers were already using AI coding tools — Claude, Cursor — and getting real value from them.


But the familiar drift had set in. PRDs described features as they were imagined, not as they were built. Mocks showed screens that didn't quite match what shipped. And the AI coding assistants, for all their capability, were working without the full picture — generating code that was technically sound but sometimes aimed at the wrong target.


Then runway got tight, and the CEO issued a mandate: the entire company needed to embrace AI and become 300% more productive. Not a suggestion. A survival requirement.


That kind of pressure forces clarity. We looked at our AI tools and realized the bottleneck wasn't the models — it was the context. Copilot could autocomplete code, but it couldn't read the PRD in Confluence. Claude could generate entire features, but it couldn't see the Figma mocks. Powerful tools, context-starved.


The decision wasn't complicated: bring everything into the repo. PRDs and briefs first, as markdown files versioned with the code. Then exported design mocks. Give the AI — and the humans — the full picture.
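
Concretely, "bring everything into the repo" meant a layout along these lines (the directory and file names here are illustrative, not our exact structure):

```
repo/
├── docs/
│   ├── briefs/
│   │   └── onboarding-flow.md    # product brief, versioned with the code
│   ├── prds/
│   │   └── onboarding-flow.md    # PRD, updated in the same PRs as the feature
│   └── designs/                  # intended home for exported Figma mocks
├── src/
│   └── onboarding/
└── tests/
```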


Not everyone was enthusiastic. Changing workflows that feel comfortable never goes over easily, and some people left rather than adapt. But the leadership messaging stayed consistent: this is how we work now, and here's why. The "why" was clear enough — survival — that we aligned quickly.


What Actually Happened

The first thing we noticed was mundane but significant: PRDs stayed accurate.


When a brief lives in Confluence, updating it feels like a separate chore — something you'll get to later, after the sprint, when things calm down. Things never calm down. When the brief lives in the same branch as the feature code, updating it stops being a separate task. The PR that ships the feature includes the updated spec. Drift gets caught in review instead of compounding quietly until someone notices three months later.
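
You can even make that catch mechanical instead of relying on reviewer memory. Here's a minimal sketch of a CI check along those lines; it isn't the tooling we ran, and the path convention (code under src/<feature>/, specified by docs/prds/<feature>.md) is an assumption for illustration:

```python
"""CI guard sketch: flag PRs that change feature code without touching the co-located spec.

Assumes a hypothetical convention: code under src/<feature>/ is described by
docs/prds/<feature>.md. Intended to run in CI against the PR branch.
"""
import pathlib
import subprocess
import sys


def changed_files(base: str = "origin/main") -> list[str]:
    """List files changed on this branch relative to the base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line.strip()]


def spec_for(code_path: str) -> pathlib.Path | None:
    """Map src/<feature>/... to its PRD, if the path follows the assumed convention."""
    parts = pathlib.PurePosixPath(code_path).parts
    if len(parts) >= 2 and parts[0] == "src":
        return pathlib.Path("docs", "prds", f"{parts[1]}.md")
    return None


def main() -> int:
    changed = changed_files()
    touched_specs = {pathlib.Path(p) for p in changed if p.startswith("docs/prds/")}
    stale_specs = set()
    for path in changed:
        spec = spec_for(path)
        if spec is not None and spec.exists() and spec not in touched_specs:
            stale_specs.add(spec)
    for spec in sorted(stale_specs):
        print(f"warning: feature code changed but {spec} was not updated in this PR")
    # Non-zero exit fails the check; drop this if you prefer a warning-only gate.
    return 1 if stale_specs else 0


if __name__ == "__main__":
    sys.exit(main())
```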


AI-drafted specs improved in a way that surprised even the people who'd pushed for the change. Product managers started using AI to generate first drafts of briefs and PRDs, and because the AI could see the actual codebase, those drafts came back referencing real APIs, actual data models, current constraints. Not aspirational documents that needed heavy revision once engineering pointed out what was actually feasible — grounded specs from the start.


Ticket creation changed just as much. We used MCP to connect AI to Jira, and with the co-located PRD as the source of truth, AI-generated tickets came back with full descriptions — not just subject-line summaries that left engineers guessing at scope. Acceptance criteria were specific. Epics were broken out logically, with dependencies identified as sequential or parallel, which meant the team could coordinate work across multiple engineers without the usual "wait, I thought you were doing that part" confusion. It's a small thing that compounds: when every ticket is complete and organized from the start, sprint planning stops being a translation exercise and starts being an actual planning exercise.
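
For a sense of what "complete from the start" means in practice, here is a rough sketch of filing one such ticket against Jira's REST API. We drove this through MCP rather than hand-written calls, so treat it purely as an illustration of the ticket shape; the site, project key, paths, and field contents are placeholders:

```python
"""Illustration only: file a fully described ticket derived from a co-located PRD.

The real workflow ran through an MCP connection between the AI assistant and
Jira; this sketch just shows the shape of a "complete" ticket. Site, project
key, paths, and credentials below are placeholders.
"""
import os
import pathlib

import requests

JIRA_URL = "https://example.atlassian.net"   # placeholder Jira Cloud site
AUTH = (os.environ["JIRA_EMAIL"], os.environ["JIRA_API_TOKEN"])

# The co-located PRD is the source of truth the description is drawn from.
prd_excerpt = pathlib.Path("docs/prds/onboarding-flow.md").read_text()[:500]

ticket = {
    "fields": {
        "project": {"key": "APP"},            # placeholder project key
        "issuetype": {"name": "Task"},
        "summary": "Onboarding: persist progress between sessions",
        # Full description instead of a one-line summary: scope, acceptance
        # criteria, and dependencies spelled out for whoever picks it up.
        "description": (
            "Context (from docs/prds/onboarding-flow.md):\n"
            f"{prd_excerpt}\n\n"
            "Acceptance criteria:\n"
            "- Progress survives an app restart\n"
            "- User resumes at the last completed step\n\n"
            "Dependencies: blocked by the session-storage task; "
            "can run in parallel with the copy-update task."
        ),
    }
}

resp = requests.post(
    f"{JIRA_URL}/rest/api/2/issue",
    json=ticket,
    auth=AUTH,
    headers={"Accept": "application/json"},
)
resp.raise_for_status()
print("created", resp.json()["key"])
```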


Pull request reviews changed character. They'd always been about "does this code work?" Now reviewers could also ask "does this code match what we said we'd build?" and answer that question without switching to Confluence or hunting down the latest version of a Google Doc. The context was right there in the PR.


On the code side, the shift was measurable. We used an independent code tracking solution to monitor output: release velocity tripled. Every engineer on the team landed in the top 10% of that vendor's leaderboard. Not because they were grinding harder — because they were building with better context. Fewer "we built the wrong thing" bugs made it to QA. The bugs that did surface were real edge cases, the kind you expect in software development, not misalignment errors that shouldn't have made it past a first review.


Test generation improved because the AI understood what features were supposed to do, not just what the function signatures looked like. Coverage went up without proportional effort.


The design side took longer to click. We integrated Figma through MCP, so AI could reference the designs — but we didn't get the mocks co-located in the repo alongside the code. It was halfway there. AI could see what the screens were supposed to look like, which was a real improvement over verbal descriptions, and frontend implementation quality went up as a result. But without the designs versioned in the branch, we didn't get the full feedback loop on the visual side — no automated drift detection between what was designed and what was built. The "that's not what the Figma shows" conversations in QA declined, but they didn't disappear.


The thing we didn't anticipate was what happened once upstream team members got comfortable in the repo. Designers who'd been exporting mocks started using AI to implement straightforward design changes directly — spacing fixes, color updates, layout tweaks that would have previously sat in a ticket waiting for an engineer to pick them up. Product managers started spinning up v1 implementations of simpler features, using AI to translate their own specs into working code. Not production-grade, not replacing engineers — but functional enough to validate an idea or unblock a conversation. The traditional handoff points between product, design, and engineering didn't disappear, but they got blurrier in a good way. People who understood the what and the why could now contribute to the how, because the repo had become a shared workspace and AI had lowered the barrier to contribution.


What We Learned

Using AI to lower the barrier for non-engineers worked better than anyone expected. Product managers didn't need to learn Git or markdown in depth — AI assistants handled the mechanics so they could focus on content. Two years ago, this approach would have required serious training. Now the onramp is gentle enough that adoption happened in weeks.


The cultural piece was harder than expected — which, if you read Part 2, shouldn't be surprising. You can't announce this kind of shift once and expect it to stick. Our leadership had to keep reinforcing why it mattered, repeatedly, until the new workflow stopped feeling new and started feeling like how things are done. The initial enthusiasm fades fast if nobody sustains it.


If we could do it over, we'd invest more in bringing along the people who were resistant. Some left, and at the time it felt like an inevitable cost of a hard pivot. In hindsight, I'm not sure it had to be. The ones who stayed adapted faster than anyone predicted — which makes me think the ones who left might have too, given more time and a longer runway for the transition. Losing experienced people is expensive in ways that don't show up in velocity metrics, and urgency isn't always a good reason to let that happen without a fight.


What About MCP?

I get this question every time I talk about co-location. Why move anything? Why not just use MCP servers to connect your AI tools directly to Confluence, Figma, Jira — let the AI pull context from wherever documents already live? Everyone stays in their comfort zone. No cultural change required.


It's a reasonable instinct, and MCP is genuinely useful. I'm not arguing against it. If your AI coding assistant can reach into Confluence and read the PRD, that's better than it having no context at all. Use MCP. Set up those connections.


But it doesn't get you what co-location gets you, and the gap matters.


The most obvious issue is versioning. When you branch a feature in Git, everything in that branch goes with it — the code, the tests, and if you've co-located, the brief and the designs. Two engineers working on competing approaches to the same feature can each have a version of the PRD that reflects their direction. When you merge, conflicts surface. History is preserved. You can look back six months later and understand exactly what the spec said when that code was written.


Confluence doesn't branch. Figma has its own branching, but it isn't tied to your Git branches. MCP gives your AI access to the current version of a document — which might have been updated yesterday for a completely different initiative. There's no guarantee the spec your AI reads today is the one that was relevant when the feature branch was created last week.


Then there's the review workflow. One of the biggest wins we experienced was pull requests that included the brief alongside the code. Reviewers could see intent and implementation together and ask: does this match? When the PRD lives in Confluence, even with MCP making it accessible to AI, the human reviewer still has to context-switch. Open the PR in one tab, find the right Confluence page in another, hope they're looking at the right version. The integration is mechanical, not structural.


And the feedback loop — the thing this entire series is about — doesn't close. MCP is read access. It lets AI consume documents from external tools. But the virtuous cycle requires docs and code evolving together, in the same commits, through the same review process. When an engineer discovers a constraint that changes the spec, co-location means the PRD updates in the same PR. With MCP, someone has to remember to go update the Confluence page separately. We already know how that goes.


The "let people stay comfortable" argument sounds compassionate, but it preserves the exact structural problem that causes misalignment in the first place. The comfort is in the tool. The cost is in the disconnection. MCP bridges that gap for AI — and that's valuable — but it doesn't bridge it for the humans, the processes, or the version history.


Use MCP for what it's good at: pulling in broad organizational context, referencing architecture docs that don't change often, accessing information that genuinely doesn't belong in a feature repo. But for the artifacts that are tightly coupled to active development — the brief, the spec, the designs — co-location is the better answer. Not because MCP doesn't work, but because it solves a narrower problem than people think it does.


Results

We tracked our output using an independent engineering analytics platform — one that connects to GitHub, Jira, Confluence, Slack and other development tools and measures work automatically. This matters because the 300% number isn't self-reported, and it isn't based on lines of code or raw commit counts. The platform scores pull request complexity on a 0–10 scale, analyzes the substance of engineering output over time, and benchmarks teams against industry peers. It distinguishes between a trivial config change and a meaningful architectural contribution, which means the velocity metric reflects actual work, not activity.


By that measure, release velocity tripled. Every engineer on the team landed in the top 10% of the platform's industry benchmark — not one standout performer dragging up the average, but a cohort-wide lift. The platform's delivery quality metrics told a consistent story: rework ratio dropped, bug density declined, and the ratio of new value to fix-and-maintain work shifted meaningfully toward new value.


PRD creation time was cut roughly in half. Code review cycles shortened because reviewers stopped spending time on "what is this supposed to do?" — the answer was in the PR.


The CEO's 300% mandate — issued as a survival requirement — was met. Not through longer hours or unsustainable effort, but through a structural change that made everyone's work compound instead of drift.


Closing the Series

Our experience is the proof point for what we've been exploring across all three posts. AI turns Document as Code from a good idea into a great one.


The feedback loop is real. When docs, designs, and code live together, they improve each other — and AI accelerates that cycle in ways that weren't possible even recently.


The path is accessible. You don't need new tools or a massive migration project. You need a clear reason, leadership that will stick with it past the initial enthusiasm, and a willingness to let AI handle the parts that used to make this approach impractical.


Start with one feature. Prove the value. Expand from there.

This is Part 3 of a three-part series on Document as Code in the Age of AI. Part 1: Why Co-Location Changes Everything covers the strategic case. Part 2: Making It Real provides the practical implementation guide.

 
 
 
