
The Knowledge Substrate

What your AI stands on determines what it can do.


The self-driving knowledge base

Background agents work across the entire SDLC. But your knowledge isn't ready for them.

Software delivery was designed around constraints of humans at keyboards. Now, agents run autonomously across thousands of repos in the background. This trend is already in motion at companies like Stripe, Ramp, and Spotify.

But agents don't just consume code. They consume knowledge — policies, processes, institutional wisdom. And that knowledge is scattered across a dozen tools in formats AI can't read. The bottleneck has moved. It's no longer the model, the prompt, or the pipe. It's the substrate underneath.

The empirical proof

In February 2026, Cursor ran an experiment.

Hundreds of AI agents. Seven continuous days. Ten million tool calls. One thousand commits per hour. They built a web browser.

The most important finding wasn't about the agents. It was about what they stood on.

When the codebase was a monolith — messy, tangled, poorly structured — agent throughput collapsed. Agents wandered. Output quality degraded.

When the codebase was restructured — modular, clean interfaces, well-specified boundaries — throughput multiplied.

The agents didn't change. The models didn't change. The substrate changed.

Monolith: fragmented, bottlenecked.
Modular: clean flow, linear scaling.

— Cursor, "Towards Self-Driving Codebases," February 2026

The bridge

Cursor proved this for code. The same physics applies to knowledge.

Your policies. Your processes. Your institutional wisdom — scattered across a dozen tools in formats AI can't read.

Code substrate | Knowledge substrate

Monolith codebase → agents wander, output degrades | Fragmented wikis + file shares → AI produces contradictory answers
Modular architecture → throughput multiplies | Unified, structured knowledge base → AI answers become authoritative
Well-specified interfaces → agents know their boundaries | Clear ownership + version control → AI knows what's current
Native format (source code) → zero conversion overhead | Native format (Markdown) → zero format tax

The intent gap

Context tells agents what to know. Intent tells agents what to want.

Nate B. Jones's framework for the evolution of AI input identifies four disciplines that have diverged from what we used to call "prompting": prompt craft, context engineering, intent engineering, and specification engineering. Each operates at a different altitude. Each requires the one below it.

The critical insight: context engineering tells agents what to know. Intent engineering tells agents what to want. You can have perfect context and terrible intent alignment. Klarna proved it — their AI agent resolved 2.3 million conversations in the first month, slashed resolution times, and projected $40 million in savings. Then customer satisfaction cratered because the agent was optimizing for speed when the organizational intent was relationship quality.

The Klarna trap is universal. Every organization's actual intent — the tradeoff hierarchies, decision boundaries, and values that experienced employees carry intuitively — is invisible to AI. It travels as what Jones calls tacit knowledge, and what a recent analysis of AI's impact on organizational systems calls the invisible operating system: the vast substrate of assumptions, social contracts, and unstated values that human participants process automatically.

AI doesn't have this operating system. It processes exactly what it's given. And right now, there is no infrastructure designed to make organizational intent visible, structured, and accessible to AI agents. The knowledge management tools organizations use today — Confluence, SharePoint, Notion, Google Docs — were built for humans to read. Not for AI to act on.

The discipline of making organizational intent machine-readable demands a knowledge substrate that doesn't exist yet.

— Jones, "Intent Engineering" and "The Specification Is the Prompt Now," 2026; "The Invisible Operating System," February 2026

The format tax

Every query pays a conversion toll

Every time an AI agent queries knowledge stored in Word documents, PDFs, or HTML-based wikis, the content must first be converted. That conversion is expensive, lossy, and compounding.

Converted path (DOCX, PDF, HTML): Query → Convert → Ingest → Retrieve → Answer. At the Convert step, tables break and context is lost.
Native Markdown path: Query → Ingest → Retrieve → Answer. Zero tax.
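A rough way to see the tax: the same small table, written once as native Markdown and once as exported HTML, weighs very differently in an agent's context window. This is an illustrative sketch only, using a naive tokenizer rather than any real model's; both documents are hypothetical.

```python
import re

# The same table in two formats (hypothetical sample content).
markdown_doc = """## Q3 Results
| Region | Rev   |
|--------|-------|
| EMEA   | $1.8M |
"""

html_doc = """<h2 class="heading"><span>Q3 Results</span></h2>
<table class="data"><thead><tr><th>Region</th><th>Rev</th></tr></thead>
<tbody><tr><td>EMEA</td><td>$1.8M</td></tr></tbody></table>
"""

def rough_tokens(text: str) -> int:
    # Naive approximation: whitespace-separated runs, with markup
    # punctuation counted separately (as subword tokenizers tend to do).
    return len(re.findall(r"[<>/]|[^\s<>/]+", text))

md, html = rough_tokens(markdown_doc), rough_tokens(html_doc)
print(f"Markdown: {md} tokens, HTML: {html} tokens")
```

The exact ratio depends on the tokenizer and the document, but the direction is stable: markup that exists for rendering, not meaning, is paid for on every single query.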

The cost

95%

of enterprise AI pilots fail to achieve rapid revenue acceleration.

MIT NANDA "GenAI Divide" Report, 2025

3.3%

Copilot adoption after two years and $60M in TV advertising.

450 million seats. 15 million takers.

Recon Analytics, January 2026

3.6 hrs

per employee per day searching for information.

At 1,000 employees, that's $5–10M/year in lost productivity.

Bloomfire / Pryon

64%

of enterprises cite integration complexity as the top AI adoption challenge.

Capgemini World Quality Report, 2025

40–70%

estimated token cost savings when AI queries native Markdown versus converted documents.

The format tax is real. And it compounds.

Why it fails

The three architectural failures

Every enterprise has the same idea: "Just connect AI to what we already have." It doesn't work. Here's why.

The false summit: "just bolt AI on top." Result: Copilot's 3.3% adoption.
The actual path: substrate restructuring, ending at a single source of truth.
01

Authority reconciliation is impossible

When the same question returns different answers from Salesforce, Confluence, and a shared drive PDF, which one is right? MCP can connect to all three. It cannot tell you which to trust. Your "Q3 revenue" means something different in every tool that stores it. That conflict cannot be resolved through a pipe. It can only be resolved by a substrate with a single source of truth.
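A minimal sketch of the distinction, with all tool names and values hypothetical: a pipe can fetch every answer, but only an explicitly declared canonical source lets an agent answer without guessing.

```python
# Three connected tools return conflicting values for the same question.
answers = {
    "salesforce": {"q3_revenue": "$4.2M"},  # bookings-based
    "confluence": {"q3_revenue": "$3.9M"},  # recognized revenue
    "shared_pdf": {"q3_revenue": "$4.6M"},  # stale forecast
}

# The substrate's contribution: an explicit, owned authority map.
canonical_source = {"q3_revenue": "salesforce"}

def resolve(key: str) -> str:
    # No heuristics, no majority voting: the canonical source decides.
    return answers[canonical_source[key]][key]

print(resolve("q3_revenue"))  # every agent gets the same answer
```

The interesting part is not the lookup; it is that `canonical_source` has to exist somewhere, with an owner, before any connector can be trusted.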

02

Token economics compound catastrophically

Every additional knowledge source connected via MCP multiplies the context window load. Five tools times tool discovery overhead times cross-referencing equals burning tokens before retrieving a single useful fact. At enterprise scale, this isn't a rounding error — it's the majority of your AI spend.
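The superlinear shape of that cost can be sketched with back-of-envelope arithmetic. Every number below is hypothetical; the point is the curve, not the constants.

```python
# Tokens an agent spends before retrieving its first useful fact,
# as knowledge sources are added. All constants are illustrative.
DISCOVERY_TOKENS_PER_TOOL = 1_500  # tool schemas loaded into context
CROSS_REFERENCE_TOKENS = 800       # reconciling one overlapping pair

def overhead(num_sources: int) -> int:
    discovery = num_sources * DISCOVERY_TOKENS_PER_TOOL
    # Every pair of overlapping sources must be cross-checked,
    # so reconciliation cost grows quadratically.
    pairs = num_sources * (num_sources - 1) // 2
    return discovery + pairs * CROSS_REFERENCE_TOKENS

for n in (1, 3, 5, 10):
    print(f"{n} sources: {overhead(n):,} tokens before the first useful fact")
```

Discovery grows linearly; reconciliation grows with the number of source pairs. That second term is why adding the fifth tool costs far more than adding the second.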

03

The format tax is real

Every AI query against a Word document, PDF, or HTML-based wiki requires conversion. That conversion is lossy. Tables break. Context is lost. Hallucinations increase.

Native Markdown:
## Q3 Results
Revenue: $4.2M
Growth: 18% QoQ
| Region | Rev |
| EMEA | $1.8M |

Converted from DOCX:
Q 3 R esu lt s Rev enue : $4 .2 Gro wth : ??% [TABLE PARSE ERROR] R e g i o n...

MCP is a pipe. You still need clean, structured, authoritative knowledge before any pipe is useful.

The solution frame

The properties of a healthy knowledge substrate

A substrate that AI can operate on effectively has four structural properties.

Versioned

Every change tracked, attributable, reversible. Not "version history." True version control — branching, merging, blame, diff. You should know who changed what, when, and why. AI should too.
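What "true version control" adds over "version history" can be shown with the standard library. A real substrate would use Git; the record shape here (who, when, why) and the policy text are hypothetical.

```python
import difflib
from datetime import date

# Two versions of one knowledge entry, each change attributed and explained.
history = [
    {"who": "ops@example.com", "when": date(2025, 3, 1),
     "why": "Initial policy", "text": "Refunds allowed within 30 days.\n"},
    {"who": "legal@example.com", "when": date(2025, 9, 12),
     "why": "EU compliance update", "text": "Refunds allowed within 14 days.\n"},
]

# Diff: exactly what changed between versions, readable by humans and agents.
diff = list(difflib.unified_diff(
    history[0]["text"].splitlines(keepends=True),
    history[1]["text"].splitlines(keepends=True),
    fromfile="v1", tofile="v2",
))
print("".join(diff))

# Blame: who last touched the current answer, and why.
latest = history[-1]
print(f'{latest["who"]} ({latest["when"]}): {latest["why"]}')
```

An agent reading this entry can see not just the current rule but that it changed, when, and for what reason, which is precisely the context a plain wiki page discards.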

Authoritative

One source of truth. Not five wikis with overlapping, contradictory content. Clear ownership. Clear canonical status. When AI retrieves an answer, there's no ambiguity about whether it's current.

Native format

AI-readable without conversion. No format tax. No lossy extraction. Content that was born structured stays structured.

Accessible

A substrate only AI can read is a database. A substrate only humans can read is a wiki. A healthy substrate serves both — humans through an interface designed for them, AI through protocols designed for it.

The gap

Where current tools fall short

Plot every current tool along two axes: AI-readiness of knowledge format, and non-technical accessibility. The corner that scores high on both is the gap. No one is there yet.
Notion: proprietary blocks, no Git backing, limited compliance.
Confluence: proprietary format, weak versioning, not AI-native.
SharePoint: poor UX, proprietary format, Copilot failing.
GitBook: developer docs only, no compliance, not enterprise KM.
PolicyTech: no versioning, no AI-readiness, closed format.
Glean: an AI layer that doesn't change the substrate underneath.

The invisible cost

Why the substrate doesn't exist yet

Organizations have operated on fragmented knowledge for decades. Humans compensated.

They inferred context from hallway conversations. They filled gaps with tribal knowledge. They navigated ambiguity through social cues and institutional memory.

This worked because humans carry an invisible operating system — a vast substrate of tacit assumptions, social contracts, and unstated values that they process automatically.

AI doesn't have this operating system. It processes exactly what it's given.

But why was the substrate never built?

Not because organizations were lazy. Because knowing the cost exists and feeling it when you're making the choice are two different things.

Forty-five minutes to update the documentation feels expensive. Five minutes correcting someone who followed the old process feels free.

But five minutes, repeated across thirty people over twelve months, is an order of magnitude more than the forty-five minutes would have cost.

Nobody opens that account.


Concentrated costs feel larger than distributed costs, even when the distributed costs are vastly greater. Each small fragment registers as negligible. The visible cost — the sprint commitment, the feature delay — drowns out the invisible cost that appears in no tracking system.

Each decision to skip documentation gets its own mental account. Nobody aggregates them into a quarterly total.

Each generation of employees accepts the current state as baseline. The degradation is normalized. New hires have no reference for what "good" looked like.

And the people who bear the cost of today's undocumented knowledge — future employees, future teams, future AI systems — are psychologically distant. Research on future-self perception shows the brain processes such distant beneficiaries much as it processes strangers.

This is why the substrate doesn't exist. Not laziness. Not ignorance. A systematic failure of perception that compounds across every organization, every year, invisibly.

The invisible operating system stayed invisible because the cost of leaving it that way was itself invisible.

The organizations that thrive in the AI era will be the ones that make as much of this invisible substrate as possible explicit, structured, and accessible.

This is the explication imperative.

Not because it's a best practice. Because AI has made the invisible cost visible — and the cost turns out to be catastrophic.

AI doesn't just need the substrate. AI may be the first tool capable of showing us what the substrate's absence has been costing all along.

The comprehensive account that no human brain could maintain — the aggregate of every outdated document, every undocumented process, every lost institutional memory — can now be opened.

The question is whether we open it into infrastructure designed for it, or bolt the revelation onto the same fragmented tools that created the problem.

The knowledge substrate is the infrastructure problem of the AI era.

We're still building this argument. If you see what we see — or if you think we're wrong — we'd like to hear from you.

Read the thesis · The research behind the argument · The self-driving codebase · Intent engineering

Further reading

[7] Nate B. Jones, "Intent Engineering"