Case study
Designing documentation for retrieval, not just reading
As LLMs became increasingly integrated into support and product workflows, I noticed a recurring pattern: technically correct documentation was still producing unreliable AI answers.
I led an initiative to rethink documentation structure through the lens of retrieval and chunking. As part of it, I created a "chunkability" framework to evaluate whether content could survive fragmented retrieval and remain understandable, trustworthy, and actionable in isolation.
Impact
Made documentation survive fragmented retrieval
Reduced major AI inaccuracies by up to 83%
After restructuring documentation using the chunkability framework, several test sets saw major inaccuracies drop from 6/10 responses to as low as 0-1/10 across multiple LLMs.
Improved complete-answer rates from 0/10 to 7/10
In one enterprise connectivity test set, models that initially returned no complete answers produced complete answers for 7 of 10 questions after structural documentation updates.
Created a reusable framework for AI-readable documentation
Built a "chunkability" audit system that evaluated documentation against retrieval-focused criteria such as identity persistence, table survivability, scope locality, and boundary clarity.
Reframed hallucinations as information architecture problems
The project demonstrated that many unreliable AI responses were caused less by missing information and more by documentation structure that failed under fragmented retrieval.
The problem
Documentation that worked for humans was failing under fragmented retrieval
As the company expanded AI-assisted support and retrieval workflows, documentation quality became increasingly difficult to evaluate. Pages that worked well for human readers often produced incomplete or misleading AI responses once ingested into retrieval systems.
Large comparison tables lost context when chunked. Generic headings like "Selectors" or "Allow" became meaningless in isolation. Important constraints were frequently separated from the actions they governed. Even when the correct information existed, models struggled to retrieve and reconstruct it consistently.
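The table problem can be reproduced with a very naive chunker. The sketch below assumes chunks are cut every few lines with no awareness of headings or tables; real retrieval pipelines usually split by tokens or characters, and the page content is hypothetical, made up for illustration. The second chunk ends up holding data rows with no heading and no column names, so a row like "tag | No" no longer says which product or property it describes.

```python
# Minimal sketch of naive fixed-size chunking (assumption: chunks are cut
# every N lines with no awareness of headings or tables).


def chunk_by_lines(text: str, lines_per_chunk: int) -> list[str]:
    """Split text into consecutive chunks of at most `lines_per_chunk` lines."""
    lines = text.splitlines()
    return [
        "\n".join(lines[i : i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


# Hypothetical page content, made up for illustration.
PAGE = """## Selectors
| Selector | Supported | Notes |
| --- | --- | --- |
| id | Yes | Exact match only |
| class | Yes | Case-sensitive |
| tag | No | Use class instead |"""

chunks = chunk_by_lines(PAGE, lines_per_chunk=3)
for n, chunk in enumerate(chunks):
    print(f"--- chunk {n} ---")
    print(chunk)
```

Chunk 0 holds only the heading and the table header; chunk 1 holds the three data rows on their own.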
The challenge was compounded by the lack of shared standards for AI-readable documentation. Most existing content guidance focused on readability, completeness, or writing quality rather than retrieval survivability.
The solution
A chunkability framework for AI-readable documentation
To better understand why technically accurate documentation was producing unreliable AI answers, I researched retrieval-augmented generation systems, chunking strategies, and retrieval behavior in modern LLM pipelines. I also completed specialized coursework focused on RAG architecture, retrieval evaluation, and context management.
Using that research, I developed a chunkability evaluation framework designed to test whether documentation could remain understandable and trustworthy after retrieval fragmentation. Instead of evaluating content purely for readability, the framework evaluated how well information survived chunk isolation, missing surrounding context, truncated tables, and retrieval ordering issues.
I then built an AI-assisted auditing skill that analyzed existing documentation against retrieval-focused criteria and generated structured findings, risk summaries, severity ratings, and proposed rewrites tied directly to retrieval failure patterns.
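One way to picture the structured output of such an audit is a small record per finding plus a severity roll-up for the risk summary. This is a hypothetical schema, not the skill's actual format: the field names, the three-level `Severity` scale, and the `risk_summary` helper are assumptions for illustration, and "Example Gateway" is a made-up product name.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical three-level scale; higher value means more severe.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Finding:
    criterion: str         # e.g. "heading clarity"
    location: str          # heading or anchor where the issue occurs
    severity: Severity
    failure_pattern: str   # retrieval failure the issue is expected to cause
    proposed_rewrite: str


def risk_summary(findings: list[Finding]) -> dict[str, int]:
    """Count findings per severity level, most severe level first."""
    counts = Counter(f.severity for f in findings)
    return {s.name: counts.get(s, 0) for s in sorted(Severity, reverse=True)}


findings = [
    Finding(
        criterion="heading clarity",
        location="## Allow",
        severity=Severity.HIGH,
        failure_pattern="Heading names no product or action when retrieved alone",
        proposed_rewrite="## Allow inbound connections to Example Gateway",
    ),
    Finding(
        criterion="scope locality",
        location="## Configure the connector",
        severity=Severity.MEDIUM,
        failure_pattern="Version requirement sits two sections before the step it governs",
        proposed_rewrite="Move the version requirement into the first step",
    ),
]
print(risk_summary(findings))  # {'HIGH': 1, 'MEDIUM': 1, 'LOW': 0}
```

Keeping the criterion and failure pattern on every finding is what lets a proposed rewrite be traced back to the retrieval failure it is meant to fix.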
To complement the auditing workflow, I developed a remediation skill that translated findings into retrieval-friendly structural revisions. This created a repeatable feedback loop where documentation could be audited, revised, retested against LLM prompts, and continuously improved based on measurable answer quality outcomes.
Identity persistence
Each retrievable section carries enough product, version, or surface context to identify itself in isolation.
Heading clarity
Headings describe the specific product, action, or outcome a user might search for.
Table survivability
Captions and row labels preserve meaning when tables are chunked without headers or surrounding text.
Scope locality
Prerequisites, warnings, and constraints stay near the action or decision they govern.
Procedural atomicity
Each step contains one action and one outcome, with branching logic clearly separated.
Query-term alignment
Documentation uses formal product terms alongside the language users are likely to search with.
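A few of these criteria can be approximated with crude rule-based checks on a single chunk. The sketch below is a hypothetical illustration, not the AI-assisted audit used in this project: `PRODUCT_TERMS` and `GENERIC_HEADINGS` are made-up glossary values, and the table rule only checks whether a markdown table has a caption-like line directly above it.

```python
# Hypothetical glossary values for illustration only.
PRODUCT_TERMS = {"Example Gateway"}
GENERIC_HEADINGS = {"overview", "selectors", "allow", "settings", "notes"}


def check_chunk(heading: str, body: str) -> list[str]:
    """Return the names of criteria a single chunk appears to fail."""
    failures = []
    text = f"{heading}\n{body}"

    # Identity persistence: the chunk should name the product on its own.
    if not any(term in text for term in PRODUCT_TERMS):
        failures.append("identity persistence")

    # Heading clarity: a generic heading says little once context is gone.
    if heading.strip("# ").lower() in GENERIC_HEADINGS:
        failures.append("heading clarity")

    # Table survivability (crude proxy): the first row of a table should be
    # preceded by a non-blank caption line rather than starting the chunk.
    lines = body.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith("|") and (i == 0 or not lines[i - 1].strip()):
            failures.append("table survivability")
            break

    return failures


before = check_chunk(
    "## Selectors",
    "| Selector | Supported |\n| --- | --- |\n| tag | No |",
)
after = check_chunk(
    "## Supported selectors in Example Gateway",
    "Table: selector types supported by Example Gateway.\n"
    "| Selector | Supported |\n| --- | --- |\n| tag | No |",
)
print(before)  # ['identity persistence', 'heading clarity', 'table survivability']
print(after)   # []
```

Rules like these only catch surface signals; criteria such as scope locality or procedural atomicity depend on meaning, which is one reason an AI-assisted audit is a better fit for the full framework.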
Example comparison
[Figure: Before / After]
Metrics
Small structural changes produced large answer-quality gains
The work showed that AI answer quality could improve dramatically without rewriting the underlying technical content. In several test sets, restructuring documentation around chunkability reduced major inaccuracies from 6/10 responses to as low as 0-1/10 across multiple LLMs.
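The headline figure of "up to 83%" is consistent with the 6/10 to 1/10 case: relative reduction is (before − after) / before, so (6 − 1) / 6 ≈ 0.83. By the same formula, a test set that reached 0/10 would be a 100% reduction. A minimal check of that arithmetic:

```python
def relative_reduction(before: int, after: int) -> float:
    """Fractional drop from `before` to `after` (e.g. error counts out of 10)."""
    return (before - after) / before


print(f"{relative_reduction(6, 1):.0%}")  # 83%
print(f"{relative_reduction(6, 0):.0%}")  # 100%
```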
In one enterprise connectivity test set, complete-answer rates improved from 0/10 to 7/10 after structural documentation updates. The improvements came from making product identity, scope, constraints, and relationships more resilient when retrieved as fragments.
Reflection
AI-ready documentation is fundamentally an information architecture problem
These are the principles I would carry into any content system that needs to stay accurate when it is retrieved, fragmented, or read out of order:
01
AI-ready documentation is fundamentally an information architecture problem
The biggest lesson from this work was that retrieval quality depends heavily on structure, not just writing quality. Documentation now has to function both as a human reading experience and as a machine-readable knowledge system.
02
Retrieval failures are often predictable
Hallucinations consistently mapped back to structural weaknesses like anonymous headings, fragmented tables, missing scope, or disconnected constraints.
03
Small structural changes can dramatically improve AI reliability
Some of the highest-impact fixes were deceptively lightweight: adding product identity to section openers, rewriting generic headings, adding table captions, moving constraints earlier, and using more retrieval-oriented phrasing.
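A related, ingestion-time way to give fragments their identity back is to prefix each chunk with its product and heading path before indexing. This sketch is an assumption, not necessarily how the fixes in this project were applied (those were changes to the documentation itself); `add_identity`, "Example Gateway", and the heading path are hypothetical.

```python
def add_identity(chunk: str, product: str, heading_path: list[str]) -> str:
    """Prefix a chunk with its product and heading path so it still
    identifies itself when retrieved without surrounding context."""
    breadcrumb = " > ".join([product, *heading_path])
    return f"[{breadcrumb}]\n{chunk}"


print(add_identity(
    "Inbound requests are blocked until an allow rule is added.",
    "Example Gateway",
    ["Network access", "Allow"],
))
# [Example Gateway > Network access > Allow]
# Inbound requests are blocked until an allow rule is added.
```

Writing the identity into the source text, as described above, helps both human readers and any retrieval pipeline; a breadcrumb prefix only helps pipelines that apply it.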