Core Concepts

Data Shaping

Data shaping is the practice of structuring content so that facts remain accurate when extracted from context.

Shaped content uses declarative statements instead of narrative flow.

Each statement must be self-contained: its meaning cannot depend on previous sentences.

Shaping requires separating essential information from explanatory information.

Essential information becomes croutons (defined below). Explanatory information becomes supporting structure.
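The self-containment rule can be approximated with a simple heuristic. The sketch below flags statements that open with a word that usually points back at an earlier sentence. The opener list and the function names are assumptions made for illustration, not part of any specification.

```python
import re

# Hypothetical list of openers that usually refer to an earlier sentence.
DANGLING_OPENERS = {"it", "this", "that", "they", "these", "those", "he", "she"}


def depends_on_context(statement: str) -> bool:
    """Return True if the statement opens with a word that likely refers to earlier text."""
    words = re.findall(r"[A-Za-z']+", statement)
    return bool(words) and words[0].lower() in DANGLING_OPENERS


def flag_unshaped(statements: list[str]) -> list[str]:
    """Return the statements that probably cannot stand alone."""
    return [s for s in statements if depends_on_context(s)]


statements = [
    "Croutons are atomic, retrievable fact structures.",
    "It must remain accurate without the rest of the page.",
]
flagged = flag_unshaped(statements)
print(flagged)
```

A check like this only catches the most obvious dependencies. A statement can pass it and still rely on context, so it is a first filter rather than a validator.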

Croutons

Croutons are atomic, retrievable fact structures.

A crouton is a single sentence or statement that contains a complete fact.

Croutons must be self-contained: a crouton cannot require surrounding context to be understood.

When AI systems extract a crouton, it must remain accurate without the rest of the page.

Croutons are the unit of retrieval. AI systems cite croutons, not pages.
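One way to picture a crouton is as a small record that carries a single fact together with its source. The sketch below is an illustration only: the field names, the example URL, and the one-sentence check are assumptions, and the Crouton Specification remains the authority on the actual rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Crouton:
    crouton_id: str  # hypothetical identifier
    text: str        # the single, complete fact
    source_url: str  # page the fact came from, so a citation can point back

    def is_atomic(self) -> bool:
        """Rough check: exactly one sentence terminator, located at the very end.

        Abbreviations such as "e.g." would trip this check; it is only a sketch.
        """
        stripped = self.text.strip()
        return len(re.findall(r"[.!?]", stripped)) == 1 and stripped[-1] in ".!?"


def build_index(croutons: list[Crouton]) -> dict[str, Crouton]:
    """Index atomic croutons by id, so retrieval addresses facts rather than pages."""
    return {c.crouton_id: c for c in croutons if c.is_atomic()}


URL = "https://example.com/core-concepts"  # placeholder URL
c1 = Crouton("c1", "Croutons are atomic, retrievable fact structures.", URL)
c2 = Crouton("c2", "Croutons are the unit of retrieval. AI systems cite croutons, not pages.", URL)

index = build_index([c1, c2])
print(sorted(index))
```

The second record holds two sentences, so the check rejects it. Splitting it into two records would make each fact retrievable on its own.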

See the Crouton Specification for detailed rules.

Precogs

Precogs are the information needs that users are predicted to have.

Prechunking requires anticipating what questions users will ask and what follow-up questions will emerge.

Precog modeling identifies trust gaps where users need additional information to believe or act on a claim.

Each precog maps to required croutons. Missing croutons cause AI systems to cite other sources or generate incomplete answers.
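The mapping from precogs to croutons can be treated as a coverage check. The sketch below assumes each precog is written as a question and mapped to a set of crouton identifiers. The questions and identifiers are hypothetical.

```python
def missing_croutons(
    precogs: dict[str, set[str]],
    published: set[str],
) -> dict[str, set[str]]:
    """Return, for each precog, the required croutons that are not yet published."""
    gaps = {}
    for question, required in precogs.items():
        missing = required - published
        if missing:
            gaps[question] = missing
    return gaps


# Hypothetical precogs and crouton ids, for illustration only.
precogs = {
    "What is a crouton?": {"crouton-definition"},
    "How long can a crouton be?": {"crouton-definition", "crouton-length"},
}
published = {"crouton-definition"}

gaps = missing_croutons(precogs, published)
print(gaps)
```

Each entry in the result is a precog that current content cannot fully answer, along with the croutons that still need to be written.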

Precogs are validated through query analysis and answer inspection.

See Precog Modeling for implementation details.

Chunk Boundaries

Chunk boundaries define where one retrievable unit ends and another begins.

AI systems extract content in chunks. Token limits, semantic breaks, and structural markers determine where each chunk begins and ends.

Prechunking engineers content so that chunk boundaries do not split related facts.

Facts that must be retrieved together must exist within the same potential chunk.

Boundaries are controlled through paragraph structure, list formatting, and section breaks.

Poor boundary placement separates facts from necessary context, which leads to facts being mutated or omitted in AI-generated answers.
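The effect of paragraph structure on chunk boundaries can be illustrated with a toy chunker. The sketch below assumes a chunker that never splits a paragraph and that approximates tokens by counting whitespace-separated words. Real systems use their own tokenizers and splitting rules, and the example text is invented.

```python
def pack_paragraphs(paragraphs: list[str], max_tokens: int) -> list[list[str]]:
    """Greedily pack whole paragraphs into chunks without exceeding max_tokens.

    A paragraph is never split. A paragraph larger than max_tokens gets a chunk of its own.
    """
    chunks, current, used = [], [], 0
    for para in paragraphs:
        size = len(para.split())  # crude token estimate: word count
        if current and used + size > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(para)
        used += size
    if current:
        chunks.append(current)
    return chunks


def same_chunk(chunks: list[list[str]], fact_a: str, fact_b: str) -> bool:
    """Return True if both facts appear somewhere within a single chunk."""
    return any(
        any(fact_a in p for p in chunk) and any(fact_b in p for p in chunk)
        for chunk in chunks
    )


# Invented example content.
paragraphs = [
    "The warranty lasts two years. The warranty covers parts only.",
    "Shipping takes five business days.",
    "Returns are accepted within thirty days.",
]
chunks = pack_paragraphs(paragraphs, max_tokens=12)
print(len(chunks))
print(same_chunk(chunks, "two years", "parts only"))
```

Under these assumptions, the two warranty facts share a paragraph and therefore stay in one chunk, while the shipping fact lands in a different chunk from the warranty facts.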

Retrieval vs. Ranking

Prechunking operates at the retrieval layer, not the ranking layer.

Ranking algorithms determine which pages appear in search results. Retrieval algorithms determine which facts appear in AI-generated answers.

A page can rank first but have zero facts retrieved if its chunks are ambiguous or incomplete.

A page can rank tenth but have multiple facts retrieved if its chunks are clear and complete.
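The two cases above can be shown with a toy retrieval gate. The sketch below assumes a gate that accepts a chunk only when the chunk itself names every query term. The gate, the pages, and the URLs are all hypothetical, and real retrieval systems use far richer signals. The point of the sketch is that search rank is never consulted.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    search_rank: int   # position in search results; unused by the gate below
    chunks: list[str]


def passes_gate(chunk: str, query_terms: set[str]) -> bool:
    """Toy gate: the chunk must contain every query term on its own."""
    words = {w.strip(".,").lower() for w in chunk.split()}
    return query_terms <= words


def retrieve(pages: list[Page], query_terms: set[str]) -> list[tuple[str, str]]:
    """Return (url, chunk) pairs for every chunk that passes the gate."""
    return [
        (page.url, chunk)
        for page in pages
        for chunk in page.chunks
        if passes_gate(chunk, query_terms)
    ]


# Invented pages, for illustration only.
pages = [
    Page("https://example.com/top", 1, ["It lasts two years."]),
    Page("https://example.com/tenth", 10, ["The warranty lasts two years."]),
]
results = retrieve(pages, {"warranty", "years"})
print(results)
```

The first-ranked page contributes nothing because its chunk depends on context that was not extracted with it. The tenth-ranked page is retrieved because its chunk states the fact completely.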

Prechunking ensures facts are available for retrieval. It does not ensure pages rank higher.

Retrieval happens before ranking in AI systems. Content must pass retrieval gates before ranking signals matter.

Because retrieval comes first, AI systems often ignore high-ranking pages whose chunks fail retrieval tests.