Prechunking Workflow

Intent Decomposition

Intent decomposition breaks user information needs into discrete questions.

Start with primary queries. Identify what users are asking and why they are asking it.

Decompose primary queries into follow-up questions. What information will users need next?

Identify trust gaps. What evidence do users need before they will believe a claim?

Map each decomposed intent to the information required to answer it. The resulting mapping becomes the crouton inventory.

Decomposition is validated through query data, answer inspection, and user research.
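The decomposition steps above can be recorded in a small data structure. The following is a minimal sketch, not a prescribed format: the Intent class, its field names, the decompose helper, and the Acme X1 questions are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Intent:
    """One decomposed user question. All field names are illustrative."""
    question: str
    kind: str  # "primary", "follow_up", or "trust_gap"
    required_facts: list = field(default_factory=list)

def decompose(primary, follow_ups, trust_gaps):
    """Return the primary query, its follow-ups, and its trust gaps as Intent records."""
    intents = [Intent(primary, "primary")]
    intents += [Intent(q, "follow_up") for q in follow_ups]
    intents += [Intent(q, "trust_gap") for q in trust_gaps]
    return intents

# Hypothetical example data.
intents = decompose(
    "How much does the Acme X1 cost?",
    follow_ups=["Does the Acme X1 price include shipping?"],
    trust_gaps=["Who publishes the Acme X1 price?"],
)

# Map an intent to the information it requires.
intents[0].required_facts.append("The Acme X1 costs 499 USD.")
```

The required_facts lists, collected across all intents, are what the next stage treats as the crouton inventory.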

Crouton Inventory

The crouton inventory is the list of atomic facts required to answer decomposed intents.

Each intent maps to specific croutons. Missing croutons cause incomplete or incorrect answers.

Inventory is created by mapping decomposed intents to required facts.

Each crouton must meet the crouton specification: atomic, self-contained, explicit.

Inventory is organized by intent and dependency. Related croutons are grouped for chunk boundary planning.

Inventory gaps are identified by comparing required croutons to existing content.
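Gap identification can be sketched as a set difference between required and existing croutons. This example assumes croutons are compared as exact strings, which is a simplification: real content would likely need normalization or semantic matching. The inventory_gaps function and the sample data are hypothetical.

```python
# Required croutons per intent (hypothetical example data).
required = {
    "pricing": {
        "The Acme X1 costs 499 USD.",
        "The Acme X1 price includes shipping.",
    },
    "warranty": {"The Acme X1 warranty lasts 2 years."},
}

# Croutons already present in published content (hypothetical).
existing = {"The Acme X1 costs 499 USD."}

def inventory_gaps(required, existing):
    """Return, per intent, the sorted required croutons that existing content lacks."""
    gaps = {}
    for intent, facts in required.items():
        missing = sorted(facts - existing)
        if missing:
            gaps[intent] = missing
    return gaps

gaps = inventory_gaps(required, existing)
```

Intents with no missing croutons are omitted from the result, so an empty dictionary means the inventory is fully covered.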

Data Shaping

Data shaping transforms narrative content into declarative croutons.

Existing content is audited for crouton compliance. Non-compliant statements are identified and refactored.

New content is written as croutons from the start, not as narrative to be shaped later.

Shaping requires removing narrative connectors, replacing pronouns with explicit nouns, and splitting compound facts.

Shaped content uses declarative statements. Each statement is a complete, standalone fact.

Shaping is validated through crouton specification compliance checks.
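A compliance check can be approximated with simple text heuristics. The sketch below flags the three shaping problems named above: pronouns, narrative connectors, and compound facts. The word lists and the check_crouton function are assumptions made for illustration. They will produce false positives and false negatives, so flagged statements still need human review.

```python
import re

# Illustrative word lists, not an exhaustive specification.
PRONOUNS = {"it", "its", "they", "them", "their", "this", "these", "those", "he", "she"}
CONNECTORS = ("however", "therefore", "furthermore", "also", "as mentioned", "in addition")

def check_crouton(statement):
    """Return a list of heuristic problem codes for one statement."""
    text = statement.lower()
    words = set(re.findall(r"[a-z']+", text))
    problems = []
    if words & PRONOUNS:
        problems.append("pronoun")  # replace with an explicit noun
    if any(re.search(r"\b" + re.escape(c) + r"\b", text) for c in CONNECTORS):
        problems.append("connector")  # remove the narrative connector
    if ";" in text or " and " in text:
        problems.append("compound")  # consider splitting into separate facts
    return problems

before = check_crouton("However, it is waterproof and it ships free.")
after = check_crouton("The Acme X1 is waterproof.")
```

Here the narrative sentence is flagged on all three counts, while the shaped statement passes with no problems.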

Structured Publishing

Structured publishing organizes croutons into pages while preserving chunk boundaries.

Related croutons are grouped within potential chunk boundaries. Dependencies are kept together.

Chunk boundaries are controlled through paragraph structure, list formatting, and section breaks.
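One way to keep dependent croutons inside the same boundary is to render each group as its own paragraph. This is a minimal sketch that assumes a downstream chunker splits on blank lines. That assumption, the render_page function, and the sample groups are all hypothetical.

```python
# Croutons grouped by dependency (hypothetical example data).
groups = {
    "pricing": [
        "The Acme X1 costs 499 USD.",
        "The Acme X1 price includes shipping.",
    ],
    "warranty": ["The Acme X1 warranty lasts 2 years."],
}

def render_page(groups):
    """Join each group into one paragraph; blank lines separate the groups."""
    paragraphs = [" ".join(croutons) for croutons in groups.values()]
    return "\n\n".join(paragraphs)

page = render_page(groups)
```

Splitting page on blank lines returns one paragraph per group, so the two pricing croutons stay together.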

Structured data reinforces croutons where possible. Schema markup provides additional retrieval signals.
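As an illustration of structured data reinforcing croutons, the sketch below builds schema.org FAQPage markup as JSON-LD from question and crouton pairs. FAQPage is only one possible type, chosen here as an assumption. The faq_jsonld helper and the sample pair are hypothetical.

```python
import json

def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

# Hypothetical example data.
markup = faq_jsonld([
    ("How much does the Acme X1 cost?", "The Acme X1 costs 499 USD."),
])

# Serialized form, suitable for a <script type="application/ld+json"> element.
serialized = json.dumps(markup, indent=2)
```

Each answer text is the crouton itself, unchanged, so the markup repeats the fact stated on the page instead of paraphrasing it.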

The publishing step validates that each crouton can be extracted accurately without its surrounding context.

Structured publishing ensures pages function as containers while chunks function as retrievable assets.

Retrieval Validation

Retrieval validation tests whether croutons are actually retrieved by AI systems.

Validation methods include:

  • AI answer inspection to confirm croutons appear in generated responses
  • Citation tracking to verify croutons are attributed correctly
  • Extraction testing to ensure facts remain accurate when isolated
  • Chunk boundary testing to confirm related facts are retrieved together
  • Competitive comparison to identify missing or inferior croutons

Failed validation requires revision of croutons, chunk boundaries, or data shaping.
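Two of the validation methods listed above, answer inspection and chunk boundary testing, can be sketched as simple string checks. The sketch assumes the generated answer and the retrieved chunks are already available as plain strings and that croutons appear verbatim. Real answers are often paraphrased, so exact matching undercounts. The function names and sample data are hypothetical.

```python
def crouton_coverage(answer, croutons):
    """Fraction of croutons that appear verbatim (case-insensitive) in an answer."""
    if not croutons:
        return 0.0
    found = [c for c in croutons if c.lower() in answer.lower()]
    return len(found) / len(croutons)

def retrieved_together(chunks, related_facts):
    """True if at least one retrieved chunk contains every related fact."""
    return any(all(fact in chunk for fact in related_facts) for chunk in chunks)

# Hypothetical example data.
price = "The Acme X1 costs 499 USD."
shipping = "The Acme X1 price includes shipping."
warranty = "The Acme X1 warranty lasts 2 years."

answer = "According to the vendor, the Acme X1 costs 499 USD."
chunks = [price + " " + shipping, warranty]

coverage = crouton_coverage(answer, [price, warranty])
same_chunk = retrieved_together(chunks, [price, shipping])
split_chunk = retrieved_together(chunks, [price, warranty])
```

A low coverage value or a failed retrieved_together check points to the kind of revision described above.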

Validation is ongoing. AI systems evolve, requiring continuous monitoring and adjustment.

Workflow Iteration

The prechunking workflow is iterative, not linear.

Intent decomposition reveals new information needs. Precog modeling identifies gaps. Crouton inventory expands.

Retrieval validation reveals failures. Failed retrievals require returning to data shaping or intent decomposition.

Content audits discover non-compliant statements. These statements are sent back through data shaping for refactoring.

Iteration continues until validation passes and retrieval goals are met.
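The iterate-until-valid pattern can be sketched as a bounded loop. In this sketch, validate and revise are hypothetical stand-ins for retrieval validation and data shaping, and the max_rounds limit is an added assumption to keep the loop from running indefinitely.

```python
def iterate(croutons, validate, revise, max_rounds=5):
    """Revise croutons until validate reports no failures or max_rounds is reached."""
    for round_number in range(1, max_rounds + 1):
        failures = validate(croutons)
        if not failures:
            return croutons, round_number
        croutons = revise(croutons, failures)
    raise RuntimeError("validation still failing after max_rounds")

# Toy stand-ins (hypothetical): fail any crouton that opens with a pronoun.
def validate(croutons):
    return [c for c in croutons if c.startswith("It ")]

def revise(croutons, failures):
    return [
        c.replace("It ", "The Acme X1 ", 1) if c in failures else c
        for c in croutons
    ]

final, rounds = iterate(["It costs 499 USD."], validate, revise)
```

In this toy run the first round fails validation, the crouton is revised, and the second round passes.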

Workflow documentation captures patterns and decisions for consistency across content systems.