How should I structure web pages so AI search engines cite them?

Structure pages for AI citation with a TL;DR, answer-first H2 sections, dense entities, Schema.org JSON-LD, Related Guides, Key Takeaways, FAQ, and a disclosure.

Last updated: June 4, 2026 · By Jessen Gibbs, Founder, Shadow

TL;DR

A GEO-optimized page has seven blocks in order: TL;DR, intro, H2 sections each opening with a 40-60 word answer capsule, Related Guides, Key Takeaways, FAQ, and disclosure. Each block exists because it maps to a specific extraction behavior in at least one major AI engine. Enforcing the architecture structurally beats leaving it to author discipline.

AI engines do not read your page; they extract from it. A retrieval model scores passages independently, a generation model assembles them into an answer, and a citation step links back to whichever pages contributed the strongest passages. Page structure determines what gets extracted, and the difference between a page that is cited and a page that is invisible often comes down to whether the architecture matches what the extractor expects.

This guide documents the seven-block architecture that the auto-geo publishing engine enforces, in the order the blocks must appear on the page, with the field-level constraints that make each block extractable. The architecture is opinionated by design — every block exists because it maps to a specific extraction behavior in at least one major AI engine, and dropping any of them measurably lowers citation rate.

Why does a strict page architecture help AI citation?

A strict architecture helps because AI engine retrieval is passage-level and structure-aware. The retriever scores each H2 section independently, looks for an answer near the heading, and prefers passages that resolve the question without follow-up context. A predictable architecture also makes JSON-LD derivation reliable, which raises trust scoring. Without structure, the retriever guesses, and guessing is lossy.

The 2024 Princeton GEO paper found that visibility in generative engines depended heavily on passage-level features that map directly to page architecture: source citations, statistic-heavy phrasing, and quoted passages all raised visibility (Aggarwal et al., 2024). Those features only land reliably when the page is structured to make them prominent — they cannot be retrofitted into a wall of prose without changing the architecture.

What are the seven required blocks of a GEO page?

The seven required blocks are TL;DR, intro, H2 sections, Related Guides, Key Takeaways, FAQ, and disclosure. They appear in that order on every page. Each block has constraints (word counts, item counts, formats) that exist for a specific extraction reason. Omitting a block creates a gap that AI engines treat as a negative quality signal and downweight accordingly.

The seven blocks of a GEO-optimized page

| Block | Constraint | Why it exists |
| --- | --- | --- |
| TL;DR | 40-60 words, self-contained answer | ChatGPT and Perplexity often lift this verbatim when summarizing the page |
| Intro | ≥1 paragraph block, no length cap | Establishes context and entities for the retriever |
| H2 sections | Question-format heading + 40-60 word answer capsule each | Each section is an independent passage the retriever scores |
| Related Guides | 4-8 absolute URLs | Feeds the engine's topical context and internal linking graph |
| Key Takeaways | 4-6 items, 10-35 words each | Reusable as bullet points in synthesized answers |
| FAQ | 3-10 Q&A pairs, 40-60 word answers | Drives FAQPage JSON-LD and direct-quote retrieval |
| Disclosure | 20-1000 characters of text | Signals authorship transparency to trust-scoring layers |

Tools like auto-geo reject publishes that omit any of these blocks at the validation boundary, which forces the architecture rather than treating it as a guideline. The same enforcement runs at the field level too — answer capsules outside the 40-60 word window are rejected, related-guide URLs that are not absolute are rejected, FAQ answers under 40 words are rejected, and so on.
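The field-level rules in the table can be sketched as a small validator. This is a minimal illustration only: the `PagePayload` shape, the field names, and the `validatePage` function are hypothetical and are not taken from auto-geo's actual schema. It also assumes that "absolute URL" means an http(s) URL with a host.

```typescript
// Hypothetical publish payload; field names are illustrative assumptions,
// not auto-geo's actual schema.
interface PagePayload {
  tldr: string;
  intro: string[]; // paragraph blocks
  sections: { heading: string; capsule: string }[];
  relatedGuides: string[]; // URLs
  keyTakeaways: string[];
  faq: { question: string; answer: string }[];
  disclosure: string;
}

// Count whitespace-separated words; an empty string counts as zero.
function wordCount(text: string): number {
  return text.trim().split(/\s+/).filter((w) => w.length > 0).length;
}

function between(n: number, min: number, max: number): boolean {
  return n >= min && n <= max;
}

// Assumption: "absolute" means an http(s) URL with a host.
function isAbsoluteUrl(url: string): boolean {
  return /^https?:\/\/[^\s\/]+/i.test(url);
}

// Returns a list of human-readable errors; an empty list means the page passes.
function validatePage(page: PagePayload): string[] {
  const errors: string[] = [];
  const check = (ok: boolean, message: string): void => {
    if (!ok) errors.push(message);
  };

  check(between(wordCount(page.tldr), 40, 60), "tldr: expected 40-60 words");
  check(page.intro.length >= 1, "intro: expected at least one paragraph");
  check(page.sections.length >= 1, "sections: expected at least one H2 section");
  page.sections.forEach((s, i) =>
    check(between(wordCount(s.capsule), 40, 60), `sections[${i}].capsule: expected 40-60 words`));
  check(between(page.relatedGuides.length, 4, 8), "relatedGuides: expected 4-8 URLs");
  page.relatedGuides.forEach((u, i) =>
    check(isAbsoluteUrl(u), `relatedGuides[${i}]: expected an absolute URL`));
  check(between(page.keyTakeaways.length, 4, 6), "keyTakeaways: expected 4-6 items");
  page.keyTakeaways.forEach((t, i) =>
    check(between(wordCount(t), 10, 35), `keyTakeaways[${i}]: expected 10-35 words`));
  check(between(page.faq.length, 3, 10), "faq: expected 3-10 pairs");
  page.faq.forEach((f, i) =>
    check(between(wordCount(f.answer), 40, 60), `faq[${i}].answer: expected 40-60 words`));
  check(between(page.disclosure.length, 20, 1000), "disclosure: expected 20-1000 characters");
  return errors;
}
```

A publish step would call `validatePage(payload)` and refuse to proceed when the returned list is non-empty, which is one way to make the contract structural rather than advisory.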

What goes in the TL;DR and why does word count matter?

The TL;DR is a 40-60 word self-contained answer to the page title placed above the intro. The window is tight because AI engines lift it verbatim when summarizing the page, and a TL;DR under 40 words is too thin to stand alone while one over 60 gets truncated or rewritten. Treat it as the most important paragraph on the page.

The TL;DR is the highest-leverage block on the page. When a user asks an AI engine a question that maps to your page topic, the engine often quotes the TL;DR directly in its answer with a citation back to your domain. That makes the TL;DR both your top-of-funnel pitch to the AI engine and the literal text the end user will read. Write it last, after the page is done, when you know what the page actually says.

  • Self-contained — the TL;DR must answer the page title question without requiring the reader to read further.
  • Concrete — name entities, cite numbers if you have them, and avoid hedging language that makes the passage less extractable.
  • Not promotional — superlatives lower extractability and trip publish-time validators that enforce neutral phrasing.
  • Aligned with the title — the lexical overlap between the TL;DR and the title raises retrieval scoring on the page topic.
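The "aligned with the title" point can be approximated with a rough lexical-overlap measure. The sketch below is an assumption-laden illustration: the stopword list, the tokenizer, and the `titleOverlap` function are all hypothetical, and no AI engine is known to score overlap in exactly this way. It reports the share of the title's content words that also appear in the TL;DR.

```typescript
// Hypothetical stopword list; tune it for your own content.
const STOPWORDS = new Set([
  "a", "an", "the", "and", "or", "so", "to", "of", "in", "on", "for",
  "is", "are", "do", "does", "how", "what", "why", "should", "i", "my", "them", "it",
]);

// Lowercase, split on non-alphanumerics, and drop stopwords.
function contentTerms(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0 && !STOPWORDS.has(t));
}

// Fraction (0 to 1) of distinct title terms that also occur in the TL;DR.
function titleOverlap(title: string, tldr: string): number {
  const titleTerms = new Set(contentTerms(title));
  if (titleTerms.size === 0) return 0;
  const tldrTerms = new Set(contentTerms(tldr));
  let shared = 0;
  titleTerms.forEach((t) => {
    if (tldrTerms.has(t)) shared += 1;
  });
  return shared / titleTerms.size;
}
```

A low score is only a prompt to reread the TL;DR against the title; the measure ignores synonyms and word forms, so it should not be treated as a hard gate.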

How should H2 sections and answer capsules be written?

Phrase every H2 as a question users actually ask, then open the section with a 40-60 word answer capsule that resolves the question without follow-up context. The capsule is the first paragraph of the section. Supporting paragraphs, lists, tables, and callouts come after. Each H2 section is an independent passage the retriever will score, so make each one stand alone.

The H2 plus answer capsule pattern is the central rhythm of a GEO page. The H2 mirrors the user query ("What is X?", "How does X work?", "Why does X matter?"), and the answer capsule provides the answer in a tight 40-60 word window. Supporting content — paragraphs that elaborate, lists that enumerate, tables that compare, callouts that highlight — follows the capsule but does not displace it from the top of the section.

Writing to this length feels unfamiliar at first. Writers used to traditional long-form content often start with a hook paragraph and build to the answer; the GEO pattern inverts that. The discipline is worth it: pages built this way tend to outperform unconstrained long-form on citation rate by a meaningful margin even when the underlying research is identical, because the retriever can find a clean answer immediately.
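The H2-plus-capsule pattern can be expressed as a small lint over a section's ordered blocks. This is a sketch under stated assumptions: the `Block` and `H2Section` types and the `lintSection` function are hypothetical, and treating "ends with a question mark" as the test for a question-format heading is a simplification.

```typescript
// Hypothetical content model: a section is a heading plus ordered blocks.
type Block =
  | { kind: "paragraph"; text: string }
  | { kind: "list"; items: string[] }
  | { kind: "table"; rows: string[][] }
  | { kind: "callout"; text: string };

interface H2Section {
  heading: string;
  blocks: Block[];
}

// Returns a list of problems; an empty list means the section follows the pattern.
function lintSection(section: H2Section): string[] {
  const problems: string[] = [];

  // Simplifying assumption: a question-format heading ends with "?".
  if (!section.heading.trim().endsWith("?")) {
    problems.push("heading should be phrased as a question");
  }

  // The answer capsule must be the first block, and it must be a paragraph.
  const first = section.blocks[0];
  if (first === undefined || first.kind !== "paragraph") {
    problems.push("section must open with a paragraph answer capsule");
  } else {
    const words = first.text.trim().split(/\s+/).filter((w) => w.length > 0).length;
    if (words < 40 || words > 60) {
      problems.push(`answer capsule has ${words} words; expected 40-60`);
    }
  }
  return problems;
}
```

Supporting lists, tables, and callouts are deliberately unconstrained here; the lint only checks that nothing displaces the capsule from the top of the section.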

How should Related Guides, Key Takeaways, and FAQ be built?

Related Guides should hold 4-8 absolute URLs to internal pages plus high-authority external sources. Key Takeaways should be 4-6 self-contained bullets, each 10-35 words. FAQ should hold 3-10 question-answer pairs with answers in the same 40-60 word window as section capsules. Each block has a distinct extraction job and they do not substitute for each other.

Related Guides feeds the engine's internal-linking graph. Cross-link tightly within your own topic cluster — every page in the cluster should link to every other page — and include two or three high-authority external sources to demonstrate the page is part of a broader conversation. Key Takeaways gives the engine reusable bullets it can lift into synthesized answers; keep each bullet self-contained and free of pronouns that require prior context.

FAQ is the highest-yield block for direct-quote retrieval. The 40-60 word constraint on FAQ answers mirrors section capsules because AI engines often surface FAQ answers as direct quotes when a user query maps to the question phrasing. Pair the FAQ with FAQPage JSON-LD on the same page (Schema.org FAQPage) so the engine parses the structure without inference.
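As an illustration of the FAQ-to-JSON-LD pairing, the sketch below builds a Schema.org FAQPage object from question-answer pairs. The `FaqPair` type and the `buildFaqPageJsonLd` and `toScriptTag` helpers are hypothetical names chosen for this example; the output uses the Schema.org FAQPage, Question, and Answer types with `mainEntity` and `acceptedAnswer`.

```typescript
interface FaqPair {
  question: string;
  answer: string;
}

interface FaqPageJsonLd {
  "@context": "https://schema.org";
  "@type": "FAQPage";
  mainEntity: {
    "@type": "Question";
    name: string;
    acceptedAnswer: { "@type": "Answer"; text: string };
  }[];
}

// Map each Q&A pair to a Schema.org Question with an accepted Answer.
function buildFaqPageJsonLd(pairs: FaqPair[]): FaqPageJsonLd {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: pairs.map((pair) => ({
      "@type": "Question" as const,
      name: pair.question,
      acceptedAnswer: { "@type": "Answer" as const, text: pair.answer },
    })),
  };
}

// Serialize for embedding in HTML. "<" is escaped so that answer text
// cannot terminate the script element early.
function toScriptTag(jsonLd: object): string {
  const json = JSON.stringify(jsonLd).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

Generating the markup from the same pairs that render the visible FAQ keeps the two in sync, so the structured data never describes questions that are not on the page.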

How does this map to JSON-LD on the rendered page?

The seven-block architecture maps cleanly to JSON-LD. Article wraps the page (headline, author, publisher, datePublished, dateModified, description). Person describes the author with sameAs links. Organization describes the publisher. FAQPage wraps the FAQ block. BreadcrumbList describes the URL path. AI engines parse each of these reliably, which raises trust scoring and reduces parsing errors that lead to downweighting.

JSON-LD derivation is mechanical once the page architecture is fixed. The TL;DR populates the Article description. The author block populates Article.author and a top-level Person entry. The publisher block populates Article.publisher and a top-level Organization entry. The FAQ block populates FAQPage with mainEntity arrays. Tools like auto-geo derive all of this from the publish payload so authors never write Schema.org markup by hand.
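The mapping described above can be sketched as a pure function from a publish payload to JSON-LD entries. The `PublishPayload` shape and the `deriveJsonLd` function are hypothetical and do not reflect auto-geo's actual code; the sketch only shows the TL;DR feeding the Article description, and the author and publisher each appearing both inside the Article and as a top-level entry.

```typescript
// Hypothetical payload; field names are illustrative assumptions.
interface PublishPayload {
  title: string;
  tldr: string;
  datePublished: string; // ISO 8601 date
  dateModified: string;  // ISO 8601 date
  author: { name: string; jobTitle: string; sameAs: string[] };
  publisher: { name: string; url: string };
}

// Returns three top-level entries in order: Article, Person, Organization.
function deriveJsonLd(payload: PublishPayload): Array<Record<string, unknown>> {
  const person = {
    "@type": "Person",
    name: payload.author.name,
    jobTitle: payload.author.jobTitle,
    sameAs: payload.author.sameAs,
  };
  const organization = {
    "@type": "Organization",
    name: payload.publisher.name,
    url: payload.publisher.url,
  };
  const article = {
    "@type": "Article",
    headline: payload.title,
    description: payload.tldr, // the TL;DR populates the description
    datePublished: payload.datePublished,
    dateModified: payload.dateModified,
    author: person,
    publisher: organization,
  };
  // Each top-level entry carries its own @context.
  return [article, person, organization].map((entry) => ({
    "@context": "https://schema.org",
    ...entry,
  }));
}
```

Because the function reads only the payload, the markup cannot drift from the visible page: editing the TL;DR or the author block changes the JSON-LD on the next publish.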

Shadow maintains auto-geo as an open MIT-licensed reference implementation of this architecture. The TypeScript schema, the validator, the JSON-LD derivation, and the React renderer are all open source so teams can adopt the contract without adopting the tool.

Related Guides

Key Takeaways

  • A GEO-optimized page has seven blocks in fixed order: TL;DR, intro, H2 sections, Related Guides, Key Takeaways, FAQ, and disclosure.
  • Each H2 section opens with a 40-60 word answer capsule because AI engine retrieval scores passages independently and rewards complete answers near the heading.
  • The TL;DR is the highest-leverage block because ChatGPT and Perplexity often lift it verbatim when summarizing the page in a generated answer.
  • Related Guides should hold 4-8 absolute URLs and Key Takeaways should hold 4-6 self-contained bullets in the 10-35 word window.
  • FAQ answers stay in the same 40-60 word window as section capsules and pair with FAQPage JSON-LD for direct-quote retrieval.
  • Enforce the architecture at the publish boundary with a schema validator so the contract is structural rather than dependent on author discipline.

Frequently Asked Questions

Can I skip the TL;DR if my intro already summarizes the page?

No. The TL;DR and intro do different jobs. The TL;DR is a 40-60 word self-contained answer the AI engine quotes verbatim; the intro is open-ended context that frames the rest of the page. Collapsing them either makes the intro too short to set context or makes the TL;DR too long to be extractable in the engine's preferred citation window.

Why exactly 40-60 words for answer capsules?

The 40-60 word window is the empirical sweet spot across the major AI engines in 2026. Under 40 words and the passage lacks enough context to stand alone in a generated answer; over 60 and the engine truncates or paraphrases, losing your phrasing. The exact bounds are an enforcement choice; the principle is short, complete, and self-contained.

Can I use H3s and H4s inside an H2 section?

Yes, but sparingly. H3s help organize long sections but compete with the answer capsule for retrieval attention. The pattern that works well is the answer capsule first, then any H3-organized subsections after. Avoid H4 and deeper headings in resource content; they signal subdocument structure and tend to fragment passages in ways the retriever handles inconsistently.

How many H2 sections should a page have?

Four to six H2 sections per page is the sweet spot for resource content. Fewer than four often signals the page is too narrow to be a definitive reference; more than eight tends to dilute the page's topical focus and makes each section harder to retrieve cleanly inside a generated answer paragraph.

Does the order of blocks really matter?

Yes. AI engines weight passages near the top of the page more heavily than passages near the bottom, and consistent ordering across pages helps the engine learn your site's structure. The seven-block order — TL;DR, intro, sections, Related Guides, Key Takeaways, FAQ, disclosure — is enforced by auto-geo at the schema level for exactly this reason.

About the Author

Jessen Gibbs · Founder, Shadow

Jessen leads Shadow, a media research lab studying how AI engines surface and cite brands. He works with communications teams on Generative Engine Optimization (GEO) programs and writes about the page architecture that makes content quotable by ChatGPT, Perplexity, Claude, Gemini, and Google AI Overviews.


Shadow is the publisher of this resource and the maintainer of auto-geo, the open-source publishing engine that enforces the page architecture documented above. External research and Schema.org references are cited with full URLs to primary sources. Published by Shadow.