In 2015, Brian Dean coined the ‘Skyscraper Technique’: find the best-ranking article in your category, write a longer and more comprehensive version, and outrank it. The technique worked for the better part of a decade. It stopped working in 2023.
The reason: AI engines read the web differently than Google’s pre-LLM crawler did. They look for information gain. A longer version of content that already exists adds redundancy, not gain. Skyscraper content is now invisible to ChatGPT, Perplexity, and AI Overviews.
Google's 2022 information gain patent (US11354342B1) describes a scoring mechanism for content novelty. ChatGPT, Perplexity, and Gemini all weigh information gain in their retrieval layers. Pages scoring high on information gain are cited 3 to 6x more often than pages with identical structural SEO but low information gain (The Smarketers GEO Audit Q1 2026).
What information gain actually measures
Information gain is the amount of new, useful information a piece of content adds beyond what is already available on other URLs covering the same topic. Three components:
- Novelty. Claims, data, frameworks, or perspectives that do not appear on competing pages.
- Specificity. Concrete examples, named case studies, measurable results, and primary source citations.
- Extractability. The novel information is stated clearly enough for an AI engine to extract and attribute to your source.
A page that scores high on all three gets cited. A page high on novelty and specificity but low on extractability gets read by humans but not cited by AI. A page high on extractability but low on novelty and specificity is just a well-structured summary of what everyone else said, and it gets ignored.
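The all-or-nothing logic above can be reduced to a toy predicate. This is purely illustrative: the boolean framing is an assumption for clarity, since real retrieval layers score these components as continuous signals rather than yes/no flags.

```python
# Toy predicate for the three-component rule: a page is likely cited
# only when novelty, specificity, AND extractability are all present.
# Boolean framing is an illustrative simplification; real engines
# score these as continuous signals.
from dataclasses import dataclass

@dataclass
class Page:
    novelty: bool         # claims/data not found on competing pages
    specificity: bool     # named cases, numbers, primary sources
    extractability: bool  # novel claim stated clearly enough to attribute

def likely_cited(page: Page) -> bool:
    return page.novelty and page.specificity and page.extractability

# High on novelty and specificity, low on extractability:
# read by humans, not cited by AI.
print(likely_cited(Page(True, True, False)))  # False
```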
Why skyscraper content fails
Skyscraper content optimises for length and comprehensiveness. Under pre-LLM SEO, that was a legitimate signal. A 3,000-word guide that covered more angles than the 1,500-word competitor looked more authoritative and Google ranked it higher.
AI engines do not care about length. They care about which source added new information. A 600-word post with a novel data point gets cited more often than a 3,000-word post that rehashes published statistics.
Worse, longer skyscraper content often dilutes the novel elements inside it. The new insight in paragraph 4 gets buried under 2,400 words of generic category context. AI engines’ answer extraction pulls from pages with dense, extractable novelty, not from pages that bury it.
What high information gain content looks like
Original research
First-party surveys, benchmark reports, platform audits, interview series. Example: a 500-URL benchmark of ChatGPT citation patterns scores dramatically higher than a blog post summarising second-hand statistics on ChatGPT.
Proprietary frameworks
Named frameworks backed by your own field data. The Smarketers’ ‘3-Pillar Hybrid GTM’ is proprietary because it is defined by us and cited with attribution. A generic ‘content marketing framework’ is not.
Named expert perspectives
Quoted insights from named practitioners with credentials. ‘According to Jane Smith, VP of SEO at Salesforce…’ scores higher than ‘Industry experts suggest…’. AI engines can attribute named sources.
Case-specific teardowns
Detailed walkthroughs of real implementations with real numbers. Anonymised is acceptable; vague is not. A case that says ‘a B2B SaaS with $32M ARR improved win rate from 18% to 31% in 2 quarters by doing X’ scores far higher than ‘many companies see improvement’.
Contrarian analysis
Arguments that push back on a widely-held view, with evidence. If the consensus is ‘SDR teams drive enterprise pipeline’ and you argue ‘SDR teams are cost-inefficient above $100K ACV’, with data, AI engines surface your perspective because it fills a gap in the canonical answer.
The practical test: if your page was deleted from the web tomorrow, would any unique information be lost? If yes, it has information gain. If no, AI engines will cite the pages that do.
How to audit existing content for information gain
Take your top 20 highest-traffic pages. For each, answer 5 questions:
1. Does this page contain original data, frameworks, or named expert perspectives that do not appear on competitor pages?
2. Does the core novel claim appear in the first 300 words in an extractable form?
3. Is the author named, credentialed, and recognisable in the category?
4. Are primary citations used (5+) rather than aggregator links?
5. If this page disappeared, would anyone notice?
Pages with 4+ yes answers are high-info-gain. Pages with 2 or fewer are redundant and should be rewritten or merged.
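If you want to run this audit over a page list in a script or spreadsheet export, the rubric reduces to counting yes answers against the thresholds above. A minimal sketch follows; the question keys and the ‘borderline’ label for 3-yes pages are illustrative assumptions, not a published rubric.

```python
# Minimal sketch of the five-question information-gain audit.
# The 4+ and 2-or-fewer thresholds follow the audit above; question
# keys and the label for 3-yes pages are illustrative assumptions.

AUDIT_QUESTIONS = [
    "original_data_or_frameworks",     # data/frameworks/experts not on competitor pages
    "novel_claim_in_first_300_words",  # extractable core claim up front
    "named_credentialed_author",
    "five_plus_primary_citations",
    "would_anyone_notice_if_deleted",
]

def audit_page(answers: dict[str, bool]) -> str:
    """Classify a page from yes/no answers to the five audit questions."""
    yes_count = sum(answers.get(q, False) for q in AUDIT_QUESTIONS)
    if yes_count >= 4:
        return "high-info-gain"
    if yes_count <= 2:
        return "redundant: rewrite or merge"
    return "borderline: strengthen weakest component"

# Example: original data and a named author, but the claim is buried.
page = {
    "original_data_or_frameworks": True,
    "novel_claim_in_first_300_words": False,
    "named_credentialed_author": True,
    "five_plus_primary_citations": False,
    "would_anyone_notice_if_deleted": True,
}
print(audit_page(page))  # 3 yes answers -> borderline
```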
The 2026 content mix
Effective B2B content programmes in 2026 look roughly like this:
- 20% flagship high-info-gain pieces (original research, proprietary frameworks, named expert interviews). These are the anchor citations.
- 50% derivative content that references the flagship pieces (summaries, social posts, video scripts, email content, LinkedIn articles). These amplify the anchor and feed the citation ecosystem.
- 20% product and solution content (use case pages, integration guides, competitive comparisons). These convert once AI engines surface your brand.
- 10% news and commentary (industry analysis, trend responses). These feed Perplexity’s recency bias and LinkedIn engagement.
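Applied to a quarterly plan, the 20/50/20/10 split converts directly into asset counts. A sketch, assuming a quarterly volume of 20 assets purely as a worked example:

```python
# Sketch: apportion a quarter's content volume using the 20/50/20/10 mix.
# The 20-asset quarterly total is an assumed example, not a recommendation.

CONTENT_MIX = {
    "flagship_high_info_gain": 0.20,
    "derivative": 0.50,
    "product_and_solution": 0.20,
    "news_and_commentary": 0.10,
}

def plan_quarter(total_assets: int) -> dict[str, int]:
    """Round each category's share; assign any rounding leftover to derivative."""
    plan = {k: round(total_assets * share) for k, share in CONTENT_MIX.items()}
    plan["derivative"] += total_assets - sum(plan.values())
    return plan

print(plan_quarter(20))
# {'flagship_high_info_gain': 4, 'derivative': 10,
#  'product_and_solution': 4, 'news_and_commentary': 2}
```

Routing the rounding remainder into the derivative bucket keeps the total exact, since derivative content is the largest and most flexible category.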
The content team implication
High information gain demands different resources. A 2020 B2B content team was 3 writers, 1 editor, 1 designer. A 2026 content team should be 2 writers, 1 editor, 1 designer, 1 research analyst, and regular contract access to named subject matter experts willing to be attributed. The research analyst is the new critical role: they run surveys, analyse first-party data, and produce the flagship insights that the rest of the team derives from.
Teams without a research analyst default to summarising others’ work. That produces low-info-gain content and invisible AI citation profiles.
Frequently Asked Questions
What is information gain in SEO?
Information gain is the measure of new information a piece of content adds beyond what already exists on the web. AI engines weigh this heavily when deciding which sources to cite. Google patented the concept in 2022 (US11354342B1) and it is now central to ChatGPT, Perplexity, and AI Overviews citation logic.
How is information gain different from E-E-A-T or content quality?
E-E-A-T measures author credibility and site authority. Quality measures readability and depth. Information gain measures novelty: does this content give the reader or AI engine something they cannot get elsewhere? You can have strong E-E-A-T and still score low on information gain if you are just summarising.
Does this mean I need to publish original research?
Yes, at least for flagship content. Original surveys, first-party data analysis, proprietary benchmarks, expert interviews with named practitioners, and case-specific teardowns all score high on information gain. Reformatted blog posts do not.
How can I tell if my content has information gain?
Three tests. First, could the same claims or data be pasted into a dozen other articles on the topic without standing out? If yes, low information gain. Second, does the content answer a question the reader cannot find answered elsewhere? Third, if the content was removed from the web, would anything material be lost? High information gain content passes all three tests.
How much original research do I need per quarter?
For a mid-market B2B content programme, 2 to 4 pieces of original research per quarter is realistic. Pair each with 6 to 10 derivative content pieces (blogs, social, video, LinkedIn posts) that each reference back to the original. Over a year, 8 to 16 flagship pieces anchor a library of roughly 50 to 160 derivative assets.