The 11% Citation Overlap Problem: Why Cross-Engine Averaging Fails

By Nathan Hill-Haimes · Methodology: v1.0
When ChatGPT and Perplexity are asked the same query, only 11% of cited domains overlap (ziptie.dev, 2026). This third-party finding is why the AnswerGraph engine measures each engine independently.

Key findings

  • 11% domain overlap between ChatGPT and Perplexity for identical queries (ziptie.dev, 2026)
  • Averaging citation rates across engines produces composite scores with no statistical meaning
  • Engine-specific measurement is the only defensible approach

The measurement

The ziptie.dev cross-platform citation study probed 1,200 commercial-intent queries simultaneously on ChatGPT and Perplexity.

For each query, every domain cited in each engine's response was recorded. The Jaccard similarity coefficient between the ChatGPT citation set and the Perplexity citation set (the size of their intersection divided by the size of their union) was then calculated.
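As a concrete illustration (a minimal sketch, not the study's own code, with made-up domains), the Jaccard calculation for a single query looks like this:

```python
def jaccard_similarity(set_a: set[str], set_b: set[str]) -> float:
    """Jaccard similarity: |A intersect B| / |A union B|. 0.0 for two empty sets."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)

# Hypothetical citation sets for a single query (illustrative domains only):
chatgpt_cited = {
    "vendor-a.com", "review-hub.com", "wiki-ref.org", "blog-one.com",
    "forum-two.com", "news-three.com", "docs-four.com", "shop-five.com",
    "guide-six.com", "faq-seven.com",
}
perplexity_cited = {
    "vendor-a.com", "review-hub.com", "alt-one.com", "alt-two.com",
    "alt-three.com", "alt-four.com", "alt-five.com", "alt-six.com",
    "alt-seven.com", "alt-eight.com",
}

# 2 shared domains out of 18 unique domains overall: 2/18 = 0.11
print(round(jaccard_similarity(chatgpt_cited, perplexity_cited), 2))  # 0.11
```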

Results

The median Jaccard similarity was 0.11 (11%). For a typical query, only 11% of the combined pool of cited domains appeared in both engines' responses; the remaining 89% were cited by one engine but not the other.

This finding has a direct methodological consequence: any measurement system that averages citation rates across engines produces a composite score with no statistical meaning. An average over two citation distributions that diverge by 89% describes neither of them.
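A small numeric sketch (hypothetical figures, not study data) shows how a composite masks engine-specific signal:

```python
# Hypothetical per-engine citation shares for one brand, in percent
# (illustrative numbers, not study data):
shares = {"chatgpt": 34.0, "perplexity": 3.0}

composite = sum(shares.values()) / len(shares)
print(f"composite: {composite:.1f}%")  # 18.5% -- describes neither engine

# The composite cannot be decomposed: a brand at 34%/3% and a brand at
# 18%/19% yield the same composite despite opposite engine behaviour.
```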

Implications for measurement

  1. Engine-specific metrics are mandatory. A brand's citation share on ChatGPT is a different measurement from its citation share on Perplexity. Combining them into one number destroys information.

  2. "AI visibility scores" are methodologically indefensible. Any product offering a single cross-engine score is producing a number that cannot be reproduced, cannot be decomposed, and cannot be attributed to any specific engine behaviour.

  3. Engine-specific measurement is the only defensible approach. Cross-engine comparisons should be presented as separate measurements, never averaged; a reporting sketch follows this list.
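One way to encode that constraint in a reporting layer (a minimal sketch; the type and field names are assumptions, not the AnswerGraph schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EngineCitationShare:
    """One engine-specific measurement; never combined across engines."""
    engine: str            # e.g. "chatgpt", "perplexity"
    brand: str
    citation_share: float  # fraction of sampled responses citing the brand
    sample_size: int       # number of queries observed on this engine

def report(measurements: list[EngineCitationShare]) -> None:
    # Each engine's share is presented side by side; no composite is computed.
    for m in sorted(measurements, key=lambda m: m.engine):
        print(f"{m.engine:>12}: {m.brand} cited in "
              f"{m.citation_share:.1%} of {m.sample_size} responses")

report([
    EngineCitationShare("chatgpt", "example-brand.com", 0.34, 1200),
    EngineCitationShare("perplexity", "example-brand.com", 0.03, 1200),
])
```

Carrying the sample size alongside each share keeps every figure reproducible against a specific engine and observation window, which is exactly what a single composite score discards.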

Where the AnswerGraph engine's own findings will live

The AnswerGraph panel measures four engines independently. Once enough observations accumulate to produce its own cross-engine overlap analysis, those findings will be published here with full data. Until then, the ziptie.dev study remains the best public evidence for why per-engine measurement is necessary.