How we measure AI visibility
Every number we publish has a method behind it. This page documents, in full, our prompt sampling approach, engine coverage, citation logging, sentiment rubric and disclosure standards.
We disclose every prompt, every engine, every sample size and every date window. A metric without disclosure is marketing, not measurement.
Measuring AI visibility is harder than it looks. Engine responses are non-deterministic. Prompt framing changes outcomes. Engine architectures evolve. Our methodology is designed to produce numbers that are reproducible, comparable over time and honest about their limitations.
01 — Scope & coverage
The AI Visibility Index currently covers 1,200 domains across 40 verticals and five answer engines: ChatGPT (GPT-4o), Perplexity Pro, Gemini 1.5 Pro, Microsoft Copilot and Google AI Overviews. Domain selection prioritises verticals where AI search has the highest buyer-journey penetration.
02 — Prompt construction
For each vertical, we build a prompt taxonomy covering three intent bands: discovery ("best tools for X"), comparison ("A vs B") and validation ("is A reliable / what does A cost"). Prompts are weighted by approximate real-world query frequency, sourced from keyword research and engine autocomplete data.
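As a rough illustration of the taxonomy, the sketch below models a prompt with its intent band and converts estimated query frequencies into weights that sum to 1 within a vertical. The names `IntentBand`, `Prompt` and `normalised_weights`, and the frequency figures, are hypothetical. Normalising to a sum of 1 is an assumption, since the page does not say how weights are scaled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Sequence


class IntentBand(Enum):
    """The three intent bands named in the methodology."""
    DISCOVERY = "discovery"
    COMPARISON = "comparison"
    VALIDATION = "validation"


@dataclass(frozen=True)
class Prompt:
    text: str
    band: IntentBand
    est_frequency: float  # hypothetical estimate of real-world query volume


def normalised_weights(prompts: Sequence[Prompt]) -> Dict[str, float]:
    """Scale estimated frequencies so the weights in one vertical sum to 1."""
    total = sum(p.est_frequency for p in prompts)
    if total <= 0:
        raise ValueError("estimated frequencies must sum to a positive number")
    return {p.text: p.est_frequency / total for p in prompts}


# Illustrative prompts only; the frequencies are made up for the example.
example_prompts = [
    Prompt("best tools for X", IntentBand.DISCOVERY, 600),
    Prompt("A vs B", IntentBand.COMPARISON, 300),
    Prompt("what does A cost", IntentBand.VALIDATION, 100),
]
weights = normalised_weights(example_prompts)
```

Keeping the band on each prompt means results can later be broken down by discovery, comparison and validation intent without re-tagging.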
03 — Sampling protocol
Each prompt is run five times per engine to smooth out non-determinism, and each run is logged independently. The final presence, citation and sentiment scores are the modal outcome across the five runs. Where at least two of the five runs disagree with that outcome (≥2/5 disagreement), the domain is flagged for manual review.
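The modal-outcome rule can be sketched as follows. This is a minimal illustration, not our production pipeline. The function name `aggregate_runs` and the `min_disagree` parameter are hypothetical, and it reads "≥2/5 disagreement" as "two or more runs differ from the modal outcome". Flagging an exact tie for the mode is an extra assumption.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def aggregate_runs(
    outcomes: Sequence[Hashable], min_disagree: int = 2
) -> Tuple[Hashable, bool]:
    """Return (modal_outcome, needs_review) for one prompt-engine pair.

    `outcomes` holds one logged value per run, e.g. five booleans for
    presence or five sentiment labels.
    """
    if not outcomes:
        raise ValueError("at least one run is required")
    # most_common() sorts by count; equal counts keep first-seen order.
    ranked = Counter(outcomes).most_common()
    modal, modal_count = ranked[0]
    disagreeing = len(outcomes) - modal_count
    # A tie for the mode means there is no single modal outcome.
    tied = len(ranked) > 1 and ranked[1][1] == modal_count
    return modal, (disagreeing >= min_disagree or tied)


unanimous = aggregate_runs([True, True, True, True, True])
one_outlier = aggregate_runs([True, True, True, True, False])
split = aggregate_runs([True, True, True, False, False])
```

With five runs, a 4/1 split passes without review, while a 3/2 split returns the majority value but sets the review flag.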
04 — Citation logging
For each prompt-engine-run triplet, we record: (1) Presence — brand name mentioned in the response? (2) Citation — domain explicitly linked or attributed? (3) Position — first, second or later mention? (4) Sentiment — see section 05.
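One way to represent a logged run is a small immutable record, sketched below. The class and field names (`RunRecord`, `Position`, `position_of`) are hypothetical. The sketch assumes that an upstream step has already extracted the brands mentioned in a response, in order of first mention.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class Position(Enum):
    FIRST = "first"
    SECOND = "second"
    LATER = "later"


@dataclass(frozen=True)
class RunRecord:
    prompt_id: str
    engine: str
    run_index: int                # 1 to 5 under the sampling protocol
    presence: bool                # brand name mentioned in the response?
    citation: bool                # domain explicitly linked or attributed?
    position: Optional[Position]  # None when the brand is not mentioned
    sentiment: str                # label from the sentiment rubric


def position_of(brand: str, brands_in_order: Sequence[str]) -> Optional[Position]:
    """Map a brand's rank among mentioned brands to first, second or later."""
    names = [b.casefold() for b in brands_in_order]
    try:
        idx = names.index(brand.casefold())
    except ValueError:
        return None  # brand absent from the response
    if idx == 0:
        return Position.FIRST
    if idx == 1:
        return Position.SECOND
    return Position.LATER


# Illustrative values only.
record = RunRecord(
    prompt_id="p-001",
    engine="Perplexity Pro",
    run_index=1,
    presence=True,
    citation=False,
    position=position_of("Acme", ["Other Co", "ACME", "Third Co"]),
    sentiment="accurate-neutral",
)
```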
05 — Sentiment scoring
Sentiment is assigned one of four labels: Accurate-positive (correct facts, favourable framing), Accurate-neutral (correct facts, no framing), Inaccurate (factual errors or hallucinations), Absent. Scoring uses a calibrated LLM rubric, and a human editor reviews 10% of samples to maintain calibration.
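The four labels and the 10% human-review step could be sketched like this. The seeded random draw and the agreement-rate check are assumptions made for illustration. The page does not specify how review samples are chosen or how calibration is quantified, and the names `select_for_human_review` and `agreement_rate` are hypothetical.

```python
import random
from enum import Enum
from typing import List, Sequence


class Sentiment(Enum):
    ACCURATE_POSITIVE = "accurate-positive"
    ACCURATE_NEUTRAL = "accurate-neutral"
    INACCURATE = "inaccurate"
    ABSENT = "absent"


def select_for_human_review(
    sample_ids: Sequence[str], review_percent: int = 10, seed: int = 0
) -> List[str]:
    """Pick a reproducible subset of samples for a human editor."""
    ids = sorted(sample_ids)
    if not ids:
        return []
    # Integer ceiling of len(ids) * review_percent / 100.
    k = (len(ids) * review_percent + 99) // 100
    return random.Random(seed).sample(ids, k)


def agreement_rate(
    llm_labels: Sequence[Sentiment], human_labels: Sequence[Sentiment]
) -> float:
    """Share of reviewed samples where the LLM and the editor agree."""
    if len(llm_labels) != len(human_labels) or not llm_labels:
        raise ValueError("label lists must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(llm_labels, human_labels))
    return matches / len(llm_labels)


all_ids = [f"sample-{i:03d}" for i in range(100)]
to_review = select_for_human_review(all_ids)
```

Fixing the seed makes the review subset repeatable, so a second editor can audit the same samples.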
06 — Composite score (0–100)
07 — Disclosure standards
Every published score includes: the prompt taxonomy version, the engines and model versions queried, the sample size, the date window, and the sentiment rubric version. We consider undisclosed AI visibility metrics to be unreliable and do not publish or reference them.
We will never publish a share-of-model figure without disclosing its prompt set, engine set, sample size and date window. Every statistic we publish is linked to its primary source or documented methodology. If we can't source it, we don't publish it.
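The disclosure rule in this section lends itself to an automated check before publication. The sketch below is one possible form. The field names in `REQUIRED_FIELDS` and the function `missing_disclosure_fields` are hypothetical, chosen to mirror the items listed above.

```python
from typing import Any, List, Mapping

# Hypothetical keys, one per item a published score must disclose.
REQUIRED_FIELDS = (
    "prompt_taxonomy_version",
    "engines",                  # engines with their model versions
    "sample_size",
    "date_window",
    "sentiment_rubric_version",
)


def missing_disclosure_fields(metadata: Mapping[str, Any]) -> List[str]:
    """Return the required disclosure fields that are absent or empty."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = metadata.get(name)
        if value is None or value == "" or value == [] or value == ():
            missing.append(name)
    return missing


# Illustrative metadata only; the values are placeholders.
complete = {
    "prompt_taxonomy_version": "v1",
    "engines": ["Perplexity Pro"],
    "sample_size": 500,
    "date_window": ("2025-01-01", "2025-01-31"),
    "sentiment_rubric_version": "v1",
}
incomplete = {"prompt_taxonomy_version": "v1", "engines": []}

ok_to_publish = not missing_disclosure_fields(complete)
gaps = missing_disclosure_fields(incomplete)
```

A score whose metadata returns a non-empty list would be held back until the gaps are filled.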