Methodology v2.3 · June 2026

How we measure AI visibility

Every number we publish has a method behind it. This page documents, in full, our prompt sampling approach, engine coverage, citation logging, sentiment rubric and disclosure standards.

AVG Research Team

Last updated · Jun 2026


On this page

01 · Scope & coverage

02 · Prompt construction

03 · Sampling protocol

04 · Citation logging

05 · Sentiment scoring

06 · Composite score

07 · Disclosure standards

Principle

We disclose every prompt, every engine, every sample size and every date window. A metric without disclosure is marketing, not measurement.

Measuring AI visibility is harder than it looks. Engine responses are non-deterministic. Prompt framing changes outcomes. Engine architectures evolve. Our methodology is designed to produce numbers that are reproducible, comparable over time and honest about their limitations.

01 — Scope & coverage

The AI Visibility Index currently covers 1,200 domains across 40 verticals and five answer engines: ChatGPT (GPT-4o), Perplexity Pro, Gemini 1.5 Pro, Microsoft Copilot and Google AI Overviews. Domain selection prioritises verticals where AI search has the highest buyer-journey penetration.

02 — Prompt construction

For each vertical, we build a prompt taxonomy covering three intent bands: discovery ("best tools for X"), comparison ("A vs B") and validation ("is A reliable / what does A cost"). Prompts are weighted by approximate real-world query frequency, sourced from keyword research and engine autocomplete data.

Prompt standards

✓ Prompts are phrased as a buyer would ask, not as a brand would prefer

✓ No brand names in discovery prompts, so they stay buyer-intent neutral

✓ Comparison prompts include at least three competitors

✓ All prompts are reviewed by a human editor before inclusion

✓ The prompt set is versioned and archived; historical comparisons use the same prompts
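Some of these standards lend themselves to an automated pre-check before human review. The sketch below is illustrative only: the `Prompt` class, the `check_prompt` function and the `known_brands` list are hypothetical names, and the word-boundary brand match is an assumed heuristic rather than a description of our actual tooling.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Prompt:
    """Hypothetical prompt record; field names are assumptions."""
    text: str
    intent: str  # "discovery", "comparison" or "validation"
    competitors: list = field(default_factory=list)
    human_reviewed: bool = False


def check_prompt(prompt, known_brands):
    """Return a list of standards the prompt appears to violate."""
    problems = []

    # Discovery prompts must not contain brand names.
    if prompt.intent == "discovery":
        for brand in known_brands:
            pattern = r"\b" + re.escape(brand) + r"\b"
            if re.search(pattern, prompt.text, re.IGNORECASE):
                problems.append(f"discovery prompt names brand: {brand}")

    # Comparison prompts must include at least three competitors.
    if prompt.intent == "comparison" and len(set(prompt.competitors)) < 3:
        problems.append("comparison prompt has fewer than three competitors")

    # Every prompt needs a human editor's sign-off before inclusion.
    if not prompt.human_reviewed:
        problems.append("prompt not yet reviewed by a human editor")

    return problems
```

A check like this can only catch mechanical violations. Whether a prompt is phrased as a buyer would ask remains an editorial judgement.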

03 — Sampling protocol

Each prompt is run five times per engine to smooth out non-determinism. Each run is logged independently. The final presence, citation and sentiment scores are the modal outcome across the five runs. Where runs diverge (at least two of the five runs disagree), the domain is flagged for manual review.
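The aggregation rule can be sketched as follows. This is a minimal illustration, not our production code: the function name `aggregate_runs` is hypothetical, and it assumes "disagreement" means the number of runs that differ from the modal outcome.

```python
from collections import Counter

RUNS_PER_PROMPT = 5         # runs per prompt per engine, from the protocol
DISAGREEMENT_THRESHOLD = 2  # this many dissenting runs triggers manual review


def aggregate_runs(outcomes):
    """Return (modal_outcome, needs_review) for one prompt on one engine.

    `outcomes` holds one value per run, e.g. True/False for presence
    or a sentiment label.
    """
    if len(outcomes) != RUNS_PER_PROMPT:
        raise ValueError(f"expected {RUNS_PER_PROMPT} runs, got {len(outcomes)}")

    counts = Counter(outcomes)
    modal_outcome, modal_count = counts.most_common(1)[0]

    # Runs that did not return the modal outcome (assumed meaning of "disagreement").
    dissenting = len(outcomes) - modal_count
    return modal_outcome, dissenting >= DISAGREEMENT_THRESHOLD


# Four of five runs agree: the mode stands and no review is needed.
print(aggregate_runs([True, True, True, True, False]))   # (True, False)
# A 3-2 split: the mode is still reported, but the domain is flagged.
print(aggregate_runs([True, True, True, False, False]))  # (True, True)
```

Under this reading, only a 5-0 or 4-1 outcome passes without review. Anything closer is flagged.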

04 — Citation logging

For each prompt-engine-run triplet, we record:

(1) Presence: is the brand name mentioned in the response?

(2) Citation: is the domain explicitly linked or attributed?

(3) Position: is it the first, second or a later mention?

(4) Sentiment: see section 05.
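One way to picture a single logged record is as a small immutable data structure. The sketch below is an assumption about shape only: the class name `RunRecord`, the field names and the string labels are hypothetical, and the rule that position is empty when the brand is absent is our reading of the fields rather than a documented constraint.

```python
from dataclasses import dataclass
from typing import Optional

POSITIONS = ("first", "second", "later")
SENTIMENTS = ("accurate-positive", "accurate-neutral", "inaccurate", "absent")


@dataclass(frozen=True)
class RunRecord:
    """One prompt-engine-run triplet (hypothetical field names)."""
    prompt_id: str
    engine: str
    run_index: int           # 0 to 4, one per run
    presence: bool           # brand name mentioned in the response?
    citation: bool           # domain explicitly linked or attributed?
    position: Optional[str]  # "first", "second", "later", or None if absent
    sentiment: str           # label from the rubric in section 05

    def __post_init__(self):
        if not 0 <= self.run_index < 5:
            raise ValueError("run_index must be between 0 and 4")
        if self.presence and self.position not in POSITIONS:
            raise ValueError("a present brand needs a valid position")
        if not self.presence and self.position is not None:
            raise ValueError("an absent brand cannot have a position")
        if self.sentiment not in SENTIMENTS:
            raise ValueError(f"unknown sentiment label: {self.sentiment}")
```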

05 — Sentiment scoring

Sentiment is scored on a 4-point scale: Accurate-positive (correct facts, favourable framing), Accurate-neutral (correct facts, no framing), Inaccurate (factual errors or hallucinations), Absent. Scoring uses a calibrated LLM rubric, with 10% of samples reviewed by a human editor to maintain calibration.
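The 10% human review sample could be drawn along these lines. This is a sketch under assumptions: the text above does not say how the sample is selected, so uniform random selection with a fixed seed (for reproducibility) and rounding up are choices made for illustration, and `pick_for_human_review` is a hypothetical name.

```python
import random

REVIEW_PERCENT = 10  # share of scored samples reviewed by a human editor


def pick_for_human_review(sample_ids, seed=0):
    """Pick 10% of scored samples, rounded up, for human review.

    Uniform random selection with a fixed seed is an assumption,
    chosen so the same input always yields the same review set.
    """
    ids = sorted(sample_ids)
    # Integer ceiling of REVIEW_PERCENT percent of the sample count.
    k = (len(ids) * REVIEW_PERCENT + 99) // 100
    return random.Random(seed).sample(ids, k)
```

Rounding up means even a small batch gets at least one human check, so calibration drift cannot hide in low-volume verticals.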

06 — Composite score (0–100)

Score composition

Answer presence (% of prompts): 40% weight

Citation share (vs competitors): 35% weight

Sentiment quality score: 25% weight
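The weighting can be read as a simple weighted sum. The sketch below assumes each component has already been normalised to the range 0 to 1; how the sentiment quality score is normalised is not specified on this page, and the function and parameter names are hypothetical.

```python
# Weights from the score composition above.
WEIGHTS = {
    "presence": 0.40,
    "citation_share": 0.35,
    "sentiment_quality": 0.25,
}


def composite_score(presence, citation_share, sentiment_quality):
    """Combine three components (each assumed to be in 0..1) into a 0-100 score."""
    components = {
        "presence": presence,
        "citation_share": citation_share,
        "sentiment_quality": sentiment_quality,
    }
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    return 100 * sum(WEIGHTS[name] * value for name, value in components.items())
```

For example, a domain present in half of prompts (0.5), with a 0.2 citation share and a 0.8 sentiment quality score, would score 20 + 7 + 20 = 47 out of 100.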

07 — Disclosure standards

Every published score includes: the prompt taxonomy version, the engines and model versions queried, the sample size, the date window, and the sentiment rubric version. We consider undisclosed AI visibility metrics to be unreliable and do not publish or reference them.
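A disclosure requirement like this can be enforced mechanically before publication. The sketch below is illustrative: the key names and the `missing_disclosures` function are hypothetical, and only the list of five required items comes from the text above.

```python
# The five items every published score must disclose (key names are assumptions).
REQUIRED_DISCLOSURES = (
    "prompt_taxonomy_version",
    "engines_and_model_versions",
    "sample_size",
    "date_window",
    "sentiment_rubric_version",
)


def missing_disclosures(metadata):
    """Return the required disclosure fields that are absent or empty."""
    return [key for key in REQUIRED_DISCLOSURES if not metadata.get(key)]


def can_publish(metadata):
    """A score is publishable only when nothing is missing."""
    return not missing_disclosures(metadata)
```

A gate of this kind treats an empty value the same as a missing one, so a score with a blank date window is blocked just like one with no date window at all.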

Our commitment

We will never publish a share-of-model figure without disclosing its prompt set, engine set, sample size and date window. Every statistic we publish is linked to its primary source or documented methodology. If we can't source it, we don't publish it.