
Verified 2026-05-03 for the score model and integrity rules. Specific dimension weights in code may have shifted since then; re-check nudg3-workflows/services/scoring_service.py before publishing the weight breakdown externally.

03 — Aggregated Stats

The third stage of the loop. Every mention extracted in Stage 2 gets rolled up into the workspace-level metrics that customers and reports actually quote. This is where the platform produces its core optimisation signal.

The two headline scores

Visibility Score (0–100)

The golden optimisation metric. Deterministic. Based on six weighted dimensions, using unbranded and comparison query data only (a sketch of the roll-up follows the list):

  1. Mention frequency (30%)
  2. Position quality (25%)
  3. Sentiment (15%)
  4. Competitive standing (15%)
  5. Source quality (10%)
  6. Provider diversity (5%)

This is the number teams track week-over-week.
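
A minimal sketch of the roll-up, assuming each dimension arrives as a pre-normalised 0–100 sub-score. The names and function shape are illustrative, not the shipped code; the authoritative weights live in nudg3-workflows/services/scoring_service.py and may have shifted since verification.

  # Hypothetical sketch; weights as documented above. Re-check scoring_service.py.
  WEIGHTS = {
      "mention_frequency":    0.30,
      "position_quality":     0.25,
      "sentiment":            0.15,
      "competitive_standing": 0.15,
      "source_quality":       0.10,
      "provider_diversity":   0.05,
  }

  def visibility_score(dimensions: dict[str, float]) -> float:
      """Deterministic weighted sum of six 0-100 dimension sub-scores."""
      assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must total 100%
      return round(sum(WEIGHTS[k] * dimensions[k] for k in WEIGHTS), 1)

The point of the shape is that identical mention data always yields an identical score, which is what makes the deltas teams track actually comparable.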

Audit Score (0–100)

The first-touch quality metric used in onboarding and free audits. Same six dimensions, same weights, same deterministic rule. The audit score is what’s surfaced on the Visibility Score banner in the dashboard.

Other aggregated metrics (Phase 6)

Per-report and per-period (two of these are sketched in code after the list):

  • Share of voice — % of relevant responses where the workspace’s brand appears
  • Position premium — average position of the brand vs competitors when both appear
  • Co-mention analysis — which competitors get mentioned alongside the brand, and on which prompts
  • Sentiment by provider — sentiment breakdown across providers (catches “ChatGPT loves us, Perplexity is neutral”)
  • Query gaps — prompts where the brand doesn’t appear but should
  • Source diversity — how many distinct domains cite the brand
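
Share of voice and position premium reduce to simple arithmetic, sketched below under an assumed record shape: a response is modelled as the list of mentions extracted from one AI answer. This is an illustration, not the platform's actual schema.

  from dataclasses import dataclass
  from statistics import mean

  @dataclass
  class Mention:
      entity: str    # brand or competitor name
      position: int  # 1 = mentioned first in the response

  def share_of_voice(responses: list[list[Mention]], brand: str) -> float:
      """% of relevant responses where the brand appears at all."""
      if not responses:
          return 0.0
      hits = sum(any(m.entity == brand for m in r) for r in responses)
      return 100.0 * hits / len(responses)

  def position_premium(responses: list[list[Mention]], brand: str) -> float:
      """Average positional edge over competitors, counted only on responses
      where the brand and at least one competitor both appear. Positive
      means the brand tends to be mentioned earlier; the delta definition
      here is one plausible reading of the metric, not the shipped formula."""
      deltas = []
      for r in responses:
          ours = [m.position for m in r if m.entity == brand]
          theirs = [m.position for m in r if m.entity != brand]
          if ours and theirs:
              deltas.append(mean(theirs) - mean(ours))
      return mean(deltas) if deltas else 0.0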

Scoring integrity (the rules that keep the score honest)

These are the invariants the platform enforces on every score calculation (sketched as pure checks after the list):

  1. 100% deterministic. No LLM involvement in the final calculation. The LLM may generate an estimate for its own context during report generation, but the system always overwrites it with the deterministic calculation.
  2. Unbranded and comparison query data only. Branded prompts (where the brand name appears in the question) produce inflated metrics that don’t reflect organic AI discoverability.
  3. Active prompts only. Inactive prompts are filtered out at the data-fetch layer (fetch_responses_node, enforced by PR #318 / v3.8.1) so retired prompts can’t contaminate scores.
  4. Change guards. Week-over-week deltas are suppressed when the scoring methodology changes between versions or when the active prompt set changes significantly (>20%). Detail in Historical Tracking.
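
A sketch of these guards as pure checks. fetch_responses_node, PR #318 / v3.8.1, and the >20% threshold come from this page; the field names, helper names, and churn formula are assumptions, not the shipped implementation.

  # Invariants 2 and 3: unbranded/comparison queries from active prompts only.
  def eligible(response: dict) -> bool:
      return (response["query_type"] in ("unbranded", "comparison")
              and response["prompt_active"])

  # Invariant 1: an LLM-generated estimate is context only; it never survives.
  def final_score(llm_estimate: float | None, deterministic_score: float) -> float:
      return deterministic_score

  # Invariant 4: suppress week-over-week deltas on a methodology change, or
  # when the active prompt set churns by more than 20%. Symmetric-difference
  # churn is one plausible definition, not necessarily the shipped one.
  def suppress_delta(prev_prompts: set[str], curr_prompts: set[str],
                     methodology_changed: bool) -> bool:
      if methodology_changed:
          return True
      union = prev_prompts | curr_prompts
      if not union:
          return False
      return len(prev_prompts ^ curr_prompts) / len(union) > 0.20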

Why determinism is the moat

Competitors that score with LLMs in the critical path produce numbers that drift between runs and across versions. Sales prospects don’t trust that. Our deterministic floor means a workspace’s score from yesterday and today is genuinely comparable, and movements over time are real. That’s what makes the score worth optimising against.

See also