Calibration measures whether a goal's confidence scores can be trusted. A well-calibrated goal that says "85% confident" should be right about 85% of the time. The calibration page gives admins the data to know where each goal stands.
What you'll see
- Cold-start gate — every goal goes through three states before calibration is meaningful:
  - Collecting — not enough decisions yet.
  - Warming — some signal, but unstable.
  - Calibrated — enough decisions for the metrics to be reliable.
- Reliability diagram — ten confidence buckets plotted against actual hit rate. A perfectly calibrated goal traces the diagonal reference line.
- Expected Calibration Error (ECE) — single-number summary of how far the curve sits from the diagonal. Lower is better.
- Precision / recall table — classic accuracy metrics, broken down by routing zone.
- ECE trend — sparkline over 7d / 14d / 30d so you can see whether the goal is improving or drifting.
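The reliability diagram and ECE above can both be computed from a log of (confidence, outcome) pairs. A minimal sketch, assuming ten equal-width buckets and taking ECE as the bucket-weighted gap between average confidence and hit rate (the standard definition; the product's exact binning may differ):

```python
import numpy as np

def reliability_and_ece(confidences, outcomes, n_bins=10):
    """Return per-bucket (avg_confidence, hit_rate, weight) points and ECE."""
    conf = np.asarray(confidences, dtype=float)
    hits = np.asarray(outcomes, dtype=float)
    # Assign each prediction to one of n_bins equal-width confidence buckets.
    bins = np.clip((conf * n_bins).astype(int), 0, n_bins - 1)
    buckets, ece = [], 0.0
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        avg_conf = conf[mask].mean()   # x-coordinate on the diagram
        hit_rate = hits[mask].mean()   # y-coordinate: actual accuracy
        weight = mask.sum() / len(conf)
        # ECE accumulates each bucket's distance from the diagonal,
        # weighted by how many predictions landed in that bucket.
        ece += weight * abs(avg_conf - hit_rate)
        buckets.append((avg_conf, hit_rate, weight))
    return buckets, ece
```

A goal whose points sit on the diagonal yields an ECE of zero; the further the curve drifts, the larger the number.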
What you can do here
- Filter to view org-wide calibration or per-goal calibration.
How to read it
- A calibrated goal hugging the diagonal: trust the routing thresholds. If precision is also high, you can safely loosen Autonomous.
- Confidence overstated (curve below the diagonal): the goal thinks it's confident more often than it should be. Tighten Advisory.
- Confidence understated (curve above the diagonal): you're over-reviewing. Loosen Advisory.
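The three reading rules above reduce to a single check of where the curve sits relative to the diagonal. A sketch in which the function name and the 0.05 tolerance are illustrative assumptions, not part of the product:

```python
def suggest_threshold_action(avg_confidence, hit_rate, tolerance=0.05):
    """Map a point's position relative to the diagonal to threshold advice."""
    gap = hit_rate - avg_confidence
    if abs(gap) <= tolerance:
        return "on the diagonal: current routing thresholds are trustworthy"
    if gap < 0:
        # Curve below the diagonal: stated confidence exceeds actual accuracy.
        return "overconfident: tighten Advisory"
    # Curve above the diagonal: actual accuracy exceeds stated confidence.
    return "underconfident: loosen Advisory"
```

In practice you would apply this per bucket, weighting the busiest buckets most heavily.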
Tip
Don't tune thresholds for a goal still in Collecting or Warming. The numbers are noisy until the gate flips to Calibrated.
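The cold-start gate behind this tip is, at heart, a decision-count cutoff. A sketch under assumed thresholds (the 50 and 500 cutoffs are placeholders; the product's real values are not documented here):

```python
def cold_start_state(n_decisions, warm_at=50, calibrated_at=500):
    """Classify a goal's gate state from its decision count (illustrative cutoffs)."""
    if n_decisions < warm_at:
        return "collecting"   # not enough decisions yet
    if n_decisions < calibrated_at:
        return "warming"      # some signal, but unstable
    return "calibrated"       # metrics are reliable enough to tune against
```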