Calibration measures whether a goal's confidence scores can be trusted. A well-calibrated goal that says "85% confident" should be right about 85% of the time. The calibration page gives admins the data to know where each goal stands.
What you'll see
- Cold-start gate — every goal goes through three states before calibration is meaningful:
  - Collecting — not enough decisions yet.
  - Warming — some signal, but unstable.
  - Calibrated — enough decisions for the metrics to be reliable.
- Reliability diagram — ten confidence buckets plotted against actual hit rate. A perfectly calibrated goal traces the diagonal reference line.
- Expected Calibration Error (ECE) — single-number summary of how far the curve sits from the diagonal. Lower is better.
- Precision / recall table — classic accuracy metrics, broken down by routing zone.
- ECE trend — sparkline over 7d / 14d / 30d so you can see whether the goal is improving or drifting.
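The reliability diagram and ECE above can both be computed from a log of (confidence, outcome) pairs. A minimal sketch, assuming ten equal-width buckets and taking ECE as the bucket-weighted gap between average confidence and hit rate (the standard definition; the product's exact binning may differ):

```python
import numpy as np

def reliability_and_ece(confidences, outcomes, n_bins=10):
    """Return per-bucket (avg_confidence, hit_rate, weight) points and ECE."""
    conf = np.asarray(confidences, dtype=float)
    hits = np.asarray(outcomes, dtype=float)
    # Assign each prediction to one of n_bins equal-width confidence buckets.
    bins = np.clip((conf * n_bins).astype(int), 0, n_bins - 1)
    buckets, ece = [], 0.0
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        avg_conf = conf[mask].mean()   # x-coordinate on the diagram
        hit_rate = hits[mask].mean()   # y-coordinate: actual accuracy
        weight = mask.sum() / len(conf)
        # ECE accumulates each bucket's distance from the diagonal,
        # weighted by how many predictions landed in that bucket.
        ece += weight * abs(avg_conf - hit_rate)
        buckets.append((avg_conf, hit_rate, weight))
    return buckets, ece
```

A goal whose points sit on the diagonal yields an ECE of zero; the further the curve drifts, the larger the number.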
What you can do here
- Filter to view org-wide calibration or per-goal calibration.
How to read it
- A calibrated goal hugging the diagonal: trust the routing thresholds. If precision is also high, you can safely loosen Autonomous.
- Confidence overstated (curve below the diagonal): the goal thinks it's confident more often than it should be. Tighten Advisory.
- Confidence understated (curve above the diagonal): you're over-reviewing. Loosen Advisory.
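The three reading rules above reduce to a single check of where the curve sits relative to the diagonal. A sketch in which the function name and the 0.05 tolerance are illustrative assumptions, not part of the product:

```python
def suggest_threshold_action(avg_confidence, hit_rate, tolerance=0.05):
    """Map a point's position relative to the diagonal to threshold advice."""
    gap = hit_rate - avg_confidence
    if abs(gap) <= tolerance:
        return "on the diagonal: current routing thresholds are trustworthy"
    if gap < 0:
        # Curve below the diagonal: stated confidence exceeds actual accuracy.
        return "overconfident: tighten Advisory"
    # Curve above the diagonal: actual accuracy exceeds stated confidence.
    return "underconfident: loosen Advisory"
```

In practice you would apply this per bucket, weighting the busiest buckets most heavily.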
Tip
Don't tune thresholds for a goal still in Collecting or Warming. The numbers are noisy until the gate flips to Calibrated.
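The cold-start gate behind this tip is, at heart, a decision-count cutoff. A sketch under assumed thresholds (the 50 and 500 cutoffs are placeholders; the product's real values are not documented here):

```python
def cold_start_state(n_decisions, warm_at=50, calibrated_at=500):
    """Classify a goal's gate state from its decision count (illustrative cutoffs)."""
    if n_decisions < warm_at:
        return "collecting"   # not enough decisions yet
    if n_decisions < calibrated_at:
        return "warming"      # some signal, but unstable
    return "calibrated"       # metrics are reliable enough to tune against
```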