Model Calibration: Temperature Scaling, Platt Scaling, and ECE in Practice
A practical guide to model calibration for ML engineers. It covers measuring calibration with expected calibration error (ECE) and reliability diagrams; fixing overconfidence with temperature scaling, fit with L-BFGS on a held-out validation set; Platt scaling for binary classification; non-parametric alternatives such as isotonic regression and histogram binning; calibrating LLMs on multiple-choice benchmarks via log-likelihood scoring; and recalibrating after every fine-tuning step as part of the model release checklist.
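To make the ECE measurement mentioned above concrete, here is a minimal sketch in plain Python. It assumes the common formulation: predictions are grouped into equal-width confidence bins, and ECE is the sample-weighted average of the gap between accuracy and mean confidence in each bin. The function name `expected_calibration_error`, its parameters, and the default of 10 bins are illustrative assumptions, not something this guide prescribes.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE (hypothetical helper, assumed formulation).

    confidences: top-label predicted probability per sample, in [0, 1].
    correct: whether the top-label prediction was right, per sample.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Running totals per bin: sample count, summed confidence, correct count.
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0] * n_bins

    for conf, hit in zip(confidences, correct):
        # Half-open bins [k/n_bins, (k+1)/n_bins); a confidence of exactly
        # 1.0 is folded into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        conf_sums[idx] += conf
        hit_sums[idx] += 1 if hit else 0

    ece = 0.0
    for count, conf_sum, hits in zip(counts, conf_sums, hit_sums):
        if count == 0:
            continue  # empty bins contribute nothing
        mean_conf = conf_sum / count
        accuracy = hits / count
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (count / n) * abs(accuracy - mean_conf)
    return ece
```

A model that is right 75% of the time while reporting 0.95 confidence shows a large gap in its bin; the same per-bin accuracy and mean confidence are what a reliability diagram plots. The bin count is a free choice here, and ECE values computed with different binnings are not directly comparable.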