Label Smoothing: When It Helps and When It Hurts
A practical guide to label smoothing for ML engineers. It covers how soft targets prevent overconfident logits; a PyTorch implementation using `nn.CrossEntropyLoss` plus a manual version for fine-grained control; the three settings where smoothing reliably helps (large-scale classification, sequence-to-sequence models, and small-data fine-tuning); why it actively hurts knowledge distillation; how to choose the smoothing value; and how to measure calibration gains with Expected Calibration Error (ECE).
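To make the soft-target idea concrete before diving in, here is a minimal dependency-free sketch of the arithmetic that `nn.CrossEntropyLoss(label_smoothing=eps)` performs in PyTorch: the one-hot target is mixed with a uniform distribution over all classes, and the loss becomes the cross-entropy against that softened distribution. The function names here are illustrative, not from any library.

```python
import math

def smooth_targets(true_class, num_classes, eps):
    # Mix the one-hot target with a uniform distribution over all classes
    # (this matches PyTorch's label_smoothing convention):
    # smoothed[k] = (1 - eps) * one_hot[k] + eps / num_classes
    return [(1.0 - eps) * (1.0 if k == true_class else 0.0) + eps / num_classes
            for k in range(num_classes)]

def cross_entropy(logits, targets):
    # Numerically stable log-softmax, then the weighted negative log-likelihood.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    return -sum(t * lp for t, lp in zip(targets, log_probs))

# With eps = 0 the target is the usual one-hot vector; with eps > 0 a little
# probability mass moves to the wrong classes, so a very confident (peaked)
# model pays a penalty it would not pay under hard targets.
hard = smooth_targets(0, 4, 0.0)   # [1.0, 0.0, 0.0, 0.0]
soft = smooth_targets(0, 4, 0.1)   # [0.925, 0.025, 0.025, 0.025]
```

Because the smoothed loss keeps a floor of mass on the non-target classes, driving the true-class logit toward infinity no longer drives the loss toward zero, which is exactly the overconfidence brake the guide describes.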