Multi-Task Learning: Hard Parameter Sharing, Soft Sharing, and When It Beats Single-Task Models
A practical guide to multi-task learning for ML engineers: hard parameter sharing with task-specific heads, soft parameter sharing via regularisation across task-specific encoders, gradient cosine similarity for detecting negative transfer, homoscedastic uncertainty loss weighting, task sampling strategies, and an honest assessment of when multi-task training beats separate single-task baselines, and when it does not.
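As a taste of one diagnostic covered in this guide, gradient cosine similarity compares the per-task gradients of the shared parameters: a strongly negative cosine means the tasks are pulling the shared weights in opposing directions, a common symptom of negative transfer. A minimal sketch in plain Python (the function name and toy gradients are illustrative, not from any library):

```python
import math

def grad_cosine(g_a, g_b):
    """Cosine similarity between two flattened task-gradient vectors."""
    dot = sum(a * b for a, b in zip(g_a, g_b))
    norm_a = math.sqrt(sum(a * a for a in g_a))
    norm_b = math.sqrt(sum(b * b for b in g_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy per-task gradients w.r.t. the shared parameters.
g_task_a = [1.0, 2.0, -1.0]
g_task_b = [-1.0, -2.0, 1.0]   # exactly opposed to g_task_a

sim = grad_cosine(g_task_a, g_task_b)
# A cosine near -1 suggests the tasks conflict on the shared weights;
# near +1 suggests they reinforce each other (positive transfer).
```

In a real training loop you would flatten each task's gradient over the shared trunk (e.g. after separate backward passes) and track this statistic across steps rather than judging from a single batch.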