Optimizing Parquet Schemas for ML Training Performance
Data loading, not computation, has become the bottleneck in many modern machine learning workflows. While practitioners obsess over model architecture and hyperparameters, they often overlook a more fundamental performance constraint: how quickly training data can be read from disk and fed to GPUs or CPUs. When training models on terabytes of data stored in Parquet files, …