Saving and Loading Sklearn Models the Right Way

Training machine learning models takes time and computational resources. Once you’ve built a model that performs well, the last thing you want is to retrain it from scratch every time you need to make predictions. Model persistence—saving trained models to disk and loading them later—is a fundamental skill in production machine learning. While scikit-learn makes …
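As a taste of what the article covers, here is a minimal sketch of persisting a fitted estimator with joblib, the approach the scikit-learn docs recommend; the file name "model.joblib" and the toy dataset are illustrative choices, not from the article.

```python
# Minimal sketch: save and reload a fitted scikit-learn model with joblib.
# Assumes scikit-learn (and its joblib dependency) is installed.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
import joblib

X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

joblib.dump(model, "model.joblib")      # serialize the fitted estimator
restored = joblib.load("model.joblib")  # deserialize it later, e.g. at serving time

# The restored estimator produces identical predictions.
assert (model.predict(X) == restored.predict(X)).all()
```

One caveat worth knowing: a model saved this way should be loaded with the same scikit-learn version it was saved with, as pickle-based formats are not guaranteed to be compatible across versions.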

Logging Machine Learning Experiments with MLflow

Machine learning development is inherently experimental. You try different algorithms, tweak hyperparameters, preprocess data in various ways, and iterate through dozens or even hundreds of model variations. Without systematic experiment tracking, this process becomes chaotic—you lose track of what worked, can’t reproduce promising results, and waste time re-running experiments you’ve already tried. MLflow provides a …

Fine-Tuning HuggingFace Transformers in Jupyter Notebook

Fine-tuning pre-trained transformer models has become the cornerstone of modern NLP development. While cloud-based platforms and production pipelines have their place, Jupyter Notebook remains the preferred environment for experimentation, rapid prototyping, and iterative model development. The interactive nature of notebooks combined with HuggingFace’s Transformers library creates a powerful combination for adapting state-of-the-art models to your …

Airflow vs Prefect for Machine Learning Pipelines

Building robust machine learning pipelines requires orchestration tools that can handle complex workflows, manage dependencies, and scale with your data science operations. Apache Airflow and Prefect have emerged as two leading contenders in this space, each bringing distinct philosophies and capabilities to ML pipeline orchestration. Understanding their core differences is essential for data science teams …

MLOps Pipeline with Terraform and Kubernetes Step by Step

Building a robust MLOps pipeline with Terraform and Kubernetes has become essential for organizations seeking to deploy, manage, and scale machine learning models in production environments. This approach combines Infrastructure as Code (IaC) principles with container orchestration to create scalable, reproducible, and maintainable ML workflows that can handle enterprise-grade workloads. The …

How to Preprocess Text Data for Sentiment Analysis

Text preprocessing is the invisible foundation upon which successful sentiment analysis models are built. Raw text data—whether from social media posts, customer reviews, or survey responses—arrives chaotic and inconsistent. Typos, slang, punctuation variations, and irregular capitalization create noise that can confuse machine learning models and degrade performance. The difference between a sentiment classifier achieving 75% …
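To give a flavor of the cleanup involved, here is a stdlib-only sketch of three common steps: lowercasing, URL and punctuation stripping, and whitespace tokenization. Real pipelines often add stop-word removal, lemmatization, or emoji handling; this function is an illustrative assumption, not the article's pipeline.

```python
# Minimal text-cleaning sketch for sentiment analysis, standard library only.
import re

def preprocess(text: str) -> list:
    text = text.lower()                        # normalize capitalization
    text = re.sub(r"https?://\S+", " ", text)  # strip URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # strip punctuation and digits
    return text.split()                        # whitespace tokenization

print(preprocess("GREAT product!!! See https://example.com :)"))
# → ['great', 'product', 'see']
```

Note the order matters: stripping URLs before punctuation keeps the URL regex simple, since `https://` still contains its slashes when matched.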

PyTorch vs TensorFlow for Beginners

Choosing your first deep learning framework is one of the most consequential decisions you’ll make as a machine learning beginner. The framework you learn shapes how you think about neural networks, influences what resources and communities you can access, and determines how quickly you can move from tutorials to real projects. PyTorch and TensorFlow dominate …

Documenting Machine Learning Experiments in Jupyter

Machine learning experimentation is inherently messy. You try different architectures, tweak hyperparameters, preprocess data in various ways, and run countless experiments hoping to find that winning combination. Three months later, when you need to explain why a particular model works or reproduce your best result, you’re left staring at cryptic filenames and uncommented code blocks, …
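Even without a tracking server, a little structure goes a long way. A minimal sketch of a notebook-friendly logging helper using only the standard library; the field names and the `experiments.jsonl` file are illustrative assumptions.

```python
# Minimal sketch: append one JSON record per experiment run, stdlib only.
import datetime
import json

def log_experiment(name, params, metrics, log_file="experiments.jsonl"):
    record = {
        "name": name,
        "timestamp": datetime.datetime.now().isoformat(),
        "params": params,
        "metrics": metrics,
    }
    # One JSON object per line (JSON Lines) keeps runs easy to grep and diff.
    with open(log_file, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

log_experiment("baseline-logreg", {"C": 1.0}, {"accuracy": 0.87})
```

Three months later, `grep baseline experiments.jsonl` answers the "what did I run?" question that cryptic filenames cannot.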

Encoding Categorical Variables for Deep Learning

Deep learning models excel at processing numerical data, but real-world datasets often contain categorical variables that require special handling. Understanding how to properly encode categorical variables for deep learning is crucial for building effective neural networks that can leverage all available information in your dataset. Categorical variables represent discrete categories or groups rather than continuous …
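For neural networks, the usual first step is mapping each category to an integer index, which an embedding layer then turns into a learned vector. A plain-Python sketch of that vocabulary step, with index 0 reserved for unseen categories; the data and the `<UNK>` convention are illustrative assumptions.

```python
# Minimal sketch: integer-encode categories for an embedding layer.
def build_vocab(values):
    # Map each distinct category to a stable integer; 0 is reserved for unknowns.
    vocab = {"<UNK>": 0}
    for v in values:
        if v not in vocab:
            vocab[v] = len(vocab)
    return vocab

cities = ["paris", "tokyo", "paris", "lima"]
vocab = build_vocab(cities)

# Encode training data plus an unseen category at inference time.
encoded = [vocab.get(c, 0) for c in cities + ["oslo"]]
print(encoded)  # → [1, 2, 1, 3, 0]  ("oslo" was never seen, so it maps to 0)
```

These integers are exactly what an `nn.Embedding`-style layer consumes, with the vocabulary size (here 4) setting the embedding table's number of rows.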

Encoding Categorical Variables for Machine Learning

Machine learning algorithms speak the language of numbers. Whether you’re training a neural network, fitting a decision tree, or building a linear regression model, your algorithm expects numerical inputs it can process mathematically. But real-world data rarely arrives in such a convenient format. Customer segments, product categories, geographical regions, and survey responses all come as …