Peter Song, Author at ML Journey

Fine-Tuning HuggingFace Transformers in Jupyter Notebook

October 12, 2025 by Peter Song

Fine-tuning pre-trained transformer models has become a cornerstone of modern NLP development. While cloud-based platforms and production pipelines have their place, Jupyter Notebook remains a preferred environment for experimentation, rapid prototyping, and iterative model development. The interactive nature of notebooks, paired with HuggingFace’s Transformers library, makes a powerful setup for adapting state-of-the-art models to your …

Airflow vs Prefect for Machine Learning Pipelines

October 12, 2025 by Peter Song

Building robust machine learning pipelines requires orchestration tools that can handle complex workflows, manage dependencies, and scale with your data science operations. Apache Airflow and Prefect have emerged as two leading contenders in this space, each bringing distinct philosophies and capabilities to ML pipeline orchestration. Understanding their core differences is essential for data science teams …

MLOps Pipeline with Terraform and Kubernetes Step by Step

October 11, 2025 by Peter Song

Building a robust MLOps pipeline with Terraform and Kubernetes has become essential for organizations seeking to deploy, manage, and scale machine learning models in production environments. This approach combines Infrastructure as Code (IaC) principles with container orchestration to create scalable, reproducible, and maintainable ML workflows that can handle enterprise-grade workloads. The …

How to Preprocess Text Data for Sentiment Analysis

October 11, 2025 by Peter Song

Text preprocessing is the invisible foundation upon which successful sentiment analysis models are built. Raw text data, whether from social media posts, customer reviews, or survey responses, arrives chaotic and inconsistent. Typos, slang, punctuation variations, and irregular capitalization create noise that can confuse machine learning models and degrade performance. The difference between a sentiment classifier achieving 75% …

PyTorch vs TensorFlow for Beginners

October 11, 2025 by Peter Song

Choosing your first deep learning framework is one of the most consequential decisions you’ll make as a machine learning beginner. The framework you learn shapes how you think about neural networks, influences what resources and communities you can access, and determines how quickly you can move from tutorials to real projects. PyTorch and TensorFlow dominate …

Documenting Machine Learning Experiments in Jupyter

October 11, 2025 by Peter Song

Machine learning experimentation is inherently messy. You try different architectures, tweak hyperparameters, preprocess data in various ways, and run countless experiments hoping to find that winning combination. Three months later, when you need to explain why a particular model works or reproduce your best result, you’re left staring at cryptic filenames and uncommented code blocks, …

Encoding Categorical Variables for Deep Learning

October 10, 2025 by Peter Song

Deep learning models excel at processing numerical data, but real-world datasets often contain categorical variables that require special handling. Understanding how to properly encode categorical variables for deep learning is crucial for building effective neural networks that can leverage all available information in your dataset. Categorical variables represent discrete categories or groups rather than continuous …

Encoding Categorical Variables for Machine Learning

Updated October 11, 2025; published October 10, 2025, by Peter Song

Machine learning algorithms speak the language of numbers. Whether you’re training a neural network, fitting a decision tree, or building a linear regression model, your algorithm expects numerical inputs it can process mathematically. But real-world data rarely arrives in such a convenient format. Customer segments, product categories, geographical regions, and survey responses all come as …

Grid Search vs Random Search vs Bayesian Optimization

October 10, 2025 by Peter Song

Machine learning models are only as good as their hyperparameters. Whether you’re building a neural network, training a gradient boosting model, or fine-tuning a support vector machine, selecting the right hyperparameters can mean the difference between a mediocre model and one that achieves state-of-the-art performance. Three primary strategies dominate the hyperparameter optimization landscape: grid search, …

How Do I Interpret a Classification Model?

October 10, 2025 by Peter Song

Building a classification model is only half the battle. Understanding how it makes decisions, why it succeeds or fails, and communicating its behavior to stakeholders requires mastering model interpretation. A model that achieves 95% accuracy might seem impressive until you discover it predicts the majority class for everything, or that its errors cluster in critical business …