Feature Selection vs Dimensionality Reduction

In machine learning and data science, the curse of dimensionality poses a significant challenge. As datasets grow not just in volume but in the number of features, models become computationally expensive, prone to overfitting, and difficult to interpret. Two powerful approaches address this challenge: feature selection and dimensionality reduction. While both aim to reduce the …
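
A minimal sketch of the contrast, assuming scikit-learn's SelectKBest and PCA as representative tools: selection keeps a subset of the original columns, while reduction projects onto new, combined axes.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

# Feature selection: keep 2 of the original features, scored by ANOVA F-value.
# The retained columns are still interpretable as the original measurements.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

# Dimensionality reduction: project onto 2 new axes that are linear
# combinations of all original features; interpretability is traded away.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_selected.shape, X_reduced.shape)  # (150, 2) (150, 2)
```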

How Does PyTorch Handle Regression Losses?

Regression problems form the backbone of countless machine learning applications, from predicting house prices to forecasting stock values and estimating continuous variables in scientific research. Unlike classification tasks that predict discrete categories, regression models predict continuous numerical values, requiring specialized loss functions that measure the discrepancy between predicted and actual values. PyTorch, one of the …
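
As a quick illustration, the three regression losses PyTorch ships in torch.nn, applied to the same toy predictions:

```python
import torch
import torch.nn as nn

preds = torch.tensor([2.5, 0.0, 2.1])
targets = torch.tensor([3.0, -0.5, 2.0])

# Mean squared error: penalizes large errors quadratically.
mse = nn.MSELoss()(preds, targets)

# Mean absolute error (L1): more robust to outliers.
mae = nn.L1Loss()(preds, targets)

# Smooth L1 (Huber-style): quadratic near zero, linear for large errors.
huber = nn.SmoothL1Loss()(preds, targets)

print(mse.item(), mae.item(), huber.item())
```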

Monitoring Embeddings Drift in Production LLM Pipelines

In the rapidly evolving landscape of machine learning operations, monitoring embeddings drift in production LLM pipelines has become a critical concern for organizations deploying large language models at scale. As these systems process millions of queries daily, the quality and consistency of embeddings can significantly impact downstream applications, from semantic search to recommendation systems and …
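
One simple drift signal, sketched here as an assumption rather than the article's method, is the cosine distance between the centroid of a reference embedding window and a production window; real pipelines typically layer richer distributional tests on top.

```python
import numpy as np

def mean_embedding_drift(reference: np.ndarray, production: np.ndarray) -> float:
    """Cosine distance between the centroids of two embedding batches.

    A value near 0 suggests the production distribution still resembles
    the reference; values closer to 1 indicate the centroids have diverged.
    """
    ref_mean = reference.mean(axis=0)
    prod_mean = production.mean(axis=0)
    cos_sim = np.dot(ref_mean, prod_mean) / (
        np.linalg.norm(ref_mean) * np.linalg.norm(prod_mean)
    )
    return 1.0 - cos_sim

# Toy batches of 768-dimensional embeddings (shapes are illustrative).
rng = np.random.default_rng(0)
reference = rng.normal(size=(1000, 768))
production = rng.normal(loc=0.05, size=(1000, 768))
print(mean_embedding_drift(reference, production))
```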

Best Jupyter Notebook Extensions for Data Science

Jupyter Notebook has become the de facto standard for data science work, offering an interactive environment that seamlessly blends code, visualizations, and documentation. However, the default Jupyter installation, while powerful, lacks many features that can dramatically improve your productivity and workflow. This is where Jupyter Notebook extensions come in—community-developed add-ons that enhance functionality, streamline repetitive …

How Do I Deploy ML Models in AWS Lambda?

Deploying machine learning models in AWS Lambda has become increasingly popular among data scientists and engineers who want to create scalable, cost-effective inference endpoints. Lambda’s serverless architecture eliminates the need to manage infrastructure while automatically scaling based on demand. However, deploying ML models to Lambda comes with unique challenges around package size limits, cold starts, …
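
A minimal handler sketch illustrating the standard cold-start mitigation: load the model at module scope so warm containers reuse it. The "model.pkl" artifact and the JSON payload shape are illustrative assumptions, not a prescribed layout.

```python
import json
import pickle

# Loading outside the handler means a warm container pays this cost once,
# not on every invocation. "model.pkl" is assumed to be bundled with the
# deployment package or provided via a Lambda layer.
with open("model.pkl", "rb") as f:
    MODEL = pickle.load(f)

def lambda_handler(event, context):
    """Standard AWS Lambda entry point; event carries the request payload."""
    features = json.loads(event["body"])["features"]
    prediction = MODEL.predict([features])  # assumes a scikit-learn-style model
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": prediction.tolist()}),
    }
```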

Dealing With Missing Data in Real-World ML Projects

Missing data is the silent saboteur of machine learning projects. While academic datasets come pristine and complete, real-world data is messy—filled with gaps, nulls, and inconsistencies that can derail even the most sophisticated models. I’ve seen projects fail not because of poor algorithm choices or insufficient computing power, but because missing data was handled carelessly …
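
A small sketch of the careful-handling baseline, assuming pandas and scikit-learn's SimpleImputer: inspect the gaps first, then impute deliberately rather than dropping rows by reflex.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "age": [25, np.nan, 47, 31],
    "income": [52000, 64000, np.nan, 58000],
})

# Inspect the damage before choosing a strategy.
print(df.isna().sum())

# Median imputation is a common, outlier-resistant default for numeric
# columns; whether it is appropriate depends on why the data is missing.
imputer = SimpleImputer(strategy="median")
df_imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(df_imputed)
```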

How to Normalize a Vector in Python

Vector normalization is a fundamental operation in data science, machine learning, and scientific computing. Whether you’re preparing data for a neural network, calculating cosine similarity, or working with directional data, understanding how to normalize vectors in Python is essential. In this comprehensive guide, we’ll explore multiple approaches to vector normalization, from basic implementations to optimized …
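
The core operation in one NumPy snippet: divide the vector by its Euclidean norm to get a unit-length vector, guarding against the zero vector.

```python
import numpy as np

v = np.array([3.0, 4.0])

# L2 (unit-length) normalization: divide by the Euclidean norm.
norm = np.linalg.norm(v)
unit_v = v / norm if norm > 0 else v  # the zero vector has no direction

print(unit_v)                  # [0.6 0.8]
print(np.linalg.norm(unit_v))  # 1.0
```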

Gemini vs Claude for Enterprise AI

The enterprise AI landscape has evolved dramatically in 2025, with two powerhouse models emerging as frontrunners for business applications: Google’s Gemini and Anthropic’s Claude. As organizations increasingly integrate artificial intelligence into their core operations, the choice between these platforms has become critical for enterprise success. This comprehensive analysis examines the key differentiators, strengths, and practical …

How to Reduce Overfitting in Scikit-learn

Overfitting is one of the most common challenges you’ll face when building machine learning models. It occurs when your model learns the training data too well—including its noise and peculiarities—resulting in poor performance on new, unseen data. If you’ve ever built a model that achieves 99% accuracy on training data but barely 60% on test …
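
A hedged sketch of the diagnosis-and-fix loop in scikit-learn: compare train and test scores to spot the gap, then constrain the model (here via L2 regularization) and verify with cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Stronger L2 regularization (smaller C) constrains the weights,
# trading a little training accuracy for better generalization.
model = LogisticRegression(C=0.1, max_iter=1000)
model.fit(X_train, y_train)

# A large gap between these two numbers is the classic overfitting signal.
print("train:", model.score(X_train, y_train))
print("test: ", model.score(X_test, y_test))

# Cross-validation gives a more stable estimate than a single split.
print("cv:   ", cross_val_score(model, X_train, y_train, cv=5).mean())
```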

How to Normalize vs Standardize Data in Scikit-Learn

Data scaling is one of those preprocessing steps that can make or break your machine learning model, yet it’s often treated as an afterthought. The terms “normalization” and “standardization” are frequently used interchangeably, but they’re fundamentally different transformations that serve different purposes. Understanding when to use each technique—and how to implement them correctly in scikit-learn—is … 
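
The difference in two lines of scikit-learn, shown on a column with an outlier to make the contrast visible:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [5.0], [10.0], [100.0]])

# Normalization (min-max): rescales to a fixed [0, 1] range; sensitive
# to outliers because the min and max set the bounds.
print(MinMaxScaler().fit_transform(X).ravel())

# Standardization (z-score): centers to mean 0 and unit variance;
# preserves the shape of the distribution rather than its bounds.
print(StandardScaler().fit_transform(X).ravel())
```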