ML Journey

Batch Normalization vs Internal Covariate Shift

December 19, 2025 by Peter Song

When batch normalization was introduced in 2015 by Sergey Ioffe and Christian Szegedy, it revolutionized deep learning training. The paper claimed that batch normalization’s success stemmed from reducing “internal covariate shift,” a phenomenon where the distribution of layer inputs changes during training, forcing each layer to continuously adapt. This explanation became widely accepted in the deep …

What Can Cursor AI Do For You?

December 19, 2025 by Peter Song

The landscape of software development has undergone a dramatic transformation with the emergence of AI-powered coding assistants, and Cursor AI stands at the forefront of this revolution. As developers worldwide grapple with increasingly complex codebases, tight deadlines, and the constant pressure to deliver high-quality software, Cursor AI has emerged as a powerful ally that fundamentally …

How to Use Cursor AI for Python Machine Learning

December 19, 2025 by Peter Song

Cursor AI represents a paradigm shift in how developers write code, transforming the traditional IDE into an AI-powered development environment where natural language instructions generate complete code blocks, intelligent autocomplete predicts entire functions, and contextual understanding spans your entire project codebase. For Python machine learning practitioners, this translates into dramatically accelerated development workflows where you …

Different Types of Vector Database

December 19, 2025 by Peter Song

The vector database landscape has exploded in recent years, driven by the AI revolution and the need to handle high-dimensional embeddings at scale. While all vector databases solve the fundamental problem of similarity search, they differ dramatically in architecture, capabilities, and ideal use cases. Understanding these differences is critical for selecting the right technology for …

Early Stopping Strategies Based on Validation Curvature

December 19, 2025 by Peter Song

Training neural networks and iterative machine learning models involves a fundamental tension: models improve with more training iterations until they don’t, crossing an invisible threshold where continued training degrades generalization despite improving training performance. Early stopping, which halts training before this degradation occurs, represents one of the most effective and widely used regularization techniques, yet the standard patience-based …

Pruning Techniques for Decision Trees to Avoid Overfitting

December 18, 2025 by Peter Song

Decision trees possess a deceptive simplicity that masks a fundamental weakness: their natural inclination toward overfitting. Left unchecked, a decision tree will grow until it perfectly memorizes every training example, creating a leaf node for each observation and achieving 100% training accuracy while generalizing poorly to new data. This overfitting manifests as excessively complex trees …

When to Use Vector Database

December 17, 2025 by Peter Song

Vector databases have emerged as essential infrastructure for modern AI applications, but understanding when they’re the right choice requires moving beyond the hype. While traditional databases excel at exact matches and structured queries, vector databases solve a fundamentally different problem: finding similarity in high-dimensional spaces. This comprehensive guide explores the specific scenarios where vector databases …

What is RAG and Generative AI?

December 17, 2025 by Peter Song

Generative AI represents a paradigm shift in artificial intelligence where models create new content (text, images, code, or audio) rather than simply classifying or predicting from existing data. Large language models like GPT-4 and Claude exemplify this capability through their ability to generate human-like text, answer questions, and engage in complex reasoning. Yet these powerful models …

K-Means vs K-Nearest Neighbor: Two Fundamentally Different Algorithms

December 17, 2025 by Peter Song

Despite their confusingly similar names and shared use of the letter “k,” k-means and k-nearest neighbor (KNN) represent fundamentally different machine learning paradigms that solve completely different problems through entirely distinct mechanisms. K-means is an unsupervised clustering algorithm that discovers natural groupings in unlabeled data by iteratively assigning points to cluster centers and updating those …

What Does K Mean in Clustering?

December 17, 2025 by Peter Song

The letter “k” appears constantly in clustering discussions, from algorithm names like k-means to evaluation metrics and parameter tuning guidance. For newcomers to machine learning and data science, this ubiquitous letter can seem mysterious: a variable that everyone uses but few explain clearly. Yet understanding what k represents and why it matters is fundamental to effectively …