Evaluating LLM Performance with Perplexity and ROUGE Scores

Large language models have transformed natural language processing, but their impressive capabilities mean nothing without robust evaluation methods that quantify performance objectively and comparably across models. While human evaluation remains the gold standard for assessing output quality, subjective assessments don’t scale to the thousands of model variants, hyperparameter configurations, and training checkpoints that modern LLM … Read more
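As a quick illustration of the first metric in the title: perplexity is the exponentiated average negative log-likelihood a model assigns to held-out tokens. A minimal sketch, using made-up per-token probabilities rather than output from any real model:

```python
import math

# Toy illustration of perplexity: exponentiate the average negative
# log-likelihood the model assigns to each token in a held-out sequence.
# These probabilities are invented for demonstration only.
token_probs = [0.25, 0.10, 0.50, 0.05]  # P(token_i | context) from some model

avg_neg_log_likelihood = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(avg_neg_log_likelihood)

print(round(perplexity, 2))  # → 6.32 (lower is better; 1.0 = perfect prediction)
```

A perplexity of 6.32 can be read as the model being, on average, as uncertain as if it were choosing uniformly among about six tokens at each step.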

Exploring Correlation vs Causation in Real-World Datasets

The distinction between correlation and causation represents one of the most critical—yet frequently misunderstood—concepts in data analysis, with real-world consequences ranging from misguided business decisions to harmful public policies. When ice cream sales and drowning deaths both increase during summer months, the correlation is undeniable, yet no one seriously argues that ice cream causes drowning. … Read more
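The ice cream example above can be reproduced in a few lines: simulate a confounder (temperature) that drives two otherwise unrelated variables, then observe the strong correlation. The coefficients and noise levels below are arbitrary, chosen only to make the effect visible:

```python
import random

random.seed(0)
# Hot weather (the confounder) drives both ice cream sales and swimming,
# and thus drownings. Neither variable causes the other.
temps = [random.uniform(10, 35) for _ in range(200)]
ice_cream = [t * 2.0 + random.gauss(0, 3.0) for t in temps]   # sales rise with heat
drownings = [t * 0.1 + random.gauss(0, 0.5) for t in temps]   # so do drownings

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(round(pearson(ice_cream, drownings), 2))  # strongly positive, yet no causal link
```

Conditioning on the confounder (e.g. comparing sales and drownings within a narrow temperature band) makes the apparent relationship largely disappear, which is the core intuition behind controlling for variables.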

End-to-End Streaming Architecture with Kinesis and Glue

Modern applications generate continuous streams of data—clickstream events from websites, IoT sensor readings, transaction logs, application metrics, and real-time user interactions—that demand immediate processing and analysis to extract timely insights. Building robust streaming architectures that ingest, transform, and analyze this data at scale while maintaining reliability and cost-efficiency presents significant engineering challenges that Amazon Web … Read more

What is “Large” in a Large Language Model?

The term “Large Language Model” has become ubiquitous in discussions about artificial intelligence, yet the meaning of “large” remains surprisingly unclear to many. Is it about physical size? Computational power? The amount of text processed? Understanding what makes these models “large” matters not just for technical comprehension but for grasping their capabilities, limitations, costs, and … Read more

How to Clean Messy Data Without Losing Your Sanity

Data cleaning—the process of detecting and correcting corrupt, inaccurate, or inconsistent records from datasets—consumes up to 80% of data scientists’ time according to industry surveys, yet receives far less attention than modeling techniques or algorithms. The frustration of encountering dates formatted three different ways in the same column, names with random capitalization and special characters, … Read more

How to Detect Bias in Large Language Models

Large language models have become integral to applications ranging from hiring tools and customer service to content generation and decision support systems, making the detection of bias within these models not just an academic concern but a critical operational requirement. Bias in LLMs—systematic unfairness or prejudice reflected in model outputs—can perpetuate discrimination, reinforce stereotypes, and … Read more

What is Change Data Capture in Data Engineering

In the world of data engineering, keeping data synchronized across multiple systems is one of the most challenging tasks organizations face. As businesses grow and their data infrastructure becomes more complex, the need to track and propagate changes efficiently becomes critical. This is where Change Data Capture (CDC) emerges as a fundamental technique that has … Read more

Large Language Models vs Generative AI

The terms “Large Language Model” and “Generative AI” dominate contemporary technology discussions, often used interchangeably despite representing fundamentally different concepts. This conflation obscures important distinctions that matter for understanding capabilities, limitations, and appropriate applications of these technologies. Generative AI represents a broad category of artificial intelligence systems capable of creating new content—text, images, music, video, … Read more

Large Language Models vs NLP

The terms “Large Language Model” and “Natural Language Processing” are often used interchangeably in casual conversation, creating confusion about their actual relationship. This conflation obscures important distinctions that matter for understanding both the capabilities and limitations of modern language technologies. Natural Language Processing represents a broad field of study focused on enabling computers to understand, … Read more

Top 5 Large Language Models

The landscape of large language models has evolved dramatically, with several sophisticated models now competing for dominance across different use cases, performance benchmarks, and accessibility options. Choosing the right LLM for your needs requires understanding not just raw capabilities but also practical considerations like cost, availability, specialized strengths, and integration complexity. The top models excel … Read more