Peter Song, Author at ML Journey

Snowflake vs Redshift: Comprehensive Comparison for Cloud Data Warehousing

November 21, 2025 by Peter Song

Choosing the right cloud data warehouse can make or break your organization’s analytics strategy. Two platforms dominate this space: Snowflake and Amazon Redshift. Both promise scalability, performance, and the ability to handle massive datasets, yet they take fundamentally different approaches to architecture, pricing, and operations. Understanding these differences is critical for making an informed decision …

What Are Unigrams and Bigrams?

November 21, 2025 by Peter Song

In the world of natural language processing and text analysis, understanding how words relate to each other is fundamental. Whether you’re building a search engine, analyzing sentiment in customer reviews, or developing a language model, you need ways to break down and analyze text systematically. This is where unigrams and bigrams—collectively part of a concept …

Is PyTorch Good for Deep Learning?

November 21, 2025 by Peter Song

Deep learning has transformed the technology landscape, powering everything from voice assistants to autonomous vehicles. At the heart of this revolution are frameworks that make building and training neural networks accessible to researchers and developers. Among these tools, PyTorch has emerged as one of the most popular choices. But is PyTorch truly good for deep …

Partitioning Strategies in Data Lakes: When and Why They Matter

November 21, 2025 by Peter Song

Data lakes have become the backbone of modern data architectures, storing petabytes of raw, semi-structured, and structured data in their native formats. Yet as these repositories grow exponentially, a critical challenge emerges: how do you efficiently query and analyze massive datasets without scanning through terabytes of irrelevant information? This is where partitioning strategies become not …

What is Responsible AI & Trustworthy AI?

November 21, 2025 by Peter Song

Artificial intelligence has become deeply woven into the fabric of our daily lives, from the recommendations we receive on streaming platforms to the medical diagnoses that inform our healthcare decisions. Yet as AI systems grow more powerful and pervasive, a critical question emerges: how do we ensure these technologies serve humanity’s best interests while minimizing …

Jupyter Notebook Shortcuts Every Data Engineer Should Know

November 21, 2025 by Peter Song

Data engineers spend countless hours in Jupyter Notebook—exploring data structures, prototyping ETL pipelines, debugging transformations, and documenting workflows. Yet most operate far below their potential efficiency, repeatedly reaching for the mouse to perform actions that could be accomplished with simple keystrokes. Mastering Jupyter shortcuts isn’t about memorizing obscure commands; it’s about internalizing the patterns that …

Online vs Offline Feature Drift: Silent Killer of ML Model Performance

November 20, 2025 by Peter Song

Machine learning models often fail in production not because they were poorly trained, but because the world they operate in changes while they remain static. Feature drift—the divergence between training data distributions and production data distributions—manifests differently depending on whether features are computed offline during training or online during inference. Understanding this distinction is critical for …

Exploring AI Models in Jupyter Notebook: From ChatGPT to LangChain

November 18, 2025 by Peter Song

The convergence of interactive computing environments and advanced AI models has opened remarkable possibilities for developers, researchers, and data scientists. Jupyter Notebook, long celebrated for its role in data analysis and scientific computing, has evolved into a powerful playground for experimenting with cutting-edge language models. Whether you’re building conversational AI applications, prototyping RAG systems, or …

The Future of MCP in OpenAI Ecosystems

November 17, 2025 by Peter Song

In March 2025, OpenAI officially adopted the Model Context Protocol (MCP), integrating the standard across its products including the ChatGPT desktop app, OpenAI’s Agents SDK, and the Responses API. This decision marks a watershed moment in the artificial intelligence industry—the world’s leading AI company embracing an open standard created by its primary competitor, Anthropic. The …

Responsible AI Practices for LLM Projects

November 16, 2025 by Peter Song

Large language models have transitioned from research curiosities to production systems affecting millions of users across applications ranging from customer service chatbots to code generation tools to medical information systems. This rapid deployment creates an urgent responsibility for practitioners to implement safeguards that prevent harm while maximizing benefits, yet many teams lack concrete frameworks for operationalizing ethical …