Best Python Libraries for Machine Learning

Python has become the de facto language for machine learning, and for good reason. Its clean syntax, extensive ecosystem, and powerful libraries make it the top choice for data scientists, ML engineers, and researchers worldwide. Whether you’re building your first classification model or deploying sophisticated deep learning systems at scale, Python’s ML libraries provide the …

How to Fix Jupyter Notebook Kernel Errors

Few things frustrate data scientists and developers more than settling in for a productive coding session only to encounter the dreaded “Kernel Error” message in Jupyter Notebook. Your notebook won’t execute cells, or worse, it crashes mid-analysis after you’ve been working for hours. The kernel—the computational engine that executes your code—has failed, and your workflow …

CPU vs GPU vs TPU: When Each Actually Makes Sense

The machine learning hardware landscape offers three major options: CPUs, GPUs, and TPUs. Marketing materials suggest each is revolutionary, benchmarks show all three crushing specific workloads, and confused developers end up choosing hardware based on what’s available rather than what’s optimal. A startup spends $50,000 on TPUs for a model that would run faster on …

How Agents Decide What Tool to Call

The promise of AI agents is autonomy—systems that reason about tasks, select appropriate tools, and execute multi-step workflows without constant human guidance. But watch an agent in action and you’ll often see baffling tool selection: calling a web search when a calculator would do, invoking database queries for information already present in the recent conversation, or repeatedly choosing …

Managing Python Dependencies for ML Projects

Machine learning projects fail more often from dependency conflicts than from model performance issues. A colleague’s training script crashes with cryptic NumPy errors. Your production deployment breaks because PyTorch installed a different CUDA version. A model that worked perfectly last month refuses to train after updating a single package. These scenarios plague ML teams daily …

Setting Up a Reproducible ML Dev Environment

“It works on my machine” is the death knell of collaborative machine learning projects. A model that trains perfectly on your laptop fails mysteriously on a colleague’s workstation. Results you achieved last month become impossible to replicate this week. Production deployment requires weeks of debugging environment differences. These scenarios repeat endlessly in ML teams lacking …

How to Evaluate Agentic AI Systems in Production

The landscape of artificial intelligence has evolved dramatically from simple prediction models to sophisticated agentic systems that can perceive their environment, make decisions, and take actions autonomously. Unlike traditional AI systems that merely respond to inputs, agentic AI actively pursues goals, adapts to changing conditions, and operates with varying degrees of independence. As organizations increasingly …

Why Stateless Agents Don’t Work

The appeal of stateless agent architectures is undeniable: no state management complexity, no memory overhead, no synchronization issues, perfect horizontal scaling. Each request arrives, the agent reasons, executes actions, returns results, and forgets everything. This simplicity seduces developers building AI agent systems, particularly those experienced with stateless web services where this pattern succeeds brilliantly. Yet …

Designing Local LLM Systems for Long-Running Tasks

Local LLM applications face unique challenges when tasks extend beyond simple queries and responses. Analyzing hundreds of documents, generating comprehensive reports, processing entire codebases, or conducting multi-hour research requires architectures fundamentally different from chat interfaces. These long-running tasks introduce concerns about reliability, progress tracking, resource management, and graceful failure handling that quick queries never encounter. …

How Local LLM Apps Handle Concurrency and Scaling

Running large language models locally creates unique challenges that cloud-based APIs abstract away. When you call OpenAI’s API, their infrastructure handles thousands of concurrent requests across distributed servers. But when you’re running Llama or Mistral on your own hardware, every concurrent user competes for the same GPU, the same memory, and the same processing power. …