How to Build a Semantic Search Engine with Vector Databases

Traditional keyword-based search engines often fall short when users search for concepts rather than exact terms. If someone searches for “canine companions” in a pet database, they might miss results about “dogs” entirely. This is where semantic search engines powered by vector databases revolutionize information retrieval by understanding meaning rather than just matching words. Semantic …
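The core of semantic search is ranking documents by vector similarity rather than keyword overlap. A minimal sketch, using hand-made toy vectors in place of a real embedding model (the documents, vectors, and the query vector below are invented for illustration):

```python
import numpy as np

# Toy, hand-made "embeddings" standing in for a real embedding model.
docs = {
    "dogs make loyal pets": np.array([0.9, 0.1, 0.0]),
    "cats are independent": np.array([0.1, 0.9, 0.0]),
    "stock prices fell today": np.array([0.0, 0.1, 0.9]),
}

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(query_vec, corpus):
    # Rank every document by cosine similarity to the query vector.
    scored = [(cosine_similarity(query_vec, v), text) for text, v in corpus.items()]
    return sorted(scored, reverse=True)

# A query like "canine companions" would embed near the dog document
# in a real model; here we fake that with a nearby vector.
query = np.array([0.85, 0.15, 0.05])
print(semantic_search(query, docs)[0][1])  # → "dogs make loyal pets"
```

A vector database adds approximate nearest-neighbor indexing on top of this idea so the ranking scales past brute-force comparison.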

How to Optimize Pandas Performance on Large Datasets

Working with large datasets in pandas can quickly become a performance bottleneck if not handled properly. As data volumes continue to grow, the difference between optimized and unoptimized pandas code can mean the difference between analysis that completes in minutes versus hours. This comprehensive guide explores proven strategies to dramatically improve pandas performance when dealing …
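Three of the most common optimizations can be sketched directly; the column names and synthetic data below are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "city": rng.choice(["NY", "LA", "SF"], size=n),
    "sales": rng.random(n) * 100,
})

# 1. Categorical dtype shrinks low-cardinality string columns dramatically.
before = df["city"].memory_usage(deep=True)
df["city"] = df["city"].astype("category")
after = df["city"].memory_usage(deep=True)

# 2. Downcast numerics when full 64-bit precision is not needed.
df["sales"] = pd.to_numeric(df["sales"], downcast="float")

# 3. Prefer vectorized expressions over row-wise .apply() loops.
df["sales_with_tax"] = df["sales"] * 1.08  # no Python-level loop

print(f"city column: {before} -> {after} bytes")
```

Chunked reading (`pd.read_csv(..., chunksize=...)`) and columnar file formats are the usual next steps once in-memory tuning is exhausted.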

How Does LoRA Work in LLMs?

The democratization of large language models faces a significant challenge: fine-tuning these massive neural networks requires enormous computational resources and memory that most organizations and individual researchers simply don’t have access to. Enter LoRA (Low-Rank Adaptation), an elegant solution that has revolutionized how we adapt pre-trained language models for specific tasks. This technique allows you …
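The core idea fits in a few lines of NumPy: freeze the pretrained weight W and learn only a low-rank update B·A, scaled by alpha/r. A minimal sketch, with toy dimensions rather than any real model's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 4, 8  # illustrative sizes only

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, low-rank
B = np.zeros((d_out, r))                    # zero-init, so the update starts at 0

def lora_forward(x):
    # Original path plus the low-rank update, scaled by alpha / r.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B = 0, the adapted model matches the frozen model exactly.
print(np.allclose(lora_forward(x), W @ x))  # True

# Trainable parameters: r*(d_in + d_out) vs d_in*d_out for full fine-tuning.
print(r * (d_in + d_out), "vs", d_in * d_out)
```

Only A and B receive gradients during fine-tuning, which is where the memory savings come from; at inference time B·A can be merged back into W so no extra latency is paid.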

How to Handle Long Context Windows in LLMs

Large Language Models have evolved dramatically over the past few years, with one of the most significant advancements being the expansion of context windows. Modern LLMs can now process tens of thousands or even hundreds of thousands of tokens in a single conversation, opening up unprecedented possibilities for complex tasks. However, with great power comes …
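When input still exceeds the window, the standard workaround is chunking with overlap, so each piece fits the limit while preserving continuity at the seams. A minimal sketch with toy window sizes:

```python
def chunk_tokens(tokens, max_len=8, overlap=2):
    # Split a token sequence into overlapping windows so each chunk fits
    # a model's context limit. (max_len/overlap are tiny for illustration;
    # real values would be thousands of tokens.)
    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
    return chunks

tokens = list(range(20))
for chunk in chunk_tokens(tokens):
    print(chunk)  # three windows, each sharing 2 tokens with its neighbor
```

Each chunk is then processed (summarized, embedded, or answered over) independently, and the overlap keeps sentences from being cut without context on either side.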

Reducing Bias in LLM Training Data

Large language models have become integral to countless applications, from hiring tools and medical diagnostics to content generation and customer service. Yet these powerful systems inherit and often amplify the biases present in their training data, leading to outputs that can perpetuate stereotypes, discrimination, and unfair treatment. A model trained on biased data doesn’t just …
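A first practical step is auditing the data itself before training. A toy sketch that counts how often profession words co-occur with gendered pronouns; the corpus and word lists are invented for illustration:

```python
from collections import Counter

# Tiny invented corpus standing in for real training text.
corpus = [
    "the nurse said she would help",
    "the engineer said he fixed it",
    "the engineer said she fixed it",
    "the nurse said he would help",
    "the nurse said she was tired",
]

def cooccurrence(corpus, targets, attributes):
    # Count sentences where a target word and an attribute word co-occur.
    counts = Counter()
    for sentence in corpus:
        words = set(sentence.split())
        for t in targets:
            for a in attributes:
                if t in words and a in words:
                    counts[(t, a)] += 1
    return counts

counts = cooccurrence(corpus, targets={"nurse", "engineer"},
                      attributes={"he", "she"})
print(counts[("nurse", "she")], counts[("nurse", "he")])  # skew: 2 vs 1
```

Skewed counts like these flag candidate associations for rebalancing or filtering before the data ever reaches the model; real audits use much larger word lists and statistical association measures rather than raw counts.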

Bias-Variance Tradeoff Explained with Real-World Examples

Understanding the bias-variance tradeoff is fundamental to building effective machine learning models. This concept lies at the heart of model selection, helping data scientists navigate the delicate balance between models that are too simple and those that are overly complex. Through real-world examples and practical insights, we’ll explore how this tradeoff impacts your model’s performance …
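The tradeoff is easy to see empirically: fit polynomials of increasing degree to noisy data and compare training error against test error. A sketch on a synthetic task (the sine target and degree choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # Noisy samples of a sine wave: the "true" function we try to learn.
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, n)
    return x, y

x_train, y_train = sample(30)
x_test, y_test = sample(30)

def fit_and_score(degree):
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for degree in (1, 3, 12):
    train_mse, test_mse = fit_and_score(degree)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Degree 1 underfits (high bias: it cannot bend to follow the sine), while degree 12 drives training error toward zero yet starts fitting the noise (high variance); the moderate degree typically lands best on held-out data.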

Automated Testing Strategies for ML Pipelines

Machine learning pipelines are complex systems that require rigorous testing to ensure reliability, accuracy, and performance in production environments. Unlike traditional software applications, ML pipelines introduce unique challenges that demand specialized automated testing strategies. This comprehensive guide explores the essential approaches, tools, and best practices for implementing robust automated testing in your ML workflows. ML …
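A concrete building block is a data-validation stage that runs as an automated check before training. A minimal sketch; the column names and rules are made up for illustration:

```python
import pandas as pd

def validate_features(df: pd.DataFrame) -> list[str]:
    # Return a list of human-readable validation failures (empty = pass).
    errors = []
    required = {"age", "income", "label"}
    missing = required - set(df.columns)
    if missing:
        errors.append(f"missing columns: {sorted(missing)}")
    if "age" in df.columns and ((df["age"] < 0) | (df["age"] > 120)).any():
        errors.append("age out of range [0, 120]")
    if "label" in df.columns and not df["label"].isin([0, 1]).all():
        errors.append("label must be binary")
    return errors

good = pd.DataFrame({"age": [25, 40], "income": [50_000, 80_000], "label": [0, 1]})
bad = pd.DataFrame({"age": [25, -3], "income": [50_000, 80_000], "label": [0, 2]})
print(validate_features(good))  # []
print(validate_features(bad))
```

Wired into CI or a pipeline orchestrator, a check like this fails the run before a silently corrupted dataset reaches training; the same pattern extends to model-quality gates (e.g. a minimum accuracy on a fixed evaluation set).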

Best Use Cases for Gemini AI

Google’s Gemini AI represents a significant leap forward in artificial intelligence technology, offering unprecedented multimodal capabilities that can process text, images, audio, and video simultaneously. As businesses and individuals seek to leverage this powerful tool, understanding its most effective applications becomes crucial for maximizing productivity and innovation. This comprehensive guide explores the most impactful use …

How to Load Balance Across Different LLM APIs

As organizations scale their AI applications, relying on a single LLM API provider becomes a significant liability. Rate limits constrain growth, outages halt operations, and vendor lock-in limits flexibility. Load balancing across multiple LLM APIs—distributing requests among providers like OpenAI, Anthropic, Google, and others—solves these problems while enabling cost optimization, improved reliability, and performance gains. …
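A minimal round-robin balancer with failover can be sketched in pure Python. The provider functions below are stubs standing in for real SDK calls, with one deliberately failing to simulate a rate-limited provider:

```python
import itertools

# Stub providers; real ones would call each vendor's API client.
def call_openai(prompt): raise RuntimeError("rate limited")  # simulated outage
def call_anthropic(prompt): return f"anthropic: {prompt}"
def call_google(prompt): return f"google: {prompt}"

class LLMBalancer:
    def __init__(self, providers):
        self._cycle = itertools.cycle(providers)  # round-robin rotation
        self._n = len(providers)

    def complete(self, prompt):
        # Try each provider at most once per request before giving up.
        last_error = None
        for _ in range(self._n):
            provider = next(self._cycle)
            try:
                return provider(prompt)
            except Exception as e:
                last_error = e  # fall through to the next provider
        raise RuntimeError("all providers failed") from last_error

balancer = LLMBalancer([call_openai, call_anthropic, call_google])
print(balancer.complete("hello"))  # fails over to a healthy provider
```

Production versions layer on weighted routing (by cost or latency), health checks that temporarily eject failing providers, and per-provider rate-limit tracking, but the rotate-and-fail-over core stays the same.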

How to Optimise Inference Speed in Large Language Models

The deployment of large language models (LLMs) in production environments has become increasingly critical for businesses seeking to leverage AI capabilities. However, one of the most significant challenges organisations face is managing inference speed—the time it takes for a model to generate predictions or responses. Slow inference not only degrades user experience but also increases …
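One widely used lever is batching: serving many queued prompts with a single model call per batch instead of one call per prompt. A toy sketch, where `fake_model_batch` stands in for a real batched forward pass:

```python
def fake_model_batch(prompts):
    # Stand-in for a real model: one batched forward pass would amortize
    # weight loading and GPU kernel launches across all prompts.
    return [p.upper() for p in prompts]

def serve(queue, max_batch=4):
    # Drain the request queue in batches of up to max_batch prompts.
    results = []
    while queue:
        batch, queue = queue[:max_batch], queue[max_batch:]
        results.extend(fake_model_batch(batch))  # one call per batch
    return results

queue = ["hello", "world", "foo", "bar", "baz"]
print(serve(queue))  # five prompts served in two model calls
```

Real serving stacks (continuous batching, KV caching, quantization) build on the same observation: per-request overhead dominates, so amortizing it across requests is the cheapest speedup available.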