embedding Archives - ML Journey

Optimizing Embedding Generation Throughput for Large Document Stores

December 27, 2025 by Peter Song

When you’re sitting on a corpus of 10 million documents and need to generate embeddings for vector search, semantic analysis, or RAG systems, raw throughput becomes your primary concern. A naive implementation processing documents one at a time might take weeks to complete, consuming compute resources inefficiently and delaying your project timeline. Optimizing embedding generation …

Vector Embeddings Explained: How They Power Recommendations and Search

November 21, 2025 by Peter Song

When Netflix suggests a movie you’ll love, when Spotify creates a personalized playlist, or when Google returns exactly the document you needed despite your imprecise query, vector embeddings are quietly working behind the scenes. This technology has become fundamental to modern AI applications, enabling machines to understand meaning rather than just matching keywords. Yet for …

Understanding Tokenization and Embeddings in LLMs

November 13, 2025 by Peter Song

Large language models have transformed how we interact with AI, but their impressive capabilities rest on two fundamental processes that most users never see: tokenization and embeddings. Understanding tokenization and embeddings in LLMs is essential for anyone working with these systems, whether you’re optimizing API costs, debugging unexpected behavior, or building applications that leverage language …

Best Practices for Using Embeddings in Recommender Systems

September 8, 2025 / August 25, 2025 by Peter Song

Recommender systems have evolved dramatically over the past decade, transitioning from simple collaborative filtering approaches to sophisticated deep learning architectures that leverage embeddings to capture complex user-item relationships. Embeddings have become the cornerstone of modern recommendation engines, enabling systems to understand nuanced patterns in user behavior and item characteristics that traditional methods often miss. At …

Visualize Word2Vec Embeddings with t-SNE

September 8, 2025 / August 4, 2025 by Peter Song

Word embeddings have revolutionized how we represent language in machine learning, and Word2Vec stands as one of the most influential techniques in this space. However, understanding these high-dimensional representations can be challenging without proper visualization tools. This is where t-SNE (t-Distributed Stochastic Neighbor Embedding) becomes invaluable, offering a powerful way to visualize Word2Vec embeddings in …

How to Get Word Embeddings from Word2Vec: Step-by-Step Guide

July 4, 2025 / November 13, 2024 by Peter Song

Word embeddings are essential in Natural Language Processing (NLP) for transforming text into a form that machines can understand. Among the various methods for generating word embeddings, Word2Vec is one of the most popular, thanks to its ability to capture semantic relationships between words. Knowing how to obtain and use Word2Vec embeddings is a valuable …

What is Embedding in Machine Learning?

July 4, 2025 / May 11, 2024 by Peter Song

This article aims to provide a comprehensive understanding of embedding in machine learning. It covers the fundamental concepts of embedding, explores different types of embeddings such as categorical and word embeddings, discusses techniques for creating embeddings, and examines their applications across various domains. Furthermore, the article will address the challenges …