mljourney, Author at ML Journey

Label Smoothing: When It Helps and When It Hurts

May 11, 2026 by mljourney

A practical guide to label smoothing for ML engineers: how soft targets prevent logit overconfidence, PyTorch implementation with nn.CrossEntropyLoss and a manual version for fine-grained control, the three settings where smoothing reliably helps (large-scale classification, seq2seq, small-data fine-tuning), why it actively hurts knowledge distillation, choosing smoothing values, and measuring calibration improvement with Expected Calibration Error.
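As a rough illustration of the soft targets mentioned in this excerpt, here is a minimal pure-Python sketch. It assumes the common convention of mixing the one-hot target with a uniform distribution over all K classes; the function names are hypothetical and are not taken from the post.

```python
import math


def smooth_targets(true_class, num_classes, epsilon=0.1):
    """Mix a one-hot target with a uniform distribution over all classes.

    Assumed convention: the true class gets (1 - epsilon) + epsilon / K,
    and every other class gets epsilon / K.
    """
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if k == true_class else uniform
        for k in range(num_classes)
    ]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def smoothed_cross_entropy(logits, true_class, epsilon=0.1):
    """Cross-entropy between the smoothed targets and softmax(logits)."""
    targets = smooth_targets(true_class, len(logits), epsilon)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


# Hypothetical logits: an extremely confident prediction versus a moderate one,
# both correct for class 0.
overconfident = smoothed_cross_entropy([20.0, 0.0, 0.0], true_class=0)
moderate = smoothed_cross_entropy([3.0, 0.0, 0.0], true_class=0)
```

With these example logits the smoothed loss is higher for the extremely confident prediction than for the moderate one, which is the sense in which soft targets discourage very large logit gaps.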

How to Use Ollama with LangChain

May 10, 2026 by mljourney

A complete guide to using Ollama as the LangChain backend: installing langchain-ollama, using OllamaLLM and ChatOllama with system and human messages, building LCEL chains with prompt templates and StrOutputParser, a full RAG pipeline using OllamaEmbeddings with nomic-embed-text and a Chroma vector store, adding conversation memory with InMemoryChatMessageHistory and RunnableWithMessageHistory, and creating a tool-using ReAct agent with LangGraph.

ColBERT and Late Interaction Retrieval: How It Works and When to Use It

May 10, 2026 by mljourney

A practical guide to ColBERT late interaction retrieval for ML engineers: how MaxSim scoring over per-token embeddings outperforms single-vector bi-encoders, using RAGatouille for indexing and search, two-stage retrieval with bi-encoder first stage plus ColBERT reranking, fine-tuning ColBERT on domain-specific query-document triples with RAGTrainer, and when to use bi-encoder vs ColBERT vs cross-encoder for different RAG pipeline architectures.
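The MaxSim scoring named in this excerpt can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: the tiny 2-dimensional vectors stand in for real per-token embeddings, all vectors are assumed non-zero, and the function names are hypothetical rather than part of RAGatouille or ColBERT.

```python
import math


def normalize(vec):
    """Scale a non-zero vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query token, take the best-matching
    document token (maximum cosine similarity), then sum over query tokens."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


# Toy per-token embeddings (illustrative values only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[2.0, 0.0], [0.0, 3.0]]   # has a matching token for each query token
doc_b = [[1.0, 0.0]]               # matches only the first query token

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

Here doc_a scores higher than doc_b because every query token finds a close match, whereas a single pooled vector per document would not expose that per-token alignment.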

How to Compare Two Documents with a Local LLM

May 9, 2026 by mljourney

A practical guide to comparing documents with a local LLM using Ollama: a general compare_documents function with focus parameter, structured diff output using Pydantic with additions, removals, modifications, conflicts, and summary fields, a chunked comparison approach for long documents that exceed the context window, question-answering across two documents simultaneously, and specific use cases where local inference is essential including legal contracts, research papers, and policy documents.

Hard Negative Mining for Embedding Model Training

May 9, 2026 by mljourney

A practical guide to hard negative mining for ML engineers training embedding models: why random negatives produce weak gradient signal, BM25-mined hard negatives with rank_bm25, embedding-mined negatives with FAISS and sentence-transformers, cross-encoder filtering to identify the hardest candidates, training with MultipleNegativesRankingLoss, and iterative mining pipelines used by state-of-the-art models like E5 and BGE.

How to Use Ollama with Go

May 8, 2026 by mljourney

A complete guide to the official Ollama Go library: installing with go get, streaming chat with the callback handler, accumulating a non-streaming response, raw generate completion, generating embeddings with nomic-embed-text, listing and pulling models with progress callbacks, connecting to a remote Ollama server with a custom client URL, and building a full multi-turn CLI chatbot with conversation history.

How to Use HuggingFace Fast Tokenizers Efficiently

May 8, 2026 by mljourney

A practical guide to HuggingFace fast tokenizers for ML engineers: how Rust-backed fast tokenizers differ from slow Python tokenizers, using offset mappings for NER and QA span alignment, high-throughput batched tokenisation with datasets.map and multiprocessing, sliding window tokenisation for long documents with stride and overflow, training a custom BPE vocabulary with the tokenizers library, and debugging gotchas around special tokens and sequence pair handling.
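The sliding-window idea with stride and overflow from this excerpt can be illustrated without any library. The sketch below is an assumption-laden simplification, not the tokenizer's actual implementation: it treats stride as the number of tokens shared between consecutive windows, ignores special tokens, and uses a hypothetical function name.

```python
def sliding_windows(token_ids, max_length, stride):
    """Split a token sequence into windows of at most max_length tokens,
    where consecutive windows overlap by `stride` tokens (assumed meaning)."""
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    step = max_length - stride
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_length])
        # Stop once the current window reaches the end of the sequence.
        if start + max_length >= len(token_ids):
            break
        start += step
    return windows


# Ten placeholder token ids, windows of 4 tokens overlapping by 2.
windows = sliding_windows(list(range(10)), max_length=4, stride=2)
```

The overlap means that a span cut off at the end of one window appears whole in the next, which is why the technique is useful for long-document QA and NER.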

How to Use Local AI with Obsidian: Smart Notes Without the Cloud

May 7, 2026 by mljourney

A guide to connecting Ollama with Obsidian for fully local AI-assisted note-taking: the Ollama community plugin for per-note summarisation and action item extraction, Smart Connections plugin for semantic indexing of the entire vault with nomic-embed-text and vault-wide RAG chat, Text Generator plugin via the OpenAI-compatible endpoint, a practical meeting notes workflow, building a queryable personal knowledge base, hardware recommendations, and getting started with just two ollama pulls.

IA3 vs LoRA: Choosing a Parameter-Efficient Fine-Tuning Method

May 7, 2026 by mljourney

A practical comparison of IA3 and LoRA for ML engineers: how IA3 activation scaling works versus LoRA weight updates, when each method wins (data volume, task type, adapter size), implementing IA3 with HuggingFace PEFT for classification and causal LM tasks, combining IA3 with 4-bit quantisation on consumer GPUs, and a decision framework for choosing between PEFT methods in production fine-tuning projects.

How to Use Ollama with JavaScript and Node.js

May 6, 2026 by mljourney

A complete guide to the official Ollama npm package in Node.js: installing with npm/yarn/bun, generate and chat with stream:false and stream:true, multi-turn CLI chatbot with readline, generating embeddings and computing cosine similarity, model management including pull with progress, delete, and ps, connecting to a remote Ollama server with a custom client, structured output using Zod schema passed directly to the format parameter, and image input for vision models.