ML Journey

How to Evaluate a RAG Pipeline: Metrics, Tools, and What to Fix

April 18, 2026 by mljourney

A practical guide to RAG evaluation for ML engineers: decomposing retrieval and generation quality, RAGAS metrics including context precision, context recall, faithfulness, and answer relevancy, diagnosing low retrieval recall with chunking and re-ranking fixes, diagnosing generation faithfulness failures, and building an automated production eval pipeline with online and offline metrics.

Continue vs GitHub Copilot: Which AI Coding Assistant Is Better?

April 17, 2026 by mljourney

A practical comparison of Continue and GitHub Copilot for VS Code developers: setup requirements and time to first completion, completion quality for everyday tasks vs complex problems, chat features including Continue’s @codebase semantic search across your entire project vs Copilot’s open-file context, privacy implications of cloud vs local processing, cost breakdown for individuals and teams, the hybrid Continue+cloud API approach, IDE support across editors, and guidance on which tool to choose based on your specific priorities.

Transformer Models for Time Series Forecasting: TFT, PatchTST, and iTransformer

April 17, 2026 by mljourney

A practical guide to transformer-based time series forecasting: Temporal Fusion Transformer for multivariate problems with rich covariates and probabilistic output, PatchTST for long-horizon univariate forecasting via patch tokenisation, iTransformer for dense multivariate problems via inverted attention (attending across variates rather than time steps), when to use each, and why you should always benchmark against simple baselines first.

Gemma 3: Google’s Multimodal Local LLM Explained

April 16, 2026 by mljourney

A practical guide to running Google’s Gemma 3 locally with Ollama: the 1B, 4B, 12B, and 27B variants and their VRAM requirements, native multimodal image analysis at every size above 1B, CLI and Python usage including image inputs, how Gemma 3 4B compares to Llama 3.1 8B on reasoning tasks, the 12B as a multimodal sweet spot, 27B for frontier-class local quality on Apple Silicon, configuring a 32K context Modelfile, strong multilingual support, and how to choose between Gemma 3 and other local model families.

How to Build LLM Guardrails for Production Applications

April 16, 2026 by mljourney

A practical guide to LLM guardrails for production: input guardrails for PII detection and prompt injection blocking, output guardrails for policy compliance and schema validation, the Guardrails AI and NeMo Guardrails frameworks, latency-aware architecture with layered synchronous and asynchronous checks, and tool call validation for agentic systems.

How to Summarise YouTube Videos Locally with Ollama

April 15, 2026 by mljourney

A practical guide to summarising YouTube videos locally with Ollama and the YouTube Transcript API: fetching transcripts without an API key, basic summarisation in bullet points, extracting video IDs from any URL format, six summary formats including TL;DR, study notes, and Q&A generation, handling long videos with chunk summarisation, a complete command-line tool with argparse and streaming output, working around videos without captions using Whisper, and choosing the right model for different content types.

Feature Engineering for Tabular Data: Techniques That Actually Matter in Production

April 15, 2026 by mljourney

A practical guide to feature engineering for tabular ML in production: numerical transforms and when they matter, target encoding without leakage, interaction and ratio features, cyclical datetime encoding, rolling aggregation features with correct temporal windowing, and building reproducible sklearn pipelines that produce identical outputs at training and serving time.

How to Use Ollama with JavaScript and Node.js

April 14, 2026 by mljourney

A complete guide to using Ollama from JavaScript and Node.js: installing the official ollama npm package, chat completions with system prompts and options, streaming responses with async iterators, text generation for classification, generating embeddings with cosine similarity, managing models programmatically, building a streaming Express SSE endpoint, consuming the stream from browser JavaScript, connecting to a remote Ollama host, multi-turn conversation with history, TypeScript types, and when to use the native JS library versus the OpenAI SDK.

How to Use Multi-Armed Bandits for A/B Testing and Online Decision Making

April 14, 2026 by mljourney

A practical guide to multi-armed bandit algorithms for ML engineers: epsilon-greedy, Thompson Sampling with Beta-Binomial conjugate priors, UCB1 with regret bounds, contextual bandits with LinUCB, when to use bandits vs A/B tests, and implementing persistent bandit state in production with Redis.

Ollama Keep-Alive and Model Preloading: Eliminate Cold Start Latency

April 13, 2026 by mljourney

A practical guide to eliminating Ollama cold-start latency: how keep-alive works and why it matters, setting keep_alive per request to -1 for permanent loading or 0 for immediate unloading, setting OLLAMA_KEEP_ALIVE globally, preloading models at application startup with a minimal dummy request, running multiple models simultaneously with OLLAMA_MAX_LOADED_MODELS, inspecting loaded models and VRAM usage via /api/ps, manually unloading models to free VRAM, and recommended settings for interactive chat, batch processing, multi-model RAG, and low-VRAM machines.