How to Draft Emails with a Local LLM

A practical guide to email drafting with a local LLM: a basic draft_email function with tone and length parameters, subject line generation, drafting replies to existing emails with intent specification, batch personalisation of email templates for multiple recipients with company and role context, and a reference of effective tone and style prompt parameters for reliably varied email outputs.

How to Filter and Deduplicate Pretraining Data for LLMs

A practical guide to LLM pretraining data pipelines: language identification with fastText, heuristic quality filtering using character-to-word ratios, symbol ratios, and repeated-line detection, perplexity-based filtering with KenLM to catch templated and garbled text, MinHash LSH deduplication with datasketch, exact substring deduplication with suffix arrays, building a full pipeline with Hugging Face datatrove including Gopher and C4 quality filters, training a fastText classifier for quality scoring, and balancing the data mix across web, books, and code sources.
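
As a rough illustration of the heuristic filtering step mentioned above, the sketch below computes three document-level signals in plain Python. It is not the guide's implementation: the function names, the reading of "character-to-word ratio" as mean characters per word, the choice of `#` and `...` as the counted symbols, and every threshold are assumptions chosen for illustration.

```python
def quality_signals(text: str) -> dict:
    """Compute simple document-level quality signals (illustrative only)."""
    words = text.split()
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    n_lines = max(len(lines), 1)
    return {
        # Mean characters per whitespace-separated token.
        "chars_per_word": sum(len(w) for w in words) / n_words,
        # Occurrences of '#' and '...' relative to the word count.
        "symbol_ratio": (text.count("#") + text.count("...")) / n_words,
        # Share of non-empty lines that duplicate an earlier line.
        "repeated_line_fraction": 1.0 - len(set(lines)) / n_lines,
    }


def passes_heuristics(
    text: str,
    min_cpw: float = 3.0,           # assumed threshold
    max_cpw: float = 10.0,          # assumed threshold
    max_symbol_ratio: float = 0.1,  # assumed threshold
    max_repeated: float = 0.3,      # assumed threshold
) -> bool:
    """Return True when every signal falls inside its allowed range."""
    s = quality_signals(text)
    return (
        min_cpw <= s["chars_per_word"] <= max_cpw
        and s["symbol_ratio"] <= max_symbol_ratio
        and s["repeated_line_fraction"] <= max_repeated
    )
```

In practice the thresholds would be tuned per language and per source by inspecting what each rule rejects.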

How to Stream Ollama Responses over WebSockets

A complete guide to streaming Ollama token output to browser clients via WebSocket: why WebSockets suit interactive AI chat better than SSE, a FastAPI WebSocket endpoint using run_in_executor for sync Ollama, a fully async version using httpx streaming, a vanilla JS browser client with real-time token display and stop button, and a multi-client broadcast connection manager for shared AI sessions.

Model Merging: Weight Averaging, TIES, and DARE Explained

A practical guide to model merging for ML engineers: how linear weight averaging and model soups work, computing and applying task vectors, TIES merging with trimming and sign election to resolve conflicts between task vectors, DARE with random dropout and rescaling before merging, combining DARE with TIES for large task vectors, using mergekit with a YAML config for production merges, SLERP for smoother two-model interpolation, and a decision guide for choosing between merging methods based on task overlap and fine-tuning intensity.
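
To make the TIES steps named above concrete, here is a minimal sketch on flat lists of floats rather than real model tensors. The function names and the `density` and `scale` parameters are hypothetical, and the code covers only task vectors, trimming, sign election, and the agreeing-sign mean; DARE, SLERP, and mergekit are not shown.

```python
from typing import List


def task_vector(base: List[float], finetuned: List[float]) -> List[float]:
    """Task vector: fine-tuned weights minus base weights."""
    return [f - b for b, f in zip(base, finetuned)]


def trim(vec: List[float], density: float) -> List[float]:
    """Keep roughly the top `density` fraction of entries by magnitude.

    Entries tied with the threshold are all kept, so slightly more than
    the requested fraction may survive.
    """
    k = max(1, round(len(vec) * density))
    threshold = sorted((abs(v) for v in vec), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in vec]


def ties_merge(
    base: List[float],
    finetuned_models: List[List[float]],
    density: float = 0.5,
    scale: float = 1.0,
) -> List[float]:
    """Trim each task vector, elect a sign per parameter, average the agreeing values."""
    trimmed = [trim(task_vector(base, ft), density) for ft in finetuned_models]
    merged = []
    for i, b in enumerate(base):
        column = [tv[i] for tv in trimmed]
        # Sign election: the sign carrying the larger total magnitude wins.
        sign = 1.0 if sum(column) >= 0 else -1.0
        agreeing = [v for v in column if v != 0.0 and v * sign > 0]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + scale * delta)
    return merged
```

With two models that disagree in sign on a parameter, only the values matching the elected sign contribute to that parameter's merged update, which is how the conflict gets resolved.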

How to Use Ollama with PHP and Laravel

A complete guide to integrating Ollama in PHP and Laravel: using openai-php/client pointed at Ollama's OpenAI-compatible /v1 endpoint, a Laravel AiService class for summarisation and classification, streaming SSE responses with response()->stream(), generating embeddings, and a Laravel Queue job with timeout and retry configuration for background AI processing.

LLM Routing in Production: Balancing Cost and Quality with Model Cascades

A practical guide to LLM routing for ML engineers: building an embedding-based classifier router that adds under 15ms latency, generating training labels via model-agreement scoring with an LLM judge, implementing a cascade router that tries the cheap model first and escalates on low confidence, calibrating the routing threshold empirically from a quality-cost tradeoff curve, tracking cost savings against counterfactual all-expensive routing, and choosing between a trained classifier and a cascade based on query distribution stability and labelling budget.
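
The cascade idea described above (try the cheap model, escalate on low confidence) can be sketched in a few lines. Everything here is an assumption made for illustration: the model callables, the way a confidence score is obtained, the 0.7 threshold, and the per-call costs are hypothetical placeholders, not values from the guide.

```python
from typing import Callable, Tuple

# Hypothetical interfaces: the cheap model returns (answer, confidence in [0, 1]),
# the expensive model returns only an answer.
CheapModel = Callable[[str], Tuple[str, float]]
ExpensiveModel = Callable[[str], str]


def cascade_route(
    query: str,
    cheap_model: CheapModel,
    expensive_model: ExpensiveModel,
    threshold: float = 0.7,  # assumed; calibrate on held-out data
) -> Tuple[str, str]:
    """Return (answer, tier), where tier records which model produced the answer."""
    answer, confidence = cheap_model(query)
    if confidence >= threshold:
        return answer, "cheap"
    # Low confidence: pay for the stronger model.
    return expensive_model(query), "expensive"


def savings_vs_all_expensive(
    tiers: list, cheap_cost: float, expensive_cost: float
) -> float:
    """Cost saved relative to sending every query to the expensive model.

    Escalated queries pay for both calls, since the cheap model ran first.
    """
    actual = sum(
        cheap_cost if t == "cheap" else cheap_cost + expensive_cost for t in tiers
    )
    return len(tiers) * expensive_cost - actual
```

Note that a cascade only saves money when enough queries stop at the cheap tier, because every escalation costs more than calling the expensive model directly.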

How to Use Ollama with Elixir and Phoenix

A complete guide to Ollama integration in Elixir and Phoenix: a clean Ollama client module using Req with pattern-matched responses, single and multi-turn chat, embedding generation, a Phoenix controller for AI endpoints, real-time streaming tokens with Phoenix LiveView using Req streaming callbacks, and background AI jobs with Oban including automatic retries.

Curriculum Learning: How to Train Models on Easy Examples First

A practical guide to curriculum learning for ML engineers: implementing a CurriculumSampler with linear competence scheduling, scoring example difficulty by model loss, text length, or label noise, a full training loop that advances the curriculum each epoch, self-paced learning with a dynamic loss threshold that adapts to model capability, difficulty scoring for LLM instruction fine-tuning, and the specific settings where curriculum learning provides consistent benefit versus where it adds complexity without gain.
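
One way to picture linear competence scheduling is the sketch below: competence grows linearly from a starting fraction to 1.0, and each batch is drawn only from the easiest share of the data that the current competence allows. The function names, the `initial` value of 0.2, and the use of precomputed difficulty scores are assumptions for illustration and are not taken from the guide's CurriculumSampler.

```python
import random
from typing import List


def linear_competence(step: int, total_steps: int, initial: float = 0.2) -> float:
    """Fraction of the (difficulty-sorted) dataset available at `step`."""
    return min(1.0, initial + (1.0 - initial) * step / total_steps)


def sample_batch(
    difficulties: List[float],
    competence: float,
    batch_size: int,
    rng: random.Random,
) -> List[int]:
    """Sample example indices uniformly from the easiest `competence` fraction."""
    # Indices ordered from easiest to hardest.
    order = sorted(range(len(difficulties)), key=lambda i: difficulties[i])
    n_available = max(1, int(competence * len(order)))
    pool = order[:n_available]
    return [rng.choice(pool) for _ in range(batch_size)]
```

A training loop would call `linear_competence` once per step or epoch and pass the result to `sample_batch`, so that harder examples enter the pool gradually.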

How to Use Ollama with Ruby and Rails

A complete guide to integrating Ollama in Ruby and Rails: the ruby-ollama gem for chat, generate, and embeddings, a Faraday-based OllamaClient for direct HTTP calls, a Rails service object with summarise and classify methods, and Sidekiq background jobs for async document processing with retry handling.

How to Use PyTorch Lightning Fabric for Distributed Training

A practical guide to PyTorch Lightning Fabric for ML engineers: how Fabric wraps DDP and FSDP boilerplate while keeping your training loop intact, migrating a plain PyTorch loop with six line changes, switching between single-GPU, DDP, FSDP, and multi-node strategies by changing one argument, gradient accumulation with no_backward_sync, gradient clipping with mixed precision handled automatically, distributed-safe checkpointing with fabric.save and fabric.load, and aggregating metrics across ranks with all_reduce.