How to Use Ollama with Elixir and Phoenix

A complete guide to Ollama integration in Elixir and Phoenix: a clean Ollama client module using Req with pattern-matched responses, single and multi-turn chat, embedding generation, a Phoenix controller for AI endpoints, real-time token streaming in Phoenix LiveView via Req streaming callbacks, and background AI jobs with Oban including automatic retries.

Curriculum Learning: How to Train Models on Easy Examples First

A practical guide to curriculum learning for ML engineers: implementing a CurriculumSampler with linear competence scheduling, scoring example difficulty by model loss, text length, or label noise, a full training loop that advances the curriculum each epoch, self-paced learning with a dynamic loss threshold that adapts to model capability, difficulty scoring for LLM instruction fine-tuning, and the specific settings where curriculum learning provides consistent benefit versus where it adds complexity without gain.
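
As a rough illustration of linear competence scheduling, here is a minimal sketch in plain Python. It is not the CurriculumSampler from the guide: the function names, the starting competence `c0=0.2`, and the rule "sample uniformly from the easiest fraction of the data" are all assumptions chosen for the example.

```python
import random


def linear_competence(epoch, total_epochs, c0=0.2):
    """Fraction of the difficulty-sorted dataset unlocked at this epoch.

    Grows linearly from c0 (assumed starting value) at epoch 0
    to 1.0 at the final epoch.
    """
    if total_epochs <= 1:
        return 1.0
    progress = epoch / (total_epochs - 1)
    return min(1.0, c0 + (1.0 - c0) * progress)


def available_indices(difficulties, competence):
    """Indices of the easiest `competence` fraction of examples."""
    order = sorted(range(len(difficulties)), key=lambda i: difficulties[i])
    cutoff = max(1, int(round(competence * len(order))))  # keep at least one
    return order[:cutoff]


def sample_batch(difficulties, epoch, total_epochs, batch_size, rng=random):
    """Draw a batch (with replacement) from the currently unlocked pool."""
    pool = available_indices(difficulties, linear_competence(epoch, total_epochs))
    return [rng.choice(pool) for _ in range(batch_size)]
```

The difficulty scores can come from any of the sources the guide lists (model loss, text length, label noise); the sketch only assumes that a lower score means an easier example.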

How to Use Ollama with Ruby and Rails

A complete guide to integrating Ollama in Ruby and Rails: the ruby-ollama gem for chat, generate, and embeddings, a Faraday-based OllamaClient for direct HTTP calls, a Rails service object with summarise and classify methods, and Sidekiq background jobs for async document processing with retry handling.

How to Use PyTorch Lightning Fabric for Distributed Training

A practical guide to PyTorch Lightning Fabric for ML engineers: how Fabric wraps DDP and FSDP boilerplate while keeping your training loop intact, migrating a plain PyTorch loop in 6 line changes, switching between single-GPU, DDP, FSDP, and multi-node strategies by changing one argument, gradient accumulation with no_backward_sync, gradient clipping with mixed precision handled automatically, distributed-safe checkpointing with fabric.save and fabric.load, and aggregating metrics across ranks with all_reduce.

How to Deploy Ollama on Kubernetes

A production guide to running Ollama on Kubernetes: a complete Deployment manifest with GPU resource limits, a PersistentVolumeClaim for model storage, and a ClusterIP Service; pulling models via kubectl exec; an init container pattern that pre-pulls models before the main container starts; exposing the service via NGINX Ingress with long proxy timeouts; a CPU-only configuration for clusters without GPUs; and deploying via the otwld/ollama-helm community Helm chart.

Model Calibration: Temperature Scaling, Platt Scaling, and ECE in Practice

A practical guide to model calibration for ML engineers: measuring calibration with ECE and reliability diagrams, fixing overconfidence with temperature scaling using LBFGS on a held-out validation set, Platt scaling for binary classification, non-parametric alternatives with isotonic regression and histogram binning, calibrating LLMs on multiple-choice benchmarks via log-likelihood scoring, and recalibrating after every fine-tuning step as part of the model release checklist.
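
For readers unfamiliar with ECE, the following is a minimal sketch of the usual equal-width-bin estimate in plain Python. The function name, the default of 10 bins, and the half-open bin edges are assumptions for illustration and may differ from the guide's own implementation.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     booleans, True where the prediction was right.
    Returns the weighted mean of |accuracy - mean confidence| over bins.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A model that reports 0.9 confidence but is right only half the time lands in a single bin with a gap of about 0.4, which is the overconfidence that temperature scaling is meant to shrink.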

How to Add Tracing to Ollama with OpenTelemetry

A practical guide to instrumenting Ollama calls with OpenTelemetry: setting up the OTel SDK with OTLP exporter and Jaeger, a traced_chat wrapper function that records model name, token counts, latency, and tokens per second as span attributes, tracing a full RAG pipeline with nested spans for embedding, retrieval, and generation, viewing traces in the Jaeger UI, and exporting to cloud backends including Honeycomb and Grafana Tempo.

LLM Watermarking: How It Works and What It Means for Production Systems

A practical guide to LLM watermarking for ML engineers: how green/red list watermarking biases token sampling at decoding time, a PyTorch implementation of the Kirchenbauer et al. scheme with detection via z-score, integrating the logit processor into a generation loop, the main attack surfaces (paraphrasing, substitution, translation), the trade-off between output quality and the bias strength delta for code versus prose tasks, and the regulatory context under the EU AI Act and voluntary provider commitments.
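
The z-score detection step can be sketched without PyTorch. The version below is a simplified stand-in, not the guide's implementation: it decides green-list membership by hashing each (previous token, token) pair rather than permuting the vocabulary with a seeded generator, and the key, the function names, and `gamma=0.5` are assumptions for the example.

```python
import hashlib
import math


def is_green(prev_token, token, gamma=0.5, key=b"demo-key"):
    """Pseudo-random green-list test seeded by the previous token.

    Simplification: hashes the (prev, token) pair instead of building a
    per-step vocabulary partition. Roughly a `gamma` fraction is green.
    """
    digest = hashlib.sha256(key + f"{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def watermark_z_score(token_ids, gamma=0.5, key=b"demo-key"):
    """z = (green_count - gamma * T) / sqrt(T * gamma * (1 - gamma)).

    T is the number of scored tokens (every token after the first).
    Unwatermarked text should give z near 0; heavily watermarked
    text gives a large positive z.
    """
    scored = len(token_ids) - 1
    if scored < 1:
        raise ValueError("need at least two tokens to score")
    green = sum(
        is_green(prev, tok, gamma, key)
        for prev, tok in zip(token_ids, token_ids[1:])
    )
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))
```

A detector would compare the returned z against a threshold; the threshold controls the false-positive rate on unwatermarked text.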

How to Use Ollama with Celery for Async AI Tasks

A complete guide to using Celery and Redis with Ollama for async AI background processing: why slow LLM inference belongs in a task queue, defining Celery tasks for text summarisation and document classification with Pydantic structured output, starting workers with configurable concurrency, a FastAPI integration with immediate task ID response and polling endpoint, and processing batches of documents in parallel with a Celery group.

How to Use einops for Cleaner Tensor Operations in PyTorch

A practical guide to einops for ML engineers: rearrange for readable dimension splitting, merging, and transposing with named axes, reduce for explicit pooling over named dimensions, repeat as a drop-in for unsqueeze/expand chains, einsum with readable named contractions, the Rearrange nn.Module layer for use in Sequential and torch.compile, and a complete ViT patch embedding implementation using einops throughout.