Tabby: The Self-Hosted Coding Assistant That Beats Copilot for Completions

A complete guide to Tabby, the self-hosted coding assistant built specifically for inline tab completions: how it differs from Continue and why dedicated completion models are faster and more accurate, Docker installation with NVIDIA GPU support, choosing between StarCoder2 and DeepSeek-Coder models, VS Code, Neovim and JetBrains plugin setup, Docker Compose for persistent deployment, repository indexing for codebase-aware completions, monitoring acceptance rates in the built-in dashboard, and when to use Tabby versus Continue.

Ollama REST API Reference: Every Endpoint with Examples

A complete reference for the Ollama REST API: listing and managing models with /api/tags, /api/pull, /api/delete, and /api/copy, running chat completions and raw text generation with /api/chat and /api/generate, generating embeddings with /api/embeddings, inspecting running models and VRAM usage with /api/ps, getting model details and Modelfiles with /api/show, creating custom models programmatically with /api/create, all key inference options, parsing the streaming response format including performance statistics, and health check patterns.

Mistral Nemo 12B: What It Is and When to Use It Locally

A practical guide to Mistral Nemo 12B for local use: what makes it distinctive (a native 128K context window, strong multilingual support, and the efficient Tekken tokeniser), hardware requirements at each quantisation level, running it with Ollama and configuring a 32K context Modelfile, benchmarking it against Llama 3.1 8B for your specific tasks, the four scenarios where its VRAM premium is justified, and how it compares to Mistral 7B and Mixtral 8x7B.

How to Fine-Tune Embedding Models with Contrastive Learning

A practical guide to fine-tuning embedding models with contrastive learning: the Multiple Negatives Ranking Loss objective, building training datasets with synthetic query generation, hard negative mining, Matryoshka Representation Learning for flexible dimensions, evaluation with InformationRetrievalEvaluator, and when domain adaptation is actually worth the engineering cost.

How to Use Local AI with Obsidian: Smart Notes with Ollama

A practical guide to connecting local LLMs to Obsidian for AI-powered note-taking: setting up Smart Connections with nomic-embed-text embeddings to search and chat across your entire vault, using the Ollama plugin for in-note summarisation and writing assistance, the BMO Chatbot for a persistent sidebar assistant, practical workflows for daily note processing, research synthesis, idea generation, and meeting note cleanup, and context window configuration for long notes.

How to Use torch.profiler to Find Training Bottlenecks

A practical guide to using torch.profiler for GPU bottleneck analysis: profiler setup and schedule configuration, reading TensorBoard traces, identifying DataLoader gaps and CPU-GPU sync stalls, memory profiling, CUDA kernel occupancy, distributed training profiling, and a step-by-step workflow for translating profiler output into concrete training optimizations.

How to Use Ollama Vision Models for Local Image Analysis

A practical guide to running vision models locally with Ollama: available models including LLaVA, moondream2, Qwen2.5-VL and Gemma 3, using vision models from the CLI with image flags, calling them via the Ollama Python library and the OpenAI-compatible API with base64-encoded images, batch processing a folder of images, and practical use cases including screenshot OCR, chart analysis, document data extraction, and inventory cataloguing.