How to Set Up Open WebUI with Ollama (Complete Guide)

A complete setup guide for Open WebUI with Ollama: installing via Docker with a single run command, pip installation without Docker, connecting to Ollama and troubleshooting disconnection issues, switching and pulling models from the UI, setting system prompts and custom personas, uploading documents for local RAG, accessing Open WebUI from other devices on your network, keeping conversations across updates, and the most useful settings to configure for a single-user local setup.
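The single Docker run command mentioned above looks roughly like the following sketch, based on the command published in the Open WebUI documentation (image tag, port mapping, and volume name may differ in your setup; `host.docker.internal` lets the container reach an Ollama server running on the host):

```shell
# Run Open WebUI in Docker, pointing it at a host-local Ollama instance.
# -p 3000:8080        -> UI becomes reachable at http://localhost:3000
# -v open-webui:...   -> named volume so chats survive container updates
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```

Keeping the `open-webui` named volume is what preserves conversations across updates: you can stop and remove the container, pull a newer image, and re-run the same command without losing data.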

How to Write Triton Kernels for PyTorch

A practical guide to writing GPU kernels with OpenAI Triton: the tile-based programming model, a minimal working kernel, fused softmax, autotuning block sizes, 2D matrix kernels, autograd integration, debugging with the interpreter, and performance profiling against the memory roofline.
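As a taste of the tile-based model described above, here is a minimal vector-add kernel in the style of the standard Triton tutorial (names like `add_kernel` and the `BLOCK_SIZE` of 1024 are illustrative choices, and a CUDA-capable GPU is required to actually launch it):

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide tile of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    # Mask out-of-bounds lanes so the last tile is handled safely.
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    # 1D launch grid: one program per tile of BLOCK_SIZE elements.
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

The `BLOCK_SIZE: tl.constexpr` parameter is exactly the kind of compile-time constant the autotuning section varies: wrapping the kernel in `@triton.autotune` with several candidate block sizes lets Triton benchmark and pick the best tile shape for your GPU.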

Best Coding LLMs to Run Locally in 2026

A practical guide to the best coding LLMs for local use in 2026: Qwen2.5-Coder 7B, 14B and 32B as the overall best across VRAM tiers, DeepSeek-Coder-V2 as a fast MoE option, Codestral 22B for fill-in-the-middle completions, hardware requirements at each tier from 8GB to 24GB VRAM, setting up Continue in VS Code with a local Ollama model, recommended Modelfiles with coding-optimised parameters, and how to choose the right model for your hardware and workflow.
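The Continue-plus-Ollama setup mentioned above can be sketched as follows. This assumes the older JSON config format at `~/.continue/config.json` (newer Continue releases use a YAML config, and the model tag `qwen2.5-coder:7b` is one example choice for an 8GB-VRAM tier):

```shell
# Pull a local coding model for Ollama (pick the size your VRAM allows).
ollama pull qwen2.5-coder:7b

# Point the Continue VS Code extension at the local Ollama server.
# Assumes the legacy JSON config location; adjust for YAML-based versions.
cat > ~/.continue/config.json <<'EOF'
{
  "models": [
    {
      "title": "Qwen2.5-Coder 7B (local)",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b"
    }
  ]
}
EOF
```

After reloading VS Code, Continue should list the local model in its model picker; chat and edit requests then go to Ollama on `localhost` rather than a hosted API.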

How to Use Ollama Modelfile: Custom Models, System Prompts, and Parameters

A practical guide to Ollama Modelfiles: creating custom named models with persistent system prompts, setting temperature, context window, stop sequences and other inference parameters, four ready-to-use Modelfile templates for code review, JSON output, document summarisation, and low-RAM setups, using custom models through the Ollama REST API, seeding few-shot examples with MESSAGE, exporting and sharing Modelfiles with teammates, and the most common gotchas around context window memory, stop tokens, and reproducibility.
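A minimal version of the workflow above, using real Modelfile directives (`FROM`, `SYSTEM`, `PARAMETER`); the base model tag, custom model name, and parameter values here are illustrative, not prescriptive:

```shell
# Write a Modelfile: persistent persona plus coding-friendly parameters.
cat > Modelfile <<'EOF'
FROM qwen2.5-coder:7b
SYSTEM """You are a meticulous code reviewer. Point out bugs, style
issues, and missing edge cases. Be concise."""
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
PARAMETER stop "<|endoftext|>"
EOF

# Build a named custom model from it, then run it like any other model.
ollama create code-reviewer -f Modelfile
ollama run code-reviewer "Review this function: def add(a, b): return a - b"
```

Note the `num_ctx` gotcha the guide covers: the context window is a per-model inference parameter, not memory that persists between sessions, so a larger `num_ctx` raises RAM/VRAM use for every request against that model.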