Generative AI Archives - Page 16 of 70

How to Build a Telegram Bot with Ollama

May 13, 2026 by mljourney

A complete guide to building an Ollama-powered Telegram bot with python-telegram-bot: a basic bot that responds to messages with a typing indicator, per-user conversation history with trimming, /start, /clear, and /model commands, access restricted to specific Telegram user IDs, and running the bot as a persistent systemd service alongside Ollama.
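
As a rough illustration of the per-user history with trimming mentioned above, here is a minimal sketch in plain Python. It does not use python-telegram-bot or Ollama; the names histories, remember, clear, and MAX_MESSAGES are hypothetical, and the cap of 20 messages is an arbitrary assumption rather than a value from the post.

```python
from collections import defaultdict

# Assumed cap on stored messages per user (arbitrary illustrative value).
MAX_MESSAGES = 20

# Maps a Telegram user ID to that user's list of chat messages.
histories = defaultdict(list)


def remember(user_id, role, content, max_messages=MAX_MESSAGES):
    """Append a message to the user's history and trim the oldest entries."""
    history = histories[user_id]
    history.append({"role": role, "content": content})
    overflow = len(history) - max_messages
    if overflow > 0:
        # Drop the oldest messages so at most max_messages remain.
        del history[:overflow]
    return history


def clear(user_id):
    """Forget everything stored for this user (what a /clear command might call)."""
    histories.pop(user_id, None)
```

A message handler could call remember(user_id, "user", text) before sending the trimmed list to the model, then remember(user_id, "assistant", reply) afterwards.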

How to Use W&B Sweeps for Hyperparameter Search

May 13, 2026 by mljourney

A practical guide to W&B Sweeps for ML engineers: how the sweep controller and agent architecture works, configuring Bayesian vs random vs grid search with the right parameter distributions, writing sweep-compatible training scripts that read from wandb.config, running parallel agents across multiple GPUs and SLURM clusters, using Hyperband early termination to save compute, interpreting parallel coordinates plots, and avoiding common pitfalls like over-broad search spaces and treating the best sweep run as a final result without seed averaging.
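
For orientation, a sweep configuration of the kind described above might look like the following Python dictionary. This is a sketch under assumptions: the metric name val_loss, the parameter names, and all ranges are hypothetical, and the block only builds the dictionary without calling the wandb library.

```python
# Hypothetical sweep configuration; metric and parameter names are illustrative.
sweep_config = {
    "method": "bayes",  # alternatives: "random" or "grid"
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        # Log-uniform sampling suits values that span orders of magnitude.
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-5,
            "max": 1e-2,
        },
        # A discrete set of choices.
        "batch_size": {"values": [16, 32, 64]},
        # Plain uniform sampling over a bounded range.
        "dropout": {"distribution": "uniform", "min": 0.0, "max": 0.5},
    },
    # Hyperband early termination stops weak runs before they finish.
    "early_terminate": {"type": "hyperband", "min_iter": 3},
}
```

The training script would then read the sampled values from wandb.config rather than from hard-coded constants.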

How to Use Ollama with the Cursor Editor

May 12, 2026 by mljourney

A complete guide to using Ollama as the AI backend in the Cursor code editor: configuring the OpenAI-compatible endpoint in Cursor settings, choosing fast small models for Tab completion versus larger models for Chat, creating a Modelfile with 16K context for better code responses, using Cursor Chat and Cmd+K inline editing with a local model, performance tips including model preloading and code-specialised model selection, and privacy implications of the fully local setup for proprietary codebases.

How to Count Tokens and Estimate LLM Costs Before You Ship

May 12, 2026 by mljourney

A practical guide to LLM token counting and cost estimation for ML engineers: accurate token counting with tiktoken for OpenAI models and the Anthropic token counting API for Claude, building a multi-provider cost estimator with current pricing, pre-flight checks to catch context window overflows and budget breaches before API calls, and a production logging wrapper for per-request cost attribution and identifying expensive outlier requests.
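
The pre-flight check described above can be sketched as follows. Everything here is an assumption for illustration: the model name example-model, its prices, and its context window are placeholders rather than real provider figures, and token counts are passed in directly instead of being computed with tiktoken or a provider API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens
    context_window: int        # maximum total tokens per request


# Placeholder entry; replace with current figures from the provider's pricing page.
PRICING = {"example-model": Pricing(1.00, 4.00, 8000)}


def estimate_cost(model, input_tokens, max_output_tokens):
    """Worst-case cost in USD, assuming the full output budget is used."""
    p = PRICING[model]
    total = input_tokens * p.input_per_million + max_output_tokens * p.output_per_million
    return total / 1_000_000


def preflight(model, input_tokens, max_output_tokens, budget_usd):
    """Raise before the API call if the request cannot fit or costs too much."""
    p = PRICING[model]
    if input_tokens + max_output_tokens > p.context_window:
        raise ValueError("request would overflow the context window")
    cost = estimate_cost(model, input_tokens, max_output_tokens)
    if cost > budget_usd:
        raise ValueError("estimated cost exceeds the per-request budget")
    return cost
```

Using the maximum output length gives an upper bound, so the actual charge should not exceed the estimate as long as the price table is current.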

How to Add Image Captioning to Your App with a Local LLM

May 11, 2026 by mljourney

A practical guide to local image captioning and visual AI with Ollama: pulling LLaVA, moondream, and Gemma 3 vision models, captioning images with base64 encoding, visual question answering for receipts and diagrams, batch processing a folder of images to CSV, extracting text from photos with OCR-style prompts, structured image classification with Pydantic and Literal types, and a comparison of LLaVA vs moondream vs Gemma 3 for different vision tasks.

Label Smoothing: When It Helps and When It Hurts

May 11, 2026 by mljourney

A practical guide to label smoothing for ML engineers: how soft targets prevent logit overconfidence, PyTorch implementation with nn.CrossEntropyLoss and a manual version for fine-grained control, the three settings where smoothing reliably helps (large-scale classification, seq2seq, small-data fine-tuning), why it actively hurts knowledge distillation, choosing smoothing values, and measuring calibration improvement with Expected Calibration Error.
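
To make the soft-target idea concrete, here is a manual sketch in plain Python rather than PyTorch. It assumes the common convention of mixing the one-hot target with a uniform distribution over all classes, which to my understanding matches the label_smoothing argument of nn.CrossEntropyLoss; the function names are hypothetical.

```python
import math


def smooth_targets(label, num_classes, eps):
    """Mix a one-hot target with a uniform distribution over all classes."""
    targets = [eps / num_classes] * num_classes
    targets[label] += 1.0 - eps
    return targets


def log_softmax(logits):
    """Numerically stable log-softmax via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def smoothed_cross_entropy(logits, label, eps=0.1):
    """Cross-entropy between the smoothed targets and the softmax of the logits."""
    targets = smooth_targets(label, len(logits), eps)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(targets, log_probs))
```

With eps set to 0 this reduces to ordinary cross-entropy. With eps above 0, a very confident prediction such as logits [10, 0, 0] is penalised more than under the hard target, which is the mechanism that discourages overconfident logits.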

How to Use Ollama with LangChain

May 10, 2026 by mljourney

A complete guide to using Ollama as the LangChain backend: installing langchain-ollama, using OllamaLLM and ChatOllama with system and human messages, building LCEL chains with prompt templates and StrOutputParser, a full RAG pipeline using OllamaEmbeddings with nomic-embed-text and a Chroma vector store, adding conversation memory with InMemoryChatMessageHistory and RunnableWithMessageHistory, and creating a tool-using ReAct agent with LangGraph.

ColBERT and Late Interaction Retrieval: How It Works and When to Use It

May 10, 2026 by mljourney

A practical guide to ColBERT late interaction retrieval for ML engineers: how MaxSim scoring over per-token embeddings outperforms single-vector bi-encoders, using RAGatouille for indexing and search, two-stage retrieval with a bi-encoder first stage plus ColBERT reranking, fine-tuning ColBERT on domain-specific query-document triples with RAGTrainer, and when to use a bi-encoder vs ColBERT vs a cross-encoder for different RAG pipeline architectures.
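
MaxSim itself is compact enough to sketch in plain Python. This version assumes cosine similarity over L2-normalised token vectors and non-empty query and document token lists; it uses toy vectors rather than real ColBERT embeddings, and the function names are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best similarity to any document token."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )
```

Because each query token picks its own best-matching document token, a document that covers every query term scores higher than one that matches only some of them, which a single pooled vector can blur.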

How to Compare Two Documents with a Local LLM

May 9, 2026 by mljourney

A practical guide to comparing documents with a local LLM using Ollama: a general compare_documents function with a focus parameter, structured diff output using Pydantic with additions, removals, modifications, conflicts, and summary fields, a chunked comparison approach for long documents that exceed the context window, question answering across two documents simultaneously, and specific use cases where local inference is essential, including legal contracts, research papers, and policy documents.

Hard Negative Mining for Embedding Model Training

May 9, 2026 by mljourney

A practical guide to hard negative mining for ML engineers training embedding models: why random negatives produce weak gradient signal, BM25-mined hard negatives with rank_bm25, embedding-mined negatives with FAISS and sentence-transformers, cross-encoder filtering to identify the hardest candidates, training with MultipleNegativesRankingLoss, and iterative mining pipelines used by state-of-the-art models like E5 and BGE.
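
The selection step that follows retrieval can be sketched independently of the retriever. The function below is a hypothetical helper, not code from the post: it assumes candidates have already been ranked for a query (by BM25, an embedding index, or anything else) and picks the highest-ranked non-positive documents as hard negatives. The skip_top parameter is an assumed heuristic for avoiding the very top ranks, where unlabeled positives are most likely to appear.

```python
def mine_hard_negatives(ranked_ids, positive_ids, num_negatives=4, skip_top=0):
    """Return the top-ranked candidates that are not labeled positives.

    ranked_ids: document IDs ordered from most to least similar to the query.
    positive_ids: IDs known to be relevant; never returned as negatives.
    skip_top: number of leading ranks to ignore entirely.
    """
    positives = set(positive_ids)
    negatives = []
    for doc_id in ranked_ids[skip_top:]:
        if len(negatives) >= num_negatives:
            break
        if doc_id in positives:
            continue
        negatives.append(doc_id)
    return negatives
```

The resulting (query, positive, negatives) groups are the kind of input a contrastive loss such as MultipleNegativesRankingLoss is trained on.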