Tabby: The Self-Hosted Coding Assistant

Tabby is an open-source, self-hosted coding assistant that runs entirely on your own hardware. It provides the same inline code completion experience as GitHub Copilot — triggered as you type, accepting suggestions with Tab — but with a local LLM serving the completions and all code staying on your machine. This guide covers installing Tabby, running it with a local model, integrating it with VS Code and JetBrains IDEs, and configuring it for your hardware.

What Tabby Does

Tabby serves code completions via an HTTP API. Your IDE extension connects to it, sends the code context around your cursor, and receives a completion suggestion in response. The completions appear as ghost text inline in your editor, identical to Copilot’s UX. Unlike Copilot, the model runs locally — no code is sent to GitHub’s servers, no subscription is required, and the completions work offline. Tabby also supports repository-level context indexing, allowing it to provide completions informed by your entire codebase rather than just the current file.
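
Once a server is running (installation and startup are covered below), you can see the raw exchange for yourself with a direct API call. The sketch below assumes the /v1/completions endpoint and request shape from Tabby's OpenAPI spec; check the API docs exposed by your running server, since field names can shift between versions, and the Authorization header is only needed once token authentication is enabled.

# Ask the server to complete a short Python snippet (hypothetical prefix/suffix)
curl -s http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{"language": "python", "segments": {"prefix": "def parse_config(path: str) -> dict:\n    ", "suffix": "\n"}}'

# The response contains choices[0].text, the same text the IDE renders as ghost text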

Installation

# macOS
brew install tabbyml/tabby/tabby

# Linux (pre-built binary; check the releases page for the exact asset name and archive format)
curl -fsSL https://github.com/TabbyML/tabby/releases/latest/download/tabby_x86_64-manylinux2014.tar.gz | tar xz
sudo mv tabby /usr/local/bin/

# Or via Docker
docker pull tabbyml/tabby

# Verify installation
tabby --version

Running Tabby with a Local Model

# Run with a built-in code model (Tabby downloads it automatically)
tabby serve --model TabbyML/DeepseekCoder-1.3B

# Or for a larger model on better hardware
tabby serve --model TabbyML/CodeLlama-7B

# With GPU (NVIDIA)
tabby serve --model TabbyML/DeepseekCoder-1.3B --device cuda

# With GPU (Apple Silicon)
tabby serve --model TabbyML/DeepseekCoder-1.3B --device metal

Tabby serves on port 8080 by default. The admin UI is at http://localhost:8080. First run downloads the model weights automatically — DeepseekCoder 1.3B is about 1GB and runs well on CPU for fast completions.
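
Before wiring up an IDE, confirm the server answers. A quick check, assuming the /v1/health endpoint exposed by recent Tabby releases (the exact response fields vary by version):

# Confirm the server is up and see which model and device it loaded
curl -s http://localhost:8080/v1/health

# Serve on a different port if 8080 is taken (assuming your build supports --port)
tabby serve --model TabbyML/DeepseekCoder-1.3B --port 8081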

VS Code Integration

  1. Install the Tabby VS Code extension from the marketplace (search “Tabby”)
  2. Open VS Code settings and search for “Tabby”
  3. Set the server endpoint to http://localhost:8080
  4. Generate an API token in the Tabby admin UI under Settings → API Tokens
  5. Enter the token in VS Code settings

After connecting, inline completions appear automatically as you type. Press Tab to accept, Escape to dismiss, or Alt+] / Alt+[ to cycle through alternatives if multiple are offered.
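
If you prefer to script the setup, the extension can also be installed from the command line. This assumes the marketplace ID is TabbyML.vscode-tabby; verify it on the extension's marketplace page before relying on it:

# Install the Tabby extension via the VS Code CLI (ID assumed; check the marketplace)
code --install-extension TabbyML.vscode-tabby

# Confirm it is installed
code --list-extensions | grep -i tabby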

JetBrains Integration

Install the Tabby plugin from JetBrains Marketplace (File → Settings → Plugins → search “Tabby”). Configuration is the same as VS Code — set the server endpoint and API token. Completions work in all JetBrains IDEs: IntelliJ, PyCharm, WebStorm, GoLand, and others.

Model Selection Guide

Tabby’s built-in model library includes models specifically trained for code completion. For production use, the recommended models by hardware tier:

  • CPU only (8GB RAM): TabbyML/DeepseekCoder-1.3B — fast completions, reasonable quality
  • GPU 4–8GB VRAM: TabbyML/CodeLlama-7B or TabbyML/DeepseekCoder-6.7B — high quality completions
  • GPU 16GB+ VRAM: TabbyML/CodeLlama-13B — best quality for complex completions

Smaller models have lower latency, which matters more for inline completions than for chat — you want suggestions appearing within 500ms or they disrupt typing flow. On CPU, the 1.3B model typically responds in 200–400ms. The 7B model on a good GPU responds in 100–200ms.
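
To check whether your hardware stays under that 500ms budget, time a completion request directly rather than eyeballing the editor. A rough measurement sketch, assuming the same /v1/completions endpoint shown earlier:

# Measure end-to-end completion latency in seconds (run it a few times; the first
# request after startup is slower while the model and caches warm up)
curl -s -o /dev/null -w "total: %{time_total}s\n" \
  http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{"language": "python", "segments": {"prefix": "def add(a, b):\n    ", "suffix": ""}}'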

Repository Context Indexing

# Index a local repository for context-aware completions
tabby scheduler --now

# Or configure repositories in Tabby's config file (~/.tabby/config.toml):
# [[repositories]]
# name = "my-project"
# git_url = "file:///path/to/your/project"
# (check the config reference for your version; the schema has changed across releases)

With repository context enabled, Tabby retrieves relevant code snippets from your codebase when generating completions. This significantly improves completion quality for project-specific patterns, internal APIs, and custom abstractions — the model can suggest completions that use your actual function names and patterns rather than generic code.

Running as a Service

# systemd service (Linux)
sudo nano /etc/systemd/system/tabby.service

# [Unit]
# Description=Tabby Coding Assistant
# After=network.target
# 
# [Service]
# ExecStart=/usr/local/bin/tabby serve --model TabbyML/DeepseekCoder-1.3B --device cpu
# Restart=always
# User=YOUR_USER
# 
# [Install]
# WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable tabby
sudo systemctl start tabby
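
After enabling the service, verify it started cleanly and keep an eye on its logs with standard systemd tooling:

# Check that the service is running and follow its logs
sudo systemctl status tabby
journalctl -u tabby -f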

Tabby vs GitHub Copilot: The Real Comparison

The honest comparison between Tabby and GitHub Copilot comes down to privacy and cost versus quality and convenience. Copilot has the advantage of much larger models trained on a vast corpus of code, and the infrastructure behind it delivers extremely low-latency completions. Tabby running on consumer hardware with a 1–7B model will produce lower quality completions than Copilot in most cases. The question is whether the trade-off is worth it for your situation.

For developers working with proprietary codebases, the privacy argument is compelling. With Copilot, code context from your editor is sent to GitHub’s servers for each completion request — that means proprietary algorithms, unreleased features, security-sensitive code, and internal API designs are transmitted to a third party. With Tabby, completions happen entirely locally; no code leaves your machine. For companies with strict data handling requirements, security policies, or air-gapped development environments, local completion is not just preferable but necessary. Tabby provides a path to AI-assisted coding in those environments that Copilot cannot match.

The cost comparison is straightforward over a multi-year horizon. Copilot runs approximately $100–200 per developer per year for individual and team tiers. A developer spending $200 on hardware to run Tabby (added to an existing developer machine) breaks even in one to two years, after which local completions cost only electricity. For teams of 10 developers, the hardware investment to run a shared Tabby server pays back in under a year compared to team Copilot subscriptions, with better privacy guarantees.
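
A back-of-envelope version of that calculation, with every figure treated as an assumption you should replace with your own pricing and hardware costs:

# Years until a shared team server pays for itself (all figures are assumptions)
subscription_per_dev_year=200   # rough per-seat Copilot cost per year
team_size=10
server_cost=2000                # one-off cost of a GPU machine that can run a 7B model
echo "scale=1; $server_cost / ($subscription_per_dev_year * $team_size)" | bc
# -> 1.0 years for a 10-developer team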

Completion Quality Tips

Getting the best completions from Tabby requires understanding how it reads context. Tabby sends the code before and after your cursor (prefix and suffix) to the model. Clear, descriptive variable names and function signatures improve completion quality significantly — the model infers intent from the names it sees. Well-written comments above a function tell Tabby what you are about to implement, guiding the completion toward the intended implementation rather than a generic pattern. Writing the function signature with typed parameters before triggering completions typically produces better completions than starting from a blank line. These are the same practices that improve any LLM coding assistance, but they matter more with a smaller local model that has less capacity to infer intent from sparse context.

Team Deployment on a Shared Server

For a development team, running a single Tabby instance on a shared server is more efficient than each developer running their own. A server with a good GPU (RTX 3080 or better) runs the 7B model at low latency and serves all team members simultaneously through the same HTTP API. Each developer connects their IDE extension to the server’s IP address rather than localhost. The admin UI provides usage statistics, API token management, and model configuration in one place. Shared repository context indexing means all team members benefit from the same codebase context without each setting up their own index. The operational overhead is similar to running any shared internal tool — a systemd service, occasional model updates, and monitoring that the service is running.

Integrating Repository-Level Context Effectively

The repository context feature is Tabby’s most significant differentiator from basic local completion. Without context, the model generates completions based only on what is in the current file — it does not know about your other modules, your internal utilities, or your project conventions. With repository indexing, Tabby retrieves semantically relevant snippets from across your codebase and includes them in the prompt alongside your current code. The practical effect: completions start using your actual internal function names, following your project’s error handling patterns, and referencing your real data structures instead of generic placeholder code. For large codebases with complex internal APIs, this feature alone closes much of the quality gap with Copilot.

Configure the scheduler to run regularly to keep the index fresh as your codebase evolves. Daily or on-commit indexing via a git hook keeps completions current without significant overhead on typical-sized projects. The index is stored locally and rebuild time scales with codebase size — for most projects under 500K lines of code, a full reindex takes under a minute.
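
A minimal on-commit hook, assuming the repository is already listed in ~/.tabby/config.toml and that the tabby binary is on the PATH of the machine where commits happen:

# Re-index after every commit so completions pick up new code immediately
cat > .git/hooks/post-commit <<'EOF'
#!/bin/sh
tabby scheduler --now
EOF
chmod +x .git/hooks/post-commit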

Privacy and Security Considerations

Tabby’s local deployment means your code never leaves your network. For team deployments on an internal server, traffic between developer machines and the Tabby server stays on your internal network. For additional security, restrict the Tabby server port to internal network access only using firewall rules, use TLS (configurable in Tabby’s settings) for encrypted traffic even on internal networks, and rotate API tokens periodically using the admin UI. Tabby does not log completion request content by default — it logs usage statistics (token counts, latency) but not the actual code that was sent. Verify this in the Tabby documentation for your specific version, as privacy settings may evolve across releases.
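
As a concrete example of the firewall restriction, here is a sketch using ufw; the subnet is a placeholder for your internal network range, and equivalent iptables or cloud security-group rules work just as well:

# Allow the Tabby port only from the internal subnet, deny everything else
sudo ufw allow from 10.0.0.0/8 to any port 8080 proto tcp
sudo ufw deny 8080/tcp
sudo ufw status numbered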

Getting Started in 15 Minutes

Install Tabby, run tabby serve --model TabbyML/DeepseekCoder-1.3B, install the VS Code extension, connect it to localhost:8080 with a token from the admin UI, and open a code file to test completions. The whole setup takes 15 minutes including model download time. Once completions are working, evaluate quality on your typical coding tasks for a week before deciding whether to upgrade to a larger model. Most developers find the 1.3B model sufficient for completing obvious patterns and boilerplate, with the 7B model providing noticeably better quality on complex logic and less common patterns.

Tabby vs Continue: Choosing Your Local AI Coding Tool

Tabby and Continue solve different parts of the local AI coding workflow. Tabby specialises in inline completion — the ghost text that appears as you type, completing the current line or block. Continue is a chat-first assistant that also provides inline editing commands. Most developers who use both find they are complementary: Tabby handles the passive, always-on completion layer that triggers automatically, while Continue handles active, conversational assistance for more complex tasks like refactoring, explaining code, or generating new functions from a description. If you are choosing just one, Tabby is better if you primarily want Copilot-style completion; Continue is better if you primarily want chat-based assistance with some completion capability. The ideal local coding setup uses both together, which requires Ollama for Continue and Tabby’s own model management for completions — both running on the same machine without conflict.

Keeping Tabby Updated

Tabby releases new versions regularly with improved models, bug fixes, and new features. Update via the same method you used to install — brew upgrade tabbyml/tabby/tabby on macOS, or download a new binary on Linux. Model weights persist across updates in the ~/.tabby/models directory and do not need to be re-downloaded unless Tabby releases a new model version you want to try. Check the Tabby release notes before upgrading on a team server to catch any breaking changes to the API or configuration format. The admin UI shows your current version and available updates when you connect to localhost:8080 — a practical first check before starting a coding session if you want to stay current.
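
The update itself is a one-liner on macOS; on Linux, repeat the binary download. For example:

# macOS: upgrade via Homebrew, then confirm the new version
brew upgrade tabbyml/tabby/tabby
tabby --version

# On a Linux server, restart the systemd service so it picks up the new binary
sudo systemctl restart tabby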

Tabby is actively developed, and model quality for code completion has improved significantly with each major release. Re-evaluating your model selection every few months — trying a new release against your actual daily coding tasks — is the most reliable way to ensure you are getting the best completions your hardware can deliver. The community forums and GitHub discussions are useful sources of model recommendations for specific languages and hardware configurations, often more targeted than the official documentation. Small investments in staying current with Tabby releases pay off in meaningfully better completions over time, with each new model release raising the quality ceiling for what local completion can achieve.