Continue is a free, open-source VS Code and JetBrains extension that turns your local Ollama models into a fully-featured coding assistant — inline tab completions, a chat sidebar for code questions, slash commands for common tasks, and the ability to reference your codebase as context. Everything runs locally: your code never leaves your machine. This guide covers the complete setup and the most useful features.
Installation
Install Continue from the VS Code marketplace. Search for “Continue” by Continue.dev, or install it from the command line:
code --install-extension Continue.continue
After installation a Continue icon appears in the left sidebar (a chat bubble icon). Click it to open the Continue panel. On first launch it will ask you to choose a provider — select Ollama. Make sure Ollama is running with at least one model pulled before proceeding.
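For example, from a terminal (the llama3.2:3b tag shown here is just one reasonable starting model):
# Make sure the Ollama server is up and at least one model is pulled
ollama serve            # skip this if the Ollama desktop app is already running
ollama pull llama3.2:3b
ollama list             # the model should appear here before you open Continue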
Configuring Continue with Ollama
Continue’s configuration lives in ~/.continue/config.json (macOS/Linux) or %USERPROFILE%\.continue\config.json (Windows). The minimal config to use Ollama for both chat and autocomplete:
{
  "models": [
    {
      "title": "Llama 3.2 3B",
      "provider": "ollama",
      "model": "llama3.2:3b",
      "apiBase": "http://localhost:11434"
    },
    {
      "title": "Qwen2.5-Coder 7B",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder 7B",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  },
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "apiBase": "http://localhost:11434"
  }
}
After saving the config, reload VS Code and the models appear in the Continue panel’s model dropdown. The tabAutocompleteModel is used for inline completions (Tab key); the models array is used for the chat sidebar. Using a coding-specialist model (Qwen2.5-Coder) for completions and a general model (Llama 3.2) for chat is a good default split.
Tab Autocomplete
Once configured, Continue shows ghost-text suggestions as you type — press Tab to accept, Escape to dismiss, or keep typing to ignore. The suggestions are generated by the tabAutocompleteModel using fill-in-the-middle (FIM) mode, which means the model sees both the code before and after the cursor when generating completions. This is significantly better than simple prefix-only completion for cases like filling in the middle of a function or adding a parameter to an existing call.
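To see what a FIM request looks like outside the editor, you can call Ollama's /api/generate endpoint directly with a prompt (the code before the cursor) and a suffix (the code after it). This is only an illustrative sketch and assumes the model's template supports the suffix field, as qwen2.5-coder's does:
# Fill-in-the-middle: the model completes the gap between prompt and suffix
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "def read_config(path):\n    ",
  "suffix": "\n    return config\n",
  "stream": false
}'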
If completions feel slow, the bottleneck is almost always the model inference speed. Switch to a smaller, faster model for autocomplete — qwen2.5-coder:1.5b or deepseek-coder:1.3b give sub-second completions on a mid-range GPU while still being useful for common patterns. Use the larger 7B model for the chat sidebar where latency matters less.
# Pull a fast small model specifically for autocomplete
ollama pull qwen2.5-coder:1.5b
"tabAutocompleteModel": {
"title": "Qwen2.5-Coder 1.5B (fast)",
"provider": "ollama",
"model": "qwen2.5-coder:1.5b",
"apiBase": "http://localhost:11434"
}
The Chat Sidebar
Open the Continue panel with Cmd+L (Mac) or Ctrl+L (Windows/Linux). You can ask questions about code, request explanations, ask for refactors, or get help debugging. The key power feature is context — you can add specific files, functions, or selections to the chat context so the model has the relevant code to work with.
Keyboard shortcuts for adding context:
- Cmd+Shift+L — add the current file to context
- Highlight code, then Cmd+L — add the selected code to context
- Type @file in the chat to reference a specific file
- Type @codebase to search across your whole codebase (uses embeddings)
Codebase Indexing with Embeddings
When you set an embeddingsProvider in the config, Continue indexes your entire codebase using those embeddings. This enables the @codebase context reference — type it in chat and Continue retrieves the most semantically relevant files and functions for your question. Run ollama pull nomic-embed-text first, then trigger indexing from the Continue panel by clicking the database icon or using the command palette: Continue: Index Codebase.
Indexing a medium-sized codebase (a few thousand files) takes a few minutes the first time. Subsequent indexing is incremental and runs automatically when files change. The index is stored locally in ~/.continue/index.
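If indexing never seems to start, it is worth confirming that the embedding model is actually installed and that the index directory described above exists:
# Embedding model used by the codebase index
ollama pull nomic-embed-text
# The local index lives here once indexing has run
ls ~/.continue/index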
Slash Commands
Continue has built-in slash commands for common tasks. Type / in the chat input to see the full list. The most useful ones:
- /edit — make targeted edits to highlighted code based on your instruction
- /comment — add docstrings and comments to selected code
- /test — generate unit tests for selected code
- /share — export the current conversation to a markdown file
# Example workflow:
# 1. Highlight a function
# 2. Open Continue panel (Cmd+L)
# 3. Type: /edit add error handling for the case where the input is None
# Continue streams the edited version directly into your file
Custom Slash Commands
You can define your own slash commands in the config for workflows you repeat frequently:
{
  "slashCommands": [
    {
      "name": "review",
      "description": "Review selected code for bugs and improvements",
      "prompt": "Review the following code. List any bugs, edge cases not handled, and suggestions for improvement. Be specific and reference line numbers where relevant."
    },
    {
      "name": "explain",
      "description": "Explain selected code simply",
      "prompt": "Explain what this code does in plain English. Assume the reader is a developer but is unfamiliar with this specific codebase."
    }
  ]
}
Using Multiple Models
The models array in the config can include as many models as you want. Switch between them in the Continue panel using the model dropdown at the bottom of the chat. This lets you use different models for different tasks without leaving VS Code — a fast small model for quick questions, a larger model for complex refactoring, and optionally a cloud model (OpenAI, Anthropic) as a fallback when the local model is not sufficient.
{
  "models": [
    {
      "title": "Qwen2.5-Coder 7B (local)",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b"
    },
    {
      "title": "Claude Sonnet (cloud fallback)",
      "provider": "anthropic",
      "model": "claude-sonnet-4-20250514",
      "apiKey": "sk-ant-your-key"
    }
  ]
}
JetBrains Support
Continue also works in JetBrains IDEs (IntelliJ, PyCharm, WebStorm, GoLand). Install it from the JetBrains Marketplace by searching “Continue”. The configuration file is shared with the VS Code version — the same ~/.continue/config.json is used — so if you have already configured Continue for VS Code, JetBrains picks up the same models and settings automatically.
Troubleshooting
The most common issue is the Continue panel showing “No models configured” or failing to connect. Check that Ollama is running (ollama serve or the Ollama app), that the model name in your config exactly matches an installed model (check with ollama list), and that the apiBase URL is correct. Open Continue’s output log in VS Code (View → Output → Continue) to see detailed error messages. If autocomplete is enabled but not appearing, check that ghost text is enabled in VS Code settings: "editor.inlineSuggest.enabled": true.
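Those checks are quick to run from a terminal; the curl call hits Ollama's /api/tags endpoint, which lists the models the server currently has installed:
# Is the Ollama server reachable on the expected port?
curl http://localhost:11434/api/tags
# Does the model name in config.json exactly match one of these?
ollama list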
Why Continue + Ollama Is Worth the Setup
The combination of Continue and Ollama gives you a coding assistant that is genuinely comparable to GitHub Copilot in day-to-day utility, with three significant advantages: it is free, it is private (no code sent to external servers), and it is customisable to your exact workflow. GitHub Copilot charges a monthly subscription and sends your code to Microsoft’s servers for inference. Continue with a local Ollama model costs nothing beyond the hardware you already own and processes everything on your machine. For developers working on proprietary code, unreleased projects, or anything under an NDA, this is not just a cost consideration — it is a compliance requirement that a cloud-based coding assistant cannot meet.
The quality gap between local and cloud coding assistants has also narrowed substantially. Qwen2.5-Coder 7B on a modern GPU produces completions that are indistinguishable from Copilot for the majority of routine coding tasks — completing function bodies, adding error handling, generating docstrings, writing simple tests. The cloud model still has an edge on very complex multi-file reasoning and highly specialised tasks, but for the 80% of coding assistance that is pattern completion and boilerplate generation, a well-configured local model is more than sufficient.
Context Window Management
One setting worth tuning for coding use is contextLength in the model config. This controls how many tokens of context Continue passes to the model — including the conversation history, any referenced files, and the codebase search results. The default is usually conservative to keep inference fast, but for complex coding questions that involve long files, increasing it significantly improves answer quality.
{
  "models": [
    {
      "title": "Qwen2.5-Coder 7B",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "contextLength": 32768,
      "completionOptions": {
        "temperature": 0.2,
        "maxTokens": 2048
      }
    }
  ]
}
Set contextLength to match the num_ctx you have configured for the model in Ollama; if Continue sends more context than Ollama has allocated, the excess is silently truncated. Note that Ollama’s default allocation is typically far smaller than a model’s maximum, so you usually need to raise num_ctx explicitly. Qwen2.5-Coder 7B supports a 32K-token context window, which is enough for most real codebases when combined with semantic search to retrieve only the relevant files rather than loading everything at once.
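One way to raise Ollama's allocation is a small Modelfile; this is a sketch, and the qwen2.5-coder-32k tag is just an illustrative name:
# Build a variant of the model with a 32K context window allocated
cat > Modelfile <<'EOF'
FROM qwen2.5-coder:7b
PARAMETER num_ctx 32768
EOF
ollama create qwen2.5-coder-32k -f Modelfile
Point the model field in config.json at the new tag so that Continue's contextLength and Ollama's num_ctx agree.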
Team Setup: Sharing a Continue Configuration
Continue’s config file is just JSON, making it easy to standardise across a team. Commit a config.json to your repository’s .continue/ directory — Continue automatically picks up a project-level config from this location when you open the project in VS Code, overriding the user-level config for that workspace. This lets teams define a shared set of models, custom slash commands, and context settings that are version-controlled alongside the code they support.
A practical team config might include: the specific local model version to use (pinned, not latest), custom slash commands for the project’s review checklist and test generation patterns, and the shared Ollama server URL if the team is running inference on a shared GPU machine rather than each developer’s laptop. This last setup — a shared Ollama server that multiple developers point their Continue instances at — scales surprisingly well for small teams and eliminates the need for each developer to have a GPU-equipped machine to benefit from local LLM coding assistance.
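A sketch of what such a shared config might contain, assuming the project-level .continue/config.json location described above and a hypothetical shared Ollama host named gpu-box.internal:
{
  "models": [
    {
      "title": "Team Qwen2.5-Coder 7B",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://gpu-box.internal:11434"
    }
  ],
  "slashCommands": [
    {
      "name": "review",
      "description": "Apply the project review checklist",
      "prompt": "Review this code against our checklist: error handling, logging, input validation, and test coverage. List concrete findings."
    }
  ]
}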
Comparison with Other Local Coding Assistants
Continue is the most popular open-source coding assistant extension, but it is not the only one. Codeium has a free tier with cloud inference. Tabby is a self-hosted alternative with its own inference server. Copilot Chat with a local backend is possible via Continue’s proxy capabilities. Among these, Continue with Ollama has the best combination of active development, broad model support, and true local inference with no external dependencies. Tabby is worth considering if you want a dedicated coding-optimised inference server rather than Ollama’s general-purpose approach, but for most developers already using Ollama, adding Continue to VS Code is the path of least resistance to a capable local coding assistant.
Getting the Most Out of Continue Day-to-Day
A few habits make Continue significantly more useful in practice. First, get used to using Cmd+L to add context before asking questions — Continue without context is much weaker than Continue with the relevant file loaded. Second, use /edit rather than asking the model to describe changes: it is faster and applies the change directly to your file rather than requiring you to copy-paste. Third, when autocomplete suggestions are not what you want, type a comment above the code you are about to write describing what you want — Continue’s FIM model will use the comment as a strong signal for the completion. Something as simple as # parse the CSV and return a list of dicts with header names as keys dramatically improves the relevance of the suggestion that follows. Fourth, keep the autocomplete model small and fast — it runs on every keystroke, so a 1.5B model that responds in 200ms feels much more natural than a 7B model that takes 2 seconds, even if the 7B output is marginally better. Reserve the larger model for the chat sidebar where the latency is acceptable. These four habits combined make the difference between Continue feeling like a useful tool and feeling like background noise.