How to Use Ollama with the Cursor Editor

Cursor is an AI-first code editor built on VS Code that supports custom AI backends. By pointing Cursor at your local Ollama instance, you can use any Ollama model for code completion, chat, and inline edits — with no API keys, no monthly subscription, and all code staying on your machine. This guide covers the complete Cursor + Ollama setup and the key workflows.

Why Cursor with Ollama

Cursor’s default AI features (Tab completion, Chat, Cmd+K inline editing) use Anthropic and OpenAI models via cloud APIs. Switching to Ollama replaces all of these with a local model: no code sent to external servers, no per-token billing, and offline capability. The trade-off is quality: local 7–13B models are less capable than Claude 3.5 Sonnet or GPT-4o on hard coding tasks. But for the majority of everyday coding work (autocomplete, boilerplate generation, explaining code, writing tests), a well-configured Qwen2.5-Coder 7B or CodeLlama 13B delivers genuinely useful results, and on good local hardware responses often arrive faster than a cloud round trip.

Configuring Cursor to Use Ollama

Cursor supports OpenAI-compatible API endpoints. Since Ollama exposes an OpenAI-compatible API at /v1, you can configure Cursor to use it directly:

  1. Open Cursor Settings (Cmd+, on Mac, Ctrl+, on Windows)
  2. Navigate to Models → OpenAI API Key
  3. Set API Key to ollama (any non-empty string works)
  4. Set Base URL to http://localhost:11434/v1
  5. Under Models, type the model name exactly as it appears in ollama list (e.g. qwen2.5-coder:7b)
  6. Click Verify to confirm the connection

If Verify succeeds, Cursor’s Chat panel will use your local Ollama model. For Tab completion (inline autocomplete), Cursor uses a separate model — configure it under Tab Completion settings and point it at the same endpoint with a fast model.
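
If Verify fails, it helps to confirm the endpoint responds outside Cursor first. A quick terminal check against Ollama's OpenAI-compatible chat endpoint (substitute whatever model name ollama list shows for you):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:7b",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'

A JSON response containing a choices array means the endpoint is working and any remaining problem is in the Cursor settings.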

Best Models for Cursor

# For Chat (quality matters more than speed)
ollama pull qwen2.5-coder:7b      # Best 7B coding model
ollama pull codellama:13b         # Good quality, larger
ollama pull mistral-nemo:12b      # Versatile, handles code well

# For Tab completion (speed matters more than quality)
ollama pull qwen2.5-coder:1.5b    # Very fast, decent quality
ollama pull deepseek-coder:1.3b   # Fast, purpose-built for completion

Use a small fast model (1.3–1.5B) for Tab completion so suggestions appear within 200–500ms without disrupting typing rhythm. Use a larger model (7B+) for Chat where you trigger it intentionally and can tolerate 5–15 second response times for complex questions.
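
If you are unsure whether a model is fast enough for Tab completion on your hardware, a rough terminal check is to time a short generation for each candidate (a crude smoke test, not a proper benchmark; the first run includes model load time):

time ollama run qwen2.5-coder:1.5b "Complete this Python function: def fibonacci(n):"
time ollama run qwen2.5-coder:7b "Complete this Python function: def fibonacci(n):"

Run each twice and compare the second (warm) timings; the 1.5B model should come back several times faster.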

Cursor Chat with Local Model

Open the Cursor Chat panel (Cmd+L) and type questions about your code — the local model responds using the code context Cursor automatically includes. Useful prompts with a local model:

  • “Explain what this function does” (select the function first with Cmd+Shift+L to add to context)
  • “Write a unit test for the selected code”
  • “Refactor this to use async/await”
  • “What could cause this error?” (paste an error message)
  • “Add type hints to this Python function”

Cmd+K Inline Editing

Cmd+K opens an inline edit prompt in Cursor. Select a code block, press Cmd+K, type what you want (“convert to list comprehension”, “add error handling”, “optimise this loop”), and the model rewrites the selection. With a local 7B coding model this works well for targeted rewrites of 10–50 lines — less well for complex multi-file refactors that require understanding more context than the local model’s context window allows.

Configuring Context Window

Create an Ollama Modelfile to set a larger context window for your Chat model — Cursor sends significant code context and larger windows produce better responses:

cat > Modelfile.cursor << 'EOF'
FROM qwen2.5-coder:7b
PARAMETER num_ctx 16384
PARAMETER temperature 0.2
EOF
ollama create cursor-coder -f Modelfile.cursor

Then set cursor-coder as the Chat model in Cursor settings. The 16K context window allows Cursor to include more of your codebase in each request, improving the relevance of responses for questions that span multiple files or require understanding broader project context.
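
To confirm the new model picked up the parameters, inspect it from the terminal; ollama show prints a Parameters section that should include the num_ctx override:

ollama list | grep cursor-coder   # the new model should appear
ollama show cursor-coder          # Parameters section should list num_ctx 16384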

Performance Tips

For the best Cursor + Ollama experience: pre-load your Chat model so it is already in VRAM before you start a session (ollama run cursor-coder "" & in a terminal); use a code-specialised model rather than a general chat model (Qwen2.5-Coder significantly outperforms Llama 3.2 on coding tasks despite similar parameter counts); and disable AI features you do not use to reduce API calls. Cursor has several always-on AI features that generate background requests, and with a local backend each request competes for GPU time; if Tab completion and Chat use different models, they also compete for the VRAM needed to keep both loaded.
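
If you prefer not to keep a background ollama run process around, Ollama's generate API accepts a keep_alive option: calling it with a model name and no prompt loads the model into memory and keeps it resident for the given duration (60m here is an arbitrary choice; the default unload timeout is about five minutes). The OLLAMA_KEEP_ALIVE environment variable sets the same default server-wide:

# Load cursor-coder into VRAM and keep it resident for an hour
curl http://localhost:11434/api/generate -d '{"model": "cursor-coder", "keep_alive": "60m"}'

# Or set a server-wide default before starting Ollama
export OLLAMA_KEEP_ALIVE=60m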

Privacy and Offline Use

With Ollama as the backend, Cursor's AI features process your code entirely locally. No code is sent to Anthropic, OpenAI, or any other external service. This makes the setup suitable for proprietary codebases, client work with strict NDA requirements, or development in environments with restricted internet access. Cursor still phones home for telemetry and licence validation, but the AI inference itself is local. For fully offline use, pull your models before going offline — once pulled, Ollama serves them from local storage with no network access required.
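
To prepare for offline work, pull both your Chat and Tab-completion models while you still have a connection, then confirm they are in local storage:

# Pull everything you need while online
ollama pull qwen2.5-coder:7b     # Chat model
ollama pull qwen2.5-coder:1.5b   # Tab completion model
ollama list                      # both should appear, served locally from now on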

Local vs Cloud AI in Cursor: An Honest Assessment

The honest comparison: cloud models (Claude 3.5 Sonnet, GPT-4o) are meaningfully better than local 7–13B models for hard coding tasks — complex architecture questions, debugging subtle multi-file issues, generating code in obscure frameworks. If you are doing this kind of work regularly, the cloud models' quality advantage is real and worth paying for. The local model advantage comes from three places: privacy (proprietary code never leaves your machine), cost (no per-token billing for heavy users), and speed (local inference on a good GPU can be faster than round-trip cloud API latency for shorter completions). For many everyday coding tasks — explaining code, writing boilerplate, generating tests for straightforward functions, simple refactors — a local 7B code model is genuinely sufficient, and the privacy and cost benefits make local the better default.

A practical approach many developers use is configuring Cursor with a local model as the primary backend and keeping a cloud API key available for the hard tasks that genuinely need it. Cursor makes switching models easy — you can change the model mid-conversation from the Chat panel dropdown without any configuration changes. This hybrid approach — local for 80% of the work, cloud for the 20% that needs it — captures most of the privacy and cost benefits while preserving access to frontier model capability when it matters.

Cursor vs Continue for Local AI Coding

Both Cursor and Continue (covered in an earlier article on this blog) support local Ollama backends. The key difference is the editor: Cursor is a standalone editor (VS Code fork) with deep AI integration at every level of the UI, while Continue is a plugin that adds AI capabilities to existing VS Code or JetBrains installations. If you are already using VS Code and do not want to switch editors, Continue is the natural choice. If you are open to a different editor and want the most deeply integrated AI coding experience, Cursor's purpose-built AI features (Composer for multi-file edits, @codebase context for repository-wide Q&A, inline diff review) go beyond what Continue provides as a plugin. Both work with Ollama — the choice is about editor preference and how deeply you want AI integrated into your coding workflow.

Troubleshooting Common Issues

  • Cursor cannot connect to Ollama: verify Ollama is running (curl http://localhost:11434/), check that the base URL in Cursor settings is exactly http://localhost:11434/v1 (not /api), and ensure the model name in Cursor exactly matches the name shown by ollama list, including the tag (e.g. qwen2.5-coder:7b, not just qwen2.5-coder).
  • Responses are slow: pre-load the model before starting Cursor (run a quick ollama run [model] "" &), use a smaller model for Tab completion, and ensure the model is running on GPU rather than CPU (check ollama ps; the PROCESSOR column should show GPU or Metal, not CPU).
  • The model runs on CPU unexpectedly: it is probably too large for your GPU's VRAM; try a smaller quantisation (Q4_K_M) or a smaller model.
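
The checks above condense to four terminal commands (the /v1/models endpoint is part of Ollama's OpenAI-compatible API and lists the models Cursor can see):

curl http://localhost:11434/            # should respond "Ollama is running"
curl http://localhost:11434/v1/models   # models available through the OpenAI-compatible API
ollama list                             # names here must match Cursor's model setting exactly
ollama ps                               # PROCESSOR should show GPU or Metal, not CPU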

Getting Started

Pull a code-specialised model (ollama pull qwen2.5-coder:7b), open Cursor Settings, set the base URL to http://localhost:11434/v1 with API key ollama, add your model name, and click Verify. Within five minutes you have a fully local AI coding assistant integrated into your editor. Evaluate quality on your typical coding tasks for a week to decide whether the local model meets your bar, and upgrade to a larger model or switch to cloud for tasks where it falls short. The zero-cost, fully private setup is worth trying even if you ultimately use it only for routine tasks while keeping cloud access for complex ones.

The Bigger Picture: AI Editors in 2026

The AI editor landscape in 2026 includes Cursor, GitHub Copilot in VS Code, Continue, Windsurf, and several others — all competing to be the primary AI-assisted coding interface for developers. What distinguishes Cursor's Ollama integration from most alternatives is that it supports local models as a first-class option rather than an afterthought. The same Chat panel, Cmd+K, and Tab completion features all work with a local model, giving you the full Cursor experience without any cloud dependency. For developers who have already adopted Cursor for its AI features but have concerns about code privacy, the Ollama integration provides a path to keeping the Cursor UX while moving inference local. And for developers who have not yet adopted an AI editor because of privacy concerns, Cursor with Ollama is the most integrated local AI coding experience currently available — more seamlessly integrated than Continue in VS Code, and with a better built-in interface than configuring raw API endpoints in other editors.

The trajectory of local model quality suggests that the quality gap between local and cloud models for coding tasks will narrow over the next year or two. Models like Qwen2.5-Coder 32B already approach GPT-4o quality on coding benchmarks for developers with hardware to run them. As smaller models improve and hardware becomes more capable, the case for local coding AI strengthens — not as a compromise compared to cloud, but as a genuine equal that also happens to be private, free to use, and available offline.

The combination of Cursor's polished AI editor UX and Ollama's local inference is currently the most developer-friendly path to private AI-assisted coding. Both tools are actively developed, both have strong communities, and the integration between them is stable and well-supported. Setting it up takes under 15 minutes, and the result is a coding environment where AI assistance is available without sending your code anywhere — a setup that respects both your productivity needs and your code's privacy.

As the AI editor space evolves and local models improve, the Cursor + Ollama combination only becomes more compelling: the best available AI editor experience paired with fully local, private inference. For routine coding tasks the quality gap is already small enough that many developers find local models sufficient for daily work. Wherever you ultimately land on the local-versus-cloud trade-off, the setup described in this article costs nothing beyond your hardware and is worth trying.
