How to Use Ollama with Emacs

Emacs is one of the oldest and most extensible editors in existence, and its Lisp-based extension system makes it a surprisingly capable platform for local AI integration. Connecting Emacs to Ollama gives you AI assistance that runs entirely on your own hardware — inline completions, multi-turn chat, code explanation, and documentation generation — all without sending data to a cloud service. This guide covers the main Emacs packages for Ollama integration and how to write your own Elisp functions for custom workflows.

Emacs’s design philosophy — everything is text, everything is extensible — maps well onto LLM integration. You can send any buffer, region, or minibuffer input to Ollama and insert the response anywhere in the editor. The async process handling in Emacs allows Ollama requests to run in the background without freezing the UI, and the buffer system provides a natural place to display streaming responses.

Prerequisites

You need Ollama installed and running with at least one model pulled. On the Emacs side, Emacs 29 or later is the safest baseline: use-package ships with Emacs from version 29, the setopt macro used below was added in 29, and the native JSON parser that most Ollama packages rely on has been available since Emacs 27. With MELPA configured (or straight.el installed), installing the packages is straightforward. The examples in this guide use use-package syntax, which works with both the built-in package manager and straight.el.
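If you want to confirm these assumptions from inside Emacs, a small check along the following lines may help. It is only a sketch: the function name my/ollama-check-prereqs is hypothetical, and it inspects the editor rather than the Ollama server itself.

```elisp
;; Hypothetical helper; the name is not part of any package.
(defun my/ollama-check-prereqs ()
  "Return an alist describing whether this Emacs meets the guide's assumptions."
  (list (cons 'emacs-29-or-later (>= emacs-major-version 29))
        ;; curl is only needed for the hand-written integration.
        (cons 'curl-found (and (executable-find "curl") t))
        ;; `json-available-p' is missing in older releases, so guard the call.
        (cons 'native-json (and (fboundp 'json-available-p)
                                (json-available-p)
                                t))))
```

Evaluate (my/ollama-check-prereqs) with M-: to see which entries come back as t.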

Option 1: ellama

ellama is the most fully-featured Emacs package for Ollama. It provides commands for chat, code improvement, translation, summarisation, and more — all accessible via an interactive command palette. It uses the llm Emacs library as its backend abstraction, which means switching from Ollama to another provider later requires only a configuration change rather than code changes.

(use-package ellama
  :ensure t
  :init
  (require 'llm-ollama)
  (setopt ellama-provider
          (make-llm-ollama
           :host "localhost"
           :port 11434
           :chat-model "llama3.2"
           :embedding-model "nomic-embed-text")))

With ellama installed, the most useful commands are M-x ellama-chat for an interactive session, M-x ellama-code-complete to complete code at point, M-x ellama-summarize to summarise the current buffer or region, and M-x ellama-improve-grammar for prose editing. All commands stream their responses into dedicated buffers that persist for the session, so you can review earlier exchanges without re-running them.

Option 2: gptel

gptel is a universal LLM client for Emacs that supports Ollama alongside OpenAI, Anthropic, and other providers. Its approach centres on a dedicated chat buffer where you write prompts and receive responses inline, similar to how you would use a messaging application. It also supports sending the current region or buffer as context with any prompt.

(use-package gptel
  :ensure t
  :config
  ;; Register the backend once and make it the default.
  ;; Only chat-capable models belong in :models.
  (setq gptel-model 'llama3.2
        gptel-backend (gptel-make-ollama
                       "Ollama"
                       :host "localhost:11434"
                       :stream t
                       :models '(llama3.2
                                 qwen2.5-coder:7b))))

Open a gptel chat buffer with M-x gptel. Type your prompt at the bottom of the buffer and press C-c RET to send it. The response streams in below, with the conversation history automatically included in subsequent prompts. Use M-x gptel-send from any buffer to send the current region or a prompted query and have the response inserted at point — useful for inline code generation without switching away from your working buffer.

The :stream t setting enables streaming responses, which makes the interaction feel significantly more responsive — you see tokens appearing as they are generated rather than waiting for the entire response. For Ollama in particular, streaming is strongly recommended because some models can take 10 to 30 seconds for long responses, and watching tokens arrive incrementally is much more pleasant than staring at a frozen buffer waiting for a result to appear all at once.
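For scripted use, gptel also provides the gptel-request function, which takes a prompt and a callback. The sketch below asks for a one-shot, non-streaming reply and shows it in a separate buffer. The command name my/gptel-explain-region and the buffer name are hypothetical, and the code assumes gptel is installed and configured as above.

```elisp
;; Sketch only: assumes gptel is installed with a default backend configured.
(defun my/gptel-explain-region (start end)
  "Ask the current gptel backend to explain the text between START and END."
  (interactive "r")
  (require 'gptel)
  (let ((code (buffer-substring-no-properties start end)))
    (gptel-request
     (format "Explain the following code:\n\n%s" code)
     ;; Deliver the reply in one piece instead of token by token.
     :stream nil
     :callback
     (lambda (response info)
       (if (stringp response)
           (with-current-buffer (get-buffer-create "*gptel explanation*")
             (erase-buffer)
             (insert response)
             (display-buffer (current-buffer)))
         (message "gptel request failed: %s" (plist-get info :status)))))))
```

With streaming switched off the callback runs once, which keeps the example short at the cost of the wait described above.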

Option 3: org-ai

If you use Org Mode extensively, org-ai integrates LLM interaction directly into Org buffers. You write prompts as special Org blocks and the response appears inline in the document — a natural fit for note-taking, research, and literate programming workflows.

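(use-package org-ai
  :ensure t
  :hook (org-mode . org-ai-mode)
  :config
  ;; Assumption: org-ai talks to Ollama's OpenAI-compatible endpoint.
  ;; Variable names can differ between org-ai releases, so confirm them
  ;; with C-h v before relying on this.
  (setq org-ai-default-chat-model "llama3.2"
        org-ai-openai-chat-endpoint "http://localhost:11434/v1/chat/completions"
        ;; Placeholder value; Ollama does not check the key.
        org-ai-openai-api-token "ollama"))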

In any Org buffer, type a #+begin_ai line, your prompt, and a closing #+end_ai line (a snippet or keyboard macro saves the typing). With point inside the block, press C-c C-c to send it. The response appears below your prompt within the same block, and you can continue the conversation by adding more text below the response and running C-c C-c again. This threaded conversation style inside an Org document makes it easy to mix AI-generated content with your own notes and keep everything in a single file.

Writing Your Own Elisp Integration

For full control, writing a minimal Elisp integration is straightforward. Emacs’s url-retrieve and make-process provide async HTTP capabilities, but the simplest approach for Ollama is using curl via start-process — the same pattern used under the hood by many Emacs packages:

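;; The closures below need lexical binding: put
;; -*- lexical-binding: t; -*- on the first line of the file.
(defun ollama-chat (prompt callback)
  "Send PROMPT to Ollama and call CALLBACK with each token as it arrives."
  (let* ((body (json-encode
                `((model . "llama3.2")
                  (messages . [((role . "user") (content . ,prompt))])
                  (stream . t))))
         ;; Partial line carried over between output chunks.
         (pending "")
         ;; Talk to curl over a pipe rather than a pty.
         (process-connection-type nil)
         (proc (start-process
                "ollama-chat" nil
                "curl" "-sS" "-X" "POST"
                "http://localhost:11434/api/chat"
                "-H" "Content-Type: application/json"
                "-d" body)))
    (set-process-filter
     proc
     (lambda (_proc output)
       ;; A chunk can end mid-line, so parse complete lines only and
       ;; keep the unfinished tail for the next call.
       (let ((lines (split-string (concat pending output) "\n")))
         (setq pending (car (last lines)))
         (dolist (line (butlast lines))
           (condition-case nil
               (let* ((data (json-read-from-string line))
                      (token (cdr (assoc 'content
                                         (cdr (assoc 'message data))))))
                 (when token (funcall callback token)))
             (error nil))))))
    proc))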

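(defun ollama-insert-at-point (prompt)
  "Ask Ollama PROMPT and stream the response into the current buffer at point."
  (interactive "sAsk Ollama: ")
  (let ((buf (current-buffer))
        ;; A marker stays correct when text elsewhere in the buffer changes.
        (pos (point-marker)))
    ;; Make the marker advance past each inserted token.
    (set-marker-insertion-type pos t)
    (ollama-chat
     prompt
     (lambda (token)
       (when (buffer-live-p buf)
         (with-current-buffer buf
           (save-excursion
             (goto-char pos)
             (insert token))))))))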

The condition-case nil ... (error nil) wrapper around the JSON parsing silently ignores any line that cannot be parsed as JSON, such as an empty line. The final object Ollama sends has done set to true, carries timing statistics, and contains an empty content string, so it adds nothing to the buffer. The save-excursion with position tracking ensures tokens are inserted where the request started even if you move the cursor during generation.
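To make the three cases above concrete, the parsing step can be pulled out into a function of its own and tried on sample lines. This is a sketch: the name my/ollama-token-from-line is hypothetical, and the sample strings are hand-written stand-ins that include only the fields the parser reads.

```elisp
(require 'json)

;; Hypothetical helper name; not part of Ollama or any Emacs package.
(defun my/ollama-token-from-line (line)
  "Return the content string carried by one streamed LINE, or nil if unparsable."
  (condition-case nil
      (let ((data (json-read-from-string line)))
        (cdr (assq 'content (cdr (assq 'message data)))))
    (error nil)))

;; An ordinary streamed line carries one fragment of the reply: returns "Hi".
(my/ollama-token-from-line
 "{\"message\":{\"role\":\"assistant\",\"content\":\"Hi\"},\"done\":false}")

;; The closing line has done set to true and empty content: returns "".
(my/ollama-token-from-line
 "{\"message\":{\"role\":\"assistant\",\"content\":\"\"},\"done\":true}")

;; Truncated input signals a JSON error, which is turned into nil.
(my/ollama-token-from-line "{\"message\":{\"role\":")
```

Keeping the parser separate from the process filter also makes it easy to test without a running Ollama server.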

Multi-Turn Chat Buffer

Extend the basic integration into a proper multi-turn chat interface using a dedicated buffer:

(defvar ollama--history '()
  "Conversation history, newest first, as a list of message alists.
Each element looks like ((role . \"user\") (content . \"...\")).")

;; The closures below need lexical binding: put
;; -*- lexical-binding: t; -*- on the first line of the file.
(defun ollama-send-message (user-msg)
  "Add USER-MSG to history, send to Ollama, stream response into *Ollama Chat* buffer."
  (push `((role . "user") (content . ,user-msg)) ollama--history)
  (let* ((messages (vconcat (reverse ollama--history)))
         (body (json-encode
                `((model . "llama3.2")
                  (messages . ,messages)
                  (stream . t))))
         (buf (get-buffer-create "*Ollama Chat*"))
         ;; Partial line carried over between output chunks.
         (pending "")
         (full-response "")
         ;; Talk to curl over a pipe rather than a pty.
         (process-connection-type nil))
    (with-current-buffer buf
      (goto-char (point-max))
      (insert (format "\nYou: %s\n\nAssistant: " user-msg)))
    (let ((proc (start-process "ollama" nil "curl" "-sS" "-X" "POST"
                               "http://localhost:11434/api/chat"
                               "-H" "Content-Type: application/json"
                               "-d" body)))
      (set-process-filter
       proc
       (lambda (_p output)
         ;; Parse complete lines only; keep the unfinished tail.
         (let ((lines (split-string (concat pending output) "\n")))
           (setq pending (car (last lines)))
           (dolist (line (butlast lines))
             (condition-case nil
                 (let* ((data (json-read-from-string line))
                        (token (cdr (assoc 'content (cdr (assoc 'message data))))))
                   (when token
                     (setq full-response (concat full-response token))
                     (with-current-buffer buf
                       (save-excursion
                         (goto-char (point-max))
                         (insert token)))))
               (error nil))))))
      (set-process-sentinel
       proc
       (lambda (p _event)
         ;; Record the reply once curl has exited and produced some text.
         (when (and (memq (process-status p) '(exit signal))
                    (not (string= full-response "")))
           (push `((role . "assistant") (content . ,full-response))
                 ollama--history)))))))

(defun ollama-chat-interactive ()
  "Open *Ollama Chat* and start a conversation."
  (interactive)
  (pop-to-buffer (get-buffer-create "*Ollama Chat*"))
  (let ((msg (read-string "You: ")))
    (ollama-send-message msg)))

The history is stored as a global variable for simplicity — in a more complete implementation you would make it buffer-local so each chat buffer maintains its own independent conversation. The sentinel function on the process appends the full assistant response to history once streaming completes, ensuring subsequent requests include the complete conversation context rather than just partial tokens.
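A buffer-local version of the history could look something like the sketch below. The my/ names are hypothetical; the variable would take the place of the global ollama--history, and the two helpers mirror the push and vconcat steps used in ollama-send-message.

```elisp
;; Sketch under assumptions: these names are illustrative, not from a package.
(defvar-local my/ollama-history nil
  "Conversation for the current chat buffer, newest message first.")

(defun my/ollama-history-add (role content)
  "Record a message with ROLE and CONTENT in the current buffer's history."
  (push `((role . ,role) (content . ,content)) my/ollama-history))

(defun my/ollama-history-messages ()
  "Return the current buffer's history, oldest first, as a vector for `json-encode'."
  (vconcat (reverse my/ollama-history)))
```

Process filters and sentinels can run while a different buffer is current, so calls to these helpers from a callback would need to be wrapped in with-current-buffer on the chat buffer.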

Useful Keybindings

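Bind the most common Ollama commands to memorable key sequences. The C-c a prefix is a reasonable choice: Emacs reserves C-c followed by a letter for user bindings, so major modes should not claim it, and it is easy to type. If you already use C-c a for org-agenda, pick another letter: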

(global-set-key (kbd "C-c a o") #'ollama-chat-interactive)
(global-set-key (kbd "C-c a i") #'ollama-insert-at-point)

;; Send region to Ollama with a custom prompt
(defun ollama-explain-region (start end)
  "Send selected region to Ollama asking for an explanation."
  (interactive "r")
  (let ((text (buffer-substring-no-properties start end))
        (out (get-buffer-create "*Ollama Explanation*")))
    ;; Reset and show the buffer once; special-mode makes q close the window.
    (with-current-buffer out
      (special-mode)
      (let ((inhibit-read-only t))
        (erase-buffer)))
    (display-buffer out)
    (ollama-chat
     (format "Explain the following code:\n\n%s" text)
     (lambda (token)
       (when (buffer-live-p out)
         (with-current-buffer out
           (let ((inhibit-read-only t))
             (goto-char (point-max))
             (insert token))))))))

(global-set-key (kbd "C-c a e") #'ollama-explain-region)

The explain-region command resets a dedicated *Ollama Explanation* buffer, displays it once, and then appends tokens as they arrive. This keeps the explanation separate from your working buffer so you can read it alongside your code without the response interfering with your file. Because the buffer is in special-mode, pressing q in it closes the window when you are done.

Integrating with Company Mode

If you use Company Mode for completions, you can write a custom Company backend that sends the text before point to Ollama and returns the completion as a candidate. The simplest version calls Ollama synchronously with url-retrieve-synchronously, which blocks Emacs until the reply arrives; Company also supports asynchronous backends if that becomes a problem. Set company-idle-delay to nil and bind company-complete to a key of your choice, such as M-TAB, so that completions are triggered manually rather than firing an Ollama request on every keystroke, which would make the editor feel unresponsive. A manual trigger combined with a fast model like qwen2.5-coder:7b gives you AI completions that feel like a natural part of the Company workflow rather than an intrusive overlay.
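The two pieces such a backend needs before it can call Ollama are the text before point and a request body. The sketch below covers only those pieces, not the Company backend itself. The function names are hypothetical, and the 2000-character window and 64-token cap are illustrative assumptions rather than recommended values.

```elisp
(require 'json)

;; Hypothetical helpers; names and limits are assumptions for illustration.
(defun my/ollama-text-before-point (&optional max-chars)
  "Return up to MAX-CHARS (default 2000) characters preceding point."
  (let ((limit (or max-chars 2000)))
    (buffer-substring-no-properties
     (max (point-min) (- (point) limit))
     (point))))

(defun my/ollama-completion-body (context)
  "Return a JSON request body for /api/generate that continues CONTEXT."
  ;; stream is false so the reply arrives as one object, and
  ;; num_predict caps the length of the completion.
  (json-encode
   `((model . "qwen2.5-coder:7b")
     (prompt . ,context)
     (stream . :json-false)
     (options . ((num_predict . 64))))))
```

A short context window and a low token cap both help keep a blocking request brief, which matters most when the call is synchronous.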

Model Selection and Performance

The same model selection logic that applies to Neovim applies to Emacs. For interactive completions triggered manually, a 3B to 7B parameter model gives fast enough responses that the interaction does not feel sluggish. For chat and explanation tasks where quality matters more than speed, a larger model produces noticeably better results. Configure ellama or gptel with multiple named backends pointing at different models, and switch between them with M-x gptel-menu or M-x ellama-provider-select depending on the task at hand.
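For ellama, named providers can be declared through the ellama-providers option, which M-x ellama-provider-select then offers as choices. The fragment below is a configuration sketch: the labels "fast" and "general" are arbitrary, and the form is meant to sit alongside the ellama-provider setting inside the use-package block shown earlier.

```elisp
;; Config sketch: labels are assumptions; use models you have already pulled.
(require 'llm-ollama)
(setopt ellama-providers
        '(("fast"    . (make-llm-ollama :chat-model "qwen2.5-coder:7b"))
          ("general" . (make-llm-ollama :chat-model "llama3.2"))))
```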

Keep the model loaded between requests by setting a long keep_alive in your Ollama configuration. For Emacs users who tend to keep their editor open for hours or days at a stretch, setting OLLAMA_KEEP_ALIVE=-1 keeps the model in memory indefinitely until Ollama is restarted. The memory cost is the model’s VRAM footprint — typically 4 to 8 GB for a 7B model at Q4 quantisation — which is worth the trade-off for eliminating the cold-start delay on every first request after a period of inactivity.
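As an alternative to the environment variable, keep_alive can also be sent per request. The sketch below preloads a model by posting a body with no prompt to /api/generate, which loads the model without generating anything. The function names are hypothetical and the code assumes Ollama is listening on localhost:11434.

```elisp
(require 'json)

;; Hypothetical helper names; assumes the default Ollama host and port.
(defun my/ollama-preload-body (model)
  "Return a JSON body that loads MODEL and keeps it resident indefinitely."
  ;; With no prompt the model is only loaded; keep_alive -1 means no expiry.
  (json-encode `((model . ,model) (keep_alive . -1))))

(defun my/ollama-preload (model)
  "Ask Ollama to load MODEL now so the first real request is fast."
  (interactive "sModel to preload: ")
  ;; Talk to curl over a pipe rather than a pty.
  (let ((process-connection-type nil))
    (start-process "ollama-preload" nil
                   "curl" "-sS" "-X" "POST"
                   "http://localhost:11434/api/generate"
                   "-H" "Content-Type: application/json"
                   "-d" (my/ollama-preload-body model))))
```

Calling (my/ollama-preload "llama3.2") from an init file or a startup hook would warm the model when Emacs starts.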

Which Package to Choose

For most Emacs users, gptel is the best starting point — it is actively maintained, has the broadest provider support, and its chat buffer model is immediately intuitive. If you are heavily invested in Org Mode, org-ai integrates more naturally into the way you already work with documents. If you want the richest command set out of the box — summarise, translate, improve, explain — ellama provides more named commands than any other package and the llm backend abstraction makes it easy to experiment with different models and providers.

The custom Elisp approach is worth considering if you have specific workflow requirements that none of the packages support, or if you simply prefer to keep your Emacs configuration minimal and understand every line of it. The core integration — an async curl process with a filter function that parses streaming JSON and inserts tokens into a buffer — is about 30 lines of Elisp and provides a solid foundation for building whatever workflow fits your needs.

Using the llm Emacs Library Directly

The llm package that ellama uses as its backend is also useful on its own as a foundation for custom integrations. It provides a clean, provider-agnostic API for making LLM requests in Elisp, so code you write against it works with Ollama today and can be switched to any other supported provider with a single configuration change. The key functions are llm-chat for blocking requests, llm-chat-async for non-blocking requests with a callback, and llm-chat-streaming for streaming token-by-token responses.

Using llm-chat-streaming with an Ollama backend gives you the same streaming behaviour as the raw curl approach but with error handling and provider abstraction built in. The partial callback receives the whole response so far, a string that grows with each new token, rather than individual token strings. You therefore replace the previously inserted text with the latest version instead of appending to it, which also means you never have to track where the last token ended.
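The replace-rather-than-append pattern might be sketched as follows. The my/ names and the buffer name are hypothetical, PROVIDER is assumed to be an object such as the one make-llm-ollama returns, and llm-make-chat-prompt assumes a reasonably recent release of the llm package.

```elisp
;; Sketch under assumptions; the closures need lexical-binding: t in the file.
(defun my/llm-replace-from (start text)
  "Replace everything from marker START to its buffer's end with TEXT."
  (with-current-buffer (marker-buffer start)
    (save-excursion
      (delete-region start (point-max))
      (goto-char start)
      (insert text))))

(defun my/llm-stream-to-buffer (provider question)
  "Stream PROVIDER's answer to QUESTION into the *llm answer* buffer."
  (require 'llm)
  (let* ((buf (get-buffer-create "*llm answer*"))
         (start (with-current-buffer buf (copy-marker (point-max)))))
    (display-buffer buf)
    (llm-chat-streaming
     provider
     (llm-make-chat-prompt question)
     ;; Partial callback: receives the text generated so far.
     (lambda (partial) (my/llm-replace-from start partial))
     ;; Response callback: receives the complete text.
     (lambda (final) (my/llm-replace-from start final))
     ;; Error callback: receives an error type and a message.
     (lambda (_type msg) (message "llm error: %s" msg)))))
```

Because each call rewrites the region after the marker, a late or repeated partial result cannot leave duplicated text behind.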

Doom Emacs and Spacemacs Configuration

If you use Doom Emacs, declare the packages in packages.el with package! and configure them in config.el using the same use-package syntax shown above, minus :ensure t since Doom installs packages itself. Doom wraps use-package with its own use-package! macro that accepts the same keywords. Note that :after controls load order rather than where keybindings apply: :after org delays loading a package until Org has loaded. To restrict Ollama keybindings to particular buffers, bind them in a mode's keymap instead, for example prog-mode-map for programming buffers or org-mode-map for Org files.
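A Doom setup along those lines might look like the fragment below. It is a configuration sketch that only works inside Doom, the file locations are the usual defaults, and it assumes ollama-explain-region from earlier in this guide is defined in config.el.

```elisp
;; In packages.el
(package! gptel)

;; In config.el
(use-package! gptel
  :defer t
  :config
  (setq gptel-model 'llama3.2
        gptel-backend (gptel-make-ollama "Ollama"
                        :host "localhost:11434"
                        :stream t
                        :models '(llama3.2))))

;; Bind the explain command in programming buffers only.
(map! :map prog-mode-map
      "C-c a e" #'ollama-explain-region)
```

Run doom sync after editing packages.el so the package is actually installed.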

Spacemacs users can install the packages via the dotspacemacs-additional-packages list in .spacemacs and configure them in the dotspacemacs/user-config function. Most of the SPC leader keymap is already taken, but Spacemacs reserves SPC o for user-defined bindings, so placing Ollama commands under SPC o a avoids conflicts with built-in Spacemacs bindings. Both distributions defer loading where they can, so the Ollama packages are typically loaded only when their commands are first called and startup time is largely unaffected.
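In Spacemacs the leader bindings could be declared roughly as follows. This is a configuration sketch that only works inside Spacemacs, the key letters after SPC o a are arbitrary choices, and the three commands are the ones defined earlier in this guide.

```elisp
;; Inside dotspacemacs/layers in ~/.spacemacs:
;;   dotspacemacs-additional-packages '(gptel ellama)

;; Inside dotspacemacs/user-config:
(spacemacs/declare-prefix "oa" "ollama")
(spacemacs/set-leader-keys
  "oac" #'ollama-chat-interactive
  "oai" #'ollama-insert-at-point
  "oae" #'ollama-explain-region)
```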

Combining Ollama with Emacs Org Mode for Research

One of the most productive ways to use Ollama with Emacs is alongside Org Mode for research and note-taking. Create a dedicated Org file as your AI research notebook — use headings to organise topics, #+begin_ai blocks (with org-ai) or inline gptel calls to query Ollama about specific questions, and Org’s tagging and agenda systems to track which questions have been answered and which still need follow-up.

Because all the AI interactions are stored as plain text in an Org file, they are searchable with M-x occur and helm-occur, linkable with Org’s internal link syntax, and exportable to HTML, PDF, or Markdown through Org’s export dispatcher (C-c C-e). The result is a research workflow where AI assistance is fully integrated into your note-taking rather than living in a separate chat application that you have to switch to and from. Everything stays in Emacs, everything is plain text, and everything is searchable, which is exactly the kind of workflow that makes Emacs users reluctant to switch to other tools.
