How to Build a Local AI TUI with Ollama

A terminal user interface gives you the interactivity of a chat application without leaving the command line. Building one for Ollama means you get a keyboard-driven, full-screen local AI assistant that starts in milliseconds, requires no browser, and works over SSH. This guide walks through building a polished TUI chat application in Python using the Textual framework, which provides a rich component library for building terminal apps that look and feel like modern desktop software — complete with panels, scrollable chat history, input fields, and streaming token display.

Textual is a strong choice for a Python TUI: it is actively developed, well documented, and its async-first design integrates cleanly with Ollama's streaming API. The result is a responsive full-screen chat interface that handles token streaming without blocking the interface, supports keyboard shortcuts, and renders cleanly in any modern terminal emulator.

Setup

Install the dependencies:

pip install textual httpx

Textual handles the full-screen terminal rendering, keyboard input, and layout. httpx provides the async HTTP client for streaming from Ollama. No other dependencies are needed.

Basic App Structure

A Textual app is a Python class that inherits from App. Define the layout using CSS-like styles and compose it from built-in widgets:

from textual.app import App, ComposeResult
from textual.widgets import Header, Footer, Input, RichLog

class OllamaTUI(App):
    CSS = """
    Screen {
        layout: vertical;
    }
    #chat-log {
        height: 1fr;
        border: solid $primary;
        padding: 1;
    }
    #input-bar {
        height: 3;
    }
    """
    # Start with the cursor in the input bar rather than in the (also focusable) chat log
    AUTO_FOCUS = "#input-bar"
    BINDINGS = [
        # Recent Textual versions use Ctrl+C for copy inside Input, so quit with Ctrl+Q
        ("ctrl+q", "quit", "Quit"),
        ("ctrl+r", "clear_history", "Clear"),
    ]

    def compose(self) -> ComposeResult:
        yield Header()
        yield RichLog(id="chat-log", markup=True, wrap=True)
        yield Input(id="input-bar", placeholder="Ask Ollama... (Enter to send)")
        yield Footer()

if __name__ == "__main__":
    OllamaTUI().run()

Save this as tui.py and run python tui.py to get a full-screen terminal app with a scrollable log area and an input field at the bottom. The CSS snippet above uses Textual's layout system: height: 1fr makes the chat log fill all the vertical space not taken by the fixed-height input bar, so the input always sits directly above the footer regardless of terminal height. The input bar is deliberately not docked: the Footer is already docked to the bottom edge, and a second widget docked there would overlap it.

Streaming Ollama Responses

Handle the input submission and stream Ollama’s response into the chat log using Textual’s worker system, which runs async tasks without blocking the UI:

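import json

import httpx
from rich.markup import escape
from rich.text import Text
from textual.app import App
from textual.widgets import Input, RichLog

OLLAMA_URL = "http://localhost:11434"
MODEL = "llama3.2"

class OllamaTUI(App):
    # ... (CSS, BINDINGS and compose() as above)

    def __init__(self):
        super().__init__()
        self.history = []            # full conversation, sent with every request
        self.current_model = MODEL   # updated by the model selector described below

    def on_input_submitted(self, event: Input.Submitted) -> None:
        prompt = event.value.strip()
        if not prompt:
            return
        self.query_one("#input-bar", Input).clear()
        log = self.query_one("#chat-log", RichLog)
        # escape() stops square brackets in the prompt from being read as Rich markup
        log.write(f"[bold cyan]You:[/] {escape(prompt)}")
        self.history.append({"role": "user", "content": prompt})
        self.run_worker(self.stream_response(), exclusive=True)

    async def stream_response(self) -> None:
        log = self.query_one("#chat-log", RichLog)
        log.write("[bold green]Assistant:[/]")
        full = ""
        pending = ""  # text received since the last newline
        try:
            async with httpx.AsyncClient(timeout=120) as client:
                async with client.stream(
                    "POST", f"{OLLAMA_URL}/api/chat",
                    json={"model": self.current_model, "messages": self.history, "stream": True},
                ) as resp:
                    resp.raise_for_status()
                    # Ollama streams newline-delimited JSON, one object per chunk
                    async for line in resp.aiter_lines():
                        if not line:
                            continue
                        chunk = json.loads(line)
                        token = chunk.get("message", {}).get("content", "")
                        full += token
                        pending += token
                        # RichLog.write() has no end= argument and always starts a new
                        # line, so only completed lines are written while streaming
                        *done_lines, pending = pending.split("\n")
                        for text in done_lines:
                            log.write(Text(text))  # Text() renders model output literally
                        if chunk.get("done"):
                            break
            if pending:
                log.write(Text(pending))  # flush the final partial line
            self.history.append({"role": "assistant", "content": full})
        except Exception as e:
            log.write(f"[red]Error: {escape(str(e))}[/]")

    def action_clear_history(self) -> None:
        self.history.clear()
        self.query_one("#chat-log", RichLog).clear()

The run_worker call runs stream_response as a background async task, so the UI stays responsive and you can scroll the chat log while the response is streaming. Because each RichLog.write() call starts a new line, the worker buffers incoming tokens and writes each line as soon as its newline arrives, so the reply appears line by line. The exclusive=True flag cancels any running worker before starting a new one, preventing multiple simultaneous Ollama requests if the user sends another message before the first response completes.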
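To see what the worker is actually consuming, the parsing can be pulled out of Textual into two small stdlib-only functions. This is a sketch that assumes the chunk shape Ollama's /api/chat uses when streaming: one JSON object per line, the text under message.content, and a final object with "done": true. The sample lines are made up for illustration, and iter_tokens and iter_completed_lines are hypothetical names. The second function regroups tokens into whole lines, which is the granularity RichLog.write() works at.

```python
import json
from typing import Iterable, Iterator


def iter_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield text tokens from Ollama /api/chat streaming lines (NDJSON)."""
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines between chunks
        chunk = json.loads(line)
        token = chunk.get("message", {}).get("content", "")
        if token:
            yield token
        if chunk.get("done"):
            break  # the final chunk carries timing stats, not text


def iter_completed_lines(tokens: Iterable[str]) -> Iterator[str]:
    """Regroup a token stream into whole lines, flushing any remainder at the end."""
    pending = ""
    for token in tokens:
        pending += token
        *done, pending = pending.split("\n")
        yield from done
    if pending:
        yield pending


# Hypothetical sample of what the endpoint streams back
sample = [
    '{"message": {"role": "assistant", "content": "Hello"}, "done": false}',
    '{"message": {"role": "assistant", "content": " there!\\nSecond"}, "done": false}',
    '{"message": {"role": "assistant", "content": " line"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]

print(list(iter_completed_lines(iter_tokens(sample))))
# → ['Hello there!', 'Second line']
```

Keeping the parsing in plain functions like these also makes it testable without a running app or a live Ollama server.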

Adding a Model Selector

Let users switch models at runtime using a Textual Select widget populated from Ollama’s tags API:

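from textual.app import App, ComposeResult
from textual.containers import Horizontal
from textual.widgets import Footer, Header, Input, RichLog, Select

# httpx, MODEL and OLLAMA_URL as defined in the streaming section

class OllamaTUI(App):
    CSS = """
    #toolbar { height: 3; layout: horizontal; }
    #model-select { width: 30; }
    """

    def compose(self) -> ComposeResult:
        yield Header()
        # The selector needs id="model-select" so load_models() can find it
        with Horizontal(id="toolbar"):
            yield Select([], id="model-select", prompt="Model")
        yield RichLog(id="chat-log", markup=True, wrap=True)
        yield Input(id="input-bar", placeholder="Ask Ollama... (Enter to send)")
        yield Footer()

    async def on_mount(self) -> None:
        await self.load_models()

    async def load_models(self) -> None:
        self.current_model = MODEL  # fallback if Ollama is unreachable or has no models
        try:
            async with httpx.AsyncClient(timeout=10) as client:
                resp = await client.get(f"{OLLAMA_URL}/api/tags")
                resp.raise_for_status()
            models = [m["name"] for m in resp.json().get("models", [])]
            select = self.query_one("#model-select", Select)
            select.set_options([(m, m) for m in models])
            if models:
                select.value = models[0]
                self.current_model = models[0]
        except Exception:
            pass  # keep the default model name set above

    def on_select_changed(self, event: Select.Changed) -> None:
        # Ignore the blank "no selection" value, which is not a model name
        if event.select.id == "model-select" and isinstance(event.value, str):
            self.current_model = event.value
            log = self.query_one("#chat-log", RichLog)
            log.write(f"[dim]Switched to {self.current_model}[/]")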

The on_mount hook fires when the app first renders, fetching available models from Ollama and populating the selector. If Ollama is not running or no models are installed, the exception is caught silently and the default model name is used. The on_select_changed handler updates self.current_model whenever the user picks a different model, and subsequent requests use the new selection.
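The same discovery logic is handy outside the TUI too, for example in a launcher script that checks your setup first. Here is a stdlib-only sketch. It assumes the /api/tags response shape the code above relies on ({"models": [{"name": ...}, ...]}); parse_model_names and fetch_model_names are hypothetical helpers, and the default model name mirrors the MODEL constant used earlier.

```python
import http.client
import json
import urllib.request
from typing import List


def parse_model_names(payload: dict) -> List[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in payload.get("models", []) if "name" in m]


def fetch_model_names(base_url: str = "http://localhost:11434",
                      default: str = "llama3.2",
                      timeout: float = 5.0) -> List[str]:
    """Return installed model names, or [default] if Ollama can't be reached or has none."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            names = parse_model_names(json.load(resp))
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, refused connections and timeouts are OSErrors; malformed JSON is a ValueError
        return [default]
    return names or [default]


if __name__ == "__main__":
    print(fetch_model_names())
```

Returning a one-element fallback list rather than raising keeps the caller's code path identical whether or not Ollama is up, which matches how the TUI falls back to its default model.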

Syntax-Highlighted Code Blocks

Textual’s RichLog widget uses Rich for rendering, which means you get syntax highlighting for code blocks if you detect and format them before writing. Wrap code detection in a simple parser that switches between prose and code mode:

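from rich.syntax import Syntax
from rich.text import Text
from textual.widgets import RichLog

def render_response(log: RichLog, text: str) -> None:
    in_code = False
    lang = "text"
    code_buf = []
    for line in text.splitlines():
        if line.startswith("```"):
            if in_code:
                # Closing fence: highlight everything collected since the opening fence
                log.write(Syntax("\n".join(code_buf), lang, theme="monokai"))
                code_buf.clear()
                in_code = False
            else:
                # Opening fence: the text after the backticks names the language
                lang = line[3:].strip() or "text"
                in_code = True
        elif in_code:
            code_buf.append(line)
        else:
            log.write(Text(line))  # plain Text, so brackets in prose aren't parsed as markup
    if code_buf:
        # The model left a fence unclosed; still render what it sent
        log.write(Syntax("\n".join(code_buf), lang, theme="monokai"))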

Call render_response(log, full) at the end of streaming instead of writing raw tokens to get proper syntax highlighting for any code blocks in the response. Rich's Syntax class does not detect languages itself: it colours the code with Pygments, choosing the lexer from the language tag on the opening fence, so Python, JavaScript, Bash, SQL, and dozens of other languages render with appropriate colours as long as the model tags its fences. A missing or unrecognised tag renders as plain, unhighlighted text.
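The fence-tracking loop is easier to test once it is separated from the widget. This sketch, a hypothetical split_fences helper, applies the same rules as render_response (a line starting with three backticks toggles code mode, and the text after an opening fence is the language tag) but returns segments instead of writing them. FENCE is built from single backticks only to keep the listing readable.

```python
from typing import List, Tuple

FENCE = "`" * 3  # the Markdown code-fence marker


def split_fences(text: str) -> List[Tuple[str, str, str]]:
    """Split a reply into ("prose", "", line) and ("code", lang, body) segments."""
    segments: List[Tuple[str, str, str]] = []
    in_code, lang, code_buf = False, "text", []
    for line in text.splitlines():
        if line.startswith(FENCE):
            if in_code:
                segments.append(("code", lang, "\n".join(code_buf)))
                code_buf, in_code = [], False
            else:
                lang = line[len(FENCE):].strip() or "text"
                in_code = True
        elif in_code:
            code_buf.append(line)
        else:
            segments.append(("prose", "", line))
    if in_code and code_buf:
        segments.append(("code", lang, "\n".join(code_buf)))  # unclosed fence
    return segments


reply = "Try this:\n" + FENCE + "python\nprint('hi')\n" + FENCE + "\nDone."
for kind, lang, body in split_fences(reply):
    print(kind, lang or "-", repr(body))
# prose - 'Try this:'
# code python "print('hi')"
# prose - 'Done.'
```

Each segment then maps onto one RichLog.write() call: Syntax(body, lang, theme="monokai") for code and Text(body) for prose.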

System Prompt Panel

Add a collapsible panel for setting a system prompt, toggled with a keyboard shortcut:

from textual.app import App, ComposeResult
from textual.widgets import Collapsible, Footer, Header, Input, RichLog, TextArea

class OllamaTUI(App):
    BINDINGS = [
        # Ctrl+P is left alone: Textual opens its command palette with it by default
        ("ctrl+t", "toggle_system", "System Prompt"),
        ("ctrl+r", "clear_history", "Clear"),
        ("ctrl+q", "quit", "Quit"),
    ]

    def compose(self) -> ComposeResult:
        yield Header()
        with Collapsible(title="System Prompt", id="system-panel", collapsed=True):
            yield TextArea(id="system-input")
        yield RichLog(id="chat-log", markup=True, wrap=True)
        yield Input(id="input-bar", placeholder="Message...")
        yield Footer()

    def action_toggle_system(self) -> None:
        panel = self.query_one("#system-panel", Collapsible)
        panel.collapsed = not panel.collapsed

    def get_system_prompt(self) -> list:
        # An empty prompt adds no system message at all
        text = self.query_one("#system-input", TextArea).text.strip()
        return [{"role": "system", "content": text}] if text else []

The Collapsible widget hides the system prompt area by default, keeping the UI clean for normal use. Pressing Ctrl+T expands it so the user can type or paste a system prompt, and pressing Ctrl+T again collapses it. Ctrl+P is avoided because Textual binds it to the command palette by default. In stream_response, send messages = self.get_system_prompt() + self.history instead of self.history, so the system prompt is included in every request without being stored in the conversation history.

Saving and Loading Conversations

Add commands to save the current conversation to a JSON file and load it back:

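import json
import pathlib

from rich.markup import escape
from textual.app import App
from textual.widgets import RichLog

SAVE_PATH = pathlib.Path.home() / ".ollama_tui_history.json"

class OllamaTUI(App):
    # Bind the actions by adding entries like these to BINDINGS:
    #   ("ctrl+s", "save", "Save"), ("ctrl+o", "load", "Load")

    def action_save(self) -> None:
        SAVE_PATH.write_text(json.dumps(self.history, indent=2), encoding="utf-8")
        self.notify(f"Saved to {SAVE_PATH}")

    def action_load(self) -> None:
        if not SAVE_PATH.exists():
            self.notify("No saved conversation found", severity="warning")
            return
        self.history = json.loads(SAVE_PATH.read_text(encoding="utf-8"))
        log = self.query_one("#chat-log", RichLog)
        log.clear()
        for msg in self.history:
            prefix = "[bold cyan]You:[/]" if msg["role"] == "user" else "[bold green]Assistant:[/]"
            # escape() keeps brackets in saved messages from being parsed as markup
            log.write(f"{prefix} {escape(msg['content'])}")
        self.notify("Conversation loaded")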

The notify method displays a brief toast message in the corner of the screen confirming the save or load. Saving to a fixed path in the home directory keeps the interface simple — for a more polished implementation, use Textual’s DirectoryTree widget to show a file picker, letting users save multiple named conversations and browse them from within the TUI.
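As a step toward multiple named conversations, here is a stdlib-only sketch that stores each conversation as its own JSON file in a directory instead of one fixed file. The directory (for example ~/.ollama_tui/conversations) and the helper names save_conversation, list_conversations and load_conversation are assumptions for illustration; a DirectoryTree pointed at the same directory could then serve as the browser.

```python
import json
import pathlib
from datetime import datetime
from typing import List, Optional


def save_conversation(history: list, directory: pathlib.Path,
                      name: Optional[str] = None) -> pathlib.Path:
    """Write history to <directory>/<name>.json, defaulting to a timestamped name."""
    directory.mkdir(parents=True, exist_ok=True)
    name = name or datetime.now().strftime("chat-%Y%m%d-%H%M%S")
    path = directory / f"{name}.json"
    path.write_text(json.dumps(history, indent=2), encoding="utf-8")
    return path


def list_conversations(directory: pathlib.Path) -> List[pathlib.Path]:
    """Return saved conversation files, most recently modified first."""
    if not directory.exists():
        return []
    return sorted(directory.glob("*.json"), key=lambda p: p.stat().st_mtime, reverse=True)


def load_conversation(path: pathlib.Path) -> list:
    """Read a conversation written by save_conversation()."""
    return json.loads(path.read_text(encoding="utf-8"))
```

Timestamped default names mean a quick save never overwrites an earlier conversation, while an explicit name lets the user keep one they want to come back to.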

Packaging as a Standalone Tool

Package the TUI as a standalone command-line tool using pipx so it can be installed once and run from anywhere without activating a virtual environment:

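# pyproject.toml
# Assumed layout: ollama_tui/__init__.py and ollama_tui/app.py
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "ollama-tui"
version = "0.1.0"
dependencies = ["textual", "httpx"]

[project.scripts]
# Runs main() in ollama_tui/app.py; main() can be as simple as OllamaTUI().run()
ollama-tui = "ollama_tui.app:main"

Then install it with pipx from the project directory:

pipx install .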

After installation, running ollama-tui from any terminal launches the full-screen chat interface. Pass command-line arguments using Python’s argparse in the main() function to support flags like --model to set the default model, --url to point at a remote Ollama instance, and --load to pre-load a saved conversation on startup. These make the tool practical for different setups without editing the source code.
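A sketch of that main() with argparse follows. The three flags come from the paragraph above, but passing them into OllamaTUI assumes you extend its __init__ to accept model, url and load_path, which are hypothetical parameter names. OllamaTUI is assumed to be defined earlier in the same module. parse_args takes an optional argv list so it can be exercised without touching sys.argv.

```python
import argparse
import pathlib
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        prog="ollama-tui",
        description="Chat with a local Ollama model in the terminal.",
    )
    parser.add_argument("--model", default="llama3.2", help="model to select on startup")
    parser.add_argument("--url", default="http://localhost:11434",
                        help="base URL of the Ollama server (e.g. a remote GPU box)")
    parser.add_argument("--load", type=pathlib.Path, metavar="FILE",
                        help="saved conversation (JSON) to open on startup")
    return parser.parse_args(argv)  # argv=None means "use sys.argv"


def main(argv: Optional[List[str]] = None) -> None:
    """Console-script entry point named in [project.scripts]."""
    args = parse_args(argv)
    # Hypothetical wiring: assumes OllamaTUI (defined above in this module)
    # accepts these keyword arguments.
    app = OllamaTUI(model=args.model, url=args.url, load_path=args.load)
    app.run()
```

For example, ollama-tui --url http://gpu-box:11434 --model qwen2.5:7b would point the app at a different server and model without touching the source.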

Alternative: A Lightweight CLI with Rich

If you prefer a simpler non-full-screen approach, rich alone provides a solid streaming output experience for a command-line chat script. Use rich.live.Live with a Markdown renderable to display the growing response in place as tokens arrive — the response renders with proper heading formatting, bullet points, bold text, and syntax-highlighted code blocks. This is less than 50 lines of Python and gives a polished output experience without the complexity of a full TUI layout. Combine it with readline for input history (so the up arrow recalls previous prompts) and you have a capable local AI CLI tool that starts instantly and works in any terminal.
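Here is a sketch of that approach. It assumes a local Ollama at the default http://localhost:11434 and the llama3.2 model used elsewhere in this guide, and it reads the HTTP stream with urllib from the standard library so that Rich is the only third-party dependency. ollama_tokens and render_stream are hypothetical names.

```python
import json
import urllib.request
from typing import Iterable, Iterator

from rich.console import Console
from rich.live import Live
from rich.markdown import Markdown

OLLAMA_URL = "http://localhost:11434"  # assumed local Ollama instance
MODEL = "llama3.2"


def ollama_tokens(messages: list, model: str = MODEL, url: str = OLLAMA_URL) -> Iterator[str]:
    """Yield content tokens from Ollama's streaming /api/chat endpoint."""
    body = json.dumps({"model": model, "messages": messages, "stream": True}).encode()
    req = urllib.request.Request(f"{url}/api/chat", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        for raw in resp:  # one JSON object per line
            if not raw.strip():
                continue
            chunk = json.loads(raw)
            yield chunk.get("message", {}).get("content", "")
            if chunk.get("done"):
                break


def render_stream(tokens: Iterable[str], console: Console) -> str:
    """Re-render the growing reply as Markdown in place; return the full text."""
    text = ""
    with Live(Markdown(text), console=console, refresh_per_second=10) as live:
        for token in tokens:
            text += token
            live.update(Markdown(text))
    return text


def main() -> None:
    try:
        import readline  # noqa: F401  enables arrow-key history for input() on Unix
    except ImportError:
        pass
    console = Console()
    history: list = []
    while True:
        try:
            prompt = input("You: ").strip()
        except (EOFError, KeyboardInterrupt):
            break
        if not prompt:
            continue
        history.append({"role": "user", "content": prompt})
        reply = render_stream(ollama_tokens(history), console)
        history.append({"role": "assistant", "content": reply})


if __name__ == "__main__":
    main()
```

Re-parsing the whole reply on every token is wasteful in principle, but for chat-length responses it keeps the code simple and lets half-finished Markdown (an open code fence, a list in progress) settle into its final form as more text arrives.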

Running Over SSH

One of the practical advantages of a TUI over a web-based chat interface is that it works over SSH without any port forwarding or browser setup. SSH into the machine running Ollama, activate the virtualenv, and run python tui.py. The full-screen Textual interface renders correctly in any SSH session that supports standard terminal escape codes — which includes every modern terminal emulator on macOS, Linux, and Windows. This makes the TUI approach particularly useful for accessing a homelab GPU server running Ollama from a laptop over a local network connection.

Adding Keyboard-Driven Commands

A well-designed TUI relies on keyboard shortcuts rather than buttons or menus. Textual binds keys to actions through the BINDINGS class variable, and the Footer widget automatically displays the available shortcuts at the bottom of the screen. Beyond the basic quit and clear bindings, useful additions include a shortcut to copy the last response to the clipboard, a shortcut to switch between dark and light themes, and a shortcut to open the current conversation as a Markdown file in your editor using Python's subprocess.run. For the clipboard, pyperclip works on a local machine but needs a clipboard on the machine running the TUI, so over SSH it will not reach your laptop; recent Textual versions also offer App.copy_to_clipboard, which uses a terminal escape sequence instead. For themes, older Textual releases ship a built-in toggle_dark action, while newer releases manage light and dark through named themes (App.theme). These quality-of-life additions make the TUI feel like a proper tool rather than a demo.
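The copy and export shortcuts need a little plumbing around the conversation history. The sketch below is stdlib-only and assumes self.history holds {"role", "content"} dicts. last_reply, conversation_to_markdown and open_in_editor are hypothetical helpers, and falling back to nano when $EDITOR is unset is an arbitrary choice. A terminal editor needs the screen to itself, so inside the app call open_in_editor from an action wrapped in a with self.suspend(): block, which recent Textual versions provide.

```python
import os
import shlex
import subprocess
import tempfile
from typing import List, Optional


def last_reply(history: List[dict]) -> Optional[str]:
    """Most recent assistant message, e.g. for a copy-to-clipboard shortcut."""
    for msg in reversed(history):
        if msg["role"] == "assistant":
            return msg["content"]
    return None


def conversation_to_markdown(history: List[dict]) -> str:
    """Render the conversation as a Markdown document, one section per message."""
    names = {"user": "You", "assistant": "Assistant"}
    parts = ["# Conversation\n"]
    for msg in history:
        speaker = names.get(msg["role"], msg["role"].title())
        parts.append(f"## {speaker}\n\n{msg['content']}\n")
    return "\n".join(parts)


def open_in_editor(history: List[dict]) -> None:
    """Write the conversation to a temporary .md file and open it in $EDITOR."""
    with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False, encoding="utf-8") as f:
        f.write(conversation_to_markdown(history))
    editor = os.environ.get("EDITOR", "nano")  # assumed fallback when $EDITOR is unset
    # shlex.split allows values like "code --wait"; check=False ignores the exit status
    subprocess.run([*shlex.split(editor), f.name], check=False)
```

The temporary file is kept (delete=False) so the editor can still open it after Python closes the handle.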

Textual also has a built-in command palette, opened with Ctrl+P by default (similar to VS Code's command palette), and the COMMANDS class variable lets you register command providers that add your own entries to it. Registering your custom actions (clear history, save conversation, switch model, toggle system prompt) in the palette means users can discover and invoke them without memorising keyboard shortcuts. This is particularly useful when sharing the TUI with teammates who are less familiar with the keybindings. Because Ctrl+P is the palette's default key, avoid binding your own actions to it.

Adding a Status Bar

Display useful runtime information (the current model name, whether a request is in progress, and the number of messages in the conversation) in a status bar below the chat log. Textual's reactive attributes call a matching watch_* method whenever their value changes, and those methods can update a Label:

from textual.app import App
from textual.reactive import reactive
from textual.widgets import Label

class OllamaTUI(App):
    current_model: reactive[str] = reactive("llama3.2")
    is_loading: reactive[bool] = reactive(False)

    # compose() should also yield Label(self._status_text(), id="status") below the RichLog

    def watch_current_model(self, model: str) -> None:
        self._refresh_status()

    def watch_is_loading(self, loading: bool) -> None:
        self._refresh_status()

    def _refresh_status(self) -> None:
        # query() yields nothing (rather than raising) if the label isn't mounted yet
        for label in self.query("#status").results(Label):
            label.update(self._status_text())

    def _status_text(self) -> str:
        state = "[yellow]Generating...[/]" if self.is_loading else "[green]Ready[/]"
        return f"Model: [bold]{self.current_model}[/]  |  {state}  |  Messages: {len(self.history)}"

Reactive attributes keep the status bar in sync without manual update calls scattered throughout the code. Set self.is_loading = True at the start of stream_response and set it back to False in a finally block, so the status returns to Ready even if the request fails or is cancelled. self.history is a plain list rather than a reactive, so appending to it does not trigger a watcher; call self._refresh_status() after each append to keep the message count current.

Handling Long Conversations

As conversations grow, two practical issues arise: the context window fills up, and the chat log grows long. For context window management, add a sliding window that keeps only the last N messages in the history sent to Ollama, while still displaying the full conversation in the log. A window of 20 messages (10 exchanges) is a reasonable default that keeps enough context for coherent multi-turn conversations. Message lengths vary, though, so a message count is only a rough proxy for tokens; lower it if replies are long or the model's context length is small.
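A sketch of the sliding window, assuming messages use the same {"role", "content"} dicts as self.history; windowed_messages is a hypothetical helper. It keeps the system prompt (for example the text returned by get_system_prompt()) outside the window so trimming never drops it, and it leaves self.history itself untouched so the chat log can still show everything.

```python
from typing import List


def windowed_messages(history: List[dict], system_prompt: str = "",
                      max_messages: int = 20) -> List[dict]:
    """Build a request's message list: optional system prompt plus the recent tail of history."""
    recent = history[-max_messages:] if max_messages > 0 else []
    # Start the window on a user turn so the model never sees a reply without its question
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    system = [{"role": "system", "content": system_prompt}] if system_prompt else []
    return system + recent
```

Dropping a leading assistant message means the window can hold one message fewer than max_messages. That is a deliberate trade: a reply whose question has been cut off tends to confuse the model more than a slightly shorter context does.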

For the chat log, Textual's RichLog only renders the lines currently on screen, so it handles thousands of lines without trouble in practice, and its max_lines argument (for example RichLog(max_lines=5000)) caps how many lines it keeps in memory. Switching to a ListView of custom ListItem widgets, one per message, is mainly worthwhile for per-message features such as copy buttons or message editing. Each message then becomes a full widget, which generally costs more than RichLog's line-based rendering, so the switch is about interactivity rather than performance.

Testing the TUI

Textual has built-in support for headless testing: App.run_test() runs the app without a real terminal and returns a Pilot object that simulates key presses and clicks, after which you can assert on widget state such as the chat log's contents. Write tests that type a prompt, press Enter, and then check that the chat log contains the expected response text. Replace Ollama with a fixed streaming response (by mocking httpx.AsyncClient, or by giving the client an httpx.MockTransport) so tests run without a real Ollama instance. For layout regressions (widgets overlapping, the input bar disappearing, response text not appearing), which are otherwise hard to detect without visual inspection, the pytest-textual-snapshot plugin compares the rendered screen against a saved baseline.

During development, run textual console in one terminal and textual run --dev tui.py in another. The console shows the app's log messages and anything you send with print() or self.log() (logging self.app.tree, for instance, dumps the widget tree), and --dev also reloads external CSS files (CSS_PATH) when you save them. Together these are invaluable for debugging layout issues and understanding why a widget is not appearing where you expect it.

Why a TUI Over a Web UI

The most common alternative to a TUI is a web-based chat UI like Open WebUI, which provides a polished browser interface to Ollama. A TUI makes more sense in specific situations: when you work primarily in the terminal and want to avoid context switching to a browser, when you are accessing a remote server over SSH where opening a web browser would require port forwarding, when you want a tool that starts in under a second with no Node.js or Docker dependencies, or when you are building a specialised workflow tool that needs tight integration with other command-line utilities via pipes and subprocesses. For general use, Open WebUI is excellent. For terminal-native developers and server-side workflows, a well-built TUI is a more natural fit. The Textual-based approach in this guide gives you a foundation solid enough to extend into whatever specialised tool your workflow needs, with far less code than a comparable web application would require.

Put together, the application described in this guide (model selector, system prompt panel, syntax highlighting, conversation persistence, and status bar) comes to around 250 lines of Python. That is a small amount of code for a tool this capable, and it reflects both Textual's quality as a framework and the simplicity of Ollama's streaming API. Extend it with the features your workflow needs, and you have a local AI assistant that is genuinely yours.