How to Build a Chat UI for Ollama with Gradio

Gradio is a Python library that generates a web UI from a few lines of code. For Ollama, it is the fastest way to build a shareable chat interface — add streaming support and a model selector and you have a functional demo or internal tool in under 30 lines of Python.

Setup

pip install gradio ollama

Basic Chat Interface

import gradio as gr
import ollama

def chat(message: str, history: list) -> str:
    messages = []
    for user_msg, assistant_msg in history:
        messages.append({'role': 'user', 'content': user_msg})
        messages.append({'role': 'assistant', 'content': assistant_msg})
    messages.append({'role': 'user', 'content': message})

    response = ollama.chat(model='llama3.2', messages=messages)
    return response['message']['content']

demo = gr.ChatInterface(
    fn=chat,
    title='Local AI Chat',
    description='Powered by Ollama running locally'
)

if __name__ == '__main__':
    demo.launch()  # Opens at http://localhost:7860
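
The loop in chat assumes that history arrives as (user, assistant) pairs. Depending on the Gradio version and on how ChatInterface is configured, history may instead arrive as a list of role/content dicts. The following is a minimal sketch of a converter that accepts either shape; history_to_messages is a hypothetical helper name (not part of Gradio or Ollama), and it assumes text-only messages.

```python
def history_to_messages(history: list, message: str) -> list[dict]:
    """Build an Ollama-style message list from Gradio chat history."""
    messages = []
    for item in history:
        if isinstance(item, dict):
            # "messages" style: each entry is already a role/content dict
            messages.append({'role': item['role'], 'content': item['content']})
        else:
            # pair style: (user_msg, assistant_msg)
            user_msg, assistant_msg = item
            messages.append({'role': 'user', 'content': user_msg})
            if assistant_msg is not None:
                messages.append({'role': 'assistant', 'content': assistant_msg})
    # The new user message always goes last
    messages.append({'role': 'user', 'content': message})
    return messages
```

With a helper like this, the body of chat reduces to building the list and passing it to ollama.chat.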

Streaming

from typing import Generator

def chat_stream(message: str, history: list) -> Generator[str, None, None]:
    messages = []
    for user_msg, assistant_msg in history:
        messages.append({'role': 'user', 'content': user_msg})
        messages.append({'role': 'assistant', 'content': assistant_msg})
    messages.append({'role': 'user', 'content': message})

    stream = ollama.chat(model='llama3.2', messages=messages, stream=True)
    partial = ''
    for chunk in stream:
        partial += chunk['message']['content']
        yield partial

demo = gr.ChatInterface(fn=chat_stream, title='Local AI Chat')
demo.launch()
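
Note that chat_stream yields the accumulated text, not the individual chunk: each yielded value replaces the reply shown so far. The sketch below illustrates the pattern with a stand-in stream instead of a live Ollama call; accumulate and fake_stream are illustrative names only.

```python
from typing import Iterable, Iterator

def accumulate(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield the growing reply after each streamed chunk."""
    partial = ''
    for chunk in chunks:
        partial += chunk['message']['content']
        yield partial

# Stand-in for the chunks that ollama.chat(..., stream=True) produces
fake_stream = [{'message': {'content': t}} for t in ['Hel', 'lo', '!']]
outputs = list(accumulate(fake_stream))
print(outputs)  # ['Hel', 'Hello', 'Hello!']
```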

Model Selector

import ollama as ol

def get_models() -> list[str]:
    # Recent ollama clients expose the tag under 'model' (e.g. 'llama3.2:latest')
    return [m['model'] for m in ol.list()['models']]

def chat_with_model(message: str, history: list, model: str) -> Generator:
    messages = []
    for u, a in history:
        messages += [{'role':'user','content':u}, {'role':'assistant','content':a}]
    messages.append({'role':'user','content':message})
    stream = ol.chat(model=model, messages=messages, stream=True)
    partial = ''
    for chunk in stream:
        partial += chunk['message']['content']
        yield partial

with gr.Blocks(title='Ollama Chat') as demo:
    gr.Markdown('# Ollama Chat')
    models = get_models()  # Query Ollama once instead of three times
    model_dropdown = gr.Dropdown(
        choices=models or ['llama3.2'],
        value=models[0] if models else 'llama3.2',
        label='Model'
    )
    chatbot = gr.ChatInterface(
        fn=chat_with_model,
        additional_inputs=[model_dropdown]
    )

demo.launch()
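
Ollama reports model names with a tag, such as llama3.2:latest, so the first entry in the list is not necessarily the model you want selected by default. A minimal sketch of a default picker follows; choose_default is a hypothetical helper, and the preferred name is an assumption you would adjust.

```python
def choose_default(models: list[str], preferred: str = 'llama3.2') -> str:
    """Pick the dropdown's initial value from the installed models."""
    for name in models:
        # Match either the full name or the part before the ':' tag
        if name == preferred or name.split(':', 1)[0] == preferred:
            return name
    # Fall back to the first installed model, or the preferred name if none are installed
    return models[0] if models else preferred
```

The result can be passed as the value argument of gr.Dropdown.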

Sharing and Deployment

# Share publicly via Gradio's free tunnel (temporary URL)
demo.launch(share=True)

# Make accessible on your LAN
demo.launch(server_name='0.0.0.0', server_port=7860)

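# Serve with uvicorn instead of demo.launch(): mount the UI on a FastAPI app in app.py
#   from fastapi import FastAPI
#   app = gr.mount_gradio_app(FastAPI(), demo, path='/')
# then run: uvicorn app:app --host 0.0.0.0 --port 7860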

Why Gradio for Ollama

Building a custom chat web interface from scratch — HTML, CSS, JavaScript, WebSocket handling, streaming token display — takes hours even for experienced frontend developers. Gradio reduces this to minutes: gr.ChatInterface handles conversation history, message formatting, streaming display, and the UI layout automatically. The result is not as customisable as a hand-built React app, but it is functional, responsive, and available immediately. For prototypes, internal tools, demos, and personal AI assistants, Gradio is the fastest path from working Python code to a usable browser interface.

Gradio’s strength is also its limitation: the default UI components are opinionated, and although themes and custom CSS cover basic restyling, matching a specific brand closely takes real effort. For consumer-facing products, you will eventually want a custom frontend. But for the many use cases where functionality matters more than aesthetics (developer tools, internal knowledge bases, research assistants, AI-powered utilities), Gradio’s polished default design is more than adequate and saves significant development time.

System Prompts and Personas

SYSTEM_PROMPT = """You are a helpful Python coding assistant.
When asked to write code, always include type hints and docstrings.
If you are unsure, say so rather than guessing."""

def chat_with_system(message: str, history: list, model: str) -> Generator:
    messages = [{'role': 'system', 'content': SYSTEM_PROMPT}]
    for user_msg, assistant_msg in history:
        messages.append({'role': 'user', 'content': user_msg})
        messages.append({'role': 'assistant', 'content': assistant_msg})
    messages.append({'role': 'user', 'content': message})

    stream = ollama.chat(model=model, messages=messages, stream=True)
    partial = ''
    for chunk in stream:
        partial += chunk['message']['content']
        yield partial
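
To let users edit the persona instead of hard-coding it, the system prompt can arrive as an extra function argument, in the same way the model dropdown does through additional_inputs. The sketch below covers only the message-building step; build_messages is a hypothetical helper, and it assumes history in the (user, assistant) pair format used throughout this article.

```python
def build_messages(system_prompt: str, history: list, message: str) -> list[dict]:
    """Prepend an optional system prompt to the conversation."""
    messages = []
    if system_prompt and system_prompt.strip():
        # Skip the system message entirely when the field is left blank
        messages.append({'role': 'system', 'content': system_prompt.strip()})
    for user_msg, assistant_msg in history:
        messages.append({'role': 'user', 'content': user_msg})
        messages.append({'role': 'assistant', 'content': assistant_msg})
    messages.append({'role': 'user', 'content': message})
    return messages
```

A gr.Textbox listed in additional_inputs would supply system_prompt, with SYSTEM_PROMPT as a sensible initial value.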

File Upload for Document Q&A

from typing import Generator

import gradio as gr
import ollama

def chat_with_doc(message: str, history: list, doc_file) -> Generator:
    doc_text = ''
    if doc_file:
        with open(doc_file.name, encoding='utf-8', errors='replace') as f:
            doc_text = f.read()[:4000]  # Keep the first 4,000 characters to stay within the context window

    context = f'Document:\n{doc_text}\n\n' if doc_text else ''
    messages = []
    for u, a in history:
        messages += [{'role':'user','content':u},{'role':'assistant','content':a}]
    messages.append({'role':'user','content':context+message})

    stream = ollama.chat(model='llama3.2', messages=messages, stream=True)
    partial = ''
    for chunk in stream:
        partial += chunk['message']['content']
        yield partial

with gr.Blocks(title='Document Chat') as demo:
    gr.Markdown('# Chat with a Document')
    doc_input = gr.File(label='Upload a text file (optional)', file_types=['.txt','.md'])
    gr.ChatInterface(fn=chat_with_doc, additional_inputs=[doc_input])

demo.launch()
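
The 4,000-character cut in chat_with_doc is silent, so the user cannot tell when the model saw only part of the file. A small sketch of a loader that also reports truncation follows; load_document is a hypothetical helper, and the 4,000-character default simply mirrors the limit used above.

```python
import tempfile
from pathlib import Path

def load_document(path: str, max_chars: int = 4000) -> tuple[str, bool]:
    """Return the document text (possibly cut) and whether it was truncated."""
    text = Path(path).read_text(encoding='utf-8', errors='replace')
    return text[:max_chars], len(text) > max_chars

# Demonstration with a temporary 5,000-character file
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'notes.txt'
    sample.write_text('x' * 5000, encoding='utf-8')
    excerpt, truncated = load_document(str(sample))
print(len(excerpt), truncated)  # 4000 True
```

When truncated is true, the prompt context can mention that only the beginning of the document was included.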

Gradio vs Open WebUI

Both Gradio and Open WebUI provide browser-based chat interfaces for Ollama, but they target different use cases. Open WebUI is a full-featured chat application with user accounts, conversation history persistence, plugin support, and a polished consumer UI — it is the right choice for a shared team AI assistant where non-technical users will be the primary audience. Gradio is a Python library for building custom interfaces — it is the right choice when you need to add AI capabilities to a specific application (document Q&A, code analysis, data exploration) with custom inputs and logic that Open WebUI does not support. Use Open WebUI when you need a general-purpose chat interface; use Gradio when you need a custom AI-powered tool with specific input types and workflows.

Getting Started

Install Gradio and Ollama, copy the basic streaming chat interface from this article, and run python app.py. The interface opens at localhost:7860 — share it with teammates on your LAN via server_name='0.0.0.0' or publicly via share=True for a temporary Gradio tunnel URL. Add the model selector to let users choose between models, add a system prompt field for persona customisation, and add file upload for document Q&A. Each feature adds 5–10 lines of Python — the component-based Gradio API makes it straightforward to compose increasingly capable interfaces from simple building blocks.

Custom Themes and Styling

import gradio as gr

# Use Gradio themes for quick restyling
with gr.Blocks(
    theme=gr.themes.Soft(),
    title='AI Assistant',
    css='.chatbot { height: 500px; }'
) as demo:
    gr.Markdown('## Your Local AI Assistant')
    gr.ChatInterface(
        fn=chat_stream,
        chatbot=gr.Chatbot(height=500, render_markdown=True),
        textbox=gr.Textbox(placeholder='Ask anything...', container=False),
        examples=[
            'Explain Docker in one paragraph',
            'Write a Python function to parse a CSV file',
            'What is the difference between async and threading?'
        ]
    )

demo.launch()

Gradio ships with several built-in themes, including Default, Soft, Monochrome, Glass, and Base. The css parameter accepts arbitrary CSS that overrides Gradio’s default styles, which is useful for adjusting heights, fonts, or spacing. For more targeted styling, give a component its own elem_id or elem_classes and reference those selectors in your CSS rather than relying on Gradio’s internal class names, which can change between releases. The examples parameter adds clickable example prompts to the chat interface, which is particularly useful for demos where you want visitors to see what the model can do without typing their own queries.

Multi-Tab Interface

with gr.Blocks(title='AI Toolkit') as demo:
    with gr.Tab('Chat'):
        gr.ChatInterface(fn=chat_stream)
    with gr.Tab('Summarise'):
        with gr.Row():
            text_input = gr.Textbox(lines=10, label='Paste text to summarise')
            summary_output = gr.Textbox(lines=5, label='Summary')
        gr.Button('Summarise').click(
            fn=lambda t: ollama.chat(model='llama3.2',
                messages=[{'role':'user','content':f'Summarise: {t}'}])['message']['content'],
            inputs=text_input,
            outputs=summary_output
        )
    with gr.Tab('Image Caption'):
        img_input = gr.Image(type='pil', label='Upload image')
        caption_output = gr.Textbox(label='Caption')
        gr.Button('Caption').click(
            fn=caption_image,  # Your vision model function
            inputs=img_input,
            outputs=caption_output
        )

demo.launch()

The gr.Blocks API with tabs creates a multi-tool interface — different AI capabilities accessible from a single URL. This is the natural evolution of the simple chat interface for teams that want to expose multiple AI features without building separate applications for each one. Each tab uses Ollama as the backend but presents a different interface tailored to the task type, making the full toolkit accessible to non-technical users who would not know how to prompt-engineer for each specific task.
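
The Image Caption tab references caption_image without defining it. One possible shape for that function is sketched below. It assumes a vision-capable model is installed in Ollama (llava is used as an example name) and that the image arrives as a PIL image, as gr.Image(type='pil') provides. The chat_fn parameter is an addition for testing, not something Gradio or Ollama requires.

```python
import io

def caption_image(image, model: str = 'llava', chat_fn=None) -> str:
    """Caption a PIL image with a vision-capable Ollama model."""
    if image is None:
        return 'Please upload an image first.'
    if chat_fn is None:
        import ollama  # Imported lazily so the helper can be exercised without a server
        chat_fn = ollama.chat
    buffer = io.BytesIO()
    image.save(buffer, format='PNG')  # Encode the image as PNG bytes in memory
    response = chat_fn(
        model=model,
        messages=[{
            'role': 'user',
            'content': 'Describe this image in one sentence.',
            'images': [buffer.getvalue()],  # Image bytes travel alongside the prompt
        }],
    )
    return response['message']['content']
```

The early return for a missing image keeps the Caption button from raising an error when it is clicked before an upload.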

Running Gradio as a Background Service

# systemd service for Gradio app
sudo nano /etc/systemd/system/ollama-chat-ui.service

# [Unit]
# Description=Ollama Gradio Chat UI
# After=ollama.service
#
# [Service]
# ExecStart=/usr/bin/python3 /opt/chat-ui/app.py
# Restart=always
# User=ubuntu
# WorkingDirectory=/opt/chat-ui
#
# [Install]
# WantedBy=multi-user.target

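sudo systemctl daemon-reload  # Pick up the new unit file
sudo systemctl enable ollama-chat-ui
sudo systemctl start ollama-chat-ui
# UI available at http://your-server:7860 (app.py must launch with server_name='0.0.0.0')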

Gradio in the Local AI Toolkit

Gradio occupies a specific and valuable niche in the local AI toolkit. It is the right tool when: you have working Python AI code and need a browser interface without building a web application; you want to share a demo with non-technical stakeholders who need to interact with a model; you need a quick internal tool for a specific AI task (document summarisation, image captioning, code explanation) accessible from a browser on your LAN; or you want to prototype a user interface for an AI feature before investing in a production frontend. It is not the right tool for consumer-facing products that need custom branding, mobile-first design, or performance optimisation for many concurrent users — those warrant a purpose-built frontend. The line between prototype and production often blurs with Gradio interfaces that are “just for internal use” — many Gradio apps end up running in production for years because the functionality is good enough and the cost of replacing them with a custom frontend never seems worth it. That pragmatic staying power is a testament to how well Gradio’s default interface serves most practical AI tool use cases.

Gradio vs Streamlit

Streamlit is the other major Python library for building AI web interfaces quickly. Both are excellent, and the choice between them often comes down to team familiarity. Gradio’s ChatInterface component makes chat interfaces with streaming and conversation history trivially easy — it is the purpose-built choice for LLM chat UIs. Streamlit requires more manual implementation of chat state and streaming display but provides more flexibility for custom layouts and data visualisation components, which makes it better for applications that combine AI chat with data analysis and charts. For pure chat interfaces backed by Ollama, Gradio is the faster path. For data-heavy applications that include an AI chat component alongside analytics, Streamlit’s flexibility is worth the additional implementation effort.

Getting the Most from Gradio

The fastest way to build a useful Gradio interface is to start with the basic gr.ChatInterface, deploy it for actual use immediately (even just for yourself), and add features based on what you find yourself wishing for. The streaming interface from this article takes under 5 minutes to run. From there, the model selector, system prompt input, and file upload are each 5–10 line additions. The multi-tab interface for combining multiple AI tools is a natural next step once you have several working AI functions and want them accessible from a single URL. Ship early, use it, and let actual usage patterns drive which enhancements are worth adding — the Gradio component library has solutions for most common UI requirements, and the documentation and community examples cover nearly every combination you are likely to need.

Gradio in the Context of This Article Series

This article series has covered the full range of Ollama integration patterns: language bindings in Python, JavaScript, Go, Java, C#, Swift, Ruby, Elixir, and PHP; infrastructure patterns with Docker Compose, Kubernetes, and systemd; observability with Prometheus, OpenTelemetry, and Langfuse; developer productivity tools for commit messages and code review; and user-facing interfaces with Telegram bots, WebSockets, and now Gradio. Gradio fills a specific gap in this landscape: the fastest path from a working Python AI function to a browser interface that non-developers can use without installing anything. Every other approach in this series assumes some developer interaction — API calls, command-line tools, code integration. Gradio makes the AI functionality browser-accessible to anyone on your network or the internet, democratising access to the AI capabilities you have built with the patterns from earlier articles. It is the presentation layer that makes the rest of the stack useful to the people who are not developers but still benefit from the AI capabilities you have built.
