How to Use Ollama with Streamlit

Streamlit is one of the fastest ways to build interactive Python web applications. It turns a Python script into a shareable web app with a single command, which makes it ideal for data scientists and ML engineers who want to build demos, tools, and dashboards without learning web development. Paired with Ollama, Streamlit lets you build a polished local AI application in under 100 lines of Python: a full-featured chat interface, a document analysis tool, or an AI-powered data explorer that runs entirely on your machine.

Streamlit’s reactive programming model, in which the entire script reruns on every user interaction, maps naturally onto chat interfaces where each message triggers a new LLM call and UI update. Combined with Ollama’s streaming API and Streamlit’s st.write_stream function (added in Streamlit 1.31), you get a typewriter streaming effect with almost no code.

Setup

pip install streamlit httpx
ollama pull llama3.2

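Save your script as app.py and start it with streamlit run app.py. Streamlit opens your browser automatically and watches the file for changes: after a save it offers to rerun the app, and choosing “Always rerun” (or setting runOnSave = true under [server] in .streamlit/config.toml) reloads it on every save, so iteration is very fast. The data explorer later in this guide also needs pandas and tabulate, because DataFrame.to_markdown() depends on tabulate: pip install pandas tabulate.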

Streaming Chat App

Here is a complete streaming chat application using Streamlit and Ollama:

import json

import httpx
import streamlit as st

st.title("Local AI Chat")
OLLAMA_URL = "http://localhost:11434"
MODEL = st.sidebar.selectbox("Model", ["llama3.2", "qwen2.5-coder:7b"])
SYSTEM = st.sidebar.text_area("System prompt", "You are a helpful assistant.")

# History lives in session_state so it survives Streamlit's reruns
if "messages" not in st.session_state:
    st.session_state.messages = []

# Redraw the whole conversation on every rerun
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if prompt := st.chat_input("Ask something..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    # Prepend the system prompt (if any) to the stored history
    msgs = ([{"role": "system", "content": SYSTEM}] if SYSTEM else []) + st.session_state.messages

    def gen():
        """Yield content tokens from Ollama's newline-delimited JSON stream."""
        with httpx.Client(timeout=120) as client:
            with client.stream("POST", f"{OLLAMA_URL}/api/chat",
                               json={"model": MODEL, "messages": msgs, "stream": True}) as resp:
                resp.raise_for_status()
                for line in resp.iter_lines():
                    if not line:
                        continue
                    chunk = json.loads(line)
                    token = chunk.get("message", {}).get("content", "")
                    if token:
                        yield token
                    if chunk.get("done"):
                        break

    with st.chat_message("assistant"):
        # write_stream renders tokens as they arrive and returns the full text
        reply = st.write_stream(gen())
    st.session_state.messages.append({"role": "assistant", "content": reply})

This is a complete streaming chat app. The sidebar lets users switch models and set a system prompt at runtime, and st.session_state persists the conversation history across reruns. st.write_stream accepts a generator, renders tokens into the chat bubble as they arrive, and returns the full concatenated reply, which is what gets appended to the history.
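The app sends the whole of st.session_state.messages to Ollama on every turn, so a long conversation will eventually overflow the model’s context window. Below is a minimal sketch of one way to bound the request, assuming a character budget as a rough proxy for tokens. trim_history and its max_chars default are hypothetical helpers for illustration, not part of Streamlit or Ollama.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message], max_chars: int = 12_000) -> List[Message]:
    """Keep a leading system message plus the newest messages that fit max_chars.

    Hypothetical budget: at roughly 4 characters per token (a rule of thumb,
    not an exact figure), 12,000 characters is on the order of 3,000 tokens.
    """
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    budget = max_chars - sum(len(m["content"]) for m in system)

    kept: List[Message] = []
    # Walk backwards from the newest message; the newest one is always kept
    for i, msg in enumerate(reversed(rest)):
        size = len(msg["content"])
        if i > 0 and size > budget:
            break
        kept.append(msg)
        budget -= size
    kept.reverse()
    return system + kept
```

In the chat app, calling msgs = trim_history(msgs) just before the request keeps the payload bounded, while the full history stays in st.session_state for display. The newest message is always sent, even if on its own it exceeds the budget.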

Document Analysis App

Add a file uploader for document analysis:

import httpx
import streamlit as st

# Same defaults as the chat app; swap in a sidebar selectbox if you prefer
OLLAMA_URL = "http://localhost:11434"
MODEL = "llama3.2"

st.title("Document Analyser")
uploaded = st.file_uploader("Upload a text file", type=["txt", "md", "py"])
if uploaded:
    # getvalue() returns the whole upload regardless of the read position
    content = uploaded.getvalue().decode("utf-8", errors="ignore")
    st.text_area("Preview", content[:1000], height=150)
    task = st.selectbox("Task", [
        "Summarise in bullet points", "List action items",
        "Explain to a non-expert", "Find potential issues"
    ])
    if st.button("Analyse"):
        with st.spinner("Analysing..."):
            with httpx.Client(timeout=180) as client:
                resp = client.post(f"{OLLAMA_URL}/api/chat", json={
                    "model": MODEL,
                    # Only the first 8,000 characters are sent to the model
                    "messages": [{"role": "user", "content": f"{task}:\n\n{content[:8000]}"}],
                    "stream": False,
                })
                resp.raise_for_status()
        result = resp.json()["message"]["content"]
        st.markdown(result)
        st.download_button("Download", result, file_name="analysis.md", mime="text/markdown")

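The st.download_button lets users save the analysis as a Markdown file. For PDF support, install pymupdf, add "pdf" to the uploader’s accepted types, and extract the text before passing it to Ollama: open the upload with fitz.open(stream=uploaded.getvalue(), filetype="pdf") and join page.get_text() across its pages. Note that content[:8000] truncates by characters, not tokens, and anything past that point is silently ignored.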
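Truncating to content[:8000] drops the rest of a long document. A common alternative is a map-reduce pass: split the text into overlapping chunks, run the task on each chunk, then ask the model to merge the partial answers. The sketch below covers only the splitting step; chunk_text is a hypothetical helper, and its size and overlap defaults are illustrative values measured in characters.

```python
from typing import List


def chunk_text(text: str, size: int = 8000, overlap: int = 500) -> List[str]:
    """Split text into chunks of at most `size` characters.

    The last `overlap` characters before each cut are repeated at the start
    of the next chunk. When a blank line falls in the second half of a
    window, the cut moves back to it so paragraphs stay intact.
    """
    if not 0 <= overlap < size // 2:
        raise ValueError("overlap must be non-negative and less than size // 2")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        if end < len(text):
            cut = text.rfind("\n\n", start + size // 2, end)
            if cut != -1:
                end = cut
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Progress is guaranteed because end - start >= size // 2 > overlap
        start = end - overlap
    return chunks
```

Each chunk can then go through the same /api/chat request as the analyser, with one final request that combines the per-chunk results. Keeping the overlap below half the chunk size is what guarantees the loop always moves forward.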

AI Data Explorer

Build a natural language interface over CSV data:

import httpx
import pandas as pd
import streamlit as st

OLLAMA_URL = "http://localhost:11434"
MODEL = "llama3.2"

st.title("AI Data Explorer")
uploaded = st.file_uploader("Upload CSV", type="csv")
if uploaded:
    df = pd.read_csv(uploaded)
    st.dataframe(df.head(20))
    col1, col2 = st.columns(2)
    col1.metric("Rows", df.shape[0])
    col2.metric("Columns", df.shape[1])
    question = st.text_input("Ask a question about your data")
    if st.button("Ask") and question:
        # Send a compact summary (schema, stats, sample rows), not the whole file;
        # DataFrame.to_markdown() requires the tabulate package
        context = (
            f"Columns: {list(df.columns)}\n"
            f"Stats:\n{df.describe().to_markdown()}\n"
            f"Sample:\n{df.head(5).to_markdown(index=False)}"
        )
        with st.spinner("Thinking..."):
            with httpx.Client(timeout=120) as client:
                resp = client.post(f"{OLLAMA_URL}/api/chat", json={
                    "model": MODEL,
                    "messages": [{"role": "user",
                                  "content": f"Answer using this data:\n{context}\n\nQuestion: {question}"}],
                    "stream": False,
                })
                resp.raise_for_status()
        st.markdown(resp.json()["message"]["content"])

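Streamlit’s column layout, metric cards, and dataframe display create a polished dashboard without any CSS or HTML. The natural language query gives non-technical users a way to explore data without knowing pandas or SQL. Keep in mind that the model only sees the column names, summary statistics, and five sample rows, so it can answer questions about structure and distributions but cannot look up or aggregate arbitrary rows of the full file.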

Model Comparison Tool

Compare responses from multiple models side by side using Streamlit’s column layout:

import time

import httpx
import streamlit as st

OLLAMA_URL = "http://localhost:11434"

st.title("Model Comparison")
prompt = st.text_area("Enter your prompt")
models = st.multiselect("Select models",
    ["llama3.2", "llama3.1:8b", "qwen2.5-coder:7b"],
    default=["llama3.2"])
if st.button("Compare") and prompt and models:
    cols = st.columns(len(models))
    # Models run one after another; each column fills in when its call returns
    for col, model in zip(cols, models):
        with col:
            st.subheader(model)
            start = time.perf_counter()
            with httpx.Client(timeout=120) as client:
                resp = client.post(f"{OLLAMA_URL}/api/chat", json={
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                    "stream": False,
                })
                resp.raise_for_status()
            # Wall-clock time, including model load time on a cold start
            elapsed = time.perf_counter() - start
            st.markdown(resp.json()["message"]["content"])
            st.caption(f"Generated in {elapsed:.1f}s")

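The st.caption with generation time helps quantify the quality-vs-speed tradeoff on your hardware. Because the timing is wall-clock, the first request to each model also includes the time Ollama spends loading it into memory, so run each model once before comparing. Seeing several outputs and timings side by side turns model selection for your use case into an informed choice rather than a guess.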
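Wall-clock time mixes model loading, prompt processing, and generation. The final non-streaming /api/chat response also carries Ollama’s own timing fields (total_duration, load_duration, prompt_eval_count, prompt_eval_duration, eval_count, eval_duration, with durations in nanoseconds), which separate those phases. A small sketch that turns them into readable numbers; ollama_speed is a hypothetical helper, and it treats every field as optional because some can be absent (for example when the prompt was already cached).

```python
from typing import Any, Dict

NS_PER_S = 1_000_000_000


def ollama_speed(resp: Dict[str, Any]) -> Dict[str, float]:
    """Summarise the timing fields of a final /api/chat response.

    Durations are reported in nanoseconds; missing fields count as zero.
    """
    def seconds(key: str) -> float:
        return resp.get(key, 0) / NS_PER_S

    eval_s = seconds("eval_duration")
    prompt_s = seconds("prompt_eval_duration")
    return {
        "total_s": seconds("total_duration"),
        "load_s": seconds("load_duration"),
        # Generation speed: output tokens per second of decoding time
        "gen_tok_per_s": resp.get("eval_count", 0) / eval_s if eval_s else 0.0,
        # Prompt processing speed: input tokens per second
        "prompt_tok_per_s": resp.get("prompt_eval_count", 0) / prompt_s if prompt_s else 0.0,
    }
```

In the comparison loop, stats = ollama_speed(resp.json()) followed by st.caption(f"{stats['gen_tok_per_s']:.1f} tokens/s, load {stats['load_s']:.1f}s") reports generation speed separately from load time.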

Why Streamlit for Ollama Applications

Streamlit is the right tool when you want to build an Ollama-powered application quickly and share it with colleagues who are not comfortable with the command line. A data scientist can go from “I have an Ollama prompt that works” to “I have a web app my whole team can use” in an afternoon, with no JavaScript, no HTML, and little deployment overhead. The chat interface, document analyser, data explorer, and model comparison tool in this guide are each a few dozen lines of Python, yet they look and feel like proper web applications rather than hacked-together scripts.

Streamlit’s main limitation for Ollama use cases is concurrency. st.session_state is per browser session and each session’s script runs in its own thread, but every interaction reruns the whole script and each Ollama call blocks that thread until the model finishes. Ollama itself also queues requests once it reaches its parallel-request limit, so users who query it at the same moment wait on one another. For a personal tool or a small team, this is fine. For a high-traffic application where many users query Ollama simultaneously, a dedicated API service built with FastAPI or Flask is a better foundation. For everything between a personal tool and a production service, Streamlit strikes the right balance of simplicity and capability.

Caching Expensive Ollama Calls

Streamlit reruns the entire script on every interaction. An Ollama call that is not guarded by a button or form runs again whenever a user changes any widget, and a guarded call runs again every time its button is clicked, even if the inputs have not changed. Use Streamlit’s st.cache_data decorator to memoize expensive Ollama calls based on their inputs, so identical prompts return instantly from the cache instead of triggering a new API call.

Mark any function that calls Ollama with @st.cache_data(ttl=3600) to cache its results for one hour. Streamlit builds the cache key from the function’s arguments (and its source code) automatically: if the prompt and model match a previous call, the cached result is returned immediately. This is particularly useful for the document analysis and data exploration apps, where a user might ask the same question several times or rerun the page after a minor change. Two caveats apply. st.cache_data is shared across all sessions, so one user’s cached answer is served to anyone who sends the same inputs, and a cached reply never varies even if the model samples with a non-zero temperature. Use a longer TTL for analysis results that rarely change, and leave the function undecorated for real-time use cases where fresh results are always required.
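A cached function works best when its inputs are simple values and the underlying call is reproducible. Below is a minimal sketch of such a function, assuming Ollama’s default local address. It uses only the standard library’s urllib so it has no dependencies (httpx works the same way), and it sets Ollama’s temperature and seed options so repeated calls should return the same text. build_chat_payload, ask_ollama, and the seed value 42 are hypothetical choices for illustration.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address


def build_chat_payload(model: str, prompt: str, system: Optional[str] = None,
                       temperature: float = 0.0, seed: int = 42) -> Dict[str, Any]:
    """Build a non-streaming /api/chat request body with reproducible sampling."""
    messages = [{"role": "system", "content": system}] if system else []
    messages.append({"role": "user", "content": prompt})
    return {
        "model": model,
        "messages": messages,
        "stream": False,
        # Temperature 0 plus a fixed seed should make repeated calls return the
        # same text, which is what a result cache implicitly assumes
        "options": {"temperature": temperature, "seed": seed},
    }


# In a Streamlit app, put @st.cache_data(ttl=3600) on this function: Streamlit then
# keys the cache on (model, prompt, system) and skips the HTTP call on a hit.
def ask_ollama(model: str, prompt: str, system: Optional[str] = None) -> str:
    body = json.dumps(build_chat_payload(model, prompt, system)).encode("utf-8")
    req = urllib.request.Request(f"{OLLAMA_URL}/api/chat", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=180) as resp:
        return json.load(resp)["message"]["content"]
```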

Multi-Page Streamlit Apps

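Streamlit has supported multi-page apps natively since version 1.10: create a pages/ directory alongside your main app.py and add a Python file for each page, and Streamlit automatically generates sidebar navigation with a link to each page. (Recent releases also let you define pages in code with st.navigation and st.Page.) Each page file is an independent Python script with its own Streamlit components, but all pages share the same st.session_state, so data such as conversation history is available on every page. Widget values are the exception: when a page that does not render a widget runs, Streamlit removes that widget’s value from st.session_state, so copy uploaded files or settings into your own session_state keys if other pages need them.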

A natural structure for a multi-page Ollama application is a main landing page that explains the tool, a chat page for general conversation, a document analysis page for file uploads, and a settings page where users can configure the Ollama URL, default model, and system prompt. Storing settings in st.session_state rather than environment variables lets users adjust them at runtime through the UI rather than needing to edit configuration files and restart the app.
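A settings page is also a natural place to offer only the models that are actually installed instead of a hard-coded list. Ollama’s GET /api/tags endpoint returns them as JSON of the form {"models": [{"name": ...}, ...]}. A minimal sketch using only the standard library; model_names, installed_models, and the fallback behaviour are illustrative assumptions.

```python
import json
import urllib.request
from typing import Any, Dict, List, Sequence


def model_names(tags: Dict[str, Any]) -> List[str]:
    """Extract sorted model names from an /api/tags response body."""
    return sorted(m["name"] for m in tags.get("models", []) if "name" in m)


def installed_models(base_url: str = "http://localhost:11434",
                     fallback: Sequence[str] = ("llama3.2",)) -> List[str]:
    """Return the models a running Ollama server reports, or `fallback` if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            names = model_names(json.load(resp))
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; bad JSON is a ValueError
        return list(fallback)
    return names or list(fallback)
```

On the settings page, st.selectbox("Default model", installed_models(url)) fills the dropdown, where url is whatever Ollama address the user has stored in st.session_state. Wrapping installed_models in @st.cache_data(ttl=60) avoids hitting the server on every rerun. Names come back with their tags, such as llama3.2:latest.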

Integrating with Pandas and Visualisation Libraries

One of Streamlit’s strongest features is its native support for pandas DataFrames, Matplotlib, Plotly, and Altair charts. For an AI data analysis tool, the workflow is: user uploads a CSV, Ollama identifies interesting patterns in the data and suggests specific analyses, Streamlit executes the suggested analyses with pandas, and displays the results as interactive charts. The LLM acts as an intelligent layer that bridges the gap between natural language questions and the code needed to answer them — telling the user “I suggest plotting revenue by region as a bar chart” and then generating the pandas and Plotly code to do it.

Ask Ollama to generate Python code for a specific analysis using structured outputs (pass a JSON schema in the request’s format field) so the code comes back in a predictable JSON structure, then execute it with Python’s exec() in a restricted namespace. Display the results (DataFrames, figures, metrics) with Streamlit’s display functions. This AI-generated analysis pattern is powerful but requires careful handling. A restricted namespace limits which names the code can see, but it is not a security sandbox: determined code can still reach the file system or network through Python’s object model. Run generated code only where you can tolerate that risk, or move execution into a separate process or container, and always display the generated code so users can review it before it runs. Transparency about what code is being executed builds trust and lets users catch errors in the LLM’s reasoning before they affect results.
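A minimal sketch of the request and execution side, assuming an Ollama version that accepts a JSON schema in the format field. The schema, the builtins allow-list, and the helpers analysis_request, parse_generated, and run_code are hypothetical choices for illustration. To keep the example dependency-free, the data is a plain list of dicts rather than a DataFrame.

```python
import json
from typing import Any, Dict, List

# JSON schema sent as the "format" field so the reply has a predictable shape
CODE_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "explanation": {"type": "string"},
        "code": {"type": "string"},
    },
    "required": ["explanation", "code"],
}

# Hypothetical allow-list: the only builtins generated code can see
SAFE_BUILTINS = {"len": len, "sum": sum, "min": min, "max": max,
                 "sorted": sorted, "round": round, "abs": abs, "range": range}


def analysis_request(model: str, question: str, columns: List[str]) -> Dict[str, Any]:
    """Build an /api/chat body asking for code that sets a variable named result."""
    prompt = (f"The data is a list of dicts named data with keys {columns}. "
              "Write Python code that stores the answer in a variable named result. "
              f"Question: {question}")
    return {"model": model, "stream": False, "format": CODE_SCHEMA,
            "messages": [{"role": "user", "content": prompt}]}


def parse_generated(reply_content: str) -> Dict[str, Any]:
    """Decode the model's JSON reply and check that it contains a code string."""
    payload = json.loads(reply_content)
    if not isinstance(payload, dict) or not isinstance(payload.get("code"), str):
        raise ValueError("reply does not contain a 'code' string")
    return payload


def run_code(code: str, data: List[Dict[str, Any]]) -> Any:
    """Run reviewed code in a restricted namespace and return its `result`.

    Replacing __builtins__ hides open(), __import__ and similar names from
    well-behaved code, but it is NOT a security sandbox.
    """
    namespace: Dict[str, Any] = {"__builtins__": SAFE_BUILTINS, "data": data}
    exec(code, namespace)
    return namespace.get("result")
```

In the app, the flow is: parse the reply with parse_generated, show the code with st.code(gen["code"], language="python"), and call run_code only after the user clicks a separate Run button. Anything outside the allow-list, including import statements, fails with NameError or ImportError.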

Sharing and Deploying Streamlit Apps

To share a Streamlit Ollama app within a local network, run streamlit run app.py --server.address 0.0.0.0 --server.port 8501 and give colleagues the machine’s IP address. Anyone on the same network can then open it at http://192.168.x.x:8501. Ollama only processes a limited number of requests in parallel (configurable with the OLLAMA_NUM_PARALLEL environment variable), so beyond that limit simultaneous users will experience queuing: later requests wait for earlier responses to complete. For small teams with occasional usage this is acceptable. For heavier use, remember that the bottleneck is the Ollama server rather than Streamlit. Adding Streamlit instances in front of a single Ollama server will not shorten the queue, so you need more parallel slots (memory permitting) or additional Ollama servers.

For permanent deployment on a server, run Streamlit under a process manager such as systemd or pm2 so it stays alive and restarts automatically if it crashes. Put it behind nginx with HTTPS and basic authentication to prevent public access to your local AI application, and bind Streamlit itself to 127.0.0.1 so it is only reachable through the proxy. The nginx configuration is short: reverse proxy ports 80/443 to port 8501, forward the WebSocket Upgrade and Connection headers (Streamlit’s frontend talks to the server over a WebSocket), add an auth_basic block with a password file, and obtain a certificate from Let’s Encrypt. The result is a private, authenticated Streamlit application reachable from anywhere with a browser and the right credentials.

Streamlit vs Gradio for Ollama UIs

Gradio is the other popular Python library for building ML application UIs, and it is already covered in a separate guide on this site. The key differences for Ollama use cases: Gradio has built-in chat interface components and model sharing via Hugging Face Spaces, making it slightly faster to get a chat UI running. Streamlit is more general-purpose — it handles data exploration, custom dashboards, multi-page apps, and complex layouts better than Gradio, and its component ecosystem is larger. For a dedicated chat interface, Gradio is slightly more ergonomic. For anything more complex than a chat UI — data analysis tools, document processors, multi-step workflows — Streamlit is the better choice. Both are excellent for Ollama prototyping, and the Ollama API calls are identical between them.

Building a Prompt Engineering Workbench

Streamlit’s interactive widgets make it an excellent environment for iterating on prompts. Build a workbench with a system prompt editor, a user message input, sliders for temperature and context length, a model selector, and a side-by-side comparison of outputs across different parameter combinations. Add a history of previous runs stored in st.session_state so you can scroll back and compare results without losing earlier experiments. Include a button to export the current prompt configuration as a JSON file that you can reload in a future session or share with a colleague.
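A minimal sketch of the export and reload piece, assuming a dataclass holds one run’s configuration. PromptConfig and its field defaults are hypothetical; temperature and num_ctx map onto Ollama’s request options, which is how the workbench’s sliders would reach the model.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class PromptConfig:
    model: str = "llama3.2"
    system: str = "You are a helpful assistant."
    user: str = ""
    temperature: float = 0.7
    num_ctx: int = 4096

    def to_request(self) -> Dict[str, Any]:
        """Build a non-streaming /api/chat body; sampling knobs go under "options"."""
        return {
            "model": self.model,
            "messages": [{"role": "system", "content": self.system},
                         {"role": "user", "content": self.user}],
            "stream": False,
            "options": {"temperature": self.temperature, "num_ctx": self.num_ctx},
        }

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "PromptConfig":
        data = json.loads(text)
        # Ignore unknown keys so older or hand-edited files still load
        known = {k: v for k, v in data.items() if k in cls.__dataclass_fields__}
        return cls(**known)
```

st.download_button("Export config", cfg.to_json(), file_name="prompt.json") covers the export, PromptConfig.from_json(uploaded.getvalue().decode("utf-8")) reloads a saved file, and appending asdict(cfg) together with each reply to a list in st.session_state gives the run history.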

This kind of prompt engineering tool, built in an afternoon with Streamlit and Ollama, can stand in for commercial prompt testing tools in many practical cases. Every change is instant, every locally installed model is available, every result is inspectable, and everything stays on your machine.

Custom Styling and Components

Streamlit’s built-in st.code() function renders code with syntax highlighting based on a language parameter. For Ollama applications that generate code, parse the response for fenced code blocks and render each one with st.code(code, language=lang) instead of displaying raw Markdown, which gives users properly highlighted code they can copy directly. The language tag on the model’s own fence markers is usually accurate enough to pass straight through, since models generally label the blocks they generate; fall back to unstyled output when a block has no tag.
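A minimal sketch of the parsing step, assuming the model uses standard triple-backtick fences. split_code_blocks, its regex, and the segment format are illustrative choices rather than a Streamlit API. Untagged blocks get a language of None, which st.code renders unstyled.

```python
import re
from typing import List, Optional, Tuple

# (kind, language, content): kind is "prose" or "code"; language may be None
Segment = Tuple[str, Optional[str], str]

FENCE = "`" * 3  # three backticks, built at runtime
# Opening fence, optional language tag, rest of the info line, body, closing fence
BLOCK_RE = re.compile(re.escape(FENCE) + r"([\w+#.-]*)[^\n]*\n(.*?)" + re.escape(FENCE),
                      re.DOTALL)


def split_code_blocks(text: str) -> List[Segment]:
    """Split a model reply into prose and fenced code segments, in order."""
    segments: List[Segment] = []
    pos = 0
    for m in BLOCK_RE.finditer(text):
        if m.start() > pos:
            segments.append(("prose", None, text[pos:m.start()]))
        language = m.group(1).lower() or None  # untagged blocks get None
        segments.append(("code", language, m.group(2).rstrip("\n")))
        pos = m.end()
    if pos < len(text):
        segments.append(("prose", None, text[pos:]))
    return segments
```

Rendering is then a loop over the segments: st.markdown(content) for prose and st.code(content, language=language) for code.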

For more advanced UI customisation, inject HTML using st.html() or embed JavaScript widgets with st.components.v1.html(). Keep custom HTML minimal — the more you customise away from Streamlit’s defaults, the more maintenance work you create for yourself when Streamlit updates its components. The built-in chat interface, file uploaders, metric cards, and layout containers handle the vast majority of Ollama application UI needs without any custom HTML, and reaching for custom components too early is a common mistake that slows development without a proportional improvement in user experience.

Streamlit’s strength is speed of iteration: going from a working Ollama prompt to a shareable web application takes an afternoon, not a week. For AI-powered internal tools and data science dashboards, that advantage over traditional web frameworks makes Streamlit the right default until your requirements outgrow what it can do. The chat app, document analyser, data explorer, and model comparison tool in this guide cover the patterns you will reach for most often: start with the one closest to your use case, add the AI capabilities you need, and share the URL with your team.
