Introduction
Remix is a full-stack React framework that emphasises web fundamentals — HTTP, forms, and the browser’s native capabilities — over client-side abstractions. Its loader and action model gives every route a clean server/client boundary, making it straightforward to put Ollama on the server side where it belongs and stream results to the browser. Remix’s native support for streaming responses, combined with React’s Suspense, makes it one of the most ergonomic frameworks for building AI-powered web applications with local LLMs.
In this guide you will create a Remix application, add a resource route that proxies streaming requests to Ollama, and build a chat UI that displays tokens as they arrive. All inference runs locally — no API keys, no external services.
Prerequisites
- Node.js 18+: check with node -v.
- Ollama: install from ollama.com and pull a model with ollama pull llama3.2.
Confirm Ollama is running at http://localhost:11434 before continuing. Start it with ollama serve if needed.
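If you would rather script that check from Node, a minimal sketch follows. It assumes Ollama's root endpoint answers a plain GET with HTTP 200 while the server is up. The helper name isOllamaUp and its injectable fetchImpl parameter are illustrative only and are not part of Ollama or Remix.

```typescript
// Hypothetical helper: resolves to true when an Ollama server answers at baseUrl.
// fetchImpl is injectable so the check can be exercised without a live server.
async function isOllamaUp(
  baseUrl: string = "http://localhost:11434",
  fetchImpl: typeof fetch = fetch,
): Promise<boolean> {
  try {
    // Give up after two seconds so a hung server does not block the caller.
    const res = await fetchImpl(baseUrl, { signal: AbortSignal.timeout(2_000) });
    return res.ok;
  } catch {
    // Connection refused, DNS failure, and timeout all mean "not reachable".
    return false;
  }
}
```

Calling isOllamaUp() with no arguments targets the default address; a false result is the cue to run ollama serve.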
Scaffolding the Remix Project
Create a new Remix application using the official template:
npx create-remix@latest ollama-remix
cd ollama-remix
npm install
npm run dev
Open http://localhost:5173 to confirm the app is running. Remix uses an app/routes/ directory for file-based routing. Each route file can export a loader (for GET requests), an action (for mutations), and a default React component for the UI.
Creating the Ollama Resource Route
A Remix resource route is a route file that exports only server functions — no React component. It is the right place for an Ollama proxy because it runs entirely on the server. Create app/routes/api.chat.ts:
import type { ActionFunctionArgs } from "@remix-run/node";
export async function action({ request }: ActionFunctionArgs) {
const body = await request.json();
const ollamaRes = await fetch("http://localhost:11434/api/chat", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ ...body, stream: true }),
signal: AbortSignal.timeout(120_000),
});
if (!ollamaRes.ok) {
return new Response(JSON.stringify({ error: "Ollama error" }), {
status: 500,
headers: { "Content-Type": "application/json" },
});
}
return new Response(ollamaRes.body, {
headers: {
"Content-Type": "application/x-ndjson",
"Cache-Control": "no-cache",
},
});
}
The file name api.chat.ts maps to the URL path /api/chat in Remix’s dot-delimiter routing convention. The action export handles POST requests, forwards them to Ollama with streaming enabled, and pipes the response body straight back. Because this is a server function, the Ollama URL never appears in the browser’s network tab.
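Spreading the client's JSON straight into the upstream request forwards every field the browser sends. A stricter variant is sketched here, assuming the client only needs to send model and messages and that only the system, user, and assistant roles are in use. It whitelists those fields before forwarding. buildOllamaPayload and its error messages are hypothetical, not part of Remix or Ollama.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
type OllamaChatPayload = { model: string; messages: ChatMessage[]; stream: true };

// Type guard for a single chat message with a known role and string content.
function isChatMessage(value: unknown): value is ChatMessage {
  if (typeof value !== "object" || value === null) return false;
  const m = value as Record<string, unknown>;
  const roleOk = m.role === "system" || m.role === "user" || m.role === "assistant";
  return roleOk && typeof m.content === "string";
}

// Hypothetical helper: returns a clean payload or throws on malformed input.
function buildOllamaPayload(body: unknown): OllamaChatPayload {
  if (typeof body !== "object" || body === null) {
    throw new Error("Body must be a JSON object");
  }
  const { model, messages } = body as Record<string, unknown>;
  if (typeof model !== "string" || model.length === 0) {
    throw new Error("model must be a non-empty string");
  }
  if (!Array.isArray(messages) || !messages.every(isChatMessage)) {
    throw new Error("messages must be an array of { role, content }");
  }
  // Copy only known fields so extra client-supplied keys are dropped.
  return {
    model,
    messages: messages.map(({ role, content }) => ({ role, content })),
    stream: true,
  };
}
```

In the action, the result of buildOllamaPayload(await request.json()) would replace the spread object, and a thrown error could be turned into a 400 response instead of reaching Ollama.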
Building the Chat UI Route
Now create the main chat page at app/routes/_index.tsx. This replaces the default Remix index route:
import { useState, useEffect, useRef } from "react";
type Message = { role: "user" | "assistant"; content: string };
export default function Index() {
const [model, setModel] = useState("llama3.2");
const [prompt, setPrompt] = useState("");
const [messages, setMessages] = useState<Message[]>([]);
const [loading, setLoading] = useState(false);
const bottomRef = useRef<HTMLDivElement>(null);
useEffect(() => {
bottomRef.current?.scrollIntoView({ behavior: "smooth" });
}, [messages]);
async function send() {
if (!prompt.trim() || loading) return;
const userMsg: Message = { role: "user", content: prompt.trim() };
const nextMessages = [...messages, userMsg];
setMessages(nextMessages);
setPrompt("");
setLoading(true);
const assistantMsg: Message = { role: "assistant", content: "" };
setMessages([...nextMessages, assistantMsg]);
try {
const res = await fetch("/api/chat", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ model, messages: nextMessages }),
});
if (!res.ok) throw new Error("Request failed");
const reader = res.body!.getReader();
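const decoder = new TextDecoder();
let buffer = ""; // holds a partial NDJSON line that spans two chunks
while (true) {
const { done, value } = await reader.read();
if (done) break;
// stream: true keeps multi-byte characters intact across chunk boundaries
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() ?? ""; // the last piece may be incomplete; keep it for the next chunk
for (const line of lines) {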
try {
const data = JSON.parse(line);
if (data.message?.content) {
setMessages(prev => {
const copy = [...prev];
copy[copy.length - 1] = {
role: "assistant",
content: copy[copy.length - 1].content + data.message.content,
};
return copy;
});
}
} catch {}
}
}
} catch {
setMessages(prev => {
const copy = [...prev];
copy[copy.length - 1] = { role: "assistant", content: "Error — is Ollama running?" };
return copy;
});
} finally {
setLoading(false);
}
}
return (
<main style={{ maxWidth: 720, margin: "2rem auto", fontFamily: "sans-serif", padding: "0 1rem" }}>
<h1>Ollama + Remix</h1>
<select value={model} onChange={e => setModel(e.target.value)} style={{ marginBottom: "1rem" }}>
<option value="llama3.2">llama3.2</option>
<option value="mistral">mistral</option>
<option value="gemma3">gemma3</option>
</select>
<div style={{ border: "1px solid #ddd", borderRadius: 8, padding: "1rem", minHeight: 300, maxHeight: 480, overflowY: "auto", background: "#fafafa", marginBottom: "1rem" }}>
{messages.map((m, i) => (
<div key={i} style={{ marginBottom: "0.75rem", color: m.role === "user" ? "#1a56db" : "#111" }}>
<strong>{m.role}:</strong> {m.content}
</div>
))}
<div ref={bottomRef} />
</div>
<div style={{ display: "flex", gap: "0.5rem" }}>
<input
value={prompt}
onChange={e => setPrompt(e.target.value)}
onKeyDown={e => e.key === "Enter" && send()}
placeholder="Type a message..."
style={{ flex: 1, padding: "0.5rem", border: "1px solid #ccc", borderRadius: 4 }}
/>
<button
onClick={send}
disabled={loading}
style={{ padding: "0.5rem 1rem", background: loading ? "#aaa" : "#e8552f", color: "white", border: "none", borderRadius: 4, cursor: loading ? "default" : "pointer" }}
>
{loading ? "Thinking..." : "Send"}
</button>
</div>
</main>
);
}
The useRef and useEffect combination auto-scrolls to the bottom of the message list each time a new token arrives, keeping the latest content visible during a long generation. The functional state update inside the stream loop ensures React always operates on the latest state rather than a stale closure value.
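Two pieces of the streaming loop can be pulled out as pure functions, which makes them easier to test in isolation. The sketch below assumes Ollama's newline-delimited JSON framing, where a network chunk may end in the middle of a line. createNdjsonBuffer and appendToLast are hypothetical helper names.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Buffers raw text and returns only complete, parsed NDJSON lines.
function createNdjsonBuffer() {
  let pending = ""; // partial line carried over from the previous chunk
  return function push(chunk: string): unknown[] {
    pending += chunk;
    const lines = pending.split("\n");
    pending = lines.pop() ?? ""; // last element is incomplete (or empty)
    const parsed: unknown[] = [];
    for (const line of lines) {
      if (!line.trim()) continue;
      try {
        parsed.push(JSON.parse(line));
      } catch {
        // Skip a malformed line rather than abort the whole stream.
      }
    }
    return parsed;
  };
}

// Returns a new array whose last message has `token` appended; `prev` is not mutated.
function appendToLast(prev: Message[], token: string): Message[] {
  if (prev.length === 0) return prev;
  const last = prev[prev.length - 1];
  return [...prev.slice(0, -1), { ...last, content: last.content + token }];
}
```

Inside the read loop, each decoded chunk would go through push(decoder.decode(value, { stream: true })), and each token would be applied with setMessages(prev => appendToLast(prev, token)).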
Loading Models with a Remix Loader
Remix loaders run on the server before the page renders, making them ideal for fetching the list of installed Ollama models. Add a loader export to _index.tsx:
import { useLoaderData } from "@remix-run/react";
import type { LoaderFunctionArgs } from "@remix-run/node";
export async function loader({}: LoaderFunctionArgs) {
try {
const res = await fetch("http://localhost:11434/api/tags");
const data = await res.json();
return { models: data.models.map((m: { name: string }) => m.name) as string[] };
} catch {
return { models: ["llama3.2"] };
}
}
Then consume it in the component:
const { models } = useLoaderData<typeof loader>();
const [model, setModel] = useState(models[0] || "llama3.2");
Replace the hardcoded <option> elements with models.map(m => <option key={m} value={m}>{m}</option>). Because the loader runs server-side during SSR, the model list is baked into the initial HTML — no loading state or client-side flash needed.
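The loader falls back to a default only when the request or the parsing throws. An empty model list (Ollama running with nothing pulled yet) would leave the select with no options. A sketch of a helper that also covers that case is below. extractModelNames is a hypothetical name, and it assumes the /api/tags body has the shape { models: [{ name: string }] } that the loader already relies on.

```typescript
const FALLBACK_MODELS = ["llama3.2"];

// Hypothetical helper: pulls model names out of an /api/tags response body,
// falling back to a default list when the shape is unexpected or the list is empty.
function extractModelNames(data: unknown, fallback: string[] = FALLBACK_MODELS): string[] {
  if (typeof data !== "object" || data === null) return fallback;
  const models = (data as { models?: unknown }).models;
  if (!Array.isArray(models)) return fallback;
  const names = models
    .map((m: unknown) =>
      typeof m === "object" && m !== null ? (m as { name?: unknown }).name : undefined,
    )
    .filter((name): name is string => typeof name === "string" && name.length > 0);
  return names.length > 0 ? names : fallback;
}
```

In the loader this would read return { models: extractModelNames(await res.json()) }, with the existing catch block left in place for network failures.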
Using Remix’s useFetcher for Non-Navigation Submissions
Remix’s useFetcher hook is designed for data mutations that do not cause a full page navigation — exactly the pattern needed for a chat interface. While the manual fetch approach above works well, useFetcher gives you Remix-native loading state management and integrates with Remix’s error boundaries. Here is an alternative implementation using useFetcher:
import { useFetcher } from "@remix-run/react";
export default function Index() {
const fetcher = useFetcher();
const [messages, setMessages] = useState<Message[]>([]);
// Note: for streaming, the manual fetch approach in the previous section
// is simpler. useFetcher is best for non-streaming mutations where you
// want the full Remix data flow including revalidation.
}
For streaming use cases, the manual fetch approach gives you direct access to the response body reader, which is what you need to process tokens as they arrive. useFetcher is more appropriate for non-streaming actions like saving a conversation to a database, updating settings, or deleting a message.
Adding Conversation Persistence with a Remix Action
Remix actions make it easy to add server-side side effects alongside the streaming response. For example, you might want to save each completed conversation turn to a file or database. Create a separate save endpoint at app/routes/api.save.ts:
import type { ActionFunctionArgs } from "@remix-run/node";
import { writeFile, readFile } from "node:fs/promises";
export async function action({ request }: ActionFunctionArgs) {
const { messages } = await request.json();
const history = JSON.parse(
await readFile("./chat-history.json", "utf-8").catch(() => "[]")
);
history.push({ timestamp: new Date().toISOString(), messages });
await writeFile("./chat-history.json", JSON.stringify(history, null, 2));
return new Response(null, { status: 204 });
}
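After the streaming loop completes in the client component, POST the finished conversation to /api/save. The updatedMessages variable below is not defined in the chat component shown earlier; it stands for nextMessages plus the completed assistant reply. The component has to assemble it itself, because the messages state captured by send() is stale by the time the stream ends:
// after the streaming while loop, once updatedMessages has been assembled: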
await fetch("/api/save", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ messages: updatedMessages }),
});
This pattern keeps the streaming path fast — the save happens after the response is fully received — while giving you a persistent record of every conversation.
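One way to assemble the updatedMessages array used in the save request is to collect the streamed tokens in a local array inside send() and combine them with the history that was sent to the model. This is a sketch under that assumption; buildUpdatedMessages is a hypothetical helper, and its first argument corresponds to nextMessages in the chat component.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Hypothetical helper: combines the history that was sent to the model with the
// tokens received while streaming. Returns a new array; inputs are not mutated.
function buildUpdatedMessages(sent: Message[], tokens: string[]): Message[] {
  const reply = tokens.join("");
  // An empty reply (for example after a failed request) adds no assistant turn.
  if (reply.length === 0) return [...sent];
  return [...sent, { role: "assistant", content: reply }];
}
```

In send(), a const tokens: string[] = [] declared before the loop would receive each data.message.content, and const updatedMessages = buildUpdatedMessages(nextMessages, tokens) would run once the loop exits.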
Conclusion
Remix’s resource routes, server-side loaders, and native streaming support make it an excellent framework for Ollama-powered applications. The clear server/client boundary means your Ollama instance stays private by default, the loader pattern gives you SSR-populated data without client-side loading states, and Remix’s web-standards approach means the patterns you learn here transfer directly to standard Web APIs. From this foundation you can add Remix’s nested routing for multi-page AI tools, integrate a database for conversation history, or build a progressive enhancement layer so the app works even without JavaScript — all while keeping inference local and free.
Styling with Remix’s Built-in CSS Support
Remix has first-class support for CSS modules and global stylesheets. To add scoped styles to the chat page, create a stylesheet named _index.module.css next to the route file and import it as a CSS module:
import styles from "./_index.module.css";
// in JSX:
<main className={styles.chat}>
<div className={styles.messages}> ... </div>
</main>
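The corresponding CSS file is shown below. The .user and .assistant classes are meant for the individual message elements, for example via className={styles[m.role]}: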
.chat { max-width: 720px; margin: 2rem auto; font-family: sans-serif; padding: 0 1rem; }
.messages { border: 1px solid #ddd; border-radius: 8px; padding: 1rem; min-height: 300px; max-height: 480px; overflow-y: auto; background: #fafafa; margin-bottom: 1rem; }
.user { color: #1a56db; margin-bottom: 0.75rem; }
.assistant { color: #111; margin-bottom: 0.75rem; }
.inputRow { display: flex; gap: 0.5rem; }
.inputRow input { flex: 1; padding: 0.5rem; border: 1px solid #ccc; border-radius: 4px; }
.inputRow button { padding: 0.5rem 1rem; background: #e8552f; color: white; border: none; border-radius: 4px; cursor: pointer; }
.inputRow button:disabled { background: #aaa; cursor: default; }
CSS modules scope class names to the component automatically, so there is no risk of style conflicts as your application grows. Remix also supports Tailwind CSS through its standard Vite integration: once the Tailwind Vite plugin is added and Tailwind is imported in a global stylesheet, utility classes are available throughout your routes.
Deploying Remix with Ollama
Remix supports multiple deployment targets through its adapter system. For local production use, the Node.js adapter is the simplest choice. It is already configured in the default Remix template. Build and start the app:
npm run build
npm start
The production server runs on port 3000 by default. All Ollama calls happen server-side in the resource route, so users only need to reach your Remix server — the Ollama instance can stay bound to localhost. If you want to expose the app to other devices on your local network, set the HOST environment variable: HOST=0.0.0.0 npm start. For a more permanent setup, consider running both the Remix server and Ollama as systemd services on a Linux machine, which ensures they restart automatically after reboots and are always available on your local network.
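Both server files in this guide hardcode http://localhost:11434. If a deployment ever runs Ollama on a different port or machine, reading the address from the environment avoids editing route files. The sketch below is one way to do that. OLLAMA_BASE_URL is a variable name chosen for this example and is not defined by Remix or Ollama, and resolveOllamaUrl is a hypothetical helper.

```typescript
const DEFAULT_OLLAMA_URL = "http://localhost:11434";

// Hypothetical helper: builds an Ollama URL from an environment-like object.
// OLLAMA_BASE_URL is an assumed name for this sketch only.
function resolveOllamaUrl(env: Record<string, string | undefined>, path: string = ""): string {
  const raw = (env.OLLAMA_BASE_URL ?? "").trim();
  const base = raw.length > 0 ? raw : DEFAULT_OLLAMA_URL;
  // new URL() throws a TypeError if the value is not a valid absolute URL.
  new URL(base);
  // Drop trailing slashes so "http://host:11434/" + "/api/chat" does not double up.
  return base.replace(/\/+$/, "") + path;
}
```

The resource route would then call fetch(resolveOllamaUrl(process.env, "/api/chat"), ...), and the production server could be started with OLLAMA_BASE_URL=http://localhost:11434 npm start.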