Introduction
Angular is a batteries-included frontend framework from Google that pairs strong opinions on project structure with powerful built-in tools for dependency injection, reactive programming with RxJS, and HTTP communication. When you add Ollama to an Angular project, you get a productive local AI development stack: RxJS observables are a natural fit for streaming LLM responses, and an injectable service keeps the HTTP layer cleanly separated from the UI.
This guide walks you through creating an Angular application that connects to a locally-running Ollama model, streams responses into a component using RxJS, and organises the Ollama logic in an injectable service. By the end you will have a working chat interface backed entirely by local inference.
Prerequisites
- Node.js 18+ and npm: check with node -v.
- Angular CLI: install globally with npm install -g @angular/cli.
- Ollama: install from ollama.com and pull a model with ollama pull llama3.2.
Start Ollama with ollama serve and confirm it is reachable at http://localhost:11434 before continuing.
Creating the Angular Project
Scaffold a new Angular application:
ng new ollama-angular --routing=false --style=css
cd ollama-angular
Angular 17+ generates standalone components by default, while earlier versions generate an NgModule-based app. Both work for this guide, although the component code shown later uses the standalone style. Start the dev server:
ng serve
Open http://localhost:4200 to confirm the app is running.
Configuring CORS on Ollama
Browsers treat the Angular dev server on port 4200 and Ollama on port 11434 as different origins, and Ollama restricts which origins it accepts requests from. To allow your dev server explicitly, set the OLLAMA_ORIGINS environment variable before starting Ollama:
# macOS / Linux
OLLAMA_ORIGINS=http://localhost:4200 ollama serve
# Windows PowerShell
$env:OLLAMA_ORIGINS="http://localhost:4200"; ollama serve
Alternatively, set OLLAMA_ORIGINS=* to allow all origins during development, then tighten it for production. Once CORS is configured, your Angular app can call the Ollama API directly without a backend proxy.
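To see why the port difference matters, here is a minimal sketch of the origin comparison a browser effectively performs. The helper name isCrossOrigin is hypothetical, and the second call uses an illustrative path on the dev server itself rather than anything Ollama provides.

```typescript
// Two URLs share an origin only when scheme, host, and port all match.
function isCrossOrigin(pageUrl: string, apiUrl: string): boolean {
  return new URL(pageUrl).origin !== new URL(apiUrl).origin
}

const devServer = 'http://localhost:4200'
const ollama = 'http://localhost:11434'

// Same host, different port: the browser applies CORS rules.
console.log(isCrossOrigin(devServer, ollama + '/api/chat')) // true

// A path served by the dev server itself is same-origin (illustrative path).
console.log(isCrossOrigin(devServer, devServer + '/ollama/api/chat')) // false
```

The value you pass in OLLAMA_ORIGINS should match the page's origin exactly, including the scheme and port.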
Creating the Ollama Service
Angular services are the right place for HTTP logic. Create a service to encapsulate all Ollama API calls:
ng generate service ollama
Open src/app/ollama.service.ts and implement the streaming chat method:
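import { Injectable } from '@angular/core'
import { Observable } from 'rxjs'

export interface Message { role: string; content: string }

@Injectable({ providedIn: 'root' })
export class OllamaService {
  private baseUrl = 'http://localhost:11434'

  // Streams the assistant's reply; each emission is one chunk of generated text.
  chat(model: string, messages: Message[]): Observable<string> {
    return new Observable<string>(observer => {
      // Aborting the request on unsubscribe stops the stream early.
      const controller = new AbortController()
      fetch(this.baseUrl + '/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ model, messages, stream: true }),
        signal: controller.signal
      }).then(async res => {
        if (!res.ok || !res.body) throw new Error('Ollama returned HTTP ' + res.status)
        const reader = res.body.getReader()
        const decoder = new TextDecoder()
        let buffer = ''
        while (true) {
          const { done, value } = await reader.read()
          if (done) { observer.complete(); break }
          // A chunk can end mid-line, so keep the unfinished tail for the next read.
          buffer += decoder.decode(value, { stream: true })
          const lines = buffer.split('\n')
          buffer = lines.pop() ?? ''
          for (const line of lines.filter(Boolean)) {
            try {
              const data = JSON.parse(line)
              if (data.message?.content) observer.next(data.message.content)
            } catch {
              // Skip lines that are not valid JSON.
            }
          }
        }
      }).catch(err => {
        // An abort caused by unsubscribing is not worth reporting as an error.
        if (!controller.signal.aborted) observer.error(err)
      })
      return () => controller.abort()
    })
  }

  // Lists installed model names, falling back to a default if the request fails.
  getModels(): Promise<string[]> {
    return fetch(this.baseUrl + '/api/tags')
      .then(r => r.json())
      .then(d => d.models.map((m: { name: string }) => m.name))
      .catch(() => ['llama3.2'])
  }
}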
The chat method wraps the streaming fetch in an RxJS Observable. Each token emitted by Ollama becomes a next emission on the observable, and when the stream ends the observable completes. This fits naturally into Angular’s reactive programming model.
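The line-splitting step is the part most likely to go wrong, because a network chunk can end partway through a JSON line. Below is a minimal sketch of that step as standalone, testable functions. The names splitNdjson, collectTokens, and ChatChunk are hypothetical, and the sample chunks are made up to imitate the shape of a streamed chat response.

```typescript
// Only the fields this sketch reads; the real response carries more.
interface ChatChunk { message?: { content?: string }; done?: boolean }

// Splits buffered text into complete lines plus the unfinished remainder.
function splitNdjson(buffer: string): { lines: string[]; rest: string } {
  const parts = buffer.split('\n')
  const rest = parts.pop() ?? ''
  return { lines: parts.filter(line => line.trim() !== ''), rest }
}

// Feeds raw text chunks through the splitter and collects the tokens.
function collectTokens(chunks: string[]): string[] {
  const tokens: string[] = []
  let buffer = ''
  for (const chunk of chunks) {
    const { lines, rest } = splitNdjson(buffer + chunk)
    buffer = rest
    for (const line of lines) {
      const data = JSON.parse(line) as ChatChunk
      if (data.message?.content) tokens.push(data.message.content)
    }
  }
  return tokens
}

// The second JSON object is deliberately split across two chunks.
const tokens = collectTokens([
  '{"message":{"content":"Hel"}}\n{"message":{"con',
  'tent":"lo"}}\n{"done":true}\n'
])
console.log(tokens.join('')) // Hello
```

Parsing each chunk on its own would fail on the split object; carrying the remainder forward is what keeps the token sequence intact.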
Building the Chat Component
Generate the chat component:
ng generate component chat
Open src/app/chat/chat.component.ts and wire it to the service:
import { Component, OnInit } from '@angular/core'
import { FormsModule } from '@angular/forms'
import { NgFor } from '@angular/common'
import { OllamaService, Message } from '../ollama.service'

@Component({
  selector: 'app-chat',
  standalone: true,
  imports: [FormsModule, NgFor],
  templateUrl: './chat.component.html',
  styleUrl: './chat.component.css'
})
export class ChatComponent implements OnInit {
  model = 'llama3.2'
  models: string[] = ['llama3.2']
  prompt = ''
  messages: Message[] = []
  loading = false

  constructor(private ollama: OllamaService) {}

  ngOnInit() {
    // Keep the defaults if Ollama reports no installed models.
    this.ollama.getModels().then(m => {
      if (m.length) { this.models = m; this.model = m[0] }
    })
  }

  send() {
    // Ignore empty input, and ignore Enter while a reply is still streaming.
    if (!this.prompt.trim() || this.loading) return
    this.messages.push({ role: 'user', content: this.prompt.trim() })
    this.prompt = ''
    this.loading = true
    // Empty placeholder that fills up as tokens arrive.
    const assistantMsg: Message = { role: 'assistant', content: '' }
    this.messages.push(assistantMsg)
    // Send the history without the empty assistant placeholder.
    this.ollama.chat(this.model, this.messages.slice(0, -1)).subscribe({
      next: token => { assistantMsg.content += token },
      error: () => { assistantMsg.content = 'Error: is Ollama running?'; this.loading = false },
      complete: () => { this.loading = false }
    })
  }
}
Now create the template at src/app/chat/chat.component.html:
<div class="chat">
  <h1>Ollama Chat</h1>
  <select [(ngModel)]="model">
    <option *ngFor="let m of models" [value]="m">{{ m }}</option>
  </select>
  <div class="messages">
    <div *ngFor="let msg of messages" [class]="msg.role">
      <strong>{{ msg.role }}:</strong> {{ msg.content }}
    </div>
  </div>
  <div class="input-row">
    <input [(ngModel)]="prompt" (keyup.enter)="send()" placeholder="Type a message..." />
    <button (click)="send()" [disabled]="loading">{{ loading ? 'Thinking...' : 'Send' }}</button>
  </div>
</div>
Add basic styles to chat.component.css:
.chat { max-width: 700px; margin: 2rem auto; font-family: sans-serif; }
.messages { border: 1px solid #ddd; border-radius: 8px; padding: 1rem; min-height: 300px; max-height: 480px; overflow-y: auto; background: #fafafa; margin: 1rem 0; }
.user { color: #1a56db; margin-bottom: 0.75rem; }
.assistant { color: #111; margin-bottom: 0.75rem; }
.input-row { display: flex; gap: 0.5rem; }
input { flex: 1; padding: 0.5rem; border: 1px solid #ccc; border-radius: 4px; }
button { padding: 0.5rem 1rem; background: #dd0031; color: white; border: none; border-radius: 4px; cursor: pointer; }
button:disabled { background: #aaa; }
Finally, update src/app/app.component.html to use the chat component:
<app-chat />
And import it in app.component.ts:
import { ChatComponent } from './chat/chat.component'
@Component({ ..., imports: [ChatComponent] })
Using RxJS Operators for Advanced Streaming
One of Angular’s strengths is RxJS, and the streaming observable from the Ollama service opens the door to powerful reactive patterns. For example, you can use scan to accumulate tokens into a growing string (and, if you run into zone.js issues, add tap to trigger change detection explicitly):
import { scan } from 'rxjs/operators'

this.ollama.chat(this.model, messages).pipe(
  scan((acc, token) => acc + token, '')
).subscribe(fullText => {
  this.messages[this.messages.length - 1].content = fullText
})
This approach moves the accumulation out of the callback and into the stream itself, which makes the data flow easier to reason about. You can also use takeUntil to cancel the stream when a user clicks a stop button, or auditTime to limit re-renders if the model generates tokens very quickly (debounceTime is a poor fit here, because it emits nothing until the stream pauses).
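As a rough illustration of the stop-button idea, the sketch below combines scan with takeUntil. It assumes RxJS is installed, uses a Subject as a stand-in for the observable returned by the service, and the names runStopDemo, tokens$, and stop$ are hypothetical.

```typescript
import { Subject } from 'rxjs'
import { scan, takeUntil } from 'rxjs/operators'

function runStopDemo(): { seen: string[]; completed: boolean } {
  const tokens$ = new Subject<string>() // stand-in for ollama.chat(...)
  const stop$ = new Subject<void>()     // would be fed by a Stop button click
  const seen: string[] = []
  let completed = false

  tokens$.pipe(
    // Accumulate tokens into the full text so far.
    scan((acc: string, token: string) => acc + token, ''),
    // Complete the stream as soon as stop$ emits.
    takeUntil(stop$)
  ).subscribe({
    next: text => { seen.push(text) },
    complete: () => { completed = true }
  })

  tokens$.next('Hel')
  tokens$.next('lo')
  stop$.next()            // the user clicks Stop
  tokens$.next(' world')  // ignored: the subscription has already ended
  return { seen, completed }
}

const result = runStopDemo()
console.log(result.seen)      // [ 'Hel', 'Hello' ]
console.log(result.completed) // true
```

In a component, a click handler would call stop$.next(). Note that takeUntil only unsubscribes from the source; whether the underlying HTTP request is also cancelled depends on the teardown logic of the observable being wrapped.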
Conclusion
Angular’s service architecture and RxJS observables make it a natural fit for streaming LLM interfaces. Wrapping the Ollama fetch stream in an observable keeps the component code clean, and the dependency injection system makes it easy to swap the Ollama service for a different provider or add a caching layer later. From this foundation you can add Angular Material components for a polished UI, integrate NgRx for global conversation state, or build a multi-page app with separate views for different AI tools, all powered by local inference with no external API dependencies.
Adding a System Prompt and Conversation Reset
A system prompt lets you shape the model’s behaviour without exposing that configuration to the end user. Update the send method in the component to always prepend a system message:
send() {
  if (!this.prompt.trim() || this.loading) return
  const userMsg: Message = { role: 'user', content: this.prompt.trim() }
  this.messages.push(userMsg)
  this.prompt = ''
  this.loading = true
  const assistantMsg: Message = { role: 'assistant', content: '' }
  this.messages.push(assistantMsg)
  // The content filter already drops the empty assistant placeholder, so no
  // slice is needed; slicing as well would remove the user's newest message.
  const payload: Message[] = [
    { role: 'system', content: 'You are a helpful and concise assistant.' },
    ...this.messages.filter(m => m.content && m.role !== 'system')
  ]
  this.ollama.chat(this.model, payload).subscribe({
    next: token => { assistantMsg.content += token },
    complete: () => { this.loading = false },
    error: () => { assistantMsg.content = 'Error: is Ollama running?'; this.loading = false }
  })
}
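Because the payload rules are easy to get subtly wrong, it can help to pull them into a pure function that is simple to unit test. This is a sketch under that assumption. The helper buildPayload and the constant SYSTEM_PROMPT are hypothetical names, and the Message interface is repeated here only to keep the example self-contained.

```typescript
interface Message { role: string; content: string }

const SYSTEM_PROMPT = 'You are a helpful and concise assistant.'

// Builds the message list sent to the model: one system message first,
// then every non-empty user/assistant turn in order.
function buildPayload(history: Message[], systemPrompt: string = SYSTEM_PROMPT): Message[] {
  const turns = history.filter(m => m.role !== 'system' && m.content !== '')
  return [{ role: 'system', content: systemPrompt }, ...turns]
}

const payload = buildPayload([
  { role: 'user', content: 'Hi' },
  { role: 'assistant', content: 'Hello!' },
  { role: 'user', content: 'What is RxJS?' },
  { role: 'assistant', content: '' } // streaming placeholder, not sent
])
console.log(payload.map(m => m.role).join(', ')) // system, user, assistant, user
```

The newest user message stays in the payload, and only the empty placeholder is dropped.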
It is also useful to let users reset the conversation. Add a reset method and a button in the template:
// component
reset() { this.messages = [] }
// template
<button (click)="reset()" [disabled]="loading">New chat</button>
Clearing the messages array starts a fresh context, which is important when switching topics or models — older messages from a different context can confuse the model and degrade response quality.
Proxying Through a Backend for Production
Calling Ollama directly from the browser works well for local development, but for any shared or semi-public deployment you should proxy through a backend. Angular’s dev server proxy configuration makes this easy during development. Create a file called proxy.conf.json at the project root:
{
  "/ollama": {
    "target": "http://localhost:11434",
    "pathRewrite": { "^/ollama": "" },
    "changeOrigin": true,
    "secure": false
  }
}
Then reference it in angular.json under serve > options:
"proxyConfig": "proxy.conf.json"
Update your service to call /ollama/api/chat instead of the full Ollama URL. Now all Ollama requests are proxied through the Angular dev server, eliminating CORS issues entirely and keeping the Ollama address out of the browser’s network tab. For production, configure the same proxy in whichever reverse proxy — Nginx, Caddy, or a Node server — sits in front of your deployed Angular app. This also gives you a natural place to add API key checks, request logging, or per-user rate limiting without modifying the Angular client code.
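To make the path handling concrete, the sketch below shows the URL the service would request and what the pathRewrite rule turns it into before the request is forwarded. The names PROXY_PREFIX, proxiedUrl, and rewriteForOllama are hypothetical; the rewrite function only imitates the proxy rule for illustration and is not part of the Angular CLI.

```typescript
// Prefix the Angular app uses for every Ollama call.
const PROXY_PREFIX = '/ollama'

// Builds the same-origin URL the service requests, e.g. /ollama/api/chat.
function proxiedUrl(apiPath: string): string {
  return PROXY_PREFIX + apiPath
}

// Imitates the "pathRewrite" rule { "^/ollama": "" } from proxy.conf.json.
function rewriteForOllama(path: string): string {
  return path.replace(/^\/ollama/, '')
}

const requested = proxiedUrl('/api/chat')
console.log(requested)                   // /ollama/api/chat
console.log(rewriteForOllama(requested)) // /api/chat
```

In the service, this amounts to changing baseUrl from the full Ollama address to the /ollama prefix.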
Testing the Ollama Service
Angular’s dependency injection system makes the OllamaService straightforward to test. In a unit test, you can provide a mock implementation that returns a controlled observable rather than making real HTTP calls:
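import { TestBed } from '@angular/core/testing'
import { of } from 'rxjs'

// Stand-in service: emits three tokens, then completes.
const mockOllama = {
  chat: () => of('Hello ', 'world', '!'),
  getModels: () => Promise.resolve(['llama3.2'])
}

TestBed.configureTestingModule({
  providers: [{ provide: OllamaService, useValue: mockOllama }]
})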
Using of from RxJS to emit a sequence of token strings lets you test that your component correctly accumulates tokens into the assistant message, handles errors, and resets the loading flag on completion — all without needing a running Ollama instance. This separation of concerns, where the service owns all Ollama communication and the component owns display logic, is one of the key advantages of Angular’s architecture for building maintainable AI-powered applications.
Change Detection and Zone.js Considerations
Angular’s default change detection runs inside Zone.js, which patches browser APIs like fetch to trigger change detection automatically. However, because the service reads from a ReadableStream reader in a tight async loop, Angular may not always pick up token updates immediately. If you notice the UI updating in large batches rather than token by token, you can inject ChangeDetectorRef into the component and call detectChanges() in the next callback that receives each token:
import { ChangeDetectorRef } from '@angular/core'

constructor(private ollama: OllamaService, private cdr: ChangeDetectorRef) {}

// inside the subscribe next callback:
next: token => {
  assistantMsg.content += token
  this.cdr.detectChanges()
}
Alternatively, if you are using Angular’s signals-based reactivity (introduced as a developer preview in Angular 16 and stable since Angular 17), you can store the assistant message content in a signal and update it inside the stream. Signals notify Angular directly when a value changes rather than relying on Zone.js to detect it, which suits high-frequency streaming updates. Either approach produces a smooth typewriter effect; the right choice depends on which reactivity model your project has already adopted.