How to Use Ollama with Go

Ollama provides an official Go library that wraps the REST API with idiomatic Go types, streaming support, and full model management capabilities. If you are building a Go application that needs local LLM inference — a CLI tool, a backend service, a data processing pipeline — the Go library is the cleanest path to Ollama integration without dropping to raw HTTP calls.

Installation

go get github.com/ollama/ollama/api

Basic Chat

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ollama/ollama/api"
)

func main() {
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatal(err)
	}

	messages := []api.Message{
		{Role: "user", Content: "Why is Go popular for backend services?"},
	}

	req := &api.ChatRequest{
		Model:    "llama3.2",
		Messages: messages,
	}

	var fullResponse string
	err = client.Chat(context.Background(), req, func(resp api.ChatResponse) error {
		fmt.Print(resp.Message.Content)
		fullResponse += resp.Message.Content
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println()
}

Non-Streaming Response

func chatSync(client *api.Client, model, prompt string) (string, error) {
	messages := []api.Message{{Role: "user", Content: prompt}}
	stream := false // disable streaming: the callback fires once with the full reply
	req := &api.ChatRequest{Model: model, Messages: messages, Stream: &stream}

	var result string
	err := client.Chat(context.Background(), req, func(resp api.ChatResponse) error {
		result = resp.Message.Content
		return nil
	})
	return result, err
}

Generate (Raw Completion)

func generate(client *api.Client, model, prompt string) (string, error) {
	stream := false
	req := &api.GenerateRequest{
		Model:  model,
		Prompt: prompt,
		Stream: &stream,
	}

	var output string
	err := client.Generate(context.Background(), req, func(resp api.GenerateResponse) error {
		output += resp.Response
		return nil
	})
	return output, err
}

Embeddings

func embed(client *api.Client, model, text string) ([]float64, error) {
	req := &api.EmbeddingRequest{
		Model:  model,
		Prompt: text,
	}
	resp, err := client.Embeddings(context.Background(), req)
	if err != nil {
		return nil, err
	}
	return resp.Embedding, nil
}

// Usage (note: don't shadow the embed function with the result variable):
vec, err := embed(client, "nomic-embed-text", "Go is a statically typed language")
if err != nil {
	log.Fatal(err)
}
fmt.Printf("Dimensions: %d\n", len(vec))
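Embedding vectors are usually compared with cosine similarity. The standard library has no helper for this, so here is a minimal sketch; the cosineSimilarity function is our own, not part of the Ollama library:

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between two equal-length
// vectors: 1.0 for identical direction, 0.0 for orthogonal vectors.
func cosineSimilarity(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	// In practice a and b would come from the embed helper above.
	a := []float64{0.1, 0.9, 0.2}
	b := []float64{0.15, 0.85, 0.25}
	fmt.Printf("similarity: %.3f\n", cosineSimilarity(a, b))
}
```

For semantic search over more than a few hundred documents, a vector database is the better fit, but in-memory cosine similarity over a slice of embeddings is often all a small tool needs.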

Model Management

// List available models
models, err := client.List(context.Background())
if err != nil {
	log.Fatal(err)
}
for _, m := range models.Models {
	fmt.Printf("%-40s %.1f GB\n", m.Name, float64(m.Size)/1e9)
}

// Pull a model with progress
pullReq := &api.PullRequest{Model: "llama3.2"}
err = client.Pull(context.Background(), pullReq, func(resp api.ProgressResponse) error {
	if resp.Total > 0 {
		pct := float64(resp.Completed) / float64(resp.Total) * 100
		fmt.Printf("\rDownloading: %.1f%%", pct)
	}
	return nil
})
if err != nil {
	log.Fatal(err)
}

// Show running models (currently loaded into memory)
ps, err := client.ListRunning(context.Background())
if err != nil {
	log.Fatal(err)
}
for _, m := range ps.Models {
	fmt.Println(m.Name)
}

Custom Client (Remote Ollama)

import (
	"net/http"
	"net/url"
)

remoteURL, err := url.Parse("http://192.168.1.100:11434")
if err != nil {
	log.Fatal(err)
}
client := api.NewClient(remoteURL, http.DefaultClient)

Building a CLI Tool

package main

import (
	"bufio"
	"context"
	"fmt"
	"log"
	"os"

	"github.com/ollama/ollama/api"
)

func main() {
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatal(err)
	}
	scanner := bufio.NewScanner(os.Stdin)
	var history []api.Message

	fmt.Println("Chat with llama3.2 (type 'quit' to exit)")
	for {
		fmt.Print("You: ")
		if !scanner.Scan() {
			break
		}
		input := scanner.Text()
		if input == "quit" {
			break
		}

		history = append(history, api.Message{Role: "user", Content: input})

		req := &api.ChatRequest{Model: "llama3.2", Messages: history}
		var reply string
		fmt.Print("Assistant: ")
		err = client.Chat(context.Background(), req, func(resp api.ChatResponse) error {
			fmt.Print(resp.Message.Content)
			reply += resp.Message.Content
			return nil
		})
		fmt.Println()
		if err != nil {
			log.Printf("chat error: %v", err)
			continue
		}
		history = append(history, api.Message{Role: "assistant", Content: reply})
	}
}

Why Use Go for Ollama Integrations?

Go is a natural fit for Ollama integrations that need to be distributed as standalone binaries — CLI tools, background daemons, system utilities, and microservices. A Go binary that wraps Ollama can be compiled for any target platform (Linux, macOS, Windows, ARM) and distributed as a single executable with no runtime dependency. Compare this to a Python script that requires a Python installation and virtual environment, or a Node.js tool that requires Node.js and node_modules. For tooling that you want to share with colleagues or deploy on servers without managing language runtimes, Go’s single-binary distribution model is a significant practical advantage.

Go’s concurrency primitives — goroutines and channels — also pair naturally with Ollama’s streaming API. A Go service that processes multiple Ollama requests concurrently, manages timeouts with contexts, and handles graceful shutdown fits cleanly into Go’s concurrency model without the callback complexity that concurrent streaming creates in other languages. The standard library’s context package integrates directly with Ollama’s API — every Ollama client method accepts a context, so request cancellation, timeouts, and deadline propagation all work through Go’s standard patterns.

Error Handling Pattern

package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"time"

	"github.com/ollama/ollama/api"
)

func chatWithTimeout(model, prompt string) (string, error) {
	client, err := api.ClientFromEnvironment()
	if err != nil {
		return "", fmt.Errorf("failed to create client: %w", err)
	}

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	messages := []api.Message{{Role: "user", Content: prompt}}
	req := &api.ChatRequest{Model: model, Messages: messages}

	var result string
	err = client.Chat(ctx, req, func(resp api.ChatResponse) error {
		result += resp.Message.Content
		return nil
	})
	if err != nil {
		var netErr *net.OpError
		if errors.As(err, &netErr) {
			return "", fmt.Errorf("ollama not running: %w", err)
		}
		if errors.Is(ctx.Err(), context.DeadlineExceeded) {
			return "", fmt.Errorf("request timed out after 2 minutes")
		}
		return "", err
	}
	return result, nil
}

Concurrent Batch Processing

// Requires "context" and "sync" imports alongside the api package.
func batchProcess(client *api.Client, model string, prompts []string) []string {
	results := make([]string, len(prompts))
	sem := make(chan struct{}, 3) // cap at 3 concurrent requests

	var wg sync.WaitGroup
	for i, prompt := range prompts {
		wg.Add(1)
		go func(idx int, p string) {
			defer wg.Done()
			sem <- struct{}{} // acquire a slot
			defer func() { <-sem }()

			messages := []api.Message{{Role: "user", Content: p}}
			req := &api.ChatRequest{Model: model, Messages: messages}
			var out string
			err := client.Chat(context.Background(), req, func(r api.ChatResponse) error {
				out += r.Message.Content
				return nil
			})
			if err != nil {
				out = "error: " + err.Error()
			}
			results[idx] = out // each goroutine writes its own index; no mutex needed
		}(i, prompt)
	}
	wg.Wait()
	return results
}

Structured Output

import (
	"context"
	"encoding/json"

	"github.com/ollama/ollama/api"
)

type ContactInfo struct {
	Name  string `json:"name"`
	Email string `json:"email"`
	Phone string `json:"phone,omitempty"`
}

func extractContact(client *api.Client, text string) (*ContactInfo, error) {
	schema := map[string]any{
		"type": "object",
		"properties": map[string]any{
			"name":  map[string]string{"type": "string"},
			"email": map[string]string{"type": "string"},
			"phone": map[string]string{"type": "string"},
		},
		"required": []string{"name", "email"},
	}
	schemaJSON, err := json.Marshal(schema)
	if err != nil {
		return nil, err
	}
	messages := []api.Message{{Role: "user", Content: "Extract contact info: " + text}}	
	req := &api.ChatRequest{
		Model: "llama3.2", Messages: messages,
		Format:  schemaJSON, // Format takes raw JSON: a schema object or "json"
		Options: map[string]any{"temperature": 0},
	}
	var raw string
	err = client.Chat(context.Background(), req, func(r api.ChatResponse) error {
		raw += r.Message.Content
		return nil
	})
	if err != nil {
		return nil, err
	}
	var contact ContactInfo
	if err := json.Unmarshal([]byte(raw), &contact); err != nil {
		return nil, err
	}
	return &contact, nil
}

HTTP Server Wrapping Ollama

package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"

	"github.com/ollama/ollama/api"
)

type ChatRequest struct {
	Message string `json:"message"`
	Model   string `json:"model"`
}

func chatHandler(ollamaClient *api.Client) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req ChatRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		if req.Model == "" {
			req.Model = "llama3.2"
		}

		// net/http applies chunked transfer encoding automatically when we
		// flush before the response is complete; no need to set it manually.
		w.Header().Set("Content-Type", "text/plain; charset=utf-8")

		messages := []api.Message{{Role: "user", Content: req.Message}}
		err := ollamaClient.Chat(r.Context(), &api.ChatRequest{
			Model: req.Model, Messages: messages,
		}, func(resp api.ChatResponse) error {
			if _, err := fmt.Fprint(w, resp.Message.Content); err != nil {
				return err
			}
			if f, ok := w.(http.Flusher); ok {
				f.Flush() // push each chunk to the client immediately
			}
			return nil
		})
		if err != nil {
			log.Printf("chat error: %v", err)
		}
	}
}

func main() {
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatal(err)
	}
	http.HandleFunc("/chat", chatHandler(client))
	log.Fatal(http.ListenAndServe(":8080", nil))
}

Using the OLLAMA_HOST Environment Variable

The api.ClientFromEnvironment() constructor reads the OLLAMA_HOST environment variable, defaulting to http://localhost:11434 if unset. This means your Go binary can be pointed at a remote Ollama server without recompilation:

# Point to a remote Ollama server
export OLLAMA_HOST=http://192.168.1.100:11434
./my-ollama-tool

# Or inline
OLLAMA_HOST=http://team-server:11434 ./my-ollama-tool

For applications with more complex routing (different models on different servers, fallback logic), use the manual client constructor with a parsed URL. The environment variable approach is sufficient for most tools where the server location is a deployment-time concern rather than a runtime decision.
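A sketch of that manual routing, assuming a simple model-to-server table; the serverFor helper and the host names are illustrative, and each resolved URL would then be passed to api.NewClient:

```go
package main

import (
	"fmt"
	"net/url"
)

// serverFor returns the Ollama base URL for a given model, falling back
// to a default server for models with no explicit route.
func serverFor(model string, routes map[string]string, fallback string) (*url.URL, error) {
	host, ok := routes[model]
	if !ok {
		host = fallback
	}
	return url.Parse(host)
}

func main() {
	routes := map[string]string{
		"llama3.2":         "http://gpu-box:11434",   // chat models on the GPU server
		"nomic-embed-text": "http://embed-box:11434", // embeddings on a smaller box
	}
	u, err := serverFor("llama3.2", routes, "http://localhost:11434")
	if err != nil {
		panic(err)
	}
	fmt.Println(u) // http://gpu-box:11434

	u, _ = serverFor("some-other-model", routes, "http://localhost:11434")
	fmt.Println(u) // http://localhost:11434
}
```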

Testing Ollama Integrations in Go

// Use a test HTTP server to mock Ollama in unit tests
import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"net/url"
	"testing"

	"github.com/ollama/ollama/api"
)

func TestChatSync(t *testing.T) {
	// Mock Ollama server
	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/api/chat" {
			w.Header().Set("Content-Type", "application/x-ndjson")
			json.NewEncoder(w).Encode(api.ChatResponse{
				Message: api.Message{Role: "assistant", Content: "Hello from mock!"},
				Done:    true,
			})
		}
	}))
	defer server.Close()

	u, _ := url.Parse(server.URL)
	client := api.NewClient(u, server.Client())

	result, err := chatSync(client, "llama3.2", "Hi")
	if err != nil {
		t.Fatal(err)
	}
	if result != "Hello from mock!" {
		t.Errorf("expected 'Hello from mock!', got %q", result)
	}
}

Distributing a Go Ollama Tool

# Build for the current platform
go build -o my-ai-tool .

# Cross-compile for Linux AMD64
GOOS=linux GOARCH=amd64 go build -o my-ai-tool-linux .

# Cross-compile for macOS ARM64 (Apple Silicon)
GOOS=darwin GOARCH=arm64 go build -o my-ai-tool-mac-arm .

# Cross-compile for Windows
GOOS=windows GOARCH=amd64 go build -o my-ai-tool.exe .

# Embed version info at build time
go build -ldflags "-X main.Version=1.0.0 -X main.BuildDate=$(date -u +%Y-%m-%d)" -o my-ai-tool .

The resulting binaries require no installation — users copy the file to any location in their PATH and run it immediately. This distribution model is one of Go's strongest practical advantages for developer tooling: you can build a useful Ollama-powered CLI tool and share it as a single file download that works on any platform without any setup instructions beyond "download and run".
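The -X flags in the build command above assume matching package-level string variables in main. A minimal sketch of the receiving side:

```go
package main

import "fmt"

// Defaults used for plain `go build`; overwritten at build time via
// -ldflags "-X main.Version=1.0.0 -X main.BuildDate=...".
var (
	Version   = "dev"
	BuildDate = "unknown"
)

func main() {
	fmt.Printf("my-ai-tool %s (built %s)\n", Version, BuildDate)
}
```

The -X flag only works on package-level string variables that are not constants, so declare them with var and give them sensible defaults for development builds.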

Go vs Python vs JavaScript for Ollama Projects

Each language has a clear sweet spot for Ollama integration. Python is the best choice for data science workflows, Jupyter notebooks, ML pipelines, and projects with heavy dependencies on the Python ML ecosystem (NumPy, pandas, scikit-learn). JavaScript is the best choice for web applications, Next.js projects, and browser-based tools where the AI is one feature of a larger web product. Go is the best choice for CLI tools, system utilities, backend microservices, and anything you want to distribute as a binary without runtime dependencies. The Ollama API is identical across all three — if you need to switch languages for a specific project, the concepts and request patterns transfer directly with only syntax changes. Choose the language that fits your team's existing expertise and the target deployment environment, not the one with the most Ollama examples online.

Integrating Ollama with Existing Go Services

Adding Ollama to an existing Go service is straightforward because the client is just an HTTP client — it does not require a global singleton, special initialisation, or framework integration. Create the client once at service startup and pass it to handlers as a dependency. The client is concurrency-safe and can be shared across goroutines. For services that need to handle Ollama being temporarily unavailable (during restarts or model swaps), wrap requests in retry logic using exponential backoff — the standard pattern for unreliable dependencies in Go:

// Requires "context", "fmt", and "time" imports alongside the api package.
func chatWithRetry(client *api.Client, req *api.ChatRequest, maxAttempts int) (string, error) {
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		var result string
		err := client.Chat(context.Background(), req, func(r api.ChatResponse) error {
			result += r.Message.Content
			return nil
		})
		if err == nil {
			return result, nil
		}
		lastErr = err
		backoff := time.Duration(1<<attempt) * time.Second // 1s, 2s, 4s, ...
		time.Sleep(backoff)
	}
	return "", fmt.Errorf("all %d attempts failed, last error: %w", maxAttempts, lastErr)
}

Logging and Observability

For production Go services using Ollama, log the key performance metrics from each request — prompt token count, completion token count, and total duration. These metrics are available in the final streaming response chunk (resp.Done == true) and give you visibility into model performance, cost estimation, and anomaly detection (unusually long requests may indicate a model stuck in a loop):

err = client.Chat(ctx, req, func(resp api.ChatResponse) error {
	result += resp.Message.Content
	if resp.Done { // metrics are only populated on the final chunk
		log.Printf("model=%s prompt_tokens=%d completion_tokens=%d duration_ms=%d",
			resp.Model,
			resp.PromptEvalCount,
			resp.EvalCount,
			resp.TotalDuration.Milliseconds(),
		)
	}
	return nil
})

The Go Ecosystem for AI Development in 2026

Go's AI ecosystem is less mature than Python's but growing, particularly for infrastructure and tooling use cases. The Ollama Go library is the highest-quality Go AI integration available for local models. For vector databases, pgvector has a Go client, and Chroma has a community Go client. For embedding pipelines, calling Ollama's embeddings endpoint from Go is as simple as the Python equivalent. For production services that need to process documents, images, or audio at scale with local AI, Go's performance and concurrency characteristics make it a genuinely compelling choice — you get the AI capabilities of the Python ecosystem's best models (served by Ollama) with Go's operational characteristics (low memory overhead, fast startup, easy deployment). This combination is increasingly attractive for teams building AI infrastructure that needs to run reliably at scale without the operational overhead of Python services.

Writing Idiomatic Go with the Ollama Library

The Ollama Go library follows Go conventions closely — methods return errors as the last value, streaming uses callbacks rather than channels, and the client is designed to be created once and reused. A few patterns are worth knowing when writing idiomatic Go with the library. First, always check errors from Chat and Generate calls — unlike some streaming APIs that swallow errors, the Ollama library returns an error from the callback-based streaming methods if the connection drops mid-stream. Second, the callback function runs synchronously in the calling goroutine unless you spawn a goroutine inside it, so you can safely write to a strings.Builder inside the callback without mutex protection. Third, the Done field on the response struct marks the final chunk — check it when you need the performance metrics (EvalCount, PromptEvalCount, TotalDuration), which are only populated on that final chunk.

For production code, wrap the Ollama client in your own interface so you can swap the implementation in tests without needing a running Ollama instance. Define an LLMClient interface with the methods you use (Chat, Generate, Embeddings) and implement it both with the real Ollama client and with a mock that returns fixtures. This is standard Go dependency injection and makes your AI-powered code significantly more testable than code that calls the Ollama API directly from business logic functions. The Ollama library's callback-based streaming is simple enough that writing a mock implementation takes about 20 lines of code — a worthwhile investment for any production service.

Real-World Go + Ollama Projects

The Go + Ollama combination excels in several project types. Command-line developer tools — code reviewers, documentation generators, commit message writers — benefit from Go's fast startup and single-binary distribution. Background processing daemons that classify, tag, or summarise incoming data (emails, support tickets, news items) benefit from Go's low memory overhead when running as a long-lived service. API gateways that add AI capabilities to existing services without requiring the client to know about Ollama benefit from Go's HTTP handling strengths. And infrastructure tooling — monitoring scripts, log analysers, automated report generators — benefits from Go's easy deployment and cross-compilation. In all of these cases, the Ollama library provides clean access to local AI inference without the runtime overhead or distribution complexity of equivalent Python tooling.

Getting Started

Run go get github.com/ollama/ollama/api in any Go module, create a client with api.ClientFromEnvironment(), and call client.Chat() with your messages and a callback to handle the streaming response. A first working integration takes under 30 minutes for a Go developer already familiar with HTTP client patterns. From there, add structured output using the Format field for extraction tasks, add context-based timeouts for production reliability, and consider the HTTP server wrapper pattern if you need to expose Ollama capabilities to other services or team members. The Go library's idiomatic API and the single-binary distribution model make it the right choice for building local AI tooling that needs to work reliably across different environments and be easy to deploy and share.

The local AI ecosystem in 2026 has reached a point where the tools and patterns described in this article are stable, well-supported, and ready for production use. Whether you reach for Go, Python, JavaScript, or another language entirely, the underlying Ollama API remains the same: consistent, version-stable, and designed for exactly the kind of practical applications this series has covered. That foundation will serve you well as Ollama evolves and new models raise the quality ceiling of what local inference can achieve.
