Ollama’s REST API is straightforward to call from .NET via the community-maintained OllamaSharp library, Microsoft’s Microsoft.Extensions.AI abstractions, or the Semantic Kernel framework. This guide covers all three approaches for integrating local Ollama inference into C# and ASP.NET Core applications.
Option 1: OllamaSharp
dotnet add package OllamaSharp
using OllamaSharp;
var ollama = new OllamaApiClient("http://localhost:11434");
// Chat
ollama.SelectedModel = "llama3.2";
var chat = new Chat(ollama);
await foreach (var token in chat.SendAsync("Why is C# popular?"))
{
Console.Write(token);
}
Console.WriteLine();
// List models
var models = await ollama.ListLocalModelsAsync();
foreach (var model in models)
Console.WriteLine($"{model.Name} — {model.Size / 1e9:F1} GB");
Option 2: Microsoft.Extensions.AI + Ollama
dotnet add package Microsoft.Extensions.AI.Ollama
using Microsoft.Extensions.AI;
// Register in Program.cs
builder.Services.AddChatClient(
new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.2")
);
// Inject and use in a service
public class ChatService(IChatClient chatClient)
{
public async Task<string> AskAsync(string question)
{
var response = await chatClient.CompleteAsync(question);
return response.Message.Text ?? string.Empty;
}
public async IAsyncEnumerable<string> StreamAsync(string question)
{
await foreach (var update in chatClient.CompleteStreamingAsync(question))
yield return update.Text ?? string.Empty;
}
}
Option 3: Semantic Kernel
dotnet add package Microsoft.SemanticKernel.Connectors.Ollama --prerelease
using Microsoft.SemanticKernel;
var kernel = Kernel.CreateBuilder()
.AddOllamaChatCompletion("llama3.2", new Uri("http://localhost:11434"))
.AddOllamaTextEmbeddingGeneration("nomic-embed-text", new Uri("http://localhost:11434"))
.Build();
// Simple prompt
var result = await kernel.InvokePromptAsync("Explain async/await in C# in one paragraph.");
Console.WriteLine(result);
// With prompt template
var summarize = kernel.CreateFunctionFromPrompt(
"Summarize this in 3 bullet points:\n\n{{$input}}"
);
var summary = await kernel.InvokeAsync(summarize, new KernelArguments { ["input"] = longText });
Console.WriteLine(summary);
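The builder above also registers an Ollama embedding service, which can be resolved from the kernel when you need vectors. A sketch; note that ITextEmbeddingGenerationService is still marked experimental in Semantic Kernel, so the SKEXP diagnostic has to be suppressed:
using Microsoft.SemanticKernel.Embeddings;

#pragma warning disable SKEXP0001
// Resolve the embedding service registered by AddOllamaTextEmbeddingGeneration
var embeddings = kernel.GetRequiredService<ITextEmbeddingGenerationService>();
var vectors = await embeddings.GenerateEmbeddingsAsync(["Ollama runs models locally."]);
Console.WriteLine($"Vector length: {vectors[0].Length}");
#pragma warning restore SKEXP0001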
ASP.NET Core Streaming Endpoint
app.MapPost("/chat/stream", async (string message, IChatClient chatClient, HttpResponse response) =>
{
response.ContentType = "text/plain; charset=utf-8";
await foreach (var update in chatClient.CompleteStreamingAsync(message))
{
await response.WriteAsync(update.Text ?? "");
await response.Body.FlushAsync();
}
});
Embeddings in C#
using OllamaSharp;
var ollama = new OllamaApiClient("http://localhost:11434");
var embedding = await ollama.GenerateEmbeddings(new GenerateEmbeddingRequest
{
Model = "nomic-embed-text",
Prompt = "The quick brown fox"
});
Console.WriteLine($"Dimensions: {embedding.Embedding.Length}");
// Cosine similarity
static double CosineSimilarity(double[] a, double[] b)
{
var dot = a.Zip(b, (x, y) => x * y).Sum();
return dot / (Math.Sqrt(a.Sum(x => x * x)) * Math.Sqrt(b.Sum(x => x * x)));
}
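Putting the two together, a quick semantic comparison that reuses the client and helper above:
var a = await ollama.GenerateEmbeddings(new GenerateEmbeddingRequest
{ Model = "nomic-embed-text", Prompt = "The quick brown fox" });
var b = await ollama.GenerateEmbeddings(new GenerateEmbeddingRequest
{ Model = "nomic-embed-text", Prompt = "A fast auburn fox jumps" });

// Closer to 1.0 means more semantically similar
Console.WriteLine($"Similarity: {CosineSimilarity(a.Embedding, b.Embedding):F3}");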
Why .NET + Ollama
The .NET ecosystem has strong official support for local AI via Microsoft’s own tooling — Microsoft.Extensions.AI provides the IChatClient and IEmbeddingGenerator interfaces that work across Ollama, Azure OpenAI, and GitHub Models, and Semantic Kernel is Microsoft’s flagship AI orchestration framework. For .NET teams, this means the integration path to Ollama uses supported, well-documented Microsoft APIs rather than third-party wrappers, with the expectation that these interfaces will remain stable across .NET versions. The OllamaSharp library is a solid alternative for teams that want a simpler, Ollama-specific client without the abstraction overhead of the Microsoft AI abstractions.
Structured Output in C#
using System.Text;
using System.Text.Json;
using OllamaSharp;
using OllamaSharp.Models;
using OllamaSharp.Models.Chat;
public record ContactInfo(string Name, string Email, string? Phone);
public async Task<ContactInfo?> ExtractContact(IOllamaApiClient ollama, string text)
{
var schema = JsonSerializer.SerializeToElement(new
{
type = "object",
properties = new
{
name = new { type = "string" },
email = new { type = "string" },
phone = new { type = "string" }
},
required = new[] { "name", "email" }
});
// ChatAsync streams the reply; collect the chunks, then parse the final JSON
var json = new StringBuilder();
await foreach (var chunk in ollama.ChatAsync(new ChatRequest
{
Model = "llama3.2",
Messages = [new Message { Role = ChatRole.User, Content = $"Extract contact: {text}" }],
Format = schema,
Options = new RequestOptions { Temperature = 0 }
}))
{
json.Append(chunk?.Message.Content);
}
// Case-insensitive so the lowercase JSON keys bind to the record's properties
return JsonSerializer.Deserialize<ContactInfo>(json.ToString(),
new JsonSerializerOptions { PropertyNameCaseInsensitive = true });
}
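Calling it looks like this (the contact details are purely illustrative):
var ollama = new OllamaApiClient("http://localhost:11434");
var contact = await ExtractContact(ollama, "Reach Jane Doe at jane.doe@example.com or +1 555 0100.");
Console.WriteLine($"{contact?.Name} <{contact?.Email}>");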
Background Service for Batch Processing
public class DocumentProcessingService(IChatClient chatClient) : BackgroundService
{
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
while (!stoppingToken.IsCancellationRequested)
{
var documents = await GetPendingDocuments();
foreach (var doc in documents)
{
var summary = await chatClient.CompleteAsync(
$"Summarise in 3 bullet points:\n\n{doc.Content}",
cancellationToken: stoppingToken
);
await SaveSummary(doc.Id, summary.Message.Text);
}
await Task.Delay(TimeSpan.FromMinutes(5), stoppingToken);
}
}
}
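GetPendingDocuments and SaveSummary stand in for your own data access. The service itself registers like any other hosted service:
// Program.cs
builder.Services.AddHostedService<DocumentProcessingService>();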
Choosing Between OllamaSharp, Microsoft.Extensions.AI, and Semantic Kernel
Each option has a distinct sweet spot. OllamaSharp is ideal for straightforward Ollama integration with full access to Ollama-specific features (model management, Modelfile creation, keep-alive control) without abstraction overhead. Microsoft.Extensions.AI is ideal when you want model-agnostic code — write once against the IChatClient interface, configure for Ollama in development and Azure OpenAI in production. Semantic Kernel is ideal for complex agent workflows, function calling, memory management, and orchestrated multi-step AI pipelines — it is the most powerful but also the most complex. For most .NET applications, Microsoft.Extensions.AI with OllamaSharp as the backing implementation is the right balance of portability and simplicity. Start there and reach for Semantic Kernel when your use case genuinely needs its agent and plugin capabilities.
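For example, pulling a model programmatically is an Ollama-specific operation that OllamaSharp exposes directly, streaming download progress as it goes:
var ollama = new OllamaApiClient("http://localhost:11434");

// Stream pull progress for a model that is not yet available locally
await foreach (var status in ollama.PullModelAsync("llama3.2"))
    Console.WriteLine($"{status?.Percent:F0}% {status?.Status}");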
Testing .NET Ollama Applications
The IChatClient interface from Microsoft.Extensions.AI makes unit testing straightforward — create a mock implementation that returns canned responses for tests without needing a running Ollama instance. For integration tests, use a real Ollama instance with a small fast model (qwen2.5:1.5b) and set a short timeout to catch performance regressions. The background service pattern shown above is testable via IHostedService test utilities in the Microsoft.Extensions.Hosting.Testing package, which lets you run the service for a fixed duration and assert on its side effects without a real-time wait.
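An integration test along those lines, sketched with xUnit and OllamaSharp (the model name and the 30-second timeout are the suggestions from the paragraph above, not requirements):
using System.Text;
using OllamaSharp;
using Xunit;

public class OllamaIntegrationTests
{
    [Fact]
    public async Task SmallModel_RespondsWithinTimeout()
    {
        // Real Ollama instance on the default port; a small model keeps the test fast
        var ollama = new OllamaApiClient("http://localhost:11434");
        ollama.SelectedModel = "qwen2.5:1.5b";

        // Fail the test if generation takes longer than 30 seconds
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

        var chat = new Chat(ollama);
        var reply = new StringBuilder();
        await foreach (var token in chat.SendAsync("Say hello in five words.").WithCancellation(cts.Token))
            reply.Append(token);

        Assert.False(string.IsNullOrWhiteSpace(reply.ToString()));
    }
}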
Getting Started
Run dotnet add package Microsoft.Extensions.AI.Ollama, register OllamaChatClient as the IChatClient in your DI container, and inject it into any service that needs AI capabilities. The first working chat endpoint takes under 30 minutes to implement. Add streaming via CompleteStreamingAsync for user-facing responses, structured output via JSON schema for extraction tasks, and the background service pattern for batch processing workloads. The .NET AI ecosystem is well-documented, actively developed by Microsoft, and provides the most natural path for C# teams to add local AI capabilities to their applications without learning a new framework paradigm.
C# Type Safety with Structured AI Output
One of C#’s strongest assets for AI integration is its type system. The structured output pattern — providing a JSON schema to the model and deserialising the response to a typed record — leverages C#’s record types, nullable reference types, and System.Text.Json for end-to-end type safety from prompt to business logic. Unlike Python’s Pydantic approach (which generates the schema from class definitions), the example above writes the JSON schema by hand, but the deserialisation step is equally clean and produces strongly-typed objects that the rest of your application can work with safely; on .NET 9 the schema can even be derived from the type itself, as sketched below. For teams already using C# for its type safety, this pattern extends that safety into the AI integration layer without any additional abstractions.
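A sketch of that schema derivation using .NET 9's JsonSchemaExporter (this assumes .NET 9; on .NET 8 you would keep writing the schema by hand as shown earlier):
using System.Text.Json;
using System.Text.Json.Schema;

// Derive a JSON schema from the ContactInfo record defined earlier
var schemaNode = JsonSchemaExporter.GetJsonSchemaAsNode(JsonSerializerOptions.Default, typeof(ContactInfo));
var schemaElement = JsonSerializer.SerializeToElement(schemaNode);   // usable as the Format value above
Console.WriteLine(schemaNode.ToJsonString());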
.NET MAUI: Cross-Platform Desktop and Mobile
.NET MAUI lets you build cross-platform applications that call a LAN-hosted Ollama from Windows, macOS, Android, and iOS, all from a single C# codebase. The same HttpClient-based integration that works in ASP.NET Core works in MAUI — add the Microsoft.Extensions.AI.Ollama package, configure the base URL (localhost on desktop, the server’s LAN IP on mobile, as sketched below), and use IChatClient in your ViewModels. For mobile targets, the usual LAN-access considerations apply: the phone and the Ollama server must be on the same network, and the relevant platform permissions (local network access on iOS, internet permission on Android) must be declared in the app manifest.
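A sketch of that base-URL selection in MauiProgram.cs (the LAN address below is a placeholder for wherever your Ollama server actually runs):
// MauiProgram.cs (sketch)
bool isMobile = DeviceInfo.Platform == DevicePlatform.Android
             || DeviceInfo.Platform == DevicePlatform.iOS;

// Desktop builds can reach the local daemon directly; phones need the server's LAN IP
var ollamaUrl = isMobile
    ? "http://192.168.1.50:11434"   // placeholder: LAN IP of the machine running Ollama
    : "http://localhost:11434";

builder.Services.AddChatClient(
    new OllamaChatClient(new Uri(ollamaUrl), "llama3.2")
);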
Blazor WebAssembly Limitations
Blazor WebAssembly runs C# in the browser via WebAssembly and cannot call a localhost Ollama directly because of the browser sandbox and CORS restrictions. For Blazor WASM + Ollama, the standard pattern is an ASP.NET Core backend (typically the WASM app’s host project) that calls Ollama server-side (a server-to-server call, so CORS does not apply) and streams responses down to the WASM frontend. Blazor Server has no such limitation and is the simpler choice for Ollama-integrated Blazor applications: the SignalR connection between server and browser handles streaming naturally, and the server-side code calls Ollama directly as shown in the ASP.NET Core examples above.
Performance in .NET Applications
The performance characteristics of Ollama AI requests in .NET mirror those in other languages — inference speed is dominated by the model and hardware, not the client library. For ASP.NET Core applications, the key concern is thread efficiency: use async/await throughout the call chain so threads are not blocked waiting for long-running inference, and consider request queuing for endpoints that serve multiple concurrent users against a single Ollama instance. The IAsyncEnumerable-based streaming in Microsoft.Extensions.AI integrates naturally with ASP.NET Core’s response streaming, allowing the server to begin sending tokens to the client before generation completes without holding a thread for the full duration. For .NET 8+ applications, the combination of async streaming, minimal APIs, and Ollama’s native streaming support produces an efficient pipeline from model inference to browser with minimal memory overhead and good concurrency characteristics.
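Request queuing can be as simple as a SemaphoreSlim gate in front of the chat client, so a single Ollama instance never runs more generations than it can handle at once. A minimal sketch (the limit of two concurrent generations is an arbitrary placeholder, not a recommendation):
public class ThrottledChatService(IChatClient chatClient)
{
    // At most two generations in flight against the local Ollama instance
    private static readonly SemaphoreSlim Gate = new(2, 2);

    public async Task<string> AskAsync(string question, CancellationToken ct = default)
    {
        await Gate.WaitAsync(ct);
        try
        {
            var response = await chatClient.CompleteAsync(question, cancellationToken: ct);
            return response.Message.Text ?? string.Empty;
        }
        finally
        {
            Gate.Release();
        }
    }
}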
The .NET AI Ecosystem in 2026
Microsoft’s investment in AI tooling for .NET has accelerated significantly. The Microsoft.Extensions.AI abstractions, Semantic Kernel, and the OllamaSharp community library together give .NET developers a rich set of options for integrating AI — more mature and better-supported than the equivalent ecosystem was for any non-Python language two years ago. For enterprise organisations standardised on .NET, there is no longer a meaningful technical reason to maintain a separate Python service for AI capabilities in most applications — the .NET AI stack handles chat, embeddings, structured output, RAG, and agent workflows with the same quality and maturity as the Python alternatives, and it integrates naturally with the rest of the .NET infrastructure (Entity Framework, Aspire, minimal APIs, MAUI) that the team already knows. Ollama as the local model backend completes the picture by providing the inference layer without cloud dependencies, enabling .NET teams to build private, cost-effective AI applications using tools they already understand.
RAG in C# with In-Memory Vector Search
For a simple RAG pipeline in C# without a full vector database, use Microsoft’s VectorData abstractions with an in-memory store:
using Microsoft.Extensions.AI;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel.Connectors.InMemory;
// In-memory vector store from the Microsoft.SemanticKernel.Connectors.InMemory package
// (swap in the Redis, Qdrant, or pgvector connectors for production)
var store = new InMemoryVectorStore();
var collection = store.GetCollection<string, TextEntry>("docs");
await collection.CreateCollectionIfNotExistsAsync();
// Index documents
var embedder = new OllamaEmbeddingGenerator(
new Uri("http://localhost:11434"), "nomic-embed-text"
);
var entries = new[]
{
new TextEntry { Key = "1", Text = "Ollama runs local LLMs efficiently.", Vector = null },
new TextEntry { Key = "2", Text = "nomic-embed-text produces 768-dim embeddings.", Vector = null },
};
foreach (var entry in entries)
{
entry.Vector = await embedder.GenerateEmbeddingVectorAsync(entry.Text);
await collection.UpsertAsync(entry);
}
// Query
var queryVector = await embedder.GenerateEmbeddingVectorAsync("How does Ollama work?");
var results = await collection.VectorizedSearchAsync(queryVector, new() { Top = 2 });
await foreach (var r in results.Results)
Console.WriteLine(r.Record.Text);
public class TextEntry
{
[VectorStoreRecordKey] public string Key { get; set; } = "";
[VectorStoreRecordData] public string Text { get; set; } = "";
[VectorStoreRecordVector(768)] public ReadOnlyMemory<float>? Vector { get; set; }
}
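To close the RAG loop, collect the retrieved passages during the search step (instead of just printing them) and hand them to the chat model as grounding context. A sketch reusing the same IChatClient surface as the earlier examples:
// Replaces the printing loop in the query step above
var passages = new List<string>();
await foreach (var r in results.Results)
    passages.Add(r.Record.Text);

// Ask the chat model to answer only from the retrieved context
var chatClient = new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.2");
var answer = await chatClient.CompleteAsync(
    $"Answer using only this context:\n{string.Join("\n", passages)}\n\nQuestion: How does Ollama work?"
);
Console.WriteLine(answer.Message.Text);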
Practical Tips for .NET Ollama Development
Three practices make .NET Ollama development smoother. First, use the IChatClient interface from Microsoft.Extensions.AI as the type your services depend on — this lets you swap between OllamaSharp, Azure OpenAI, and mock implementations in tests without changing service code. Second, configure HttpClient timeouts appropriately — the default 100-second HttpClient timeout is tight for large model responses; extend it to 300 seconds for 7B+ models on slower hardware. Third, for ASP.NET Core applications that stream responses, ensure your middleware pipeline does not buffer responses — response buffering middleware will hold the entire AI response in memory before sending it to the client, negating the benefits of streaming. Check that no buffering middleware is enabled for AI streaming endpoints and test end-to-end that tokens appear in the browser before generation completes.
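For the timeout point, OllamaSharp accepts a pre-configured HttpClient, which is the natural place to raise the limit; a sketch follows (300 seconds is the figure suggested above, not a hard rule). On the buffering point, calling DisableBuffering() on the request's IHttpResponseBodyFeature at the top of a streaming endpoint is a quick way to rule out per-request buffering.
// Extend the HTTP timeout for large models on slow hardware (HttpClient defaults to 100 seconds)
var http = new HttpClient
{
    BaseAddress = new Uri("http://localhost:11434"),
    Timeout = TimeSpan.FromSeconds(300)
};
var ollama = new OllamaApiClient(http, "llama3.2");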
Getting Started
Run dotnet add package Microsoft.Extensions.AI.Ollama in any ASP.NET Core project, register the chat client in your DI container with a single line, and inject IChatClient into the service that needs AI capabilities. The first working endpoint takes under 20 minutes to implement for a developer already familiar with ASP.NET Core. Add streaming with the IAsyncEnumerable interface for better UX, structured output with JSON schema for extraction tasks, and the background service pattern for batch processing workloads. The Microsoft AI documentation is comprehensive and provides runnable samples for all common scenarios — and the combination of C# type safety, ASP.NET Core’s mature web framework, and Ollama’s local inference gives you a production-capable local AI application stack that integrates naturally with the .NET infrastructure most enterprise teams already operate.
Comparing .NET AI Integration Options in 2026
Three mature options exist for .NET developers integrating Ollama: OllamaSharp for direct, Ollama-specific access; Microsoft.Extensions.AI for model-agnostic, interface-driven integration; and Semantic Kernel for complex agentic workflows. Most teams will find Microsoft.Extensions.AI the right starting point — it provides the portability of the IChatClient abstraction without the complexity of Semantic Kernel’s plugin system, and OllamaSharp serves as the underlying implementation so Ollama-specific features remain accessible when needed. The investment in learning the Microsoft.Extensions.AI patterns pays off when your team later needs to swap backends, add caching, or integrate with Azure AI services — the abstraction layer makes all of these changes configuration-level rather than code-level decisions.
The .NET AI tooling ecosystem has matured quickly enough that teams no longer need to make significant trade-offs to use it. The quality of the client libraries, the breadth of the documentation, and the community of examples have all reached a level comparable to the Python ecosystem for the most common AI integration patterns. Teams already invested in .NET infrastructure — Azure services, Entity Framework, ASP.NET Core — will find the .NET AI path the most natural route to local AI integration, avoiding the context-switching and operational complexity of maintaining a separate Python service for AI features within a predominantly .NET environment.
The local AI stack — Ollama for inference, Microsoft.Extensions.AI for the .NET abstraction layer, and your existing ASP.NET Core or MAUI application as the host — is production-ready in 2026 and actively supported by Microsoft. Teams that build on it today are investing in a foundation that will stay relevant as both Ollama and the .NET AI ecosystem improve, with the model-agnostic interface layer insulating application code from changes in the underlying inference backend. That combination positions .NET teams well for the next several years of local AI: the integration work you do now will compound as model quality and tooling maturity keep rising with each new generation of open-weight releases.