Local AI inference on Apple devices is most practical via Ollama running on a Mac, with iOS or macOS apps calling it over the local network. This guide covers calling a Mac-hosted Ollama from Swift, and running lightweight models directly on-device using Core ML or Apple’s Foundation Models framework for privacy-sensitive use cases.
Calling Mac-Hosted Ollama from Swift
The simplest setup: Ollama runs on a Mac, and a Swift app makes HTTP requests to it. This works for macOS apps talking to localhost, and for iOS apps when both the iPhone and Mac are on the same network.
import Foundation

struct OllamaClient {
    let baseURL: URL
    let model: String

    init(host: String = "http://localhost:11434", model: String = "llama3.2") {
        // The force-unwrap is safe for the literal default; validate user-supplied hosts before passing them in.
        self.baseURL = URL(string: host)!
        self.model = model
    }

    // Builds the POST /api/chat request shared by both chat methods
    private func makeChatRequest(message: String, stream: Bool) throws -> URLRequest {
        var request = URLRequest(url: baseURL.appendingPathComponent("api/chat"))
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        let body: [String: Any] = [
            "model": model,
            "messages": [["role": "user", "content": message]],
            "stream": stream
        ]
        request.httpBody = try JSONSerialization.data(withJSONObject: body)
        return request
    }

    // Non-streaming chat
    func chat(message: String) async throws -> String {
        let request = try makeChatRequest(message: message, stream: false)
        let (data, _) = try await URLSession.shared.data(for: request)
        // Conditional casts, so an unexpected reply throws instead of crashing
        guard let json = try JSONSerialization.jsonObject(with: data) as? [String: Any],
              let msg = json["message"] as? [String: Any],
              let content = msg["content"] as? String else {
            throw URLError(.cannotParseResponse)
        }
        return content
    }

    // Streaming chat with AsyncStream; the stream simply ends if the request fails
    func chatStream(message: String) -> AsyncStream<String> {
        AsyncStream { continuation in
            let task = Task {
                do {
                    let request = try makeChatRequest(message: message, stream: true)
                    let (bytes, _) = try await URLSession.shared.bytes(for: request)
                    // Ollama streams one JSON object per line
                    for try await line in bytes.lines {
                        if let data = line.data(using: .utf8),
                           let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
                           let msg = json["message"] as? [String: Any],
                           let content = msg["content"] as? String {
                            continuation.yield(content)
                        }
                    }
                } catch {
                    // Connection or encoding failure: fall through and close the stream
                }
                continuation.finish()
            }
            // Cancel the request when the consumer stops iterating
            continuation.onTermination = { _ in task.cancel() }
        }
    }
}
// SwiftUI Usage
import SwiftUI

struct ContentView: View {
    @State private var response = ""
    private let client = OllamaClient()

    var body: some View {
        VStack {
            Text(response).padding()
            Button("Ask") {
                Task {
                    for await chunk in client.chatStream(message: "Explain Swift concurrency") {
                        await MainActor.run { response += chunk }
                    }
                }
            }
        }
    }
}
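Each line returned by the streaming endpoint is a standalone JSON object, so the dictionary casts in chatStream could also be written with Codable. The following is a minimal sketch under that assumption; ChatChunk and decodeChunk are illustrative names, not part of Ollama or of the client above, and only the message and done fields are modelled.

```swift
import Foundation

// Hypothetical model of one streamed line from /api/chat.
// Only the fields used here are declared; JSONDecoder ignores other keys.
struct ChatChunk: Decodable {
    struct Message: Decodable {
        let role: String
        let content: String
    }
    let message: Message?
    let done: Bool
}

// Decodes a single line; returns nil for blank or malformed lines.
func decodeChunk(_ line: String) -> ChatChunk? {
    guard let data = line.data(using: .utf8), !data.isEmpty else { return nil }
    return try? JSONDecoder().decode(ChatChunk.self, from: data)
}

// Example line in the shape the streaming endpoint emits
let sampleLine = #"{"model":"llama3.2","message":{"role":"assistant","content":"Hi"},"done":false}"#
if let chunk = decodeChunk(sampleLine) {
    print(chunk.message?.content ?? "", chunk.done)
}
```

Inside the streaming loop, each decoded chunk's message content would be yielded to the continuation, and done could be used to stop early.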
iOS App Calling LAN Ollama
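For an iOS app to call Ollama on a Mac, configure Ollama to listen on all interfaces (OLLAMA_HOST=0.0.0.0:11434) and use the Mac’s local IP address (not localhost) as the host in the Swift client. Plain HTTP to a LAN address may need an App Transport Security exception in Info.plist: prefer the narrower NSAllowsLocalNetworking key where it is sufficient, because NSAllowsArbitraryLoads disables ATS for every host. To avoid hard-coding an IP address that can change, use the Mac’s Bonjour (.local) hostname or mDNS-based discovery. Binding to 0.0.0.0 exposes Ollama, which has no built-in authentication, to every device on the network, so the LAN approach is best suited to trusted home and office networks where the phone and Mac are on the same network.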
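As a small illustration of the "not localhost" rule, the sketch below builds a base URL from a LAN host and refuses loopback names, which on an iPhone would point at the phone itself. The function name lanOllamaBaseURL is hypothetical, and the addresses are placeholders.

```swift
import Foundation

// Hypothetical helper: builds an http:// base URL for an Ollama host on the LAN.
// Returns nil for empty hosts and for loopback names.
func lanOllamaBaseURL(host: String, port: Int = 11434) -> URL? {
    let trimmed = host.trimmingCharacters(in: .whitespacesAndNewlines)
    let loopback: Set<String> = ["localhost", "127.0.0.1", "::1"]
    guard !trimmed.isEmpty, !loopback.contains(trimmed.lowercased()) else { return nil }

    var components = URLComponents()
    components.scheme = "http"
    components.host = trimmed
    components.port = port
    return components.url
}

// Placeholder address; substitute the Mac's actual LAN IP or .local hostname
if let url = lanOllamaBaseURL(host: "192.168.1.100") {
    print(url.absoluteString)
}
```

The resulting string can be passed as the host argument of OllamaClient.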
Apple Foundation Models (On-Device, iOS 26+)
Apple’s Foundation Models framework (introduced in iOS 26 and macOS 26) gives apps access to the on-device language model behind Apple Intelligence. It runs on Apple silicon with no network required and no data leaving the device:
import FoundationModels  // iOS 26+, macOS 26+

// Check availability: the model needs a supported device with Apple Intelligence enabled
guard SystemLanguageModel.default.availability == .available else {
    print("Foundation Models not available on this device")
    return
}
let session = LanguageModelSession()
let response = try await session.respond(to: "Summarise this text: \(articleText)")
print(response.content)
Foundation Models is ideal for privacy-sensitive on-device tasks (processing personal data, health information, private notes) where even LAN transmission to a Mac is undesirable. The on-device model is considerably smaller than the 7B+ models typically served through Ollama, which limits it to simpler tasks, but it responds with low latency on supported Apple silicon iPhones and Macs and needs no server setup.
Why Swift + Local AI
The push toward on-device and local AI in the Apple ecosystem reflects both privacy trends and Apple’s investment in Apple silicon Neural Engine capabilities. For iOS and macOS developers, the choice between calling a Mac-hosted Ollama over the LAN and using Apple’s Foundation Models on-device involves genuine trade-offs. LAN-hosted Ollama provides much larger models (7B+ parameters) and the full Ollama model library, at the cost of requiring the phone and Mac to be on the same network. Foundation Models (iOS 26+) provides low latency and true offline capability with no network dependency, at the cost of a significantly smaller model with more limited capabilities and a requirement for Apple Intelligence-capable hardware. A common pattern is to use both: on-device models for latency-sensitive, privacy-critical tasks, and Ollama (or cloud APIs) for more complex tasks that benefit from larger models.
Error Handling and Timeouts
extension OllamaClient {
    enum OllamaError: Error {
        case connectionRefused
        case modelNotFound(String)
        case badStatus(Int)
        case timeout
        case decodingError
    }

    func chatSafe(message: String, timeoutSeconds: Double = 60) async throws -> String {
        let url = baseURL.appendingPathComponent("api/chat")
        var request = URLRequest(url: url, timeoutInterval: timeoutSeconds)
        request.httpMethod = "POST"
        request.addValue("application/json", forHTTPHeaderField: "Content-Type")
        let body: [String: Any] = [
            "model": model, "messages": [["role": "user", "content": message]], "stream": false
        ]
        request.httpBody = try JSONSerialization.data(withJSONObject: body)
        do {
            let (data, response) = try await URLSession.shared.data(for: request)
            guard let http = response as? HTTPURLResponse else {
                throw OllamaError.decodingError
            }
            // Ollama typically answers 404 when the requested model has not been pulled;
            // other non-200 codes are reported as-is rather than blamed on the model.
            if http.statusCode == 404 { throw OllamaError.modelNotFound(model) }
            guard http.statusCode == 200 else { throw OllamaError.badStatus(http.statusCode) }
            guard let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
                  let msg = json["message"] as? [String: Any],
                  let content = msg["content"] as? String else {
                throw OllamaError.decodingError
            }
            return content
        } catch let error as URLError where error.code == .cannotConnectToHost {
            throw OllamaError.connectionRefused
        } catch let error as URLError where error.code == .timedOut {
            throw OllamaError.timeout
        }
    }
}
SwiftUI Streaming Chat Interface
struct ChatView: View {
    @State private var messages: [(role: String, content: String)] = []
    @State private var input = ""
    @State private var isStreaming = false
    private let client = OllamaClient(host: "http://192.168.1.100:11434")

    var body: some View {
        VStack {
            ScrollViewReader { proxy in
                ScrollView {
                    ForEach(Array(messages.enumerated()), id: \.offset) { i, msg in
                        HStack {
                            if msg.role == "user" { Spacer() }
                            Text(msg.content)
                                .padding(10)
                                .background(msg.role == "user" ? Color.blue : Color(.systemGray5))
                                .foregroundColor(msg.role == "user" ? .white : .primary)
                                .cornerRadius(12)
                            if msg.role == "assistant" { Spacer() }
                        }.padding(.horizontal)
                    }
                }.onChange(of: messages.count) { proxy.scrollTo(messages.count - 1) }
            }
            HStack {
                TextField("Message", text: $input)
                    .textFieldStyle(.roundedBorder)
                Button("Send") {
                    Task { await sendMessage() }
                }.disabled(isStreaming || input.isEmpty)
            }.padding()
        }
    }

    func sendMessage() async {
        let text = input
        input = ""
        messages.append((role: "user", content: text))
        messages.append((role: "assistant", content: ""))
        isStreaming = true
        for await chunk in client.chatStream(message: text) {
            messages[messages.count - 1].content += chunk
        }
        isStreaming = false
    }
}
Choosing Between Ollama on Mac and Foundation Models
The decision framework: use Foundation Models for tasks that involve personal data (health, finances, private notes), need offline operation, or have latency requirements below 500ms. Use LAN-hosted Ollama for tasks requiring larger models (complex reasoning, long documents, code generation), when the Mac and iPhone are reliably on the same network, or when you need specific open-weight models from the Ollama library. For production iOS apps where you cannot guarantee LAN access, Foundation Models is the more reliable choice: on supported devices with Apple Intelligence enabled it has no network dependency, so check availability at runtime and degrade gracefully elsewhere. For developer tools and home automation apps where network assumptions are reasonable, LAN Ollama gives you much more capable models.
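The framework above can be sketched as a small routing function. This is only one possible encoding of the prose: AIBackend, TaskProfile, and preferredBackend are hypothetical names, and the 500 ms threshold is the one quoted above rather than a measured figure.

```swift
// Hypothetical types for illustration; the rules mirror the decision framework in the text.
enum AIBackend: Equatable {
    case foundationModels
    case lanOllama
}

struct TaskProfile {
    var involvesPersonalData = false
    var mustWorkOffline = false
    var latencyBudgetMs: Int? = nil   // nil means no hard latency requirement
    var needsLargeModel = false       // complex reasoning, long documents, code generation
}

func preferredBackend(for task: TaskProfile, lanReachable: Bool) -> AIBackend {
    // Privacy and offline requirements rule out sending data to the Mac.
    if task.involvesPersonalData || task.mustWorkOffline { return .foundationModels }
    // Latency budgets below 500 ms favour the on-device model.
    if let budget = task.latencyBudgetMs, budget < 500 { return .foundationModels }
    // Larger models are only an option while the Mac is reachable.
    if task.needsLargeModel && lanReachable { return .lanOllama }
    return .foundationModels
}

let codeReview = TaskProfile(needsLargeModel: true)
print(preferredBackend(for: codeReview, lanReachable: true))
```

A real app would still need to confirm at runtime that the on-device model is available before committing to that branch.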
Getting Started
Copy the OllamaClient struct from this article, update the host URL to your Mac’s local IP address, ensure Ollama is running with OLLAMA_HOST=0.0.0.0:11434, and call chat(message:) from a SwiftUI button action. No special entitlement is needed for plain HTTP requests to a LAN address, but iOS 14 and later ask the user for local network permission the first time the app connects, in development builds as well as in TestFlight and App Store builds. Add the NSLocalNetworkUsageDescription key (Privacy - Local Network Usage Description) to Info.plist to explain why the app needs that access. From a working basic integration, add the streaming interface for better UX and the error handling extension for production reliability.
Multimodal: Sending Images from Swift
import UIKit  // UIImage is only available on UIKit platforms

extension OllamaClient {
    func captionImage(_ image: UIImage, prompt: String = "Describe this image.") async throws -> String {
        guard let imageData = image.jpegData(compressionQuality: 0.8) else {
            throw OllamaError.decodingError
        }
        let b64 = imageData.base64EncodedString()
        let url = baseURL.appendingPathComponent("api/chat")
        var request = URLRequest(url: url)
        request.httpMethod = "POST"
        request.addValue("application/json", forHTTPHeaderField: "Content-Type")
        // Explicit type annotation: the message mixes String and [String] values
        let userMessage: [String: Any] = [
            "role": "user",
            "content": prompt,
            "images": [b64]
        ]
        let body: [String: Any] = [
            "model": "llava", // or moondream, gemma3:4b
            "messages": [userMessage],
            "stream": false
        ]
        request.httpBody = try JSONSerialization.data(withJSONObject: body)
        let (data, _) = try await URLSession.shared.data(for: request)
        // Conditional casts, so an unexpected reply throws instead of crashing
        guard let json = try JSONSerialization.jsonObject(with: data) as? [String: Any],
              let msg = json["message"] as? [String: Any],
              let content = msg["content"] as? String else {
            throw OllamaError.decodingError
        }
        return content
    }
}

// Usage in SwiftUI:
Button("Caption Photo") {
    Task {
        if let image = selectedImage {
            caption = try await client.captionImage(image)
        }
    }
}
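The request body itself only needs the encoded image bytes, so the same payload can be built without UIKit (for example from a file on macOS). A minimal sketch, assuming a hypothetical helper named imageChatBody; the three bytes in the usage line are placeholder data, not a real image.

```swift
import Foundation

// Hypothetical helper: builds the JSON body for a vision request from raw image bytes.
func imageChatBody(model: String, prompt: String, imageData: Data) throws -> Data {
    // Images are sent as plain base64 strings in the "images" array of the message.
    let userMessage: [String: Any] = [
        "role": "user",
        "content": prompt,
        "images": [imageData.base64EncodedString()]
    ]
    let body: [String: Any] = [
        "model": model,
        "messages": [userMessage],
        "stream": false
    ]
    return try JSONSerialization.data(withJSONObject: body)
}

// Placeholder bytes stand in for JPEG or PNG data loaded elsewhere
let payload = try imageChatBody(model: "llava", prompt: "Describe this image.", imageData: Data([0x01, 0x02, 0x03]))
print(payload.count, "bytes of JSON")
```

The returned Data can be assigned directly to httpBody on the URLRequest.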
Offline-First Architecture
For iOS apps that need AI capabilities both online and offline, a layered approach works well: try Foundation Models first (no network round trip, and it works offline on supported devices), fall back to LAN Ollama if Foundation Models is unavailable or the task requires more capability, and fall back to a cloud API if LAN Ollama is unreachable. This hierarchy gives you the best experience for each network condition without hard-coding a dependency on any single AI backend.
func smartChat(message: String) async -> String {
    // Try on-device first (Foundation Models needs iOS 26 / macOS 26 or later)
    if #available(iOS 26, macOS 26, *),
       SystemLanguageModel.default.availability == .available {
        let session = LanguageModelSession()
        if let response = try? await session.respond(to: message) {
            return response.content
        }
    }
    // Fall back to LAN Ollama (ollamaClient is an OllamaClient configured elsewhere)
    if let response = try? await ollamaClient.chat(message: message) {
        return response
    }
    // Fall back message
    return "AI assistant unavailable. Check your connection."
}
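The same hierarchy can be expressed without naming any backend, which makes it easier to add a cloud tier or to test the ordering. This is a sketch of one possible shape, not an established API: firstSuccessfulReply and the backend labels are hypothetical, and the closures in the usage example are stand-ins for real calls.

```swift
import Foundation

// Tries each backend in order and returns the first successful reply,
// or nil when every backend throws.
func firstSuccessfulReply(
    from backends: [(name: String, call: @Sendable () async throws -> String)]
) async -> (backend: String, reply: String)? {
    for backend in backends {
        do {
            let reply = try await backend.call()
            return (backend: backend.name, reply: reply)
        } catch {
            // This tier failed or is unreachable; try the next one.
            continue
        }
    }
    return nil
}

// Stand-in error and closures for illustration only
struct BackendUnavailable: Error {}

let outcome = await firstSuccessfulReply(from: [
    (name: "on-device", call: { throw BackendUnavailable() }),
    (name: "lan-ollama", call: { "Hello from the Mac" })
])
print(outcome?.backend ?? "none", outcome?.reply ?? "")
```

In an app, the closures would wrap a Foundation Models session, an OllamaClient call, and optionally a cloud client, in that order.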
The Apple Intelligence Connection
Apple Intelligence (introduced with iOS 18) and the Foundation Models framework (iOS 26) are part of Apple’s broader local AI strategy. As Apple continues to invest in on-device ML, including larger models, better Neural Engine utilisation, and multimodal capabilities, the quality ceiling for on-device Swift AI development rises. Ollama on Mac fills the gap between what Foundation Models can do today and what developers need: larger models, diverse open-weight options, and the ability to run the same models that the broader AI community uses for development and research. The two tools complement each other: Foundation Models for polished on-device UX, Ollama for development, experimentation, and tasks that need larger model capability than Apple’s on-device models currently provide.
Swift AI Development in 2026
The Apple platform AI ecosystem in 2026 is bifurcating between on-device models (Foundation Models, Core ML, Apple Intelligence) and off-device inference (LAN Ollama, cloud APIs). Swift developers have access to all of these options, and the best applications use them strategically — on-device for privacy-sensitive, latency-sensitive tasks; LAN Ollama for capability-intensive tasks where network access is reliable; and cloud APIs as the fallback or for tasks requiring frontier model quality. The patterns in this article — the URLSession-based OllamaClient, the AsyncStream streaming interface, the offline-first hierarchy — give you the building blocks for a Swift application that uses the right AI backend for each task automatically.
As Foundation Models matures with each iOS release and Apple’s on-device model quality improves, the balance will shift somewhat toward on-device inference for more task types. But Ollama on Mac will remain relevant for development workflows (testing AI features without internet), for Mac applications where GPU-accelerated 7B+ inference is viable, and for teams that need specific open-weight models not available in Foundation Models. The two ecosystems are complementary rather than competitive, and Swift developers building AI features today are well-positioned to take advantage of improvements in both as the platform matures.
Managing Ollama from Swift
extension OllamaClient {
    struct ModelInfo {
        let name: String
        let sizeGB: Double
    }

    func listModels() async throws -> [ModelInfo] {
        let url = baseURL.appendingPathComponent("api/tags")
        let (data, _) = try await URLSession.shared.data(from: url)
        guard let json = try JSONSerialization.jsonObject(with: data) as? [String: Any],
              let models = json["models"] as? [[String: Any]] else {
            throw OllamaError.decodingError
        }
        return models.compactMap { entry -> ModelInfo? in
            guard let name = entry["name"] as? String else { return nil }
            // "size" is reported in bytes; convert to Double before dividing
            let bytes = (entry["size"] as? NSNumber)?.doubleValue ?? 0
            return ModelInfo(name: name, sizeGB: bytes / 1_000_000_000)
        }
    }

    func isRunning() async -> Bool {
        // A 200 from the server root is treated as "Ollama is up"
        guard let result = try? await URLSession.shared.data(from: baseURL) else { return false }
        return (result.1 as? HTTPURLResponse)?.statusCode == 200
    }
}

// Usage: show model picker in SwiftUI (properties and Task belong inside a View)
@State private var models: [OllamaClient.ModelInfo] = []
@State private var selectedModel = "llama3.2"

Task {
    if await client.isRunning() {
        models = (try? await client.listModels()) ?? []
    }
}
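The model list can also be decoded with Codable instead of dictionary casts. The sketch below assumes only the name and size fields of the /api/tags response; TagsResponse is a hypothetical type name and the JSON is a hand-written sample, not captured output.

```swift
import Foundation

// Hypothetical Codable mirror of the /api/tags response; other keys are ignored.
struct TagsResponse: Decodable {
    struct Model: Decodable {
        let name: String
        let size: Int64   // bytes

        // Decimal gigabytes, computed in floating point to avoid integer truncation
        var sizeGB: Double { Double(size) / 1_000_000_000 }
    }
    let models: [Model]
}

// Hand-written sample in the shape of the endpoint's reply
let sampleJSON = #"{"models":[{"name":"llama3.2:latest","size":2019393189}]}"#
let decoded = try JSONDecoder().decode(TagsResponse.self, from: Data(sampleJSON.utf8))
for model in decoded.models {
    print(model.name, String(format: "%.1f GB", model.sizeGB))
}
```

Keeping the byte count as an integer and converting on demand avoids the integer-division pitfall when deriving gigabytes.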
Shipping Swift + Ollama Apps
For distributing a macOS app that uses LAN Ollama, include setup instructions that guide users to install Ollama and configure OLLAMA_HOST=0.0.0.0:11434. A helper view in the app that checks isRunning() on launch and shows setup guidance if Ollama is not found improves the first-run experience significantly. For iOS apps, the LAN dependency means the app should gracefully degrade when Ollama is not reachable — show a clear message explaining the requirement rather than appearing broken. The Foundation Models fallback pattern described in this article is the most user-friendly approach for iOS apps that want AI features to work even when the Mac server is unavailable.
Getting Started
Copy the OllamaClient struct, set the host to your Mac’s local IP, ensure Ollama is running with OLLAMA_HOST=0.0.0.0:11434, and call the non-streaming chat method from a SwiftUI button action. Once that works, add the streaming interface for real-time token display and the error handling extension for production reliability. Test on a physical device (not just Simulator) to see real Wi-Fi behaviour, since the time to the first token over the LAN depends on network conditions as well as on model size and whether the model is already loaded. The patterns in this article are the foundation for any Swift application that needs local AI capabilities, whether as the primary AI backend on Mac, as a development tool for iOS AI features, or as one tier in a hybrid on-device plus LAN inference architecture.
Testing Swift + Ollama Applications
Testing AI-dependent Swift code requires handling the non-determinism of model responses and the network dependency on a running Ollama instance. For unit tests, create a protocol that matches OllamaClient’s interface and provide a mock implementation that returns canned responses — this lets you test your business logic without requiring Ollama to be running during the test suite. For integration tests that exercise the actual Ollama connection, mark them with a custom condition that skips them when the OLLAMA_HOST environment variable is not set, allowing CI to run unit tests only while local development runs the full suite:
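// Protocol for testability
protocol AIClient {
    func chat(message: String) async throws -> String
}
extension OllamaClient: AIClient {}

// Mock for unit tests. A class is used because a struct's mutating method
// cannot satisfy the protocol's non-mutating requirement.
final class MockAIClient: AIClient {
    private let responses: [String]
    private var index = 0

    init(responses: [String]) {
        precondition(!responses.isEmpty, "Provide at least one canned response")
        self.responses = responses
    }

    func chat(message: String) async throws -> String {
        defer { index = (index + 1) % responses.count }
        return responses[index]
    }
}

// In tests (summarize(text:using:) and sampleDoc stand in for your own code under test)
func testSummarization() async throws {
    let client = MockAIClient(responses: ["Summary: The document covers AI basics."])
    let result = try await summarize(text: sampleDoc, using: client)
    XCTAssertTrue(result.contains("Summary"))
}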
The Cross-Platform Perspective
Swift’s expansion beyond Apple platforms (Swift on Linux, Swift on Windows, Swift on the server via Vapor and Hummingbird) means the OllamaClient patterns in this article are not exclusively relevant to iOS and macOS development. A Swift server-side application running on Linux can call a local Ollama instance using largely the same URLSession-based client code. URLSession is provided there by swift-corelibs-foundation, where it lives in the FoundationNetworking module, so the file needs a conditional import, and some newer conveniences such as URLSession.bytes(for:) may not be available on every toolchain. For teams that have adopted Swift as a server-side language, local Ollama integration follows the same patterns as for client-side development, and the code reuse potential across client and server in a full-stack Swift application is significant. The Foundation Models framework is the Apple-specific part of this article; the OllamaClient request and parsing patterns are cross-platform by nature.
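A minimal sketch of the portable part: building the chat request with a conditional import, so the same file compiles on Apple platforms and on Linux. The free function makeChatRequest is a hypothetical helper for illustration, and no network call is made here.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest and URLSession live here on Linux
#endif

// Hypothetical helper: builds the POST /api/chat request without touching the network.
func makeChatRequest(baseURL: URL, model: String, message: String) throws -> URLRequest {
    var request = URLRequest(url: baseURL.appendingPathComponent("api/chat"))
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = [
        "model": model,
        "messages": [["role": "user", "content": message]],
        "stream": false
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    return request
}

let request = try makeChatRequest(
    baseURL: URL(string: "http://localhost:11434")!,
    model: "llama3.2",
    message: "Hello"
)
print(request.httpMethod ?? "", request.url?.absoluteString ?? "")
```

Separating request construction from sending also makes the client easier to unit test, since the request can be inspected without a running Ollama instance.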
Summary
Swift developers in 2026 have a rich set of options for integrating local AI into their applications. The URLSession-based OllamaClient gives you direct access to the full Ollama model library from any Swift environment. The Foundation Models framework gives you true on-device inference for privacy-sensitive and latency-sensitive tasks on iOS 26+ and macOS 26+. The hybrid architecture, with Foundation Models first and LAN Ollama as the fallback, gives you the best of both. All three approaches use familiar Swift async/await patterns and integrate naturally with SwiftUI’s reactive state model. The choice between them depends on your latency requirements, privacy constraints, network assumptions, and the model capabilities your application needs. Evaluating each task against those criteria, rather than defaulting to cloud APIs or on-device models for everything, serves users better than any single-backend strategy and positions your application to benefit as both Apple’s on-device models and the open-weight models available through Ollama continue to improve.