How to Get Structured JSON Output from Ollama with Pydantic

Getting an LLM to return valid, consistently structured data is one of the most practically useful capabilities for building real applications. Instead of parsing free-form text, you define a Pydantic model describing exactly what fields you want, and the LLM fills them in. This guide covers several approaches to structured output with Ollama, from manual JSON prompting to Ollama’s native structured output feature, with Pydantic validation throughout.

The Problem with Free-Form Output

Without structured output, a prompt like “extract the name, email, and phone from this text” returns something like “The name is John Smith, email john@example.com, phone 555-1234” — which requires fragile string parsing that breaks whenever the model varies its phrasing. Structured output gives you {"name": "John Smith", "email": "john@example.com", "phone": "555-1234"} every time, ready for direct use in code.

Approach 1: JSON Prompt + Pydantic Validation

from pydantic import BaseModel, ValidationError
import ollama
import re

class ContactInfo(BaseModel):
    name: str
    email: str
    phone: str | None = None

def extract_contact(text: str, model: str = 'llama3.2') -> ContactInfo | None:
    prompt = f"""Extract contact information from this text and return ONLY valid JSON.
No explanation, no markdown, just the JSON object.

Required fields: name (string), email (string), phone (string or null if not present)

Text: {text}

JSON:"""
    response = ollama.chat(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
        options={'temperature': 0.0}  # deterministic output
    )
    raw = response['message']['content'].strip()
    # Strip markdown code fences if present
    raw = re.sub(r'^```(?:json)?\n?|```$', '', raw, flags=re.MULTILINE).strip()
    try:
        return ContactInfo.model_validate_json(raw)
    except ValidationError as e:  # model_validate_json raises ValidationError for invalid JSON too
        print(f'Validation error: {e}\nRaw: {raw}')
        return None

result = extract_contact('Call John Smith at john@acme.com or 555-867-5309')
if result:
    print(result.model_dump())
    # {'name': 'John Smith', 'email': 'john@acme.com', 'phone': '555-867-5309'}

Approach 2: Ollama’s Native Structured Output (format parameter)

Ollama supports a format parameter that accepts a JSON schema and constrains generation to match it — the model cannot produce output that violates the schema. This is the most reliable approach:

import ollama
from pydantic import BaseModel

class ProductReview(BaseModel):
    product_name: str
    rating: int          # 1-5
    sentiment: str       # 'positive', 'negative', 'neutral'
    key_points: list[str]
    would_recommend: bool

def analyse_review(review_text: str, model: str = 'llama3.2') -> ProductReview:
    response = ollama.chat(
        model=model,
        messages=[{
            'role': 'user',
            'content': f'Analyse this product review:\n\n{review_text}'
        }],
        format=ProductReview.model_json_schema(),  # pass Pydantic schema directly
        options={'temperature': 0.0}
    )
    return ProductReview.model_validate_json(response['message']['content'])

review = """The noise-cancelling headphones are incredible for the price.
Battery lasts 30 hours easily. Sound quality is excellent for music.
Slightly uncomfortable after 3+ hours. Overall a great buy."""

result = analyse_review(review)
print(result.rating)          # 4
print(result.sentiment)       # 'positive'
print(result.key_points)      # ['Excellent noise-cancelling', '30 hour battery', ...]
print(result.would_recommend) # True

Nested Models

from pydantic import BaseModel
from typing import Optional
import ollama

class Address(BaseModel):
    street: Optional[str] = None
    city: str
    country: str

class Person(BaseModel):
    full_name: str
    age: Optional[int] = None
    occupation: str
    address: Address

def extract_person(text: str) -> Person:
    response = ollama.chat(
        model='llama3.2',
        messages=[{'role':'user','content': f'Extract person info from:\n\n{text}'}],
        format=Person.model_json_schema(),
        options={'temperature': 0.0}
    )
    return Person.model_validate_json(response['message']['content'])

result = extract_person(
    'Dr. Sarah Chen, 34, works as a surgeon in Melbourne, Australia.'
)
print(result.full_name)         # 'Dr. Sarah Chen'
print(result.address.city)      # 'Melbourne'
print(result.address.country)   # 'Australia'

Batch Extraction with Error Handling

from pydantic import BaseModel, Field
from typing import Optional
import ollama

class JobPosting(BaseModel):
    title: str
    company: str
    location: str
    salary_min: Optional[int] = Field(None, description='Minimum salary in USD')
    salary_max: Optional[int] = Field(None, description='Maximum salary in USD')
    remote: bool = False
    skills: list[str] = Field(default_factory=list)

def extract_job_postings(texts: list[str], model: str = 'llama3.2') -> list[JobPosting | None]:
    results = []
    for text in texts:
        try:
            r = ollama.chat(
                model=model,
                messages=[{'role':'user','content': f'Extract job posting details:\n\n{text}'}],
                format=JobPosting.model_json_schema(),
                options={'temperature': 0.0}
            )
            results.append(JobPosting.model_validate_json(r['message']['content']))
        except Exception as e:
            print(f'Failed to extract: {e}')
            results.append(None)
    return results

Which Models Work Best

Ollama’s format parameter works with any model, but quality varies. Llama 3.2 3B, Qwen2.5 7B, and Mistral 7B all handle structured output reliably for straightforward schemas. For complex nested schemas with many optional fields or constrained values, larger models (Mistral Nemo 12B, Qwen2.5 14B) produce more consistent results. Approach 1 (the JSON prompt method) is more sensitive to model quality than the native format parameter — smaller models are more likely to wrap the JSON in unwanted text when relying on prompts alone — so prefer the native format parameter whenever it is available.

Why Structured Output Changes How You Build with LLMs

Free-form text output requires your application to parse natural language — a fragile dependency on a model behaving consistently across all inputs. Structured output inverts this relationship: you define the data contract, and the model fills it. This shift is what makes LLMs genuinely useful as components in larger software systems rather than just interactive tools. When you can reliably extract ProductReview(rating=4, sentiment='positive', would_recommend=True) from any review text, you can build a review analysis pipeline that feeds into a database, a dashboard, or an automated reporting system without a parsing layer that breaks on every stylistic variation the model produces.

The practical impact is significant. Information extraction tasks that previously required fine-tuning a specialised NLP model — named entity recognition, relationship extraction, document classification, form parsing — can now be handled by a general-purpose 7B model with a Pydantic schema. The quality is not always equal to a fine-tuned specialist, but it is often good enough for production use, available immediately without training data, and flexible enough to handle new extraction tasks by changing the schema rather than retraining a model.

The format Parameter Internals

Ollama’s format parameter uses constrained decoding under the hood — it modifies the token generation process to only allow token sequences that are consistent with the provided JSON schema at each generation step. This is fundamentally more reliable than prompt-based approaches because the constraint is applied at the level of the model’s output distribution, not as a post-hoc filter. The model literally cannot produce tokens that would create an invalid JSON structure or violate field type constraints. This means you do not need to worry about the model adding explanatory text, using markdown code fences, or returning a partial JSON object — the schema constraint prevents all of these.

The constraint does not guarantee that extracted values are accurate — if the text says a salary is “competitive” and your schema requires an integer, the model will still try to produce a number but might hallucinate one. The format parameter constrains the structure, not the semantic accuracy. Always validate that extracted values are plausible for your use case, especially for required fields that may not always be present in the source text.
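One hedged sketch of such a plausibility check, reusing the JobPosting model from the batch example above (the salary bounds here are arbitrary placeholders, not recommended values):

def check_salary_plausibility(posting: JobPosting) -> list[str]:
    # Post-extraction sanity checks that the schema alone cannot express
    warnings = []
    if posting.salary_min is not None and posting.salary_max is not None:
        if posting.salary_min > posting.salary_max:
            warnings.append('salary_min exceeds salary_max')
    for value in (posting.salary_min, posting.salary_max):
        if value is not None and not (10_000 <= value <= 2_000_000):
            warnings.append(f'implausible salary value: {value}')
    return warnings

Checks like these catch hallucinated values that are structurally valid but semantically wrong, which is exactly the failure mode the format parameter cannot prevent.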

Handling Missing and Optional Fields

Real-world data extraction frequently encounters texts where some expected fields are absent. Pydantic’s Optional type and default values tell both Pydantic and the LLM’s schema generation that a field can be null or missing:

from pydantic import BaseModel, Field
from typing import Optional, TypeVar
import ollama

class EventInfo(BaseModel):
    name: str                           # Required — always present
    date: Optional[str] = None          # Optional — null if not found
    location: Optional[str] = None      # Optional
    price: Optional[float] = None       # Optional
    capacity: Optional[int] = None      # Optional
    description: str = ''               # Default empty string
    tags: list[str] = Field(default_factory=list)  # Default empty list

T = TypeVar('T', bound=BaseModel)

def extract_with_format(text: str, schema: type[T], model: str = 'llama3.2') -> T:
    # Generic extraction helper: pass any Pydantic model's JSON schema to Ollama
    # via the format parameter and validate the response against that model
    response = ollama.chat(
        model=model,
        messages=[{'role': 'user', 'content': f'Extract the requested information from:\n\n{text}'}],
        format=schema.model_json_schema(),
        options={'temperature': 0.0}
    )
    return schema.model_validate_json(response['message']['content'])

# The LLM will set optional fields to null rather than inventing values
result = extract_with_format('Join us for the annual tech meetup this Friday!', EventInfo)
print(result.date)     # None (not mentioned)
print(result.name)     # 'Annual Tech Meetup'
print(result.price)    # None (not mentioned)

Using Optional types is important for production extraction pipelines. Without them, the model is forced to invent a value for every field regardless of whether the source text contains that information, which produces plausible-sounding but fabricated data. Optional fields with None defaults give the model the correct signal that absence is a valid response.

Validation with Field Constraints

Pydantic’s field validators integrate with Ollama’s format parameter to add semantic constraints beyond basic types:

from pydantic import BaseModel, Field, field_validator
from typing import Literal
import ollama

class SentimentAnalysis(BaseModel):
    text_excerpt: str = Field(description='Most sentiment-revealing sentence')
    sentiment: Literal['positive', 'negative', 'neutral', 'mixed']
    confidence: float = Field(ge=0.0, le=1.0, description='Confidence score 0-1')
    emotions: list[Literal['joy', 'anger', 'fear', 'surprise', 'sadness', 'disgust']]
    intensity: int = Field(ge=1, le=5, description='Emotional intensity 1-5')

    @field_validator('confidence')
    @classmethod
    def validate_confidence(cls, v):
        return round(v, 2)  # Always 2 decimal places

text = 'The update fixed the connectivity drops entirely. I am genuinely relieved and very happy with it now.'

response = ollama.chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': f'Analyse: {text}'}],
    format=SentimentAnalysis.model_json_schema(),
    options={'temperature': 0.0}
)
result = SentimentAnalysis.model_validate_json(response['message']['content'])
result = SentimentAnalysis.model_validate_json(response['message']['content'])
# sentiment is guaranteed to be one of the 4 valid values
# confidence is guaranteed to be between 0.0 and 1.0

Literal types are particularly useful for classification tasks — they constrain the model to a defined set of valid category values, which is exactly what you want for sentiment labels, priority levels, document types, or any other discrete classification task.
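For instance, a minimal document classifier might look like the sketch below; the five category names are illustrative, not a fixed taxonomy:

from typing import Literal
from pydantic import BaseModel
import ollama

class DocumentLabel(BaseModel):
    category: Literal['invoice', 'contract', 'resume', 'support_ticket', 'other']

def classify_document(text: str, model: str = 'llama3.2') -> str:
    response = ollama.chat(
        model=model,
        messages=[{'role': 'user', 'content': f'Classify this document:\n\n{text}'}],
        format=DocumentLabel.model_json_schema(),
        options={'temperature': 0.0}
    )
    # The Literal constraint means validation fails on anything outside the five labels
    return DocumentLabel.model_validate_json(response['message']['content']).category

print(classify_document('Invoice #4821: 3x widget assembly, total due $1,240 by 30 June'))  # expected: 'invoice'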

Building a Data Extraction Pipeline

The real value of Pydantic structured output emerges when you chain multiple extraction steps or process large volumes of text. A production extraction pipeline typically includes: schema definition with appropriate Optional fields, extraction function with the format parameter, Pydantic validation, storage to a database or structured file, and monitoring of extraction quality over time. The last step is often overlooked but important — extraction accuracy degrades when source text patterns shift in ways the model was not tested on, and logging validation errors and periodic spot-checks of extracted data are the practical quality controls that keep production pipelines reliable.
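A minimal sketch of that shape follows, assuming a hypothetical Invoice schema and JSONL file storage; the "monitoring" here is just logged validation failures, which a real pipeline would feed into dashboards or alerts:

import json
import logging
import ollama
from pydantic import BaseModel, ValidationError

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('extraction')

class Invoice(BaseModel):              # example schema — substitute your own
    vendor: str
    total: float | None = None
    currency: str | None = None

def run_pipeline(texts: list[str], out_path: str = 'extracted.jsonl') -> None:
    failures = 0
    with open(out_path, 'a') as out:
        for text in texts:
            response = ollama.chat(
                model='llama3.2',
                messages=[{'role': 'user', 'content': f'Extract invoice details:\n\n{text}'}],
                format=Invoice.model_json_schema(),
                options={'temperature': 0.0},
            )
            try:
                record = Invoice.model_validate_json(response['message']['content'])
                out.write(record.model_dump_json() + '\n')   # storage step
            except ValidationError as e:
                failures += 1
                logger.warning('Validation failed: %s', e)    # monitoring signal
    logger.info('Completed with %d validation failures', failures)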

Structured Output vs Fine-Tuning for Extraction

Before the availability of reliable structured output in general-purpose LLMs, building a data extraction pipeline typically required fine-tuning a specialised model on labelled examples of your specific extraction task. This was time-consuming, required training data, and produced a model locked to that specific task. Structured output with a general 7–8B model handles many extraction tasks without any training, and can adapt to new extraction schemas in seconds by changing the Pydantic model definition.

When does fine-tuning still make sense? When accuracy requirements are very high and you have labelled training data, when the extraction involves domain-specific understanding the base model lacks (medical coding, legal citation parsing, specialised financial instruments), or when you need to process very high volumes efficiently and want to use a smaller model that has been specialised for your task. For most extraction tasks at moderate volume with acceptable accuracy requirements, structured output with a general model is the faster and more flexible path. Start here and only invest in fine-tuning if you hit accuracy limitations that cannot be addressed by improving your Pydantic schema, prompt, or model selection.

The OpenAI SDK Approach

If you prefer the OpenAI SDK’s interface, Pydantic structured output works identically through Ollama’s compatibility endpoint:

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

class Summary(BaseModel):
    main_point: str
    key_facts: list[str]
    action_items: list[str]

document_text = """Quarterly revenue grew 12%, driven by the APAC launch.
Hiring is paused until Q3. The board approved the new data-platform budget."""

completion = client.beta.chat.completions.parse(
    model='llama3.2',
    messages=[{'role': 'user', 'content': f'Summarise: {document_text}'}],
    response_format=Summary
)
result = completion.choices[0].message.parsed
print(result.main_point)
print(result.key_facts)

The OpenAI SDK’s client.beta.chat.completions.parse method handles the schema conversion and Pydantic validation automatically, providing the same structured output experience as OpenAI’s API but pointed at your local Ollama instance.

Getting Started

The fastest path: define a Pydantic model for your extraction task, run a few examples through Ollama with the format parameter, and inspect the results. For most straightforward extraction tasks — contact info, product details, sentiment analysis, document classification — you will see reliable structured output on the first attempt. Adjust the schema (add Optional fields, change types, add descriptions using Field) to improve accuracy on edge cases. The Pydantic model itself serves as executable documentation of what your extraction pipeline expects, making it easier to maintain and extend over time than equivalent regex or string-parsing code.

Practical Patterns for Production Pipelines

Several patterns from production deployments of structured extraction pipelines are worth knowing before you build your own. First, always set temperature=0.0 for extraction tasks. Higher temperatures introduce randomness that can cause the model to invent field values or choose unexpected options from a Literal type — deterministic output is what you want when accuracy matters. Second, add descriptive text to your Pydantic fields using Field(description=...) — these descriptions are included in the JSON schema passed to Ollama and help the model understand the intended meaning of each field, which measurably improves accuracy on ambiguous extractions. Third, keep schemas focused — a schema with 20 fields extracts less accurately than three schemas of 6–7 fields each applied to the same text in sequence. The model allocates its attention across all fields simultaneously, and spreading it across too many fields at once degrades per-field accuracy.

Fourth, log every ValidationError in production. When Pydantic validation fails, it almost always signals either a model output format problem (fixable by improving the prompt or schema) or an input text edge case that your schema does not handle well (fixable by adding Optional types or better descriptions). ValidationError logs are your feedback loop for improving extraction quality over time — treat them as signals to act on rather than exceptions to suppress. Fifth, test your schema against a representative sample of real inputs before deploying, not just the clean examples you wrote the schema with in mind. Edge cases in real production text — unusual formats, missing standard fields, multilingual content, OCR errors — behave very differently from tidy test inputs.
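To make the third pattern concrete, one possible way to split a broad extraction into focused passes over the same text is sketched below, reusing the extract_with_format helper from the optional-fields section; the company fields are illustrative:

from typing import Optional
from pydantic import BaseModel

class CompanyBasics(BaseModel):
    name: str
    industry: Optional[str] = None
    headquarters: Optional[str] = None

class CompanyFinancials(BaseModel):
    revenue_usd: Optional[int] = None
    employee_count: Optional[int] = None
    founded_year: Optional[int] = None

def extract_company_profile(text: str) -> dict:
    # Two focused passes over the same text instead of one sprawling schema
    basics = extract_with_format(text, CompanyBasics)
    financials = extract_with_format(text, CompanyFinancials)
    return {**basics.model_dump(), **financials.model_dump()}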

Comparing Accuracy: format Parameter vs JSON Prompt

In practice, the native format parameter is more reliable than the JSON prompt approach for almost every use case. The prompt approach requires the model to interpret your instructions about format while also performing the extraction, which introduces two failure modes: format non-compliance (the model adds text around the JSON, uses markdown code fences, or omits required fields) and extraction inaccuracy (the model misinterprets a field). The format parameter eliminates format non-compliance entirely, leaving only extraction accuracy as the variable to optimise. For new structured output tasks, always start with the native format parameter and only fall back to the JSON prompt approach if you encounter a specific case where the constrained generation produces unexpectedly poor extraction results — which is rare for well-designed schemas on capable models.

When Structured Output Is Not the Right Tool

Structured output excels at extraction and classification — pulling defined fields from text, categorising documents, scoring attributes. It is less suitable for tasks that require generating novel content within a structure, such as generating a full product description with a specified format, where the model needs significant creative latitude that the schema might inadvertently constrain. For generation tasks where structure matters, a system prompt specifying the output format usually produces better results than a rigid JSON schema. The distinction is: if you are extracting information that exists in the source text, use structured output. If you are generating new content that should follow a structure, use a well-crafted system prompt instead.
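For contrast, a generation task of that kind might lean on a format-describing system prompt rather than a schema; the format instructions here are only an example:

import ollama

response = ollama.chat(
    model='llama3.2',
    messages=[
        # Describe the desired structure in prose and leave the model room to write freely
        {'role': 'system', 'content': 'Write product descriptions as a one-line hook, '
                                      'two short paragraphs, then a bulleted spec list.'},
        {'role': 'user', 'content': 'Describe a lightweight trail-running shoe.'},
    ],
)
print(response['message']['content'])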
