How to Use Ollama with Excel and Python

Excel spreadsheets are the universal format for business data, and adding local LLM analysis to your Excel workflows means you can summarise reports, classify data, generate insights, and answer questions about spreadsheet content — all without sending your data to a cloud API. This guide covers using Python with pandas, openpyxl, and Ollama to analyse Excel files, generate commentary on data, classify rows, and produce natural-language reports from structured tabular data.

All processing happens locally. Ollama provides the language model capabilities and Python handles all the spreadsheet I/O and data wrangling. Sensitive financial data, HR records, and customer information never leave your machine.

Setup

pip install openpyxl pandas httpx tabulate
ollama pull llama3.2

The tabulate package formats DataFrames as clean Markdown tables that LLMs parse reliably — more so than raw CSV or space-aligned text.

Analysing a Spreadsheet

Load an Excel file with pandas and send a structured summary to Ollama:

import pandas as pd, httpx

OLLAMA_URL = "http://localhost:11434"
MODEL = "llama3.2"

def ask(prompt: str) -> str:
    with httpx.Client(timeout=120) as client:
        resp = client.post(f"{OLLAMA_URL}/api/chat",
                           json={"model": MODEL,
                                 "messages": [{"role": "user", "content": prompt}],
                                 "stream": False})
    return resp.json()["message"]["content"]

def analyse_excel(path: str) -> str:
    df = pd.read_excel(path)
    summary = f"""Shape: {df.shape[0]} rows x {df.shape[1]} columns
Columns: {list(df.columns)}
Types:
{df.dtypes.to_string()}
Missing:
{df.isnull().sum().to_string()}
Stats:
{df.describe().to_string()}
Sample:
{df.head(5).to_markdown(index=False)}"""
    return ask(f"Analyse this spreadsheet:
1. What does the data represent?
2. Key observations
3. Data quality issues
4. Suggested analyses

{summary}")

The summary packs the essential metadata — shape, types, missing values, statistics, and a 5-row sample — into a compact format. For a 10,000-row spreadsheet this is typically under 2,000 tokens, leaving plenty of room for the model’s analysis in the response.

Classifying Rows

Add a classification column to a DataFrame using Ollama with JSON schema for reliable output:

import json
from tqdm import tqdm

def classify_rows(df: pd.DataFrame, text_col: str, categories: list[str]) -> pd.DataFrame:
    schema = {
        "type": "object",
        "properties": {
            "category": {"type": "string", "enum": categories},
            "confidence": {"type": "number"}
        },
        "required": ["category", "confidence"]
    }
    results = []
    for text in tqdm(df[text_col].fillna(""), desc="Classifying"):
        with httpx.Client(timeout=30) as client:
            resp = client.post(f"{OLLAMA_URL}/api/chat", json={
                "model": MODEL,
                "messages": [{"role": "user",
                              "content": f"Classify: {str(text)[:400]}"}],
                "format": schema,
                "stream": False
            })
        data = json.loads(resp.json()["message"]["content"])
        results.append(data["category"])
    df = df.copy()
    df["category"] = results
    return df

Schema-constrained output guarantees the model always returns one of the defined categories, eliminating the need to post-process free-form responses. The confidence score gives you a way to flag borderline classifications for human review — filter for rows where confidence is below 0.7 and review those manually.

Generating a Business Report

Generate a full written report from spreadsheet data:

def generate_report(df: pd.DataFrame, context: str = "") -> str:
    metrics = []
    for col in df.select_dtypes(include="number").columns:
        metrics.append(f"{col}: total={df[col].sum():.2f}, mean={df[col].mean():.2f}, max={df[col].max():.2f}")
    prompt = f"""Write a professional business report.
{'Context: ' + context if context else ''}

Metrics:
{chr(10).join(metrics)}

Data sample:
{df.head(10).to_markdown(index=False)}

Include: executive summary, key findings, and recommendations."""
    return ask(prompt)

# Usage
df = pd.read_excel("sales_q1.xlsx")
report = generate_report(df, "Q1 2026 EMEA sales data")
print(report)
with open("report.txt", "w") as f:
    f.write(report)

Including the actual numeric metrics alongside the Markdown table gives the model concrete figures to cite in the report, producing more specific and accurate commentary than when only a visual sample is provided. Always cross-check any numbers the model cites against the source data before sharing the report — LLMs occasionally misquote specific figures even when the surrounding analysis is sound.

Interactive Q&A Over a Spreadsheet

Build a simple Q&A class that loads a spreadsheet once and answers multiple questions about it:

class ExcelQA:
    def __init__(self, path: str):
        self.df = pd.read_excel(path)
        self.context = f"""Columns: {list(self.df.columns)}
Shape: {self.df.shape}
Statistics:
{self.df.describe().to_markdown()}
Sample:
{self.df.head(5).to_markdown(index=False)}"""

    def ask(self, question: str) -> str:
        return ask(f"Answer this question about the spreadsheet accurately.

{self.context}

Question: {question}")

qa = ExcelQA("sales.xlsx")
print(qa.ask("Which product had the highest revenue?"))
print(qa.ask("Are there any unusual spikes in the monthly figures?"))
print(qa.ask("What trends are visible in the data?"))

The context is built once at initialisation and reused for every question, making subsequent queries fast. For large spreadsheets with many columns, select only the relevant ones before building the context — including 50 columns when the analysis concerns 5 of them wastes the context window and dilutes the model’s focus on what matters.

Writing Results Back to Excel

Write Ollama’s output back into Excel with formatting using openpyxl:

from openpyxl import load_workbook
from openpyxl.styles import PatternFill, Font

def enrich_excel(input_path: str, output_path: str, text_col: str, out_col: str):
    df = pd.read_excel(input_path)
    summaries = []
    for text in tqdm(df[text_col].fillna(""), desc="Enriching"):
        summaries.append(ask(f"Summarise in one sentence: {str(text)[:400]}") if text.strip() else "")
    df[out_col] = summaries
    df.to_excel(output_path, index=False)

    wb = load_workbook(output_path)
    ws = wb.active
    col_idx = list(df.columns).index(out_col) + 1
    ws.cell(1, col_idx).font = Font(bold=True)
    ws.cell(1, col_idx).fill = PatternFill("solid", fgColor="FFFF99")
    wb.save(output_path)
    print(f"Saved {output_path}")

Writing back with openpyxl gives you control over cell formatting — bold headers, background colours, column widths — so the output looks professional rather than like a plain data dump. This matters when the enriched spreadsheet will be shared with colleagues who expect business-ready formatting.

Practical Tips

Format numbers with appropriate units in your prompts — “revenue in USD millions” is clearer than raw floats. For date columns, convert them to human-readable strings before including in prompts — “January 2026” is more parseable than a Unix timestamp or an Excel serial number. For sheets with many tabs, process each sheet separately and synthesise the results in a final call. Cache Ollama responses keyed on the cell content hash to avoid re-processing unchanged rows when you rerun a classification on an updated spreadsheet — this can save significant time on large files where only a small percentage of rows have changed since the last run.

Working with Multiple Sheets

Many real-world Excel files contain multiple sheets — a summary sheet, monthly breakdowns, a raw data tab, and lookup tables. When analysing these files, treat each sheet as a separate data source and process them individually before synthesising the results. Use pd.ExcelFile(path).sheet_names to list all sheets, then loop over them with pd.read_excel(path, sheet_name=name). Build a brief summary of each sheet’s content and pass all the summaries together to Ollama in a single final call, asking it to synthesise the findings across sheets into a coherent overall analysis.

For files where the sheets follow a consistent structure — one sheet per month, one sheet per region, one sheet per product — a loop that processes each sheet and writes results to a new tab in an output workbook is a clean approach. openpyxl’s Workbook class lets you create sheets programmatically and control their names, order, and formatting. The output workbook can include the original data sheets alongside new AI-generated summary sheets, keeping everything in a single file that is easy to share.

Detecting Anomalies with Ollama

LLMs are surprisingly good at spotting anomalies in tabular data when you frame the prompt correctly. Rather than asking the model to perform statistical analysis (which is better done with pandas), ask it to identify patterns that look unusual given the context of the data. Send a sample of rows alongside the statistical summary and ask: “Which rows look unusual or inconsistent with the rest of the data, and why?” The model draws on its broad knowledge of what reasonable business data looks like — typical revenue ranges, expected date orderings, plausible quantities and prices — to flag rows that a simple outlier detection algorithm would miss because the anomaly is contextual rather than purely statistical.

This approach works particularly well for data quality auditing: sending 50 random rows alongside the summary statistics and asking the model to flag anything that looks wrong catches a wide range of issues — inconsistent date formats, values in the wrong column, implausible combinations of fields, and obvious data entry errors. It is not a replacement for proper data validation rules, but it is a fast first pass that catches the kind of issues that slip through automated checks because they require domain knowledge to recognise.

Generating Formulas and Transformation Code

One of the most productive uses of Ollama in an Excel workflow is generating pandas transformation code or Excel formulas. Describe what you want to do with the data — “calculate a 7-day rolling average of the Revenue column, grouped by Region” or “create a pivot table showing average conversion rate by channel and month” — and ask Ollama for the pandas code to do it. For Excel users who are less comfortable with Python, ask for the equivalent Excel formula instead. The model produces working code for the vast majority of common data manipulation tasks, and having it generate the code from a plain English description is faster than looking up the syntax in documentation.

Include a description of your DataFrame’s columns and dtypes in the prompt so the generated code uses the correct column names and handles the right data types. A prompt like “My DataFrame has columns: Date (datetime), Region (string), Revenue (float), Units (int). Write pandas code to calculate month-over-month revenue growth by Region” produces code that is specific to your data structure and usually runs correctly on the first attempt. Ask the model to add comments explaining each step, and you get code that is both functional and readable for future maintenance.

Excel vs CSV for Ollama Workflows

CSV files are simpler to work with than Excel in Python — no need for openpyxl, no sheet management, and faster loading with pandas. If your workflow allows it, export Excel files to CSV before processing with Ollama. The text content is identical from the model’s perspective, and CSV avoids the occasional surprises that come from Excel-specific features like merged cells, conditional formatting, and named ranges that pandas and openpyxl handle inconsistently.

Excel is the right format to work with directly when you need to preserve the workbook structure — multiple sheets, formulas, formatting — or when the output needs to go back to an Excel user in a format they can open and edit without conversion. openpyxl handles these cases well, and the combination of pandas for data processing and openpyxl for formatted output covers the full range of Excel automation scenarios that come up in practice.

Scheduling Automated Excel Reports

Combine the patterns above into a scheduled reporting pipeline: a Python script that reads a source Excel file (or queries a database and writes to Excel), runs Ollama analysis, generates a written report, and saves the output to a new file with a timestamped name. Schedule it with cron on Linux or Task Scheduler on Windows to run weekly, daily, or whenever the source data updates. Email the output file automatically using Python’s smtplib or a service like SendGrid.

This is a genuinely useful automation that many teams pay for in the form of expensive BI tools or consultant time. A Python script that reads your CRM export, classifies deals by stage and sentiment from the notes field, generates a summary of pipeline health, and emails a formatted Excel report every Monday morning takes a few hours to build and then runs indefinitely without maintenance. The local Ollama integration keeps the costs at zero beyond the electricity for the GPU, and sensitive deal data never leaves your infrastructure.

Choosing the Right Model for Excel Work

For general spreadsheet analysis — summarising data, generating reports, answering questions about business metrics — llama3.2 handles the task well and produces clear, well-structured prose that reads like a human analyst wrote it. For spreadsheets containing code, technical specifications, or data science output — model evaluation results, feature importance tables, experiment logs — qwen2.5-coder:7b understands the domain better and produces more technically accurate commentary.

For classification tasks on large DataFrames where you need to process thousands of rows, model speed matters more than quality for simple categories. A 3B model that classifies a sentiment in under a second is more practical for batch processing than an 8B model that takes five seconds per row when you have 10,000 rows to classify. Reserve the larger, slower model for the final synthesis and report generation steps where quality matters most and you are only making a handful of calls.

The combination of pandas for data manipulation, Ollama for language understanding, and openpyxl for formatted output covers the full range of AI-powered Excel automation. Whether you are building a one-off analysis script, a recurring report generator, or an interactive Q&A tool over your business data, the patterns in this guide give you a complete foundation to build on — entirely locally, entirely under your control, and entirely free of per-query API costs.

Start with the analyse function to understand what is in a new spreadsheet, add classification when you need to categorise rows at scale, and build the report generator when stakeholders need written summaries rather than raw data. Each capability is independent and can be added incrementally as your needs grow, without rearchitecting the code you have already written.

Once the pipeline is working, the biggest wins come from prompt refinement rather than code changes — a better description of what the data represents and what the reader needs from the report consistently produces more useful output than switching models or tuning parameters.

Leave a Comment