How to Use Ollama with the Matrix Protocol

Puppeteer is Google’s Node.js library for controlling a headless Chrome browser. Paired with Ollama, it gives you a powerful combination for automating web tasks that require intelligence: scraping content that needs JavaScript execution, extracting structured data from dynamic pages, automating form submissions with AI-generated content, and building screenshot-based visual analysis pipelines. This guide covers the core Puppeteer and Ollama integration patterns using Node.js — from basic page scraping and summarisation to structured extraction, site monitoring, and visual analysis with a vision model.

Puppeteer and Playwright serve similar purposes — both control a browser programmatically — but Puppeteer has been around longer, has a larger ecosystem of community plugins, and is the de facto standard in the Node.js scraping community. If you are already in a Node.js project or ecosystem, Puppeteer is the natural choice.

Setup

npm init -y
npm install puppeteer node-fetch
ollama pull llama3.2
ollama pull llava  # for vision analysis

Puppeteer downloads a compatible version of Chromium automatically during installation. The node-fetch package provides the fetch API for calling the Ollama HTTP endpoint from Node.js.

Scraping and Summarising

The fundamental pattern: launch a browser, navigate to a URL, extract text, send to Ollama:

const puppeteer = require('puppeteer');
const fetch = require('node-fetch');

const OLLAMA = 'http://localhost:11434';
const MODEL = 'llama3.2';

async function ask(prompt) {
  const resp = await fetch(`${OLLAMA}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: MODEL,
      messages: [{ role: 'user', content: prompt }],
      stream: false
    })
  });
  const data = await resp.json();
  return data.message.content;
}

async function scrapeAndSummarise(url) {
  const browser = await puppeteer.launch({ headless: 'new' });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'domcontentloaded' });

  const text = await page.evaluate(() => {
    document.querySelectorAll('nav,footer,header,aside,script,style').forEach(el => el.remove());
    return document.body.innerText;
  });

  await browser.close();

  return ask(`Summarise this web page in 5 bullet points:

${text.slice(0, 6000)}`);
}

scrapeAndSummarise('https://example.com')
  .then(console.log)
  .catch(console.error);

The headless: 'new' option uses Puppeteer’s newer headless mode introduced in Chrome 112, which is more stable and less detectable than the legacy headless mode. Removing nav, footer, header, aside, script, and style elements before extracting text keeps the content clean and prevents boilerplate from consuming context window tokens.

Structured Data Extraction

Extract structured data from pages using Ollama’s JSON schema mode:

async function extractProducts(url) {
  const browser = await puppeteer.launch({ headless: 'new' });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });
  const text = await page.evaluate(() => document.body.innerText);
  await browser.close();

  const schema = {
    type: 'object',
    properties: {
      products: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            name: { type: 'string' },
            price: { type: 'number' },
            currency: { type: 'string' },
            in_stock: { type: 'boolean' },
            rating: { type: 'number' }
          },
          required: ['name', 'price']
        }
      }
    }
  };

  const resp = await fetch(`${OLLAMA}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: MODEL,
      messages: [{ role: 'user', content: `Extract products from:

${text.slice(0, 8000)}` }],
      format: schema,
      stream: false
    })
  });
  const data = await resp.json();
  return JSON.parse(data.message.content).products;
}

The networkidle2 wait condition waits until there are no more than 2 active network requests for at least 500ms, which ensures dynamic content loaded via API calls has finished rendering before the text is extracted. This is more reliable than domcontentloaded for JavaScript-heavy pages where the initial HTML is just a loading skeleton.

Screenshot Analysis with a Vision Model

Take a screenshot and send it to a vision-capable model for visual analysis:

const fs = require('fs');

async function analyseScreenshot(url, question = 'Describe what you see on this page') {
  const browser = await puppeteer.launch({ headless: 'new' });
  const page = await browser.newPage();
  await page.setViewport({ width: 1280, height: 900 });
  await page.goto(url, { waitUntil: 'networkidle2' });
  const screenshot = await page.screenshot({ encoding: 'base64' });
  await browser.close();

  const resp = await fetch(`${OLLAMA}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'llava',
      messages: [{
        role: 'user',
        content: question,
        images: [screenshot]
      }],
      stream: false
    })
  });
  const data = await resp.json();
  return data.message.content;
}

analyseScreenshot('https://example.com', 'What is the main call to action on this page?')
  .then(console.log);

Setting a consistent viewport with setViewport before navigating ensures screenshots are reproducible and comparable across runs. This is important for monitoring use cases where you want to detect visual changes to a page over time. The encoding: 'base64' option returns the screenshot as a base64 string ready to pass directly to Ollama without any file I/O.

Batch Scraping with Concurrency Control

Process multiple URLs concurrently with a limit on simultaneous browser pages:

async function batchScrape(urls, maxConcurrent = 3) {
  const browser = await puppeteer.launch({ headless: 'new' });
  const semaphore = { count: 0, queue: [] };

  const acquire = () => new Promise(resolve => {
    if (semaphore.count < maxConcurrent) {
      semaphore.count++;
      resolve();
    } else {
      semaphore.queue.push(resolve);
    }
  });

  const release = () => {
    semaphore.count--;
    if (semaphore.queue.length > 0) {
      semaphore.count++;
      semaphore.queue.shift()();
    }
  };

  const scrapeOne = async (url) => {
    await acquire();
    try {
      const page = await browser.newPage();
      await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 20000 });
      const text = await page.evaluate(() => document.body.innerText);
      await page.close();
      const summary = await ask(`Summarise in 2 sentences: ${text.slice(0, 4000)}`);
      return { url, summary, ok: true };
    } catch (e) {
      return { url, error: e.message, ok: false };
    } finally {
      release();
    }
  };

  const results = await Promise.all(urls.map(scrapeOne));
  await browser.close();
  return results;
}

Sharing a single browser instance across all pages is more efficient than launching a new browser per URL — Chromium startup time adds up across many URLs. Each URL gets its own page (tab) within the shared browser, isolated from other pages but sharing the browser process. The semaphore limits active pages to maxConcurrent, balancing throughput against memory usage.

AI-Powered Form Automation

Use Ollama to generate context-appropriate content for form fields — useful for testing forms with realistic data or automating content submission workflows you own:

async function fillFormWithAI(url, formContext) {
  const browser = await puppeteer.launch({ headless: 'new' });
  const page = await browser.newPage();
  await page.goto(url);

  // Discover form fields
  const fields = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('input,textarea')).map(el => ({
      name: el.name || el.id || el.placeholder,
      type: el.type,
      placeholder: el.placeholder
    })).filter(f => f.name && f.type !== 'hidden' && f.type !== 'submit');
  });

  // Ask Ollama to generate appropriate values
  const generated = JSON.parse(await ask(
    `Generate realistic test data for these form fields: ${JSON.stringify(fields)}.
    Context: ${formContext}.
    Return as JSON object with field names as keys.`
  ));

  // Fill the fields
  for (const [name, value] of Object.entries(generated)) {
    const selector = `[name="${name}"],[id="${name}"]`;
    await page.focus(selector).catch(() => {});
    await page.type(selector, String(value), { delay: 50 }).catch(() => {});
  }

  await browser.close();
  return generated;
}

This pattern discovers form fields dynamically rather than hardcoding selectors, making it reusable across different forms. Always use this only on forms you own or have explicit permission to automate — automated form submission on third-party sites without permission violates terms of service and potentially applicable laws.

Puppeteer vs Playwright for Ollama Workflows

Both Puppeteer and Playwright control Chromium browsers from JavaScript, but there are meaningful differences. Playwright supports multiple browsers (Chromium, Firefox, WebKit) from a single API and has better support for the async patterns common in modern JavaScript. Puppeteer is Chromium-only but has a larger ecosystem of plugins for specific tasks like PDF generation, ad blocking, and stealth mode (making the browser less detectable as automated). For new projects, Playwright is generally the better choice for its multi-browser support and more ergonomic async API. For projects that need specific Puppeteer plugins or that are already using Puppeteer, the Ollama integration patterns are identical — the only differences are in the browser control API, not in how Ollama is called.

The Ollama API call is identical regardless of whether you use Puppeteer or Playwright to collect the content — a POST request to /api/chat with the extracted text as the user message. The browser automation layer and the LLM layer are completely decoupled, so you can swap one without changing the other. Choose the browser automation library based on your existing stack and specific requirements, and use the Ollama integration patterns from this guide with either library.

Multi-Turn Conversation and Mention Filtering

Add conversation history per room and respond only when the bot is mentioned. Use a defaultdict of deques to store per-room history with a rolling window of 20 messages. In direct message rooms (two participants), respond to all messages. In group rooms, respond only when the bot’s display name appears in the message body. Strip the mention before passing the prompt to Ollama so the model sees a clean question without the bot’s name prefixed. Append both the user message and the assistant reply to the room history after each exchange, so subsequent messages have full conversational context.

Add a !reset command that clears the room history, giving users a way to start fresh. Make the reset command ephemeral by sending a confirmation message that is not added to the history, so the next conversation starts from a clean slate without the reset message appearing as context.

Session Persistence

matrix-nio supports persistent sessions via a SQLite store, which is essential for production bots. Without persistence, the bot loses its sync token on restart and replays all recent messages — potentially responding to messages that were sent hours ago. Add the store to the client constructor:

from nio import AsyncClient, SqliteStore

client = AsyncClient(
    HOMESERVER,
    USERNAME,
    store_path="./bot_store/",
    config=AsyncClientConfig(store_sync_tokens=True)
)
# Login and restore state
resp = await client.login(PASSWORD)
if client.should_upload_keys:
    await client.keys_upload()

The SQLite store saves the sync token, device keys, and room state between restarts. The bot resumes from exactly where it left off, processing only new messages that arrived after the last sync. This also enables end-to-end encryption support — matrix-nio handles E2E encryption automatically when the store is configured, allowing the bot to participate in encrypted rooms.

End-to-End Encryption

Matrix supports end-to-end encrypted rooms, and matrix-nio handles encryption transparently once the store is configured. To enable E2E support, install the cryptography dependencies: pip install matrix-nio[e2e]. The bot will automatically decrypt incoming encrypted messages and send encrypted replies in rooms that have encryption enabled.

E2E encryption in Matrix bots requires device verification — other users in the room need to trust the bot’s device key for the encryption to work correctly. In practice, for a personal or team bot on a homeserver you control, you can trust the bot’s device from the Element client using the device verification flow. For a public bot, document the device verification steps clearly so users know how to verify the bot before starting encrypted conversations. The privacy benefit of E2E encryption is significant — even if someone gains access to the homeserver’s message database, the encrypted messages cannot be read without the bot’s private device key.

Running the Bot as a Service

Run the bot as a systemd service for reliable operation on a Linux server:

[Unit]
Description=Matrix Ollama Bot
After=network.target ollama.service

[Service]
Type=simple
User=youruser
WorkingDirectory=/home/youruser/matrix-bot
ExecStart=/home/youruser/matrix-bot/venv/bin/python bot.py
Restart=on-failure
RestartSec=10
EnvironmentFile=/home/youruser/matrix-bot/.env

[Install]
WantedBy=multi-user.target

Enable with sudo systemctl enable --now matrix-ollama-bot. The After=ollama.service directive ensures the bot starts only after Ollama is available, preventing connection errors on boot. Logs are accessible with journalctl -u matrix-ollama-bot -f. The EnvironmentFile directive loads the .env file containing the homeserver credentials, keeping sensitive values out of the unit file.

Why Matrix for an Ollama Bot

Matrix offers several advantages over platform-specific messaging APIs for a local LLM bot. The protocol is open and decentralised — you are not dependent on a single company’s API remaining stable or affordable. You can bridge Matrix rooms to other platforms (Discord, Slack, Telegram, IRC) using the wide range of available bridges, so your Ollama bot becomes accessible from multiple platforms through a single Matrix room. The protocol supports end-to-end encryption natively, making it suitable for sensitive conversations. And running your own homeserver means you have full control over user data, message retention, and access policies — the bot never sends data to a third-party service beyond the homeserver you control.

For teams already using Matrix as their primary communication platform, adding an Ollama bot to the infrastructure is a natural extension. The bot can be invited to specific rooms — a project room, a support room, a general knowledge room — and configured with different system prompts per room to serve different purposes. A development room bot might have a system prompt focused on code review and technical questions; a project management room bot might be oriented toward summarising discussions and tracking action items; a general room bot might simply answer questions about company policies and documentation.

Compared to WhatsApp and Discord bots, Matrix has the most favourable terms of service for automation — the protocol explicitly supports bots and integrations, the homeserver admin controls access, and there are no restrictions on the volume of automated messages within your own infrastructure. For organisations that prioritise data sovereignty, open protocols, and self-hosting, Matrix paired with Ollama is the best available foundation for a private, capable, and fully controlled AI assistant.

Inviting the Bot to Rooms

Once the bot is running and logged in, invite it to rooms from any Matrix client using the standard invite flow — search for the bot’s Matrix ID (@botname:homeserver.org) and send the invite. The bot can be configured to auto-accept invites by handling the InviteMemberEvent in matrix-nio and calling client.join(room_id) when the event sender matches a list of trusted users. For a personal bot, auto-accept invites only from your own Matrix ID. For a team bot, auto-accept from anyone on the same homeserver. For a public bot, require manual approval to prevent unwanted rooms from consuming the bot’s resources.

You can also configure the bot to leave rooms when it is kicked or when the room becomes inactive — check the member count periodically and leave rooms where no messages have been sent in the past 30 days. This keeps the bot’s room list manageable and prevents it from accumulating stale rooms that consume sync bandwidth without providing any value. The matrix-nio API makes both joining and leaving rooms straightforward, and combining these automated behaviours with the Ollama integration gives you a bot that is genuinely self-managing once deployed.

The combination of Matrix’s open federated protocol, self-hosted homeservers, and end-to-end encryption with Ollama’s local inference makes this the most privacy-respecting architecture available for an AI messaging bot. Your conversations stay on your infrastructure, your model runs on your hardware, and the only external dependency is the Matrix federation protocol itself — which you can eliminate entirely by running a fully isolated homeserver with no federation enabled. For individuals and organisations where data sovereignty is a genuine requirement rather than just a preference, this stack is hard to beat.

Start with a direct message bot on your own homeserver, get the Ollama integration working reliably, then expand to group rooms and add encryption support as your confidence grows.