How to Build a Local AI Slack Bot with Ollama

A Slack bot backed by a local LLM gives your team an AI assistant that lives inside your existing workspace, handles questions, summarises threads, and processes requests — without sending any messages to a cloud AI service. This guide covers building a complete Slack bot using Slack’s Bolt framework for Node.js and Ollama for inference, deployed either locally or on a team server.

Prerequisites

  • Ollama running with a model pulled (ollama pull llama3.2)
  • A Slack workspace where you can install apps
  • Node.js 18+ and npm

Creating the Slack App

Go to api.slack.com/apps and click Create New App → From scratch. Give it a name and select your workspace. Under OAuth & Permissions, add these Bot Token Scopes: app_mentions:read, channels:history, chat:write, im:history, im:write. Under Event Subscriptions, enable events and subscribe to the app_mention and message.im bot events. Since this guide uses Socket Mode, also enable it under Settings → Socket Mode; no public Request URL is needed. Install the app to your workspace and copy the Bot Token (starts with xoxb-) and the Signing Secret from Basic Information.
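
Much of this configuration can also be expressed as an app manifest pasted in when creating the app. A sketch following Slack's manifest schema (the app and display names are placeholders):

display_information:
  name: Ollama Bot
features:
  bot_user:
    display_name: ollama-bot
oauth_config:
  scopes:
    bot:
      - app_mentions:read
      - channels:history
      - chat:write
      - im:history
      - im:write
settings:
  event_subscriptions:
    bot_events:
      - app_mention
      - message.im
  socket_mode_enabled: true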

Project Setup

mkdir ollama-slack-bot && cd ollama-slack-bot
npm init -y
npm install @slack/bolt ollama dotenv
# .env
SLACK_BOT_TOKEN=xoxb-your-token
SLACK_SIGNING_SECRET=your-signing-secret
SLACK_APP_TOKEN=xapp-your-app-token  # for Socket Mode
OLLAMA_MODEL=llama3.2
OLLAMA_HOST=http://localhost:11434
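
Before writing any bot code, it is worth confirming Ollama is reachable from the machine that will run the bot. This endpoint lists the models Ollama has pulled:

# Optional sanity check
curl http://localhost:11434/api/tags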

Basic Bot: Respond to Mentions

// app.js
require('dotenv').config();
const { App } = require('@slack/bolt');
const ollama = require('ollama').default;

const app = new App({
  token: process.env.SLACK_BOT_TOKEN,
  signingSecret: process.env.SLACK_SIGNING_SECRET,
  socketMode: true,
  appToken: process.env.SLACK_APP_TOKEN,
});

// Respond when @mentioned in a channel
app.event('app_mention', async ({ event, say }) => {
  // Remove the bot mention from the message
  const userMessage = event.text.replace(/<@[A-Z0-9]+>/g, '').trim();
  if (!userMessage) return say('How can I help?');

  try {
    // Post a placeholder message so the user knows the bot is working
    await say({ text: '_Thinking..._' });

    const response = await ollama.chat({
      model: process.env.OLLAMA_MODEL,
      messages: [{ role: 'user', content: userMessage }],
      options: { temperature: 0.7 }
    });

    await say(response.message.content);
  } catch (err) {
    console.error(err);
    await say('Sorry, I encountered an error.');
  }
});

// Respond to direct messages
app.message(async ({ message, say }) => {
  // Ignore bot messages and message subtypes (edits, joins, etc.), which have no usable text
  if (message.subtype || message.bot_id || !message.text) return;
  try {
    const response = await ollama.chat({
      model: process.env.OLLAMA_MODEL,
      messages: [{ role: 'user', content: message.text }]
    });
    await say(response.message.content);
  } catch (err) {
    console.error(err);
    await say('Sorry, I encountered an error.');
  }
});

(async () => {
  await app.start();
  console.log('Ollama Slack bot running!');
})();

Conversation History

// Maintain per-user conversation history
const userHistories = new Map();
const MAX_HISTORY = 10; // keep the last 10 user/assistant exchanges

// This handler replaces the basic app_mention handler above; register only one of the two
app.event('app_mention', async ({ event, say }) => {
  const userId = event.user;
  const userMessage = event.text.replace(/<@[A-Z0-9]+>/g, '').trim();

  // Get or initialise history for this user
  if (!userHistories.has(userId)) {
    userHistories.set(userId, [
      { role: 'system', content: 'You are a helpful assistant for a software team. Be concise and technical when appropriate.' }
    ]);
  }
  const history = userHistories.get(userId);
  history.push({ role: 'user', content: userMessage });

  const response = await ollama.chat({
    model: process.env.OLLAMA_MODEL,
    messages: history
  });

  const reply = response.message.content;
  history.push({ role: 'assistant', content: reply });

  // Trim to max history length (keep system message)
  if (history.length > MAX_HISTORY * 2 + 1) {
    history.splice(1, 2); // remove oldest exchange
  }

  await say(`<@${userId}> ${reply}`);
});

Thread Summarisation Command

// Slash command: /summarise (summarise the current channel's recent messages).
// Register the command first under Slash Commands in your app's settings.
app.command('/summarise', async ({ command, ack, respond, client }) => {
  await ack();

  // Fetch last 20 messages from the channel
  const result = await client.conversations.history({
    channel: command.channel_id,
    limit: 20
  });

  const messages = result.messages
    .reverse()
    .filter(m => !m.bot_id && m.text)
    .map(m => m.text)
    .join('\n');

  if (!messages) return respond('No recent messages to summarise.');

  const response = await ollama.chat({
    model: process.env.OLLAMA_MODEL,
    messages: [{
      role: 'user',
      content: `Summarise these Slack messages in 3-5 bullet points:\n\n${messages}`
    }]
  });

  await respond(response.message.content);
});

Running the Bot

# Development
node app.js

# With auto-restart
npm install -g nodemon
nodemon app.js

# As a systemd service (on a Linux server)
# Create /etc/systemd/system/slack-bot.service (full example below under Deployment)
# ExecStart requires an absolute binary path: ExecStart=/usr/bin/node /path/to/app.js

Socket Mode vs HTTP Mode

The example above uses Socket Mode — a persistent WebSocket connection from your bot to Slack’s servers. This is the easiest setup because it does not require a public URL or ngrok tunnel: your bot connects out to Slack rather than Slack connecting in to your server. Socket Mode requires an App-Level Token (xapp-) in addition to the Bot Token. For production deployments on a server with a public IP, HTTP mode (using a public webhook endpoint) is more robust and does not require a persistent WebSocket connection, but requires configuring a public URL and TLS.
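
Switching to HTTP mode changes only how the App is constructed and started. A minimal sketch, assuming you expose the process behind a public HTTPS URL (e.g. via a reverse proxy) and set that URL as the Request URL in Event Subscriptions; Bolt serves events at /slack/events by default:

// app.js (HTTP mode): omit socketMode and appToken, listen on a port
const { App } = require('@slack/bolt');

const app = new App({
  token: process.env.SLACK_BOT_TOKEN,
  signingSecret: process.env.SLACK_SIGNING_SECRET,
});

(async () => {
  await app.start(process.env.PORT || 3000); // Bolt serves /slack/events here
  console.log('Bot running in HTTP mode');
})();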

Why a Local LLM Slack Bot?

Most teams already use Slack as the communication hub where questions get asked, decisions get made, and context gets shared. A bot that lives in that environment — answering questions, summarising threads, helping with writing tasks — removes the friction of context-switching to a separate AI chat interface. The local inference piece adds two benefits on top of standard cloud-backed bots: every message processed stays within your infrastructure (no Slack message content going to OpenAI), and there is no per-query cost as your team uses it throughout the day.

The practical use cases that work best with a Slack bot format are those where the answer is needed in context — during a discussion rather than in a separate tab. Answering quick technical questions, summarising long threads before someone joins a meeting, drafting a message in a specific tone, or looking something up in internal documentation all fit naturally into Slack’s conversational format. The bot becomes part of the team’s workflow rather than a separate tool they have to remember to use.

Setting Up Socket Mode Properly

Socket Mode requires an App-Level Token in addition to the Bot Token. In your Slack app settings, navigate to Basic Information → App-Level Tokens and click Generate Token. Give it a name and add the connections:write scope. The token it generates (starting with xapp-) is the SLACK_APP_TOKEN in your .env file. This token is separate from the Bot Token (xoxb-) and is specifically for establishing the WebSocket connection. Keep both tokens secret — the App Token allows creating connections on behalf of your app, which could be abused if leaked.

The Socket Mode approach is strongly recommended for development and for bots running on internal servers without public IP addresses. It eliminates the need for ngrok tunnels during development, does not require TLS certificate setup, and works from behind firewalls and NAT without any network configuration changes. The tradeoff is a persistent outbound WebSocket connection — if your server has strict egress filtering, you need to allow WebSocket connections to Slack’s Socket Mode endpoints.

Privacy Considerations

Before deploying a Slack bot that processes messages, your team should understand what the bot sees and what it does with that information. In the implementation above, message content is sent to Ollama running locally — no message content leaves your infrastructure. The bot token, however, grants access to read messages in channels it is added to. Keep the token secure, rotate it if there is any possibility of compromise, and follow Slack’s guidance on token storage (never commit tokens to version control). For teams with strict compliance requirements, document the data flow explicitly: messages travel from Slack’s servers to your bot process to local Ollama inference, with no third-party AI service involved.

Consider which channels the bot should join carefully. Adding a bot to every channel gives it access to all conversations in those channels. A more conservative approach is to create a dedicated #ai-assistant channel where team members explicitly bring questions to the bot, rather than having it listen in all channels. This limits the bot’s access scope and makes it clear to team members when their messages might be processed by the bot.

Extending the Bot: Document Q&A

The most powerful extension is giving the bot access to internal documentation. Pre-embed your team’s documentation (wikis, runbooks, product specs) with nomic-embed-text and store the embeddings in a vector store. When the bot receives a question, retrieve the most relevant document chunks and include them in the context alongside the user’s question:

const { ChromaClient } = require('chromadb');
const ollamaClient = require('ollama').default;

const chromaClient = new ChromaClient();

async function answerWithContext(question) {
  // Fetch the pre-built collection (top-level await is unavailable in CommonJS)
  const collection = await chromaClient.getCollection({ name: 'team-docs' });

  // Get an embedding for the question
  const embed = await ollamaClient.embeddings({
    model: 'nomic-embed-text',
    prompt: question
  });

  // Find relevant docs
  const results = await collection.query({
    queryEmbeddings: [embed.embedding],
    nResults: 3
  });

  const context = results.documents[0].join('\n\n');

  // Answer with context
  const response = await ollamaClient.chat({
    model: process.env.OLLAMA_MODEL,
    messages: [
      { role: 'system', content: `Answer using this context:\n\n${context}` },
      { role: 'user', content: question }
    ]
  });
  return response.message.content;
}
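
To use this from Slack, call the helper from the mention handler in place of the plain ollama.chat call. A sketch, with error handling omitted for brevity:

app.event('app_mention', async ({ event, say }) => {
  const question = event.text.replace(/<@[A-Z0-9]+>/g, '').trim();
  if (!question) return say('Ask me something about our docs.');
  await say(await answerWithContext(question));
});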

Rate Limiting and Queue Management

On a busy team Slack, multiple people might send questions to the bot simultaneously. Since Ollama processes requests sequentially by default, simultaneous requests queue up: the second user's response does not start generating until the first is complete. For small teams this is acceptable. Bolt acknowledges each event automatically (Slack expects an acknowledgment within 3 seconds and retries delivery otherwise), and the reply is posted separately via chat.postMessage, so slow generation does not cause Slack-side timeouts; users simply wait longer for the answer. For larger teams, set OLLAMA_NUM_PARALLEL=2 or higher on the Ollama server, which allows concurrent requests at the cost of higher VRAM usage, and use a job queue in the bot to manage request ordering and give users feedback during waits.
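
A minimal in-process queue can serialise requests and keep replies in arrival order. A sketch built on a promise chain (the enqueue helper is hypothetical, not part of Bolt or Ollama; a job-queue library would also work):

// Serialise Ollama calls so replies come back in the order requests arrived
let queue = Promise.resolve();

function enqueue(task) {
  const run = queue.then(task, task); // run after the previous task settles
  queue = run.catch(() => {}); // keep the chain alive if a task fails
  return run;
}

// Inside a handler:
// const response = await enqueue(() => ollama.chat({ model, messages }));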

Deployment on a Team Server

For a bot that runs continuously and is available to the whole team, deploy it on the same server as Ollama. A simple systemd service (following the same pattern as the Ollama systemd article) keeps the bot running across reboots and restarts it on failure. Store the .env file with appropriate file permissions (readable only by the service user), and configure the service to load the environment from that file. The bot process is lightweight — it uses minimal CPU and memory when idle, with resource spikes only during inference. It can run comfortably on the same machine as Ollama without competing significantly for resources.
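
A unit file along these lines works; the paths, service user, and file locations below are assumptions to adapt to your layout:

# /etc/systemd/system/slack-bot.service
[Unit]
Description=Ollama Slack bot
After=network-online.target ollama.service

[Service]
User=slackbot
WorkingDirectory=/opt/ollama-slack-bot
EnvironmentFile=/opt/ollama-slack-bot/.env
ExecStart=/usr/bin/node /opt/ollama-slack-bot/app.js
Restart=on-failure

[Install]
WantedBy=multi-user.target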

Model Selection for a Team Bot

The best model for a Slack bot is one that responds quickly and follows instructions reliably: per-response quality matters less than consistency and speed, because Slack is an interactive medium where users expect faster feedback than they might in a dedicated chat interface. Llama 3.2 3B (the default pulled earlier) is a sensible starting point: good instruction following, solid general knowledge, and fast even on a mid-range GPU. For a team that primarily asks technical questions, Qwen2.5 7B or Qwen2.5-Coder 7B perform better on code-adjacent questions. For a team that frequently asks about internal documentation, use a model with a 32K context window (Mistral Nemo, or Llama 3.2 with num_ctx=32768) so long document chunks in the RAG pipeline are not truncated.
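
In the ollama JavaScript client, the context window is set per request through the options object. A sketch reusing the history array from earlier (num_ctx must not exceed what the model supports):

const response = await ollama.chat({
  model: process.env.OLLAMA_MODEL,
  messages: history,
  options: { num_ctx: 32768 } // larger context for long RAG document chunks
});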

Set the system prompt to match your team’s context. A generic “You are a helpful assistant” is a weak starting point compared to “You are an assistant for a software engineering team at [Company]. You know our tech stack is Python, PostgreSQL, and AWS. Be concise and technical. When unsure, say so.” The system prompt is invisible to users but shapes every response — investing 15 minutes in a good system prompt measurably improves the bot’s usefulness for your team’s specific questions compared to the generic default.
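
Concretely, that means swapping the system message used when initialising a user's history. The company name and stack below are placeholders:

const SYSTEM_PROMPT =
  'You are an assistant for a software engineering team at Acme. ' +
  'Our stack is Python, PostgreSQL, and AWS. ' +
  'Be concise and technical. When unsure, say so.';

userHistories.set(userId, [{ role: 'system', content: SYSTEM_PROMPT }]);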

Getting Started

The minimum viable Slack bot is about 30 lines of code — the basic mention handler above. Create the Slack app, install the dependencies, add your tokens to .env, and run the bot. Within 15 minutes you will have a working bot that your team can @mention for questions. Add the conversation history and slash commands as second and third steps based on which capabilities your team actually uses. The document Q&A extension is the highest-value addition for teams with substantial internal documentation — it converts the bot from a generic assistant into a specialist that knows your team’s specific systems, processes, and codebase. Build incrementally, starting with the simplest version that is useful, and add features based on what your team actually asks the bot to do rather than what you anticipate they will want.

Building a Discord Bot Instead

The same architecture applies to Discord with minimal changes. Discord’s bot framework uses a different library (discord.js instead of Bolt) and a different event model, but the Ollama integration is identical — you replace the Slack event handlers with Discord’s message event listeners and the Ollama API calls stay exactly the same. For teams already on Discord, the pattern is: create a bot application at discord.com/developers, invite it to your server with message read/write permissions, use discord.js to listen for messages and slash commands, and call Ollama with the message content. The conversation history management, rate limiting patterns, and system prompt guidance from this article apply equally to a Discord bot. The choice between Slack and Discord is entirely about which platform your team uses — the local AI integration approach is the same for both.
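
A minimal discord.js equivalent of the mention handler, as a sketch (assumes discord.js v14, a bot token in a DISCORD_BOT_TOKEN environment variable, and the Message Content intent enabled in the developer portal):

const { Client, GatewayIntentBits } = require('discord.js');
const ollama = require('ollama').default;

const client = new Client({
  intents: [
    GatewayIntentBits.Guilds,
    GatewayIntentBits.GuildMessages,
    GatewayIntentBits.MessageContent,
  ],
});

client.on('messageCreate', async (message) => {
  // Only respond when a human @mentions the bot
  if (message.author.bot || !message.mentions.has(client.user)) return;
  const question = message.content.replace(/<@!?\d+>/g, '').trim();
  if (!question) return;
  const response = await ollama.chat({
    model: process.env.OLLAMA_MODEL,
    messages: [{ role: 'user', content: question }],
  });
  await message.reply(response.message.content);
});

client.login(process.env.DISCORD_BOT_TOKEN);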

Whichever platform you build for, a team AI bot backed by local inference is among the most practically visible demonstrations of local AI’s value — it is something every team member interacts with during their normal workday, its responses are immediately comparable to expectations, and the absence of per-message cloud costs means you can encourage heavy use without worrying about API bills scaling with adoption. The bot essentially pays for the server’s electricity rather than per-token fees, which changes the economics of AI tool adoption fundamentally for budget-conscious teams.

The code in this article is intentionally kept simple: no complex abstractions, no heavy frameworks beyond Bolt itself. This makes it easy to understand, modify, and extend for your team's specific needs. Start with the basic mention handler, deploy it, and watch how your team actually uses it before investing in more complex features. Real usage patterns almost always differ from anticipated ones, and building on observed behaviour rather than predictions produces a far more useful tool. That principle applies as much to AI tooling as to any other software you build for your team: prioritise working software over premature optimisation.
