WhatsApp is one of the world’s most widely used messaging platforms, and building a bot that connects it to a local LLM opens up practical use cases: a personal AI assistant you can query from your phone, a customer support bot for a small business, a family knowledge base you can ask questions, or a team assistant that answers questions about internal documents. This guide walks through building a WhatsApp bot connected to Ollama using the whatsapp-web.js library in Node.js. The library drives a WhatsApp Web session to send and receive messages, so it does not need the official WhatsApp Business API.
The approach uses a QR code login to connect to your WhatsApp account, then intercepts incoming messages and responds via Ollama. All message content is processed locally — nothing goes to a cloud API except the WhatsApp Web connection itself to Meta’s servers.
Prerequisites and Setup
You need Node.js 18 or later, Ollama running with a model pulled, and a WhatsApp account on a phone. Install the dependencies:
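```bash
mkdir whatsapp-ollama && cd whatsapp-ollama
npm init -y
npm install whatsapp-web.js qrcode-terminal node-fetch@2
```

whatsapp-web.js automates WhatsApp Web in a headless Chromium browser managed by Puppeteer. qrcode-terminal renders the login QR code in your terminal so you can scan it with your phone. node-fetch makes HTTP requests to the Ollama API. Version 2 is pinned because node-fetch 3 is ESM-only and cannot be loaded with require(). Node.js 18 and later also ship a built-in fetch, so you can drop the dependency and its require line if you prefer.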
Basic Bot
Here is a minimal bot that responds to every message with an Ollama reply:
```javascript
const { Client, LocalAuth } = require('whatsapp-web.js');
const qrcode = require('qrcode-terminal');
const fetch = require('node-fetch');

const OLLAMA_URL = 'http://localhost:11434';
const MODEL = 'llama3.2';

const client = new Client({
  authStrategy: new LocalAuth(), // save the session to disk so the QR scan is needed only once
  puppeteer: { headless: true, args: ['--no-sandbox'] }
});

// Print the login QR code in the terminal
client.on('qr', qr => {
  qrcode.generate(qr, { small: true });
  console.log('Scan the QR code above with WhatsApp on your phone');
});

client.on('ready', () => console.log('Bot is ready!'));

client.on('message', async msg => {
  if (msg.fromMe) return;                      // ignore own messages
  if (msg.from === 'status@broadcast') return; // ignore contacts' status updates

  try {
    const response = await fetch(`${OLLAMA_URL}/api/chat`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: MODEL,
        messages: [{ role: 'user', content: msg.body }],
        stream: false
      })
    });
    if (!response.ok) throw new Error(`Ollama returned HTTP ${response.status}`);
    const data = await response.json();
    await msg.reply(data.message.content);
  } catch (err) {
    console.error(err);
    await msg.reply('Sorry, I couldn\'t process that request.');
  }
});

client.initialize();
```

Save the code as bot.js and run it with node bot.js. Then scan the QR code that appears in the terminal using the Linked devices option in the WhatsApp app on your phone, and the bot comes online. The LocalAuth strategy saves the session to disk, so you only need to scan the QR code once; subsequent runs reuse the saved session. The --no-sandbox flag is required when Chromium runs as root, which is common on Linux servers. You can omit it on macOS and when running under a normal Linux user account.
Adding Conversation History
Track per-chat history for multi-turn conversations:
```javascript
// Replaces the message handler from the basic bot; register only one handler.
const histories = new Map();
const MAX_HISTORY = 20;

function getHistory(chatId) {
  if (!histories.has(chatId)) histories.set(chatId, []);
  return histories.get(chatId);
}

client.on('message', async msg => {
  if (msg.fromMe || msg.from === 'status@broadcast') return;
  const chatId = msg.from;

  // Special commands
  if (msg.body.trim().toLowerCase() === '!reset') {
    histories.delete(chatId);
    await msg.reply('Conversation history cleared.');
    return;
  }

  const history = getHistory(chatId);
  history.push({ role: 'user', content: msg.body });
  // Keep only the most recent MAX_HISTORY messages
  if (history.length > MAX_HISTORY) history.splice(0, history.length - MAX_HISTORY);

  try {
    const response = await fetch(`${OLLAMA_URL}/api/chat`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: MODEL, messages: history, stream: false })
    });
    if (!response.ok) throw new Error(`Ollama returned HTTP ${response.status}`);
    const data = await response.json();
    const reply = data.message.content;
    await msg.reply(reply);
    history.push({ role: 'assistant', content: reply }); // record only after a successful send
  } catch (err) {
    console.error(err);
    history.pop(); // drop the unanswered user message
    await msg.reply('Error communicating with Ollama.');
  }
});
```

The !reset command clears the conversation history for the chat that sent it, giving users a way to start fresh. History is keyed by msg.from. In a direct chat that is the contact's ID, and in a group chat it is the group's ID. Each contact therefore gets an independent conversation context, while all members of a group share one. Capping history at 20 messages keeps each request within the model's context window and stops long conversations from growing without bound.
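Trimming with splice can leave the window starting on an assistant reply, so the model sees an answer without the question it was answering. Below is a minimal sketch of an alternative. It assumes the history alternates user and assistant turns, as it does in the handler above. The helper drops the oldest messages, then keeps dropping until the first remaining message is a user turn. trimHistory is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: cap a chat history in place so it holds at most
// `max` messages and always begins with a user turn.
function trimHistory(history, max) {
  if (history.length > max) {
    history.splice(0, history.length - max);
  }
  // Drop leading assistant (or system) messages left behind by the cut.
  while (history.length > 0 && history[0].role !== 'user') {
    history.shift();
  }
  return history;
}
```

In the handler, call trimHistory(history, MAX_HISTORY) in place of the splice line. Because it runs right after the new user message is pushed, the history never ends up empty.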
Group Chat Support
Respond only when the bot is mentioned in a group chat using @botname:
```javascript
const BOT_NAME = 'OllamaBot';

client.on('message', async msg => {
  if (msg.fromMe || msg.from === 'status@broadcast') return;
  const chat = await msg.getChat();

  // In groups, only respond when the message contains "@OllamaBot"
  if (chat.isGroup && !msg.body.toLowerCase().includes(`@${BOT_NAME.toLowerCase()}`)) {
    return;
  }

  // Strip the mention from the prompt
  const prompt = msg.body
    .replace(new RegExp(`@${BOT_NAME}`, 'gi'), '')
    .trim();
  if (!prompt) return;

  try {
    const response = await fetch(`${OLLAMA_URL}/api/chat`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: MODEL,
        messages: [{ role: 'user', content: prompt }],
        stream: false
      })
    });
    if (!response.ok) throw new Error(`Ollama returned HTTP ${response.status}`);
    const data = await response.json();
    await msg.reply(data.message.content);
  } catch (err) {
    console.error(err);
    await msg.reply('Sorry, I couldn\'t process that request.');
  }
});
```

This pattern keeps the bot quiet in busy group chats until it is explicitly invoked, while direct messages always get a response. For a family or team group, this is much more pleasant than having the bot reply to every message: people opt in to AI responses by addressing the bot by name. Because the bot runs on your own WhatsApp account, the trigger is the literal text @OllamaBot typed into the message. A mention picked from WhatsApp's @ menu is stored in the message body as a number rather than a name, so it will not match this check.
System Prompt and Personality
Give the bot a consistent personality by prepending a system prompt to every conversation. Store the system prompt per chat so different contacts or groups can have different bot personas:
```javascript
const DEFAULT_SYSTEM = 'You are a helpful assistant on WhatsApp. Keep replies concise, since WhatsApp messages should be readable on a phone screen. Use plain text and avoid Markdown formatting such as ** or ##, which does not render in WhatsApp.';

const systemPrompts = new Map(); // chatId -> custom system prompt for that chat

// Build the full message list for a chat. Call this before pushing
// userMessage onto the chat's history, or the message will appear twice.
function buildMessages(chatId, userMessage) {
  const system = systemPrompts.get(chatId) || DEFAULT_SYSTEM;
  const history = getHistory(chatId);
  return [
    { role: 'system', content: system },
    ...history,
    { role: 'user', content: userMessage }
  ];
}
```

The instruction about avoiding Markdown matters. WhatsApp does not render Markdown. Headers such as ## appear as literal characters, and WhatsApp's own formatting follows different conventions: single asterisks for bold, underscores for italic, and tildes for strikethrough. As a result, Markdown output tends to arrive cluttered with stray punctuation. Ask the model for plain conversational English without bullet points or headers, and responses will look natural in the chat interface.
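The buildMessages function reads systemPrompts, but nothing in the code writes to it. One way to let each chat choose its own persona is a command in the same style as !reset. Below is a minimal sketch built around a hypothetical !persona command. Sending "!persona <text>" stores a prompt for the chat, and "!persona" on its own restores the default. The function returns the reply to send, or null when the message is not a persona command.

```javascript
// Hypothetical "!persona" command handler. Returns a reply string when the
// message was a persona command, or null so the caller treats it as chat.
function handlePersonaCommand(chatId, body, systemPrompts) {
  const match = body.trim().match(/^!persona(?:\s+([\s\S]+))?$/i);
  if (!match) return null;

  const text = match[1] ? match[1].trim() : '';
  if (!text) {
    systemPrompts.delete(chatId); // buildMessages falls back to DEFAULT_SYSTEM
    return 'Persona reset to the default.';
  }
  systemPrompts.set(chatId, text);
  return 'Persona updated for this chat.';
}
```

In the message handler, call handlePersonaCommand(msg.from, msg.body, systemPrompts) before anything else. If it returns a string, reply with it and return. In a group, any member can change the persona this way, so you may want to restrict the command to specific senders.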
Keeping the Bot Running
Run the bot as a persistent background process using pm2:
```bash
npm install -g pm2
pm2 start bot.js --name whatsapp-ollama
pm2 save
pm2 startup   # follow the printed command to enable autostart
```
pm2 restarts the bot automatically if it crashes. It writes the bot's output to log files you can follow with pm2 logs whatsapp-ollama; rotating those logs requires the separate pm2-logrotate module. Once you have run pm2 startup and pm2 save, pm2 also starts the bot on boot. LocalAuth persists the WhatsApp session to disk, so the bot reconnects after restarts without a new QR scan. You only need to scan again if WhatsApp unlinks the session, for example after your phone has gone unused for an extended period or after you remove the device under Linked devices on your phone.
Important Limitations
A few important caveats apply to this approach. The whatsapp-web.js library is not an official API; it works by automating the WhatsApp Web interface. WhatsApp’s terms of service prohibit automated use of personal accounts, so there is a risk of account suspension if WhatsApp detects bot behaviour. For a personal assistant used by one or two people, the risk is lower, but it is not zero. For anything that sends high volumes of messages or serves large groups, use the official WhatsApp Business API instead. It is designed for automated messaging and comes with documented rate limits and support.
WhatsApp messages are end-to-end encrypted between devices, but the whatsapp-web.js approach decrypts them on your machine before passing them to Ollama — which is fine from a privacy perspective since Ollama processes them locally. The bot cannot access messages that arrived while it was offline, and message delivery depends on your machine being on and connected to the internet.
Practical Use Cases
The most practical use case for a personal WhatsApp Ollama bot is as an always-available AI assistant on your phone. You are already in WhatsApp throughout the day, so asking it a question is faster than opening a separate app and waiting for it to load. Text something like “Summarise the key arguments for and against remote work” while commuting, and get a thoughtful response by the time you arrive. Ask it to draft a short email reply, explain a concept you encountered in a meeting, convert units, or translate a phrase. These are the small quick-answer tasks where reaching for your laptop feels like overkill. Currency conversion is an exception: it needs live exchange rates, which a local model does not have unless you supply them through an integration.
For a small business, a WhatsApp bot connected to Ollama and a document collection can handle common customer questions automatically. Load your FAQ, product catalogue, and policy documents into the bot’s context. If they are short enough to fit in the model’s context window, you can include them in the system prompt. The bot can then answer questions about opening hours, return policies, product specifications, and pricing without any human involvement. Route questions it cannot answer to a human agent by forwarding the chat to a staff member’s WhatsApp. A simple way to detect these questions is to instruct the model to reply with a fixed phrase whenever the documents do not cover the question. This is a genuine automation win that costs nothing beyond the hardware to run Ollama, compared to the per-message costs of cloud-based WhatsApp chatbot services.
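Below is a minimal sketch of both halves of that idea, under two stated assumptions. First, the business documents are short enough to fit in the model's context window. Second, the model follows an instruction to answer with a fixed marker when the documents do not cover a question, so the bot can detect uncertainty with a string check instead of guessing. The HANDOFF marker, buildSupportPrompt, and needsHandoff are hypothetical names.

```javascript
// Hypothetical marker the model is asked to emit when it cannot answer.
const HANDOFF_MARKER = 'HANDOFF';

// Build a system prompt that embeds the business documents verbatim.
// Assumes the combined text fits in the model's context window.
function buildSupportPrompt(docs) {
  const sections = docs
    .map(doc => `--- ${doc.title} ---\n${doc.text.trim()}`)
    .join('\n\n');
  return [
    'You answer customer questions for a small business on WhatsApp.',
    'Use only the documents below. Keep replies short and in plain text.',
    `If the documents do not answer the question, reply with exactly ${HANDOFF_MARKER}.`,
    '',
    sections,
  ].join('\n');
}

// True when the model signalled that a human should take over.
function needsHandoff(reply) {
  return reply.trim().toUpperCase().startsWith(HANDOFF_MARKER);
}
```

When needsHandoff returns true, tell the customer that a staff member will follow up instead of sending the marker. Then pass the message on, for example with whatsapp-web.js's msg.forward(chatId) to a staff chat whose ID you choose. Models do not follow such instructions perfectly, so test the marker against real questions before relying on it.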
Family and friend groups are another natural fit. A shared group with a bot that can answer questions, settle debates, convert recipes for different serving sizes, suggest gift ideas, or explain news events is genuinely useful for the kind of quick information lookups that come up naturally in group conversation. The key design choice is the mention-only trigger — the bot stays quiet unless addressed, which keeps it from dominating the conversation with unrequested responses.
Handling Media Messages
WhatsApp messages can include images, audio, documents, and other media. With a vision-capable Ollama model such as llava or gemma3, you can extend the bot to describe or analyse images that users send. When a message has media attached, download it and include the base64-encoded image in the Ollama API request alongside the text prompt.
Check whether a message has media with msg.hasMedia, then download it with await msg.downloadMedia(). This returns a MessageMedia object containing the base64-encoded data and the MIME type. When using a vision-capable model, pass the base64 data in the images field of an Ollama chat message. This lets users send a photo of a handwritten note and ask the bot to transcribe it, share a screenshot of an error message and ask for help debugging it, or send a picture of food and ask for nutritional information. All of this is processed locally, without any image data leaving your machine.
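Below is a minimal sketch of the request-building step. It assumes media is the MessageMedia object returned by msg.downloadMedia(), with its data and mimetype fields, and that the caption arrives in msg.body. Only image attachments are passed through. Ollama's images field takes base64 strings without a data: URL prefix, which matches what MessageMedia.data holds. buildVisionMessages is a hypothetical name.

```javascript
// Hypothetical helper: turn a WhatsApp image attachment plus its caption into
// an Ollama /api/chat messages array. Returns null for non-image media.
function buildVisionMessages(caption, media) {
  if (!media || !media.mimetype || !media.mimetype.startsWith('image/')) {
    return null; // no media, failed download, or not an image
  }
  const prompt = caption && caption.trim()
    ? caption.trim()
    : 'Describe this image.'; // default when the photo has no caption
  return [
    {
      role: 'user',
      content: prompt,
      images: [media.data], // raw base64 string from MessageMedia
    },
  ];
}
```

In the message handler, call it when msg.hasMedia is true, after const media = await msg.downloadMedia(). Send the result to /api/chat with a vision model such as llava in place of the text-only messages array.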
WhatsApp sends voice messages as .ogg audio files. Download the audio and convert it to WAV with ffmpeg, the command-line tool, which you install through your system package manager. Transcribe the WAV file with a locally running Whisper implementation such as the openai-whisper package or whisper.cpp, then send the transcription to Ollama for analysis. Ollama itself does not provide speech-to-text, so transcription runs as a separate step. This lets the bot understand voice messages, which is particularly useful for users who prefer speaking over typing. That preference is common on mobile, where voice messages are heavily used.
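Below is a minimal sketch of the conversion and transcription steps. It assumes the ffmpeg binary and the openai-whisper command-line tool (installed with pip install openai-whisper) are both on the PATH. The flags passed to whisper (--model, --output_format, --output_dir, --verbose) belong to that CLI. The model size base and the helper names are arbitrary choices. Resampling to 16 kHz mono WAV is the input whisper.cpp expects; openai-whisper can also read the .ogg directly.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const fs = require('node:fs/promises');
const os = require('node:os');
const path = require('node:path');

const run = promisify(execFile);

// ffmpeg arguments: overwrite output, resample to 16 kHz, downmix to mono.
function ffmpegArgs(input, output) {
  return ['-y', '-i', input, '-ar', '16000', '-ac', '1', output];
}

// Transcribe a MessageMedia voice note (base64-encoded OGG) to text.
// Assumes `ffmpeg` and the openai-whisper `whisper` CLI are installed.
async function transcribeVoiceNote(media) {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'wa-voice-'));
  try {
    const oggPath = path.join(dir, 'note.ogg');
    const wavPath = path.join(dir, 'note.wav');
    await fs.writeFile(oggPath, Buffer.from(media.data, 'base64'));
    await run('ffmpeg', ffmpegArgs(oggPath, wavPath));
    await run('whisper', [
      wavPath, '--model', 'base', '--verbose', 'False',
      '--output_format', 'txt', '--output_dir', dir,
    ]);
    // openai-whisper names the transcript after the input file.
    const text = await fs.readFile(path.join(dir, 'note.txt'), 'utf8');
    return text.trim();
  } finally {
    await fs.rm(dir, { recursive: true, force: true }); // clean up temp files
  }
}
```

whatsapp-web.js marks voice notes with msg.type === 'ptt'. For those messages, download the audio with msg.downloadMedia(), pass it to transcribeVoiceNote, and use the returned text in place of msg.body in the Ollama request. The first whisper run downloads the model, and transcription on a CPU can take noticeably longer than the chat reply itself.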
Rate Limiting and Abuse Prevention
Without rate limiting, a single user could send hundreds of messages in rapid succession and queue up hundreds of Ollama requests, making the bot unresponsive for everyone else. Add a simple per-user cooldown using a Map of timestamps. Record the last message time per sender, and if a new message arrives within the cooldown period, reply with a “please wait” message rather than forwarding it to Ollama. A 5 to 10 second cooldown per user is enough to prevent accidental spam while staying unobtrusive in normal use. Keep in mind that people who split one thought across several quick messages will have their follow-ups caught by the cooldown.
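Below is a minimal sketch of that cooldown. The clock is an optional parameter so the logic can be checked without waiting. COOLDOWN_MS, lastSeen, and allowMessage are hypothetical names, and the 5-second value is taken from the range suggested above.

```javascript
// Per-sender cooldown backed by a Map of last-accepted timestamps.
const COOLDOWN_MS = 5000;
const lastSeen = new Map();

// Returns true if the sender may be served now, and records the time.
// Returns false (without updating the timestamp) while still cooling down.
function allowMessage(senderId, now = Date.now()) {
  const last = lastSeen.get(senderId);
  if (last !== undefined && now - last < COOLDOWN_MS) {
    return false;
  }
  lastSeen.set(senderId, now);
  return true;
}
```

In the handler, key on the individual sender rather than the chat. whatsapp-web.js sets msg.author on group messages, so use const sender = msg.author || msg.from. If allowMessage(sender) returns false, send the "please wait" reply and return before calling Ollama.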
For group chats where many people might send messages simultaneously, consider a global queue that processes one Ollama request at a time rather than firing concurrent requests. Ollama already queues any requests beyond the number it is configured to process in parallel (set with the OLLAMA_NUM_PARALLEL environment variable). Managing the queue at the bot level, however, gives you better control over error handling and response ordering. Node.js’s async/await makes it straightforward to implement a sequential processing queue with a simple Promise chain.
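Below is a minimal sketch of such a Promise-chain queue. Each task starts only after the previous one has settled, and a failed task does not block the tasks behind it. enqueue is a hypothetical helper name.

```javascript
// A single-lane queue built from a Promise chain: each task starts only
// after the previous one has settled, whether it succeeded or failed.
let tail = Promise.resolve();

function enqueue(task) {
  const result = tail.then(() => task());
  // Keep the chain alive even if this task rejects; the caller still
  // sees the rejection through `result`.
  tail = result.catch(() => {});
  return result;
}
```

To use it, wrap each Ollama call: const data = await enqueue(() => fetch(`${OLLAMA_URL}/api/chat`, options).then(r => r.json())). Replies then go out in the order the messages arrived, and an error in one request only affects the message that triggered it.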
Extending with External Integrations
Once the basic bot is working, extending it with external integrations is straightforward because all the extension happens at the bot layer — you call external services before or after the Ollama call and include the results in the prompt. Common extensions include weather data (fetch the current weather from an open API and include it in the system prompt so the bot can answer “should I bring an umbrella?”), calendar integration (check your calendar API and let the bot know about upcoming events so it can answer scheduling questions), and note taking (save notable responses to a Markdown file or database so you can retrieve past conversations later).
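Below is a minimal sketch of the weather example. The pattern is to fetch data before the Ollama call and append it to the system prompt. It assumes the free Open-Meteo forecast endpoint with its current=temperature_2m,precipitation parameter, whose response carries the values under data.current. It uses the built-in fetch from Node.js 18+. withContext and fetchWeather are hypothetical names, and you supply your own latitude and longitude.

```javascript
// Append live facts to a system prompt so the model can use them.
function withContext(systemPrompt, facts) {
  const lines = Object.entries(facts).map(([key, value]) => `- ${key}: ${value}`);
  if (lines.length === 0) return systemPrompt;
  return `${systemPrompt}\n\nCurrent information (use it if relevant):\n${lines.join('\n')}`;
}

// Fetch current conditions from Open-Meteo (no API key needed).
// Assumes the response shape { current: { temperature_2m, precipitation } }.
async function fetchWeather(latitude, longitude) {
  const url = 'https://api.open-meteo.com/v1/forecast' +
    `?latitude=${latitude}&longitude=${longitude}` +
    '&current=temperature_2m,precipitation';
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Open-Meteo returned HTTP ${res.status}`);
  const data = await res.json();
  return {
    'Temperature (°C)': data.current.temperature_2m,
    'Precipitation (mm)': data.current.precipitation,
  };
}
```

Before building the messages for a chat, call withContext(DEFAULT_SYSTEM, await fetchWeather(lat, lon)) and use the result as the system message. Calendar events or other data can be merged into the same facts object.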
For a more sophisticated personal assistant, implement a simple tool-calling pattern: check the user’s message for intent patterns (questions about time, weather, unit conversions, calculations), call the appropriate API or function, and include the result in the Ollama prompt alongside the user’s question. The model then synthesises a natural language answer that incorporates the real-time data. This gives you a bot that answers “what’s 15% of 87.50?” with the correct result rather than an approximation, and “what time is it in Tokyo?” with the actual current time — practical improvements that make the bot significantly more useful for everyday queries.
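Below is a minimal sketch of that pattern, using the two example questions. Intent detection is plain regular expressions and substring checks. Computed results are prepended to the user's question so the model phrases the answer but does not do the arithmetic or guess the time. The city list and all function names are hypothetical; time zones use the IANA names understood by Intl.DateTimeFormat.

```javascript
// Hypothetical city table for time lookups (IANA time zone names).
const CITIES = [
  { name: 'Tokyo', timeZone: 'Asia/Tokyo' },
  { name: 'London', timeZone: 'Europe/London' },
  { name: 'New York', timeZone: 'America/New_York' },
];

// "15% of 87.50" -> exact result; multiplying before dividing avoids
// float noise, and toFixed(10) trims any that remains.
function percentOf(text) {
  const m = text.match(/(\d+(?:\.\d+)?)\s*%\s*of\s*(\d+(?:\.\d+)?)/i);
  if (!m) return null;
  const result = Number((Number(m[1]) * Number(m[2]) / 100).toFixed(10));
  return `${m[1]}% of ${m[2]} = ${result}`;
}

// "what time is it in Tokyo?" -> current local time in that city.
function timeIn(text, now = new Date()) {
  const lower = text.toLowerCase();
  if (!lower.includes('time')) return null;
  for (const city of CITIES) {
    if (lower.includes(city.name.toLowerCase())) {
      const formatted = new Intl.DateTimeFormat('en-GB', {
        timeZone: city.timeZone, weekday: 'long', hour: '2-digit', minute: '2-digit',
      }).format(now);
      return `Current time in ${city.name}: ${formatted}`;
    }
  }
  return null;
}

// Run every matcher and keep the facts that apply.
function runTools(text, now = new Date()) {
  return [percentOf(text), timeIn(text, now)].filter(Boolean);
}

// Prepend tool results to the question sent to Ollama.
function augmentPrompt(question, facts) {
  if (facts.length === 0) return question;
  return `Facts computed by tools (trust these):\n${facts.join('\n')}\n\nQuestion: ${question}`;
}
```

In the handler, send augmentPrompt(msg.body, runTools(msg.body)) as the user content. Messages that match no tool pass through unchanged.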
Why WhatsApp Over Other Messaging Platforms
The case for building an Ollama bot on WhatsApp rather than Telegram, Discord, or Signal comes down to reach. WhatsApp is the primary messaging app for most people outside the US and for many within it — if you want an AI assistant that your non-technical family members, clients, or customers will actually use, WhatsApp is where they already are. Telegram and Discord have more capable bot APIs and fewer terms of service concerns around automation, making them better choices for developer communities and power users. For general consumer use and small business applications where the audience uses WhatsApp by default, the unofficial whatsapp-web.js approach is a practical compromise that gets you to working software quickly.
The bot described in this guide is a working starting point that you can have running in under an hour. Refine the system prompt, add the integrations that matter to your use case, and you have a genuinely useful local AI assistant that lives where you already spend your time.