How to Build a Telegram Bot with Ollama

Telegram bots are a practical way to expose a local LLM to yourself and your team — the Telegram app is already on everyone’s phone, the bot API is simple, and a bot running on a home server or VPS lets you chat with a local AI model from anywhere with no dedicated frontend to build. This guide covers building a Telegram bot backed by Ollama using the python-telegram-bot library.

Prerequisites

pip install python-telegram-bot ollama
# Create a bot via BotFather on Telegram and get your TOKEN

Basic Bot: Respond to Messages

import logging
from telegram import Update
from telegram.ext import Application, MessageHandler, filters, ContextTypes
import ollama

logging.basicConfig(level=logging.INFO)

MODEL = 'llama3.2'
TOKEN = 'YOUR_BOT_TOKEN_HERE'  # better: read from an environment variable (see Security Best Practices)

async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
    user_text = update.message.text
    if not user_text:
        return

    # Show typing indicator
    await context.bot.send_chat_action(
        chat_id=update.effective_chat.id,
        action='typing'
    )

    response = ollama.chat(
        model=MODEL,
        messages=[{'role': 'user', 'content': user_text}]
    )
    await update.message.reply_text(response['message']['content'])

def main():
    app = Application.builder().token(TOKEN).build()
    app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))
    app.run_polling()

if __name__ == '__main__':
    main()

Conversation History

from collections import defaultdict

# Per-user conversation history
user_histories = defaultdict(list)
MAX_HISTORY = 20  # messages

async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
    user_id = update.effective_user.id
    user_text = update.message.text

    history = user_histories[user_id]
    history.append({'role': 'user', 'content': user_text})

    await context.bot.send_chat_action(chat_id=update.effective_chat.id, action='typing')

    response = ollama.chat(model=MODEL, messages=history)
    reply = response['message']['content']
    history.append({'role': 'assistant', 'content': reply})

    # Trim history
    if len(history) > MAX_HISTORY:
        user_histories[user_id] = history[-MAX_HISTORY:]

    await update.message.reply_text(reply)

Commands

from telegram.ext import CommandHandler

async def start(update: Update, context: ContextTypes.DEFAULT_TYPE):
    await update.message.reply_text(
        'Hello! I am an AI assistant powered by a local LLM.\n'
        'Commands: /clear (reset history), /model (show current model)'
    )

async def clear(update: Update, context: ContextTypes.DEFAULT_TYPE):
    user_id = update.effective_user.id
    user_histories[user_id] = []
    await update.message.reply_text('Conversation history cleared.')

async def model_info(update: Update, context: ContextTypes.DEFAULT_TYPE):
    await update.message.reply_text(f'Current model: {MODEL}')

# Register handlers (in main(), before app.run_polling())
app.add_handler(CommandHandler('start', start))
app.add_handler(CommandHandler('clear', clear))
app.add_handler(CommandHandler('model', model_info))
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))

Restricting Access

ALLOWED_USER_IDS = {123456789, 987654321}  # Your Telegram user ID(s)

async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
    user_id = update.effective_user.id
    if ALLOWED_USER_IDS and user_id not in ALLOWED_USER_IDS:
        await update.message.reply_text('Unauthorised.')
        return
    # ... rest of handler

Running as a Service

sudo nano /etc/systemd/system/telegram-bot.service

[Unit]
Description=Ollama Telegram Bot
After=network.target ollama.service

[Service]
ExecStart=/usr/bin/python3 /home/user/bot.py
Restart=always
User=user
Environment=TELEGRAM_TOKEN=your_token_here

[Install]
WantedBy=multi-user.target

sudo systemctl enable telegram-bot
sudo systemctl start telegram-bot

Why Telegram for a Local AI Bot

Telegram is a natural platform for a personal or team AI bot for several reasons. The app is widely installed and already used for communication, eliminating the need to build a custom frontend. The Bot API is one of the simplest bot APIs available — you register a bot with BotFather in seconds and receive a token that grants full API access. Telegram bots support rich message formatting (Markdown, HTML), file sharing, inline keyboards, and voice messages, making them capable of handling diverse AI workflows. And critically, Telegram delivers messages to your bot via long-polling or webhooks without requiring a public IP address or TLS certificate — your bot can run on a home server behind NAT and still receive all messages.

The privacy story is nuanced: messages to your bot travel through Telegram’s servers (Telegram is the transport layer), but the AI inference itself happens locally on your Ollama instance. Your message text goes to Telegram, not to any AI cloud service. For most personal and team use cases this is an acceptable trade-off — Telegram’s privacy record is reasonable, and the alternative (a fully local interface) requires users to access a web UI rather than their existing messaging app. If full end-to-end local processing is required, a local web interface or a Signal/Matrix bot would be more appropriate alternatives.

Multi-User Group Chat Support

async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
    # Work in both private chats and group chats
    chat_id = update.effective_chat.id
    user_id = update.effective_user.id
    message_text = update.message.text

    # In group chats, only respond to direct mentions or replies
    if update.effective_chat.type != 'private':
        bot_username = (await context.bot.get_me()).username
        if not (f'@{bot_username}' in message_text or
                (update.message.reply_to_message and
                 update.message.reply_to_message.from_user.id == context.bot.id)):
            return  # Ignore group messages that neither mention nor reply to the bot
        message_text = message_text.replace(f'@{bot_username}', '').strip()

    # Use per-user history (not per-chat, so history is consistent across groups)
    history = user_histories[user_id]
    history.append({'role': 'user', 'content': message_text})
    response = ollama.chat(model=MODEL, messages=history)
    reply = response['message']['content']
    history.append({'role': 'assistant', 'content': reply})
    await update.message.reply_text(reply)

Sending Long Responses

Telegram messages have a 4096 character limit. For long model responses, split them into multiple messages:

async def send_long_message(update: Update, text: str, max_len: int = 4000):
    if len(text) <= max_len:
        await update.message.reply_text(text)
        return
    # Split on paragraph boundaries when possible
    parts = []
    current = ''
    for para in text.split('\n\n'):
        if len(para) > max_len:
            # A single paragraph longer than the limit: flush what we have,
            # then hard-split the paragraph itself
            if current:
                parts.append(current)
                current = ''
            while len(para) > max_len:
                parts.append(para[:max_len])
                para = para[max_len:]
        if len(current) + len(para) + 2 <= max_len:
            current += ('\n\n' if current else '') + para
        else:
            if current:
                parts.append(current)
            current = para
    if current:
        parts.append(current)
    for part in parts:
        await update.message.reply_text(part)

Getting Started

Message @BotFather on Telegram, create a new bot with /newbot, and copy the token. Install python-telegram-bot and run the basic example from this article. Add yourself to ALLOWED_USER_IDS (use @userinfobot to get your Telegram user ID), deploy as a systemd service alongside Ollama, and you have a personal AI assistant accessible from Telegram on any device. The entire setup takes about 30 minutes and produces a genuinely useful daily-use tool that costs nothing to run and keeps inference on your own hardware.

Why a Telegram Bot Is Better Than a Web UI for Personal Use

For a personal AI assistant, a Telegram bot has practical advantages over a custom web UI. The app is already on your phone and laptop — no URL to remember, no separate tab to open. Telegram's notification system means the bot can respond even when you are not actively looking at the interface. The history is persistent across devices through Telegram's sync. You can access it from any device that has Telegram installed, including devices where you cannot run a web server or access a local network. And building the bot requires significantly less frontend work than a custom web interface — the Telegram Bot API handles all the UI complexity, leaving you to focus on the AI integration.

For team use, a shared Telegram bot (with all team members whitelisted by user ID) gives everyone AI assistant access through the tool they already use for team communication, without requiring a web server, DNS configuration, or TLS certificate. The bot runs on any machine with internet access — a home server, a VPS, a laptop — and Telegram's long-polling means it works behind NAT without any firewall configuration. For a small team that wants shared access to a local AI model without setting up dedicated infrastructure, a Telegram bot backed by Ollama on a shared server is one of the simplest production-viable deployments available.

Voice Message Transcription

from telegram.ext import MessageHandler, filters
import subprocess, os

async def handle_voice(update: Update, context: ContextTypes.DEFAULT_TYPE):
    # Download the voice message
    voice = update.message.voice
    file = await context.bot.get_file(voice.file_id)
    ogg_path = f'/tmp/voice_{voice.file_id}.ogg'
    mp3_path = ogg_path.replace('.ogg', '.mp3')
    await file.download_to_drive(ogg_path)

    # Convert to mp3 (faster-whisper can usually decode .ogg directly,
    # but converting first sidesteps codec issues)
    subprocess.run(['ffmpeg', '-i', ogg_path, mp3_path, '-y'], capture_output=True)

    # Transcribe with faster-whisper (in production, load the model once at
    # module level -- reloading it on every message is slow)
    from faster_whisper import WhisperModel
    model = WhisperModel('small', device='auto')
    segments, _ = model.transcribe(mp3_path)
    transcript = ' '.join(s.text for s in segments)

    # Clean up
    os.unlink(ogg_path); os.unlink(mp3_path)

    # Send to Ollama and reply
    response = ollama.chat(model=MODEL, messages=[
        {'role': 'user', 'content': transcript}
    ])
    await update.message.reply_text(f'Heard: {transcript}\n\n{response["message"]["content"]}')

# Register handler
app.add_handler(MessageHandler(filters.VOICE, handle_voice))

Document Summarisation

from telegram.ext import MessageHandler, filters
import PyPDF2, io

async def handle_document(update: Update, context: ContextTypes.DEFAULT_TYPE):
    doc = update.message.document
    if not (doc.file_name or '').lower().endswith('.pdf'):
        await update.message.reply_text('Send a PDF to summarise.')
        return

    file = await context.bot.get_file(doc.file_id)
    buf = io.BytesIO()
    await file.download_to_memory(buf)
    buf.seek(0)

    reader = PyPDF2.PdfReader(buf)
    # extract_text() can return None on image-only pages, hence the `or ''`
    text = ' '.join((page.extract_text() or '') for page in reader.pages[:10])  # first 10 pages
    words = text.split()
    text = ' '.join(words[:5000])  # limit to ~5000 words

    await context.bot.send_chat_action(chat_id=update.effective_chat.id, action='typing')
    response = ollama.chat(model=MODEL, messages=[{
        'role': 'user',
        'content': f'Summarise this document in 5 bullet points:\n\n{text}'
    }])
    await update.message.reply_text(response['message']['content'])

app.add_handler(MessageHandler(filters.Document.ALL, handle_document))

Scaling and Performance

A Telegram bot backed by a single Ollama instance handles requests sequentially — if two users message simultaneously, the second waits for the first's response to complete. For a personal bot or a small team (under 5 active users), this is fine in practice because conversations are rarely perfectly simultaneous. For heavier use, set OLLAMA_NUM_PARALLEL=2 on the Ollama server to allow 2 concurrent requests, and implement a user-visible queue indicator in the bot that sends a "processing your message..." reply immediately before starting inference. This gives users feedback that their message was received even if they have to wait a few seconds for the model to finish a previous response.

Using Different Models for Different Tasks

A practical enhancement is routing different types of requests to different models based on what the bot detects about the message. Code questions get a code-specialised model; general questions get a general model; document summarisation uses a larger context window model. A simple keyword-based router handles most cases:

def select_model(text: str) -> str:
    code_keywords = ['python', 'javascript', 'code', 'function', 'bug', 'error', 'script', 'class']
    if any(kw in text.lower() for kw in code_keywords):
        return 'qwen2.5-coder:7b'
    return 'llama3.2'  # general purpose fallback

async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
    text = update.message.text
    model = select_model(text)
    # ... rest of handler using selected model

Error Recovery and Reliability

Production bots need error handling for Ollama failures — the model may be loading (cold start latency), Ollama may have restarted, or a request may time out on slow hardware. Wrap Ollama calls in try/except and give users meaningful feedback:

import asyncio

async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
    text = update.message.text
    await context.bot.send_chat_action(chat_id=update.effective_chat.id, action='typing')
    try:
        loop = asyncio.get_running_loop()  # get_event_loop() is deprecated inside coroutines
        # Run synchronous ollama call in thread pool to avoid blocking
        response = await asyncio.wait_for(
            loop.run_in_executor(None, lambda: ollama.chat(
                model=MODEL,
                messages=[{'role':'user','content':text}]
            )),
            timeout=120.0
        )
        await update.message.reply_text(response['message']['content'])
    except asyncio.TimeoutError:
        await update.message.reply_text('Request timed out — the model is taking too long. Try a shorter question.')
    except Exception as e:
        await update.message.reply_text(f'Error: {str(e)[:100]}. Is Ollama running?')

Note the run_in_executor pattern — the python-telegram-bot library is async, but the Ollama Python library's synchronous methods block the event loop. Running them in a thread pool executor keeps the bot responsive to other messages while waiting for inference to complete.
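On Python 3.9+, asyncio.to_thread is a more concise spelling of the same executor pattern. A minimal sketch, with blocking_chat standing in for the synchronous ollama.chat call:

```python
import asyncio

def blocking_chat(prompt: str) -> str:
    # Stand-in for the synchronous ollama.chat call -- replace the body
    # with the real call in your bot.
    return f'echo: {prompt}'

async def chat_async(prompt: str, timeout: float = 120.0) -> str:
    # asyncio.to_thread runs the blocking call in the default thread pool
    # without blocking the event loop, equivalent to
    # loop.run_in_executor(None, ...) but shorter.
    return await asyncio.wait_for(asyncio.to_thread(blocking_chat, prompt), timeout)
```

The timeout and error handling from the handler above apply unchanged; only the way the blocking call is dispatched differs.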

Monitoring Your Bot

For a bot running as a background service, add basic logging so you can review usage and debug issues. Log each message received, the model used, response time, and any errors — this takes five lines of code but gives you visibility into how the bot is being used and whether it is performing reliably. For a personal bot, the systemd journal (journalctl -u telegram-bot -f) is sufficient for monitoring without any additional tooling. For a team bot, consider writing logs to a simple SQLite database so you can query usage patterns and review conversation quality over time.
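A minimal version of that logging might look like the following sketch; log_request and its field names are illustrative, not a fixed schema:

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(message)s')
log = logging.getLogger('bot')

def log_request(user_id: int, model: str, started: float, error=None) -> str:
    """Emit one log line per handled message: who, which model, how long,
    and whether it failed. Call with started = time.monotonic() captured
    just before the Ollama call."""
    elapsed = time.monotonic() - started
    line = f'user={user_id} model={model} elapsed={elapsed:.2f}s error={error or "none"}'
    if error:
        log.error(line)
    else:
        log.info(line)
    return line
```

With systemd, these lines land in the journal automatically, so journalctl -u telegram-bot -f shows live usage with no extra plumbing.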

Extending the Bot with Web Search

For questions where the local model's training data is outdated, add a simple web search step before calling Ollama. The DuckDuckGo Python library provides anonymous web search without an API key:

pip install duckduckgo-search

from duckduckgo_search import DDGS

def web_search(query: str, max_results: int = 3) -> str:
    with DDGS() as ddgs:
        results = list(ddgs.text(query, max_results=max_results))
    if not results:
        return ''
    return '\n'.join(f"{r['title']}: {r['body']}" for r in results)

async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
    text = update.message.text
    # Simple heuristic: if message starts with 'search:' add web context
    if text.lower().startswith('search:'):
        query = text[7:].strip()
        search_results = web_search(query)
        prompt = f'Answer based on these search results:\n{search_results}\n\nQuestion: {query}'
    else:
        prompt = text
    response = ollama.chat(model=MODEL, messages=[{'role':'user','content':prompt}])
    await update.message.reply_text(response['message']['content'])

Getting the Most from Your Bot

The most useful Telegram bot configurations for daily use: set a thoughtful system prompt that gives the model context about who you are and how you want it to respond ("You are a technical assistant for a software developer. Be concise and precise. Prefer code examples over explanations."); add /clear as a first step if you notice responses getting confused by old history; and keep ALLOWED_USER_IDS set to just yourself (and explicit teammates) to avoid your bot being used by others if they discover the username. Over time, the bot becomes one of the most frictionless ways to interact with a local AI model — no browser tab, no URL, just a message to a contact in your existing messaging app that responds with local AI intelligence.
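The system prompt suggestion can be implemented by prepending a system message on every Ollama call rather than storing it in the history; a sketch, where SYSTEM_PROMPT and with_system_prompt are illustrative names:

```python
SYSTEM_PROMPT = (
    'You are a technical assistant for a software developer. '
    'Be concise and precise. Prefer code examples over explanations.'
)

def with_system_prompt(history: list) -> list:
    # Prepend the system message at call time rather than storing it in
    # user_histories, so /clear never wipes it and trimming never drops it.
    return [{'role': 'system', 'content': SYSTEM_PROMPT}] + history
```

In the handler, replace `ollama.chat(model=MODEL, messages=history)` with `ollama.chat(model=MODEL, messages=with_system_prompt(history))`.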

Alternative: WhatsApp or Discord Bots

The same architecture — python event loop, Ollama inference, conversation history per user — applies equally to WhatsApp and Discord bots. WhatsApp uses the Twilio or Meta Cloud API rather than a simple token-based bot API, which adds complexity and requires a business account for production use. Discord uses discord.py, which follows the same async pattern as python-telegram-bot with similar handler registration. If your team is already on Discord, the Discord bot approach is worth considering — the same session patterns, command handlers, and document processing logic from this article transfer directly with only the transport library changing. Telegram remains the recommended starting point because its bot API is the simplest to get running from zero and works without any approval process or business account requirement.

The Local AI Assistant You Actually Use

The mark of a good personal AI tool is whether you actually use it daily. Telegram bots backed by Ollama pass this test for many developers because they sit inside an existing communication app rather than requiring a separate window or browser tab. The activation cost — opening a messaging app you already use — is so low that the bot becomes the natural first stop for quick questions, document summarisation, and code explanations throughout the day. Combined with conversation history and voice message support, a well-configured Telegram bot provides a genuinely useful AI assistant that you carry in your pocket, powered entirely by a model running on your own hardware with no ongoing API costs. The 30-minute setup investment pays dividends every day it runs.

Security Best Practices

Before deploying your bot beyond personal use, review these security considerations. Store the Telegram token in an environment variable rather than hardcoding it — if your code is ever shared or pushed to version control, a hardcoded token would expose your bot to hijacking. Use ALLOWED_USER_IDS for all deployments — Telegram bots are publicly discoverable by username, and without access control anyone who finds your bot can use your local compute. For team bots, use a group whitelist (allowed chat IDs) rather than individual user IDs if you want all members of a specific group to have access. Rotate your bot token periodically by creating a new one via BotFather and updating your service configuration — this limits the window of exposure if a token is ever leaked. And consider rate limiting per user to prevent any single user from monopolising the Ollama instance during periods of heavy use, particularly if you are sharing the bot with a team and the underlying hardware has limited capacity for parallel inference.
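The per-user rate limiting mentioned above can be sketched as a sliding window over recent message timestamps; allow_request and the limits here are illustrative, not a fixed design:

```python
import time
from collections import defaultdict, deque

RATE_LIMIT = 10     # max messages per user...
RATE_WINDOW = 60.0  # ...per this many seconds (tune both to your hardware)
_recent = defaultdict(deque)

def allow_request(user_id, now=None):
    """Sliding-window rate limiter: True if the user may send another
    message, False if they have hit the limit for the current window."""
    now = time.monotonic() if now is None else now
    q = _recent[user_id]
    # Drop timestamps that have aged out of the window
    while q and now - q[0] > RATE_WINDOW:
        q.popleft()
    if len(q) >= RATE_LIMIT:
        return False
    q.append(now)
    return True
```

In the handler, check `if not allow_request(update.effective_user.id):` at the top and reply with a "slow down" message instead of calling Ollama.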

Getting Started

Create your bot via @BotFather, copy the token, install the dependencies (pip install python-telegram-bot ollama), paste the basic handler from this article, and run it. Add your user ID to ALLOWED_USER_IDS (find it by messaging @userinfobot), and your personal AI assistant on Telegram is live. Add conversation history, commands, and voice support incrementally as you discover what you actually use, and deploy as a systemd service when you want the bot to stay running without manual restarts. The result is one of the most frictionless AI assistant setups available: accessible from your phone with a single tap, powered by a model on your own hardware, with no subscription and no data leaving your infrastructure. It is also a setup that compounds in value the more you reach for it as your go-to AI interface throughout the day.

The patterns in this article — handler registration, conversation history, commands, group chat, file handling, error recovery — transfer directly to Discord, Matrix, and Signal bots with only the transport library changing. Once you have built one bot with Ollama, you have the mental model for building any messaging-platform bot powered by local AI.
