AnythingLLM Setup: Chat with Your Documents Locally

AnythingLLM is an all-in-one desktop and Docker application that combines local LLM inference, RAG (retrieval-augmented generation) over your own documents, multi-user workspaces, and agent support. Unlike Open WebUI, which focuses on the chat interface, AnythingLLM is built around the concept of workspaces: each one has its own document collection, embedding settings, and LLM configuration. This makes it well-suited for teams that need to query different document sets with different models.

Installation Options

AnythingLLM offers two installation paths. The desktop app is the simplest: a single download from anythingllm.com that bundles everything, including a built-in llama.cpp inference engine, so you can run models without Ollama or any other backend. The Docker version is better for multi-user deployments or server-based setups.

# Docker installation
docker pull mintplexlabs/anythingllm

# STORAGE_DIR must match the mounted path so uploads and the database persist
docker run -d \
  -p 3001:3001 \
  -v ~/.anythingllm:/app/server/storage \
  -e STORAGE_DIR="/app/server/storage" \
  --name anythingllm \
  mintplexlabs/anythingllm

# Access at http://localhost:3001
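
After the container starts, a quick reachability check confirms the web server is up on the mapped port. This is a plain HTTP GET with no AnythingLLM-specific endpoint assumed; first boot can take a few seconds while the server initialises:

import requests

# Confirm the AnythingLLM web server responds on the mapped port.
resp = requests.get('http://localhost:3001', timeout=10)
print(resp.status_code)  # expect 200 once the server has finished booting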

The desktop app download is the recommended starting point for most users — it is self-contained and avoids Docker complexity for single-user setups.

Connecting to Ollama

AnythingLLM supports multiple LLM backends. To use Ollama: open Settings, go to LLM Provider, select Ollama, and enter http://localhost:11434 as the base URL. If AnythingLLM itself runs in Docker, use http://host.docker.internal:11434 instead, since localhost inside the container refers to the container rather than the host (on Linux this also requires starting the container with --add-host=host.docker.internal:host-gateway). AnythingLLM will fetch the list of available models from Ollama and let you select one as the default. You can also use LM Studio, OpenAI, Anthropic, or any OpenAI-compatible endpoint as the backend, making AnythingLLM a flexible frontend that works with whichever LLM provider you prefer.
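
If the model list does not populate, check that Ollama is reachable from wherever AnythingLLM runs. A minimal Python check against Ollama's documented /api/tags endpoint (the URL assumes a default local install):

import requests

# List the models Ollama is serving; AnythingLLM reads this same endpoint
# to populate its model dropdown.
resp = requests.get('http://localhost:11434/api/tags', timeout=10)
resp.raise_for_status()
for model in resp.json().get('models', []):
    print(model['name'])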

Workspaces and Document Upload

The core organisational unit in AnythingLLM is the workspace. Each workspace has its own vector store, document collection, system prompt, and can be configured with a different LLM or embedding model from the global default. Create a workspace for each project, team, or document domain you want to query separately.

To add documents to a workspace, click Upload in the workspace panel. AnythingLLM supports PDFs, Word documents, text files, Markdown, URLs (it fetches and embeds the page), YouTube transcripts, and GitHub repositories. Documents are chunked and embedded automatically using whichever embedding model you have configured — nomic-embed-text via Ollama is the standard local choice.

# Scripted ingestion goes through AnythingLLM's REST API rather than a
# standalone CLI. A minimal sketch using curl, assuming the documented
# developer endpoints (verify the exact routes on your instance's
# /api/docs page; an API key from the admin panel is required):
curl -X POST http://localhost:3001/api/v1/document/upload \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@./docs/report.pdf"

# The uploaded document must then be embedded into a workspace, e.g. via
# the /api/v1/workspace/{slug}/update-embeddings endpoint.

Querying Documents

Once documents are embedded, switch the workspace to Chat mode and ask questions. AnythingLLM retrieves the most relevant chunks from your document collection and passes them as context to the LLM. The interface shows which source documents were used for each answer — clicking a source opens the relevant passage, making it easy to verify the LLM’s answer against the original text.

AnythingLLM also has a Query mode (as opposed to Chat mode) that treats each message as an independent question with no conversation history. This is useful for batch document Q&A, where each answer should depend only on the retrieved documents and not be influenced by earlier turns.

Agents and Tool Use

AnythingLLM includes an agent framework that lets the LLM use tools: web search, code execution, file reading, and custom tool integrations. Enable agents in workspace settings, then invoke them with the @agent mention at the start of a message. The web search integration is particularly useful: it lets the LLM search the web for current information that is not in your document collection, combining RAG over private documents with live web search in the same interface.

# Agent tool configuration in workspace settings:
# - Web Search: enable + add Tavily or SerpAPI key
# - Code Interpreter: enable for Python execution
# - SQL Connector: connect to a database for natural language queries
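
Invocation happens inline in the chat box. For example (the @agent trigger is AnythingLLM's documented syntax; the task text here is illustrative):

# Example invocation typed into the chat box:
@agent search the web for the latest AnythingLLM release notes and summarise them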

Multi-User Setup with Roles

AnythingLLM supports multiple users with three roles: admin, manager, and user. Admins configure the system-wide LLM and embedding settings. Managers can create and manage workspaces. Users can only chat within workspaces they have been given access to. This role structure makes AnythingLLM appropriate for small teams where different people need access to different document sets without full admin access.

Enable multi-user mode from the admin panel. Once enabled, a login screen appears and you can create accounts for team members. Each user’s conversation history is stored separately — there is no shared conversation history between users in the same workspace, only shared documents and workspace configuration.

AnythingLLM vs Open WebUI: When to Use Each

Both are local LLM frontends with RAG support, but they have different strengths. Open WebUI is better as a general-purpose chat interface with light RAG — it is simpler, faster to set up, and better for day-to-day chatting with occasional document questions. AnythingLLM is better when documents are central to the use case — when you have large document collections that need to be well-organised, when you need workspace isolation between projects, or when you need agent tool use integrated into the document workflow. For a team primarily doing document Q&A and research rather than general chat, AnythingLLM’s workspace model is more appropriate. For a team that primarily chats and occasionally refers to documents, Open WebUI is the lighter and more responsive choice.

Choosing the Right Embedding Model

The embedding model determines the quality of document retrieval in AnythingLLM. The default local option is nomic-embed-text via Ollama, which produces 768-dimensional embeddings and handles general English text well. For most document Q&A use cases this is the right starting point — pull it with ollama pull nomic-embed-text, select it in AnythingLLM’s embedding settings, and you are done.
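
A quick way to confirm the model is pulled and producing vectors is to request an embedding from Ollama directly, using its documented /api/embeddings endpoint; the dimension check below matches nomic-embed-text's 768-dimensional output:

import requests

# Request one embedding and verify the vector dimension.
resp = requests.post(
    'http://localhost:11434/api/embeddings',
    json={'model': 'nomic-embed-text', 'prompt': 'retrieval sanity check'},
    timeout=30,
)
resp.raise_for_status()
print(len(resp.json()['embedding']))  # expect 768 for nomic-embed-text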

If your documents are highly technical — code repositories, medical literature, legal documents — consider a domain-specific embedding model. BGE-M3 (available via Ollama as bge-m3) is a strong multilingual and technical embedding model that outperforms nomic-embed-text on specialised content. It is larger (slower to embed) but worth it for document collections where retrieval precision matters more than embedding speed. For code-heavy repositories, you may also want to look at embedding models specifically trained on code, though for most mixed-content repositories (documentation with code examples), a strong general embedding model like nomic-embed-text or BGE-M3 performs adequately.

One important practical consideration: if you change your embedding model after documents have already been embedded, you must re-embed all documents in the workspace. AnythingLLM will prompt you to do this when you change the embedding model. Plan your embedding model choice upfront for large document collections, since re-embedding thousands of documents takes time and compute.

Chunking Strategy and Retrieval Settings

How AnythingLLM splits documents into chunks significantly affects retrieval quality. The defaults in the text splitter settings (a chunk size of 1,000 characters with an overlap of 20) work well for most prose documents. For documents with distinct sections, such as technical manuals, legal contracts, and financial reports, larger chunks (1,500–2,000 characters) preserve more context per chunk and reduce cases where an answer spans two adjacent chunks and neither is retrieved alone. For conversational documents or short-form content (emails, support tickets, chat logs), smaller chunks (300–500 characters) improve precision by reducing noise in each chunk.
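
To make the overlap mechanics concrete, here is a minimal fixed-window chunker. It illustrates the size and overlap parameters described above; it is not AnythingLLM's internal splitter, which prefers paragraph and sentence boundaries over fixed windows:

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 20) -> list[str]:
    """Split text into windows where each chunk repeats the last `overlap`
    characters of the previous one, so content that straddles a boundary
    appears intact in at least one chunk."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text('A' * 2500)
print([len(c) for c in chunks])  # [1000, 1000, 540]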

The number of context chunks passed to the LLM on each query (the top-k setting) defaults to 4 in AnythingLLM. Increasing this to 6–8 gives the LLM more source material to work from, which helps for complex questions that require synthesising information from multiple document sections. The tradeoff is that more chunks consume more context window tokens, leaving less room for the LLM’s response — reduce chunk size if you increase top-k to keep the total context length manageable.
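
The arithmetic is worth doing before raising top-k. A rough budget sketch, assuming roughly four characters per token and an 8k-token context window (both numbers are illustrative; actual counts depend on the model and tokenizer):

# Rough context budget: chunks + prompt must leave headroom for the answer.
context_window = 8192      # tokens, e.g. a common llama3.2 configuration
chunk_tokens = 1000 // 4   # ~250 tokens per 1,000-character chunk
top_k = 8
prompt_overhead = 400      # system prompt + user question (estimate)

used = top_k * chunk_tokens + prompt_overhead
print(f'{used} tokens used, {context_window - used} left for the response')
# 2400 tokens used, 5792 left for the response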

Practical Document Collection Tips

AnythingLLM works best when document collections are well-organised and scoped. A workspace containing 10 highly relevant PDFs will give better answers than a workspace containing 200 loosely related documents, because retrieval precision degrades as the document set grows more diverse. Split large mixed-topic collections into multiple workspaces by subject area — one workspace for technical documentation, another for business processes, another for research papers — and direct queries to the appropriate workspace. This is one of AnythingLLM’s key advantages over Open WebUI’s simpler RAG: the workspace model encourages this kind of organisation rather than dumping everything into one flat document store.

For keeping document collections current, AnythingLLM supports re-uploading updated versions of documents. When you re-upload a file with the same name, it replaces the old embeddings automatically. For web pages that change frequently, the URL ingestion feature can be re-run manually to refresh the embeddings with the current page content. There is no automatic re-crawl on a schedule in the current version — refreshing web sources requires re-uploading the URL manually, which is a workflow limitation worth knowing before committing to heavy URL-based document collections.

API Access for Programmatic Use

AnythingLLM exposes a REST API that lets you query workspaces programmatically — useful for building internal tools that surface document answers without users needing to open the UI. The API requires an API key generated from the admin panel under API Keys.

import requests

ANYTHINGLLM_URL = 'http://localhost:3001'
API_KEY = 'your-api-key-here'

def query_workspace(workspace_slug: str, message: str) -> str:
    """Send one query-mode message to a workspace and return the answer text."""
    response = requests.post(
        f'{ANYTHINGLLM_URL}/api/v1/workspace/{workspace_slug}/chat',
        headers={'Authorization': f'Bearer {API_KEY}', 'Content-Type': 'application/json'},
        json={'message': message, 'mode': 'query'},
        timeout=120,  # local models can take a while on long contexts
    )
    response.raise_for_status()  # surface auth errors and bad slugs early
    data = response.json()
    return data.get('textResponse', '')

# Query a workspace by its slug (visible in the workspace URL)
answer = query_workspace('my-project-docs', 'What are the main API endpoints?')
print(answer)

The workspace slug is the URL-safe version of your workspace name — visible in the browser URL when you open the workspace. The mode parameter accepts query (no conversation history) or chat (with history). This API makes AnythingLLM practical as a backend for internal knowledge base tools, Slack bots, or any application that needs to answer questions from a specific document corpus without exposing the full AnythingLLM UI to end users.
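
Combined with query mode's independence between messages, the API lends itself to batch document Q&A. A short usage sketch building on the query_workspace helper above (the workspace slug and questions are placeholders):

# Each question is answered independently in query mode, so earlier
# answers cannot influence later ones.
questions = [
    'What are the main API endpoints?',
    'Which authentication methods are supported?',
    'How is pagination handled?',
]

for q in questions:
    print(f'Q: {q}')
    print(f'A: {query_workspace("my-project-docs", q)}\n')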

Getting Started: Recommended First Setup

For a first AnythingLLM setup that works well out of the box: download the desktop app, connect it to Ollama with llama3.2 as the chat model and nomic-embed-text as the embedding model, create one workspace for your most important document collection, upload five to ten representative PDFs, and test it with a few questions you know the answers to from those documents. Verifying that retrieval works correctly on known questions before relying on it for unknown questions is the single most important habit for getting good results from any RAG system — and AnythingLLM makes this easy because it shows you exactly which source passages it retrieved for each answer.

AnythingLLM is under active development and releases updates frequently — the agent capabilities and connector integrations in particular have expanded significantly over the past year. Checking the release notes periodically is worthwhile, as new connectors (databases, Notion, Confluence, Google Drive) are regularly added that can significantly expand what you can embed and query. The GitHub repository at github.com/Mintplex-Labs/anything-llm is the best place to track upcoming features and contribute feedback on issues you encounter with specific document types or LLM backends. For most teams that need private, organised document Q&A without sending data to cloud providers, AnythingLLM is one of the most complete solutions available today — the combination of workspace isolation, multi-user support, agent tool use, and broad LLM backend compatibility in a single installable package is difficult to match with any other open-source tool.
