Jan AI: The Open-Source Local LLM Desktop App Explained

Jan is a free, open-source desktop application for running LLMs locally. It is designed around simplicity — a clean chat interface, a built-in model hub, and a local API server, all in one download with no dependencies. Unlike Ollama, which is a command-line tool, Jan is a full desktop GUI application similar to LM Studio, but with a more minimal design philosophy and an emphasis on being fully offline-capable out of the box.

Installation

Download Jan from jan.ai — it is available for macOS (Apple Silicon and Intel), Windows, and Linux. The installer bundles everything including llama.cpp for inference, so no separate backend is needed. Jan runs entirely offline after installation; it does not require an account, does not send telemetry by default, and does not connect to any cloud service unless you explicitly configure a remote API.

Downloading Models

Jan includes a model hub with pre-configured model cards for popular open-source models. Open the Hub tab, browse the available models, and click Download next to any you want. Each model card shows the file size, recommended hardware, and a brief description. Jan downloads GGUF files directly from Hugging Face and stores them locally.

Good starting models available in the hub:

  • Llama 3.2 3B Instruct Q4 — very fast, works on any modern laptop including CPU-only
  • Llama 3.1 8B Instruct Q4 — better quality, needs 8GB+ RAM and ideally a GPU
  • Qwen2.5 7B Instruct Q4 — strong general-purpose model
  • Mistral 7B Instruct Q4 — fast and reliable

You can also import GGUF files you have downloaded elsewhere by going to the Local Models tab and clicking Import Model — point it at any local GGUF file and it appears in your model list immediately.

The Chat Interface

Jan’s chat interface is clean and minimal. Select a model from the model selector at the top of a new thread, type your message, and the model responds. Each conversation is saved as a thread that persists across sessions. Jan supports system prompts (set them in the thread settings panel on the right), and you can adjust temperature, context length, and other generation parameters per thread.

One notable Jan feature is the ability to attach files directly to messages. You can paste images into the chat if you are using a multimodal model, or drag and drop text files to include their content in the context. This is simpler than the full RAG pipelines in Open WebUI or AnythingLLM: Jan includes the file content directly in the context rather than chunking and embedding it, which means it works without any setup but is limited to files that fit in the model's context window.
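You can mimic the same direct-in-context approach yourself through Jan's local API server (described in the next section). Here is a minimal sketch; the file path and model ID are placeholders:

from pathlib import Path
from openai import OpenAI

# Point the client at Jan's local API server (see the next section).
client = OpenAI(base_url='http://localhost:1337/v1', api_key='jan')

# Read the whole file and inline it in the prompt, mirroring Jan's
# direct-in-context approach; 'notes.txt' is a placeholder path.
notes = Path('notes.txt').read_text(encoding='utf-8')

response = client.chat.completions.create(
    model='llama3.2-3b-instruct',  # check /v1/models for your exact ID
    messages=[{'role': 'user',
               'content': f'Summarise the following document:\n\n{notes}'}],
)
print(response.choices[0].message.content)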

Jan’s Local API Server

Jan includes a built-in OpenAI-compatible API server, accessible at http://localhost:1337/v1. Enable it from Settings → Jan API Server → Start Server. Once running, you can use it exactly like the Ollama or LM Studio API:

from openai import OpenAI

# Point the client at Jan's local server; the api_key is just a placeholder.
client = OpenAI(base_url='http://localhost:1337/v1', api_key='jan')

response = client.chat.completions.create(
    model='llama3.2-3b-instruct',  # Jan's model ID format
    messages=[{'role': 'user', 'content': 'What is the capital of France?'}],
)
print(response.choices[0].message.content)

Jan’s model IDs in the API use a hyphenated lowercase format derived from the model name in the hub — check the model card or the Jan API documentation for the exact ID to use in requests. If you are unsure, hit http://localhost:1337/v1/models to see the IDs of all loaded models.
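For instance, a short sketch that prints the available IDs using the same OpenAI client, assuming the server is running on the default port:

from openai import OpenAI

client = OpenAI(base_url='http://localhost:1337/v1', api_key='jan')

# /v1/models returns every model the server currently exposes.
for model in client.models.list():
    print(model.id)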

Hardware and Performance

Jan uses llama.cpp under the hood and automatically detects and uses your GPU. On Apple Silicon it uses Metal; on NVIDIA hardware it uses CUDA; on AMD it uses ROCm (Linux) or Vulkan (Windows). GPU utilisation is shown in the model settings panel — you can manually set the number of GPU layers if the automatic detection is not using the hardware you expect.

Jan is generally slightly slower than Ollama for the same model and hardware combination because it uses a bundled version of llama.cpp that may lag behind the latest optimisations, whereas Ollama updates its backend more frequently. For most interactive chat use cases the difference is not noticeable, but for applications where tokens-per-second matters (long generations, high-throughput batch use), Ollama or LM Studio may give better performance.

Jan vs Ollama vs LM Studio

Jan sits between Ollama and LM Studio in the local LLM tool landscape. It is more beginner-friendly than Ollama (full GUI, no command line), simpler than LM Studio (less configuration, more opinionated defaults), and fully open-source with no proprietary components. The tradeoff is fewer advanced features — Jan does not have LM Studio’s detailed hardware tuning controls or Ollama’s scriptability, and its RAG support is limited to direct file attachment rather than a proper vector store.

Jan is the right choice if you want the simplest possible path to running local models with a GUI, value fully open-source software, or want something that works completely offline without any cloud connectivity by default. It is a particularly good fit for privacy-conscious users who want to verify exactly what the application does and does not send over the network — Jan's source code is fully auditable, and its offline-first design makes it easy to confirm there is no unexpected network activity.

Extensions

Jan has an extension system that adds capabilities beyond the core chat interface. The most useful extensions are the Remote API extension (which lets you connect Jan to OpenAI, Anthropic, or Groq as backends instead of running locally), the Model Importer extension (which simplifies importing models from Hugging Face by URL), and community extensions that add document Q&A and web search. Extensions are installed from the Extensions tab in Settings — the official extensions are curated and stable, while community extensions vary in quality. Check the Jan Discord for the most current recommendations on which community extensions are worth using.

Why Jan Stands Out for Privacy-First Users

Most local LLM applications make some network requests even when running locally — checking for updates, fetching model metadata, or sending anonymous usage statistics. Jan’s design explicitly minimises this. By default it makes no telemetry calls and does not require any account or sign-in. The only network activity in a default Jan session is model downloads from Hugging Face, which happen only when you explicitly click Download on a model card. Once models are downloaded, Jan runs entirely offline — you can disconnect from the internet and everything continues to work normally.

This offline-first design makes Jan particularly suitable for use cases involving sensitive information: legal document review, medical note-taking, financial analysis, proprietary code assistance. Professionals in regulated industries who need AI assistance but cannot use cloud APIs due to compliance constraints will find Jan’s straightforward offline operation easier to audit and justify than tools with more complex network behaviour. The fully open-source codebase means security teams can review exactly what the application does, which is not possible with proprietary local LLM applications.

Thread Management and Organisation

Jan saves every conversation as a named thread in the left sidebar. Threads persist across application restarts and can be renamed, archived, or deleted. For users who have many ongoing conversations — multiple projects, different model experiments, various task types — Jan’s thread organisation is more systematic than Open WebUI’s flat conversation list. You can group threads by use case and quickly navigate between them without losing context.

Jan stores all thread data locally as JSON files in the Jan data directory (~/jan on macOS/Linux, %APPDATA%\jan on Windows). This makes backup and migration straightforward — copy the directory to a new machine and all your conversations and model configurations come with it. It also means conversations are easy to parse programmatically if you want to extract or analyse your interaction history.
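If you do want to mine that history, a rough sketch along these lines can work. Note that the exact directory layout and JSON schema vary between Jan versions, so the 'role'/'content' keys below are an assumption to verify against your own files:

import json
from pathlib import Path

# Default Jan data directory on macOS/Linux; use %APPDATA%\jan on Windows.
data_dir = Path.home() / 'jan'

# Walk every JSON file and print anything shaped like a chat message.
for path in data_dir.rglob('*.json'):
    try:
        data = json.loads(path.read_text(encoding='utf-8'))
    except (json.JSONDecodeError, OSError):
        continue
    items = data if isinstance(data, list) else [data]
    for item in items:
        if isinstance(item, dict) and 'role' in item and 'content' in item:
            print(f"{path.name}: [{item['role']}] {str(item['content'])[:80]}")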

Configuring Jan for Different Use Cases

Jan’s thread-level configuration lets you tailor settings for different types of work without creating separate model configurations. For creative writing, set temperature to 0.8–1.0 and context length to the maximum your hardware supports. For factual Q&A and code assistance, lower temperature to 0.1–0.3 for more consistent outputs. For document analysis where you are pasting long content directly, set context length as high as possible (up to the model’s maximum) and temperature to 0.2. These settings are saved per thread, so you can have different threads preconfigured for different tasks and switch between them without adjusting settings each time.

The system prompt field in thread settings is where most of Jan’s practical customisation happens. A good system prompt transforms a general-purpose model into a focused task assistant — a one-paragraph description of the role, output format, and any constraints is usually enough to produce dramatically more consistent and useful responses than prompting without a system prompt. Jan saves the system prompt with the thread, so your carefully tuned prompts persist across sessions and do not need to be re-entered each time you open a thread.
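The same two knobs, system prompt and temperature, are available per request if you drive Jan through its API server instead of the GUI. A minimal sketch, with a placeholder model ID:

from openai import OpenAI

client = OpenAI(base_url='http://localhost:1337/v1', api_key='jan')

response = client.chat.completions.create(
    model='llama3.2-3b-instruct',  # placeholder; check /v1/models
    messages=[
        # The system message plays the role of the thread's system prompt.
        {'role': 'system',
         'content': 'You are a code reviewer. Reply with a bullet list of '
                    'issues, most severe first. Be terse.'},
        {'role': 'user', 'content': 'def add(a, b): return a - b'},
    ],
    temperature=0.2,  # low temperature for consistent, factual output
)
print(response.choices[0].message.content)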

Jan as a Starting Point for Local LLM Beginners

If you are recommending a local LLM tool to someone who has never used one before, Jan is often the most approachable entry point. It requires no technical background — the installation is a standard double-click installer, model downloading is point-and-click with RAM requirements shown clearly, and the chat interface is immediately familiar to anyone who has used ChatGPT or similar tools. There is nothing to configure at the command line, no Docker to install, and no settings to tune before getting a working chat session. Once someone is comfortable with Jan and understands what local LLMs can and cannot do, they can graduate to Ollama or LM Studio when they need more control. For the first local LLM experience, Jan’s combination of simplicity, privacy, and full offline capability makes it an excellent starting point.

Updating Jan

Jan checks for updates automatically on launch and prompts you to download the latest version when one is available. Updates install by downloading the new installer and running it; your models, threads, and settings are preserved because they are stored in the Jan data directory, separately from the application binaries. If you want to stay on a specific version (for example, if a new version introduces a regression with a model you rely on), Jan's GitHub releases page lists every version with direct download links and detailed changelogs.

The model files themselves never need updating separately: the GGUF format is stable, and model files downloaded for one version of Jan work with all subsequent versions. New model versions from model developers require downloading the new GGUF files, but this is a user choice rather than an automatic update, giving you full control over which model weights are running on your machine at all times. This level of transparency and control over the software stack is one of the defining characteristics that sets Jan apart from closed-source alternatives in the local LLM space, and it matters more as AI tools become more deeply integrated into professional workflows.
