DeepSeek Models Explained: R1, V3, and How They Compare to GPT-4o and Claude

Who Is DeepSeek and Why Do They Matter?

DeepSeek is a Chinese AI research lab founded in 2023 as a subsidiary of High-Flyer, a quantitative hedge fund. In a little over a year, they released a series of models that genuinely surprised the AI research community — not because they were incremental improvements, but because they delivered frontier-level capability at dramatically lower training costs than their Western counterparts, and because they published their methods openly. DeepSeek’s work demonstrated that the compute efficiency gap between leading Chinese and Western AI labs had narrowed substantially, and their open-source releases gave the broader community access to genuinely capable models without API costs.

DeepSeek V3: The Frontier Open-Source Model

DeepSeek V3, released in December 2024, is a 671-billion-parameter Mixture-of-Experts (MoE) model that activates 37 billion parameters per token. Because only the experts selected for each token are computed, inference compute is closer to that of a dense 37B model than a dense 671B model, although all 671B parameters must still be held in memory. The result is a model that is significantly cheaper to run per token than its parameter count suggests.
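
To make the "only some parameters are active" idea concrete, here is a minimal sketch of generic top-k expert routing. It is a toy illustration, not DeepSeek's actual router: the eight scalar "experts", the gate scores, and the softmax-over-selected-experts weighting are all assumptions chosen for readability. Only the 37B and 671B figures come from the article.

```python
import math

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, gate_logits, k):
    """Weighted sum of the k selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))

# Toy setup: 8 scalar "experts", 2 active per token (illustrative sizes only).
experts = [lambda x, s=s: s * x for s in range(1, 9)]
gate_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

routes = top_k_route(gate_logits, k=2)
output = moe_forward(1.0, experts, gate_logits, k=2)

# Share of V3's parameters active per token, using the figures quoted above.
active_fraction = 37 / 671
print(routes, output, f"{active_fraction:.1%}")
```

In this toy run only experts 4 and 1 are evaluated for the token; the other six contribute no compute, which is the property that keeps per-token cost tied to the active parameter count rather than the total.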

DeepSeek V3’s benchmark performance placed it at or near the top of open-source models at release, competitive with GPT-4o and Claude Sonnet on most standard evaluations. Its coding capability was particularly noted — it topped several coding benchmarks and was widely adopted by developers as a strong open alternative to frontier coding models. The model weights are available on Hugging Face under a permissive licence, though running the full model requires substantial infrastructure (multiple H100s or equivalent). Quantised versions and API access through providers like Together AI make it accessible without dedicated GPU clusters.

DeepSeek R1: Reasoning with Chain-of-Thought

DeepSeek R1, released in January 2025, is DeepSeek’s answer to OpenAI’s o1 — a model specifically trained for complex reasoning via reinforcement learning on chain-of-thought traces. R1 “thinks” before answering, generating extended internal reasoning that it then summarises into a final response. This approach produces significantly better results on mathematical reasoning, scientific problems, and complex logic tasks than standard instruction-tuned models.

What made R1’s release particularly impactful was DeepSeek’s decision to release it as open-source with full model weights. OpenAI’s o1 and o3 are closed, API-only models. R1 being open-source meant that teams could run it locally, fine-tune it, and study its reasoning training methodology — capabilities not available for any comparable Western reasoning model.

DeepSeek also released a series of R1 distilled models: smaller Qwen- and Llama-based models (1.5B, 7B, 8B, 14B, 32B, 70B) fine-tuned on R1-generated outputs to replicate its reasoning behaviour. The R1-Distill-Qwen-32B and R1-Distill-Llama-70B variants in particular attracted significant attention for delivering surprisingly strong reasoning at sizes that fit on one or two high-end consumer GPUs when quantised.

Benchmark Performance: R1 and V3 vs. GPT-4o and Claude

Benchmark              | DeepSeek R1 | DeepSeek V3 | GPT-4o | Claude 3.5 Sonnet
-----------------------|-------------|-------------|--------|------------------
MMLU                   |    90.8     |    88.5     |  88.7  |      88.3
MATH-500               |    97.3     |    90.2     |  76.6  |      78.3
HumanEval (coding)     |    92.1     |    89.1     |  90.2  |      93.7
LiveCodeBench          |    65.9     |    36.2     |  32.9  |      N/A
GPQA Diamond (science) |    71.5     |    59.1     |  53.6  |      65.0

R1 leads significantly on mathematical reasoning (MATH-500: 97.3 vs GPT-4o’s 76.6) — a direct consequence of its reasoning training. V3 is competitive with GPT-4o and Claude Sonnet across general benchmarks. Both models significantly outperform their Western open-source contemporaries and match or exceed many frontier closed models on specific tasks.

Running DeepSeek Models Locally

DeepSeek R1 distilled models are the most practical for local deployment. The 7B and 8B variants run on a single RTX 4090 (24 GB); the 14B runs on 24 GB at Q4; the 32B requires 24+ GB at Q4; the 70B requires dual 4090s or an A100 80GB.
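
The hardware figures above can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes roughly 4.5 bits per weight for a Q4-style quantisation and a flat 20% allowance for KV cache and runtime buffers. Both numbers are assumptions for illustration, and real usage varies with context length and runtime.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8

OVERHEAD = 1.2        # assumed 20% allowance for KV cache and runtime buffers
Q4_BITS = 4.5         # assumed effective bits per weight for Q4-style quantisation

for size in (7, 8, 14, 32, 70):
    q4 = weight_memory_gb(size, Q4_BITS) * OVERHEAD
    fp16 = weight_memory_gb(size, 16) * OVERHEAD
    print(f"{size:>3}B   Q4 ~ {q4:5.1f} GB   FP16 ~ {fp16:6.1f} GB")
```

Under these assumptions the 32B model at Q4 lands just under 24 GB and the 70B just under 48 GB, which is why the former is a tight fit on one RTX 4090 and the latter needs two cards or an 80 GB accelerator.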

# Pull R1 distilled models via Ollama
ollama pull deepseek-r1:7b
ollama pull deepseek-r1:14b
ollama pull deepseek-r1:32b
ollama pull deepseek-r1:70b

# R1 shows its thinking process: you'll see <think>...</think> blocks
ollama run deepseek-r1:14b "Prove that sqrt(2) is irrational."

The thinking process output can be verbose — R1 models generate extended internal monologue before the final answer. For applications that want only the final answer without the reasoning trace, strip the content between <think> tags from the output before displaying it to users.
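
A minimal sketch of that stripping step, assuming the full response is available as one string and the reasoning is wrapped in a well-formed <think>...</think> pair. The helper name strip_reasoning is hypothetical.

```python
import re

# Non-greedy match so multiple <think> blocks are removed independently;
# DOTALL lets "." span the newlines inside the reasoning trace.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_reasoning(text):
    """Remove <think>...</think> blocks and return only the final answer."""
    return THINK_BLOCK.sub("", text).strip()

raw = "<think>\nAssume sqrt(2) = p/q in lowest terms...\n</think>\n\nsqrt(2) is irrational."
print(strip_reasoning(raw))
```

This does not handle a streamed response where the closing tag has not arrived yet; in that case, buffer the output until </think> appears before displaying anything.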

API Access: DeepSeek Platform vs. Third-Party Providers

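DeepSeek operates its own API platform (api.deepseek.com) with pricing significantly below Western providers. At the rates published around R1’s launch, DeepSeek V3 cost roughly $0.27/$1.10 per million input/output tokens and R1 roughly $0.55/$2.19. That made V3 close to 10x cheaper than GPT-4o at comparable capability levels, with R1 still several times cheaper. Check the current price list before budgeting, since these rates change. The API is OpenAI-compatible, meaning existing OpenAI SDK code can point at DeepSeek’s endpoint by changing the base URL, API key, and model name: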

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key="your-deepseek-api-key"
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # R1
    messages=[{"role": "user", "content": "Solve: find all integer solutions to x^2 - y^2 = 100"}]
)
print(response.choices[0].message.content)
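
To put the per-token prices in context, here is a rough cost comparison for a hypothetical monthly workload. The DeepSeek prices are the $0.27/$1.10 figures quoted above. The GPT-4o prices of $2.50/$10.00 per million tokens are an assumption that does not come from this article, and the token volumes are invented for illustration.

```python
def cost_usd(input_tokens, output_tokens, price_in, price_out):
    """Cost in USD, given prices per million input and output tokens."""
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out

# Hypothetical monthly workload.
INPUT_TOKENS = 50_000_000
OUTPUT_TOKENS = 10_000_000

deepseek = cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, 0.27, 1.10)   # prices quoted in this article
gpt4o = cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, 2.50, 10.00)     # assumed prices, verify before use

print(f"DeepSeek: ${deepseek:.2f}   GPT-4o: ${gpt4o:.2f}   ratio: {gpt4o / deepseek:.1f}x")
```

With these inputs the gap comes out at roughly 9x. Reasoning models also emit many more output tokens per request, so the effective ratio for R1 depends heavily on how long its reasoning traces run.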

Third-party providers (Together AI, Fireworks, Groq) also host DeepSeek models, often with similar pricing and sometimes better latency depending on your region. For teams with data residency requirements that preclude using Chinese-operated infrastructure, these third-party US-hosted options are the practical alternative.

Data Residency and Trust Considerations

DeepSeek’s API routes data through servers in China, which creates data residency concerns for many enterprise and regulated workloads. This is a genuine constraint rather than a theoretical one — legal, compliance, and security teams in financial services, healthcare, government, and other regulated industries will often rule out DeepSeek’s own API for production use regardless of the model quality. For these use cases, the practical path is self-hosting the open-source model weights on your own infrastructure, or using a US-based third-party inference provider hosting the weights. The model weights themselves are freely available and do not create data residency issues — only sending data to DeepSeek’s API does.

When to Choose DeepSeek R1 vs. V3

R1 is purpose-built for reasoning-heavy tasks: mathematics, scientific analysis, complex logic, step-by-step problem solving. If your task benefits from extended thinking and showing working, R1 is the right choice. Expect slower generation (more tokens for the reasoning trace) and a different output format that includes the thought process. V3 is a general-purpose model optimised for throughput and breadth — coding, writing, summarisation, Q&A. It is faster than R1 for tasks that do not require extended reasoning. Use V3 as your default open-source frontier model and R1 specifically when the reasoning capability matters to your use case.
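
As a sketch of how that default-plus-exception rule might look in application code, the helper below routes by task category. The category labels and the function name pick_model are hypothetical. The model identifiers are the ones used with DeepSeek’s API elsewhere in this article.

```python
# Task categories that benefit from extended reasoning (labels are illustrative).
REASONING_TASKS = {"math", "science", "logic", "step_by_step"}

def pick_model(task_type: str) -> str:
    """Return the DeepSeek API model name for a task category.

    V3 ("deepseek-chat") is the default; R1 ("deepseek-reasoner") is used
    only when the task is reasoning-heavy.
    """
    return "deepseek-reasoner" if task_type in REASONING_TASKS else "deepseek-chat"

for task in ("math", "summarisation", "coding", "logic"):
    print(f"{task:>14} -> {pick_model(task)}")
```

Keeping V3 as the fallback means an unrecognised task type gets the faster, cheaper model rather than paying for a reasoning trace it may not need.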

DeepSeek’s Impact on the AI Landscape

DeepSeek’s releases in late 2024 and early 2025 had an outsized impact on the broader AI landscape beyond their direct utility. Their published training efficiency claims suggested that frontier model training was becoming dramatically more accessible: V3’s final training run used 2,048 NVIDIA H800 GPUs and roughly 2.8 million GPU-hours, which DeepSeek priced at about $5.6 million, excluding prior research and ablation experiments. This prompted a reassessment of the GPU demand projections that had driven NVIDIA’s valuation. Their open-source releases demonstrated that reasoning models and MoE architectures could be openly published without the capability sacrifices that had previously been assumed necessary. For practitioners, the main takeaway is practical: DeepSeek models are among the most capable open-source options available, competitive with Western frontier models on many tasks, available for local deployment, and accessible via API at significantly lower cost than GPT-4o or Claude.
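
The headline training cost can be reproduced with simple arithmetic. The inputs below are taken from DeepSeek’s V3 technical report rather than from this article: about 2.788 million H800 GPU-hours on a 2,048-GPU cluster, priced at an assumed rental rate of $2 per GPU-hour. The figure covers the final training run only.

```python
GPU_HOURS = 2_788_000      # H800 GPU-hours reported for the full V3 training run
USD_PER_GPU_HOUR = 2.00    # rental rate assumed in DeepSeek's own estimate
CLUSTER_GPUS = 2048        # GPUs in the training cluster

cost = GPU_HOURS * USD_PER_GPU_HOUR
wall_clock_days = GPU_HOURS / CLUSTER_GPUS / 24

print(f"Estimated cost: ${cost / 1e6:.2f}M")
print(f"Equivalent wall-clock time on the full cluster: {wall_clock_days:.0f} days")
```

The result is sensitive to the assumed hourly rate, and it leaves out salaries, failed experiments, and the cost of owning the hardware, so it is best read as a lower bound on what the model cost to produce.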

DeepSeek’s Impact on the AI Landscape

DeepSeek’s releases had outsized impact beyond their direct utility. Their published training efficiency claims — training V3 for roughly $6 million — suggested frontier model training was becoming dramatically more accessible and prompted reassessment of GPU demand projections that had driven NVIDIA’s valuation. Their open-source releases demonstrated that reasoning models and MoE architectures could be openly published without the capability sacrifices previously assumed necessary.

For practitioners, the takeaway is practical: DeepSeek models are among the most capable open-source options available, competitive with Western frontier models on many tasks, deployable locally under an open licence, and accessible via API at 5–10x lower cost than GPT-4o or Claude Sonnet. They are not a replacement for every use case — Claude Opus 4 still leads on nuanced reasoning and creative tasks, and GPT-4o has a more mature tool-use ecosystem — but for cost-sensitive applications requiring high general capability, DeepSeek V3 and R1 have earned their place as serious production options in 2026.

Using DeepSeek with LangChain and LlamaIndex

Because DeepSeek’s API is OpenAI-compatible, integration with LangChain and LlamaIndex requires only a configuration change — no custom connectors needed:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.deepseek.com",
    api_key="your-deepseek-key",
    model="deepseek-chat"  # V3, or "deepseek-reasoner" for R1
)

# Works with all LangChain chains, agents, and tools
response = llm.invoke("Explain the MoE architecture of DeepSeek V3.")

For local deployment through Ollama, point LangChain’s Ollama integration at the deepseek-r1 model using the same patterns as any other Ollama model. The OpenAI compatibility layer means that switching between DeepSeek’s hosted API, a third-party provider like Together AI, and a local Ollama instance is a one-line configuration change — making it easy to develop locally and deploy to the hosted API, or to fall back between providers based on availability and cost.
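
One way to keep that switch in a single place is a small provider table, sketched below. In practice the base URL, model name, and API key all change together. The Together AI base URL and model identifier, the environment variable names, and the helper client_kwargs are assumptions for illustration; the Ollama entry relies on Ollama’s OpenAI-compatible endpoint under /v1, which ignores the API key.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Provider:
    base_url: str
    model: str
    api_key_env: str   # name of the environment variable holding the key

PROVIDERS = {
    "deepseek": Provider("https://api.deepseek.com", "deepseek-chat", "DEEPSEEK_API_KEY"),
    # Assumed endpoint and model id; check the provider's documentation.
    "together": Provider("https://api.together.xyz/v1", "deepseek-ai/DeepSeek-V3", "TOGETHER_API_KEY"),
    # Local Ollama server; the key is unused but the client requires a value.
    "ollama": Provider("http://localhost:11434/v1", "deepseek-r1:14b", "OLLAMA_API_KEY"),
}

def client_kwargs(name, env):
    """Build keyword arguments for an OpenAI-compatible client from a provider name."""
    provider = PROVIDERS[name]
    return {
        "base_url": provider.base_url,
        "model": provider.model,
        "api_key": env.get(provider.api_key_env, "not-needed"),
    }

print(client_kwargs("ollama", {}))
```

With this in place, something like ChatOpenAI(**client_kwargs("ollama", os.environ)) would select the provider from one string, which also makes a simple fallback order between providers easy to express.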

DeepSeek Compared to Other Open-Source Frontier Models

In the open-source landscape, DeepSeek V3 and R1 sit at the top tier alongside Llama 3.3 70B and Qwen 2.5 72B. For general tasks, all three are competitive — the differences are often smaller than the task-to-task variation within a single model. DeepSeek V3 tends to edge out Llama 3.3 70B on coding benchmarks. Qwen 2.5 72B tends to edge out both on Chinese language tasks and certain mathematical reasoning tests. Llama 3.3 70B has the largest ecosystem, the most community tooling, and the most deployment documentation. The practical recommendation: if you are starting fresh, benchmark all three on your specific use case rather than defaulting to any one. If you need the strongest available reasoning without cost constraints, DeepSeek R1 is the open-source benchmark. If you need the most battle-tested open-source deployment with the largest community, Llama 3.3 70B remains the default.

Should You Use DeepSeek in Production?

For non-regulated workloads where data can go to external APIs, DeepSeek’s hosted API is a genuinely compelling option — frontier capability at a fraction of Western provider costs, with excellent OpenAI SDK compatibility. The main operational risks to evaluate are API reliability (DeepSeek’s infrastructure is newer and has had occasional capacity issues during high-demand periods) and the data residency question for any sensitive data. For regulated workloads, self-hosting the open-source weights resolves the data residency concern and eliminates dependency on DeepSeek’s API availability entirely. Either way, the model quality warrants serious evaluation — dismissing DeepSeek models based on their origin rather than their capability would mean missing some of the most cost-effective open-source LLM options available in 2026.
