Multi-agent systems represent one of the most powerful patterns in AI development, enabling complex tasks to be decomposed across specialized agents that collaborate to achieve goals beyond what any single agent could accomplish. While a single LLM agent can handle straightforward tasks, real-world applications often require orchestrating multiple specialized agents—one for research, another for data analysis, a third for content generation—each bringing unique capabilities to solve different aspects of complex problems.
LangChain has emerged as the leading framework for building these sophisticated multi-agent systems, providing the infrastructure to create, coordinate, and manage agent interactions. This comprehensive guide walks through building a complete multi-agent system from scratch, covering agent design patterns, communication protocols, state management, and orchestration strategies that enable agents to work together effectively rather than operating as isolated tools.
Understanding Multi-Agent Architecture Fundamentals
Before diving into implementation, grasping the core concepts that differentiate multi-agent systems from single-agent approaches clarifies why and when this added complexity delivers value.
What Makes a System “Multi-Agent”
A multi-agent system isn’t simply running multiple agents sequentially—it’s creating an architecture where agents coordinate, share information, and make collective decisions. Key characteristics include:
Agent specialization where each agent excels at specific tasks rather than attempting to be a generalist. A research agent focuses on finding information, a writing agent on creating content, and an analysis agent on extracting insights.
Agent communication through defined protocols that enable agents to pass information, request assistance, and coordinate actions. Agents don’t just execute in isolation—they interact and influence each other’s behavior.
Shared state management where agents access and modify common state, maintaining context about what the system has accomplished and what remains to be done.
Hierarchical or collaborative orchestration where either a supervisor agent coordinates subordinates, or peer agents negotiate and collaborate to divide work among themselves.
The Supervisor Pattern vs. Peer Collaboration
Two primary architectural patterns dominate multi-agent systems:
Supervisor pattern employs a central coordinator agent that receives requests, delegates to specialized worker agents, synthesizes their outputs, and makes final decisions. This pattern provides clear control flow and makes reasoning about system behavior easier.
Peer collaboration allows agents to communicate directly with each other, negotiating responsibilities and sharing information without central coordination. This pattern offers more flexibility but introduces complexity in managing agent interactions.
Most production systems use the supervisor pattern for its predictability, though hybrid approaches combining both patterns handle certain scenarios effectively.
Setting Up Your Multi-Agent Development Environment
Proper environment setup prevents common issues and establishes good practices from the start.
Installing Required Dependencies
Create a fresh Python environment and install LangChain with necessary extensions:
python -m venv multi-agent-env
source multi-agent-env/bin/activate # Windows: multi-agent-env\Scripts\activate
pip install langchain langchain-openai langchain-community langgraph chromadb python-dotenv google-search-results
These packages provide LangChain’s core functionality and community integrations, OpenAI integration (you can substitute other LLM providers), LangGraph for advanced agent orchestration, ChromaDB for retrieval capabilities, environment variable management, and the SerpAPI client (google-search-results) used for web search later in this guide.
Environment Configuration
Create a .env file for API keys and configuration:
OPENAI_API_KEY=your-api-key-here
SERPAPI_API_KEY=your-serpapi-key # For web search capabilities
MODEL_NAME=gpt-4-turbo-preview
TEMPERATURE=0.3
Load these in your application:
from dotenv import load_dotenv
import os

load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
MODEL_NAME = os.getenv("MODEL_NAME", "gpt-4-turbo-preview")
TEMPERATURE = float(os.getenv("TEMPERATURE", "0.3"))
Building Your First Multi-Agent System: Research Assistant
We’ll build a practical multi-agent research assistant that coordinates three specialized agents: a researcher who finds information, an analyst who extracts insights, and a writer who synthesizes findings into coherent reports.
Defining the Agent State Schema
Multi-agent systems require shared state that all agents can access and modify:
from typing import TypedDict, Annotated, Sequence
import operator
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]
    research_query: str
    research_results: str
    analysis: str
    final_report: str
    next_agent: str
This state schema defines what information flows through the system. The messages field maintains conversation history, while specific fields store outputs from each agent. The Annotated type with operator.add ensures messages accumulate rather than overwrite.
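To make the merge behavior concrete, here is a minimal sketch of what the reducer does when a node returns a partial update (illustrative only—LangGraph applies the reducer internally):

import operator
from langchain_core.messages import HumanMessage

existing = [HumanMessage(content="first")]
update = [HumanMessage(content="second")]
# Fields annotated with operator.add are concatenated on each update...
merged = operator.add(existing, update)  # history accumulates: [first, second]
# ...while plain fields like research_results are simply overwritten.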
Creating Specialized Agent Functions
Each agent is implemented as a function that receives state, performs its specialized task, and returns updated state:
from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_functions_agent, AgentExecutor
from langchain.tools import Tool
from langchain import hub

llm = ChatOpenAI(model=MODEL_NAME, temperature=TEMPERATURE)

def create_researcher_agent():
    """Agent specialized in finding information"""
    # Define research tools
    def web_search(query: str) -> str:
        """Search the web for information"""
        # SerpAPIWrapper needs the google-search-results package
        # and the SERPAPI_API_KEY environment variable
        from langchain_community.utilities import SerpAPIWrapper
        search = SerpAPIWrapper()
        return search.run(query)

    tools = [
        Tool(
            name="WebSearch",
            func=web_search,
            description="Search the web for current information on any topic"
        )
    ]
    prompt = hub.pull("hwchase17/openai-functions-agent")
    agent = create_openai_functions_agent(llm, tools, prompt)
    return AgentExecutor(agent=agent, tools=tools, verbose=True)

def researcher_node(state: AgentState) -> AgentState:
    """Research agent that finds information"""
    agent = create_researcher_agent()
    result = agent.invoke({
        "input": f"Research the following topic thoroughly: {state['research_query']}"
    })
    return {
        "research_results": result["output"],
        "next_agent": "analyst"
    }

def analyst_node(state: AgentState) -> AgentState:
    """Analysis agent that extracts insights"""
    analysis_prompt = f"""
    Analyze the following research results and extract key insights:

    Research Results:
    {state['research_results']}

    Provide:
    1. Main findings
    2. Important patterns or trends
    3. Potential implications
    """
    response = llm.invoke(analysis_prompt)
    return {
        "analysis": response.content,
        "next_agent": "writer"
    }

def writer_node(state: AgentState) -> AgentState:
    """Writing agent that creates final report"""
    writing_prompt = f"""
    Create a comprehensive report based on this research and analysis:

    Research Results:
    {state['research_results']}

    Analysis:
    {state['analysis']}

    Write a clear, well-structured report that synthesizes these findings.
    """
    response = llm.invoke(writing_prompt)
    return {
        "final_report": response.content,
        "next_agent": "END"
    }
Each agent function receives the current state, performs its specialized task, and returns partial state updates that get merged into the overall state.
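Because each node is a plain function of state, it can be exercised in isolation before wiring up the graph. A minimal sketch, where the stub classes are hypothetical stand-ins for the module-level llm:

class _StubMessage:
    def __init__(self, content):
        self.content = content

class _StubLLM:
    """Hypothetical offline stand-in for ChatOpenAI, for a dry run."""
    def invoke(self, prompt):
        return _StubMessage("1. Main findings: ...")

llm = _StubLLM()  # temporarily rebind the module-level model
state = {
    "messages": [], "research_query": "quantum computing",
    "research_results": "Qubit counts increased substantially.",
    "analysis": "", "final_report": "", "next_agent": ""
}
print(analyst_node(state))  # {'analysis': '1. Main findings: ...', 'next_agent': 'writer'}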
Orchestrating Agent Execution with LangGraph
LangGraph provides the infrastructure to connect agents and manage execution flow:
from langgraph.graph import StateGraph, END

def create_research_workflow():
    # Create the graph
    workflow = StateGraph(AgentState)

    # Add agent nodes
    workflow.add_node("researcher", researcher_node)
    workflow.add_node("analyst", analyst_node)
    workflow.add_node("writer", writer_node)

    # Define the flow
    workflow.add_edge("researcher", "analyst")
    workflow.add_edge("analyst", "writer")
    workflow.add_edge("writer", END)

    # Set entry point
    workflow.set_entry_point("researcher")

    # Compile into executable graph
    return workflow.compile()

# Use the workflow
app = create_research_workflow()

# Execute the multi-agent system
result = app.invoke({
    "research_query": "What are the latest developments in quantum computing?",
    "messages": [],
    "research_results": "",
    "analysis": "",
    "final_report": "",
    "next_agent": ""
})
print(result["final_report"])
This creates a linear pipeline where each agent executes in sequence. The graph structure makes the flow explicit and easy to modify.
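If you want to observe each agent as it finishes rather than waiting for the final state, the compiled graph also supports streaming. A short sketch:

# Stream node-by-node updates instead of waiting for the final result
for step in app.stream({
    "research_query": "What are the latest developments in quantum computing?",
    "messages": [], "research_results": "", "analysis": "",
    "final_report": "", "next_agent": ""
}):
    # Each step maps a node name to the partial state that node returned
    print(list(step.keys()))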
[Figure: multi-agent workflow visualization]
Implementing Dynamic Agent Routing
Linear pipelines work for straightforward workflows, but complex tasks require dynamic routing where the system decides which agent to invoke based on current state.
Conditional Routing Logic
def router_node(state: AgentState) -> str:
    """Determines which agent should execute next"""
    # If no research results yet, go to researcher
    if not state.get("research_results"):
        return "researcher"
    # If research done but no analysis, go to analyst
    if state.get("research_results") and not state.get("analysis"):
        return "analyst"
    # If analysis done, check if we need more research
    if state.get("analysis"):
        # Use LLM to determine if research is sufficient
        decision_prompt = f"""
        Based on this analysis:
        {state['analysis']}

        Do we need more research, or can we proceed to writing?
        Answer with RESEARCH or WRITE
        """
        response = llm.invoke(decision_prompt)
        if "RESEARCH" in response.content:
            return "researcher"
        else:
            return "writer"
    return END
# Update workflow with conditional routing
workflow = StateGraph(AgentState)
workflow.add_node("researcher", researcher_node)
workflow.add_node("analyst", analyst_node)
workflow.add_node("writer", writer_node)

# Add conditional edges based on router logic
workflow.add_conditional_edges(
    "researcher",
    lambda state: "analyst",  # Always go to analyst after research
)
workflow.add_conditional_edges(
    "analyst",
    router_node,  # Conditionally go to researcher or writer
    {
        "researcher": "researcher",
        "writer": "writer",
        END: END
    }
)
workflow.add_edge("writer", END)
workflow.set_entry_point("researcher")

app = workflow.compile()
This routing logic enables loops—the analyst might determine more research is needed, sending execution back to the researcher before proceeding to the writer.
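Because loops can now occur, it is worth bounding them. LangGraph enforces a configurable recursion limit per run and raises GraphRecursionError when a run exceeds it; a brief sketch:

from langgraph.errors import GraphRecursionError

initial_state = {
    "research_query": "What are the latest developments in quantum computing?",
    "messages": [], "research_results": "", "analysis": "",
    "final_report": "", "next_agent": ""
}
try:
    # Cap total node executions for a single run
    result = app.invoke(initial_state, config={"recursion_limit": 10})
except GraphRecursionError:
    print("Research/analysis loop exceeded its step budget; aborting gracefully.")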
Building a Supervisor-Based Multi-Agent System
The supervisor pattern provides more sophisticated orchestration where a coordinator agent manages worker agents.
Creating the Supervisor Agent
from langchain.tools import tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

class SupervisorAgent:
    def __init__(self, worker_agents: dict):
        self.workers = worker_agents
        self.llm = ChatOpenAI(model=MODEL_NAME, temperature=0)

    # Defined without self: @tool turns these into Tool objects at class creation
    @tool
    def delegate_to_researcher(query: str) -> str:
        """Delegate research tasks to the researcher agent"""
        return "Delegated to researcher"

    @tool
    def delegate_to_analyst(data: str) -> str:
        """Delegate analysis tasks to the analyst agent"""
        return "Delegated to analyst"

    @tool
    def delegate_to_writer(content: str) -> str:
        """Delegate writing tasks to the writer agent"""
        return "Delegated to writer"

    def supervise(self, state: AgentState) -> AgentState:
        """Supervisor decides which agent to invoke"""
        # create_openai_functions_agent expects a ChatPromptTemplate with an
        # agent_scratchpad placeholder, not a plain string
        supervision_prompt = ChatPromptTemplate.from_messages([
            ("system",
             "You are a supervisor coordinating a team of agents to complete research tasks.\n"
             "Available agents:\n"
             "- researcher: Finds information from various sources\n"
             "- analyst: Analyzes data and extracts insights\n"
             "- writer: Creates reports and documentation\n"
             "Decide which agent should work next, or if the task is complete."),
            ("human", "{input}"),
            MessagesPlaceholder(variable_name="agent_scratchpad"),
        ])
        tools = [
            self.delegate_to_researcher,
            self.delegate_to_analyst,
            self.delegate_to_writer
        ]
        agent = create_openai_functions_agent(self.llm, tools, supervision_prompt)
        executor = AgentExecutor(agent=agent, tools=tools)
        result = executor.invoke({
            "input": (
                f"Query: {state['research_query']}\n"
                f"Research completed: {bool(state.get('research_results'))}\n"
                f"Analysis completed: {bool(state.get('analysis'))}\n"
                f"Report completed: {bool(state.get('final_report'))}\n"
                "What should we do next?"
            )
        })
        # Parse the result to determine next action
        # This is simplified - production code needs robust parsing
        output = result["output"].lower()
        if "researcher" in output:
            return self.workers["researcher"](state)
        elif "analyst" in output:
            return self.workers["analyst"](state)
        elif "writer" in output:
            return self.workers["writer"](state)
        else:
            return {"next_agent": "END"}
The supervisor uses its own reasoning to decide which worker agent should execute next based on current state and task requirements.
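One way to avoid the brittle substring matching above is to ask the model for a structured routing decision instead. A hedged sketch using with_structured_output, where the NextAgent schema is our own invention, not part of LangChain:

from typing import Literal
from pydantic import BaseModel

class NextAgent(BaseModel):
    """Routing decision the supervisor must return."""
    agent: Literal["researcher", "analyst", "writer", "END"]

router_llm = ChatOpenAI(model=MODEL_NAME, temperature=0).with_structured_output(NextAgent)
decision = router_llm.invoke(
    "Research completed: True. Analysis completed: False. "
    "Which agent should work next: researcher, analyst, writer, or END?"
)
print(decision.agent)  # e.g. "analyst", with no free-text parsing required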
Managing Agent Communication and Collaboration
Effective multi-agent systems require agents to communicate beyond simple sequential handoffs.
Implementing Agent-to-Agent Messages
from langchain_core.messages import HumanMessage, AIMessage

def create_communicating_agents():
    """Agents that can send messages to each other.

    perform_research, perform_analysis, needs_more_research, and
    identify_gaps are placeholders for your own implementations.
    """
    def researcher_with_communication(state: AgentState) -> AgentState:
        # Perform research
        results = perform_research(state['research_query'])
        # Send message to analyst with specific questions
        message_to_analyst = HumanMessage(
            content=f"I found this information: {results}. "
                    "Can you identify any gaps or areas needing deeper investigation?"
        )
        return {
            "research_results": results,
            "messages": [message_to_analyst],
            "next_agent": "analyst"
        }

    def analyst_with_feedback(state: AgentState) -> AgentState:
        # Read the researcher's message for context
        last_message = state["messages"][-1] if state["messages"] else None
        # Analyze and provide feedback
        analysis = perform_analysis(state['research_results'])
        # Might request more research
        if needs_more_research(analysis):
            feedback = HumanMessage(
                content=f"Analysis reveals gaps. Please research: {identify_gaps(analysis)}"
            )
            return {
                "messages": [feedback],
                "next_agent": "researcher"  # Loop back
            }
        else:
            return {
                "analysis": analysis,
                "messages": [AIMessage(content="Analysis complete")],
                "next_agent": "writer"
            }

    return researcher_with_communication, analyst_with_feedback
This pattern enables agents to provide feedback, request additional work, and iterate until reaching satisfactory results.
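Feedback loops like this need a termination guard so the researcher and analyst cannot bounce work back and forth indefinitely. One hedged approach tracks a revision counter in state (revision_count is an extra field of our own, not part of the AgentState defined earlier; the helper functions are the same placeholders as above):

MAX_REVISIONS = 3

def analyst_with_budget(state: dict) -> dict:
    analysis = perform_analysis(state["research_results"])
    revisions = state.get("revision_count", 0)
    # Loop back only while the revision budget allows it
    if needs_more_research(analysis) and revisions < MAX_REVISIONS:
        return {"revision_count": revisions + 1, "next_agent": "researcher"}
    return {"analysis": analysis, "next_agent": "writer"}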
State Sharing and Coordination
class SharedKnowledgeBase:
    """Shared state accessible to all agents"""
    def __init__(self):
        self.knowledge = {}
        self.agent_outputs = {}

    def store_finding(self, agent_name: str, key: str, value: str):
        """Agent stores a finding for others to access"""
        if agent_name not in self.agent_outputs:
            self.agent_outputs[agent_name] = {}
        self.agent_outputs[agent_name][key] = value

    def get_findings(self, agent_name: str = None) -> dict:
        """Retrieve findings from specific agent or all agents"""
        if agent_name:
            return self.agent_outputs.get(agent_name, {})
        return self.agent_outputs

    def update_knowledge(self, key: str, value: str):
        """Update shared knowledge base"""
        self.knowledge[key] = value

# Use in agent functions
kb = SharedKnowledgeBase()

def collaborative_researcher(state: AgentState) -> AgentState:
    results = perform_research(state['research_query'])
    # Store for other agents
    kb.store_finding("researcher", "primary_sources", results)
    # Check what analyst might need
    analyst_needs = kb.get_findings("analyst")
    return {"research_results": results}
Shared knowledge bases enable agents to coordinate without explicit message passing, useful for complex scenarios where multiple agents contribute to collective understanding.
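Note that the class above assumes agents run one at a time; if you later execute branches in parallel, writes should be synchronized. A minimal thread-safe variant, offered as an illustration:

import threading

class ThreadSafeKnowledgeBase(SharedKnowledgeBase):
    """Guards concurrent writes when agents run in parallel."""
    def __init__(self):
        super().__init__()
        self._lock = threading.Lock()

    def store_finding(self, agent_name: str, key: str, value: str):
        with self._lock:
            super().store_finding(agent_name, key, value)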
Multi-Agent Patterns Comparison
Sequential pipeline
Best for: Well-defined processes with clear stages
Complexity: Low

Dynamic routing
Best for: Tasks requiring iteration and refinement
Complexity: Medium

Supervisor pattern
Best for: Complex tasks requiring intelligent coordination
Complexity: High

Peer collaboration
Best for: Emergent workflows and distributed decision-making
Complexity: Very High
Error Handling and Resilience in Multi-Agent Systems
Production multi-agent systems require robust error handling since failures can occur at any agent in the chain.
Implementing Retry Logic
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def resilient_agent_call(agent_func, state):
    """Call agent with automatic retry on failure"""
    try:
        return agent_func(state)
    except Exception as e:
        print(f"Agent failed: {e}. Retrying...")
        raise

def researcher_node_resilient(state: AgentState) -> AgentState:
    """Researcher with error handling"""
    try:
        return resilient_agent_call(researcher_node, state)
    except Exception as e:
        # Fallback behavior when retries exhausted
        return {
            "research_results": f"Research failed: {str(e)}. Using cached data if available.",
            "next_agent": "analyst"  # Continue workflow with degraded data
        }
Retry logic with exponential backoff handles transient failures like API timeouts, while fallback strategies ensure the system continues operating even when components fail.
Agent Health Monitoring
import time
from datetime import datetime

class AgentMonitor:
    def __init__(self):
        self.agent_stats = {}

    def track_execution(self, agent_name: str, start_time: float, success: bool):
        if agent_name not in self.agent_stats:
            self.agent_stats[agent_name] = {
                "total_calls": 0,
                "successful_calls": 0,
                "failed_calls": 0,
                "avg_duration": 0,
                "last_execution": None
            }
        stats = self.agent_stats[agent_name]
        duration = time.time() - start_time
        stats["total_calls"] += 1
        # Maintain a running average over all calls so far
        stats["avg_duration"] = (
            (stats["avg_duration"] * (stats["total_calls"] - 1) + duration)
            / stats["total_calls"]
        )
        stats["last_execution"] = datetime.now()
        if success:
            stats["successful_calls"] += 1
        else:
            stats["failed_calls"] += 1

    def get_health_report(self) -> dict:
        return {
            agent: {
                **stats,
                "success_rate": stats["successful_calls"] / stats["total_calls"]
                if stats["total_calls"] > 0 else 0
            }
            for agent, stats in self.agent_stats.items()
        }

# Use in agents
monitor = AgentMonitor()

def monitored_researcher(state: AgentState) -> AgentState:
    start = time.time()
    try:
        result = researcher_node(state)
        monitor.track_execution("researcher", start, True)
        return result
    except Exception:
        monitor.track_execution("researcher", start, False)
        raise
Monitoring provides visibility into agent performance, identifying bottlenecks and reliability issues before they impact production.
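For illustration, the collected stats can be surfaced after a batch of runs like this:

# Print a one-line summary per agent
for agent, stats in monitor.get_health_report().items():
    print(f"{agent}: {stats['success_rate']:.0%} success, "
          f"{stats['avg_duration']:.1f}s avg over {stats['total_calls']} calls")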
Conclusion
Building multi-agent systems with LangChain transforms complex AI applications from monolithic, single-model approaches into orchestrations of specialized agents working collaboratively. The patterns covered here form a comprehensive toolkit: sequential pipelines for straightforward workflows, dynamic routing for adaptive processes, supervisor coordination for complex tasks, and peer collaboration for distributed intelligence. Together they let you decompose problems across specialized agents while maintaining coherent execution through shared state and intelligent orchestration.
Success with multi-agent systems comes from matching architectural patterns to your specific requirements rather than over-engineering with unnecessary complexity. Start with a simple sequential pipeline to validate your agent designs and understand state flow, then incrementally add dynamic routing or supervisor patterns as complexity demands. The examples and patterns demonstrated here provide foundations that scale from prototype experiments to deployed applications, with the error handling and monitoring that reliable multi-agent coordination requires.