Multi-agent systems represent one of the most powerful patterns in AI development, enabling complex tasks to be decomposed across specialized agents that collaborate to achieve goals beyond what any single agent could accomplish. While a single LLM agent can handle straightforward tasks, real-world applications often require orchestrating multiple specialized agents—one for research, another for data analysis, a third for content generation—each bringing unique capabilities to solve different aspects of complex problems.
LangChain has emerged as the leading framework for building these sophisticated multi-agent systems, providing the infrastructure to create, coordinate, and manage agent interactions. This comprehensive guide walks through building a complete multi-agent system from scratch, covering agent design patterns, communication protocols, state management, and orchestration strategies that enable agents to work together effectively rather than operating as isolated tools.
Understanding Multi-Agent Architecture Fundamentals
Before diving into implementation, grasping the core concepts that differentiate multi-agent systems from single-agent approaches clarifies why and when this added complexity delivers value.
What Makes a System “Multi-Agent”
A multi-agent system isn’t simply running multiple agents sequentially—it’s creating an architecture where agents coordinate, share information, and make collective decisions. Key characteristics include:
Agent specialization where each agent excels at specific tasks rather than attempting to be a generalist. A research agent focuses on finding information, a writing agent on creating content, and an analysis agent on extracting insights.
Agent communication through defined protocols that enable agents to pass information, request assistance, and coordinate actions. Agents don’t just execute in isolation—they interact and influence each other’s behavior.
Shared state management where agents access and modify common state, maintaining context about what the system has accomplished and what remains to be done.
Hierarchical or collaborative orchestration where either a supervisor agent coordinates subordinates, or peer agents negotiate and collaborate to divide work among themselves.
The Supervisor Pattern vs. Peer Collaboration
Two primary architectural patterns dominate multi-agent systems:
Supervisor pattern employs a central coordinator agent that receives requests, delegates to specialized worker agents, synthesizes their outputs, and makes final decisions. This pattern provides clear control flow and makes reasoning about system behavior easier.
Peer collaboration allows agents to communicate directly with each other, negotiating responsibilities and sharing information without central coordination. This pattern offers more flexibility but introduces complexity in managing agent interactions.
Most production systems use the supervisor pattern for its predictability, though hybrid approaches combining both patterns handle certain scenarios effectively.
Setting Up Your Multi-Agent Development Environment
Proper environment setup prevents common issues and establishes good practices from the start.
Installing Required Dependencies
Create a fresh Python environment and install LangChain with necessary extensions:
python -m venv multi-agent-env
source multi-agent-env/bin/activate # Windows: multi-agent-env\Scripts\activate
pip install langchain langchain-openai langchain-community langgraph chromadb python-dotenv google-search-results
These packages provide LangChain’s core functionality and community integrations, OpenAI integration (you can substitute other LLM providers), LangGraph for advanced agent orchestration, ChromaDB for retrieval capabilities, environment variable management, and the SerpAPI client (google-search-results) used for web search later in this guide.
Environment Configuration
Create a .env file for API keys and configuration:
OPENAI_API_KEY=your-api-key-here
SERPAPI_API_KEY=your-serpapi-key # For web search capabilities
MODEL_NAME=gpt-4-turbo-preview
TEMPERATURE=0.3
Load these in your application:
from dotenv import load_dotenv
import os

load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
MODEL_NAME = os.getenv("MODEL_NAME", "gpt-4-turbo-preview")
TEMPERATURE = float(os.getenv("TEMPERATURE", "0.3"))
Building Your First Multi-Agent System: Research Assistant
We’ll build a practical multi-agent research assistant that coordinates three specialized agents: a researcher who finds information, an analyst who extracts insights, and a writer who synthesizes findings into coherent reports.
Defining the Agent State Schema
Multi-agent systems require shared state that all agents can access and modify:
from typing import TypedDict, Annotated, Sequence
import operator
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]
    research_query: str
    research_results: str
    analysis: str
    final_report: str
    next_agent: str
This state schema defines what information flows through the system. The messages field maintains conversation history, while specific fields store outputs from each agent. The Annotated type with operator.add ensures messages accumulate rather than overwrite.
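To make the merge behavior concrete, here is a minimal sketch of what the reducer does when a node returns a partial update (illustrative only—LangGraph applies the reducer internally):

import operator
from langchain_core.messages import HumanMessage

existing = [HumanMessage(content="first")]
update = [HumanMessage(content="second")]
# Fields annotated with operator.add are concatenated on each update...
merged = operator.add(existing, update)  # history accumulates: [first, second]
# ...while plain fields like research_results are simply overwritten.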
Creating Specialized Agent Functions
Each agent is implemented as a function that receives state, performs its specialized task, and returns updated state:
from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_functions_agent, AgentExecutor
from langchain.tools import Tool
from langchain import hub

llm = ChatOpenAI(model=MODEL_NAME, temperature=TEMPERATURE)

def create_researcher_agent():
    """Agent specialized in finding information"""
    # Define research tools
    def web_search(query: str) -> str:
        """Search the web for information"""
        # SerpAPIWrapper needs the google-search-results package
        # and the SERPAPI_API_KEY environment variable
        from langchain_community.utilities import SerpAPIWrapper
        search = SerpAPIWrapper()
        return search.run(query)

    tools = [
        Tool(
            name="WebSearch",
            func=web_search,
            description="Search the web for current information on any topic"
        )
    ]
    prompt = hub.pull("hwchase17/openai-functions-agent")
    agent = create_openai_functions_agent(llm, tools, prompt)
    return AgentExecutor(agent=agent, tools=tools, verbose=True)

def researcher_node(state: AgentState) -> AgentState:
    """Research agent that finds information"""
    agent = create_researcher_agent()
    result = agent.invoke({
        "input": f"Research the following topic thoroughly: {state['research_query']}"
    })
    return {
        "research_results": result["output"],
        "next_agent": "analyst"
    }

def analyst_node(state: AgentState) -> AgentState:
    """Analysis agent that extracts insights"""
    analysis_prompt = f"""
    Analyze the following research results and extract key insights:

    Research Results:
    {state['research_results']}

    Provide:
    1. Main findings
    2. Important patterns or trends
    3. Potential implications
    """
    response = llm.invoke(analysis_prompt)
    return {
        "analysis": response.content,
        "next_agent": "writer"
    }

def writer_node(state: AgentState) -> AgentState:
    """Writing agent that creates final report"""
    writing_prompt = f"""
    Create a comprehensive report based on this research and analysis:

    Research Results:
    {state['research_results']}

    Analysis:
    {state['analysis']}

    Write a clear, well-structured report that synthesizes these findings.
    """
    response = llm.invoke(writing_prompt)
    return {
        "final_report": response.content,
        "next_agent": "END"
    }
Each agent function receives the current state, performs its specialized task, and returns partial state updates that get merged into the overall state.
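Because each node is a plain function of state, it can be exercised in isolation before wiring up the graph. A minimal sketch, where the stub classes are hypothetical stand-ins for the module-level llm:

class _StubMessage:
    def __init__(self, content):
        self.content = content

class _StubLLM:
    """Hypothetical offline stand-in for ChatOpenAI, for a dry run."""
    def invoke(self, prompt):
        return _StubMessage("1. Main findings: ...")

llm = _StubLLM()  # temporarily rebind the module-level model
state = {
    "messages": [], "research_query": "quantum computing",
    "research_results": "Qubit counts increased substantially.",
    "analysis": "", "final_report": "", "next_agent": ""
}
print(analyst_node(state))  # {'analysis': '1. Main findings: ...', 'next_agent': 'writer'}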
Orchestrating Agent Execution with LangGraph
LangGraph provides the infrastructure to connect agents and manage execution flow:
from langgraph.graph import StateGraph, END

def create_research_workflow():
    # Create the graph
    workflow = StateGraph(AgentState)

    # Add agent nodes
    workflow.add_node("researcher", researcher_node)
    workflow.add_node("analyst", analyst_node)
    workflow.add_node("writer", writer_node)

    # Define the flow
    workflow.add_edge("researcher", "analyst")
    workflow.add_edge("analyst", "writer")
    workflow.add_edge("writer", END)

    # Set entry point
    workflow.set_entry_point("researcher")

    # Compile into executable graph
    return workflow.compile()

# Use the workflow
app = create_research_workflow()

# Execute the multi-agent system
result = app.invoke({
    "research_query": "What are the latest developments in quantum computing?",
    "messages": [],
    "research_results": "",
    "analysis": "",
    "final_report": "",
    "next_agent": ""
})
print(result["final_report"])
This creates a linear pipeline where each agent executes in sequence. The graph structure makes the flow explicit and easy to modify.
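If you want to observe each agent as it finishes rather than waiting for the final state, the compiled graph also supports streaming. A short sketch:

# Stream node-by-node updates instead of waiting for the final result
for step in app.stream({
    "research_query": "What are the latest developments in quantum computing?",
    "messages": [], "research_results": "", "analysis": "",
    "final_report": "", "next_agent": ""
}):
    # Each step maps a node name to the partial state that node returned
    print(list(step.keys()))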
[Figure: multi-agent workflow visualization]
Implementing Dynamic Agent Routing
Linear pipelines work for straightforward workflows, but complex tasks require dynamic routing where the system decides which agent to invoke based on current state.
Conditional Routing Logic
def router_node(state: AgentState) -> str:
    """Determines which agent should execute next"""
    # If no research results yet, go to researcher
    if not state.get("research_results"):
        return "researcher"
    # If research done but no analysis, go to analyst
    if state.get("research_results") and not state.get("analysis"):
        return "analyst"
    # If analysis done, check if we need more research
    if state.get("analysis"):
        # Use LLM to determine if research is sufficient
        decision_prompt = f"""
        Based on this analysis:
        {state['analysis']}

        Do we need more research, or can we proceed to writing?
        Answer with RESEARCH or WRITE
        """
        response = llm.invoke(decision_prompt)
        if "RESEARCH" in response.content:
            return "researcher"
        else:
            return "writer"
    return END
# Update workflow with conditional routing
workflow = StateGraph(AgentState)
workflow.add_node("researcher", researcher_node)
workflow.add_node("analyst", analyst_node)
workflow.add_node("writer", writer_node)

# Add conditional edges based on router logic
workflow.add_conditional_edges(
    "researcher",
    lambda state: "analyst",  # Always go to analyst after research
)
workflow.add_conditional_edges(
    "analyst",
    router_node,  # Conditionally go to researcher or writer
    {
        "researcher": "researcher",
        "writer": "writer",
        END: END
    }
)
workflow.add_edge("writer", END)
workflow.set_entry_point("researcher")

app = workflow.compile()
This routing logic enables loops—the analyst might determine more research is needed, sending execution back to the researcher before proceeding to the writer.
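Because loops can now occur, it is worth bounding them. LangGraph enforces a configurable recursion limit per run and raises GraphRecursionError when a run exceeds it; a brief sketch:

from langgraph.errors import GraphRecursionError

initial_state = {
    "research_query": "What are the latest developments in quantum computing?",
    "messages": [], "research_results": "", "analysis": "",
    "final_report": "", "next_agent": ""
}
try:
    # Cap total node executions for a single run
    result = app.invoke(initial_state, config={"recursion_limit": 10})
except GraphRecursionError:
    print("Research/analysis loop exceeded its step budget; aborting gracefully.")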
Building a Supervisor-Based Multi-Agent System
The supervisor pattern provides more sophisticated orchestration where a coordinator agent manages worker agents.
Creating the Supervisor Agent
from langchain.tools import tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

class SupervisorAgent:
    def __init__(self, worker_agents: dict):
        self.workers = worker_agents
        self.llm = ChatOpenAI(model=MODEL_NAME, temperature=0)

    # Defined without self: @tool turns these into Tool objects at class creation
    @tool
    def delegate_to_researcher(query: str) -> str:
        """Delegate research tasks to the researcher agent"""
        return "Delegated to researcher"

    @tool
    def delegate_to_analyst(data: str) -> str:
        """Delegate analysis tasks to the analyst agent"""
        return "Delegated to analyst"

    @tool
    def delegate_to_writer(content: str) -> str:
        """Delegate writing tasks to the writer agent"""
        return "Delegated to writer"

    def supervise(self, state: AgentState) -> AgentState:
        """Supervisor decides which agent to invoke"""
        # create_openai_functions_agent expects a ChatPromptTemplate with an
        # agent_scratchpad placeholder, not a plain string
        supervision_prompt = ChatPromptTemplate.from_messages([
            ("system",
             "You are a supervisor coordinating a team of agents to complete research tasks.\n"
             "Available agents:\n"
             "- researcher: Finds information from various sources\n"
             "- analyst: Analyzes data and extracts insights\n"
             "- writer: Creates reports and documentation\n"
             "Decide which agent should work next, or if the task is complete."),
            ("human", "{input}"),
            MessagesPlaceholder(variable_name="agent_scratchpad"),
        ])
        tools = [
            self.delegate_to_researcher,
            self.delegate_to_analyst,
            self.delegate_to_writer
        ]
        agent = create_openai_functions_agent(self.llm, tools, supervision_prompt)
        executor = AgentExecutor(agent=agent, tools=tools)
        result = executor.invoke({
            "input": (
                f"Query: {state['research_query']}\n"
                f"Research completed: {bool(state.get('research_results'))}\n"
                f"Analysis completed: {bool(state.get('analysis'))}\n"
                f"Report completed: {bool(state.get('final_report'))}\n"
                "What should we do next?"
            )
        })
        # Parse the result to determine next action
        # This is simplified - production code needs robust parsing
        output = result["output"].lower()
        if "researcher" in output:
            return self.workers["researcher"](state)
        elif "analyst" in output:
            return self.workers["analyst"](state)
        elif "writer" in output:
            return self.workers["writer"](state)
        else:
            return {"next_agent": "END"}
The supervisor uses its own reasoning to decide which worker agent should execute next based on current state and task requirements.
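One way to avoid the brittle substring matching above is to ask the model for a structured routing decision instead. A hedged sketch using with_structured_output, where the NextAgent schema is our own invention, not part of LangChain:

from typing import Literal
from pydantic import BaseModel

class NextAgent(BaseModel):
    """Routing decision the supervisor must return."""
    agent: Literal["researcher", "analyst", "writer", "END"]

router_llm = ChatOpenAI(model=MODEL_NAME, temperature=0).with_structured_output(NextAgent)
decision = router_llm.invoke(
    "Research completed: True. Analysis completed: False. "
    "Which agent should work next: researcher, analyst, writer, or END?"
)
print(decision.agent)  # e.g. "analyst", with no free-text parsing required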
Managing Agent Communication and Collaboration
Effective multi-agent systems require agents to communicate beyond simple sequential handoffs.
Implementing Agent-to-Agent Messages
from langchain_core.messages import HumanMessage, AIMessage

def create_communicating_agents():
    """Agents that can send messages to each other.

    perform_research, perform_analysis, needs_more_research, and
    identify_gaps are placeholders for your own implementations.
    """
    def researcher_with_communication(state: AgentState) -> AgentState:
        # Perform research
        results = perform_research(state['research_query'])
        # Send message to analyst with specific questions
        message_to_analyst = HumanMessage(
            content=f"I found this information: {results}. "
                    "Can you identify any gaps or areas needing deeper investigation?"
        )
        return {
            "research_results": results,
            "messages": [message_to_analyst],
            "next_agent": "analyst"
        }

    def analyst_with_feedback(state: AgentState) -> AgentState:
        # Read the researcher's message for context
        last_message = state["messages"][-1] if state["messages"] else None
        # Analyze and provide feedback
        analysis = perform_analysis(state['research_results'])
        # Might request more research
        if needs_more_research(analysis):
            feedback = HumanMessage(
                content=f"Analysis reveals gaps. Please research: {identify_gaps(analysis)}"
            )
            return {
                "messages": [feedback],
                "next_agent": "researcher"  # Loop back
            }
        else:
            return {
                "analysis": analysis,
                "messages": [AIMessage(content="Analysis complete")],
                "next_agent": "writer"
            }

    return researcher_with_communication, analyst_with_feedback
This pattern enables agents to provide feedback, request additional work, and iterate until reaching satisfactory results.
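Feedback loops like this need a termination guard so the researcher and analyst cannot bounce work back and forth indefinitely. One hedged approach tracks a revision counter in state (revision_count is an extra field of our own, not part of the AgentState defined earlier; the helper functions are the same placeholders as above):

MAX_REVISIONS = 3

def analyst_with_budget(state: dict) -> dict:
    analysis = perform_analysis(state["research_results"])
    revisions = state.get("revision_count", 0)
    # Loop back only while the revision budget allows it
    if needs_more_research(analysis) and revisions < MAX_REVISIONS:
        return {"revision_count": revisions + 1, "next_agent": "researcher"}
    return {"analysis": analysis, "next_agent": "writer"}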
State Sharing and Coordination
class SharedKnowledgeBase:
    """Shared state accessible to all agents"""
    def __init__(self):
        self.knowledge = {}
        self.agent_outputs = {}

    def store_finding(self, agent_name: str, key: str, value: str):
        """Agent stores a finding for others to access"""
        if agent_name not in self.agent_outputs:
            self.agent_outputs[agent_name] = {}
        self.agent_outputs[agent_name][key] = value

    def get_findings(self, agent_name: str = None) -> dict:
        """Retrieve findings from specific agent or all agents"""
        if agent_name:
            return self.agent_outputs.get(agent_name, {})
        return self.agent_outputs

    def update_knowledge(self, key: str, value: str):
        """Update shared knowledge base"""
        self.knowledge[key] = value

# Use in agent functions
kb = SharedKnowledgeBase()

def collaborative_researcher(state: AgentState) -> AgentState:
    results = perform_research(state['research_query'])
    # Store for other agents
    kb.store_finding("researcher", "primary_sources", results)
    # Check what analyst might need
    analyst_needs = kb.get_findings("analyst")
    return {"research_results": results}
Shared knowledge bases enable agents to coordinate without explicit message passing, useful for complex scenarios where multiple agents contribute to collective understanding.
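Note that the class above assumes agents run one at a time; if you later execute branches in parallel, writes should be synchronized. A minimal thread-safe variant, offered as an illustration:

import threading

class ThreadSafeKnowledgeBase(SharedKnowledgeBase):
    """Guards concurrent writes when agents run in parallel."""
    def __init__(self):
        super().__init__()
        self._lock = threading.Lock()

    def store_finding(self, agent_name: str, key: str, value: str):
        with self._lock:
            super().store_finding(agent_name, key, value)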
Multi-Agent Patterns Comparison
Sequential pipeline
Best for: Well-defined processes with clear stages
Complexity: Low

Dynamic routing
Best for: Tasks requiring iteration and refinement
Complexity: Medium

Supervisor pattern
Best for: Complex tasks requiring intelligent coordination
Complexity: High

Peer collaboration
Best for: Emergent workflows and distributed decision-making
Complexity: Very High
Error Handling and Resilience in Multi-Agent Systems
Production multi-agent systems require robust error handling since failures can occur at any agent in the chain.
Implementing Retry Logic
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def resilient_agent_call(agent_func, state):
    """Call agent with automatic retry on failure"""
    try:
        return agent_func(state)
    except Exception as e:
        print(f"Agent failed: {e}. Retrying...")
        raise

def researcher_node_resilient(state: AgentState) -> AgentState:
    """Researcher with error handling"""
    try:
        return resilient_agent_call(researcher_node, state)
    except Exception as e:
        # Fallback behavior when retries exhausted
        return {
            "research_results": f"Research failed: {str(e)}. Using cached data if available.",
            "next_agent": "analyst"  # Continue workflow with degraded data
        }
Retry logic with exponential backoff handles transient failures like API timeouts, while fallback strategies ensure the system continues operating even when components fail.
Agent Health Monitoring
import time
from datetime import datetime

class AgentMonitor:
    def __init__(self):
        self.agent_stats = {}

    def track_execution(self, agent_name: str, start_time: float, success: bool):
        if agent_name not in self.agent_stats:
            self.agent_stats[agent_name] = {
                "total_calls": 0,
                "successful_calls": 0,
                "failed_calls": 0,
                "avg_duration": 0,
                "last_execution": None
            }
        stats = self.agent_stats[agent_name]
        duration = time.time() - start_time
        stats["total_calls"] += 1
        # Maintain a running average over all calls so far
        stats["avg_duration"] = (
            (stats["avg_duration"] * (stats["total_calls"] - 1) + duration)
            / stats["total_calls"]
        )
        stats["last_execution"] = datetime.now()
        if success:
            stats["successful_calls"] += 1
        else:
            stats["failed_calls"] += 1

    def get_health_report(self) -> dict:
        return {
            agent: {
                **stats,
                "success_rate": stats["successful_calls"] / stats["total_calls"]
                if stats["total_calls"] > 0 else 0
            }
            for agent, stats in self.agent_stats.items()
        }

# Use in agents
monitor = AgentMonitor()

def monitored_researcher(state: AgentState) -> AgentState:
    start = time.time()
    try:
        result = researcher_node(state)
        monitor.track_execution("researcher", start, True)
        return result
    except Exception:
        monitor.track_execution("researcher", start, False)
        raise
Monitoring provides visibility into agent performance, identifying bottlenecks and reliability issues before they impact production.
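For illustration, the collected stats can be surfaced after a batch of runs like this:

# Print a one-line summary per agent
for agent, stats in monitor.get_health_report().items():
    print(f"{agent}: {stats['success_rate']:.0%} success, "
          f"{stats['avg_duration']:.1f}s avg over {stats['total_calls']} calls")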
Conclusion
Building multi-agent systems with LangChain transforms complex AI applications from monolithic, single-model approaches into orchestrations of specialized agents working collaboratively. The patterns covered here form a comprehensive toolkit: sequential pipelines for straightforward workflows, dynamic routing for adaptive processes, supervisor coordination for complex tasks, and peer collaboration for distributed intelligence. Together they let you decompose problems across specialized agents while maintaining coherent execution through shared state and intelligent orchestration.
Success with multi-agent systems comes from matching architectural patterns to your specific requirements rather than over-engineering with unnecessary complexity. Start with a simple sequential pipeline to validate your agent designs and understand state flow, then incrementally add dynamic routing or supervisor patterns as complexity demands. The examples and patterns demonstrated here provide foundations that scale from prototype experiments to deployed applications, with the error handling and monitoring that reliable multi-agent coordination requires.