How to Build LLM Agents with LangChain: Tools, Memory, and Production Patterns

Why LangChain for Agents?

LangChain is the most widely-used framework for building LLM-powered agents and applications. Its agent abstractions handle the decision loop — deciding which tools to use, calling them, processing results, and continuing until the task is complete — so you can focus on defining tools and goals rather than orchestration logic. In 2026, LangChain’s agent primitives and LangGraph (its graph-based workflow engine) together cover the full spectrum from simple tool-calling agents to complex multi-step workflows with branching, parallelism, and human-in-the-loop controls.

Core Concepts: Tools, Agents, and Executors

A LangChain agent has three main components. Tools are functions the agent can call — web search, database queries, calculators, API calls. The LLM acts as the agent’s brain, deciding which tool to use and interpreting results. The agent executor runs the decision loop, calling tools and feeding results back to the LLM until a final answer is produced.

pip install langchain langchain-anthropic langchain-community

from langchain_anthropic import ChatAnthropic
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool

llm = ChatAnthropic(model="claude-sonnet-4-6", temperature=0)

@tool
def get_word_count(text: str) -> int:
    """Count the number of words in a text string."""
    return len(text.split())

@tool
def calculate(expression: str) -> float:
    """Evaluate a mathematical expression. E.g. '2 + 2' or '100 * 0.15'."""
    return eval(expression)  # Use a safe eval in production

tools = [get_word_count, calculate]

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant with access to tools. Use them when needed."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({"input": "How many words are in 'the quick brown fox'? Then multiply that by 15."})
print(result["output"])

Adding Web Search

Connect agents to live web search using DuckDuckGo or Tavily:

from langchain_community.tools import DuckDuckGoSearchRun
from langchain_community.tools.tavily_search import TavilySearchResults

# DuckDuckGo (free, no API key)
search = DuckDuckGoSearchRun()
tools = [search, calculate]

# Tavily (better quality, needs API key)
# import os; os.environ["TAVILY_API_KEY"] = "your-key"
tavily = TavilySearchResults(max_results=3)
tools = [tavily, calculate]

executor = AgentExecutor(agent=create_tool_calling_agent(llm, tools, prompt), tools=tools, verbose=True)
result = executor.invoke({"input": "What is the current price of NVIDIA stock? Is it higher or lower than $500?"})
print(result["output"])

Memory: Stateful Conversations

By default, LangChain agents are stateless — each invocation starts fresh. Add conversation memory to give agents context across turns:

from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

message_history = ChatMessageHistory()

agent_with_history = RunnableWithMessageHistory(
    executor,
    lambda session_id: message_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)

config = {"configurable": {"session_id": "user-123"}}

response1 = agent_with_history.invoke({"input": "My name is Alice."}, config=config)
response2 = agent_with_history.invoke({"input": "What's my name?"}, config=config)
print(response2["output"])  # "Your name is Alice."

For production applications serving multiple users, store chat history in Redis or a database rather than in-memory, keyed by user session ID. LangChain provides RedisChatMessageHistory and SQLChatMessageHistory for persistent storage with no changes to the agent logic.

Custom Tools with Input Validation

For tools that interact with databases or external APIs, use the BaseTool class for more control over input validation and error handling:

from langchain_core.tools import BaseTool
from pydantic import BaseModel, Field
from typing import Type

class DatabaseQueryInput(BaseModel):
    table: str = Field(description="Table name to query: 'users', 'orders', or 'products'")
    filter_field: str = Field(description="Field to filter on")
    filter_value: str = Field(description="Value to filter by")
    limit: int = Field(default=10, description="Max rows to return (1-100)", ge=1, le=100)

class DatabaseQueryTool(BaseTool):
    name: str = "query_database"
    description: str = "Query the internal database for business data."
    args_schema: Type[BaseModel] = DatabaseQueryInput

    def _run(self, table: str, filter_field: str, filter_value: str, limit: int = 10) -> str:
        allowed_tables = {"users", "orders", "products"}
        if table not in allowed_tables:
            return f"Error: table '{table}' not allowed. Choose from: {allowed_tables}"
        # Execute safe parameterised query
        results = db.execute(f"SELECT * FROM {table} WHERE {filter_field} = ? LIMIT ?",
                             [filter_value, limit])
        return str(results.fetchall())

tools = [DatabaseQueryTool()]

Pydantic validation in the input schema catches malformed arguments before they reach your database. Always allowlist table names and field names rather than passing model-generated strings directly to SQL — prompt injection via tool arguments is a real attack vector.

LangGraph: Complex Multi-Step Workflows

For agents that need branching logic, parallel execution, or human-in-the-loop steps, LangGraph provides a graph-based workflow engine built on top of LangChain:

from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode
from typing import TypedDict, Annotated
import operator

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]

def should_continue(state: AgentState) -> str:
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "tools"
    return END

def call_model(state: AgentState):
    response = llm.bind_tools(tools).invoke(state["messages"])
    return {"messages": [response]}

workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.add_node("tools", ToolNode(tools))
workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", should_continue)
workflow.add_edge("tools", "agent")

app = workflow.compile()
result = app.invoke({"messages": [("user", "Search for the latest LLM papers and summarise the top 3.")]})
print(result["messages"][-1].content)

LangGraph’s explicit graph structure makes complex agent behaviour inspectable and debuggable — you can visualise the full decision flow, add breakpoints for human review, and handle errors at specific nodes. For production agents that take real-world actions, LangGraph’s structured approach is significantly safer than the implicit loop in standard AgentExecutor.

Streaming Agent Responses

Stream agent outputs token-by-token for a responsive user interface:

from langchain_core.callbacks import StreamingStdOutCallbackHandler

llm_streaming = ChatAnthropic(
    model="claude-sonnet-4-6",
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)

# For LangGraph, stream events
async for event in app.astream_events(
    {"messages": [("user", "Research AI chip companies and compare their recent performance.")]},
    version="v2"
):
    if event["event"] == "on_chat_model_stream":
        chunk = event["data"]["chunk"]
        if hasattr(chunk, "content") and chunk.content:
            print(chunk.content, end="", flush=True)

RAG-Powered Agents

Combine agents with a retrieval tool so they can query your document corpus when needed:

from langchain_community.vectorstores import PGVector
from langchain_openai import OpenAIEmbeddings
from langchain.tools.retriever import create_retriever_tool

vectorstore = PGVector(
    connection_string="postgresql://user:pass@localhost:5432/ragdb",
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
    collection_name="company_docs"
)

retriever_tool = create_retriever_tool(
    vectorstore.as_retriever(search_kwargs={"k": 5}),
    name="search_company_docs",
    description="Search internal company documentation, policies, and procedures. Use when asked about company-specific information."
)

tools = [retriever_tool, DuckDuckGoSearchRun()]
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

The agent now routes questions intelligently: company-specific questions go to the internal retriever; current events or external information go to web search. It combines both when a question spans internal policy and external context.

Production Hardening

Several patterns are essential for production LangChain agent deployments. Set a maximum iterations limit on AgentExecutor (max_iterations=10) to prevent runaway agents. Enable verbose logging in development but switch to structured JSON logging in production — verbose=False with a custom callback that logs to your observability stack. Add timeouts to tool calls so a slow external API cannot block the entire agent indefinitely. Validate all tool arguments before execution, treating model-generated arguments with the same distrust as user input. Use LangSmith for tracing — connecting your LangChain app to LangSmith gives you full visibility into every tool call, LLM call, and token usage with minimal setup. And always run your agent through adversarial test cases before deploying: what happens when a user asks the agent to perform an action outside its intended scope, when a tool returns an error, or when the tool call arguments are malformed?

When to Use LangGraph vs. AgentExecutor

AgentExecutor is simpler and appropriate for agents with straightforward tool-calling loops where the decision is just “call a tool or respond.” LangGraph is the right choice when you need: explicit branching between different agent behaviours, parallel execution of multiple agent paths, human-in-the-loop checkpoints before irreversible actions, complex state management across many steps, or fine-grained error handling at specific workflow nodes. For most teams starting with agents, AgentExecutor is the right beginning — it handles 80% of agent use cases with minimal boilerplate. Migrate to LangGraph when your agent logic becomes complex enough that you need to see and control the exact flow explicitly, rather than relying on the LLM to make all decisions about the execution path.

Memory: Stateful Conversations

By default LangChain agents are stateless — each invocation starts fresh. Add conversation memory to give agents context across turns using RunnableWithMessageHistory. For production serving multiple users, store history in Redis keyed by session ID via RedisChatMessageHistory — the agent logic stays identical regardless of the backing store.

Custom Tools with Validation

Use BaseTool with a Pydantic args_schema for tools that interact with databases or external APIs. Pydantic validation catches malformed model-generated arguments before they reach your system. Always allowlist table names and field names in database tools — prompt injection via tool arguments is a genuine production risk when argument values derive from untrusted user input.

LangGraph: Graph-Based Workflows

LangGraph extends LangChain with a graph-based workflow engine for agents requiring explicit branching, parallel execution, or human-in-the-loop checkpoints. You define nodes (agent, tools, human review) and edges (conditions for moving between nodes), making the agent’s decision flow inspectable and controllable rather than implicit. This structured approach is significantly safer for production agents that take real-world actions — you can add a human approval gate before any irreversible operation, handle errors at specific nodes, and visualise the full execution graph for debugging.

from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode
from typing import TypedDict, Annotated
import operator

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]

workflow = StateGraph(AgentState)
workflow.add_node("agent", lambda state: {"messages": [llm.bind_tools(tools).invoke(state["messages"])]})
workflow.add_node("tools", ToolNode(tools))
workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", lambda s: "tools" if s["messages"][-1].tool_calls else END)
workflow.add_edge("tools", "agent")
app = workflow.compile()

RAG-Powered Agents

Combine agents with a retrieval tool via create_retriever_tool so they can query your document corpus when needed. Define two tools — an internal retriever for company-specific questions and web search for external information — and the agent routes intelligently between them based on the query. This pattern is the foundation of most enterprise knowledge assistant architectures.

Production Hardening Checklist

Set max_iterations=10 on AgentExecutor to prevent runaway loops. Add timeouts to tool calls so slow external APIs cannot block indefinitely. Use LangSmith for tracing — set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY in your environment, and every LangChain call is automatically traced with full visibility into tool calls, token usage, and latency. Validate all tool arguments before execution, treating model-generated values with the same caution as user input. Test adversarial inputs — what happens when a user asks the agent to act outside its intended scope, or when a tool returns an unexpected error format? Run these scenarios in your test suite before every production deployment.

When to Use LangGraph vs. AgentExecutor

AgentExecutor is the right starting point for straightforward tool-calling agents. LangGraph is the right choice when you need explicit branching between agent behaviours, parallel execution paths, human-in-the-loop checkpoints before irreversible actions, or fine-grained error handling at specific workflow nodes. Start with AgentExecutor, and migrate to LangGraph when your agent logic becomes complex enough that the implicit decision loop is insufficient for the control and visibility your production use case requires.

Multi-Agent Systems with LangGraph

For complex tasks that benefit from specialisation, LangGraph supports multi-agent architectures where a supervisor agent delegates subtasks to specialist agents. A research supervisor might delegate to a search agent, a data analysis agent, and a writing agent — each with their own toolset and instructions — then combine their outputs into a final deliverable. This pattern scales to sophisticated workflows without making any single agent’s context window or tool list unmanageable. The supervisor controls the overall task flow; each specialist focuses on what it does best. LangGraph’s graph structure makes the delegation and result-combining logic explicit and testable, rather than embedded in a free-form agent prompt that is hard to reason about or debug.

Evaluating Agent Quality

Agent evaluation is harder than single-turn LLM evaluation because the output is a sequence of actions rather than a single response. Effective agent evaluation measures three things independently: task completion rate (did the agent achieve the goal?), tool call accuracy (did it call the right tools with correct arguments?), and efficiency (how many steps and tokens did it take?). Build a test suite of tasks with known correct outcomes and run it against your agent on every significant prompt or tool change. Track the agent’s success rate over time — a declining success rate after a prompt update or model version change is a leading indicator of quality regression before it reaches production users.

Why LangChain for Agents?

Core Concepts: Tools, Agents, and Executors

Adding Web Search

Memory: Stateful Conversations

Custom Tools with Input Validation

LangGraph: Complex Multi-Step Workflows

Streaming Agent Responses

RAG-Powered Agents

Production Hardening

When to Use LangGraph vs. AgentExecutor

Memory: Stateful Conversations

Custom Tools with Validation

LangGraph: Graph-Based Workflows

RAG-Powered Agents

Production Hardening Checklist

When to Use LangGraph vs. AgentExecutor

Multi-Agent Systems with LangGraph

Evaluating Agent Quality

Leave a Comment Cancel reply