Large Language Models (LLMs) are brilliant at language generation and reasoning, but they still need access to external knowledge for reliable, up-to-date answers. A vector database lets you store text (or any other media) as embeddings—high-dimensional numeric vectors—so you can retrieve semantically related content.
pgvector brings first-class vector search directly into PostgreSQL. LangChain, meanwhile, has built-in abstractions for talking to vector stores. When you combine LangChain and pgvector, you keep all the power of Postgres (ACID compliance, SQL joins, rich indexing) while unlocking state-of-the-art retrieval-augmented generation (RAG).
Why Choose pgvector over a Stand-Alone Vector DB?
| Feature | pgvector in Postgres | Stand-alone Vector DB |
|---|---|---|
| Deployment | Re-use existing Postgres cluster; fewer moving parts | New system and ops overhead |
| Transactions | ACID and MVCC out of the box | Often eventual consistency |
| Joins | SQL joins between vectors and relational data | Limited or via “join-like” APIs |
| Ecosystem | Works with every Postgres tool (psql, pgAdmin, Airflow) | Custom tooling |
| Cost | No extra license if Postgres already paid for | Additional instance costs |
If you already trust PostgreSQL in production, pgvector lets you add LLM-ready vector search with a single extension. LangChain then becomes the glue between your LLM agent and Postgres.
Prerequisites
- Python 3.9 or newer
- PostgreSQL 14 or newer (with superuser access to install extensions)
- pgvector 0.5.1 or later (enabled with `CREATE EXTENSION vector`)
- OpenAI, Cohere, or any embedding model accessible via API or local pipeline
- LangChain ≥ 0.1.0 (`pip install langchain`)
- Optional: SQLAlchemy for connection pooling

Tip: On macOS you can `brew install pgvector` and `brew services start postgresql@15`.
Step 1 Install and Configure pgvector
- Enable the extension (note the extension is named `vector`, not `pgvector`):

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

- Create a dedicated table:

```sql
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding VECTOR(1536)  -- length matches your embedding size
);
```

- Add an approximate index for fast K-NN:

```sql
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);  -- tune lists for recall vs. speed
```
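Before loading real data, you can sanity-check the setup from Python. A minimal sketch, assuming the `PG_DSN` environment variable holds your connection string; the `<=>` operator returns cosine distance:

```python
import os
import psycopg2

# Connect and evaluate cosine distance between two literal vectors.
conn = psycopg2.connect(os.getenv("PG_DSN"))
with conn, conn.cursor() as cur:
    cur.execute("SELECT '[1,0,0]'::vector <=> '[0,1,0]'::vector;")
    print(cur.fetchone()[0])  # 1.0: orthogonal vectors have cosine distance 1
conn.close()
```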
Step 2 Generate and Store Embeddings
2.1 Python Snippet
```python
from langchain.embeddings import OpenAIEmbeddings
import os
import psycopg2

conn = psycopg2.connect(os.getenv("PG_DSN"))
embed = OpenAIEmbeddings()

texts = [
    "LangChain lets you orchestrate LLM calls with memory and tooling.",
    "pgvector adds vector similarity to Postgres databases.",
    "Retrieval-augmented generation combines search with generation.",
]

with conn, conn.cursor() as cur:
    for txt in texts:
        vec = embed.embed_query(txt)  # returns list[float]
        # psycopg2 does not adapt Python lists to the vector type,
        # so serialize to pgvector's "[x,y,...]" text format.
        cur.execute(
            "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
            (txt, str(vec)),
        )
```
Security Note: Store your OpenAI key in `OPENAI_API_KEY` and read it via `os.getenv`.
Step 3 Integrate pgvector with LangChain’s VectorStore API
LangChain provides `PGVector`, a drop-in implementation of the `VectorStore` interface; under the hood it uses SQLAlchemy. Note that `PGVector` manages its own collection and embedding tables rather than the `documents` table from Step 1, so `collection_name` is a logical namespace within those tables.
```python
from langchain.vectorstores.pgvector import PGVector

store = PGVector(
    connection_string=os.getenv("PG_DSN"),
    embedding_function=embed,
    collection_name="documents",  # optional logical namespace
    distance_strategy="cosine",   # cosine / euclidean / inner
)
```
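You can also write documents through the wrapper instead of raw SQL. A short sketch reusing the `texts` list from Step 2 (these rows land in PGVector's own tables, not in `documents`):

```python
# PGVector embeds each text with `embed` and stores it in the collection.
store.add_texts(
    texts,
    metadatas=[{"source": "tutorial"} for _ in texts],  # optional per-row metadata
)
```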
Now you can run similarity search:
```python
query = "How do LangChain and pgvector work together?"
results = store.similarity_search(query, k=2)
for doc in results:
    print(doc.page_content)
```
LangChain automatically:
- Generates an embedding for `query`.
- Executes a K-NN query: `SELECT * FROM documents ORDER BY embedding <=> :query_vec LIMIT :k;`
- Wraps rows in `Document` objects.
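If you also want the raw distances, `similarity_search_with_score` returns `(Document, score)` pairs; with the cosine strategy, lower scores mean closer matches:

```python
for doc, score in store.similarity_search_with_score(query, k=2):
    print(f"{score:.4f}  {doc.page_content}")
```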
Step 4 Build a Retrieval-Augmented Generation Chain
A basic RAG chain in LangChain has three parts:
- Retriever (vector search)
- LLM (e.g., GPT-4)
- Prompt template (inject retrieved context)
```python
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

template = """
Answer the question based only on the context below.
If you don't know the answer, say "I’m not sure."

Context:
{context}

Question:
{question}

Answer:"""

prompt = PromptTemplate(
    template=template,
    input_variables=["context", "question"],
)

rag = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=store.as_retriever(),
    chain_type_kwargs={"prompt": prompt},
    verbose=True,
)

print(rag.run("Why is pgvector useful with LangChain?"))
```
The LLM sees semantically similar chunks pulled from Postgres and grounds its answer in that data, improving factual accuracy and reducing hallucinations.
Performance Tuning with pgvector
Index Parameters
| Parameter | Effect | Recommendation |
|---|---|---|
| lists | Number of IVF lists | Start with √N; more lists speed up search but lower recall unless you also raise probes |
| probes | Number of lists scanned per query | Larger values give better recall at the cost of slower reads |
| dimensions | Must match embedding size | 1536 for OpenAI text-embedding-3-small |
Set probes per session:

```sql
SET ivfflat.probes = 10;
```
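Because `probes` is a per-session setting, issue the `SET` on the same connection that runs the search. A sketch with psycopg2, reusing `conn` and `embed` from Step 2:

```python
with conn, conn.cursor() as cur:
    cur.execute("SET ivfflat.probes = 10;")  # applies to this session only
    cur.execute(
        "SELECT content FROM documents ORDER BY embedding <=> %s LIMIT 5;",
        (str(embed.embed_query("index tuning")),),
    )
    for (content,) in cur.fetchall():
        print(content)
```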
Batch Inserts
`psycopg2.extras.execute_values` lets you bulk-load embeddings for 10–20× higher throughput, as the sketch below shows.
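A minimal sketch of the batched path, assuming the `conn`, `embed`, and `texts` objects from Step 2; `embed_documents` batches the embedding API call as well:

```python
from psycopg2.extras import execute_values

vectors = embed.embed_documents(texts)  # one batched call, returns list[list[float]]
rows = [(txt, str(vec)) for txt, vec in zip(texts, vectors)]

with conn, conn.cursor() as cur:
    # One multi-row INSERT instead of one round trip per document.
    execute_values(
        cur,
        "INSERT INTO documents (content, embedding) VALUES %s",
        rows,
    )
```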
Storage
pgvector's `vector` type already stores each element as a 4-byte float. For roughly 2× smaller storage at minimal recall loss, pgvector 0.7+ adds the half-precision `halfvec` type:

```sql
ALTER TABLE documents
ALTER COLUMN embedding TYPE halfvec(1536) USING embedding::halfvec(1536);
```
Hybrid Filtering
Need metadata filters? Combine a SQL `WHERE` clause with vector search:

```sql
SELECT * FROM documents
WHERE content ILIKE '%postgres%'
ORDER BY embedding <=> :qvec
LIMIT 5;
```
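From Python, bind the query vector as an ordinary parameter in place of `:qvec`; a sketch reusing `conn` and `embed`:

```python
qvec = str(embed.embed_query("postgres similarity search"))
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT content FROM documents
        WHERE content ILIKE %s          -- lexical pre-filter
        ORDER BY embedding <=> %s       -- then rank by cosine distance
        LIMIT 5;
        """,
        ("%postgres%", qvec),
    )
    for (content,) in cur.fetchall():
        print(content)
```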
Advanced Workflows
1 Hybrid RAG with Metadata Ranking
- Pre-filter by `tags @> ARRAY['tutorial']` (assuming a `tags` array column)
- Then sort by vector distance
- Feed the results into LangChain's `MultiQueryRetriever` for multi-perspective queries (see the sketch below).
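A sketch of the retriever wiring, reusing the `store` from Step 3; `MultiQueryRetriever` asks the LLM to rephrase the question several ways, runs each variant, and returns the de-duplicated union of results:

```python
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.llms import OpenAI

mq_retriever = MultiQueryRetriever.from_llm(
    retriever=store.as_retriever(search_kwargs={"k": 5}),
    llm=OpenAI(temperature=0),
)
docs = mq_retriever.get_relevant_documents("How does pgvector index vectors?")
```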
2 Streaming Updates from Kafka
Use Debezium or WAL2JSON to listen for new embeddings and update pgvector indexes in near-real-time—ideal for news feeds.
3 Fine-Grained Access Control
Store user IDs alongside vectors; add Row-Level Security (RLS) so users only retrieve their own data.
```sql
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;  -- policies have no effect until RLS is enabled

CREATE POLICY user_rls
ON documents
USING (owner_id = current_setting('app.user_id')::INT);
```

Then set `app.user_id` per connection, as sketched below.
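A sketch of the per-connection setup from Python; `open_user_connection` is a hypothetical helper, and `set_config` is the parameter-friendly equivalent of `SET`:

```python
import os
import psycopg2

def open_user_connection(dsn: str, user_id: int):
    # Hypothetical helper: tag the session so the RLS policy filters rows by owner.
    conn = psycopg2.connect(dsn)
    with conn.cursor() as cur:
        cur.execute("SELECT set_config('app.user_id', %s, false);", (str(user_id),))
    conn.commit()
    return conn

conn = open_user_connection(os.getenv("PG_DSN"), 42)
```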
Comparing pgvector to Pinecone and Weaviate
| Feature | pgvector | Pinecone | Weaviate |
|---|---|---|---|
| Managed Service | DIY or cloud PG | Yes | Yes |
| Hybrid SQL joins | Native | No | Limited (GraphQL) |
| ACID transactions | Yes | No | No |
| Horizontal scaling | Partitioning | Automatic | Automatic |
| Learning curve | Minimal for PG | New API | New API |
If you rely heavily on relational data or already operate Postgres, sticking to pgvector minimises cognitive overhead and data duplication.
Migration Tips for Existing Postgres Users
- Upgrade to Postgres 14+ for parallel query improvements.
- Enable
pg_trgm
for trigram search; combine lexical + vector ranking. - Sharding: Use Citus or pg-bouncer to scale reads across replicas.
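As referenced above, a sketch of blended lexical + vector ranking, assuming `CREATE EXTENSION pg_trgm` has been run and reusing `conn` and `embed` from Step 2; the 0.5 weights are arbitrary starting points to tune for your data:

```python
q_text = "postgres vector search"
q_vec = str(embed.embed_query(q_text))

with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT content,
               0.5 * similarity(content, %s)           -- pg_trgm lexical score
             + 0.5 * (1 - (embedding <=> %s)) AS score -- cosine similarity
        FROM documents
        ORDER BY score DESC
        LIMIT 5;
        """,
        (q_text, q_vec),
    )
    for content, score in cur.fetchall():
        print(f"{score:.3f}  {content}")
```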
Monitoring and Observability
- Unused indexes: `SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0;`
- Query latency: `EXPLAIN ANALYZE` shows distance computation cost.
- pgvector stats: check the pgvector_stats extension for list usage.

Grafana dashboards can track:
- Average `ivfflat` probes
- Cache hit ratio
- Slow queries >200 ms
Common Pitfalls and How to Avoid Them
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong vector size | ERROR: vector must be length … | Ensure embedding length matches column |
| High write latency | Bulk insert stalls | Disable index, bulk load, then rebuild |
| Poor recall | Irrelevant results | Raise probes or increase lists |
| “Out of memory” | Large ANN index | Use half-precision (halfvec) vectors or partition the table |
Future Roadmap for LangChain × pgvector
- Auto-chunking: LangChain soon auto-splits documents based on token counts and stores chunk metadata in Postgres.
- In-database embedding: explore Postgres 16's `sql`/`plpython` functions for on-DB embedding generation.
- Streaming RAG: use Postgres logical replication plus LangChain callbacks for near-real-time updates of retrieval indexes.
Conclusion
Combining LangChain and pgvector gives you a battle-tested, SQL-native vector store with the ergonomic developer experience of LangChain. You keep all the strengths of Postgres—transactions, joins, security—while unlocking modern RAG patterns that bring external knowledge into your LLM applications.
Key takeaways
- Install the pgvector Python client and LangChain in minutes with `pip install pgvector langchain` (the database extension itself is enabled with `CREATE EXTENSION vector`).
- Store embeddings in a `VECTOR` column and index with `ivfflat`.
- Use LangChain's `PGVector` wrapper to integrate Postgres directly as a retriever.
- Tune `lists` and `probes` for the speed/recall sweet spot.
- Layer on RLS, monitoring, and batch pipelines for production readiness.
With these tools in hand, you can deliver accurate, context-aware AI experiences without migrating data to yet another storage layer. LangChain and pgvector—a perfect match for full-stack AI builders.