The Case Against External Vector DBs for Agent Memory

Open the docs for almost any agent-memory framework and the quickstart starts the same way: install a vector database. Mem0 wants Qdrant or Pinecone. Self-hosted Zep, or rather, what survives of it, wants Neo4j, FalkorDB, or Kuzu. LangChain's memory examples reach for Chroma by default. The pattern is consistent enough that "agent memory means vector database" has become a kind of folk wisdom.
Most of the time, none of that is necessary.
This isn't an argument against vector databases. Pinecone, Weaviate, and Qdrant are well-engineered for the workload they were designed for: read-heavy similarity search over large, mostly-static corpora. That workload is usually called RAG. Agent memory is something else, and treating the two as the same problem is what produced the bad default in the first place.
The argument is narrower: for most agent-memory workloads, the operational complexity of running a second stateful service does not pay for itself. And what the tax buys — a fast similarity search — is the wrong layer to optimize. Most of the work that turns a stream of conversations into useful agent memory happens on the write side: fact extraction, entity resolution, contradiction handling, invalidation, and synthesis. A vector database sees none of that. It sees embeddings going in and embeddings coming out. The actual memory lives above it.
Agent Memory Is Not RAG
Start with the workload, because everything else follows from it.
| Axis | RAG | Agent memory |
|---|---|---|
| Read/write ratio | Read-heavy; corpus built once | Write-heavy; every conversation turn produces facts |
| Per-tenant footprint | Millions to billions of vectors | Hundreds of thousands to a few million |
| Retrieval mix | Mostly semantic similarity | Semantic + entity + temporal + graph |
| Latency budget | Hundreds of ms (search UI) | Sub-200ms (agent loop) |
RAG is read-heavy. A corpus is built once, embeddings are computed in batch, indexes are built, and the system spends the rest of its life serving similarity queries against a mostly-static index. Updates happen, but they are not the hot path. The hot path is "user types a question, return the most semantically similar chunks." Throughput on approximate nearest-neighbor search is the headline number.
Agent memory inverts most of that, and not just because writes happen "more often." Each write runs through a learning pipeline. Raw conversation turns are decomposed into atomic facts. Entities mentioned across those facts are resolved against existing entities: "the auth service," "our login system," and "the OAuth microservice" should land on one canonical node. New facts that contradict existing ones trigger contradiction resolution. Facts that have been superseded get invalidated. Periodically, a synthesis step reads accumulated facts and writes higher-order observations back. None of that exists in RAG ingestion, and none of it is what a vector database is built to do.
The retrieval pattern is different too. RAG queries are mostly one-shot semantic lookups. Agent memory queries are mixed. "What did the user decide about pricing last Tuesday?" is temporal and entity-anchored. "Show me everything related to the database migration thread" is graph-shaped. "Has the user mentioned any concerns about the new dashboard?" needs semantic search but also entity resolution to catch synonyms. None of those are pure ANN problems.
Per-tenant footprint is the third difference. A typical agent-memory corpus is a few hundred thousand to a few million facts per tenant. That is two to three orders of magnitude smaller than the RAG workloads that justify running Pinecone in production. The thing that makes a dedicated vector database worth its operational cost (billion-vector scale) usually isn't present.
For a longer treatment of how these two systems differ, the key architectural differences are worth a read.
The Hidden Tax of External Vector Databases
If agent memory had the same shape as RAG, the operational tax of a separate vector database would be a normal cost of doing business. It doesn't, so the tax mostly buys nothing.
Operational surface area
Running an external vector database means provisioning, scaling, securing, backing up, monitoring, and version-upgrading a second stateful service. For teams that already operate Pinecone or Weaviate at scale for other reasons, this is amortized. For teams that don't, the agent-memory project just doubled the database count.
The pattern is visible in how the major agent-memory frameworks handle self-hosting. Self-hosted Mem0 requires standing up Qdrant or Pinecone separately. Zep's Community Edition was deprecated, and self-hosting Graphiti now means standing up Neo4j, FalkorDB, or Kuzu on top of Graphiti's own services. The frameworks themselves are not the heavy lift. The database dependencies are.
Network hop and latency
Agent loops have tighter latency budgets than RAG pipelines. A memory retrieval call that takes 400ms feels fine in a search UI; it feels broken in an agent that's making four tool calls per turn. Every external service in the path adds a round trip and a serialization step. For low-volume agent-memory workloads, the network hop to a managed vector database is often a larger fraction of the latency budget than the actual query.
Cost at small scale
Managed vector databases have minimums. Pinecone's serverless and pod tiers both have pricing floors that don't scale linearly with the small datasets typical of agent memory. Weaviate Cloud, Zilliz, and Qdrant Cloud have similar structures. (Verify current pricing; the vendors revise it often.) For a workload that fits comfortably in a few gigabytes of Postgres, paying a managed-vector-DB floor is a tax on operational simplicity that buys nothing the workload actually needs.
The Learning Layer a Vector Database Doesn't Have
If you peel back the retrieval argument, the more fundamental problem is upstream of retrieval entirely.
Useful agent memory isn't a pile of conversation chunks. It's a structured representation that gets built and rebuilt as conversations accumulate. The architectural primitives that make that work are not vectors:
- Fact extraction. A model decomposes each conversation turn into atomic, retrievable claims, with provenance back to the originating message.
- Entity resolution. New mentions get linked to existing entities. The same customer, project, or service resolves to one canonical node regardless of how it was named.
- Contradiction handling. When a new fact conflicts with an existing one, the system has to decide what wins, what gets invalidated, and what gets preserved as history.
- Invalidation. Facts have lifespans. "The dashboard is in beta" should stop being retrieved once the dashboard ships.
- Reflection. Periodically, the system reads accumulated facts and writes synthesized observations back — patterns, summaries, recurring themes that aren't explicit in any single conversation but emerge from many.
A vector database has no opinion about any of this. It accepts an id and a vector. The structure, the relationships, the temporal validity, the dedupe logic — all of that has to live somewhere else. In practice, "somewhere else" usually means a second database (or a sprawl of application logic and caches), with the vector store reduced to one column of a larger schema you're already maintaining.
At that point, you have not removed a database from your stack. You have added one.
Putting the learning layer and the retrieval substrate in the same transactional database collapses the architecture. The fact extraction pipeline writes to the same Postgres that pgvector indexes. Entity resolution is a join. Contradiction resolution is a transaction. Reflect outputs are facts like any other, indexed alongside their inputs. The retrieval mix the agent actually needs falls out of the data model, instead of being assembled across services.
Single-Strategy Retrieval Is the Real Ceiling
A vector database does one thing well: approximate nearest-neighbor search over embeddings. Even if the operational arguments above didn't apply, a vector-only system would still be the wrong substrate for agent memory, because most agent-memory queries don't decompose to pure semantic similarity.
Consider the four retrieval strategies that Hindsight runs in parallel against every query: semantic search, entity-based retrieval, temporal filtering, and graph traversal. Three of those four are not ANN. Entity-based retrieval is structured lookup against an entity table. Temporal filtering is a range query over timestamps. Graph traversal is recursive walks over edges. A vector database can do the first. It has to delegate or duplicate the others.
That isn't a Hindsight-specific observation. Any sufficiently complete agent-memory system ends up with at least three of those four retrieval modes, because that's what the queries demand. Zep's temporal knowledge graph is the strongest example of someone who took graph and time seriously, and the trade-off is that Zep's graph database also stops being "a vector database problem" the moment you walk it. The architectural argument cuts in more than one direction.
The empirical result follows the architecture. On the LongMemEval benchmark, the standard evaluation for long-term conversational memory, vector-primary systems score lower. Mem0 publishes 49.0%. Multi-strategy systems score higher. Hindsight publishes 91.4% on the same benchmark. SuperMemory, which also runs multiple retrieval modes, publishes 81.6%. The accuracy gap is the visible artifact of the architectural one. For a deeper architectural comparison, how Hindsight compares to Mem0 walks through the same trade-offs in detail.
Vector Databases Are Optimized for the Wrong Workload
Vector index structures (HNSW, IVF, ScaNN) are read-optimized. They are expensive to build and expensive to update incrementally. Some implementations support delete-and-rebuild patterns; others require full re-indexing for non-trivial mutation. None of them love a workload that writes, updates, and invalidates as frequently as agent memory does.
Worse, the writes a vector index sees aren't even the substantive ones. The meaningful work — extraction, resolution, contradiction, reflect — happens upstream and produces many small derived records that the vector store only ever sees in their final embedded form. Most of the learning logic doesn't touch the vector index at all, which is the point: the vector store is one substrate in a system whose center of gravity is somewhere else.
PostgreSQL has spent thirty years getting good at transactional workloads. MVCC, WAL, point-in-time recovery, online schema changes, mature replication. None of that was built for vectors, but it was built for the access patterns (writes, updates, deletes, concurrent reads) that agent memory actually has. Adding vector search via pgvector inherits all of it.
At billion-vector scale, this reverses. A specialized vector database wins on raw throughput and index build times, and the operational overhead pays for itself. Most agent-memory workloads aren't at billion-vector scale.
pgvector Is Already Good Enough for Agent Memory
The historical objection to "just use Postgres" was performance: pgvector's early IVFFlat indexes weren't competitive at scale. That's no longer the binding constraint. pgvector supports HNSW indexes, which are competitive on the recall and latency curves that matter for agent-memory workloads. Postgres 16's parallel index builds and query parallelism close most of the remaining gap.
More importantly, putting vectors in Postgres means the entity tables, the temporal facts, the graph edges, and the application data live in one transactional database. Multi-strategy retrieval becomes a join, not a distributed query. Atomic writes across all four retrieval substrates are free. Backups are one pg_dump away.
In practice that looks like a single query:
-- Three retrieval strategies merged in one query against one database
WITH semantic AS (
SELECT fact_id, 1 - (embedding <=> $query_embedding) AS score
FROM facts
ORDER BY embedding <=> $query_embedding
LIMIT 50
),
temporal AS (
SELECT fact_id, 1.0 AS score
FROM facts
WHERE created_at BETWEEN $window_start AND $window_end
),
entity AS (
SELECT f.fact_id, 1.0 AS score
FROM facts f
JOIN fact_entities fe USING (fact_id)
WHERE fe.entity_id = ANY($resolved_entity_ids)
)
SELECT f.*,
COALESCE(s.score, 0) AS semantic_score,
COALESCE(t.score, 0) AS temporal_score,
COALESCE(e.score, 0) AS entity_score
FROM facts f
LEFT JOIN semantic s USING (fact_id)
LEFT JOIN temporal t USING (fact_id)
LEFT JOIN entity e USING (fact_id)
WHERE s.fact_id IS NOT NULL OR t.fact_id IS NOT NULL OR e.fact_id IS NOT NULL;
Graph traversal adds a recursive CTE over the edges table. Cross-encoder reranking runs in the application against the merged candidate set. The same database holds all of it, and all of it is one transaction.
The choice isn't "Postgres instead of a vector database." The choice is "Postgres because the retrieval pattern wants the join."
Do You Need a Vector Database for an AI Agent?
Not by default. Most agent-memory workloads are small enough to fit alongside the application data in Postgres with pgvector. External vector databases are built for RAG-scale corpora and read-heavy similarity search; agent memory is write-heavy, update-heavy, and benefits more from combined vector, graph, and temporal retrieval than from raw ANN throughput. There are real exceptions, covered next.
When an External Vector Database Still Makes Sense
There are three cases where the calculus genuinely reverses.
Multi-tenant SaaS at very large vector counts. If you're storing tens of millions of vectors per tenant across hundreds of tenants, pgvector index build times and shared-resource contention become real problems. Pinecone's serverless indexing and Weaviate's tenant isolation are worth the operational cost.
Shared infrastructure between agent memory and a real RAG corpus. If your team already runs a vector database for product RAG over millions of documents, layering agent memory onto the same infrastructure can be more efficient than standing up a parallel Postgres. The marginal cost of the vector DB is already paid.
Read-throughput-dominated workloads. Some agent-memory deployments (long-tail recall over enterprise knowledge bases, multi-region read replicas) look more like RAG than like agent memory. Specialized vector databases are designed for that shape.
None of these are the default. They are the cases where the workload has crossed back over into vector-database-shaped territory.
A Better Default for Agent Memory
The default that fits most agent-memory workloads:
- A learning pipeline that writes facts, entities, edges, and synthesized observations into one Postgres — with pgvector for embeddings, graph tables for edges, and temporal indexes for fact lifespans
- Multi-strategy retrieval implemented as parallel queries against that same database, merged in the application layer
- Cross-encoder reranking on the merged results, run in the application
- Deployed as one container, backed up like any other Postgres
This is the architecture Hindsight ships, and it isn't a coincidence. The architecture was chosen because the workload demands it, and the workload demands it because agent memory is not RAG. Self-hosting is one Docker command. Storage is embedded PostgreSQL. There is no external vector DB to provision and no graph database to operate alongside it. If how Hindsight compares to Zep is the right next question (graph-first versus multi-strategy on Postgres), that's the comparison to read.
Conclusion
The right question isn't "which vector database should I run for agent memory?" It's two questions. What does the learning pipeline upstream of retrieval actually do? And what does my retrieval mix actually look like? For most teams, the answers are: fact extraction, entity resolution, contradiction handling, and reflect; and a mix that's mostly not semantic similarity, mostly not billion-scale, mostly write-heavy, and mostly small enough to fit alongside the application data. A single Postgres handles both layers better than a separate vector database does, and at lower operational cost.
External vector databases remain excellent at what they were built for. Agent memory just isn't usually it — and the part that least resembles RAG is the learning layer, not the search.
Further reading:
- What Is Agent Memory? for the foundational concepts
- Agent Memory vs RAG on the key architectural differences
- Best AI Agent Memory Systems in 2026 for the full landscape
- Hindsight vs Mem0 on accuracy and architecture
