Pipecat Voice AI Persistent Memory: Add Memory to Your Voice Pipeline

If you build voice AI pipelines with Pipecat, you know it handles real-time speech processing, LLM integration, and streaming synthesis beautifully. But there's one critical thing it doesn't do: remember anything between calls. Every voice conversation starts fresh. Your voice agent has no idea what the user said in yesterday's call, what their preferences are, or what it already researched.
Adding Pipecat persistent memory doesn't require building a custom RAG system or managing separate vector databases. With the hindsight-pipecat integration, you can wire long-term memory into any voice pipeline with a single FrameProcessor that recalls context before responding and retains conversation content after each turn.
TL;DR
- Pipecat voice pipelines have no built-in memory; each conversation starts from scratch.
hindsight-pipecatadds automatic recall (before LLM) and retain (after each turn) without code changes to the pipeline.- One-line setup: add
HindsightMemoryServicebetween the user aggregator and LLM service. - On each turn, Hindsight recalls relevant past conversations and injects them as system messages.
- Works with any Pipecat transport (Daily, Deepgram, OpenAI, Cartesia, etc.) and any LLM provider.
The Problem: Pipecat Voice Agents Have No Persistent Memory
Pipecat is a solid framework for building voice AI agents. Real-time speech-to-text, streaming text-to-speech, LLM integration, and a clean frame-based pipeline architecture make it a popular choice for voice applications. But it ships with no memory layer.
Every voice conversation starts from zero. The agent doesn't know what the user said last week. It doesn't know their preferences, their history, or what context might be relevant to the current call. As a result, voice agents that interact with users over multiple sessions must re-establish all context in every conversation.
You could pass conversation history to the LLM context. But that's just chat history—it doesn't consolidate repeated information, doesn't build a knowledge graph of entities and relationships, and grows linearly until it exceeds your context window.
Real voice agent memory is different:
- Extracting structured facts from conversations (names, preferences, past requests)
- Building relationships between entities (users, topics, decisions)
- Retrieving relevant context from days, weeks, or months of past conversations
- Synthesizing coherent answers from scattered memory across multiple calls
That's what Hindsight provides. And with hindsight-pipecat, you get this without rebuilding your pipeline.
How Pipecat Persistent Memory Works
Hindsight is a memory engine that runs locally or in the cloud. It doesn't just store raw conversation text. Instead, it extracts structured entities and relationships, builds a knowledge graph, and indexes everything for multi-strategy retrieval. Semantic search, BM25 keyword matching, graph traversal, and temporal ranking all work together to surface the most relevant memories.
The hindsight-pipecat integration connects this engine to your Pipecat pipeline through a FrameProcessor that sits between your user context aggregator and LLM service.
Voice Input
↓ (STT)
User Aggregator
↓ (OpenAILLMContextFrame arrives)
HindsightMemoryService ← Memory goes here
├─ Retain previous turn (async, non-blocking)
└─ Recall relevant past conversations
↓ (inject as system message)
LLM Service
↓ (generates response with memory context)
Response ↓
Assistant Aggregator
↓ (TTS)
Voice Output
Here's what happens on each turn:
- New turn starts. A user message comes in and the user aggregator creates an
OpenAILLMContextFrame. - Recall. The HindsightMemoryService extracts the latest user message and searches Hindsight for relevant past conversations. Results are injected as a system message before the LLM sees the context.
- Retain. Any new complete user+assistant exchange is sent to Hindsight asynchronously (non-blocking), so the pipeline stays responsive.
- Forward. The enriched context frame is passed downstream to the LLM service, which now has access to relevant memories.
Memory accumulates across calls. By the third or fourth conversation, recall starts surfacing useful context that the pipeline didn't have to re-establish in the current call.
Setting Up Pipecat Persistent Memory
The setup process is minimal. Install the integration, choose your Hindsight deployment, and add one line to your pipeline.
Step 1: Install the Pipecat Integration
pip install hindsight-pipecat
Step 2: Choose Your Hindsight Deployment
Pick either Hindsight Cloud (recommended, no self-hosting) or Local (run your own daemon).
Option 2a: Hindsight Cloud (Recommended)
Sign up free for Hindsight Cloud — managed infrastructure, no daemon to run, memory syncs across your devices.
Option 2b: Local Hindsight
Run Hindsight locally:
pip install hindsight-all
export HINDSIGHT_API_LLM_API_KEY=YOUR_OPENAI_KEY
hindsight-api
This runs at http://localhost:8888 with embedded Postgres, local embeddings, and local reranking.
Step 3: Add Memory to Your Pipeline
Create a Hindsight memory service and add it to your pipeline:
For Hindsight Cloud:
from pipecat.pipeline.pipeline import Pipeline
from hindsight_pipecat import HindsightMemoryService
memory = HindsightMemoryService(
bank_id="user-123",
hindsight_api_url="https://api.hindsight.vectorize.io",
api_key="hsk_your_token_here",
)
pipeline = Pipeline([
transport.input(),
stt_service,
user_aggregator,
memory, # ← add between user_aggregator and LLM
llm_service,
assistant_aggregator,
tts_service,
transport.output(),
])
For Local Hindsight:
memory = HindsightMemoryService(
bank_id="user-123",
hindsight_api_url="http://localhost:8888",
)
The memory service now:
- Recalls relevant past conversations before every LLM call
- Automatically retains new exchanges after each turn
- Runs asynchronously so your pipeline stays responsive
Configuration and Tuning
The HindsightMemoryService accepts several parameters to control memory behavior:
HindsightMemoryService(
bank_id="user-123", # Required: memory bank to use
hindsight_api_url="...", # Hindsight API URL
api_key="hsk_...", # API key (Hindsight Cloud)
recall_budget="mid", # "low", "mid", or "high"
recall_max_tokens=4096, # Max tokens for recall results
enable_recall=True, # Inject memories before LLM
enable_retain=True, # Store turns after each exchange
memory_prefix="Relevant memories from past conversations:\n",
)
Recall Budget
The recall_budget parameter controls how much memory data gets injected before the LLM:
- "low": Fast recalls, fewer results. Good for latency-sensitive applications or tight context windows.
- "mid" (default): Balanced trade-off. 50-200ms latency, moderate result count.
- "high": Exhaustive recalls. Maximum results, highest latency. Good for complex conversations where context matters most.
For real-time voice, "low" or "mid" are usually best. You want memory recall to happen in 100-300ms so the voice agent responds quickly.
Selective Enable/Disable
You can disable recall or retain individually:
# Recall-only: search past conversations but don't store new ones
memory = HindsightMemoryService(
bank_id="user-123",
enable_retain=False,
enable_recall=True,
)
# Retain-only: store conversations but don't recall them
memory = HindsightMemoryService(
bank_id="user-123",
enable_retain=True,
enable_recall=False,
)
This is useful for multi-agent architectures where one pipeline gathers information while another agent answers questions using only recalled memory.
Global Configuration
If you have multiple pipelines sharing the same Hindsight instance, use global configuration:
from hindsight_pipecat import configure
configure(
hindsight_api_url="http://localhost:8888",
api_key="hsk_...",
recall_budget="mid",
)
# Now create services without repeating connection details
memory = HindsightMemoryService(bank_id="user-123")
Real-World Example: Customer Support Voice Agent
Here's a complete example of a customer support voice agent with persistent memory:
import os
from pipecat.services.openai import OpenAILLMService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.daily import DailyTransport
from pipecat.pipeline.pipeline import Pipeline
from hindsight_pipecat import HindsightMemoryService
# Setup Hindsight memory with customer ID as bank_id
customer_id = "customer-abc-123"
memory = HindsightMemoryService(
bank_id=customer_id,
hindsight_api_url="https://api.hindsight.vectorize.io",
api_key=os.getenv("HINDSIGHT_API_KEY"),
recall_budget="mid",
)
# Setup services
llm = OpenAILLMService(model="gpt-4o-mini")
stt = DeepgramSTTService(model="nova-2")
tts = CartesiaTTSService(model="sonic-english")
transport = DailyTransport(room_url="https://...", token="...")
# Build pipeline with memory
pipeline = Pipeline([
transport.input(),
stt,
transport.user_aggregator,
memory, # ← Memory service here
llm,
transport.assistant_aggregator,
tts,
transport.output(),
])
On the first call, the agent knows nothing about the customer. By the third or fourth call, Hindsight has extracted:
- Customer name, account status, past issues
- Preferences for communication style
- Technical details from previous troubleshooting
- Resolution history (what worked, what didn't)
Each new call starts with relevant context automatically injected, so the agent can pick up exactly where it left off and respond more intelligently.
Use Cases for Voice AI with Persistent Memory
Customer Support
Support agents remember customer history, previous issues, and what solutions worked. Faster resolution, fewer repetitive questions, better customer experience.
Personal Assistant
Assistants remember user preferences, calendar, habits, and context. Multi-session conversations feel natural instead of starting from scratch each time.
Sales and Booking
Agents remember customer needs, budget, preferences, and past interactions. More targeted recommendations, better close rates, less context-building.
Healthcare and Wellness
Voice health assistants remember patient history, medications, symptoms from past conversations. Continuity of care without manual note review.
Accessibility
Voice-first interfaces for users with visual impairments benefit enormously from persistent context. The agent remembers where you were in a process, what you were doing last time, and can help you pick up without repetition.
Multi-Session Research
Voice agents that gather research across multiple conversations can consolidate findings, spot patterns, and synthesize insights from accumulated data.
Advanced Patterns
Multi-User Pipelines
If a single pipeline handles multiple users (e.g., a shared bot), use the user ID in the bank_id:
user_id = extract_user_from_request()
memory = HindsightMemoryService(
bank_id=f"user-{user_id}",
hindsight_api_url="https://api.hindsight.vectorize.io",
api_key=os.getenv("HINDSIGHT_API_KEY"),
)
Each user gets isolated memory. No cross-contamination.
Scoped Memory Banks
For complex applications, use separate banks for different scopes:
# Support agent per customer
support_memory = HindsightMemoryService(bank_id=f"support-{customer_id}")
# Research agent per project
research_memory = HindsightMemoryService(bank_id=f"research-{project_id}")
# Agent-specific context
agent_memory = HindsightMemoryService(bank_id=f"agent-{agent_role}")
Isolation ensures memory doesn't leak between contexts.
Testing and Development
For testing, use a temporary bank and disable retention:
memory = HindsightMemoryService(
bank_id="test-session",
enable_retain=False, # Don't pollute persistent memory
enable_recall=True, # But do test recall
)
Or switch to a local test Hindsight instance that's separate from production.
Edge Cases and Troubleshooting
Latency
Recall lookups add 50-300ms depending on recall_budget and memory size. For voice applications where response time matters, use "low" budget and monitor latency.
memory = HindsightMemoryService(
bank_id="user-123",
recall_budget="low",
recall_max_tokens=2048, # Limit tokens to reduce latency
)
Privacy and Data Retention
If you need to delete all memories for a user (GDPR, user request), use the Hindsight Cloud dashboard or API to clear the bank:
# Via API
curl -X DELETE https://api.hindsight.vectorize.io/banks/user-123 \
-H "Authorization: Bearer hsk_..."
Async Frame Handling
The HindsightMemoryService processes frames asynchronously. If you need synchronous behavior (rare), you must wait for the retain operation. In practice, async is better—it keeps your pipeline responsive.
Pipecat Memory: Tradeoffs and Alternatives
When Pipecat Persistent Memory Makes Sense
Pipecat persistent memory works best for voice agents that interact with users across multiple sessions over days, weeks, or months. Good use cases include customer support bots, personal assistants, accessibility interfaces, and any voice agent where remembering past interactions improves quality.
When Not to Use It
Skip persistent memory for stateless APIs or one-shot voice transactions where context doesn't compound over time. Also skip if your pipeline is purely transactional (e.g., order confirmation calls) where no learning is needed.
How It Compares to Alternatives
| Approach | Strengths | Weaknesses | Best For |
|---|---|---|---|
| Hindsight + Pipecat | Multi-strategy retrieval (semantic + BM25 + graph + temporal), structured entity extraction, low-latency recall | Requires Hindsight server or cloud account | Voice agents with multi-session context needs |
| Manual LLM context | Simple, no dependencies, full control | Grows linearly with conversation length, no entity extraction | Short single-session voice calls |
| Custom vector store | Full control over retrieval | You build chunking, indexing, and reranking yourself | Teams with existing vector infrastructure |
| Transcript storage only | Simple audit trail | No structured retrieval, no synthesis | Compliance-focused voice recording |
For most voice pipelines that need persistent memory across sessions with minimal changes, the Hindsight integration is the fastest path to production.
Recap: Pipecat Persistent Memory Integration
Adding persistent memory to Pipecat voice pipelines doesn't require rebuilding your architecture. The hindsight-pipecat integration provides a single FrameProcessor that:
- Recalls relevant past conversations before each LLM call
- Retains new exchanges asynchronously after each turn
- Stays responsive with configurable recall budgets for latency-sensitive voice applications
Just drop it into your pipeline between the user aggregator and LLM service. Memory accumulates over calls and compounds in value as your voice agent learns more about each user.
For Python developers building Pipecat voice agents that need persistent memory, this is the simplest path from stateless to stateful.
Next Steps
- Try it locally:
pip install hindsight-all hindsight-pipecatand test with the interactive chat example - Use Hindsight Cloud: Skip self-hosting with a free account
- Run the basic example:
python examples/basic_pipeline.pyto see the full voice pipeline with memory - Test interactive recall:
python examples/interactive_chat.py --bank demo-userto see recall in action - Configure recall budget: Tune "low", "mid", or "high" based on your latency requirements
- Inspect memories: Use the Hindsight Cloud dashboard to browse extracted facts and entities from conversations
- Explore other integrations: Add memory to Pydantic AI agents, CrewAI agents, or any framework via MCP
- Read the Pipecat docs: Learn more about Pipecat pipelines and services to extend your voice agent further
