🤖

Using a coding agent? Run this to install the Hindsight docs skill:

npx skills add https://github.com/vectorize-io/hindsight --skill hindsight-docs

Pipecat

Persistent long-term memory for Pipecat voice AI pipelines via Hindsight. A single FrameProcessor slots between your user context aggregator and LLM service — recalling relevant memories before each turn and retaining conversation content after.

Quick Start

# 1. Start Hindsight (self-hosted)
pip install hindsight-all
export HINDSIGHT_API_LLM_API_KEY=your-openai-key
hindsight-api

# 2. Install the integration
pip install hindsight-pipecat

from pipecat.pipeline.pipeline import Pipeline
from hindsight_pipecat import HindsightMemoryService

memory = HindsightMemoryService(
    bank_id="user-123",
    hindsight_api_url="http://localhost:8888",
)

pipeline = Pipeline([
    transport.input(),
    stt_service,
    user_aggregator,
    memory,           # ← add between user_aggregator and LLM
    llm_service,
    assistant_aggregator,
    tts_service,
    transport.output(),
])

Or with Hindsight Cloud:

memory = HindsightMemoryService(
    bank_id="user-123",
    hindsight_api_url="https://api.hindsight.vectorize.io",
    api_key="hsk_your_token_here",
)

How It Works

New turn starts
  └─ OpenAILLMContextFrame arrives
       ├─ Retain previous complete turn (user+assistant) — fire-and-forget
       └─ Recall relevant memories for current user query
            └─ Inject as <hindsight_memories> system message
                 └─ Forward enriched context to LLM

On each OpenAILLMContextFrame:

Retain — any new complete user+assistant turn pairs are sent to Hindsight asynchronously (non-blocking)
Recall — the latest user message is used as the search query; results are injected as a system message before the LLM sees the context
Forward — the enriched context frame is pushed downstream

Memory accumulates across calls. By the third or fourth turn, recall starts surfacing useful context that the pipeline didn't have to re-establish.

Configuration

HindsightMemoryService(
    bank_id="user-123",              # Required: memory bank to use
    hindsight_api_url="...",         # Hindsight API URL
    api_key="hsk_...",               # API key (Hindsight Cloud)
    recall_budget="mid",             # "low", "mid", or "high"
    recall_max_tokens=4096,          # Max tokens for recall results
    enable_recall=True,              # Inject memories before LLM
    enable_retain=True,              # Store turns after each exchange
    memory_prefix="Relevant memories from past conversations:\n",
)

Global Configuration

from hindsight_pipecat import configure

configure(
    hindsight_api_url="http://localhost:8888",
    api_key="hsk_...",
    recall_budget="mid",
)

# Now create services without repeating connection details
memory = HindsightMemoryService(bank_id="user-123")

Compatibility

Tested with Pipecat v0.0.108. The processor handles both the new LLMContextFrame and the deprecated OpenAILLMContextFrame for forward compatibility.

Manual Testing

The examples/ directory includes an interactive text-based chat simulator for testing memory recall/retain without requiring Daily/Deepgram/Cartesia API keys:

python examples/interactive_chat.py --bank demo-user

The examples/basic_pipeline.py shows the full voice pipeline with Daily + Deepgram + OpenAI + Cartesia.

Prerequisites

A running Hindsight instance:

Self-hosted:

pip install hindsight-all
export HINDSIGHT_API_LLM_API_KEY=your-api-key
hindsight-api  # starts on http://localhost:8888

Hindsight Cloud: Sign up — no self-hosting required.

Quick Start​

How It Works​

Configuration​

Global Configuration​

Compatibility​

Manual Testing​

Prerequisites​