Frequently Asked Questions
What is Hindsight and how does it differ from RAG?
Hindsight is an agent memory system that provides long-term memory for AI agents using biomimetic data structures. Unlike traditional RAG (Retrieval-Augmented Generation), Hindsight:
- Stores structured facts instead of raw document chunks
- Builds mental models that consolidate knowledge over time
- Uses graph-based relationships between entities and concepts
- Supports temporal reasoning with time-aware retrieval
- Enables disposition-aware reflection for nuanced reasoning
For a detailed comparison, see RAG vs Memory.
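To make the contrast concrete, here is a purely illustrative sketch of what each approach persists; the dict shapes are hypothetical and are not Hindsight's actual schema:

```python
# Illustrative only: hypothetical shapes, not Hindsight's real internal schema.

# A RAG pipeline typically persists opaque text chunks plus an embedding.
rag_chunk = {
    "text": "...Alice mentioned she switched to a vegetarian diet last March...",
    "embedding": [0.12, -0.08, 0.33],  # truncated for readability
}

# A memory system persists structured facts with entities, relationships, and
# time, which is what enables graph-based and time-aware retrieval.
structured_fact = {
    "subject": "Alice",
    "predicate": "follows_diet",
    "object": "vegetarian",
    "valid_from": "2024-03-01",                        # temporal reasoning
    "related_entities": ["Alice", "vegetarian diet"],  # graph relationships
}
```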
Why use Hindsight instead of other solutions?
Hindsight is purpose-built for agent memory with unique advantages:
- State-of-the-art accuracy: Ranked #1 on the LongMemEval benchmark for agent memory (see details)
- Built on proven technology: PostgreSQL - battle-tested, reliable, and widely understood
- Cloud-native architecture: Designed for modern cloud deployments with horizontal scalability
- Flexible deployment: Self-host or use Hindsight Cloud - works with any LLM provider
- True long-term memory: Builds mental models that consolidate knowledge over time, not just retrieval
- Graph-based reasoning: Understands relationships between entities and concepts for richer context
- Production-ready: Scales to millions of memories with 50-500ms recall latency
- Developer-friendly: Simple APIs (retain, recall, reflect), SDKs for Python/TypeScript/Go/Rust, integrations with LiteLLM/Vercel AI SDK
Unlike vector databases (just search) or RAG systems (document retrieval), Hindsight provides living memory that evolves with your users.
Which LLM providers are supported?
Hindsight supports:
- OpenAI
- Anthropic
- Google Gemini
- Groq
- Ollama (local models)
- LM Studio (local models)
- Any OpenAI-compatible provider (Together AI, Fireworks, DeepInfra, etc.)
- Any Anthropic-compatible provider
Using local models with Ollama:
HINDSIGHT_API_LLM_PROVIDER=ollama
HINDSIGHT_API_LLM_MODEL=llama3.1
HINDSIGHT_API_LLM_BASE_URL=http://localhost:11434
Using local models with LM Studio:
HINDSIGHT_API_LLM_PROVIDER=lmstudio
HINDSIGHT_API_LLM_MODEL=your-model-name
HINDSIGHT_API_LLM_BASE_URL=http://localhost:1234/v1
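Using a hosted OpenAI-compatible provider works the same way. For example, pointing at Together AI might look roughly like this; the provider value, model name, and API key variable shown here are assumptions, so check Configuration and Models for the canonical names:

```bash
HINDSIGHT_API_LLM_PROVIDER=openai   # assumed value for OpenAI-compatible endpoints
HINDSIGHT_API_LLM_MODEL=meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
HINDSIGHT_API_LLM_BASE_URL=https://api.together.xyz/v1
HINDSIGHT_API_LLM_API_KEY=your-together-api-key   # assumed variable name
```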
Configure your provider using the HINDSIGHT_API_LLM_PROVIDER environment variable. See Configuration and Models for details.
Do I need to host my own infrastructure?
No! You have two options:
- Hindsight Cloud - Fully managed service at ui.hindsight.vectorize.io
- Self-hosted - Deploy on your own infrastructure using Docker or direct installation
See Installation for self-hosting instructions.
What are the minimum system requirements for self-hosting?
For running the Hindsight API server locally:
- Python 3.11+
- 4GB RAM minimum (8GB recommended for production)
- LLM API key (OpenAI, Anthropic, etc.) or local LLM setup
See Installation for setup instructions.
How do I isolate user data?
A memory bank is an isolated memory store (like a "brain") that contains its own memories, entities, relationships, and optional disposition traits (skepticism, literalism, empathy). Banks are completely isolated from each other with no data leakage.
There are two approaches for multi-user applications:
1. Per-user memory banks (recommended for most use cases)
- Create one bank per user (e.g., bank_id="user-123")
- Easiest setup and strongest data isolation
- Perfect for per-user queries and personalization
- Each bank can have unique disposition traits and background context
- Limitation: Cannot perform cross-user analysis (e.g., "What is the most mentioned topic across all users?")
2. Single bank with tags (for applications needing aggregated insights)
- Use one bank for the entire application
- Tag memories with user identifiers during retain (e.g., tags={"user_id": "user-123"})
- Filter by tags during recall/reflect for per-user queries
- Advantage: Enables both per-user AND cross-user queries (e.g., analyze specific users or aggregate across all users)
Choose per-user banks for simplicity and privacy, or single bank with tags if you need holistic reasoning across users. See Memory Banks for management details.
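A minimal sketch of both approaches with the Python SDK; the import path and client constructor are assumptions, while the bank_id and tags parameters follow the examples above:

```python
# Sketch only: the import path and client constructor are assumptions,
# not the exact SDK API. bank_id and tags follow the examples above.
from hindsight import HindsightClient  # hypothetical import path

client = HindsightClient(api_key="...")

# Approach 1: one bank per user (strongest isolation, simplest setup).
client.retain(bank_id="user-123", content="Alice prefers vegetarian options")
user_memories = client.recall(bank_id="user-123", query="food preferences")

# Approach 2: one shared bank, with memories tagged per user.
client.retain(
    bank_id="app-global",
    content="Alice prefers vegetarian options",
    tags={"user_id": "user-123"},
)
# Filter by tag for per-user queries, or omit the tag for cross-user analysis.
per_user = client.recall(
    bank_id="app-global",
    query="food preferences",
    tags={"user_id": "user-123"},
)
```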
What's the difference between retain, recall, and reflect?
Hindsight has three core operations:
- Retain: Store data (facts, entities, relationships)
- Recall: Search and retrieve raw memory data based on a query
- Reflect: Use an AI agent to answer a query using retrieved memories
See Operations for API details.
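A rough end-to-end sketch of the three operations; the client and method names mirror the operation names above, but the exact SDK signatures are assumptions (see Operations):

```python
# Sketch: follows the retain/recall/reflect operations described above; the
# client constructor and exact method signatures are assumptions.
from hindsight import HindsightClient  # hypothetical import path

client = HindsightClient(api_key="...")

# Retain: store facts into a memory bank.
client.retain(bank_id="user-123", content="Alice loves sushi")

# Recall: retrieve raw memories matching a query.
facts = client.recall(bank_id="user-123", query="What food does Alice like?")

# Reflect: let the agent reason over retrieved memories and return an answer.
answer = client.reflect(bank_id="user-123", query="What should I order for Alice?")
```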
When should I use recall vs reflect?
Use recall when:
- You want raw facts to feed into your own reasoning or prompt
- You need maximum control over how memories are interpreted
- You're doing simple fact lookup (e.g., "What did Alice say about X?")
- Latency is critical — recall is significantly faster (50-500ms vs 1-10s)
- You want to build your own answer synthesis layer on top of retrieved memories
Use reflect when:
- You want a ready-to-use answer generated from memories (no extra LLM call needed)
- You need disposition-aware responses shaped by the bank's personality traits (skepticism, literalism, empathy)
- The query requires multi-step reasoning across facts, observations, and mental models
- You need structured output (via response_schema) from memory-grounded reasoning
- You want citations — reflect returns which memories, mental models, and directives informed the answer
Key difference: Recall returns data; reflect returns an answer. Recall gives you raw materials, reflect does the reasoning for you using the bank's disposition and an autonomous search loop.
recall("What food does Alice like?")
→ ["Alice loves sushi", "Alice prefers vegetarian options"] # raw facts
reflect("What should I order for Alice?")
→ "I'd recommend a vegetarian sushi platter — Alice loves sushi and prefers vegetarian options." # grounded answer
See Recall and Reflect for full API details.
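For the structured-output case mentioned above, a reflect call with response_schema might look roughly like this; the client name and the JSON-Schema-style format shown are assumptions, so see Reflect for the supported shape:

```python
# Sketch: reflect with response_schema, as mentioned above. The client name
# and the exact schema format are assumptions; consult the Reflect docs.
from hindsight import HindsightClient  # hypothetical import path

client = HindsightClient(api_key="...")

order_schema = {
    "type": "object",
    "properties": {
        "dish": {"type": "string"},
        "reason": {"type": "string"},
    },
    "required": ["dish", "reason"],
}

result = client.reflect(
    bank_id="user-123",
    query="What should I order for Alice?",
    response_schema=order_schema,  # structured, memory-grounded output
)
# e.g. {"dish": "vegetarian sushi platter", "reason": "Alice loves sushi and prefers vegetarian options"}
```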
When should I use mental models?
Mental models are consolidated knowledge patterns synthesized from individual facts over time. Use them when you need:
- Higher-level understanding beyond raw facts (e.g., "User prefers functional programming patterns")
- Long-term behavioral patterns (e.g., "Customer is price-sensitive but values quality")
- Context for AI agent reasoning during reflect operations
Mental models are automatically built during retain and used by reflect to provide richer, more contextual responses. See Mental Models.
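Purely as an illustration of the consolidation idea (not Hindsight's actual mental-model schema), individual facts roll up into a higher-level pattern roughly like this:

```python
# Illustrative only: hypothetical shape, not Hindsight's real mental-model schema.
raw_facts = [
    "User chose the cheapest shipping option",
    "User asked for a discount code at checkout",
    "User upgraded to the premium plan after comparing reviews",
]

# Retain gradually consolidates facts like these into a pattern that reflect
# can draw on directly, instead of re-deriving it from raw facts every time.
mental_model = {
    "summary": "Customer is price-sensitive but values quality",
    "supporting_facts": raw_facts,
}
```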
What's the typical latency for recall operations?
Typical latencies:
- Without reranking: 50-100ms
- With reranking: 200-500ms (depends on reranker model and installation)
See Performance for tuning options.
Still have questions?
Join our Slack community or report issues on GitHub.