Configuration

Complete reference for configuring Hindsight services through environment variables.

Hindsight has two services, each with its own configuration prefix:

| Service | Prefix | Description |
|---|---|---|
| API Service | HINDSIGHT_API_* | Core memory engine |
| Control Plane | HINDSIGHT_CP_* | Web UI |

API Service

The API service handles all memory operations (retain, recall, reflect).

Database

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_DATABASE_URL | PostgreSQL connection string | pg0 (embedded) |
| HINDSIGHT_API_RUN_MIGRATIONS_ON_STARTUP | Run database migrations on API startup | true |

If not provided, the server uses embedded pg0 — convenient for development but not recommended for production.
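
For production, point the API at an external PostgreSQL instance, for example (placeholder host and credentials):

export HINDSIGHT_API_DATABASE_URL=postgresql://hindsight:your-password@db.example.com:5432/hindsight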

Database Connection Pool

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_DB_POOL_MIN_SIZE | Minimum connections in the pool | 5 |
| HINDSIGHT_API_DB_POOL_MAX_SIZE | Maximum connections in the pool | 100 |
| HINDSIGHT_API_DB_COMMAND_TIMEOUT | PostgreSQL command timeout in seconds | 60 |
| HINDSIGHT_API_DB_ACQUIRE_TIMEOUT | Connection acquisition timeout in seconds | 30 |

For high-concurrency workloads, increase DB_POOL_MAX_SIZE. Each concurrent recall/reflect operation can use 2-4 connections.
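
As a rough sizing sketch from the figures above: 32 concurrent recalls at up to 4 connections each can need around 128 pooled connections (make sure PostgreSQL's max_connections accommodates the pool across all worker processes):

export HINDSIGHT_API_DB_POOL_MAX_SIZE=128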

To run migrations manually (e.g., before starting the API), use the admin CLI:

hindsight-admin run-db-migration
# Or for a specific schema:
hindsight-admin run-db-migration --schema tenant_acme

LLM Provider

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_LLM_PROVIDER | Provider: openai, anthropic, gemini, groq, ollama, lmstudio | openai |
| HINDSIGHT_API_LLM_API_KEY | API key for LLM provider | - |
| HINDSIGHT_API_LLM_MODEL | Model name | gpt-5-mini |
| HINDSIGHT_API_LLM_BASE_URL | Custom LLM endpoint | Provider default |
| HINDSIGHT_API_LLM_MAX_CONCURRENT | Max concurrent LLM requests | 32 |
| HINDSIGHT_API_LLM_TIMEOUT | LLM request timeout in seconds | 120 |
| HINDSIGHT_API_LLM_GROQ_SERVICE_TIER | Groq service tier: on_demand, flex, auto | auto |

Provider Examples

# Groq (recommended for fast inference)
export HINDSIGHT_API_LLM_PROVIDER=groq
export HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=openai/gpt-oss-20b
# For free tier users: override to on_demand if you get service_tier errors
# export HINDSIGHT_API_LLM_GROQ_SERVICE_TIER=on_demand

# OpenAI
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_API_KEY=sk-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gpt-4o

# Gemini
export HINDSIGHT_API_LLM_PROVIDER=gemini
export HINDSIGHT_API_LLM_API_KEY=xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gemini-2.0-flash

# Anthropic
export HINDSIGHT_API_LLM_PROVIDER=anthropic
export HINDSIGHT_API_LLM_API_KEY=sk-ant-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=claude-sonnet-4-20250514

# Ollama (local, no API key)
export HINDSIGHT_API_LLM_PROVIDER=ollama
export HINDSIGHT_API_LLM_BASE_URL=http://localhost:11434/v1
export HINDSIGHT_API_LLM_MODEL=llama3

# LM Studio (local, no API key)
export HINDSIGHT_API_LLM_PROVIDER=lmstudio
export HINDSIGHT_API_LLM_BASE_URL=http://localhost:1234/v1
export HINDSIGHT_API_LLM_MODEL=your-local-model

# OpenAI-compatible endpoint
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_BASE_URL=https://your-endpoint.com/v1
export HINDSIGHT_API_LLM_API_KEY=your-api-key
export HINDSIGHT_API_LLM_MODEL=your-model-name

Per-Operation LLM Configuration

Different memory operations have different requirements. Retain (fact extraction) benefits from models with strong structured output capabilities, while Reflect (reasoning/response generation) can use lighter, faster models. Configure separate LLM models for each operation to optimize for cost and performance.

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_RETAIN_LLM_PROVIDER | LLM provider for retain operations | Falls back to HINDSIGHT_API_LLM_PROVIDER |
| HINDSIGHT_API_RETAIN_LLM_API_KEY | API key for retain LLM | Falls back to HINDSIGHT_API_LLM_API_KEY |
| HINDSIGHT_API_RETAIN_LLM_MODEL | Model for retain operations | Falls back to HINDSIGHT_API_LLM_MODEL |
| HINDSIGHT_API_RETAIN_LLM_BASE_URL | Base URL for retain LLM | Falls back to HINDSIGHT_API_LLM_BASE_URL |
| HINDSIGHT_API_REFLECT_LLM_PROVIDER | LLM provider for reflect operations | Falls back to HINDSIGHT_API_LLM_PROVIDER |
| HINDSIGHT_API_REFLECT_LLM_API_KEY | API key for reflect LLM | Falls back to HINDSIGHT_API_LLM_API_KEY |
| HINDSIGHT_API_REFLECT_LLM_MODEL | Model for reflect operations | Falls back to HINDSIGHT_API_LLM_MODEL |
| HINDSIGHT_API_REFLECT_LLM_BASE_URL | Base URL for reflect LLM | Falls back to HINDSIGHT_API_LLM_BASE_URL |

When to Use Per-Operation Config
  • Retain: Use models with strong structured output (e.g., GPT-4o, Claude) for accurate fact extraction
  • Reflect: Use faster/cheaper models (e.g., GPT-4o-mini, Groq) for reasoning and response generation
  • Recall: Does not use LLM (pure retrieval), so no configuration needed

Example: Separate Models for Retain and Reflect

# Default LLM (used as fallback)
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_API_KEY=sk-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gpt-4o

# Use GPT-4o for retain (strong structured output)
export HINDSIGHT_API_RETAIN_LLM_MODEL=gpt-4o

# Use faster/cheaper model for reflect
export HINDSIGHT_API_REFLECT_LLM_PROVIDER=groq
export HINDSIGHT_API_REFLECT_LLM_API_KEY=gsk_xxxxxxxxxxxx
export HINDSIGHT_API_REFLECT_LLM_MODEL=llama-3.3-70b-versatile

Embeddings

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_EMBEDDINGS_PROVIDER | Provider: local, tei, openai, cohere, or litellm | local |
| HINDSIGHT_API_EMBEDDINGS_LOCAL_MODEL | Model for local provider | BAAI/bge-small-en-v1.5 |
| HINDSIGHT_API_EMBEDDINGS_TEI_URL | TEI server URL | - |
| HINDSIGHT_API_EMBEDDINGS_OPENAI_API_KEY | OpenAI API key (falls back to HINDSIGHT_API_LLM_API_KEY) | - |
| HINDSIGHT_API_EMBEDDINGS_OPENAI_MODEL | OpenAI embedding model | text-embedding-3-small |
| HINDSIGHT_API_EMBEDDINGS_OPENAI_BASE_URL | Custom base URL for OpenAI-compatible API (e.g., Azure OpenAI) | - |
| HINDSIGHT_API_COHERE_API_KEY | Cohere API key (shared for embeddings and reranker) | - |
| HINDSIGHT_API_EMBEDDINGS_COHERE_MODEL | Cohere embedding model | embed-english-v3.0 |
| HINDSIGHT_API_EMBEDDINGS_COHERE_BASE_URL | Custom base URL for Cohere-compatible API (e.g., Azure-hosted) | - |
| HINDSIGHT_API_LITELLM_API_BASE | LiteLLM proxy base URL (shared for embeddings and reranker) | http://localhost:4000 |
| HINDSIGHT_API_LITELLM_API_KEY | LiteLLM proxy API key (optional, depends on proxy config) | - |
| HINDSIGHT_API_EMBEDDINGS_LITELLM_MODEL | LiteLLM embedding model (use provider prefix, e.g., cohere/embed-english-v3.0) | text-embedding-3-small |

# Local (default) - uses SentenceTransformers
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=local
export HINDSIGHT_API_EMBEDDINGS_LOCAL_MODEL=BAAI/bge-small-en-v1.5

# OpenAI - cloud-based embeddings
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=openai
export HINDSIGHT_API_EMBEDDINGS_OPENAI_API_KEY=sk-xxxxxxxxxxxx # optional; falls back to HINDSIGHT_API_LLM_API_KEY
export HINDSIGHT_API_EMBEDDINGS_OPENAI_MODEL=text-embedding-3-small # 1536 dimensions

# Azure OpenAI - embeddings via Azure endpoint
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=openai
export HINDSIGHT_API_EMBEDDINGS_OPENAI_API_KEY=your-azure-api-key
export HINDSIGHT_API_EMBEDDINGS_OPENAI_MODEL=text-embedding-3-small
export HINDSIGHT_API_EMBEDDINGS_OPENAI_BASE_URL=https://your-resource.openai.azure.com/openai/deployments/your-deployment

# TEI - HuggingFace Text Embeddings Inference (recommended for production)
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=tei
export HINDSIGHT_API_EMBEDDINGS_TEI_URL=http://localhost:8080

# Cohere - cloud-based embeddings
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=cohere
export HINDSIGHT_API_COHERE_API_KEY=your-api-key
export HINDSIGHT_API_EMBEDDINGS_COHERE_MODEL=embed-english-v3.0 # 1024 dimensions

# Azure-hosted Cohere - embeddings via custom endpoint
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=cohere
export HINDSIGHT_API_COHERE_API_KEY=your-azure-api-key
export HINDSIGHT_API_EMBEDDINGS_COHERE_MODEL=embed-english-v3.0
export HINDSIGHT_API_EMBEDDINGS_COHERE_BASE_URL=https://your-azure-cohere-endpoint.com

# LiteLLM proxy - unified gateway for multiple providers
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=litellm
export HINDSIGHT_API_LITELLM_API_BASE=http://localhost:4000
export HINDSIGHT_API_LITELLM_API_KEY=your-litellm-key # optional
export HINDSIGHT_API_EMBEDDINGS_LITELLM_MODEL=text-embedding-3-small # or cohere/embed-english-v3.0

Embedding Dimensions

Hindsight automatically detects the embedding dimension from the model at startup and adjusts the database schema accordingly. The default model (BAAI/bge-small-en-v1.5) produces 384-dimensional vectors, while OpenAI models produce 1536 or 3072 dimensions.

Dimension Changes

Once memories are stored, you cannot change the embedding dimension without losing data. If you need to switch to a model with different dimensions:

  1. Empty database: The schema is adjusted automatically on startup
  2. Existing data: Either delete all memories first, or use a model with matching dimensions

Supported OpenAI embedding dimensions:

  • text-embedding-3-small: 1536 dimensions
  • text-embedding-3-large: 3072 dimensions
  • text-embedding-ada-002: 1536 dimensions (legacy)

Reranker

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_RERANKER_PROVIDER | Provider: local, tei, cohere, flashrank, litellm, or rrf | local |
| HINDSIGHT_API_RERANKER_LOCAL_MODEL | Model for local provider | cross-encoder/ms-marco-MiniLM-L-6-v2 |
| HINDSIGHT_API_RERANKER_LOCAL_MAX_CONCURRENT | Max concurrent local reranking (prevents CPU thrashing under load) | 4 |
| HINDSIGHT_API_RERANKER_TEI_URL | TEI server URL | - |
| HINDSIGHT_API_RERANKER_TEI_BATCH_SIZE | Batch size for TEI reranking | 128 |
| HINDSIGHT_API_RERANKER_TEI_MAX_CONCURRENT | Max concurrent TEI reranking requests | 8 |
| HINDSIGHT_API_RERANKER_COHERE_MODEL | Cohere rerank model | rerank-english-v3.0 |
| HINDSIGHT_API_RERANKER_COHERE_BASE_URL | Custom base URL for Cohere-compatible API (e.g., Azure-hosted) | - |
| HINDSIGHT_API_RERANKER_LITELLM_MODEL | LiteLLM rerank model (use provider prefix, e.g., cohere/rerank-english-v3.0) | cohere/rerank-english-v3.0 |

# Local (default) - uses SentenceTransformers CrossEncoder
export HINDSIGHT_API_RERANKER_PROVIDER=local
export HINDSIGHT_API_RERANKER_LOCAL_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2

# TEI - for high-performance inference
export HINDSIGHT_API_RERANKER_PROVIDER=tei
export HINDSIGHT_API_RERANKER_TEI_URL=http://localhost:8081

# Cohere - cloud-based reranking
export HINDSIGHT_API_RERANKER_PROVIDER=cohere
export HINDSIGHT_API_COHERE_API_KEY=your-api-key # shared with embeddings
export HINDSIGHT_API_RERANKER_COHERE_MODEL=rerank-english-v3.0

# Azure-hosted Cohere - reranking via custom endpoint
export HINDSIGHT_API_RERANKER_PROVIDER=cohere
export HINDSIGHT_API_COHERE_API_KEY=your-azure-api-key
export HINDSIGHT_API_RERANKER_COHERE_MODEL=rerank-english-v3.0
export HINDSIGHT_API_RERANKER_COHERE_BASE_URL=https://your-azure-cohere-endpoint.com

# LiteLLM proxy - unified gateway for multiple reranking providers
export HINDSIGHT_API_RERANKER_PROVIDER=litellm
export HINDSIGHT_API_LITELLM_API_BASE=http://localhost:4000
export HINDSIGHT_API_LITELLM_API_KEY=your-litellm-key # optional
export HINDSIGHT_API_RERANKER_LITELLM_MODEL=cohere/rerank-english-v3.0 # or voyage/rerank-2, together_ai/...

LiteLLM supports multiple reranking providers via the /rerank endpoint:

  • Cohere (cohere/rerank-english-v3.0, cohere/rerank-multilingual-v3.0)
  • Together AI (together_ai/...)
  • Voyage AI (voyage/rerank-2)
  • Jina AI (jina_ai/...)
  • AWS Bedrock (bedrock/...)

Authentication

By default, Hindsight runs without authentication. For production deployments, enable API key authentication using the built-in tenant extension:

# Enable the built-in API key authentication
export HINDSIGHT_API_TENANT_EXTENSION=hindsight_api.extensions.builtin.tenant:ApiKeyTenantExtension
export HINDSIGHT_API_TENANT_API_KEY=your-secret-api-key

When enabled, all requests must include the API key in the Authorization header:

curl -H "Authorization: Bearer your-secret-api-key" \
  http://localhost:8888/v1/default/banks

Requests without a valid API key receive a 401 Unauthorized response.
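
For example, the same request without the header is rejected:

curl -i http://localhost:8888/v1/default/banks
# rejected with 401 Unauthorized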

Custom Authentication

For advanced authentication (JWT, OAuth, multi-tenant schemas), implement a custom TenantExtension. See the Extensions documentation for details.
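
The extension is referenced by a module:Class path, the same format used for the built-in extension. For example, wiring in a hypothetical class of your own:

# Hypothetical: your own extension module and class
export HINDSIGHT_API_TENANT_EXTENSION=my_company.auth:JwtTenantExtension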

Server

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_HOST | Bind address | 0.0.0.0 |
| HINDSIGHT_API_PORT | Server port | 8888 |
| HINDSIGHT_API_WORKERS | Number of uvicorn worker processes | 1 |
| HINDSIGHT_API_LOG_LEVEL | Log level: debug, info, warning, error | info |
| HINDSIGHT_API_MCP_ENABLED | Enable MCP server at /mcp/{bank_id}/ | true |
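
For example, to bind to localhost only, run multiple worker processes, and enable verbose logging:

export HINDSIGHT_API_HOST=127.0.0.1
export HINDSIGHT_API_WORKERS=4
export HINDSIGHT_API_LOG_LEVEL=debug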

Retrieval

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_GRAPH_RETRIEVER | Graph retrieval algorithm: link_expansion, mpfp, or bfs | link_expansion |
| HINDSIGHT_API_RECALL_MAX_CONCURRENT | Max concurrent recall operations per worker (backpressure) | 32 |
| HINDSIGHT_API_RERANKER_MAX_CANDIDATES | Max candidates to rerank per recall (RRF pre-filters the rest) | 300 |

Graph Retrieval Algorithms

  • link_expansion (default): Fast, simple graph expansion from semantic seeds via entity co-occurrence and causal links. Target latency under 100ms. Recommended for most use cases.
  • mpfp: Multi-Path Fact Propagation, an iterative graph traversal with activation spreading. More thorough but slower.
  • bfs: Breadth-first search from seed facts. Simple but less effective for large graphs.
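
For example, to trade some recall latency for more thorough traversal:

export HINDSIGHT_API_GRAPH_RETRIEVER=mpfp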

Entity Observations

Controls when the system generates entity observations (summaries about entities mentioned in retained content).

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_OBSERVATION_MIN_FACTS | Minimum facts about an entity before generating observations | 5 |
| HINDSIGHT_API_OBSERVATION_TOP_ENTITIES | Max entities to process per retain batch | 5 |
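
For example, to generate observations after fewer facts and cover more entities per batch (illustrative values):

export HINDSIGHT_API_OBSERVATION_MIN_FACTS=3
export HINDSIGHT_API_OBSERVATION_TOP_ENTITIES=10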

Retain

Controls the retain (memory ingestion) pipeline.

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_RETAIN_MAX_COMPLETION_TOKENS | Max completion tokens for fact extraction LLM calls | 64000 |
| HINDSIGHT_API_RETAIN_CHUNK_SIZE | Max characters per chunk for fact extraction. Larger chunks mean fewer LLM calls but may lose context. | 3000 |
| HINDSIGHT_API_RETAIN_EXTRACTION_MODE | Fact extraction mode: concise (selective, fewer high-quality facts) or verbose (detailed, more facts) | concise |
| HINDSIGHT_API_RETAIN_EXTRACT_CAUSAL_LINKS | Extract causal relationships between facts | true |
| HINDSIGHT_API_RETAIN_OBSERVATIONS_ASYNC | Run entity observation generation asynchronously (after retain completes) | false |

Extraction Modes

The extraction mode controls how aggressively facts are extracted from content:

  • concise (default): Selective extraction that focuses on significant, long-term valuable facts. Filters out greetings, filler, and trivial information. Produces fewer but higher-quality facts with better performance.

  • verbose: Detailed extraction that captures every piece of information with maximum verbosity. Produces more facts with extensive detail but slower performance and higher token usage.
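
For example, to capture maximum detail while keeping observation generation off the retain hot path (illustrative combination):

export HINDSIGHT_API_RETAIN_EXTRACTION_MODE=verbose
export HINDSIGHT_API_RETAIN_OBSERVATIONS_ASYNC=true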

Local MCP Server

Configuration for the local MCP server (hindsight-local-mcp command).

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_MCP_LOCAL_BANK_ID | Memory bank ID for local MCP | mcp |
| HINDSIGHT_API_MCP_INSTRUCTIONS | Additional instructions appended to retain/recall tool descriptions | - |

# Example: instruct MCP to also store assistant actions
export HINDSIGHT_API_MCP_INSTRUCTIONS="Also store every action you take, including tool calls and decisions made."

Distributed Workers

Configuration for background task processing. By default, the API processes tasks internally. For high-throughput deployments, run dedicated workers. See Services - Worker Service for details.

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_WORKER_ENABLED | Enable internal worker in API process | true |
| HINDSIGHT_API_WORKER_ID | Unique worker identifier | hostname |
| HINDSIGHT_API_WORKER_POLL_INTERVAL_MS | Database polling interval in milliseconds | 500 |
| HINDSIGHT_API_WORKER_BATCH_SIZE | Tasks to claim per poll cycle | 10 |
| HINDSIGHT_API_WORKER_MAX_RETRIES | Max retries before marking task failed | 3 |
| HINDSIGHT_API_WORKER_HTTP_PORT | HTTP port for worker metrics/health (worker CLI only) | 8889 |
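
For a dedicated-worker deployment, disable the internal worker in the API process and tune the worker processes separately (illustrative values):

# API process
export HINDSIGHT_API_WORKER_ENABLED=false

# Dedicated worker process
export HINDSIGHT_API_WORKER_ID=worker-1
export HINDSIGHT_API_WORKER_POLL_INTERVAL_MS=250
export HINDSIGHT_API_WORKER_BATCH_SIZE=20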

Performance Optimization

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_API_SKIP_LLM_VERIFICATION | Skip LLM connection check on startup | false |
| HINDSIGHT_API_LAZY_RERANKER | Lazy-load reranker model (faster startup) | false |
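
For example, to speed up startup during local development:

export HINDSIGHT_API_SKIP_LLM_VERIFICATION=true
export HINDSIGHT_API_LAZY_RERANKER=true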

Programmatic Configuration

You can also configure the API programmatically using MemoryEngine.from_env():

import asyncio
from hindsight_api import MemoryEngine

async def main():
    memory = MemoryEngine.from_env()  # reads HINDSIGHT_API_* env vars
    await memory.initialize()

asyncio.run(main())

Control Plane

The Control Plane is the web UI for managing memory banks.

| Variable | Description | Default |
|---|---|---|
| HINDSIGHT_CP_DATAPLANE_API_URL | URL of the API service | http://localhost:8888 |

# Point Control Plane to a remote API service
export HINDSIGHT_CP_DATAPLANE_API_URL=http://api.example.com:8888

Example .env File

# API Service
HINDSIGHT_API_DATABASE_URL=postgresql://hindsight:hindsight_dev@localhost:5432/hindsight
HINDSIGHT_API_LLM_PROVIDER=groq
HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx

# Authentication (optional, recommended for production)
# HINDSIGHT_API_TENANT_EXTENSION=hindsight_api.extensions.builtin.tenant:ApiKeyTenantExtension
# HINDSIGHT_API_TENANT_API_KEY=your-secret-api-key

# Control Plane
HINDSIGHT_CP_DATAPLANE_API_URL=http://localhost:8888

For configuration issues not covered here, please open an issue on GitHub.