
Configuration

Complete reference for configuring Hindsight services through environment variables.

Hindsight has two services, each with its own configuration prefix:

| Service | Prefix | Description |
| --- | --- | --- |
| API Service | HINDSIGHT_API_* | Core memory engine |
| Control Plane | HINDSIGHT_CP_* | Web UI |

API Service

The API service handles all memory operations (retain, recall, reflect).

Database

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_DATABASE_URL | PostgreSQL connection string | pg0 (embedded) |
| HINDSIGHT_API_DATABASE_SCHEMA | PostgreSQL schema name for tables | public |
| HINDSIGHT_API_RUN_MIGRATIONS_ON_STARTUP | Run database migrations on API startup | true |

If not provided, the server uses embedded pg0 — convenient for development but not recommended for production.

The DATABASE_SCHEMA setting allows you to use a custom PostgreSQL schema instead of the default public schema. This is useful for:

  • Multi-database setups where you want Hindsight tables in a dedicated schema
  • Hosting platforms (e.g., Supabase) where public schema is reserved or shared
  • Organizational preferences for schema naming conventions
# Example: Using a custom schema
export HINDSIGHT_API_DATABASE_URL=postgresql://user:pass@host:5432/dbname
export HINDSIGHT_API_DATABASE_SCHEMA=hindsight

Migrations will automatically create the schema if it doesn't exist and create all tables in the configured schema.

Database Connection Pool

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_DB_POOL_MIN_SIZE | Minimum connections in the pool | 5 |
| HINDSIGHT_API_DB_POOL_MAX_SIZE | Maximum connections in the pool | 100 |
| HINDSIGHT_API_DB_COMMAND_TIMEOUT | PostgreSQL command timeout in seconds | 60 |
| HINDSIGHT_API_DB_ACQUIRE_TIMEOUT | Connection acquisition timeout in seconds | 30 |

For high-concurrency workloads, increase DB_POOL_MAX_SIZE. Each concurrent recall/reflect operation can use 2-4 connections.
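
As a sketch, a pool sized for roughly 32 concurrent recall operations at 4 connections each (values are illustrative; adjust to your workload and keep the maximum below PostgreSQL's max_connections):

```shell
# Pool sized for ~32 concurrent recall operations at ~4 connections each
# (illustrative values; stay below PostgreSQL's max_connections)
export HINDSIGHT_API_DB_POOL_MIN_SIZE=10
export HINDSIGHT_API_DB_POOL_MAX_SIZE=128
```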

To run migrations manually (e.g., before starting the API), use the admin CLI:

# Migrate the base schema plus all discovered tenant schemas
hindsight-admin run-db-migration

# Or migrate a specific schema only:
hindsight-admin run-db-migration --schema tenant_acme

Vector Extension

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_VECTOR_EXTENSION | Vector index algorithm: pgvector, vchord, or pgvectorscale | pgvector |

Hindsight supports three PostgreSQL vector extensions:

pgvector (HNSW - default)

  • In-memory index using the Hierarchical Navigable Small World algorithm
  • Works well for most embeddings and dataset sizes
  • Fast for small-medium datasets (<10M vectors)
  • Higher memory usage for large datasets
  • Most widely deployed and supported

pgvectorscale (DiskANN)

  • Disk-based index using the StreamingDiskANN algorithm
  • 28x lower p95 latency and 16x higher throughput vs dedicated vector DBs
  • 60-75% cost reduction at scale (SSDs cheaper than RAM)
  • Superior filtering performance with streaming retrieval model
  • Optimized for large datasets (10M+ vectors)
  • Supports both pgvectorscale (open source) and pg_diskann (Azure)
  • Installation:
    • Open source/self-hosted: CREATE EXTENSION vector; CREATE EXTENSION vectorscale CASCADE;
    • Azure PostgreSQL: CREATE EXTENSION vector; CREATE EXTENSION pg_diskann CASCADE;

vchord (vchordrq)

  • Alternative high-performance vector index
  • Optimized for high-dimensional embeddings (3000+ dimensions)
  • Includes integrated BM25 search capabilities
  • Requires vchord extension
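
For a self-hosted PostgreSQL, enabling the extension might look like this (a sketch; verify the exact CREATE EXTENSION name and packaging against the vchord documentation for your PostgreSQL version):

```shell
# Enable the vchord extension, then point Hindsight at it
psql "$HINDSIGHT_API_DATABASE_URL" -c 'CREATE EXTENSION IF NOT EXISTS vchord CASCADE;'
export HINDSIGHT_API_VECTOR_EXTENSION=vchord
```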

When to use pgvectorscale (DiskANN):

  • Large datasets (10M+ vectors) ⭐
  • Complex filtering requirements
  • Cost-sensitive deployments
  • Production workloads requiring high throughput
  • When disk I/O is not a bottleneck

When to use pgvector (HNSW):

  • Small-medium datasets (<10M vectors)
  • Maximum query speed when all data fits in memory
  • Simple nearest-neighbor queries without filters
  • Standard PostgreSQL deployment preference

When to use vchord:

  • High-dimensional embeddings (3000+ dimensions)
  • Want integrated BM25 search
  • Already using vchord for text search

Switching extensions:

If you need to switch from one extension to another:

  1. Set HINDSIGHT_API_VECTOR_EXTENSION to your desired extension (pgvector, vchord, or pgvectorscale)
  2. If your database has existing data, you'll get an error with migration instructions
  3. For empty databases, indexes will be automatically recreated on startup
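
For example, switching an empty database over to pgvectorscale is a config change plus a restart:

```shell
# Switch the vector index backend; indexes are rebuilt on startup for empty databases
# (on a database with existing data, startup fails with migration instructions instead)
export HINDSIGHT_API_VECTOR_EXTENSION=pgvectorscale
```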

Text Search Extension

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_TEXT_SEARCH_EXTENSION | Text search backend: native, vchord, or pg_textsearch | native |

Hindsight supports three text search backends for BM25 keyword retrieval:

  • native: PostgreSQL's built-in full-text search (tsvector + GIN indexes)
  • vchord: VectorChord BM25 (bm25vector + BM25 indexes) - requires vchord_bm25 extension
  • pg_textsearch: Timescale BM25 (text columns + BM25 indexes) - requires pg_textsearch extension

When to use native:

  • Standard PostgreSQL deployment (no extra extensions)
  • Simpler setup and wider compatibility
  • Works well for most use cases

When to use vchord:

  • Already using vchord for vector search (good integration)
  • Want better BM25 ranking performance
  • Need advanced tokenization (uses llmlingua2 tokenizer)

When to use pg_textsearch:

  • Want industry-standard BM25 ranking with better relevance than native PostgreSQL
  • Need efficient top-K queries with Block-Max WAND optimization
  • Prefer lower memory footprint compared to vchord
  • Already using Timescale or have pg_textsearch available

Switching backends:

To switch between backends:

  1. Set HINDSIGHT_API_TEXT_SEARCH_EXTENSION to your desired backend (native, vchord, or pg_textsearch)
  2. If your database has existing data, you'll get an error with migration instructions
  3. For empty databases, the columns/indexes will be automatically recreated on startup
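
For example, moving an empty database to the vchord backend:

```shell
# Switch the BM25 backend; columns/indexes are recreated on startup for empty databases
export HINDSIGHT_API_TEXT_SEARCH_EXTENSION=vchord
```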

Note: VectorChord uses the llmlingua2 tokenizer for multilingual support, while native and pg_textsearch use PostgreSQL's English tokenizer.

LLM Provider

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_LLM_PROVIDER | Provider: openai, openai-codex, claude-code, anthropic, gemini, groq, ollama, lmstudio, vertexai | openai |
| HINDSIGHT_API_LLM_API_KEY | API key for LLM provider | - |
| HINDSIGHT_API_LLM_MODEL | Model name | gpt-5-mini |
| HINDSIGHT_API_LLM_BASE_URL | Custom LLM endpoint | Provider default |
| HINDSIGHT_API_LLM_MAX_CONCURRENT | Max concurrent LLM requests | 32 |
| HINDSIGHT_API_LLM_MAX_RETRIES | Max retry attempts for LLM API calls | 10 |
| HINDSIGHT_API_LLM_INITIAL_BACKOFF | Initial retry backoff in seconds (exponential backoff) | 1.0 |
| HINDSIGHT_API_LLM_MAX_BACKOFF | Max retry backoff cap in seconds | 60.0 |
| HINDSIGHT_API_LLM_TIMEOUT | LLM request timeout in seconds | 120 |
| HINDSIGHT_API_LLM_GROQ_SERVICE_TIER | Groq service tier: on_demand, flex, auto | auto |
| HINDSIGHT_API_LLM_OPENAI_SERVICE_TIER | OpenAI service tier: flex for 50% cost savings (OpenAI Flex Processing) | None |

Provider Examples

# Groq (recommended for fast inference)
export HINDSIGHT_API_LLM_PROVIDER=groq
export HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=openai/gpt-oss-20b
# For free tier users: override to on_demand if you get service_tier errors
# export HINDSIGHT_API_LLM_GROQ_SERVICE_TIER=on_demand

# OpenAI
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_API_KEY=sk-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gpt-4o
# Optional: Use Flex Processing for 50% cost savings (with variable latency)
# export HINDSIGHT_API_LLM_OPENAI_SERVICE_TIER=flex

# Gemini
export HINDSIGHT_API_LLM_PROVIDER=gemini
export HINDSIGHT_API_LLM_API_KEY=xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gemini-2.0-flash

# Anthropic
export HINDSIGHT_API_LLM_PROVIDER=anthropic
export HINDSIGHT_API_LLM_API_KEY=sk-ant-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=claude-sonnet-4-20250514

# Vertex AI (Google Cloud - uses native genai SDK)
export HINDSIGHT_API_LLM_PROVIDER=vertexai
export HINDSIGHT_API_LLM_MODEL=gemini-2.0-flash-001
export HINDSIGHT_API_LLM_VERTEXAI_PROJECT_ID=your-gcp-project-id
export HINDSIGHT_API_LLM_VERTEXAI_REGION=us-central1
# Optional: use ADC (gcloud auth application-default login) or provide service account key:
# export HINDSIGHT_API_LLM_VERTEXAI_SERVICE_ACCOUNT_KEY=/path/to/service-account-key.json

# Ollama (local, no API key)
export HINDSIGHT_API_LLM_PROVIDER=ollama
export HINDSIGHT_API_LLM_BASE_URL=http://localhost:11434/v1
export HINDSIGHT_API_LLM_MODEL=llama3

# LM Studio (local, no API key)
export HINDSIGHT_API_LLM_PROVIDER=lmstudio
export HINDSIGHT_API_LLM_BASE_URL=http://localhost:1234/v1
export HINDSIGHT_API_LLM_MODEL=your-local-model

# OpenAI-compatible endpoint
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_BASE_URL=https://your-endpoint.com/v1
export HINDSIGHT_API_LLM_API_KEY=your-api-key
export HINDSIGHT_API_LLM_MODEL=your-model-name

# OpenAI Codex (ChatGPT Plus/Pro subscription - uses OAuth, no API key needed)
export HINDSIGHT_API_LLM_PROVIDER=openai-codex
export HINDSIGHT_API_LLM_MODEL=gpt-5.2-codex
# No API key needed - uses OAuth tokens from ~/.codex/auth.json

# Claude Code (Claude Pro/Max subscription - uses OAuth, no API key needed)
export HINDSIGHT_API_LLM_PROVIDER=claude-code
export HINDSIGHT_API_LLM_MODEL=claude-sonnet-4-5-20250929
# No API key needed - uses claude auth login credentials

OpenAI Codex, Claude Code & Vertex AI Setup

For detailed setup instructions for OpenAI Codex (ChatGPT Plus/Pro), Claude Code (Claude Pro/Max), and Vertex AI (Google Cloud), see the Models documentation.

Per-Operation LLM Configuration

Different memory operations have different requirements. Retain (fact extraction) benefits from models with strong structured output capabilities, while Reflect (reasoning/response generation) can use lighter, faster models. Configure separate LLM models for each operation to optimize for cost and performance.

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_RETAIN_LLM_PROVIDER | LLM provider for retain operations | Falls back to HINDSIGHT_API_LLM_PROVIDER |
| HINDSIGHT_API_RETAIN_LLM_API_KEY | API key for retain LLM | Falls back to HINDSIGHT_API_LLM_API_KEY |
| HINDSIGHT_API_RETAIN_LLM_MODEL | Model for retain operations | Falls back to HINDSIGHT_API_LLM_MODEL |
| HINDSIGHT_API_RETAIN_LLM_BASE_URL | Base URL for retain LLM | Falls back to HINDSIGHT_API_LLM_BASE_URL |
| HINDSIGHT_API_RETAIN_LLM_MAX_CONCURRENT | Max concurrent requests for retain | Falls back to HINDSIGHT_API_LLM_MAX_CONCURRENT |
| HINDSIGHT_API_RETAIN_LLM_MAX_RETRIES | Max retries for retain | Falls back to HINDSIGHT_API_LLM_MAX_RETRIES |
| HINDSIGHT_API_RETAIN_LLM_INITIAL_BACKOFF | Initial backoff for retain retries (seconds) | Falls back to HINDSIGHT_API_LLM_INITIAL_BACKOFF |
| HINDSIGHT_API_RETAIN_LLM_MAX_BACKOFF | Max backoff cap for retain retries (seconds) | Falls back to HINDSIGHT_API_LLM_MAX_BACKOFF |
| HINDSIGHT_API_RETAIN_LLM_TIMEOUT | Timeout for retain requests (seconds) | Falls back to HINDSIGHT_API_LLM_TIMEOUT |
| HINDSIGHT_API_REFLECT_LLM_PROVIDER | LLM provider for reflect operations | Falls back to HINDSIGHT_API_LLM_PROVIDER |
| HINDSIGHT_API_REFLECT_LLM_API_KEY | API key for reflect LLM | Falls back to HINDSIGHT_API_LLM_API_KEY |
| HINDSIGHT_API_REFLECT_LLM_MODEL | Model for reflect operations | Falls back to HINDSIGHT_API_LLM_MODEL |
| HINDSIGHT_API_REFLECT_LLM_BASE_URL | Base URL for reflect LLM | Falls back to HINDSIGHT_API_LLM_BASE_URL |
| HINDSIGHT_API_REFLECT_LLM_MAX_CONCURRENT | Max concurrent requests for reflect | Falls back to HINDSIGHT_API_LLM_MAX_CONCURRENT |
| HINDSIGHT_API_REFLECT_LLM_MAX_RETRIES | Max retries for reflect | Falls back to HINDSIGHT_API_LLM_MAX_RETRIES |
| HINDSIGHT_API_REFLECT_LLM_INITIAL_BACKOFF | Initial backoff for reflect retries (seconds) | Falls back to HINDSIGHT_API_LLM_INITIAL_BACKOFF |
| HINDSIGHT_API_REFLECT_LLM_MAX_BACKOFF | Max backoff cap for reflect retries (seconds) | Falls back to HINDSIGHT_API_LLM_MAX_BACKOFF |
| HINDSIGHT_API_REFLECT_LLM_TIMEOUT | Timeout for reflect requests (seconds) | Falls back to HINDSIGHT_API_LLM_TIMEOUT |
| HINDSIGHT_API_CONSOLIDATION_LLM_PROVIDER | LLM provider for observation consolidation | Falls back to HINDSIGHT_API_LLM_PROVIDER |
| HINDSIGHT_API_CONSOLIDATION_LLM_API_KEY | API key for consolidation LLM | Falls back to HINDSIGHT_API_LLM_API_KEY |
| HINDSIGHT_API_CONSOLIDATION_LLM_MODEL | Model for consolidation operations | Falls back to HINDSIGHT_API_LLM_MODEL |
| HINDSIGHT_API_CONSOLIDATION_LLM_BASE_URL | Base URL for consolidation LLM | Falls back to HINDSIGHT_API_LLM_BASE_URL |
| HINDSIGHT_API_CONSOLIDATION_LLM_MAX_CONCURRENT | Max concurrent requests for consolidation | Falls back to HINDSIGHT_API_LLM_MAX_CONCURRENT |
| HINDSIGHT_API_CONSOLIDATION_LLM_MAX_RETRIES | Max retries for consolidation | Falls back to HINDSIGHT_API_LLM_MAX_RETRIES |
| HINDSIGHT_API_CONSOLIDATION_LLM_INITIAL_BACKOFF | Initial backoff for consolidation retries (seconds) | Falls back to HINDSIGHT_API_LLM_INITIAL_BACKOFF |
| HINDSIGHT_API_CONSOLIDATION_LLM_MAX_BACKOFF | Max backoff cap for consolidation retries (seconds) | Falls back to HINDSIGHT_API_LLM_MAX_BACKOFF |
| HINDSIGHT_API_CONSOLIDATION_LLM_TIMEOUT | Timeout for consolidation requests (seconds) | Falls back to HINDSIGHT_API_LLM_TIMEOUT |

When to Use Per-Operation Config

  • Retain: Use models with strong structured output (e.g., GPT-4o, Claude) for accurate fact extraction
  • Reflect: Use faster/cheaper models (e.g., GPT-4o-mini, Groq) for reasoning and response generation
  • Recall: Does not use LLM (pure retrieval), so no configuration needed

Example: Separate Models for Retain and Reflect

# Default LLM (used as fallback)
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_API_KEY=sk-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gpt-4o

# Use GPT-4o for retain (strong structured output)
export HINDSIGHT_API_RETAIN_LLM_MODEL=gpt-4o

# Use faster/cheaper model for reflect
export HINDSIGHT_API_REFLECT_LLM_PROVIDER=groq
export HINDSIGHT_API_REFLECT_LLM_API_KEY=gsk_xxxxxxxxxxxx
export HINDSIGHT_API_REFLECT_LLM_MODEL=llama-3.3-70b-versatile

Example: Tuning Retry Behavior for Rate-Limited APIs

# For Anthropic with tight rate limits (10k output tokens/minute)
export HINDSIGHT_API_LLM_PROVIDER=anthropic
export HINDSIGHT_API_LLM_API_KEY=sk-ant-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=claude-sonnet-4-20250514

# Reduce concurrent requests for retain to avoid rate limits
export HINDSIGHT_API_RETAIN_LLM_MAX_CONCURRENT=3

# Fail faster with fewer retries
export HINDSIGHT_API_RETAIN_LLM_MAX_RETRIES=3

# Or increase backoff times to wait out rate limit windows
export HINDSIGHT_API_RETAIN_LLM_INITIAL_BACKOFF=2.0 # Start at 2s instead of 1s
export HINDSIGHT_API_RETAIN_LLM_MAX_BACKOFF=120.0 # Cap at 2min instead of 1min

Embeddings

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_EMBEDDINGS_PROVIDER | Provider: local, tei, openai, cohere, litellm, or litellm-sdk | local |
| HINDSIGHT_API_EMBEDDINGS_LOCAL_MODEL | Model for local provider | BAAI/bge-small-en-v1.5 |
| HINDSIGHT_API_EMBEDDINGS_LOCAL_TRUST_REMOTE_CODE | Allow loading models with custom code (security risk, disabled by default) | false |
| HINDSIGHT_API_EMBEDDINGS_TEI_URL | TEI server URL | - |
| HINDSIGHT_API_EMBEDDINGS_OPENAI_API_KEY | OpenAI API key (falls back to HINDSIGHT_API_LLM_API_KEY) | - |
| HINDSIGHT_API_EMBEDDINGS_OPENAI_MODEL | OpenAI embedding model | text-embedding-3-small |
| HINDSIGHT_API_EMBEDDINGS_OPENAI_BASE_URL | Custom base URL for OpenAI-compatible API (e.g., Azure OpenAI) | - |
| HINDSIGHT_API_EMBEDDINGS_COHERE_API_KEY | Cohere API key for embeddings | - |
| HINDSIGHT_API_EMBEDDINGS_COHERE_MODEL | Cohere embedding model | embed-english-v3.0 |
| HINDSIGHT_API_EMBEDDINGS_COHERE_BASE_URL | Custom base URL for Cohere-compatible API (e.g., Azure-hosted) | - |
| HINDSIGHT_API_EMBEDDINGS_LITELLM_API_BASE | LiteLLM proxy base URL for embeddings | http://localhost:4000 |
| HINDSIGHT_API_EMBEDDINGS_LITELLM_API_KEY | LiteLLM proxy API key for embeddings (optional, depends on proxy config) | - |
| HINDSIGHT_API_EMBEDDINGS_LITELLM_MODEL | LiteLLM embedding model (use provider prefix, e.g., cohere/embed-english-v3.0) | text-embedding-3-small |
| HINDSIGHT_API_EMBEDDINGS_LITELLM_SDK_API_KEY | LiteLLM SDK API key for direct embedding provider access | - |
| HINDSIGHT_API_EMBEDDINGS_LITELLM_SDK_MODEL | LiteLLM SDK embedding model (use provider prefix, e.g., cohere/embed-english-v3.0) | cohere/embed-english-v3.0 |
| HINDSIGHT_API_EMBEDDINGS_LITELLM_SDK_API_BASE | Custom base URL for LiteLLM SDK embeddings (optional) | - |

# Local (default) - uses SentenceTransformers
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=local
export HINDSIGHT_API_EMBEDDINGS_LOCAL_MODEL=BAAI/bge-small-en-v1.5

# Local with custom model requiring trust_remote_code
# WARNING: Only enable trust_remote_code for models you trust (security risk)
# export HINDSIGHT_API_EMBEDDINGS_LOCAL_MODEL=your-custom-model
# export HINDSIGHT_API_EMBEDDINGS_LOCAL_TRUST_REMOTE_CODE=true

# OpenAI - cloud-based embeddings
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=openai
export HINDSIGHT_API_EMBEDDINGS_OPENAI_API_KEY=sk-xxxxxxxxxxxx # or reuses HINDSIGHT_API_LLM_API_KEY
export HINDSIGHT_API_EMBEDDINGS_OPENAI_MODEL=text-embedding-3-small # 1536 dimensions

# Azure OpenAI - embeddings via Azure endpoint
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=openai
export HINDSIGHT_API_EMBEDDINGS_OPENAI_API_KEY=your-azure-api-key
export HINDSIGHT_API_EMBEDDINGS_OPENAI_MODEL=text-embedding-3-small
export HINDSIGHT_API_EMBEDDINGS_OPENAI_BASE_URL=https://your-resource.openai.azure.com/openai/deployments/your-deployment

# TEI - HuggingFace Text Embeddings Inference (recommended for production)
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=tei
export HINDSIGHT_API_EMBEDDINGS_TEI_URL=http://localhost:8080

# Cohere - cloud-based embeddings
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=cohere
export HINDSIGHT_API_EMBEDDINGS_COHERE_API_KEY=your-api-key
export HINDSIGHT_API_EMBEDDINGS_COHERE_MODEL=embed-english-v3.0 # 1024 dimensions

# Azure-hosted Cohere - embeddings via custom endpoint
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=cohere
export HINDSIGHT_API_EMBEDDINGS_COHERE_API_KEY=your-azure-api-key
export HINDSIGHT_API_EMBEDDINGS_COHERE_MODEL=embed-english-v3.0
export HINDSIGHT_API_EMBEDDINGS_COHERE_BASE_URL=https://your-azure-cohere-endpoint.com

# LiteLLM proxy - unified gateway for multiple providers
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=litellm
export HINDSIGHT_API_EMBEDDINGS_LITELLM_API_BASE=http://localhost:4000
export HINDSIGHT_API_EMBEDDINGS_LITELLM_API_KEY=your-litellm-key # optional
export HINDSIGHT_API_EMBEDDINGS_LITELLM_MODEL=text-embedding-3-small # or cohere/embed-english-v3.0

# LiteLLM SDK - direct API access without proxy server (recommended)
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=litellm-sdk
export HINDSIGHT_API_EMBEDDINGS_LITELLM_SDK_API_KEY=your-provider-api-key
export HINDSIGHT_API_EMBEDDINGS_LITELLM_SDK_MODEL=cohere/embed-english-v3.0

# Supported LiteLLM SDK embedding providers:
# - cohere/embed-english-v3.0 (1024 dimensions)
# - openai/text-embedding-3-small (1536 dimensions)
# - together_ai/togethercomputer/m2-bert-80M-8k-retrieval
# - huggingface/sentence-transformers/all-MiniLM-L6-v2
# - voyage/voyage-2

Embedding Dimensions

Hindsight automatically detects the embedding dimension from the model at startup and adjusts the database schema accordingly. The default model (BAAI/bge-small-en-v1.5) produces 384-dimensional vectors, while OpenAI models produce 1536 or 3072 dimensions.

Dimension Changes

Once memories are stored, you cannot change the embedding dimension without losing data. If you need to switch to a model with different dimensions:

  1. Empty database: The schema is adjusted automatically on startup
  2. Existing data: Either delete all memories first, or use a model with matching dimensions

Supported OpenAI embedding dimensions:

  • text-embedding-3-small: 1536 dimensions
  • text-embedding-3-large: 3072 dimensions
  • text-embedding-ada-002: 1536 dimensions (legacy)
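
For example, moving an empty database to a higher-dimensional OpenAI model (illustrative; on a database with existing memories this combination is rejected, since stored vectors cannot be resized):

```shell
# 3072-dimensional embeddings; only valid on an empty database,
# because the schema is sized to the model's dimension at startup
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=openai
export HINDSIGHT_API_EMBEDDINGS_OPENAI_MODEL=text-embedding-3-large
```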

Reranker

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_RERANKER_PROVIDER | Provider: local, tei, cohere, zeroentropy, flashrank, litellm, litellm-sdk, or rrf | local |
| HINDSIGHT_API_RERANKER_LOCAL_MODEL | Model for local provider | cross-encoder/ms-marco-MiniLM-L-6-v2 |
| HINDSIGHT_API_RERANKER_LOCAL_MAX_CONCURRENT | Max concurrent local reranking (prevents CPU thrashing under load) | 4 |
| HINDSIGHT_API_RERANKER_LOCAL_TRUST_REMOTE_CODE | Allow loading models with custom code (security risk, disabled by default) | false |
| HINDSIGHT_API_RERANKER_TEI_URL | TEI server URL | - |
| HINDSIGHT_API_RERANKER_TEI_BATCH_SIZE | Batch size for TEI reranking | 128 |
| HINDSIGHT_API_RERANKER_TEI_MAX_CONCURRENT | Max concurrent TEI reranking requests | 8 |
| HINDSIGHT_API_RERANKER_COHERE_API_KEY | Cohere API key for reranking | - |
| HINDSIGHT_API_RERANKER_COHERE_MODEL | Cohere rerank model | rerank-english-v3.0 |
| HINDSIGHT_API_RERANKER_COHERE_BASE_URL | Custom base URL for Cohere-compatible API (e.g., Azure-hosted) | - |
| HINDSIGHT_API_RERANKER_LITELLM_API_BASE | LiteLLM proxy base URL for reranking | http://localhost:4000 |
| HINDSIGHT_API_RERANKER_LITELLM_API_KEY | LiteLLM proxy API key for reranking (optional, depends on proxy config) | - |
| HINDSIGHT_API_RERANKER_LITELLM_MODEL | LiteLLM proxy rerank model (use provider prefix, e.g., cohere/rerank-english-v3.0) | cohere/rerank-english-v3.0 |
| HINDSIGHT_API_RERANKER_LITELLM_SDK_API_KEY | LiteLLM SDK API key for direct reranking (no proxy needed) | - |
| HINDSIGHT_API_RERANKER_LITELLM_SDK_MODEL | LiteLLM SDK rerank model (e.g., deepinfra/Qwen3-reranker-8B) | cohere/rerank-english-v3.0 |
| HINDSIGHT_API_RERANKER_LITELLM_SDK_API_BASE | Custom API base URL for LiteLLM SDK (optional) | - |
| HINDSIGHT_API_RERANKER_ZEROENTROPY_API_KEY | ZeroEntropy API key for reranking | - |
| HINDSIGHT_API_RERANKER_ZEROENTROPY_MODEL | ZeroEntropy rerank model (zerank-2, zerank-2-small) | zerank-2 |
| HINDSIGHT_API_RERANKER_FLASHRANK_MODEL | FlashRank model for fast CPU-based reranking | ms-marco-MiniLM-L-12-v2 |
| HINDSIGHT_API_RERANKER_FLASHRANK_CACHE_DIR | Cache directory for FlashRank models | System default |

# Local (default) - uses SentenceTransformers CrossEncoder
export HINDSIGHT_API_RERANKER_PROVIDER=local
export HINDSIGHT_API_RERANKER_LOCAL_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2

# Local with custom model requiring trust_remote_code (e.g., jina-reranker-v2)
# WARNING: Only enable trust_remote_code for models you trust (security risk)
export HINDSIGHT_API_RERANKER_PROVIDER=local
export HINDSIGHT_API_RERANKER_LOCAL_MODEL=jinaai/jina-reranker-v2-base-multilingual
export HINDSIGHT_API_RERANKER_LOCAL_TRUST_REMOTE_CODE=true

# TEI - for high-performance inference
export HINDSIGHT_API_RERANKER_PROVIDER=tei
export HINDSIGHT_API_RERANKER_TEI_URL=http://localhost:8081

# Cohere - cloud-based reranking
export HINDSIGHT_API_RERANKER_PROVIDER=cohere
export HINDSIGHT_API_RERANKER_COHERE_API_KEY=your-api-key
export HINDSIGHT_API_RERANKER_COHERE_MODEL=rerank-english-v3.0

# Azure-hosted Cohere - reranking via custom endpoint
export HINDSIGHT_API_RERANKER_PROVIDER=cohere
export HINDSIGHT_API_RERANKER_COHERE_API_KEY=your-azure-api-key
export HINDSIGHT_API_RERANKER_COHERE_MODEL=rerank-english-v3.0
export HINDSIGHT_API_RERANKER_COHERE_BASE_URL=https://your-azure-cohere-endpoint.com

# ZeroEntropy - cloud-based reranking (state-of-the-art accuracy)
export HINDSIGHT_API_RERANKER_PROVIDER=zeroentropy
export HINDSIGHT_API_RERANKER_ZEROENTROPY_API_KEY=your-api-key
export HINDSIGHT_API_RERANKER_ZEROENTROPY_MODEL=zerank-2 # or zerank-2-small

# LiteLLM proxy - unified gateway for multiple reranking providers (requires running LiteLLM proxy server)
export HINDSIGHT_API_RERANKER_PROVIDER=litellm
export HINDSIGHT_API_RERANKER_LITELLM_API_BASE=http://localhost:4000
export HINDSIGHT_API_RERANKER_LITELLM_API_KEY=your-litellm-key # optional
export HINDSIGHT_API_RERANKER_LITELLM_MODEL=cohere/rerank-english-v3.0 # or voyage/rerank-2, together_ai/...

# LiteLLM SDK - direct API access without proxy (recommended for simplicity)
export HINDSIGHT_API_RERANKER_PROVIDER=litellm-sdk
export HINDSIGHT_API_RERANKER_LITELLM_SDK_API_KEY=your-deepinfra-api-key
export HINDSIGHT_API_RERANKER_LITELLM_SDK_MODEL=deepinfra/Qwen3-reranker-8B # or cohere/rerank-english-v3.0, etc.

LiteLLM Proxy vs SDK

  • litellm: Requires running a separate LiteLLM proxy server. Good for centralized configuration, rate limiting, and caching.
  • litellm-sdk: Direct API access without proxy. Simpler setup, lower latency, fewer infrastructure components.

Both support the same providers:

  • Cohere (cohere/rerank-english-v3.0, cohere/rerank-multilingual-v3.0)
  • DeepInfra (deepinfra/Qwen3-reranker-8B, deepinfra/bge-reranker-v2-m3)
  • Together AI (together_ai/Salesforce/Llama-Rank-V1)
  • HuggingFace (huggingface/BAAI/bge-reranker-v2-m3)
  • Voyage AI (voyage/rerank-2)
  • Jina AI (jina_ai/jina-reranker-v2)
  • AWS Bedrock (bedrock/...)

Authentication

By default, Hindsight runs without authentication. For production deployments, enable API key authentication using the built-in tenant extension:

# Enable the built-in API key authentication
export HINDSIGHT_API_TENANT_EXTENSION=hindsight_api.extensions.builtin.tenant:ApiKeyTenantExtension
export HINDSIGHT_API_TENANT_API_KEY=your-secret-api-key

When enabled, all requests must include the API key in the Authorization header:

curl -H "Authorization: Bearer your-secret-api-key" \
  http://localhost:8888/v1/default/banks

Requests without a valid API key receive a 401 Unauthorized response.
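
A quick sanity check that the key is being enforced (assuming the default port):

```shell
# With the API key extension enabled, this request should fail with 401 Unauthorized
curl -i http://localhost:8888/v1/default/banks
```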

Custom Authentication

For advanced authentication (JWT, OAuth, multi-tenant schemas), implement a custom TenantExtension. See the Extensions documentation for details.

Server

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_HOST | Bind address | 0.0.0.0 |
| HINDSIGHT_API_PORT | Server port | 8888 |
| HINDSIGHT_API_BASE_PATH | Base path for API when behind reverse proxy (e.g., /hindsight) | "" (root) |
| HINDSIGHT_API_WORKERS | Number of uvicorn worker processes | 1 |
| HINDSIGHT_API_LOG_LEVEL | Log level: debug, info, warning, error | info |
| HINDSIGHT_API_LOG_FORMAT | Log format: text or json (structured logging for cloud platforms) | text |
| HINDSIGHT_API_MCP_ENABLED | Enable MCP server at /mcp/{bank_id}/ | true |
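
A sketch of a typical production setup behind a reverse proxy, with structured logs (values are illustrative):

```shell
# Serve under /hindsight behind a reverse proxy, with JSON logs for log aggregation
export HINDSIGHT_API_BASE_PATH=/hindsight
export HINDSIGHT_API_LOG_FORMAT=json
export HINDSIGHT_API_WORKERS=4
```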

Retrieval

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_GRAPH_RETRIEVER | Graph retrieval algorithm: link_expansion, mpfp, or bfs | link_expansion |
| HINDSIGHT_API_RECALL_MAX_CONCURRENT | Max concurrent recall operations per worker (backpressure) | 32 |
| HINDSIGHT_API_RECALL_CONNECTION_BUDGET | Max concurrent DB connections per recall operation | 4 |
| HINDSIGHT_API_RERANKER_MAX_CANDIDATES | Max candidates to rerank per recall (RRF pre-filters the rest) | 300 |
| HINDSIGHT_API_MPFP_TOP_K_NEIGHBORS | Fan-out limit per node in MPFP graph traversal | 20 |
| HINDSIGHT_API_MENTAL_MODEL_REFRESH_CONCURRENCY | Max concurrent mental model refreshes | 8 |
| HINDSIGHT_API_ENABLE_MENTAL_MODEL_HISTORY | Track history of content changes to each mental model (previous content + timestamp). Disable to reduce storage if audit trails are not needed. | true |

Graph Retrieval Algorithms

  • link_expansion (default): Fast, simple graph expansion from semantic seeds via entity co-occurrence and causal links. Target latency under 100ms. Recommended for most use cases.
  • mpfp: Multi-Path Fact Propagation - iterative graph traversal with activation spreading. More thorough but slower.
  • bfs: Breadth-first search from seed facts. Simple but less effective for large graphs.
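
For example, opting into the more thorough MPFP traversal (the fan-out value shown is simply the documented default):

```shell
# Trade latency for more thorough graph traversal
export HINDSIGHT_API_GRAPH_RETRIEVER=mpfp
export HINDSIGHT_API_MPFP_TOP_K_NEIGHBORS=20
```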

Retain

Controls the retain (memory ingestion) pipeline.

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_RETAIN_MAX_COMPLETION_TOKENS | Max completion tokens for fact extraction LLM calls | 64000 |
| HINDSIGHT_API_RETAIN_CHUNK_SIZE | Max characters per chunk for fact extraction. Larger chunks mean fewer LLM calls but may lose context. | 3000 |
| HINDSIGHT_API_RETAIN_EXTRACTION_MODE | Fact extraction mode: concise, verbose, or custom | concise |
| HINDSIGHT_API_RETAIN_MISSION | What this bank should pay attention to during extraction. Steers the LLM without replacing the extraction rules — works alongside any extraction mode. | - |
| HINDSIGHT_API_RETAIN_CUSTOM_INSTRUCTIONS | Full prompt override for fact extraction (only used when mode is custom). Replaces built-in extraction rules entirely. | - |
| HINDSIGHT_API_RETAIN_EXTRACT_CAUSAL_LINKS | Extract causal relationships between facts | true |
| HINDSIGHT_API_RETAIN_BATCH_ENABLED | Use LLM Batch API for fact extraction (50% cost savings, only with async operations) | false |
| HINDSIGHT_API_RETAIN_BATCH_POLL_INTERVAL_SECONDS | Batch API polling interval in seconds | 60 |
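
For example, enabling Batch API extraction for cost-sensitive async ingestion (the poll interval shown is illustrative):

```shell
# 50% cheaper fact extraction via the provider's Batch API (async operations only)
export HINDSIGHT_API_RETAIN_BATCH_ENABLED=true
export HINDSIGHT_API_RETAIN_BATCH_POLL_INTERVAL_SECONDS=120
```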

Entity labels (entity_labels) and free-form entity extraction (entities_allow_free_form) are configured per bank via the bank config API, not as global environment variables — each bank can have its own controlled vocabulary. See Entity Labels for details.

Customizing retain: when to use what

There are three levels of customization for the retain pipeline. Start with the simplest that covers your needs:

| Goal | Use |
| --- | --- |
| Steer what topics to focus on or deprioritize | HINDSIGHT_API_RETAIN_MISSION |
| Extract more detail per fact | HINDSIGHT_API_RETAIN_EXTRACTION_MODE=verbose |
| Completely replace the extraction rules | HINDSIGHT_API_RETAIN_EXTRACTION_MODE=custom + HINDSIGHT_API_RETAIN_CUSTOM_INSTRUCTIONS |

HINDSIGHT_API_RETAIN_MISSION — steer extraction without replacing it (recommended starting point)

Tell the bank what to pay attention to during extraction, in plain language. The mission is injected into the extraction prompt alongside the built-in rules — it narrows focus without replacing the underlying logic. Works with any extraction mode (concise, verbose, custom).

export HINDSIGHT_API_RETAIN_MISSION="Focus on technical decisions, architecture choices, and team member expertise. Deprioritize social or personal information."

HINDSIGHT_API_RETAIN_EXTRACTION_MODE=verbose — more detail per fact

Use when you need richer facts with full context, relationships, and verbosity. Slower and uses more tokens than concise.

HINDSIGHT_API_RETAIN_EXTRACTION_MODE=custom + HINDSIGHT_API_RETAIN_CUSTOM_INSTRUCTIONS — full control

Replaces the built-in selectivity rules entirely. The structural parts of the prompt (output format, temporal handling, coreference resolution) remain intact — only the extraction guidelines are replaced.

Use this when retain_mission isn't sufficient and you need strict inclusion/exclusion logic.

export HINDSIGHT_API_RETAIN_EXTRACTION_MODE=custom
export HINDSIGHT_API_RETAIN_CUSTOM_INSTRUCTIONS="ONLY extract facts that are:
✅ Technical decisions and their rationale
✅ Architecture patterns and design choices
✅ Performance metrics and benchmarks

DO NOT extract:
❌ Greetings or social conversation
❌ Process chatter (\"let me check\", \"one moment\")
❌ Anything that would not be useful in 6 months"

File Processing

Configuration for the file upload and conversion pipeline (used by POST /v1/default/banks/{bank_id}/files/retain).

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_ENABLE_FILE_UPLOAD_API | Enable the file upload API endpoint | true |
| HINDSIGHT_API_FILE_PARSER | Server-side default parser or fallback chain (comma-separated, e.g. iris,markitdown) | markitdown |
| HINDSIGHT_API_FILE_PARSER_ALLOWLIST | Comma-separated list of parsers clients are allowed to request | All registered parsers |
| HINDSIGHT_API_FILE_CONVERSION_MAX_BATCH_SIZE | Max files per upload request | 10 |
| HINDSIGHT_API_FILE_CONVERSION_MAX_BATCH_SIZE_MB | Max total upload size per request (MB) | 100 |
| HINDSIGHT_API_FILE_DELETE_AFTER_RETAIN | Delete stored files after memory extraction completes | true |

Parser selection

Clients can override the server default by passing parser in the request body of POST /v1/default/banks/{bank_id}/files/retain. Both the server default and the per-request field accept a single parser name or an ordered fallback chain — each parser is tried in sequence until one succeeds.

# Server default: try iris first, fall back to markitdown if iris fails
export HINDSIGHT_API_FILE_PARSER=iris,markitdown

# Restrict what clients may request (optional — defaults to all registered parsers)
export HINDSIGHT_API_FILE_PARSER_ALLOWLIST=markitdown,iris

// Per-request override (in the JSON body of the file retain endpoint)
{
  "parser": "iris",
  "files_metadata": [
    { "document_id": "report" },
    { "document_id": "fallback_doc", "parser": ["iris", "markitdown"] }
  ]
}

Clients that request a parser not in the allowlist receive HTTP 400.

Parser: markitdown (default)

Local file-to-markdown conversion using Microsoft's markitdown. No external service required.

Supported formats: PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, images (JPG, PNG — OCR), audio (MP3, WAV — transcription), HTML, TXT, MD, CSV.

Parser: iris

Cloud-based extraction via Vectorize Iris. Higher quality extraction for complex documents, powered by a remote AI service.

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_FILE_PARSER_IRIS_TOKEN | Vectorize API token | - |
| HINDSIGHT_API_FILE_PARSER_IRIS_ORG_ID | Vectorize organization ID | - |

Supported formats: PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, images (JPG, JPEG, PNG, GIF, BMP, TIFF, WEBP), HTML, TXT, MD, CSV.

# Use iris as the only parser
export HINDSIGHT_API_FILE_PARSER=iris
export HINDSIGHT_API_FILE_PARSER_IRIS_TOKEN=your-vectorize-token
export HINDSIGHT_API_FILE_PARSER_IRIS_ORG_ID=your-org-id

# Or: try iris first, fall back to markitdown if iris fails or rejects the file type
export HINDSIGHT_API_FILE_PARSER=iris,markitdown
# Increase batch limits for large file imports
export HINDSIGHT_API_FILE_CONVERSION_MAX_BATCH_SIZE=20
export HINDSIGHT_API_FILE_CONVERSION_MAX_BATCH_SIZE_MB=500

# Keep files after processing (useful for debugging or re-processing)
export HINDSIGHT_API_FILE_DELETE_AFTER_RETAIN=false

File Storage

Files uploaded via the file retain API are stored in an object storage backend before conversion. Choose the backend that fits your infrastructure.

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_FILE_STORAGE_TYPE | Storage backend: native, s3, gcs, or azure | native |

Native (PostgreSQL)

Files are stored as BYTEA in the file_storage table. No additional infrastructure required. Suitable for development and small deployments.

# Native storage is the default — no additional configuration needed
export HINDSIGHT_API_FILE_STORAGE_TYPE=native

S3 / S3-Compatible

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_FILE_STORAGE_S3_BUCKET | S3 bucket name | - |
| HINDSIGHT_API_FILE_STORAGE_S3_REGION | AWS region | - |
| HINDSIGHT_API_FILE_STORAGE_S3_ENDPOINT | Custom endpoint URL (for S3-compatible stores like MinIO, Cloudflare R2) | AWS default |
| HINDSIGHT_API_FILE_STORAGE_S3_ACCESS_KEY_ID | AWS access key ID | - |
| HINDSIGHT_API_FILE_STORAGE_S3_SECRET_ACCESS_KEY | AWS secret access key | - |

# AWS S3
export HINDSIGHT_API_FILE_STORAGE_TYPE=s3
export HINDSIGHT_API_FILE_STORAGE_S3_BUCKET=my-hindsight-files
export HINDSIGHT_API_FILE_STORAGE_S3_REGION=us-east-1
export HINDSIGHT_API_FILE_STORAGE_S3_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export HINDSIGHT_API_FILE_STORAGE_S3_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

# S3-compatible (MinIO, Cloudflare R2, etc.)
export HINDSIGHT_API_FILE_STORAGE_TYPE=s3
export HINDSIGHT_API_FILE_STORAGE_S3_BUCKET=my-bucket
export HINDSIGHT_API_FILE_STORAGE_S3_ENDPOINT=https://your-minio.example.com
export HINDSIGHT_API_FILE_STORAGE_S3_ACCESS_KEY_ID=minioadmin
export HINDSIGHT_API_FILE_STORAGE_S3_SECRET_ACCESS_KEY=minioadmin

Google Cloud Storage

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_FILE_STORAGE_GCS_BUCKET | GCS bucket name | - |
| HINDSIGHT_API_FILE_STORAGE_GCS_SERVICE_ACCOUNT_KEY | Path to service account JSON key file | ADC if not set |

export HINDSIGHT_API_FILE_STORAGE_TYPE=gcs
export HINDSIGHT_API_FILE_STORAGE_GCS_BUCKET=my-hindsight-files
# Optional: use service account key file (otherwise falls back to ADC)
export HINDSIGHT_API_FILE_STORAGE_GCS_SERVICE_ACCOUNT_KEY=/path/to/key.json

Azure Blob Storage

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_FILE_STORAGE_AZURE_CONTAINER | Azure container name | - |
| HINDSIGHT_API_FILE_STORAGE_AZURE_ACCOUNT_NAME | Azure storage account name | - |
| HINDSIGHT_API_FILE_STORAGE_AZURE_ACCOUNT_KEY | Azure storage account key | - |

export HINDSIGHT_API_FILE_STORAGE_TYPE=azure
export HINDSIGHT_API_FILE_STORAGE_AZURE_CONTAINER=hindsight-files
export HINDSIGHT_API_FILE_STORAGE_AZURE_ACCOUNT_NAME=mystorageaccount
export HINDSIGHT_API_FILE_STORAGE_AZURE_ACCOUNT_KEY=base64encodedkey==

Storage Backend Comparison

| Backend | Best For | Notes |
| --- | --- | --- |
| native | Development, small deployments | No extra infrastructure, stored in PostgreSQL |
| s3 | Production, AWS deployments | Works with any S3-compatible store |
| gcs | Production, GCP deployments | Supports ADC for keyless auth |
| azure | Production, Azure deployments | Uses account key auth |

Production Recommendation

For production deployments, use s3, gcs, or azure to avoid storing large binary files in your PostgreSQL database. Set HINDSIGHT_API_FILE_DELETE_AFTER_RETAIN=true (the default) to delete files after memory extraction, which minimizes storage costs.

Observations (Experimental)

Observations are consolidated knowledge synthesized from facts.

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_ENABLE_OBSERVATIONS | Enable observation consolidation | true |
| HINDSIGHT_API_ENABLE_OBSERVATION_HISTORY | Track history of changes to each observation (previous content + timestamp). Disable to reduce storage if audit trails are not needed. | true |
| HINDSIGHT_API_CONSOLIDATION_BATCH_SIZE | Memories to load per batch (internal optimization) | 50 |
| HINDSIGHT_API_CONSOLIDATION_MAX_TOKENS | Max tokens for recall when finding related observations during consolidation | 1024 |
| HINDSIGHT_API_CONSOLIDATION_LLM_BATCH_SIZE | Number of facts sent to the LLM in a single consolidation call. Higher values reduce LLM calls and improve throughput at the cost of larger prompts. Set to 1 to disable batching. Configurable per bank. | 8 |
| HINDSIGHT_API_CONSOLIDATION_SOURCE_FACTS_MAX_TOKENS | Total token budget for source facts included with observations in the consolidation prompt. -1 = unlimited. Configurable per bank. | -1 |
| HINDSIGHT_API_CONSOLIDATION_SOURCE_FACTS_MAX_TOKENS_PER_OBSERVATION | Per-observation token cap for source facts in the consolidation prompt. Each observation independently gets at most this many tokens of source facts. -1 = unlimited. Configurable per bank. | 256 |
| HINDSIGHT_API_OBSERVATIONS_MISSION | What this bank should synthesise into durable observations. Replaces the built-in consolidation rules; leave unset to use the server default. | - |

Customizing observations: when to use what

| Goal | Use |
| --- | --- |
| Default behavior: durable specific facts, no ephemeral state | Leave unset |
| Change what observations are for this bank (different shape, different purpose) | HINDSIGHT_API_OBSERVATIONS_MISSION |

HINDSIGHT_API_OBSERVATIONS_MISSION — redefine what this bank synthesises

By default, observations are durable, specific facts synthesized from memories — the kind of knowledge that stays true over time (preferences, skills, relationships, recurring patterns). Ephemeral state is filtered out. Contradictions are tracked with temporal markers.

Set HINDSIGHT_API_OBSERVATIONS_MISSION to replace this definition entirely. Write a plain-language description of what observations should be for your use case. The LLM will use this instead of the default rules when deciding what to create or update. Leave it unset to keep the server default.

When to use observations_mission

Use it when the default durable-knowledge behavior doesn't match your use case. Common scenarios:

  • You want broader event summaries rather than isolated facts
  • You want observations grouped by time period (weekly, monthly)
  • You want a different granularity (one observation per project rather than per fact)
  • You have a domain-specific notion of what's worth remembering

Example: Weekly event summaries

export HINDSIGHT_API_OBSERVATIONS_MISSION="Observations are broad summaries of project events grouped by week. Each observation should capture what happened, what was decided, and what was blocked — not individual facts. Merge related events into cohesive weekly narratives."

Example: Person-centric knowledge

export HINDSIGHT_API_OBSERVATIONS_MISSION="Observations are durable facts about specific named people: their preferences, skills, relationships, and behavioral patterns. Only create observations for facts that are stable over time and tied to a named individual."

Example: Support ticket patterns

export HINDSIGHT_API_OBSERVATIONS_MISSION="Observations are recurring patterns in customer support interactions: common failure modes, frequently requested features, and pain points that appear across multiple tickets."

Reflect

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_REFLECT_MAX_ITERATIONS | Max tool call iterations before forcing a response | 10 |
| HINDSIGHT_API_REFLECT_MAX_CONTEXT_TOKENS | Max accumulated context tokens in the reflect loop before forcing final synthesis. Prevents context_length_exceeded errors on large banks. Lower this if your LLM has a context window smaller than 128K. | 100000 |
| HINDSIGHT_API_REFLECT_MISSION | Global reflect mission (identity and reasoning framing). Overridden per bank via config API. | - |
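
How the two limits interact can be sketched as follows (an illustrative sketch with hypothetical names; the real reflect loop is more involved): the loop ends early on an answer, and either limit forces a final synthesis.

```python
# Illustrative sketch of a reflect loop with an iteration cap and a
# context-token budget (hypothetical names, not the actual Hindsight code).

def reflect_loop(llm_step, max_iterations=10, max_context_tokens=100_000):
    """Run tool-call iterations until the model answers or a limit is hit.

    llm_step: callable returning (answer_or_none, tokens_used_this_step).
    Returns (answer, forced) where forced=True means a limit ended the loop.
    """
    context_tokens = 0
    for _ in range(max_iterations):
        answer, tokens = llm_step()
        context_tokens += tokens
        if answer is not None:
            return answer, False
        if context_tokens >= max_context_tokens:
            break  # force final synthesis before the context window overflows
    return "forced final synthesis", True
```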

Disposition

Disposition traits control how the bank reasons during reflect operations. Each trait is on a scale of 1–5. These are hierarchical — they can be overridden per bank via the config API.

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_DISPOSITION_SKEPTICISM | How skeptical vs trusting (1=trusting, 5=skeptical) | 3 |
| HINDSIGHT_API_DISPOSITION_LITERALISM | How literally to interpret information (1=flexible, 5=literal) | 3 |
| HINDSIGHT_API_DISPOSITION_EMPATHY | How much to consider emotional context (1=detached, 5=empathetic) | 3 |

MCP Server

Configuration for MCP server endpoints.

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_MCP_ENABLED | Enable MCP server at /mcp/{bank_id}/ | true |
| HINDSIGHT_API_MCP_ENABLED_TOOLS | Comma-separated allowlist of MCP tools to expose globally (empty = all tools) | - |
| HINDSIGHT_API_MCP_AUTH_TOKEN | Bearer token for MCP authentication (optional) | - |
| HINDSIGHT_API_MCP_LOCAL_BANK_ID | Memory bank ID for local MCP | mcp |
| HINDSIGHT_API_MCP_INSTRUCTIONS | Additional instructions appended to retain/recall tool descriptions | - |

Tool Access Control:

HINDSIGHT_API_MCP_ENABLED_TOOLS restricts which MCP tools are registered at the server level. This is useful for read-only deployments or limiting surface area:

# Expose only recall (read-only deployment)
export HINDSIGHT_API_MCP_ENABLED_TOOLS=recall

# Expose recall and reflect only
export HINDSIGHT_API_MCP_ENABLED_TOOLS=recall,reflect

Available tool names: retain, recall, reflect, list_banks, create_bank, list_mental_models, get_mental_model, create_mental_model, update_mental_model, delete_mental_model, refresh_mental_model, list_directives, create_directive, delete_directive, list_memories, get_memory, delete_memory, list_documents, get_document, delete_document, list_operations, get_operation, cancel_operation, list_tags, get_bank, get_bank_stats, update_bank, delete_bank, clear_memories.

This can also be overridden per bank via the config API:

# Restrict a specific bank to read-only MCP access
curl -X PATCH http://localhost:8888/v1/default/banks/my-bank/config \
  -H "Content-Type: application/json" \
  -d '{"updates": {"mcp_enabled_tools": ["recall"]}}'

When a bank-level mcp_enabled_tools is set, tools not in the list return a clear error when invoked (they still appear in the tools list for MCP protocol compatibility).
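
That invoke-time gating can be sketched like this (hypothetical names and a trimmed tool list; the real server registers far more tools): the full list is always advertised, but disabled tools error on invocation.

```python
# Sketch of bank-level MCP tool gating (hypothetical, not Hindsight's code):
# disallowed tools stay in the tool list for MCP protocol compatibility
# but return an error when invoked.

ALL_TOOLS = ["retain", "recall", "reflect"]

def list_tools(bank_enabled_tools=None):
    # Always advertise every registered tool, regardless of bank config.
    return ALL_TOOLS

def invoke_tool(name, bank_enabled_tools=None):
    if bank_enabled_tools is not None and name not in bank_enabled_tools:
        return {"error": f"tool '{name}' is disabled for this bank"}
    return {"result": f"{name} executed"}
```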

MCP Authentication:

By default, the MCP endpoint is open. For production deployments, set HINDSIGHT_API_MCP_AUTH_TOKEN to require Bearer token authentication:

export HINDSIGHT_API_MCP_AUTH_TOKEN=your-secret-token

Clients must then include the token in the Authorization header. See MCP Server documentation for details.
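
A client might build the headers like this (a sketch; how the token is stored and attached depends on your MCP client library — here it is read from an environment variable for illustration):

```python
import os

def mcp_request_headers():
    """Build headers for an authenticated MCP request.

    Assumes the client keeps the token in HINDSIGHT_API_MCP_AUTH_TOKEN;
    adapt to wherever your client stores the secret.
    """
    token = os.environ.get("HINDSIGHT_API_MCP_AUTH_TOKEN", "")
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```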

Local MCP instructions:

# Example: instruct MCP to also store assistant actions
export HINDSIGHT_API_MCP_INSTRUCTIONS="Also store every action you take, including tool calls and decisions made."

Distributed Workers

Configuration for background task processing. By default, the API processes tasks internally. For high-throughput deployments, run dedicated workers. See Services - Worker Service for details.

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_WORKER_ENABLED | Enable internal worker in API process | true |
| HINDSIGHT_API_WORKER_ID | Unique worker identifier | hostname |
| HINDSIGHT_API_WORKER_POLL_INTERVAL_MS | Database polling interval in milliseconds | 500 |
| HINDSIGHT_API_WORKER_MAX_RETRIES | Max retries before marking task failed | 3 |
| HINDSIGHT_API_WORKER_HTTP_PORT | HTTP port for worker metrics/health (worker CLI only) | 8889 |
| HINDSIGHT_API_WORKER_MAX_SLOTS | Maximum concurrent tasks per worker | 10 |
| HINDSIGHT_API_WORKER_CONSOLIDATION_MAX_SLOTS | Maximum concurrent consolidation tasks per worker | 2 |
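
One plausible reading of the two slot limits is that they compose: a consolidation task occupies both a general slot and a consolidation slot. A minimal sketch under that assumption (hypothetical structure, not the actual worker):

```python
import asyncio

# Sketch of per-worker slot limits (an assumption about the semantics,
# not Hindsight's actual scheduler): a global cap on concurrent tasks
# plus a tighter cap for consolidation tasks.

async def run_tasks(tasks, max_slots=10, consolidation_max_slots=2):
    all_slots = asyncio.Semaphore(max_slots)
    consolidation_slots = asyncio.Semaphore(consolidation_max_slots)

    async def run_one(kind, coro):
        async with all_slots:
            if kind == "consolidation":
                async with consolidation_slots:
                    return await coro
            return await coro

    # tasks: iterable of (kind, coroutine) pairs; results keep input order.
    return await asyncio.gather(*(run_one(k, c) for k, c in tasks))
```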

Performance Optimization

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_SKIP_LLM_VERIFICATION | Skip LLM connection check on startup | false |
| HINDSIGHT_API_LAZY_RERANKER | Lazy-load reranker model (faster startup) | false |

Programmatic Configuration

You can also configure the API programmatically using MemoryEngine.from_env():

from hindsight_api import MemoryEngine

memory = MemoryEngine.from_env()
await memory.initialize()

Observability & Tracing

Hindsight provides OpenTelemetry-based observability for LLM calls, conforming to GenAI semantic conventions.

OpenTelemetry Tracing

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_OTEL_TRACES_ENABLED | Enable distributed tracing for LLM calls | false |
| HINDSIGHT_API_OTEL_EXPORTER_OTLP_ENDPOINT | OTLP endpoint URL (e.g., Grafana LGTM, Langfuse, etc.) | - |
| HINDSIGHT_API_OTEL_EXPORTER_OTLP_HEADERS | Headers for OTLP exporter (format: "key1=value1,key2=value2") | - |
| HINDSIGHT_API_OTEL_SERVICE_NAME | Service name for traces | hindsight-api |
| HINDSIGHT_API_OTEL_DEPLOYMENT_ENVIRONMENT | Deployment environment name (e.g., development, staging, production) | development |

Features:

  • Full prompts and completions recorded as events
  • Token usage tracking (input/output)
  • Model and provider information
  • Error tracking with finish reasons
  • Conforms to OpenTelemetry GenAI semantic conventions v1.37+

OTLP-Compatible Backends:

The tracing implementation uses standard OTLP HTTP protocol, so it works with any OTLP-compatible backend:

  • Grafana LGTM (Recommended for local dev): All-in-one stack with Tempo traces, Loki logs, Mimir metrics, and Grafana UI
  • Langfuse: LLM-focused observability and analytics
  • OpenLIT: Built-in LLM dashboards, cost tracking
  • DataDog, New Relic, Honeycomb: Commercial platforms

Example Configuration:

# Enable tracing
export HINDSIGHT_API_OTEL_TRACES_ENABLED=true

# Configure endpoint (example: OpenLIT Cloud)
export HINDSIGHT_API_OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.openlit.io
export HINDSIGHT_API_OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer olit-xxx"

# Optional: Custom service name and environment
export HINDSIGHT_API_OTEL_SERVICE_NAME=hindsight-production
export HINDSIGHT_API_OTEL_DEPLOYMENT_ENVIRONMENT=production
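
The headers variable uses the standard "key1=value1,key2=value2" format; an illustrative helper shows how such a string splits into individual headers (note each value may itself contain "=" or spaces, as in a Bearer token):

```python
# Illustrative parser for the "key1=value1,key2=value2" header format
# used by HINDSIGHT_API_OTEL_EXPORTER_OTLP_HEADERS.

def parse_otlp_headers(raw):
    headers = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue
        # Split on the FIRST '=' only, so values like "Bearer xyz=" survive.
        key, _, value = pair.partition("=")
        headers[key.strip()] = value.strip()
    return headers
```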

Local Development:

For local development, we recommend the Grafana LGTM stack which provides traces, metrics, and logs in a single container:

./scripts/dev/start-grafana.sh

See scripts/dev/grafana/README.md for detailed setup instructions.

Other options: See scripts/dev/openlit/README.md for OpenLIT or scripts/dev/jaeger/README.md for standalone Jaeger.

Metrics

Hindsight exposes Prometheus metrics at the /metrics endpoint, including:

  • LLM call duration and token usage
  • Operation duration (retain/recall/reflect)
  • HTTP request metrics
  • Database connection pool metrics

Metrics are always enabled and available at http://localhost:8888/metrics.


Control Plane

The Control Plane is the web UI for managing memory banks.

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_CP_DATAPLANE_API_URL | URL of the API service | http://localhost:8888 |
| NEXT_PUBLIC_BASE_PATH | Base path for Control Plane UI when behind reverse proxy (e.g., /hindsight) | "" (root) |

# Point Control Plane to a remote API service
export HINDSIGHT_CP_DATAPLANE_API_URL=http://api.example.com:8888

Hierarchical Configuration

Hindsight supports per-bank configuration overrides through a hierarchical system: Global (env vars) → Tenant → Bank.
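
Conceptually, resolution merges the three layers with the most specific non-null value winning. A minimal sketch of that merge (hypothetical structure, not the actual resolver):

```python
# Sketch of Global (env vars) -> Tenant -> Bank resolution: later layers
# override earlier ones, and unset (None) values fall through.
# Hypothetical structure, not Hindsight's actual ConfigResolver.

def resolve_config(global_cfg, tenant_overrides=None, bank_overrides=None):
    resolved = dict(global_cfg)
    for layer in (tenant_overrides or {}, bank_overrides or {}):
        for key, value in layer.items():
            if value is not None:
                resolved[key] = value
    return resolved
```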

Type-Safe Config Access

To prevent accidentally using global defaults when bank-specific overrides exist, Hindsight enforces type-safe config access:

In Application Code:

from hindsight_api.config import get_config

# ✅ Access static (infrastructure) fields
config = get_config()
host = config.host # OK - static field
port = config.port # OK - static field

# ❌ Attempting to access bank-configurable fields raises an error
chunk_size = config.retain_chunk_size # ConfigFieldAccessError!

Error Message:

ConfigFieldAccessError: Field 'retain_chunk_size' is bank-configurable and cannot
be accessed from global config. Use ConfigResolver.resolve_full_config(bank_id, context)
to get bank-specific config.

For Bank-Specific Config:

# Internal code that needs bank-specific settings
from hindsight_api.config_resolver import ConfigResolver

# Resolve full config for a specific bank
config = await config_resolver.resolve_full_config(bank_id, request_context)
chunk_size = config.retain_chunk_size # ✅ Uses bank-specific value

This design prevents bugs where a global default is silently used in place of a bank override: the mistake surfaces immediately during development instead of causing subtle misbehavior in production.
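
One way such a guard can be built (an illustrative sketch, not the actual Hindsight implementation) is a `__getattr__` hook that rejects bank-configurable field names on the global config object:

```python
# Sketch of a type-safe config guard (hypothetical field set and class;
# not Hindsight's actual code): bank-configurable fields raise when
# accessed from the global config.

BANK_CONFIGURABLE = {"retain_chunk_size", "retain_extraction_mode"}

class ConfigFieldAccessError(AttributeError):
    pass

class GlobalConfig:
    def __init__(self, **fields):
        # Bypass attribute protocol so __getattr__ can use _fields safely.
        self.__dict__["_fields"] = fields

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name in BANK_CONFIGURABLE:
            raise ConfigFieldAccessError(
                f"Field '{name}' is bank-configurable; resolve it per bank instead"
            )
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name)
```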

Security Model

Configuration fields are categorized for security:

  1. Configurable Fields - Safe behavioral settings that can be customized per-bank:

    • Retention: retain_chunk_size, retain_extraction_mode, retain_mission, retain_custom_instructions
    • Observations: enable_observations, observations_mission
    • MCP access control: mcp_enabled_tools
  2. Credential Fields - NEVER exposed or configurable via API:

    • API keys: *_api_key (all LLM API keys)
    • Infrastructure: *_base_url (all base URLs)
  3. Static Fields - Server-level only, cannot be overridden:

    • Infrastructure: database_url, port, host, worker_count
    • Provider/Model selection: llm_provider, llm_model (requires presets - not yet implemented)
    • Performance tuning: llm_max_concurrent, llm_timeout, retrieval settings, optimization flags

Enabling the API

| Variable | Description | Default |
| --- | --- | --- |
| HINDSIGHT_API_ENABLE_BANK_CONFIG_API | Enable per-bank config API | true |

API Endpoints

  • GET /v1/default/banks/{bank_id}/config - View resolved config (filtered by permissions)
  • PATCH /v1/default/banks/{bank_id}/config - Update bank overrides (only allowed fields)
  • DELETE /v1/default/banks/{bank_id}/config - Reset to defaults

Permission System

Tenant extensions can control which fields banks are allowed to modify via get_allowed_config_fields():

class CustomTenantExtension(TenantExtension):
    async def get_allowed_config_fields(self, context, bank_id):
        # Option 1: Allow all configurable fields
        return None

        # Option 2: Allow specific fields only
        return {"retain_chunk_size", "retain_custom_instructions"}

        # Option 3: Read-only (no modifications)
        return set()

Examples

# Update retention settings for a bank
curl -X PATCH http://localhost:8888/v1/default/banks/my-bank/config \
  -H "Content-Type: application/json" \
  -d '{
    "updates": {
      "retain_chunk_size": 4000,
      "retain_extraction_mode": "custom",
      "retain_custom_instructions": "Focus on technical details and implementation specifics"
    }
  }'

# Note: retain_extraction_mode must be "custom" to use retain_custom_instructions

# View resolved config (respects permissions)
curl http://localhost:8888/v1/default/banks/my-bank/config

# Reset to defaults
curl -X DELETE http://localhost:8888/v1/default/banks/my-bank/config

Security Notes:

  • Credentials (API keys, base URLs) are never returned in responses
  • Only configurable fields can be modified
  • Responses are filtered by tenant permissions
  • Attempting to set credentials returns 400 error
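
These rules can be sketched as a single validation step (the field set below is illustrative, taken from the configurable list earlier in this section; the actual server-side check may differ):

```python
# Sketch of PATCH /config update validation (illustrative field sets):
# only configurable fields may be set; credential-shaped fields and
# anything else are rejected with a 400-style status.

CONFIGURABLE = {
    "retain_chunk_size", "retain_extraction_mode", "retain_mission",
    "retain_custom_instructions", "enable_observations",
    "observations_mission", "mcp_enabled_tools",
}

def validate_updates(updates):
    """Return (status, rejected_fields) for a PATCH /config body."""
    rejected = [
        k for k in updates
        if k.endswith("_api_key") or k.endswith("_base_url") or k not in CONFIGURABLE
    ]
    return (400, rejected) if rejected else (200, [])
```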

Reverse Proxy / Subpath Deployment

To deploy Hindsight under a subpath (e.g., example.com/hindsight/):

  1. Set both environment variables to the same path:

    HINDSIGHT_API_BASE_PATH=/hindsight
    NEXT_PUBLIC_BASE_PATH=/hindsight
  2. Configure your reverse proxy to:

    • Forward /hindsight/* requests to Hindsight
    • Preserve the full path in forwarded requests
    • Set appropriate proxy headers (X-Forwarded-Proto, X-Forwarded-For)

Example: Nginx Configuration

location /hindsight/ {
    proxy_pass http://localhost:8888/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}

Example: Traefik Configuration

http:
  routers:
    hindsight:
      rule: "PathPrefix(`/hindsight`)"
      service: hindsight
      middlewares:
        - hindsight-stripprefix

  middlewares:
    hindsight-stripprefix:
      stripPrefix:
        prefixes:
          - "/hindsight"

  services:
    hindsight:
      loadBalancer:
        servers:
          - url: "http://localhost:8888"

Important Notes:

  • The base path must start with / and should NOT end with /
  • Both API and Control Plane should use the same base path
  • After setting environment variables, restart both services
  • OpenAPI docs will be available at <base-path>/docs (e.g., /hindsight/docs)
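
The base-path rules above can be captured in a small validator (an illustrative sketch, not a Hindsight utility):

```python
# Illustrative check of the base-path rules: must start with "/",
# must not end with "/".

def validate_base_path(path):
    if not path.startswith("/"):
        raise ValueError("base path must start with '/'")
    if len(path) > 1 and path.endswith("/"):
        raise ValueError("base path must not end with '/'")
    return path
```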

Complete Examples:

See docker/compose-examples/ directory for:

  • Nginx configuration files (simple.conf, api-and-control-plane.conf)
  • Docker Compose setups (docker-compose.yml, reverse-proxy-only.yml)
  • Traefik and other reverse proxy examples
  • Full deployment documentation

Example .env File

# API Service
HINDSIGHT_API_DATABASE_URL=postgresql://hindsight:hindsight_dev@localhost:5432/hindsight
# HINDSIGHT_API_DATABASE_SCHEMA=public # optional, defaults to 'public'
HINDSIGHT_API_LLM_PROVIDER=groq
HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx

# Authentication (optional, recommended for production)
# HINDSIGHT_API_TENANT_EXTENSION=hindsight_api.extensions.builtin.tenant:ApiKeyTenantExtension
# HINDSIGHT_API_TENANT_API_KEY=your-secret-api-key

# File storage (optional, defaults to PostgreSQL native storage)
# HINDSIGHT_API_FILE_STORAGE_TYPE=s3
# HINDSIGHT_API_FILE_STORAGE_S3_BUCKET=my-hindsight-files
# HINDSIGHT_API_FILE_STORAGE_S3_REGION=us-east-1
# HINDSIGHT_API_FILE_STORAGE_S3_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
# HINDSIGHT_API_FILE_STORAGE_S3_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

# Control Plane
HINDSIGHT_CP_DATAPLANE_API_URL=http://localhost:8888

For configuration issues not covered here, please open an issue on GitHub.