
Models

Hindsight uses several machine learning models for different tasks.

Overview

| Model Type | Purpose | Default | Configurable |
| --- | --- | --- | --- |
| LLM | Fact extraction, reasoning, generation | Provider-specific | Yes |
| Embedding | Vector representations for semantic search | BAAI/bge-small-en-v1.5 | Yes |
| Cross-Encoder | Reranking search results | cross-encoder/ms-marco-MiniLM-L-6-v2 | Yes |

All local models (embedding, cross-encoder) are automatically downloaded from HuggingFace on first run.


LLM

Used for fact extraction, entity resolution, mental model consolidation, and answer synthesis.

Supported providers: OpenAI, Anthropic, Gemini, Groq, Ollama, LM Studio, and any OpenAI-compatible API

OpenAI-Compatible Providers

Hindsight works with any provider that exposes an OpenAI-compatible API (e.g., Azure OpenAI). Simply set HINDSIGHT_API_LLM_PROVIDER=openai and configure HINDSIGHT_API_LLM_BASE_URL to point to your provider's endpoint.
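As a sketch, configuring an OpenAI-compatible gateway might look like this (the base URL and model name below are placeholders for your own deployment, not real defaults):

```shell
# Any OpenAI-compatible endpoint (URL and model are illustrative)
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_BASE_URL=https://your-gateway.example.com/v1
export HINDSIGHT_API_LLM_API_KEY=your-api-key
export HINDSIGHT_API_LLM_MODEL=your-deployed-model
```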

See Configuration for setup examples.

Benchmarks

Not sure which model to use? The Model Leaderboard benchmarks models across accuracy, speed, cost, and reliability for retain, reflect, and observation consolidation so you can pick the right trade-off for your use case.


Tested Models

The following models have been tested and verified to work correctly with Hindsight:

| Provider | Model |
| --- | --- |
| OpenAI | gpt-5.2 |
| OpenAI | gpt-5 |
| OpenAI | gpt-5-mini |
| OpenAI | gpt-5-nano |
| OpenAI | gpt-4.1-mini |
| OpenAI | gpt-4.1-nano |
| OpenAI | gpt-4o-mini |
| Anthropic | claude-sonnet-4-20250514 |
| Anthropic | claude-3-5-sonnet-20241022 |
| Gemini | gemini-3-pro-preview |
| Gemini | gemini-2.5-flash |
| Gemini | gemini-2.5-flash-lite |
| Groq | openai/gpt-oss-120b |
| Groq | openai/gpt-oss-20b |

Provider Default Models

Each provider has a recommended default model that's used when HINDSIGHT_API_LLM_MODEL is not explicitly set. This makes configuration simpler: just specify the provider and get a sensible default.

| Provider | Default Model |
| --- | --- |
| openai | gpt-4o-mini |
| anthropic | claude-haiku-4-5-20251001 |
| gemini | gemini-2.5-flash |
| groq | openai/gpt-oss-120b |
| ollama | gemma3:12b |
| lmstudio | local-model |
| vertexai | gemini-2.0-flash-001 |
| openai-codex | gpt-5.2-codex |
| claude-code | claude-sonnet-4-5-20250929 |

Example: Setting just the provider uses its default model:

# Uses claude-haiku-4-5-20251001 automatically
export HINDSIGHT_API_LLM_PROVIDER=anthropic
export HINDSIGHT_API_LLM_API_KEY=sk-ant-xxxxxxxxxxxx

You can override the default by explicitly setting HINDSIGHT_API_LLM_MODEL:

# Override to use Sonnet instead
export HINDSIGHT_API_LLM_PROVIDER=anthropic
export HINDSIGHT_API_LLM_API_KEY=sk-ant-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=claude-sonnet-4-5-20250929

This also applies to per-operation overrides:

# Global: OpenAI gpt-4o-mini (default)
export HINDSIGHT_API_LLM_PROVIDER=openai

# Retain: Anthropic claude-haiku-4-5-20251001 (default)
export HINDSIGHT_API_RETAIN_LLM_PROVIDER=anthropic

Using Other Models

Other models not listed above may work with Hindsight, but they must support at least 65,000 output tokens to ensure reliable fact extraction. If you need support for a specific model that doesn't meet this requirement, please open an issue to request an exception.

Models with Limited Output Tokens

If your model only supports 32k or fewer output tokens (e.g., some older models), you can reduce the retain completion token limit:

# For models that support 32k output tokens
export HINDSIGHT_API_RETAIN_MAX_COMPLETION_TOKENS=32000

# For models that support 16k output tokens
export HINDSIGHT_API_RETAIN_MAX_COMPLETION_TOKENS=16000

Important: HINDSIGHT_API_RETAIN_MAX_COMPLETION_TOKENS must be greater than HINDSIGHT_API_RETAIN_CHUNK_SIZE (default: 3000). The system validates this on startup and reports an error if the configuration is invalid.
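For instance, a valid pairing under the default chunk size might look like this (values illustrative):

```shell
# Chunk size (default 3000) must stay below the completion token limit
export HINDSIGHT_API_RETAIN_CHUNK_SIZE=3000
export HINDSIGHT_API_RETAIN_MAX_COMPLETION_TOKENS=16000  # 16000 > 3000, passes startup validation
```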

Configuration

# Groq (recommended)
export HINDSIGHT_API_LLM_PROVIDER=groq
export HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=openai/gpt-oss-20b

# OpenAI
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_API_KEY=sk-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gpt-4o

# Gemini
export HINDSIGHT_API_LLM_PROVIDER=gemini
export HINDSIGHT_API_LLM_API_KEY=xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gemini-2.0-flash

# Anthropic
export HINDSIGHT_API_LLM_PROVIDER=anthropic
export HINDSIGHT_API_LLM_API_KEY=sk-ant-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=claude-sonnet-4-20250514

# Ollama (local)
export HINDSIGHT_API_LLM_PROVIDER=ollama
export HINDSIGHT_API_LLM_BASE_URL=http://localhost:11434/v1
export HINDSIGHT_API_LLM_MODEL=llama3

# LM Studio (local)
export HINDSIGHT_API_LLM_PROVIDER=lmstudio
export HINDSIGHT_API_LLM_BASE_URL=http://localhost:1234/v1
export HINDSIGHT_API_LLM_MODEL=your-local-model

# Vertex AI (Google Cloud)
export HINDSIGHT_API_LLM_PROVIDER=vertexai
export HINDSIGHT_API_LLM_MODEL=gemini-2.0-flash-001
export HINDSIGHT_API_LLM_VERTEXAI_PROJECT_ID=your-gcp-project-id
# Optional: region (default: us-central1)
# export HINDSIGHT_API_LLM_VERTEXAI_REGION=us-central1
# Optional: service account key (otherwise uses ADC)
# export HINDSIGHT_API_LLM_VERTEXAI_SERVICE_ACCOUNT_KEY=/path/to/key.json

Note: The LLM is the primary bottleneck for retain operations. See Performance for optimization strategies.


OpenAI Codex Setup (ChatGPT Plus/Pro)

Use your ChatGPT Plus or Pro subscription for Hindsight without separate OpenAI Platform API costs.

Prerequisites:

  • Active ChatGPT Plus or Pro subscription
  • Node.js/npm installed (for Codex CLI)

Setup Steps:

  1. Install Codex CLI:

    npm install -g @openai/codex
  2. Login with ChatGPT credentials:

    codex auth login

    This opens a browser window to authenticate with your ChatGPT account and saves OAuth tokens to ~/.codex/auth.json.

  3. Verify authentication:

    ls ~/.codex/auth.json  # Should show the auth file exists
  4. Configure Hindsight:

    export HINDSIGHT_API_LLM_PROVIDER=openai-codex
    # export HINDSIGHT_API_LLM_MODEL=gpt-5.1-codex # defaults to gpt-5.2-codex
    # No API key needed - reads from ~/.codex/auth.json automatically
  5. Start Hindsight:

    hindsight-api

You can use any model supported by the OpenAI Codex CLI.

Important Notes:

  • OAuth tokens are stored in ~/.codex/auth.json
  • Tokens refresh automatically when needed
  • Usage is billed to your ChatGPT subscription (not separate API costs)
  • For personal development use only (see ChatGPT Terms of Service)

Claude Code Setup (Claude Pro/Max)

Use your Claude Pro or Max subscription for Hindsight without separate Anthropic API costs.

Terms of Service Notice

This integration uses the Claude Agent SDK with your personal Claude Pro/Max subscription credentials. You must be logged into Claude Code on your own machine before using this provider.

Please be aware:

  • Anthropic's Agent SDK documentation states that third-party developers should not offer claude.ai login or rate limits for their products. Hindsight does not perform any login on your behalf — it uses credentials you've already authenticated via claude auth login.
  • In January 2026, Anthropic enforced restrictions against third-party tools using Claude subscription OAuth tokens. Those restrictions targeted tools that spoofed the Claude Code client identity — Hindsight uses the official Claude Agent SDK instead.
  • This provider is intended for local, personal development use only. Do not use it in production deployments or shared environments.
  • Anthropic's terms may change. If you want guaranteed compliance, use the anthropic provider with an API key instead.
  • Usage counts against your Claude Pro/Max subscription limits.

For production or team use, we recommend using HINDSIGHT_API_LLM_PROVIDER=anthropic with an API key from the Anthropic Console.

Prerequisites:

  • Active Claude Pro or Max subscription
  • Claude Code CLI installed

Setup Steps:

  1. Install Claude Code CLI:

    npm install -g @anthropic-ai/claude-code
    # Or via Homebrew
    brew install anthropics/claude-code/claude-code
  2. Login with Claude credentials:

    claude auth login

    This opens a browser window to authenticate with your Claude account. Authentication is automatically managed by the Claude Agent SDK.

  3. Verify authentication:

    claude --version
    # Should show version without errors
  4. Configure Hindsight:

    export HINDSIGHT_API_LLM_PROVIDER=claude-code
    # No API key needed - uses claude auth login credentials
  5. Start Hindsight:

    hindsight-api

You can use any model supported by Claude Code CLI.

Important Notes:

  • Authentication handled by Claude Agent SDK (uses bundled CLI)
  • Credentials managed securely by Claude Code
  • Usage billed to your Claude subscription (not separate API costs)
  • For personal development use only (see Claude Terms of Service)

Vertex AI Setup (Google Cloud)

Google Cloud's Vertex AI provides access to Gemini models via the native Google GenAI SDK.

Prerequisites:

  • GCP project with Vertex AI API enabled
  • IAM role roles/aiplatform.user for your credentials

Environment Variables:

| Variable | Description | Required |
| --- | --- | --- |
| HINDSIGHT_API_LLM_VERTEXAI_PROJECT_ID | Your GCP project ID | Yes |
| HINDSIGHT_API_LLM_VERTEXAI_REGION | GCP region (e.g., us-central1) | No (default: us-central1) |
| HINDSIGHT_API_LLM_VERTEXAI_SERVICE_ACCOUNT_KEY | Path to service account JSON key file | No (uses ADC if not set) |

Authentication Methods:

  1. Application Default Credentials (ADC) - Recommended for development

    # Setup ADC
    gcloud auth application-default login

    # Configure Hindsight
    export HINDSIGHT_API_LLM_PROVIDER=vertexai
    export HINDSIGHT_API_LLM_MODEL=gemini-2.0-flash-001
    export HINDSIGHT_API_LLM_VERTEXAI_PROJECT_ID=your-project-id
  2. Service Account Key - Recommended for production

    # Create service account and download key
    gcloud iam service-accounts create hindsight-api
    gcloud projects add-iam-policy-binding your-project-id \
    --member="serviceAccount:hindsight-api@your-project-id.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"
    gcloud iam service-accounts keys create key.json \
    --iam-account=hindsight-api@your-project-id.iam.gserviceaccount.com

    # Configure Hindsight
    export HINDSIGHT_API_LLM_PROVIDER=vertexai
    export HINDSIGHT_API_LLM_MODEL=gemini-2.0-flash-001
    export HINDSIGHT_API_LLM_VERTEXAI_PROJECT_ID=your-project-id
    export HINDSIGHT_API_LLM_VERTEXAI_SERVICE_ACCOUNT_KEY=/path/to/key.json

Notes:

  • Model names can optionally include the google/ prefix (e.g., google/gemini-2.0-flash-001) — it will be stripped automatically
  • The native SDK handles token refresh automatically
  • Uses service account credentials if provided, otherwise falls back to ADC
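As a sketch, both of the following forms should resolve to the same model, since the prefix is stripped automatically:

```shell
# Equivalent model settings for the vertexai provider
export HINDSIGHT_API_LLM_MODEL=gemini-2.0-flash-001
# export HINDSIGHT_API_LLM_MODEL=google/gemini-2.0-flash-001
```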

Embedding Model

Converts text into dense vector representations for semantic similarity search.

Default: BAAI/bge-small-en-v1.5 (384 dimensions, ~130MB)

Supported Providers

| Provider | Description | Best For |
| --- | --- | --- |
| local | SentenceTransformers (default) | Development, low latency |
| openai | OpenAI embeddings API | Production, high quality |
| cohere | Cohere embeddings API | Production, multilingual |
| tei | HuggingFace Text Embeddings Inference | Production, self-hosted |
| litellm | LiteLLM proxy (unified gateway) | Multi-provider setups |

Local Models

| Model | Dimensions | Use Case |
| --- | --- | --- |
| BAAI/bge-small-en-v1.5 | 384 | Default, fast, good quality |
| sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 384 | Multilingual (50+ languages) |

OpenAI Models

| Model | Dimensions | Use Case |
| --- | --- | --- |
| text-embedding-3-small | 1536 | Default OpenAI, cost-effective |
| text-embedding-3-large | 3072 | Higher quality, more expensive |
| text-embedding-ada-002 | 1536 | Legacy model |

Cohere Models

| Model | Dimensions | Use Case |
| --- | --- | --- |
| embed-english-v3.0 | 1024 | English text |
| embed-multilingual-v3.0 | 1024 | 100+ languages |

Embedding Dimensions

Hindsight automatically detects the embedding dimension at startup and adjusts the database schema. Once memories are stored, you cannot change dimensions without losing data.

Configuration Examples:

# Local provider (default)
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=local
export HINDSIGHT_API_EMBEDDINGS_LOCAL_MODEL=BAAI/bge-small-en-v1.5

# OpenAI
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=openai
export HINDSIGHT_API_EMBEDDINGS_OPENAI_API_KEY=sk-xxxxxxxxxxxx
export HINDSIGHT_API_EMBEDDINGS_OPENAI_MODEL=text-embedding-3-small

# Cohere
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=cohere
export HINDSIGHT_API_COHERE_API_KEY=your-api-key
export HINDSIGHT_API_EMBEDDINGS_COHERE_MODEL=embed-english-v3.0

# TEI (self-hosted)
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=tei
export HINDSIGHT_API_EMBEDDINGS_TEI_URL=http://localhost:8080

# LiteLLM proxy
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=litellm
export HINDSIGHT_API_LITELLM_API_BASE=http://localhost:4000
export HINDSIGHT_API_EMBEDDINGS_LITELLM_MODEL=text-embedding-3-small
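If you want to self-host TEI for the tei provider, a minimal sketch looks like this (the image tag and port mapping are assumptions; check the TEI README for the current release):

```shell
# Serve the default embedding model with HuggingFace TEI (CPU image; tag is an assumption)
docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
  --model-id BAAI/bge-small-en-v1.5

# Point Hindsight at the local TEI endpoint
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=tei
export HINDSIGHT_API_EMBEDDINGS_TEI_URL=http://localhost:8080
```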

See Configuration for all options including Azure OpenAI and custom endpoints.


Cross-Encoder (Reranker)

Reranks initial search results to improve precision.

Default: cross-encoder/ms-marco-MiniLM-L-6-v2 (~85MB)

Supported Providers

| Provider | Description | Best For |
| --- | --- | --- |
| local | SentenceTransformers CrossEncoder (default) | Development, low latency |
| cohere | Cohere rerank API | Production, high quality |
| zeroentropy | ZeroEntropy rerank API (zerank-2) | Production, state-of-the-art accuracy |
| tei | HuggingFace Text Embeddings Inference | Production, self-hosted |
| flashrank | FlashRank (lightweight, fast) | Resource-constrained environments |
| litellm | LiteLLM proxy (unified gateway) | Multi-provider setups |
| litellm-sdk | LiteLLM SDK (direct API, no proxy) | Multi-provider, simpler setup |
| rrf | RRF-only (no neural reranking) | Testing, minimal resources |

Local Models

| Model | Use Case |
| --- | --- |
| cross-encoder/ms-marco-MiniLM-L-6-v2 | Default, fast |
| cross-encoder/ms-marco-MiniLM-L-12-v2 | Higher accuracy |
| cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 | Multilingual |

Cohere Models

| Model | Use Case |
| --- | --- |
| rerank-english-v3.0 | English text |
| rerank-multilingual-v3.0 | 100+ languages |

ZeroEntropy Models

| Model | Use Case |
| --- | --- |
| zerank-2 | Flagship multilingual reranker (default) |
| zerank-2-small | Faster, lighter variant |

LiteLLM Supported Providers

LiteLLM supports multiple reranking providers via the /rerank endpoint:

| Provider | Model Example |
| --- | --- |
| Cohere | cohere/rerank-english-v3.0 |
| Together AI | together_ai/... |
| Voyage AI | voyage/rerank-2 |
| Jina AI | jina_ai/... |
| AWS Bedrock | bedrock/... |

Configuration Examples:

# Local provider (default)
export HINDSIGHT_API_RERANKER_PROVIDER=local
export HINDSIGHT_API_RERANKER_LOCAL_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2

# Cohere
export HINDSIGHT_API_RERANKER_PROVIDER=cohere
export HINDSIGHT_API_COHERE_API_KEY=your-api-key
export HINDSIGHT_API_RERANKER_COHERE_MODEL=rerank-english-v3.0

# ZeroEntropy (state-of-the-art accuracy)
export HINDSIGHT_API_RERANKER_PROVIDER=zeroentropy
export HINDSIGHT_API_RERANKER_ZEROENTROPY_API_KEY=your-api-key
export HINDSIGHT_API_RERANKER_ZEROENTROPY_MODEL=zerank-2 # default, can omit

# TEI (self-hosted)
export HINDSIGHT_API_RERANKER_PROVIDER=tei
export HINDSIGHT_API_RERANKER_TEI_URL=http://localhost:8081

# FlashRank (lightweight)
export HINDSIGHT_API_RERANKER_PROVIDER=flashrank

# LiteLLM proxy
export HINDSIGHT_API_RERANKER_PROVIDER=litellm
export HINDSIGHT_API_LITELLM_API_BASE=http://localhost:4000
export HINDSIGHT_API_RERANKER_LITELLM_MODEL=cohere/rerank-english-v3.0

# RRF-only (no neural reranking)
export HINDSIGHT_API_RERANKER_PROVIDER=rrf

See Configuration for all options including Azure-hosted endpoints and batch settings.