Models

Hindsight uses several machine learning models for different tasks.

Overview

| Model Type | Purpose | Default | Configurable |
| --- | --- | --- | --- |
| LLM | Fact extraction, reasoning, generation | Provider-specific | Yes |
| Embedding | Vector representations for semantic search | BAAI/bge-small-en-v1.5 | Yes |
| Cross-Encoder | Reranking search results | cross-encoder/ms-marco-MiniLM-L-6-v2 | Yes |

All local models (embedding, cross-encoder) are automatically downloaded from HuggingFace on first run.
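
If you want the models in place before first run (for example, in air-gapped or containerized deployments), you can warm the HuggingFace cache ahead of time. A minimal sketch using the standard huggingface-cli tool; this assumes Hindsight reads from the default HuggingFace cache location:

# Pre-download the default local models into the HuggingFace cache
huggingface-cli download BAAI/bge-small-en-v1.5
huggingface-cli download cross-encoder/ms-marco-MiniLM-L-6-v2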


LLM

Used for fact extraction, entity resolution, opinion generation, and answer synthesis.

Supported providers: OpenAI, Anthropic, Gemini, Groq, Ollama, LM Studio, and any OpenAI-compatible API.

OpenAI-Compatible Providers

Hindsight works with any provider that exposes an OpenAI-compatible API (e.g., Azure OpenAI). Simply set HINDSIGHT_API_LLM_PROVIDER=openai and configure HINDSIGHT_API_LLM_BASE_URL to point to your provider's endpoint.
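
For example, a sketch pointing Hindsight at a self-hosted OpenAI-compatible server; the URL, key, and model name below are placeholders for your deployment:

# Any OpenAI-compatible endpoint (URL, key, and model are placeholders)
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_BASE_URL=https://your-endpoint.example.com/v1
export HINDSIGHT_API_LLM_API_KEY=your-api-key
export HINDSIGHT_API_LLM_MODEL=your-model-name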

See Configuration for setup examples.

Tested Models

The following models have been tested and verified to work correctly with Hindsight:

| Provider | Model |
| --- | --- |
| OpenAI | gpt-5.2 |
| OpenAI | gpt-5 |
| OpenAI | gpt-5-mini |
| OpenAI | gpt-5-nano |
| OpenAI | gpt-4.1-mini |
| OpenAI | gpt-4.1-nano |
| OpenAI | gpt-4o-mini |
| Anthropic | claude-sonnet-4-20250514 |
| Anthropic | claude-3-5-sonnet-20241022 |
| Gemini | gemini-3-pro-preview |
| Gemini | gemini-2.5-flash |
| Gemini | gemini-2.5-flash-lite |
| Groq | openai/gpt-oss-120b |
| Groq | openai/gpt-oss-20b |

Using Other Models

Other models not listed above may work with Hindsight, but they must support at least 65,000 output tokens for fact extraction to be reliable. If you need a specific model that does not meet this requirement, please open an issue to request an exception.

Configuration

# Groq (recommended)
export HINDSIGHT_API_LLM_PROVIDER=groq
export HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=openai/gpt-oss-20b

# OpenAI
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_API_KEY=sk-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gpt-4o

# Gemini
export HINDSIGHT_API_LLM_PROVIDER=gemini
export HINDSIGHT_API_LLM_API_KEY=xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gemini-2.0-flash

# Anthropic
export HINDSIGHT_API_LLM_PROVIDER=anthropic
export HINDSIGHT_API_LLM_API_KEY=sk-ant-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=claude-sonnet-4-20250514

# Ollama (local)
export HINDSIGHT_API_LLM_PROVIDER=ollama
export HINDSIGHT_API_LLM_BASE_URL=http://localhost:11434/v1
export HINDSIGHT_API_LLM_MODEL=llama3

# LM Studio (local)
export HINDSIGHT_API_LLM_PROVIDER=lmstudio
export HINDSIGHT_API_LLM_BASE_URL=http://localhost:1234/v1
export HINDSIGHT_API_LLM_MODEL=your-local-model
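
For the local providers, the model must already be available to the serving process; Ollama, for instance, will not serve a model it has not pulled. For example:

# Make the model available to Ollama before starting Hindsight
ollama pull llama3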

Note: The LLM is the primary bottleneck for retain operations. See Performance for optimization strategies.


Embedding Model

Converts text into dense vector representations for semantic similarity search.

Default: BAAI/bge-small-en-v1.5 (384 dimensions, ~130MB)

Supported Providers

| Provider | Description | Best For |
| --- | --- | --- |
| local | SentenceTransformers (default) | Development, low latency |
| openai | OpenAI embeddings API | Production, high quality |
| cohere | Cohere embeddings API | Production, multilingual |
| tei | HuggingFace Text Embeddings Inference | Production, self-hosted |
| litellm | LiteLLM proxy (unified gateway) | Multi-provider setups |

Local Models

| Model | Dimensions | Use Case |
| --- | --- | --- |
| BAAI/bge-small-en-v1.5 | 384 | Default, fast, good quality |
| sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 384 | Multilingual (50+ languages) |

OpenAI Models

| Model | Dimensions | Use Case |
| --- | --- | --- |
| text-embedding-3-small | 1536 | Default OpenAI, cost-effective |
| text-embedding-3-large | 3072 | Higher quality, more expensive |
| text-embedding-ada-002 | 1536 | Legacy model |

Cohere Models

| Model | Dimensions | Use Case |
| --- | --- | --- |
| embed-english-v3.0 | 1024 | English text |
| embed-multilingual-v3.0 | 1024 | 100+ languages |

Embedding Dimensions

Hindsight automatically detects the embedding dimension at startup and adjusts the database schema. Once memories are stored, you cannot change dimensions without losing data.
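
If you are unsure of a local model's dimension before committing to it, you can check it with sentence-transformers directly (a quick sketch, assuming the sentence-transformers package is installed):

# Print the embedding dimension of a SentenceTransformers model
python -c "from sentence_transformers import SentenceTransformer; print(SentenceTransformer('BAAI/bge-small-en-v1.5').get_sentence_embedding_dimension())"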

Configuration Examples:

# Local provider (default)
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=local
export HINDSIGHT_API_EMBEDDINGS_LOCAL_MODEL=BAAI/bge-small-en-v1.5

# OpenAI
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=openai
export HINDSIGHT_API_EMBEDDINGS_OPENAI_API_KEY=sk-xxxxxxxxxxxx
export HINDSIGHT_API_EMBEDDINGS_OPENAI_MODEL=text-embedding-3-small

# Cohere
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=cohere
export HINDSIGHT_API_COHERE_API_KEY=your-api-key
export HINDSIGHT_API_EMBEDDINGS_COHERE_MODEL=embed-english-v3.0

# TEI (self-hosted)
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=tei
export HINDSIGHT_API_EMBEDDINGS_TEI_URL=http://localhost:8080

# LiteLLM proxy
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=litellm
export HINDSIGHT_API_LITELLM_API_BASE=http://localhost:4000
export HINDSIGHT_API_EMBEDDINGS_LITELLM_MODEL=text-embedding-3-small
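
To try the tei option locally, a minimal sketch for standing up a TEI server with the default embedding model; the image tag and port mapping here are assumptions, so check the TEI documentation for current releases:

# Run HuggingFace Text Embeddings Inference locally (tag and ports are illustrative)
docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
  --model-id BAAI/bge-small-en-v1.5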

See Configuration for all options including Azure OpenAI and custom endpoints.


Cross-Encoder (Reranker)

Reranks initial search results to improve precision.

Default: cross-encoder/ms-marco-MiniLM-L-6-v2 (~85MB)

Supported Providers

| Provider | Description | Best For |
| --- | --- | --- |
| local | SentenceTransformers CrossEncoder (default) | Development, low latency |
| cohere | Cohere rerank API | Production, high quality |
| tei | HuggingFace Text Embeddings Inference | Production, self-hosted |
| flashrank | FlashRank (lightweight, fast) | Resource-constrained environments |
| litellm | LiteLLM proxy (unified gateway) | Multi-provider setups |
| rrf | RRF-only (no neural reranking) | Testing, minimal resources |
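
For reference, the rrf provider relies on standard Reciprocal Rank Fusion: each document is scored as the sum over result lists of 1 / (k + rank), where k is a smoothing constant (commonly 60; the exact value Hindsight uses is not specified here). Because it involves no neural model, it requires no downloads, which is why it suits testing and minimal-resource setups.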

Local Models

| Model | Use Case |
| --- | --- |
| cross-encoder/ms-marco-MiniLM-L-6-v2 | Default, fast |
| cross-encoder/ms-marco-MiniLM-L-12-v2 | Higher accuracy |
| cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 | Multilingual |

Cohere Models

| Model | Use Case |
| --- | --- |
| rerank-english-v3.0 | English text |
| rerank-multilingual-v3.0 | 100+ languages |

LiteLLM Supported Providers

LiteLLM supports multiple reranking providers via the /rerank endpoint:

| Provider | Model Example |
| --- | --- |
| Cohere | cohere/rerank-english-v3.0 |
| Together AI | together_ai/... |
| Voyage AI | voyage/rerank-2 |
| Jina AI | jina_ai/... |
| AWS Bedrock | bedrock/... |
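
As an illustration of what the proxy call looks like, a hedged sketch against a local LiteLLM /rerank endpoint; the payload shape follows LiteLLM's Cohere-style rerank API, so verify the field names against your LiteLLM version:

# Illustrative request to a local LiteLLM proxy's /rerank endpoint
curl http://localhost:4000/rerank \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/rerank-english-v3.0",
    "query": "What is semantic search?",
    "documents": ["Semantic search uses embeddings.", "Bananas are yellow."],
    "top_n": 1
  }'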

Configuration Examples:

# Local provider (default)
export HINDSIGHT_API_RERANKER_PROVIDER=local
export HINDSIGHT_API_RERANKER_LOCAL_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2

# Cohere
export HINDSIGHT_API_RERANKER_PROVIDER=cohere
export HINDSIGHT_API_COHERE_API_KEY=your-api-key
export HINDSIGHT_API_RERANKER_COHERE_MODEL=rerank-english-v3.0

# TEI (self-hosted)
export HINDSIGHT_API_RERANKER_PROVIDER=tei
export HINDSIGHT_API_RERANKER_TEI_URL=http://localhost:8081

# FlashRank (lightweight)
export HINDSIGHT_API_RERANKER_PROVIDER=flashrank

# LiteLLM proxy
export HINDSIGHT_API_RERANKER_PROVIDER=litellm
export HINDSIGHT_API_LITELLM_API_BASE=http://localhost:4000
export HINDSIGHT_API_RERANKER_LITELLM_MODEL=cohere/rerank-english-v3.0

# RRF-only (no neural reranking)
export HINDSIGHT_API_RERANKER_PROVIDER=rrf

See Configuration for all options including Azure-hosted endpoints and batch settings.