
Models

Hindsight uses several machine learning models for different tasks.

Overview

| Model Type | Purpose | Default | Configurable |
|---|---|---|---|
| Embedding | Vector representations for semantic search | BAAI/bge-small-en-v1.5 | Yes |
| Cross-Encoder | Reranking search results | cross-encoder/ms-marco-MiniLM-L-6-v2 | Yes |
| LLM | Fact extraction, reasoning, generation | Provider-specific | Yes |

All local models (embedding, cross-encoder) are automatically downloaded from HuggingFace on first run.
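
If you want to avoid the download on first run (for example in an air-gapped or containerized deployment), the models can be fetched into the Hugging Face cache ahead of time. A minimal sketch using the standard huggingface-cli tool; the CLI and default cache location (~/.cache/huggingface) come from the huggingface_hub package, not Hindsight itself:

# Pre-fetch the default local models into the Hugging Face cache
pip install -U huggingface_hub
huggingface-cli download BAAI/bge-small-en-v1.5
huggingface-cli download cross-encoder/ms-marco-MiniLM-L-6-v2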


Embedding Model

Converts text into dense vector representations for semantic similarity search.

Default: BAAI/bge-small-en-v1.5 (384 dimensions, ~130MB)

Alternatives:

| Model | Dimensions | Use Case |
|---|---|---|
| BAAI/bge-small-en-v1.5 | 384 | Default, fast, good quality |
| BAAI/bge-base-en-v1.5 | 768 | Higher accuracy, slower |
| sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 384 | Multilingual (50+ languages) |
Warning: All embedding models must produce 384-dimensional vectors to match the database schema.

Configuration:

# Local provider (default)
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=local
export HINDSIGHT_API_EMBEDDINGS_LOCAL_MODEL=BAAI/bge-small-en-v1.5

# TEI provider (remote)
export HINDSIGHT_API_EMBEDDINGS_PROVIDER=tei
export HINDSIGHT_API_EMBEDDINGS_TEI_URL=http://localhost:8080
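
Because the schema expects 384-dimensional vectors, it is worth verifying a candidate model's output size before switching. A sketch against the TEI server configured above, using TEI's /embed endpoint plus jq; the endpoint and response shape belong to Text Embeddings Inference, not Hindsight:

# Request one embedding and count its dimensions; expect 384
curl -s http://localhost:8080/embed \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "dimension check"}' | jq '.[0] | length'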

Cross-Encoder (Reranker)

Reranks initial search results to improve precision.

Default: cross-encoder/ms-marco-MiniLM-L-6-v2 (~85MB)

Alternatives:

| Model | Use Case |
|---|---|
| cross-encoder/ms-marco-MiniLM-L-6-v2 | Default, fast |
| cross-encoder/ms-marco-MiniLM-L-12-v2 | Higher accuracy |
| cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 | Multilingual |

Configuration:

# Local provider (default)
export HINDSIGHT_API_RERANKER_PROVIDER=local
export HINDSIGHT_API_RERANKER_LOCAL_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2

# TEI provider (remote)
export HINDSIGHT_API_RERANKER_PROVIDER=tei
export HINDSIGHT_API_RERANKER_TEI_URL=http://localhost:8081
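
To sanity-check a remote reranker, TEI exposes a /rerank endpoint that scores candidate texts against a query. A minimal sketch against the URL configured above; the route and payload shape come from Text Embeddings Inference, not Hindsight:

# Score two candidate passages against a query; a higher score means more relevant
curl -s http://localhost:8081/rerank \
  -H 'Content-Type: application/json' \
  -d '{"query": "what is semantic search?", "texts": ["Semantic search matches by meaning.", "The weather is nice today."]}'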

LLM

Used for fact extraction, entity resolution, opinion generation, and answer synthesis.

Supported providers: Groq, OpenAI, Gemini, Ollama

| Provider | Recommended Model | Best For |
|---|---|---|
| Groq | openai/gpt-oss-20b | Fast inference, high throughput (recommended) |
| OpenAI | gpt-4o | Good quality |
| Gemini | gemini-2.0-flash | Good quality, cost effective |
| Ollama | llama3.1 | Local deployment, privacy |

Configuration:

# Groq (recommended)
export HINDSIGHT_API_LLM_PROVIDER=groq
export HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=openai/gpt-oss-20b

# OpenAI
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_API_KEY=sk-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gpt-4o

# Gemini
export HINDSIGHT_API_LLM_PROVIDER=gemini
export HINDSIGHT_API_LLM_API_KEY=xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gemini-2.0-flash

# Ollama (local)
export HINDSIGHT_API_LLM_PROVIDER=ollama
export HINDSIGHT_API_LLM_BASE_URL=http://localhost:11434/v1
export HINDSIGHT_API_LLM_MODEL=llama3.1
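
Before pointing Hindsight at a local Ollama instance, you can confirm the model is pulled and the endpoint responds. A sketch assuming a stock Ollama install on the default port; the /v1/chat/completions route is Ollama's OpenAI-compatible API, not a Hindsight endpoint:

# Pull the model, then send a one-shot chat completion through the OpenAI-compatible API
ollama pull llama3.1
curl -s http://localhost:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "llama3.1", "messages": [{"role": "user", "content": "Reply with OK"}]}'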

Note: The LLM is the primary bottleneck for retain operations. See Performance for optimization strategies.