
Recall: How Hindsight Retrieves Memories

When you call recall(), Hindsight uses multiple search strategies in parallel to find the most relevant memories, regardless of how you phrase your query.


The Challenge of Memory Recall

Different queries need different search approaches:

  • "Alice works at Google" → needs exact name matching
  • "Where does Alice work?" → needs semantic understanding
  • "What did Alice do last spring?" → needs temporal reasoning
  • "Why did Alice leave?" → needs causal relationship tracing

No single search method handles all these well. Hindsight solves this with TEMPR — four complementary strategies that run in parallel.


Four Search Strategies

Semantic Search

What it does: Understands the meaning behind words, not just the words themselves.

Best for:

  • Conceptual matches: "Alice's job" → "Alice works as a software engineer"
  • Paraphrasing: "Bob's expertise" → "Bob specializes in machine learning"
  • Synonyms: "meeting" matches "conference", "discussion", "gathering"

Why it matters: You can ask questions naturally without matching exact keywords.
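To make the idea concrete, here is a minimal, illustrative sketch of semantic search over embeddings. It is not Hindsight's internal implementation; `embed` stands in for whatever text-embedding model is used, and the linear scan would in practice be replaced by a vector index.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(query: str, memories: list[str], embed, top_k: int = 5):
    """Rank memories by embedding similarity to the query.

    `embed` is a hypothetical text -> vector function; Hindsight's actual
    embedding model and index are internal details.
    """
    query_vec = embed(query)
    scored = [(m, cosine_similarity(query_vec, embed(m))) for m in memories]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```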


Keyword Search

What it does: Finds exact terms and names, even when they have unusual spellings.

Best for:

  • Proper nouns: "Google", "Alice Chen", "MIT"
  • Technical terms: "PostgreSQL", "HNSW", "TensorFlow"
  • Unique identifiers: URLs, product names, specific phrases

Why it matters: Ensures you never miss results that mention specific names or terms, even if they're semantically distant from your query.
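As a rough illustration (not Hindsight's implementation), keyword search comes down to exact term matching; production systems typically back this with an inverted index and a scoring function such as BM25 rather than the linear scan shown here.

```python
def keyword_search(query: str, memories: list[str]) -> list[str]:
    """Return memories that contain any exact query term (case-insensitive).

    Illustrative only: a real keyword search would use an inverted index
    with BM25-style scoring instead of scanning every memory.
    """
    terms = {t.strip('.,?!').lower() for t in query.split()}
    return [
        m for m in memories
        if terms & {w.strip('.,?!').lower() for w in m.split()}
    ]
```

This is why a query mentioning "PostgreSQL" will surface memories containing that exact token even when the rest of the memory has little semantic overlap with the query.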


Graph Traversal

What it does: Follows connections between entities to find indirectly related information.

Best for:

  • Indirect relationships: "What does Alice do?" → Alice → Google → Google's products
  • Entity exploration: "Bob's colleagues" → Bob → co-workers → shared projects
  • Multi-hop reasoning: "Alice's team's achievements"

Why it matters: Retrieves facts that aren't semantically or lexically similar but are structurally connected through the knowledge graph.

Example: Even if Alice and her manager are never mentioned together, graph traversal can find the manager through shared projects or team relationships.
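Conceptually, this is a bounded multi-hop walk over the entity graph. The sketch below is illustrative only and is not the link_expansion algorithm mentioned later; it simply shows how entities within a few hops of a query entity can be collected so that their attached memories join the candidate pool.

```python
from collections import deque

def reachable_entities(start: str,
                       edges: dict[str, list[str]],
                       max_hops: int = 2) -> set[str]:
    """Collect entities reachable from `start` within `max_hops` hops.

    `edges` maps an entity to its neighbours, e.g.
    {"Alice": ["Google", "Project X"], "Project X": ["Bob"]}.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        entity, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbour in edges.get(entity, []):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen - {start}
```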


Temporal Search

What it does: Understands time expressions and filters by when events occurred.

Best for:

  • Historical queries: "What did Alice do in 2023?"
  • Time ranges: "What happened last spring?"
  • Relative time: "What did Bob work on last year?"
  • Before/after: "What happened before Alice joined Google?"

How it works: Combines semantic understanding with time filtering to find events within specific periods.

Why it matters: Enables precise historical queries without losing old information.
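A simplified view of the filtering step, assuming each memory carries an event timestamp (the actual resolution of phrases like "last spring" into date ranges happens inside Hindsight):

```python
from datetime import datetime

def temporal_filter(memories: list[dict],
                    start: datetime,
                    end: datetime) -> list[dict]:
    """Keep memories whose event time falls inside [start, end].

    Illustrative sketch: assumes each memory dict carries an
    "occurred_at" datetime. A phrase like "last spring" would first be
    resolved to a concrete range, then combined with semantic ranking.
    """
    return [m for m in memories if start <= m["occurred_at"] <= end]

# Example range for a query about spring 2024:
# temporal_filter(memories, datetime(2024, 3, 1), datetime(2024, 5, 31))
```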


Result Fusion

After the four strategies run, results are fused together:

  • Memories appearing in multiple strategies rank higher (consensus)
  • Rank matters more than score (robust across different scoring systems)
  • Final results are re-ranked using a neural model that considers query-memory interaction

Why fusion matters: A fact that's both semantically similar AND mentions the right entity will rank higher than one that's only semantically similar.
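Because fusion is rank-based, something in the spirit of reciprocal rank fusion captures the behaviour described above. The sketch below is illustrative, not necessarily Hindsight's exact formula, and it omits the final neural re-ranking step.

```python
def fuse_rankings(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of memory IDs by reciprocal rank.

    Memories appearing in multiple lists accumulate more score
    (consensus), and only positions matter, so the strategies' raw
    scores never need to be comparable.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] = scores.get(memory_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# fused = fuse_rankings([semantic_ids, keyword_ids, graph_ids, temporal_ids])
```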


Why Multiple Strategies?

Consider the query: "What did Alice think about Python last spring?"

  • Semantic finds facts about Alice's opinions on programming
  • Keyword ensures "Python" is actually mentioned
  • Graph connects Alice → opinions → programming languages
  • Temporal filters to "last spring" timeframe

The fusion of all four gives you exactly what you're looking for, even though no single strategy would suffice.


Token Budget Management

Hindsight is built for AI agents, not humans. Traditional search systems return "top-k" results, but agents don't think in terms of result counts—they think in tokens. An agent's context window is measured in tokens, and that's exactly how Hindsight measures results.

How it works (sketched below):

  • Top-ranked memories are selected first
  • Selection stops when the token budget is exhausted
  • You specify the context budget; Hindsight fills it with the most relevant memories
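The selection itself is a greedy fill, roughly like the sketch below. This runs server-side based on the max_tokens you pass; `count_tokens` stands in for whatever tokenizer is used internally.

```python
def fill_token_budget(ranked_memories: list[str],
                      max_tokens: int,
                      count_tokens) -> list[str]:
    """Greedily take top-ranked memories until the token budget is spent.

    Illustrative sketch: `count_tokens` is a hypothetical text -> int
    tokenizer function; Hindsight applies its own tokenizer internally.
    """
    selected, used = [], 0
    for memory in ranked_memories:
        cost = count_tokens(memory)
        if used + cost > max_tokens:
            break
        selected.append(memory)
        used += cost
    return selected
```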

Parameters you control (see the example call after this list):

  • max_tokens: How much memory content to return (default: 4096 tokens)
  • budget: Search depth level (low, mid, high)
  • types: Filter by world, experience, opinion, or all
  • tags: Filter memories by visibility tags
  • tags_match: How to match tags (see Recall API for all options)
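A hypothetical call tying these parameters together might look like the following. The client class and import path are placeholders (see the Recall API reference for the exact signature), but the parameter names match the list above.

```python
# Hypothetical usage: the import path and client name are placeholders.
from hindsight import HindsightClient  # placeholder, not a confirmed import

client = HindsightClient()

results = client.recall(
    "What did Alice think about Python last spring?",
    max_tokens=4096,             # how much memory content to return
    budget="mid",                # search depth: "low", "mid", or "high"
    types=["world", "opinion"],  # filter by memory type
    tags=["team:research"],      # illustrative visibility tag
    tags_match="any",            # illustrative value; see the Recall API for supported modes
)
```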

Expanding Context: Chunks and Entity Observations

Memories are distilled facts—concise but sometimes missing nuance. When your agent needs deeper context, you can optionally retrieve the source material and related knowledge:

| Option | Parameters | When to Use |
| --- | --- | --- |
| Chunks | include_chunks, max_chunk_tokens | Need exact quotes, original phrasing, or surrounding context |
| Entity Observations | include_entities, max_entity_tokens | Need broader knowledge about people/things mentioned in results |

Chunks return the raw text that generated each memory—useful when the distilled fact loses important nuance:

Memory: "Alice prefers Python over JavaScript"
Chunk: "Alice mentioned she prefers Python over JavaScript, mainly because
of its data science ecosystem, though she admits JS is better for
frontend work and she's been learning TypeScript lately."

Entity Observations pull in related facts about entities mentioned in your results. If a memory mentions "Alice", you automatically get her role, skills, and other relevant context—without needing a separate query:

Query: "What programming languages does Alice like?"
Memory: "Alice prefers Python over JavaScript"
Entity Observations (Alice):
- "Alice is a senior data scientist at Google"
- "Alice specializes in machine learning"
- "Alice has been learning TypeScript"

When to include them:

  • Chunks: When generating responses that need verbatim quotes or when context matters (e.g., "What exactly did Alice say about the project?")
  • Entity Observations: When building complete profiles or when the conversation might reference multiple aspects of an entity (e.g., "Tell me about Alice's work")

Each has its own token budget, giving you precise control over total context size.
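Continuing the placeholder client from the earlier example, a call that expands context in both directions might look like this sketch; the include_* and max_*_tokens parameters are the ones documented above, while the specific budget values are illustrative.

```python
# Hypothetical usage: `client` is the placeholder client shown earlier.
results = client.recall(
    "What programming languages does Alice like?",
    max_tokens=4096,
    include_chunks=True,      # attach the raw source text behind each memory
    max_chunk_tokens=1024,    # separate budget for chunk text (illustrative)
    include_entities=True,    # attach observations about mentioned entities
    max_entity_tokens=512,    # separate budget for entity observations (illustrative)
)
```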


Tuning Recall: Quality vs Latency

Different use cases require different trade-offs between recall quality and response speed. Two parameters control this:

Budget: Search Depth

Controls how thoroughly Hindsight explores the memory bank—affecting graph traversal depth, candidate pool size, and cross-encoder re-ranking:

| Budget | Best For | Trade-off |
| --- | --- | --- |
| low | Quick lookups, simple queries | Fast, may miss indirect connections |
| mid | Most queries, balanced | Good coverage, reasonable speed |
| high | Complex queries requiring deep exploration | Thorough, slower |

Example: "What did Alice's manager's team work on?" benefits from high budget to traverse multiple hops (Alice → manager → team → projects) and evaluate more candidates.

Max Tokens: Context Window Size

Controls how much memory content to return:

| Max Tokens | ~Pages of Text | Best For | Trade-off |
| --- | --- | --- | --- |
| 2048 | ~2 pages | Focused answers, fast LLM | Fewer memories, faster |
| 4096 (default) | ~4 pages | Balanced context | Good coverage, standard |
| 8192 | ~8 pages | Comprehensive context | More memories, slower LLM |

Example: "Summarize everything about Alice" benefits from higher max_tokens to include more facts.

Two Independent Dimensions

Budget and max_tokens control different aspects of recall:

| Parameter | What it controls | Latency impact | Example |
| --- | --- | --- | --- |
| Budget | How thoroughly to explore memories | Search time | High budget finds Alice → manager → team → projects |
| Max Tokens | How much context to return | LLM processing time | High tokens returns more memories to the agent |

They're independent. Common combinations:

| Budget | Max Tokens | Use Case |
| --- | --- | --- |
| high | low | Deep search, return only the best results |
| low | high | Quick search, return everything found |
| high | high | Comprehensive research queries |
| low | low | Fast chatbot responses |

Recommended starting points by use case:

| Use Case | Budget | Max Tokens | Why |
| --- | --- | --- | --- |
| Chatbot replies | low | 2048 | Fast responses, focused context |
| Document Q&A | mid | 4096 | Balanced coverage and speed |
| Research queries | high | 8192 | Comprehensive, multi-hop reasoning |
| Real-time search | low | 2048 | Minimize latency |
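These combinations might map onto recall() calls like the following sketch, again using the placeholder client from earlier; the preset names are hypothetical, while the budget and max_tokens values come from the table above.

```python
# Hypothetical presets mirroring the table above; names are illustrative.
RECALL_PRESETS = {
    "chatbot":     {"budget": "low",  "max_tokens": 2048},
    "document_qa": {"budget": "mid",  "max_tokens": 4096},
    "research":    {"budget": "high", "max_tokens": 8192},
    "realtime":    {"budget": "low",  "max_tokens": 2048},
}

results = client.recall("Summarize everything about Alice",
                        **RECALL_PRESETS["research"])
```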

Graph Retrieval Algorithms

Hindsight supports multiple graph traversal algorithms. The default (link_expansion) is optimized for fast retrieval with target latency under 100ms.

See Configuration → Retrieval for available algorithms and how to configure them.


Next Steps

  • Retain — How memories are stored with rich context
  • Reflect — How disposition influences reasoning
  • Recall API — Code examples, parameters, and tag filtering