Best Practices
Practical guidance for agents and developers integrating Hindsight memory into production systems.
Contents
- Core Concepts — Memory banks, taxonomy, memory types
- Bank Configuration — Missions, dispositions, entity labels
- Retaining Data — Content format, context, document_id, tags, observation scopes
- Recalling Memories — Budget, tag filtering, include options, query_timestamp
- Reflecting — Recall vs reflect, response_schema, auditing
- Mental Models — When to create, tag strategy, refresh
- Anti-patterns
Core Concepts
Memory Banks
A memory bank is an isolated memory store — the unit of separation between users, agents, or contexts. All operations (retain, recall, reflect) target a single bank. Banks do not share data.
- One bank per user is the most common pattern for multi-user applications
- One bank per agent is common for agent-specific long-term memory
- A shared bank with tags can work for cross-user analysis (see Tags)
Banks are auto-created on first use. Configure them before ingesting data to steer behavior.
Taxonomy
| Operation | What it does | When to call it |
|---|---|---|
| Retain | Ingests raw content (conversations, documents, notes). The LLM extracts facts, entities, and relationships — raw content is never stored verbatim. | After each conversation turn or session ends |
| Recall | Retrieves relevant memories using 4 parallel strategies: semantic search, BM25, graph traversal, and temporal ranking. Returns a ranked list of facts. | Before generating a response that benefits from past context |
| Reflect | Autonomous reasoning loop: searches memory, synthesizes an answer, and returns it directly. Uses mental models and observations hierarchically. | When you want Hindsight to answer a question, not just retrieve facts |
| Observations | Auto-synthesized knowledge patterns produced by the consolidation operation, which runs asynchronously after retain completes. Consolidation turns facts into durable insights (preferences, behavioral patterns, contradictions). | Triggered automatically after retain; not part of the retain call itself |
| Mental Models | Pre-computed reflect responses stored for common queries. Return instantly and consistently. | Create for repeated high-traffic queries or slowly-changing user profiles |
Memory Types
Facts extracted during retain are classified into three types:
| Type | Description | Example |
|---|---|---|
| world | General knowledge, external facts | "The Eiffel Tower is in Paris" |
| experience | Personal events, user-specific facts | "User moved to Berlin in 2024" |
| observation | Consolidated patterns synthesized from facts | "User consistently prefers async communication" |
Use types filtering in recall to target specific memory types.
Bank Configuration
Configure a bank before first use to steer memory behavior for your domain. Misconfigured missions are the single biggest cause of low-quality memories.
Writing Effective Missions
All three missions accept plain language. Be specific about your domain — vague missions produce vague results.
retain_mission
Injected into the fact extraction prompt. Tells the LLM what to extract and what to ignore.
| Quality | Example |
|---|---|
| Good | Always extract technical decisions, API design choices, architectural trade-offs, blockers, and error messages. Ignore greetings, small talk, and scheduling logistics. |
| Good | Extract personal preferences, ongoing commitments, deadlines, health info, and relationship details. Ignore filler phrases and pleasantries. |
| Bad | Extract all information — too vague, extracts noise |
| Bad | Be helpful — not an extraction directive |
Tips:
- List the fact types you want (preferences, decisions, errors, commitments)
- List what to ignore — this is as important as what to include
- Match the mission to your actual data type (conversations vs documents vs tickets)
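The tips above can be sketched as a small helper that assembles a mission from explicit include and ignore lists. `build_retain_mission` is a hypothetical helper, not part of Hindsight; it only produces the plain-language text that you would then set as the bank's retain_mission.

```python
def _join(items):
    """Join items as an English list: 'a', 'a and b', or 'a, b, and c'."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def build_retain_mission(extract, ignore=()):
    """Assemble a retain mission string (hypothetical helper, not a Hindsight API)."""
    extract, ignore = list(extract), list(ignore)
    if not extract:
        raise ValueError("list at least one fact type to extract")
    mission = f"Always extract {_join(extract)}."
    if ignore:
        # Naming what to skip matters as much as naming what to keep.
        mission += f" Ignore {_join(ignore)}."
    return mission


mission = build_retain_mission(
    extract=["technical decisions", "blockers", "error messages"],
    ignore=["greetings", "small talk"],
)
```

Keeping the two lists in code makes it easy to review what a bank is told to ignore when extraction quality drops.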
observations_mission
Steers what patterns are synthesized during consolidation. Runs after retain.
Identify evolving preferences, recurring patterns, behavioral shifts, and contradictions
with prior knowledge. Focus on durable patterns — not transient states. Highlight when
user behavior contradicts previous observations.
Tips:
- Emphasize "durable patterns" to avoid ephemeral observation noise
- Mention contradiction detection explicitly if you need historical tracking
- Match scope to how often you expect patterns to change
reflect_mission
Sets the agent persona and reasoning frame for reflect operations.
| Use Case | Mission |
|---|---|
| Coding assistant | You are a senior developer helping optimize the user's workflow. Always factor in past technical decisions, current project context, and stated preferences. Be direct and opinionated. |
| Customer support | You are a support agent with full context of this customer's history. Reference past tickets and resolutions where relevant. Be concise and solution-focused. |
| Personal assistant | You are a personal assistant who remembers everything important to the user. Personalize every response using what you know about their preferences, schedule, and ongoing projects. |
| Medical assistant | You are a health assistant. Reference the user's history accurately. Always recommend consulting a professional for medical decisions. Do not speculate. |
Disposition Traits
Dispositions affect reflect only (not recall). Scale 1–5.
| Trait | 1 | 5 |
|---|---|---|
| skepticism | Trusts all memories at face value | Questions contradictions, flags uncertain info |
| literalism | Liberal interpretation, infers intent | Strict literal reading, no inference |
| empathy | Clinical, neutral tone | Warm, personal, emotionally aware |
Common profiles:
| Agent type | Skepticism | Literalism | Empathy |
|---|---|---|---|
| Code review | 4 | 5 | 1 |
| Customer support | 2 | 3 | 4 |
| Personal assistant | 2 | 2 | 4 |
| Medical assistant | 5 | 4 | 3 |
| Research assistant | 4 | 4 | 2 |
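One way to keep these profiles consistent across services is to hold them in code and validate the 1–5 scale up front. This is a minimal sketch: the `Disposition` class and `PROFILES` mapping are illustrative names, and how the values are passed to a bank depends on the client you use, which is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disposition:
    """Illustrative container for the three traits (not a Hindsight type)."""
    skepticism: int
    literalism: int
    empathy: int

    def __post_init__(self):
        # Every trait uses the same 1-5 scale described above.
        for name in ("skepticism", "literalism", "empathy"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")


# Values copied from the "Common profiles" table.
PROFILES = {
    "code_review": Disposition(skepticism=4, literalism=5, empathy=1),
    "customer_support": Disposition(skepticism=2, literalism=3, empathy=4),
    "personal_assistant": Disposition(skepticism=2, literalism=2, empathy=4),
    "medical_assistant": Disposition(skepticism=5, literalism=4, empathy=3),
    "research_assistant": Disposition(skepticism=4, literalism=4, empathy=2),
}
```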
Entity Labels
Define a controlled vocabulary for classification. The LLM will extract and normalize values to your defined set.
{
"entity_labels": [
{
"key": "tech_stack",
"type": "multi-values",
"values": [
{"value": "python", "description": "Python programming language"},
{"value": "typescript", "description": "TypeScript / Node.js"},
{"value": "react", "description": "React frontend framework"}
]
},
{
"key": "priority",
"type": "value",
"tag": true,
"values": [
{"value": "high", "description": "Urgent or blocking"},
{"value": "low", "description": "Nice to have"}
]
}
]
}
- type: "value" keeps a single value per entity (last write wins)
- type: "multi-values" accumulates multiple values
- tag: true also adds extracted label values as tags (enables filtering by entity value)
Use entity labels when you need consistent classification — domain-specific terms, status values, priority levels, engagement types.
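The difference between the two label types can be illustrated with a small in-memory model. This sketch only mimics the documented rules ("value" is last write wins, "multi-values" accumulates); it is not how Hindsight stores labels, and dropping values outside the vocabulary is an assumption made for the example.

```python
# Simplified copy of the configuration above: key -> type and allowed values.
ENTITY_LABELS = {
    "tech_stack": {"type": "multi-values", "values": {"python", "typescript", "react"}},
    "priority": {"type": "value", "values": {"high", "low"}},
}


def apply_extraction(state, key, extracted):
    """Fold newly extracted label values into `state` (illustrative only)."""
    spec = ENTITY_LABELS[key]
    # Assumption: values outside the controlled vocabulary are discarded.
    accepted = [v for v in extracted if v in spec["values"]]
    if spec["type"] == "multi-values":
        current = state.setdefault(key, [])
        for value in accepted:
            if value not in current:
                current.append(value)
    elif accepted:
        state[key] = accepted[-1]  # single value: the latest write replaces the old one
    return state


state = {}
apply_extraction(state, "tech_stack", ["python"])
apply_extraction(state, "tech_stack", ["react", "cobol"])  # "cobol" is not in the vocabulary
apply_extraction(state, "priority", ["high"])
apply_extraction(state, "priority", ["low"])
```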
Retaining Data
Content Format
Pass the richest representation available. Never pre-summarize.
| Format | Recommendation |
|---|---|
| JSON conversation array | Preferred for conversations — preserves structure, roles, and relationships |
| Prefixed plain text | Acceptable — [ISO-timestamp] role: text per line |
| Markdown / HTML / raw text | Works for documents and notes |
| Pre-summarized text | Avoid — loses entity relationships, temporal markers, structural context |
Conversation JSON (preferred):
[
{"role": "user", "content": "I'm using React for the frontend.", "timestamp": "2025-06-01T10:30:00Z"},
{"role": "assistant", "content": "Got it. What state management are you using?"},
{"role": "user", "content": "Zustand. We moved away from Redux last quarter."}
]
Why not pre-summarize: The LLM extracts facts, entities, and relationships from structure. A summary like "user uses React and Zustand" loses the temporal reference ("last quarter"), the entity relationship (React↔frontend, Redux↔migration), and the causal context (moved away from).
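When a JSON array cannot be passed through, the prefixed plain-text format from the table can be produced from the same message list without losing roles or timestamps. A minimal sketch: `to_prefixed_text` is a hypothetical helper, and omitting the bracketed prefix for messages that carry no timestamp is an assumption.

```python
def to_prefixed_text(messages):
    """Render messages as '[ISO-timestamp] role: text', one message per line."""
    lines = []
    for message in messages:
        timestamp = message.get("timestamp")
        # Assumption: messages without a timestamp are written without the prefix.
        prefix = f"[{timestamp}] " if timestamp else ""
        lines.append(f"{prefix}{message['role']}: {message['content']}")
    return "\n".join(lines)


conversation = [
    {"role": "user", "content": "I'm using React for the frontend.", "timestamp": "2025-06-01T10:30:00Z"},
    {"role": "assistant", "content": "Got it. What state management are you using?"},
    {"role": "user", "content": "Zustand. We moved away from Redux last quarter."},
]
text = to_prefixed_text(conversation)
```

Note that the original wording ("moved away from", "last quarter") survives intact, which is what a summary would lose.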
The context Field
High-impact on extraction quality. Always set it. Describes the nature and source of the content.
# Good — specific, descriptive
context="Customer support ticket #12345 from user Alice about a billing discrepancy"
context="Developer's architecture review session for the payments service"
context="User's onboarding form: stated goals, current tools, and team size"
context="Weekly standup notes: blockers, progress, and upcoming tasks"
# Bad — generic, adds no signal
context="some data"
context="conversation"
# Omitted entirely — extraction uses no context
The document_id Field
Use for upsert behavior. Same document_id = delete previous version and reprocess.
Rules:
- Use stable, meaningful IDs (session ID, ticket ID, document UUID)
- Always use the same ID for a growing conversation — retain the full conversation with each new message
- Do NOT use random UUIDs per retain call — this creates duplicates
# Good — stable session ID
client.retain(bank_id="user-alice", items=[{
"content": full_conversation,
"document_id": f"session-{session_id}",
}])
# Bad — new random ID every call = duplicates
client.retain(bank_id="user-alice", items=[{
"content": full_conversation,
"document_id": str(uuid.uuid4()), # ❌ creates a new document each time
}])
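The effect of the two ID strategies can be seen with an in-memory stand-in. `FakeBank` below is not the Hindsight client; it only mimics the upsert rule described above (same document_id replaces the previous version) so the document counts can be compared.

```python
import uuid


class FakeBank:
    """In-memory stand-in that models only the document_id upsert rule."""

    def __init__(self):
        self.documents = {}

    def retain(self, content, document_id):
        # Same ID: the previous version is replaced. New ID: a new document appears.
        self.documents[document_id] = content


stable_bank = FakeBank()
random_bank = FakeBank()
conversation = []

for message in ["hi", "I use React", "and Zustand"]:
    conversation.append(message)
    full_conversation = "\n".join(conversation)
    stable_bank.retain(full_conversation, "session-42")      # stable session ID
    random_bank.retain(full_conversation, str(uuid.uuid4()))  # new ID on every call

stable_count = len(stable_bank.documents)
random_count = len(random_bank.documents)
```

After three turns the stable ID leaves one document holding the full conversation, while the random IDs leave three overlapping copies.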
The timestamp Field
Set whenever you have temporal context. Enables temporal retrieval strategies.
- ISO 8601 format: "2025-06-01T10:32:00Z"
- For conversations: set to when the conversation started
- Omitting it disables temporal ranking entirely
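A small helper can normalize whatever datetime your application holds into the UTC form shown above. This is a sketch using only the Python standard library; `to_iso_z` is a hypothetical name, and rejecting naive datetimes is a design choice for the example, not a Hindsight requirement.

```python
from datetime import datetime, timedelta, timezone


def to_iso_z(dt):
    """Format an aware datetime as ISO 8601 in UTC with a trailing 'Z'."""
    if dt.tzinfo is None:
        # A naive datetime has no defined offset, so converting it would be a guess.
        raise ValueError("naive datetime: attach a timezone before converting")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A conversation that started at 12:32 in a UTC+2 timezone.
started = datetime(2025, 6, 1, 12, 32, tzinfo=timezone(timedelta(hours=2)))
timestamp = to_iso_z(started)
```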
Tags: Naming Conventions
Tags scope visibility. A memory tagged user:alice is only returned for recall/reflect calls that include user:alice in their tags filter (with strict matching).
Standard naming conventions:
| Pattern | Example | Use for |
|---|---|---|
| user:<id> | user:alice, user:u_123 | Per-user isolation |
| session:<id> | session:s_abc | Session-scoped memories |
| team:<name> | team:engineering | Shared team knowledge |
| topic:<name> | topic:billing, topic:technical | Domain filtering |
| scope:<name> | scope:private, scope:public | Visibility tiers |
Multi-tenant minimum: Every retain for user data must include at least user:<id>. Omitting it makes the memory globally visible.
# Multi-tenant retain — always tag with user ID
items=[{
"content": conversation,
"tags": ["user:alice", "session:s_abc", "topic:billing"],
"document_id": f"session-{session_id}",
}]
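To make the multi-tenant minimum hard to forget, tag lists can be built by a helper that refuses to run without a user ID. `build_tags` is a hypothetical helper that follows the naming conventions in the table above.

```python
def build_tags(user_id, session_id=None, topics=()):
    """Build a tag list that always starts with the user:<id> tag."""
    if not user_id:
        # Without this tag the memory would be visible to every user.
        raise ValueError("user_id is required for multi-tenant retains")
    tags = [f"user:{user_id}"]
    if session_id:
        tags.append(f"session:{session_id}")
    tags.extend(f"topic:{topic}" for topic in topics)
    return tags


tags = build_tags("alice", session_id="s_abc", topics=["billing"])
```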
Metadata Schema
Use for source tracking and downstream linking. Not filterable — use tags for filtering.
# Source tracking
metadata={"source": "slack", "channel": "#engineering", "thread_id": "T123456"}
# Ticket linking
metadata={"ticket_id": "JIRA-123", "priority": "high", "reporter": "alice"}
# Document provenance
metadata={"url": "https://...", "section": "pricing-faq", "version": "2025-Q1"}
Metadata is returned with every recalled memory — use it to link memories back to source systems for UI display, deep-linking, or audit trails.
Observation Scopes
Controls which tag combinations get their own observation pass.
| Value | Behavior | When to use |
|---|---|---|
| "combined" | One pass with all tags together | Default; single-user banks, general use |
| "per_tag" | One pass per tag independently | Users should have isolated behavioral observations |
| "all_combinations" | All possible subsets of tags | Complex multi-dimensional analysis (expensive) |
| Custom list | Explicit scope list | Precise multi-tenant control |
Custom scope example (recommended for multi-tenant):
# Observations scoped to: user-level, team-level, and combined
observation_scopes=[
["user:alice"],
["team:engineering"],
["user:alice", "team:engineering"],
]
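To see why "all_combinations" is marked expensive, the scopes implied by each mode can be enumerated for a given tag set. This sketch illustrates the growth in the number of passes; it does not show how Hindsight enumerates scopes internally, and excluding the empty subset is an assumption.

```python
from itertools import combinations


def per_tag_scopes(tags):
    """One scope per tag, as in "per_tag"."""
    return [[tag] for tag in tags]


def all_combination_scopes(tags):
    """Every non-empty subset of tags, as in "all_combinations" (assumed non-empty)."""
    return [
        list(subset)
        for size in range(1, len(tags) + 1)
        for subset in combinations(tags, size)
    ]


tags = ["user:alice", "team:engineering", "topic:billing"]
per_tag = per_tag_scopes(tags)
everything = all_combination_scopes(tags)
```

With n tags, "per_tag" yields n scopes while all non-empty subsets yield 2^n - 1, so three tags already mean seven passes and ten tags mean 1023. An explicit custom list keeps that number under your control.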
Sync vs Async
| Mode | When to use |
|---|---|
| async_=False (default) | When you need confirmation before proceeding |
| async_=True | End-of-turn or end-of-session retain; user-facing flows where latency matters |
Do not retain and recall in the same turn — retain is a write operation and the extracted memories will not be available immediately.
Recalling Memories
Budget Selection
| Budget | Latency | Use when |
|---|---|---|
| low | 50–100ms | Simple fact lookups, single-hop questions |
| mid | 100–300ms | Multi-hop reasoning, relationship queries (default) |
| high | 300–500ms | Deep exploration, complex cross-domain patterns |
Default to mid. Use low for high-frequency agent loops. Reserve high for explicit "deep recall" user-triggered flows.
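The selection rule above is simple enough to centralize in one function so that call sites do not hard-code budgets. A minimal sketch: `pick_budget` and its two flags are hypothetical names, and only the returned strings come from the table.

```python
def pick_budget(*, agent_loop=False, deep_recall_requested=False):
    """Return "low", "mid", or "high" following the guidance above."""
    if deep_recall_requested:
        return "high"  # explicit, user-triggered deep recall
    if agent_loop:
        return "low"   # high-frequency loops favor latency
    return "mid"       # default


budget = pick_budget(agent_loop=True)
```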
Tag Filtering Modes
| Mode | Includes untagged? | Condition |
|---|---|---|
| any (default) | Yes | At least one tag matches, OR untagged |
| all | Yes | All specified tags present, OR untagged |
| any_strict | No | At least one tag matches |
| all_strict | No | All specified tags present |
Decision guide:
- Shared global knowledge + per-user: tags=["user:alice"], tags_match="any" returns Alice's memories and untagged global memories
- Fully partitioned (no leakage): tags=["user:alice"], tags_match="any_strict" returns Alice's memories only
- Multi-condition AND: tags=["user:alice", "topic:billing"], tags_match="all_strict" returns only memories where both tags are present
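The four modes can be modeled as a single predicate over a memory's tags, which is handy for unit-testing your own filter choices. This is a sketch of the semantics in the table above, not Hindsight's implementation; `tag_filter_matches` is a hypothetical name.

```python
MODES = {"any", "all", "any_strict", "all_strict"}


def tag_filter_matches(memory_tags, filter_tags, mode="any"):
    """Return True if a memory with `memory_tags` passes the filter."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    memory, wanted = set(memory_tags), set(filter_tags)
    if not memory:
        # Untagged memories pass only in the non-strict modes.
        return not mode.endswith("_strict")
    if mode.startswith("any"):
        return bool(memory & wanted)  # at least one tag in common
    return wanted <= memory           # every requested tag is present


memories = {
    "global_faq": [],
    "alice_billing": ["user:alice", "topic:billing"],
    "bob_billing": ["user:bob", "topic:billing"],
}
visible_any = sorted(
    name for name, tags in memories.items()
    if tag_filter_matches(tags, ["user:alice"], "any")
)
visible_strict = sorted(
    name for name, tags in memories.items()
    if tag_filter_matches(tags, ["user:alice"], "any_strict")
)
```

The only difference between the two result lists is the untagged memory, which is exactly the global-knowledge case from the decision guide.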
tag_groups for complex filters:
Tag groups use a tree structure with and/or/not compound nodes and {"tags": [...], "match": "..."} leaf nodes.
# Alice's billing memories OR billing memories without the user:alice tag
recall(
query="...",
tag_groups=[
{"or": [
{"tags": ["user:alice", "topic:billing"], "match": "all_strict"},
{"and": [
{"tags": ["topic:billing"], "match": "any_strict"},
{"not": {"tags": ["user:alice"], "match": "any_strict"}},
]},
]}
]
)
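Because tag_groups trees nest, it helps to evaluate them locally against sample tag sets before relying on them for isolation. The evaluator below is a sketch of the and/or/not and leaf semantics as described in this section, not Hindsight's implementation; it evaluates a single tree node, since how multiple top-level entries combine is not specified here.

```python
def leaf_matches(memory_tags, tags, match):
    """Apply one {"tags": [...], "match": "..."} leaf to a memory's tags."""
    memory, wanted = set(memory_tags), set(tags)
    if not memory:
        return not match.endswith("_strict")
    if match.startswith("any"):
        return bool(memory & wanted)
    return wanted <= memory


def evaluate(node, memory_tags):
    """Recursively evaluate a tag_groups node against a memory's tags."""
    if "and" in node:
        return all(evaluate(child, memory_tags) for child in node["and"])
    if "or" in node:
        return any(evaluate(child, memory_tags) for child in node["or"])
    if "not" in node:
        return not evaluate(node["not"], memory_tags)
    return leaf_matches(memory_tags, node["tags"], node["match"])


# The tree from the example above.
BILLING_FILTER = {"or": [
    {"tags": ["user:alice", "topic:billing"], "match": "all_strict"},
    {"and": [
        {"tags": ["topic:billing"], "match": "any_strict"},
        {"not": {"tags": ["user:alice"], "match": "any_strict"}},
    ]},
]}

results = {
    "alice_billing": evaluate(BILLING_FILTER, ["user:alice", "topic:billing"]),
    "shared_billing": evaluate(BILLING_FILTER, ["topic:billing"]),
    "alice_other": evaluate(BILLING_FILTER, ["user:alice"]),
    "bob_billing": evaluate(BILLING_FILTER, ["user:bob", "topic:billing"]),
}
```

Under these semantics the tree also admits a memory tagged user:bob and topic:billing, because the not leaf excludes only user:alice. If other users' memories must stay hidden, the second branch needs a stricter condition.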
include Options
| Option | Default | Enable when |
|---|---|---|
| include.entities | Enabled | n/a (leave on; provides entity context for graph traversal) |
| include.chunks | Disabled | Agent needs exact wording or source quotation |
| include.source_facts | Disabled | Tracing observation provenance for auditing |
types Filtering
| Value | Returns |
|---|---|
| (not set) | All types |
| ["observation"] | Consolidated patterns only; faster for high-level questions |
| ["world", "experience"] | Raw facts only; for ground-truth or citation-sensitive queries |
query_timestamp
Set for time-sensitive queries. Anchors temporal ranking to a specific point in time.
# "What was the team working on in January?"
recall(query="team priorities", query_timestamp="2025-01-31T23:59:59Z")
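# Current context (most common); requires: from datetime import datetime, timezone
# datetime.utcnow() is deprecated since Python 3.12, so use an aware UTC datetime instead
recall(query="user preferences", query_timestamp=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))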
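For "as of the end of a month" questions like the January example, the anchor timestamp can be computed instead of typed by hand. A standard-library sketch; `end_of_month` is a hypothetical helper, and anchoring to the last second of the month in UTC is an assumption about what the query means.

```python
import calendar
from datetime import datetime, timezone


def end_of_month(year, month):
    """Return the last second of the given month in UTC, ISO 8601 with 'Z'."""
    last_day = calendar.monthrange(year, month)[1]  # handles leap years
    moment = datetime(year, month, last_day, 23, 59, 59, tzinfo=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


january_anchor = end_of_month(2025, 1)
```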
Reflecting
Recall vs Reflect
| Use recall when | Use reflect when |
|---|---|
| Agent will reason over facts itself | You want Hindsight to reason and return an answer |
| You need raw citations | You need a synthesized response |
| You're building a RAG pipeline | You want an autonomous multi-step search loop |
| Latency is critical | Response quality matters more than latency |
| You need precise fact counts | You need a contextual, nuanced answer |
response_schema
Use when you need structured output for programmatic consumption.
reflect(
query="What are the user's top 3 technical preferences?",
response_schema={
"type": "object",
"properties": {
"preferences": {
"type": "array",
"items": {"type": "string"},
"maxItems": 3
},
"confidence": {"type": "number", "minimum": 0, "maximum": 1}
},
"required": ["preferences", "confidence"]
}
)
# Returns: result.structured_output["preferences"], result.structured_output["confidence"]
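Before acting on structured output, it is worth re-checking it on the client side. The sketch below validates a plain dict against the constraints of the schema above using only standard Python; `check_preferences_output` is a hypothetical helper, and it takes the dict directly rather than a reflect result object.

```python
def check_preferences_output(output):
    """Return a list of problems; an empty list means the dict fits the schema above."""
    problems = []

    preferences = output.get("preferences")
    if not isinstance(preferences, list) or not all(isinstance(p, str) for p in preferences):
        problems.append("preferences must be a list of strings")
    elif len(preferences) > 3:
        problems.append("preferences has more than 3 items")

    confidence = output.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        problems.append("confidence must be a number")
    elif not 0 <= confidence <= 1:
        problems.append("confidence must be between 0 and 1")

    return problems


ok = check_preferences_output({"preferences": ["TypeScript", "Zustand"], "confidence": 0.8})
bad = check_preferences_output({"preferences": ["a", "b", "c", "d"], "confidence": 1.5})
```

For larger schemas a general JSON Schema validator is a better fit than hand-written checks.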
Auditing and Debugging
| Option | Purpose |
|---|---|
| include.facts=True | Exposes which memories and mental models were used (for transparency/auditing) |
| include.tool_calls=True | Full execution trace of the internal search loop (for debugging) |
Enable include.facts in production for audit trails. Enable include.tool_calls only during development.
Mental Models
Mental models are pre-computed reflect responses stored for common queries. They return instantly and consistently.
When to Create
- Common repeated queries that should return consistent answers
- High-traffic agents that need sub-100ms responses
- User profiles or personas read on every request
- Knowledge summaries reviewed or approved by humans
- Cross-session state that changes slowly (preferences, skills, background)
Tag Strategy
Tags on a mental model filter BOTH which memories are used to build it AND which recall/reflect calls can see it.
# Per-user mental model — uses Alice's memories (all_strict applied automatically during refresh)
create_mental_model(
bank_id="shared-bank",
name="Alice's Technical Profile",
source_query="Summarize Alice's technical background, preferred stack, and current projects",
tags=["user:alice"],
)
# Global mental model — uses all memories, visible to everyone
create_mental_model(
bank_id="shared-bank",
name="Team Engineering Standards",
source_query="What are the team's agreed engineering standards and conventions?",
# No tags — reads all memories, visible to all
)
Refresh Strategy
| Trigger | When to use |
|---|---|
| Manual via API | After significant data updates or review cycles |
| trigger={"refresh_after_consolidation": True} | When observations update frequently and the model should stay current |
Create narrow, scoped models — one per knowledge dimension. A mental model titled "Everything about the user" is as useful as none.
Model granularity examples for a personal assistant:
- "User Profile" — demographics, preferences, stated goals
- "Current Projects" — active work, deadlines, blockers
- "Technical Stack" — languages, tools, frameworks used
- "Communication Style" — formality preferences, response length preferences
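The granularity list above can be turned into one specification per dimension and per user. This sketch only builds the keyword arguments (bank_id, name, source_query, tags) that the create_mental_model examples in this guide use; the `DIMENSIONS` queries are illustrative wording, and no client call is made.

```python
# Illustrative source queries, one per knowledge dimension from the list above.
DIMENSIONS = {
    "User Profile": "Summarize the user's demographics, preferences, and stated goals",
    "Current Projects": "Summarize the user's active work, deadlines, and blockers",
    "Technical Stack": "Which languages, tools, and frameworks does the user use?",
    "Communication Style": "What formality and response length does the user prefer?",
}


def mental_model_specs(bank_id, user_id):
    """Build one keyword-argument dict per dimension, scoped to a single user."""
    return [
        {
            "bank_id": bank_id,
            "name": name,
            "source_query": query,
            "tags": [f"user:{user_id}"],  # scopes both the inputs and the visibility
        }
        for name, query in DIMENSIONS.items()
    ]


specs = mental_model_specs("shared-bank", "alice")
```

Each dict could then be passed as keyword arguments to whichever mental model creation call your client exposes.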
Anti-patterns
| Anti-pattern | Problem | Fix |
|---|---|---|
| Pre-summarizing before retain | Loses entity relationships, temporal markers, structural context | Retain raw content; Hindsight extracts facts |
| Using random UUIDs as document_id | Creates duplicate documents on every retain | Use stable session/ticket/document IDs |
| Omitting the context field | Reduces extraction quality significantly | Always describe what kind of data this is |
| Using metadata for filtering | Metadata is not filterable | Use tags for anything you'll filter on |
| Vague or generic missions | Generic extraction = noisy, low-value memories | Be specific about domain, data type, what to ignore |
| tags_match="any" for multi-tenant banks | Leaks memories across users | Use any_strict or all_strict for user-partitioned data |
| Retaining and recalling in the same request | Retained memories not yet indexed | Retain end-of-turn; recall at the start of next turn |
| One mental model for everything | Low accuracy, slow refresh, hard to scope | Create one model per knowledge dimension |
| high budget for every recall | Expensive, slow, usually unnecessary | Use low for simple lookups, mid default |
| Missing timestamp on retain | Disables temporal retrieval strategies | Always set from actual content timestamps |
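Several of these anti-patterns can be caught before a retain call with a small lint pass. This is a sketch under assumptions: `lint_retain_item` is a hypothetical helper, it treats context and timestamp as keys on the item dict, and the UUID check is a heuristic that will also flag a stable document UUID.

```python
import uuid

GENERIC_CONTEXTS = {"", "some data", "conversation"}


def _looks_like_uuid4(value):
    """Heuristic: does the ID parse as a version-4 (random) UUID?"""
    try:
        return uuid.UUID(str(value)).version == 4
    except ValueError:
        return False


def lint_retain_item(item):
    """Return warnings for common retain anti-patterns (illustrative checks only)."""
    warnings = []
    context = (item.get("context") or "").strip().lower()
    if context in GENERIC_CONTEXTS:
        warnings.append("context is missing or generic")
    if not item.get("timestamp"):
        warnings.append("timestamp is missing")
    # Only relevant for multi-tenant banks.
    if not any(tag.startswith("user:") for tag in item.get("tags", [])):
        warnings.append("no user:<id> tag")
    document_id = item.get("document_id")
    if document_id is None:
        warnings.append("document_id is missing")
    elif _looks_like_uuid4(document_id):
        warnings.append("document_id looks like a random UUID; make sure it is stable")
    return warnings


good_item = {
    "content": "...",
    "context": "Customer support ticket about a billing discrepancy",
    "timestamp": "2025-06-01T10:32:00Z",
    "tags": ["user:alice", "topic:billing"],
    "document_id": "session-42",
}
bad_item = {"content": "...", "context": "conversation", "document_id": str(uuid.uuid4())}

good_warnings = lint_retain_item(good_item)
bad_warnings = lint_retain_item(bad_item)
```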