Best Practices

Practical guidance for agents and developers integrating Hindsight memory into production systems.

Core Concepts

Memory Banks

A memory bank is an isolated memory store — the unit of separation between users, agents, or contexts. All operations (retain, recall, reflect) target a single bank. Banks do not share data.

  • One bank per user is the most common pattern for multi-user applications
  • One bank per agent is common for agent-specific long-term memory
  • A shared bank with tags can work for cross-user analysis (see Tags)

Banks are auto-created on first use. Configure them before ingesting data to steer behavior.


Taxonomy

| Operation | What it does | When to call it |
|---|---|---|
| Retain | Ingests raw content (conversations, documents, notes). The LLM extracts facts, entities, and relationships — raw content is never stored verbatim. | After each conversation turn, or when a session ends |
| Recall | Retrieves relevant memories using four parallel strategies: semantic search, BM25, graph traversal, and temporal ranking. Returns a ranked list of facts. | Before generating a response that benefits from past context |
| Reflect | Autonomous reasoning loop: searches memory, synthesizes an answer, and returns it directly. Uses mental models and observations hierarchically. | When you want Hindsight to answer a question, not just retrieve facts |
| Observations | Auto-synthesized knowledge patterns produced by the consolidation operation, which runs asynchronously after retain completes. Consolidates facts into durable insights (preferences, behavioral patterns, contradictions). | Triggered automatically after retain — not part of the retain call itself |
| Mental Models | Pre-computed reflect responses stored for common queries. Return instantly and consistently. | Create for repeated high-traffic queries or slowly changing user profiles |
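
The first three operations compose into a simple per-turn loop: recall before responding, retain after. A minimal sketch of the request shapes, expressed as payloads — the parameter names mirror this guide, but the exact SDK call signatures are assumptions:

```python
# Hypothetical per-turn request payloads; field names follow this guide,
# the client call wrapping them is an assumption.
conversation = [
    {"role": "user", "content": "I'm using React for the frontend.",
     "timestamp": "2025-06-01T10:30:00Z"},
    {"role": "assistant", "content": "Got it. What state management are you using?"},
]

# Recall first — only already-indexed memories are visible.
recall_request = {
    "bank_id": "user-alice",
    "query": "What frontend stack does the user prefer?",
    "tags": ["user:alice"],
}

# Retain last — the raw turns, never a summary.
retain_request = {
    "bank_id": "user-alice",
    "items": [{
        "content": conversation,
        "context": "Support chat about the user's frontend stack",
        "tags": ["user:alice"],
        "document_id": "session-s_abc",  # stable ID so re-retains upsert
    }],
}
```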

Memory Types

Facts extracted during retain are classified into three types:

| Type | Description | Example |
|---|---|---|
| world | General knowledge, external facts | "The Eiffel Tower is in Paris" |
| experience | Personal events, user-specific facts | "User moved to Berlin in 2024" |
| observation | Consolidated patterns synthesized from facts | "User consistently prefers async communication" |

Use the types filter in recall to target specific memory types.


Bank Configuration

Configure a bank before first use to steer memory behavior for your domain. Misconfigured missions are the single biggest cause of low-quality memories.

Writing Effective Missions

All three missions accept plain language. Be specific about your domain — vague missions produce vague results.

retain_mission

Injected into the fact extraction prompt. Tells the LLM what to extract and what to ignore.

| Quality | Example |
|---|---|
| Good | "Always extract technical decisions, API design choices, architectural trade-offs, blockers, and error messages. Ignore greetings, small talk, and scheduling logistics." |
| Good | "Extract personal preferences, ongoing commitments, deadlines, health info, and relationship details. Ignore filler phrases and pleasantries." |
| Bad | "Extract all information" — too vague, extracts noise |
| Bad | "Be helpful" — not an extraction directive |

Tips:

  • List the fact types you want (preferences, decisions, errors, commitments)
  • List what to ignore — this is as important as what to include
  • Match the mission to your actual data type (conversations vs documents vs tickets)

observations_mission

Steers what patterns are synthesized during consolidation. Runs after retain.

```
Identify evolving preferences, recurring patterns, behavioral shifts, and contradictions
with prior knowledge. Focus on durable patterns — not transient states. Highlight when
user behavior contradicts previous observations.
```

Tips:

  • Emphasize "durable patterns" to avoid ephemeral observation noise
  • Mention contradiction detection explicitly if you need historical tracking
  • Match scope to how often you expect patterns to change

reflect_mission

Sets the agent persona and reasoning frame for reflect operations.

| Use case | Mission |
|---|---|
| Coding assistant | "You are a senior developer helping optimize the user's workflow. Always factor in past technical decisions, current project context, and stated preferences. Be direct and opinionated." |
| Customer support | "You are a support agent with full context of this customer's history. Reference past tickets and resolutions where relevant. Be concise and solution-focused." |
| Personal assistant | "You are a personal assistant who remembers everything important to the user. Personalize every response using what you know about their preferences, schedule, and ongoing projects." |
| Medical assistant | "You are a health assistant. Reference the user's history accurately. Always recommend consulting a professional for medical decisions. Do not speculate." |

Disposition Traits

Dispositions affect reflect only (not recall). Scale 1–5.

| Trait | 1 (low) | 5 (high) |
|---|---|---|
| skepticism | Trusts all memories at face value | Questions contradictions, flags uncertain info |
| literalism | Liberal interpretation, infers intent | Strict literal reading, no inference |
| empathy | Clinical, neutral tone | Warm, personal, emotionally aware |

Common profiles:

| Agent type | Skepticism | Literalism | Empathy |
|---|---|---|---|
| Code review | 4 | 5 | 1 |
| Customer support | 2 | 3 | 4 |
| Personal assistant | 2 | 2 | 4 |
| Medical assistant | 5 | 4 | 3 |
| Research assistant | 4 | 4 | 2 |
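
Missions and dispositions are typically set together when the bank is configured. A hedged sketch of what a customer-support bank's configuration might look like, expressed as a payload — the field names follow this guide, but the create/configure call wrapping them is an assumption:

```python
# Illustrative bank configuration for a customer-support agent.
# Field names follow this guide; the surrounding API call is assumed.
support_bank_config = {
    "retain_mission": (
        "Extract customer issues, error messages, resolutions, product areas, "
        "and commitments made to the customer. Ignore greetings and small talk."
    ),
    "observations_mission": (
        "Identify recurring issues, escalation patterns, and contradictions "
        "with prior tickets. Focus on durable patterns, not transient states."
    ),
    "reflect_mission": (
        "You are a support agent with full context of this customer's history. "
        "Be concise and solution-focused."
    ),
    # Profile from the table above: moderate skepticism, warm tone.
    "disposition": {"skepticism": 2, "literalism": 3, "empathy": 4},
}
```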

Entity Labels

Define a controlled vocabulary for classification. The LLM will extract and normalize values to your defined set.

```json
{
  "entity_labels": [
    {
      "key": "tech_stack",
      "type": "multi-values",
      "values": [
        {"value": "python", "description": "Python programming language"},
        {"value": "typescript", "description": "TypeScript / Node.js"},
        {"value": "react", "description": "React frontend framework"}
      ]
    },
    {
      "key": "priority",
      "type": "value",
      "tag": true,
      "values": [
        {"value": "high", "description": "Urgent or blocking"},
        {"value": "low", "description": "Nice to have"}
      ]
    }
  ]
}
```
  • type: "value" — single value per entity (last write wins)
  • type: "multi-values" — accumulates multiple values
  • tag: true — extracted label values are also added as tags (enables filtering by entity value)

Use entity labels when you need consistent classification — domain-specific terms, status values, priority levels, engagement types.


Retaining Data

Content Format

Pass the richest representation available. Never pre-summarize.

| Format | Recommendation |
|---|---|
| JSON conversation array | Preferred for conversations — preserves structure, roles, and relationships |
| Prefixed plain text | Acceptable — `[ISO-timestamp] role: text` per line |
| Markdown / HTML / raw text | Works for documents and notes |
| Pre-summarized text | Avoid — loses entity relationships, temporal markers, structural context |

Conversation JSON (preferred):

```json
[
  {"role": "user", "content": "I'm using React for the frontend.", "timestamp": "2025-06-01T10:30:00Z"},
  {"role": "assistant", "content": "Got it. What state management are you using?"},
  {"role": "user", "content": "Zustand. We moved away from Redux last quarter."}
]
```

Why not pre-summarize: The LLM extracts facts, entities, and relationships from structure. A summary like "user uses React and Zustand" loses the temporal reference ("last quarter"), the entity relationship (React↔frontend, Redux↔migration), and the causal context (moved away from).


The context Field

The context field has a high impact on extraction quality. Always set it: it describes the nature and source of the content.

```python
# Good — specific, descriptive
context="Customer support ticket #12345 from user Alice about a billing discrepancy"
context="Developer's architecture review session for the payments service"
context="User's onboarding form: stated goals, current tools, and team size"
context="Weekly standup notes: blockers, progress, and upcoming tasks"

# Bad — generic, adds no signal
context="some data"
context="conversation"
# Omitted entirely — extraction uses no context
```

The document_id Field

Use for upsert behavior: retaining with the same document_id deletes the previous version's memories and reprocesses the new content.

Rules:

  • Use stable, meaningful IDs (session ID, ticket ID, document UUID)
  • Always use the same ID for a growing conversation — retain the full conversation with each new message
  • Do NOT use random UUIDs per retain call — this creates duplicates
```python
# Good — stable session ID
client.retain(bank_id="user-alice", items=[{
    "content": full_conversation,
    "document_id": f"session-{session_id}",
}])

# Bad — new random ID every call = duplicates
client.retain(bank_id="user-alice", items=[{
    "content": full_conversation,
    "document_id": str(uuid.uuid4()),  # ❌ creates a new document each time
}])
```

The timestamp Field

Set whenever you have temporal context. Enables temporal retrieval strategies.

  • ISO 8601 format: "2025-06-01T10:32:00Z"
  • For conversations: set to when the conversation started
  • Omitting it disables temporal ranking entirely
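
A small sketch of deriving the retain timestamp from the first timestamped message of a conversation; the helper name and fallback behavior are illustrative, not part of the API:

```python
from datetime import datetime, timezone

def conversation_started_at(messages):
    """Return the first message timestamp, falling back to now (UTC, ISO 8601 'Z')."""
    for msg in messages:
        if msg.get("timestamp"):
            return msg["timestamp"]
    return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")

messages = [
    {"role": "user", "content": "Hi", "timestamp": "2025-06-01T10:30:00Z"},
    {"role": "assistant", "content": "Hello!"},  # assistant turns often lack timestamps
]

# Timestamp the item with when the conversation started.
item = {"content": messages, "timestamp": conversation_started_at(messages)}
```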

Tags: Naming Conventions

Tags scope visibility. A memory tagged user:alice is only returned for recall/reflect calls that include user:alice in their tags filter (with strict matching).

Standard naming conventions:

| Pattern | Example | Use for |
|---|---|---|
| `user:<id>` | user:alice, user:u_123 | Per-user isolation |
| `session:<id>` | session:s_abc | Session-scoped memories |
| `team:<name>` | team:engineering | Shared team knowledge |
| `topic:<name>` | topic:billing, topic:technical | Domain filtering |
| `scope:<name>` | scope:private, scope:public | Visibility tiers |

Multi-tenant minimum: Every retain for user data must include at least user:<id>. Omitting it makes the memory globally visible.

```python
# Multi-tenant retain — always tag with user ID
items=[{
    "content": conversation,
    "tags": ["user:alice", "session:s_abc", "topic:billing"],
    "document_id": f"session-{session_id}",
}]
```

Metadata Schema

Use for source tracking and downstream linking. Not filterable — use tags for filtering.

```python
# Source tracking
metadata={"source": "slack", "channel": "#engineering", "thread_id": "T123456"}

# Ticket linking
metadata={"ticket_id": "JIRA-123", "priority": "high", "reporter": "alice"}

# Document provenance
metadata={"url": "https://...", "section": "pricing-faq", "version": "2025-Q1"}
```

Metadata is returned with every recalled memory — use it to link memories back to source systems for UI display, deep-linking, or audit trails.


Observation Scopes

Controls which tag combinations get their own observation pass.

| Value | Behavior | When to use |
|---|---|---|
| "combined" | One pass with all tags together | Default — single-user banks, general use |
| "per_tag" | One pass per tag independently | Users should have isolated behavioral observations |
| "all_combinations" | All possible subsets of tags | Complex multi-dimensional analysis (expensive) |
| Custom list | Explicit scope list | Precise multi-tenant control |

Custom scope example (recommended for multi-tenant):

```python
# Observations scoped to: user-level, team-level, and combined
observation_scopes=[
    ["user:alice"],
    ["team:engineering"],
    ["user:alice", "team:engineering"],
]
```

Sync vs Async

| Mode | When to use |
|---|---|
| async_=False (default) | When you need confirmation before proceeding |
| async_=True | End-of-turn or end-of-session retain; user-facing flows where latency matters |

Do not retain and recall in the same turn — retain is a write operation and the extracted memories will not be available immediately.
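
The safe ordering — recall at the start of a turn, retain asynchronously at the end — can be pinned down in a turn handler. A sketch using a stub client (the stub exists only to make the ordering concrete; the real client's method names are taken from this guide):

```python
class StubHindsight:
    """Stand-in for the real client, just to make the call ordering testable."""
    def __init__(self):
        self.calls = []

    def recall(self, **kwargs):
        self.calls.append("recall")
        return []  # ranked facts in the real client

    def retain(self, **kwargs):
        self.calls.append("retain")

def handle_turn(client, bank_id, user_msg):
    # 1. Recall first: only already-indexed memories are visible.
    memories = client.recall(bank_id=bank_id, query=user_msg, budget="low")
    reply = f"(answer using {len(memories)} memories)"
    # 2. Retain last, asynchronously: this turn's facts become visible
    #    on a later turn, never this one.
    client.retain(bank_id=bank_id, items=[{"content": user_msg}], async_=True)
    return reply

client = StubHindsight()
handle_turn(client, "user-alice", "I prefer dark mode")
```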


Recalling Memories

Budget Selection

| Budget | Latency | Use when |
|---|---|---|
| low | 50–100 ms | Simple fact lookups, single-hop questions |
| mid | 100–300 ms | Multi-hop reasoning, relationship queries (default) |
| high | 300–500 ms | Deep exploration, complex cross-domain patterns |

Default to mid. Use low for high-frequency agent loops. Reserve high for explicit "deep recall" user-triggered flows.
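
That guidance can be encoded as a small budget picker; the decision inputs here are illustrative, not official thresholds:

```python
# Illustrative budget selection following the table above.
def pick_budget(deep_recall_requested: bool, multi_hop: bool) -> str:
    if deep_recall_requested:
        return "high"  # explicit user-triggered deep exploration only
    if multi_hop:
        return "mid"   # relationship / multi-hop queries (the default)
    return "low"       # simple lookups in high-frequency agent loops
```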


Tag Filtering Modes

| Mode | Includes untagged? | Condition |
|---|---|---|
| any (default) | Yes | At least one tag matches, OR untagged |
| all | Yes | All specified tags present, OR untagged |
| any_strict | No | At least one tag matches |
| all_strict | No | All specified tags present |

Decision guide:

  • Shared global knowledge + per-user: tags=["user:alice"], tags_match="any" — returns Alice's memories and untagged global memories
  • Fully partitioned (no leakage): tags=["user:alice"], tags_match="any_strict" — Alice's memories only
  • Multi-condition AND: tags=["user:alice", "topic:billing"], tags_match="all_strict" — only where both tags present
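
For the fully partitioned case, a small helper can guarantee no call site forgets the strict mode; the helper name is illustrative:

```python
def user_scoped_recall_kwargs(user_id: str, query: str) -> dict:
    """Build recall kwargs that cannot leak another user's memories."""
    return {
        "query": query,
        "tags": [f"user:{user_id}"],
        "tags_match": "any_strict",  # strict: excludes untagged/global memories
    }

kwargs = user_scoped_recall_kwargs("alice", "billing history")
```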

tag_groups for complex filters:

Tag groups use a tree structure with and/or/not compound nodes and {"tags": [...], "match": "..."} leaf nodes.

```python
# Alice's billing memories OR shared billing memories (no user tag)
recall(
    query="...",
    tag_groups=[
        {"or": [
            {"tags": ["user:alice", "topic:billing"], "match": "all_strict"},
            {"and": [
                {"tags": ["topic:billing"], "match": "any_strict"},
                {"not": {"tags": ["user:alice"], "match": "any_strict"}},
            ]},
        ]}
    ],
)
```

include Options

| Option | Default | Enable when |
|---|---|---|
| include.entities | Enabled | — (leave on; provides entity context for graph traversal) |
| include.chunks | Disabled | Agent needs exact wording or source quotation |
| include.source_facts | Disabled | Tracing observation provenance for auditing |

types Filtering

| Value | Returns |
|---|---|
| (not set) | All types |
| ["observation"] | Consolidated patterns only — faster for high-level questions |
| ["world", "experience"] | Raw facts only — for ground-truth or citation-sensitive queries |
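
Two illustrative recall payloads showing both ends of that trade-off; the field names follow this guide, the wrapping call is assumed:

```python
# High-level question: consolidated patterns only.
observation_recall = {
    "query": "How does the user like to communicate?",
    "types": ["observation"],
}

# Ground-truth question: raw facts suitable for citation.
fact_recall = {
    "query": "When did the user move to Berlin?",
    "types": ["world", "experience"],
}
```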

query_timestamp

Set for time-sensitive queries. Anchors temporal ranking to a specific point in time.

```python
from datetime import datetime, timezone

# "What was the team working on in January?"
recall(query="team priorities", query_timestamp="2025-01-31T23:59:59Z")

# Current context (most common); datetime.utcnow() is deprecated in Python 3.12+
recall(query="user preferences",
       query_timestamp=datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"))
```

Reflecting

Recall vs Reflect

| Use recall when | Use reflect when |
|---|---|
| Agent will reason over facts itself | You want Hindsight to reason and return an answer |
| You need raw citations | You need a synthesized response |
| You're building a RAG pipeline | You want an autonomous multi-step search loop |
| Latency is critical | Response quality matters more than latency |
| You need precise fact counts | You need a contextual, nuanced answer |

response_schema

Use when you need structured output for programmatic consumption.

```python
reflect(
    query="What are the user's top 3 technical preferences?",
    response_schema={
        "type": "object",
        "properties": {
            "preferences": {
                "type": "array",
                "items": {"type": "string"},
                "maxItems": 3,
            },
            "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        },
        "required": ["preferences", "confidence"],
    },
)
# Returns: result.structured_output["preferences"], result.structured_output["confidence"]
```

Auditing and Debugging

| Option | Purpose |
|---|---|
| include.facts=True | Exposes which memories and mental models were used (for transparency/auditing) |
| include.tool_calls=True | Full execution trace of the internal search loop (for debugging) |

Enable include.facts in production for audit trails. Enable include.tool_calls only during development.


Mental Models

Mental models are pre-computed reflect responses stored for common queries. They return instantly and consistently.

When to Create

  • Common repeated queries that should return consistent answers
  • High-traffic agents that need sub-100ms responses
  • User profiles or personas read on every request
  • Knowledge summaries reviewed or approved by humans
  • Cross-session state that changes slowly (preferences, skills, background)

Tag Strategy

Tags on a mental model filter BOTH which memories are used to build it AND which recall/reflect calls can see it.

```python
# Per-user mental model — uses Alice's memories (all_strict applied automatically during refresh)
create_mental_model(
    bank_id="shared-bank",
    name="Alice's Technical Profile",
    source_query="Summarize Alice's technical background, preferred stack, and current projects",
    tags=["user:alice"],
)

# Global mental model — uses all memories, visible to everyone
create_mental_model(
    bank_id="shared-bank",
    name="Team Engineering Standards",
    source_query="What are the team's agreed engineering standards and conventions?",
    # No tags — reads all memories, visible to all
)
```

Refresh Strategy

| Trigger | When to use |
|---|---|
| Manual via API | After significant data updates or review cycles |
| trigger={"refresh_after_consolidation": True} | When observations update frequently and the model should stay current |
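
A sketch of a narrowly scoped, auto-refreshing model, expressed as a payload; the field names follow this guide, but the exact create_mental_model signature is an assumption:

```python
# Illustrative mental-model definition: narrow scope, auto-refresh.
current_projects_model = {
    "bank_id": "user-alice",
    "name": "Current Projects",
    "source_query": "What is the user actively working on, and what is blocking them?",
    "tags": ["user:alice"],                            # scopes both build and visibility
    "trigger": {"refresh_after_consolidation": True},  # stays current as observations update
}
```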

Create narrow, scoped models — one per knowledge dimension. A mental model titled "Everything about the user" is as useful as none.

Model granularity examples for a personal assistant:

  • "User Profile" — demographics, preferences, stated goals
  • "Current Projects" — active work, deadlines, blockers
  • "Technical Stack" — languages, tools, frameworks used
  • "Communication Style" — formality preferences, response length preferences

Anti-patterns

| Anti-pattern | Problem | Fix |
|---|---|---|
| Pre-summarizing before retain | Loses entity relationships, temporal markers, structural context | Retain raw content; Hindsight extracts facts |
| Using random UUIDs as document_id | Creates duplicate documents on every retain | Use stable session/ticket/document IDs |
| Omitting the context field | Reduces extraction quality significantly | Always describe what kind of data this is |
| Using metadata for filtering | Metadata is not filterable | Use tags for anything you'll filter on |
| Vague or generic missions | Generic extraction = noisy, low-value memories | Be specific about domain, data type, what to ignore |
| tags_match="any" for multi-tenant banks | Leaks memories across users | Use any_strict or all_strict for user-partitioned data |
| Retaining and recalling in the same request | Retained memories not yet indexed | Retain end-of-turn; recall at the start of the next turn |
| One mental model for everything | Low accuracy, slow refresh, hard to scope | Create one model per knowledge dimension |
| high budget for every recall | Expensive, slow, usually unnecessary | Use low for simple lookups, mid as the default |
| Missing timestamp on retain | Disables temporal retrieval strategies | Always set it from actual content timestamps |