Ingest Data
Store documents, conversations, and raw content in Hindsight, which automatically extracts memories from them.
When you retain content, Hindsight doesn't just store the raw text. It analyzes the content to extract facts, identify entities, and build a connected knowledge graph, turning unstructured information into structured, queryable memories.
Learn about fact extraction, entity resolution, and graph construction in the Retain Architecture guide.
Make sure you've completed the Quick Start to install the client and start the server.
Store a Single Memory
Python:
client.retain(
    bank_id="my-bank",
    content="Alice works at Google as a software engineer"
)
Node.js:
await client.retain('my-bank', 'Alice works at Google as a software engineer');
CLI:
hindsight memory retain my-bank "Alice works at Google as a software engineer"
The Importance of Context
The context parameter is crucial for guiding how Hindsight extracts memories from your content. Think of it as providing a lens through which the system interprets the information.
Why context matters:
- Steers memory extraction: Context tells the memory bank what type of information to focus on and how to interpret ambiguous content
- Improves relevance: Memories extracted with proper context are more accurately categorized and easier to retrieve
- Disambiguates meaning: The same sentence can have different implications depending on context (e.g., "the project was terminated" means different things in a career vs. product context)
Store with Context and Date
Always provide context and event dates for optimal memory extraction:
Python:
client.retain(
    bank_id="my-bank",
    content="Alice got promoted to senior engineer",
    context="career update",
    timestamp="2024-03-15T10:00:00Z"
)
Node.js:
await client.retain('my-bank', 'Alice got promoted to senior engineer', {
    context: 'career update',
    timestamp: '2024-03-15T10:00:00Z'
});
CLI:
hindsight memory retain my-bank "Alice got promoted" \
    --context "career update"
The timestamp defaults to the current time if not specified. Providing explicit timestamps enables temporal queries like "What happened last spring?"
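The timestamp in the example above is an ISO 8601 string in UTC. The following is a minimal sketch of producing that format from a Python datetime. The helper name to_retain_timestamp is hypothetical and not part of the Hindsight client, and treating naive datetimes as UTC is an assumption made here for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_retain_timestamp(dt: datetime) -> str:
    """Format a datetime as an ISO 8601 UTC string such as 2024-03-15T10:00:00Z."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Convert to UTC, then render with a trailing "Z" like the example in this guide.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# An event recorded at 12:00 in a UTC+2 timezone is 10:00 in UTC.
promoted_at = datetime(2024, 3, 15, 12, 0, tzinfo=timezone(timedelta(hours=2)))
print(to_retain_timestamp(promoted_at))  # 2024-03-15T10:00:00Z
```

The resulting string can then be passed as the timestamp argument shown above.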
Response Fields
The retain response includes:
| Field | Type | Description |
|---|---|---|
| success | bool | Whether the operation succeeded |
| bank_id | string | The memory bank ID |
| items_count | int | Number of items processed |
| async | bool | Whether processed asynchronously |
| usage | TokenUsage | Token usage metrics for LLM calls (synchronous only) |
The usage field contains token metrics for cost tracking:
- input_tokens: Tokens consumed by prompts
- output_tokens: Tokens generated by the LLM
- total_tokens: Sum of input and output tokens
Note: usage is only present for synchronous operations. Async operations (async: true) do not return usage metrics.
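Because usage is missing for async operations, code that reads it should handle both cases. The sketch below assumes the response exposes the fields from the table as attributes and that usage is None or absent when the operation was asynchronous. The SimpleNamespace objects and their token counts are made-up stand-ins for a real response, and summarize_retain_result is a hypothetical helper.

```python
from types import SimpleNamespace


def summarize_retain_result(result) -> str:
    """Build a one-line summary of a retain response, tolerating missing usage."""
    status = "ok" if result.success else "failed"
    summary = f"{status}: {result.items_count} item(s) in bank {result.bank_id}"
    # Assumption: usage is None or not set at all for async operations.
    usage = getattr(result, "usage", None)
    if usage is None:
        return summary + " (no usage metrics)"
    return summary + (
        f" ({usage.input_tokens} in + {usage.output_tokens} out"
        f" = {usage.total_tokens} tokens)"
    )


# Stand-in responses with invented numbers, for illustration only.
sync_result = SimpleNamespace(
    success=True, bank_id="my-bank", items_count=1,
    usage=SimpleNamespace(input_tokens=120, output_tokens=30, total_tokens=150),
)
async_result = SimpleNamespace(
    success=True, bank_id="my-bank", items_count=2, usage=None,
)

print(summarize_retain_result(sync_result))
print(summarize_retain_result(async_result))
```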
Batch Ingestion
Store multiple items in a single request. Batching is the recommended approach: it reduces network overhead and lets Hindsight optimize memory extraction across related content.
Python:
client.retain_batch(
    bank_id="my-bank",
    items=[
        {"content": "Alice works at Google", "context": "career"},
        {"content": "Bob is a data scientist at Meta", "context": "career"},
        {"content": "Alice and Bob are friends", "context": "relationship"}
    ],
    document_id="conversation_001"
)
Node.js:
await client.retainBatch('my-bank', [
    { content: 'Alice works at Google', context: 'career' },
    { content: 'Bob is a data scientist at Meta', context: 'career' },
    { content: 'Alice and Bob are friends', context: 'relationship' }
], { documentId: 'conversation_001' });
The document_id groups related memories for later management.
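In practice the items list is often built from existing data rather than written by hand. As a rough sketch, the hypothetical helper below turns conversation turns into item dicts with the content and context keys used in the batch example. Prefixing each line with the speaker and skipping empty turns are choices made for this illustration, not Hindsight requirements.

```python
def conversation_to_items(turns, context):
    """Convert (speaker, text) pairs into item dicts for a batch request."""
    items = []
    for speaker, text in turns:
        text = text.strip()
        if not text:
            continue  # Skip turns with no content.
        items.append({"content": f"{speaker}: {text}", "context": context})
    return items


turns = [
    ("Alice", "I just joined Google."),
    ("Bob", "   "),  # Empty turn, dropped by the helper.
    ("Bob", "Congrats! I'm still at Meta."),
]
items = conversation_to_items(turns, context="career")
print(len(items))  # 2
```

The resulting list could then be passed as the items argument of retain_batch, together with a document_id for the conversation.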
Store from Files
CLI:
# Single file
hindsight memory retain-files my-bank document.txt
# Directory (recursive by default)
hindsight memory retain-files my-bank ./documents/
Async Ingestion
For large batches, use async ingestion to avoid blocking:
Python:
# Start async ingestion (returns immediately)
result = client.retain_batch(
    bank_id="my-bank",
    items=[
        {"content": "Large batch item 1"},
        {"content": "Large batch item 2"},
    ],
    document_id="large-doc",
    retain_async=True
)
# Check if it was processed asynchronously
print(result.var_async)  # True
Node.js:
// Start async ingestion (returns immediately)
await client.retainBatch('my-bank', [
    { content: 'Large batch item 1' },
    { content: 'Large batch item 2' },
], {
    documentId: 'large-doc',
    async: true
});
Tagging Memories
Tags enable visibility scoping, which is useful when one memory bank serves multiple users but each user should see only the memories relevant to them. For example, an agent that chats with multiple users can tag memories by user ID and filter during recall.
Tag Individual Items
Python:
# Tag individual items for visibility scoping
client.retain_batch(
    bank_id="my-bank",
    items=[
        {
            "content": "User Alice said she loves the new dashboard",
            "tags": ["user:alice", "feedback"]
        },
        {
            "content": "User Bob reported a bug in the search feature",
            "tags": ["user:bob", "bug-report"]
        }
    ],
    document_id="user_feedback_001"
)
Apply Tags to All Items in a Batch
Use document_tags to apply the same tags to all items in a request:
Python:
# Apply tags to all items in a batch
client.retain_batch(
    bank_id="my-bank",
    items=[
        {"content": "Alice mentioned she prefers dark mode"},
        {"content": "Bob asked about keyboard shortcuts"}
    ],
    document_id="support_session_123",
    document_tags=["session:123", "support"]  # Applied to all items
)
When both document_tags and item-level tags are provided, they are merged together.
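To preview which tags each item would end up with, the merge can be approximated locally. This is only a sketch: the merge itself happens inside Hindsight, and the guide does not specify ordering or duplicate handling, so the order-preserving, de-duplicating union below is an assumption. Both function names are hypothetical.

```python
def merged_tags(document_tags, item_tags):
    """Union of document-level and item-level tags, first occurrence wins."""
    seen = set()
    merged = []
    for tag in [*document_tags, *item_tags]:
        if tag not in seen:  # Assumption: duplicates collapse into one tag.
            seen.add(tag)
            merged.append(tag)
    return merged


def preview_item_tags(items, document_tags):
    """Return the assumed effective tag list for each item in a batch."""
    return [merged_tags(document_tags, item.get("tags", [])) for item in items]


items = [
    {"content": "Alice mentioned she prefers dark mode", "tags": ["user:alice"]},
    {"content": "Bob asked about keyboard shortcuts"},
]
for tags in preview_item_tags(items, ["session:123", "support"]):
    print(tags)
```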
Tag Naming Conventions
Use consistent naming patterns for tags:
| Pattern | Example | Use Case |
|---|---|---|
| user:<id> | user:alice | Multi-user agent filtering |
| session:<id> | session:123 | Session-based scoping |
| room:<id> | room:general | Chat room isolation |
| topic:<name> | topic:feedback | Topic categorization |
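One way to keep tags consistent is to build them through a single helper instead of formatting strings by hand. The sketch below is a local convention only: make_tag is hypothetical, and the rules it enforces (lowercase kind, no whitespace, no empty parts) are assumptions rather than constraints imposed by Hindsight.

```python
def make_tag(kind: str, value: str) -> str:
    """Build a tag in the kind:value pattern, e.g. user:alice."""
    kind = kind.strip().lower()
    value = value.strip()
    if not kind or not value:
        raise ValueError("tag kind and value must be non-empty")
    if any(ch.isspace() for ch in kind + value):
        # Assumption: whitespace inside tags is disallowed by this convention.
        raise ValueError("tags must not contain whitespace")
    return f"{kind}:{value}"


print(make_tag("user", "alice"))     # user:alice
print(make_tag(" Session ", "123"))  # session:123
```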
Listing Tags
Use the list tags API to discover existing tags, for example to power UI autocomplete or to expand wildcards:
# List all tags in a bank
tags = client.list_tags(bank_id="my-bank")
for tag in tags.items:
    print(f"{tag.tag}: {tag.count} memories")
# Search with wildcards (* matches any characters)
user_tags = client.list_tags(bank_id="my-bank", q="user:*")
admin_tags = client.list_tags(bank_id="my-bank", q="*-admin")
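To make the wildcard rule concrete, the following sketch reproduces "* matches any characters" locally with a regular expression. It is not Hindsight's implementation: tag_matches is a hypothetical helper, and case-sensitive matching against the whole tag is an assumption.

```python
import re


def tag_matches(pattern: str, tag: str) -> bool:
    """Return True if tag matches pattern, where * stands for any characters."""
    # Escape the literal parts and join them with ".*" for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, tag, flags=re.DOTALL) is not None


tags = ["user:alice", "user:bob", "session:123", "team-admin"]
print([t for t in tags if tag_matches("user:*", t)])   # ['user:alice', 'user:bob']
print([t for t in tags if tag_matches("*-admin", t)])  # ['team-admin']
```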
See Recall API for filtering memories by tags during retrieval.