Ingest Data

Store documents, conversations, and raw content into Hindsight to automatically extract and create memories.

When you retain content, Hindsight doesn't just store the raw text—it intelligently analyzes the content to extract meaningful facts, identify entities, and build a connected knowledge graph. This process transforms unstructured information into structured, queryable memories.

How Retain Works

Learn about fact extraction, entity resolution, and graph construction in the Retain Architecture guide.

Prerequisites

Make sure you've completed the Quick Start to install the client and start the server.

Store a Single Memory

from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

client.retain(
    bank_id="my-bank",
    content="Alice works at Google as a software engineer"
)

The Importance of Context

The context parameter is crucial for guiding how Hindsight extracts memories from your content. Think of it as providing a lens through which the system interprets the information.

Why context matters:

  • Steers memory extraction: Context tells the memory bank what type of information to focus on and how to interpret ambiguous content
  • Improves relevance: Memories extracted with proper context are more accurately categorized and easier to retrieve
  • Disambiguates meaning: The same sentence can have different implications depending on context (e.g., "the project was terminated" means different things in a career vs. product context)
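
For example, retaining the same sentence under two different contexts will tend to produce different memories. A minimal sketch, reusing the client from above (the bank ID and content here are illustrative):

# Same sentence, two lenses: the context shapes which facts get extracted.
client.retain(
    bank_id="my-bank",
    content="The project was terminated last week",
    context="career update from a 1:1 conversation with Alice"
)

client.retain(
    bank_id="my-bank",
    content="The project was terminated last week",
    context="product roadmap review notes"
)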

Store with Context and Date

Always provide context and event dates for optimal memory extraction:

client.retain(
    bank_id="my-bank",
    content="Alice got promoted to senior engineer",
    context="career update",
    timestamp="2024-03-15T10:00:00Z"
)

The timestamp defaults to the current time if not specified. Providing explicit timestamps enables temporal queries like "What happened last spring?"
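
If your source data carries its own dates (for example, email headers or log lines), normalize them to ISO 8601 before retaining. A minimal sketch using Python's standard datetime module, assuming the timestamp parameter accepts any ISO 8601 string:

from datetime import datetime, timezone

# Build an ISO 8601 timestamp for the event date (illustrative value).
event_date = datetime(2024, 3, 15, 10, 0, tzinfo=timezone.utc)

client.retain(
    bank_id="my-bank",
    content="Alice got promoted to senior engineer",
    context="career update",
    timestamp=event_date.isoformat()
)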

Batch Ingestion

Store multiple items in a single request. Batch ingestion is the recommended approach: it reduces network overhead and lets Hindsight optimize memory extraction across related content.

client.retain_batch(
    bank_id="my-bank",
    items=[
        {"content": "Alice works at Google", "context": "career"},
        {"content": "Bob is a data scientist at Meta", "context": "career"},
        {"content": "Alice and Bob are friends", "context": "relationship"}
    ],
    document_id="conversation_001"
)

The document_id groups related memories for later management.
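
A common pattern is to split a longer source, such as a conversation transcript, into individual items and retain them under one document_id. A rough sketch (the transcript, contexts, and document ID here are illustrative):

# Illustrative transcript; in practice this comes from your application.
transcript = [
    ("Alice", "I just started a new role at Google."),
    ("Bob", "Congrats! I'm still doing data science at Meta."),
]

items = [
    {"content": f"{speaker}: {message}", "context": "chat between Alice and Bob"}
    for speaker, message in transcript
]

client.retain_batch(
    bank_id="my-bank",
    items=items,
    document_id="chat_2024_03_15"
)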

Store from Files

# Single file
hindsight memory put-files my-bank document.txt

# Multiple files
hindsight memory put-files my-bank doc1.txt doc2.md notes.txt

# With document ID
hindsight memory put-files my-bank report.pdf --document-id "q4-report"

Async Ingestion

For large batches, use async ingestion to avoid blocking:

# Start async ingestion (returns immediately)
result = client.retain_batch(
    bank_id="my-bank",
    items=[...large batch...],
    document_id="large-doc",
    retain_async=True
)

# Check if it was processed asynchronously
print(result.var_async) # True
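
For very large ingestion jobs, you can also split the items into smaller chunks and submit each one asynchronously. A rough sketch (the chunk size, generated items, and document IDs are illustrative):

# Illustrative: submit 1,000 items in chunks of 100, each as its own async batch.
all_items = [{"content": f"note {i}", "context": "notes"} for i in range(1000)]

for start in range(0, len(all_items), 100):
    chunk = all_items[start:start + 100]
    client.retain_batch(
        bank_id="my-bank",
        items=chunk,
        document_id=f"large-doc-part-{start // 100}",
        retain_async=True
    )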