Documents

Track and manage document sources in your memory bank. Documents provide traceability — knowing where memories came from.

Prerequisites

Make sure you've completed the Quick Start and understand how retain works.

What Are Documents?

Documents are containers for retained content. They help you:

  • Track sources — Know which PDF, conversation, or file a memory came from
  • Update content — Re-retain a document to update its facts
  • Delete in bulk — Remove all memories from a document at once
  • Organize memories — Group related facts by source

Chunks

When you retain content, Hindsight splits it into chunks before extracting facts. These chunks are stored alongside the extracted memories, preserving the original text segments.

Why chunks matter:

  • Context preservation — Chunks contain the raw text that generated facts, useful when you need the exact wording
  • Richer recall — Including chunks in recall provides surrounding context for matched facts

Include Chunks in Recall

Use include_chunks=True in your recall calls to get the original text chunks alongside fact results. See Recall for details.
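
To make the idea of chunks concrete, here is a minimal sketch of a paragraph-based splitter. It is an illustration only: the function name `split_into_chunks`, the `max_chars` parameter, and the greedy strategy are assumptions, and the source does not describe how Hindsight actually chunks content.

```python
def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Greedy paragraph-based splitter (hypothetical, not Hindsight's algorithm)."""
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Try to append the paragraph to the chunk being built.
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The chunk is full: store it and start a new one.
            # A single paragraph longer than max_chars becomes its own chunk.
            if current:
                chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


notes = (
    "Alice presented the Q4 roadmap.\n\n"
    "Bob raised concerns about the launch date.\n\n"
    "The team agreed to revisit the budget next week."
)
chunks = split_into_chunks(notes, max_chars=80)
for index, chunk in enumerate(chunks):
    print(index, repr(chunk))
```

Because each chunk keeps the raw wording, joining the chunks back together reproduces the input, which is the property that makes chunks useful when you need the exact source text behind a fact.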

Retain with Document ID

Associate retained content with a document:

# Retain with document ID
client.retain(
    bank_id="my-bank",
    content="Alice presented the Q4 roadmap...",
    document_id="meeting-2024-03-15"
)

# Batch retain for a document with different sections
client.retain_batch(
    bank_id="my-bank",
    items=[
        {"content": "Item 1: Product launch delayed to Q2", "document_id": "meeting-2024-03-15-section-1"},
        {"content": "Item 2: New hiring targets announced", "document_id": "meeting-2024-03-15-section-2"},
        {"content": "Item 3: Budget approved for ML team", "document_id": "meeting-2024-03-15-section-3"}
    ]
)

Update Documents

Re-retaining with the same document_id replaces the old content:

# Original
client.retain(
    bank_id="my-bank",
    content="Project deadline: March 31",
    document_id="project-plan"
)

# Update (deletes old facts, creates new ones)
client.retain(
    bank_id="my-bank",
    content="Project deadline: April 15 (extended)",
    document_id="project-plan"
)
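
The replace-on-re-retain behaviour can be pictured with a toy in-memory model. This is a sketch under stated assumptions: `InMemoryBank` is a hypothetical class, and fact extraction is faked as one fact per non-empty line. It says nothing about how Hindsight stores or extracts memories internally.

```python
class InMemoryBank:
    """Toy model of document-scoped memories (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._facts_by_document: dict[str, list[str]] = {}

    def retain(self, content: str, document_id: str) -> None:
        # Assigning to the key drops any facts previously stored for this
        # document_id, so re-retaining replaces rather than appends.
        # Fake extraction: one fact per non-empty line.
        self._facts_by_document[document_id] = [
            line.strip() for line in content.splitlines() if line.strip()
        ]

    def delete_document(self, document_id: str) -> int:
        # Remove every fact from the document at once; return how many went.
        return len(self._facts_by_document.pop(document_id, []))

    def all_facts(self) -> list[str]:
        return [fact for facts in self._facts_by_document.values() for fact in facts]


bank = InMemoryBank()
bank.retain("Project deadline: March 31", document_id="project-plan")
bank.retain("Project deadline: April 15 (extended)", document_id="project-plan")
print(bank.all_facts())
```

After the second call only the April 15 fact remains, since both calls used the same `document_id`.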

Get Document

Retrieve a document's original text and metadata. This is useful for expanding document context after a recall operation returns memories with document references.

import asyncio

from hindsight_client_api import ApiClient, Configuration
from hindsight_client_api.api import DocumentsApi

async def get_document_example():
    config = Configuration(host="http://localhost:8888")
    api_client = ApiClient(config)
    api = DocumentsApi(api_client)

    # Get document to expand context from recall results
    doc = await api.get_document(
        bank_id="my-bank",
        document_id="meeting-2024-03-15"
    )

    print(f"Document: {doc.id}")
    print(f"Original text: {doc.original_text}")
    print(f"Memory count: {doc.memory_unit_count}")
    print(f"Created: {doc.created_at}")

asyncio.run(get_document_example())

Update Document

Update mutable fields on an existing document without re-processing the content. Currently supports updating tags.

import asyncio

from hindsight_client_api import ApiClient, Configuration
from hindsight_client_api.api import DocumentsApi
from hindsight_client_api.models import UpdateDocumentRequest

async def update_document_example():
    config = Configuration(host="http://localhost:8888")
    api_client = ApiClient(config)
    api = DocumentsApi(api_client)

    # Fix tags on a document retained with the wrong scope
    result = await api.update_document(
        bank_id="my-bank",
        document_id="meeting-2024-03-15",
        update_document_request=UpdateDocumentRequest(tags=["team-a", "team-b"]),
    )
    print(f"Updated: {result.success}")

    # Remove all tags (make document visible everywhere)
    await api.update_document(
        bank_id="my-bank",
        document_id="meeting-2024-03-15",
        update_document_request=UpdateDocumentRequest(tags=[]),
    )

asyncio.run(update_document_example())

Observations are re-consolidated

When tags change, any consolidated observations derived from the document's memories are invalidated and queued for re-consolidation under the new tags. Co-source memories from other documents that shared those observations are also reset.

Delete Document

Remove a document and all its associated memories:

import asyncio

from hindsight_client_api import ApiClient, Configuration
from hindsight_client_api.api import DocumentsApi

async def delete_document_example():
    config = Configuration(host="http://localhost:8888")
    api_client = ApiClient(config)
    api = DocumentsApi(api_client)

    # Delete document and all its memories
    result = await api.delete_document(
        bank_id="my-bank",
        document_id="meeting-2024-03-15"
    )

    print(f"Deleted {result.memory_units_deleted} memories")

asyncio.run(delete_document_example())

Warning

Deleting a document permanently removes all memories extracted from it. This action cannot be undone.

List Documents

List documents in a bank with optional filtering by ID and tags.

import asyncio

from hindsight_client_api import ApiClient, Configuration
from hindsight_client_api.api import DocumentsApi

async def list_documents_example():
    config = Configuration(host="http://localhost:8888")
    api_client = ApiClient(config)
    api = DocumentsApi(api_client)

    # List all documents
    result = await api.list_documents(bank_id="my-bank")
    print(f"Total documents: {result.total}")

    # Filter by document ID substring
    result = await api.list_documents(bank_id="my-bank", q="report")

    # Filter by tags: only docs tagged with "team-a" (untagged excluded)
    result = await api.list_documents(
        bank_id="my-bank",
        tags=["team-a"],
        tags_match="any_strict",
    )

    # Combine ID search and tags
    result = await api.list_documents(
        bank_id="my-bank",
        q="meeting",
        tags=["team-a", "team-b"],
        tags_match="all_strict",  # must have both tags
    )

    # Paginate: fetch 20 items starting at offset 40 (the third page of 20)
    result = await api.list_documents(bank_id="my-bank", limit=20, offset=40)
    print(f"Page items: {len(result.items)}")

asyncio.run(list_documents_example())
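
The pagination call above fetches a single page. To walk every page, a limit/offset loop along the following lines may help. This is a generic sketch: `iter_all_items` and `fake_fetch` are hypothetical names, and the fake fetcher stands in for a wrapper around `list_documents` that returns `result.items`. It is written synchronously for brevity, so the async client would need an async variant of the same loop.

```python
from typing import Callable, Iterator


def iter_all_items(
    fetch_page: Callable[[int, int], list],
    page_size: int = 100,
) -> Iterator:
    """Yield items from every page of a limit/offset API (illustrative helper)."""
    offset = 0
    while True:
        items = fetch_page(page_size, offset)
        if not items:
            return
        yield from items
        # A short page means there is nothing left to fetch.
        if len(items) < page_size:
            return
        offset += page_size


# Stand-in for the real API: 45 fake document IDs served by slicing.
fake_ids = [f"doc-{n}" for n in range(45)]


def fake_fetch(limit: int, offset: int) -> list:
    return fake_ids[offset:offset + limit]


collected = list(iter_all_items(fake_fetch, page_size=20))
print(len(collected))
```

Stopping on a short page saves one request compared with looping until an empty page comes back.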

Filtering Options

| Parameter | Description |
| --- | --- |
| q | Case-insensitive substring match on document ID. report matches report-2024, annual-report, etc. |
| tags | Filter by document tags. Accepts multiple values. |
| tags_match | How to match tags (default: any_strict). See below. |
| limit / offset | Pagination. Default limit is 100. |

tags_match modes:

| Mode | Behaviour |
| --- | --- |
| any_strict (default) | Document must have at least one of the specified tags. Untagged docs excluded. |
| any | Same as any_strict but also includes untagged documents. |
| all_strict | Document must have all specified tags. Untagged docs excluded. |
| all | Same as all_strict but also includes untagged documents. |
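
The four modes can be restated as a small predicate. The sketch below mirrors the table only and is not the server's implementation. `document_matches_tags` is a hypothetical name, and its behaviour for an empty filter list is an assumption, since the table does not cover that case.

```python
VALID_MODES = {"any_strict", "any", "all_strict", "all"}


def document_matches_tags(
    document_tags: list[str],
    filter_tags: list[str],
    mode: str = "any_strict",
) -> bool:
    """Return True if a document passes the tag filter (illustrative only)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown tags_match mode: {mode!r}")

    doc = set(document_tags)
    wanted = set(filter_tags)

    if not doc:
        # Untagged documents pass only in the non-strict modes.
        return mode in {"any", "all"}

    if mode in {"any_strict", "any"}:
        # At least one of the requested tags is present.
        return bool(doc & wanted)

    # all_strict / all: every requested tag is present.
    return wanted <= doc


print(document_matches_tags(["team-a"], ["team-a", "team-b"], "any_strict"))
print(document_matches_tags(["team-a"], ["team-a", "team-b"], "all_strict"))
print(document_matches_tags([], ["team-a"], "any"))
```

The strict and non-strict variants differ only in how they treat untagged documents. For tagged documents, any and any_strict give the same answer, as do all and all_strict.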

Document Response Format

{
"id": "meeting-2024-03-15",
"bank_id": "my-bank",
"original_text": "Alice presented the Q4 roadmap...",
"content_hash": "abc123def456",
"memory_unit_count": 12,
"created_at": "2024-03-15T14:00:00Z",
"updated_at": "2024-03-15T14:00:00Z"
}
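
If you work with the raw JSON rather than the generated client models, a typed wrapper such as the one below can make the fields easier to handle. This is a sketch: `DocumentRecord`, `parse_timestamp`, and `parse_document` are hypothetical names, and the field list is taken only from the sample response above.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DocumentRecord:
    """Typed view of the sample document response (hypothetical helper)."""
    id: str
    bank_id: str
    original_text: str
    content_hash: str
    memory_unit_count: int
    created_at: datetime
    updated_at: datetime


def parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_document(payload: str) -> DocumentRecord:
    data = json.loads(payload)
    return DocumentRecord(
        id=data["id"],
        bank_id=data["bank_id"],
        original_text=data["original_text"],
        content_hash=data["content_hash"],
        memory_unit_count=int(data["memory_unit_count"]),
        created_at=parse_timestamp(data["created_at"]),
        updated_at=parse_timestamp(data["updated_at"]),
    )


sample = """
{
  "id": "meeting-2024-03-15",
  "bank_id": "my-bank",
  "original_text": "Alice presented the Q4 roadmap...",
  "content_hash": "abc123def456",
  "memory_unit_count": 12,
  "created_at": "2024-03-15T14:00:00Z",
  "updated_at": "2024-03-15T14:00:00Z"
}
"""
doc = parse_document(sample)
print(doc.id, doc.memory_unit_count, doc.created_at.isoformat())
```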

Next Steps