LlamaIndex

Persistent long-term memory for LlamaIndex agents via Hindsight. The hindsight-llamaindex package provides two complementary patterns:

  • HindsightToolSpec — Agent-driven memory tools (retain/recall/reflect)
  • HindsightMemory — Automatic memory via LlamaIndex's BaseMemory interface

Installation

pip install hindsight-llamaindex

Automatic Memory (BaseMemory)

The simplest way to add Hindsight memory to a LlamaIndex agent. Messages are automatically stored on each turn, and relevant memories are recalled and injected as context.

import asyncio
from hindsight_client import Hindsight
from hindsight_llamaindex import HindsightMemory
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

async def main():
    client = Hindsight(base_url="http://localhost:8888")

    memory = HindsightMemory.from_client(
        client=client,
        bank_id="user-123",
        mission="Track user preferences and project context",
    )

    agent = ReActAgent(tools=[], llm=OpenAI(model="gpt-4o"))
    response = await agent.run("Remember that I prefer dark mode", memory=memory)
    print(response)

asyncio.run(main())

How It Works

| Event | What Happens |
| --- | --- |
| Agent receives input | aget(input) recalls relevant memories from Hindsight and prepends them as a system message |
| Agent produces output | aput(message) retains the message to Hindsight for future recall |
| New session starts | Previous memories are available via recall; the local chat buffer starts empty |
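The cycle above can be sketched with a stand-in store (FakeStore is illustrative only; the real HindsightMemory talks to the Hindsight service and uses its retrieval, not keyword overlap):

```python
# Illustrative sketch of the per-turn cycle; FakeStore stands in for Hindsight.
class FakeStore:
    def __init__(self):
        self.facts = []

    def recall(self, query):
        # Toy retrieval: return stored facts sharing a word with the query.
        words = set(query.lower().split())
        return [f for f in self.facts if words & set(f.lower().split())]

    def retain(self, message):
        self.facts.append(message)

store = FakeStore()

# Turn 1: after the agent's output, aput-style retain stores the message.
store.retain("user prefers dark mode")

# Later session: before the agent sees the input, aget-style recall runs
# and the hits are prepended to the chat as a system message.
recalled = store.recall("which mode does the user prefer")
system_msg = "Relevant memories:\n" + "\n".join(recalled)
print(system_msg)
```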

HindsightMemory.from_client()

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| client | Hindsight | required | Hindsight client instance |
| bank_id | str | required | Memory bank ID |
| mission | str | None | Bank mission; auto-creates the bank on first use |
| context | str | "llamaindex" | Source label for retain operations |
| budget | str | "mid" | Recall budget level |
| max_tokens | int | 4096 | Max recall tokens |
| tags | list[str] | None | Tags for retain operations |
| recall_tags | list[str] | None | Tags to filter recall |
| recall_tags_match | str | "any" | Tag matching mode |
| system_prompt | str | (built-in) | Template for the memory system message; must contain {memories} |
| chat_history_limit | int | 100 | Max messages in the local buffer |

Also available: HindsightMemory.from_url(hindsight_api_url, bank_id, ...) constructs a memory without a pre-built client.
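A custom system_prompt replaces the built-in template for the injected system message; the only requirement stated above is that it contain a {memories} placeholder. A minimal sketch (the wording here is illustrative, not the built-in default):

```python
# Custom template for the memory system message; {memories} is required
# and is filled with the recalled text on each turn.
MEMORY_PROMPT = (
    "You have long-term memory. Facts recalled for this conversation:\n"
    "{memories}\n"
    "Use them when relevant; do not invent memories."
)

# HindsightMemory formats the template roughly like this (illustrative):
rendered = MEMORY_PROMPT.format(memories="- User prefers dark mode")
print(rendered)
```

Pass it as system_prompt=MEMORY_PROMPT to HindsightMemory.from_client(...).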


Agent-Driven Tools (BaseToolSpec)

For explicit control, expose retain/recall/reflect as tools the agent can choose to call.

Quick Start: Tool Spec

import asyncio
from hindsight_client import Hindsight
from hindsight_llamaindex import HindsightToolSpec
from llama_index.llms.openai import OpenAI
from llama_index.core.agent import ReActAgent

async def main():
    client = Hindsight(base_url="http://localhost:8888")

    spec = HindsightToolSpec(
        client=client,
        bank_id="user-123",
        mission="Track user preferences",
    )
    tools = spec.to_tool_list()

    agent = ReActAgent(tools=tools, llm=OpenAI(model="gpt-4o"))
    response = await agent.run("Remember that I prefer dark mode")
    print(response)

asyncio.run(main())

Quick Start: Factory Function

from hindsight_llamaindex import create_hindsight_tools

tools = create_hindsight_tools(
    client=client,
    bank_id="user-123",
    mission="Track user preferences",
)

Selecting Tools

# Via to_tool_list()
tools = spec.to_tool_list(spec_functions=["recall_memory", "reflect_on_memory"])

# Via factory flags
tools = create_hindsight_tools(
    client=client,
    bank_id="user-123",
    include_retain=True,
    include_recall=True,
    include_reflect=False,
)

Configuration

Set defaults once via configure(); any of them can be overridden per call:

from hindsight_llamaindex import configure

configure(
    hindsight_api_url="http://localhost:8888",
    api_key="your-api-key",  # or set the HINDSIGHT_API_KEY env var
    budget="mid",
    tags=["source:llamaindex"],
    context="my-app",
    mission="Track user preferences",
)

# Now create tools without passing client/url
tools = create_hindsight_tools(bank_id="user-123")

HindsightToolSpec()

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| bank_id | str | required | Hindsight memory bank to operate on |
| client | Hindsight | None | Pre-configured Hindsight client |
| hindsight_api_url | str | None | API URL (used if no client provided) |
| api_key | str | None | API key (used if no client provided) |
| budget | str | "mid" | Recall/reflect budget: low, mid, high |
| max_tokens | int | 4096 | Max tokens for recall results |
| tags | list[str] | None | Tags applied when storing memories |
| recall_tags | list[str] | None | Tags to filter recall results |
| recall_tags_match | str | "any" | Tag matching: any, all, any_strict, all_strict |
| retain_metadata | dict[str, str] | None | Default metadata for retain operations |
| retain_document_id | str | None | Document ID for retain; auto-generates {session}-{timestamp} if not set |
| retain_context | str | "llamaindex" | Source label for retain operations |
| recall_types | list[str] | None | Fact types: world, experience, opinion, observation |
| recall_include_entities | bool | False | Include entity info in recall results |
| reflect_context | str | None | Additional context for reflect |
| reflect_max_tokens | int | None | Max tokens for reflect (defaults to max_tokens) |
| reflect_response_schema | dict | None | JSON schema to constrain reflect output |
| reflect_tags | list[str] | None | Tags for reflect (defaults to recall_tags) |
| reflect_tags_match | str | None | Tag matching for reflect (defaults to recall_tags_match) |
| mission | str | None | Bank mission; auto-creates the bank on first use |

Production Patterns

Bank Mission

Set a mission to give the memory engine context for fact extraction:

# Tools
spec = HindsightToolSpec(
    client=client,
    bank_id="user-123",
    mission="Track user coding preferences, project context, and technical decisions",
)

# Memory
memory = HindsightMemory.from_client(
    client=client,
    bank_id="user-123",
    mission="Track user coding preferences, project context, and technical decisions",
)

The bank is created automatically on first use. If it already exists, creation is silently skipped.

Memory Scoping with Tags

spec = HindsightToolSpec(
    client=client,
    bank_id="user-123",
    tags=["source:chat", "session:abc"],  # applied to all retains
    recall_tags=["source:chat"],  # filter recalls to chat memories
    recall_tags_match="any",
)
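The matching modes can be read as set operations; a sketch of the assumed semantics for "any" and "all" (the actual filtering happens server-side, and the strict variants are not modeled here):

```python
# Assumed tag-matching semantics: "any" keeps a memory sharing at least one
# tag with recall_tags; "all" requires every recall tag to be present.
def matches(memory_tags, recall_tags, mode="any"):
    memory_tags, recall_tags = set(memory_tags), set(recall_tags)
    if mode == "any":
        return bool(memory_tags & recall_tags)
    if mode == "all":
        return recall_tags <= memory_tags
    raise ValueError(f"unsupported mode: {mode}")

stored = ["source:chat", "session:abc"]
hit = matches(stored, ["source:chat"], "any")            # shares one tag
miss = matches(stored, ["source:chat", "user:42"], "all")  # user:42 absent
print(hit, miss)
```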

Error Handling

Both patterns handle errors gracefully: failed operations are logged and return friendly messages instead of raising exceptions, so agents continue functioning even if memory is unavailable.
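The same pattern is easy to reproduce around your own memory calls; a minimal sketch (safe_call and the failing recall stub are hypothetical, not part of the package):

```python
import logging

logger = logging.getLogger("memory")

def safe_call(fn, *args, fallback="Memory is temporarily unavailable.", **kwargs):
    """Run a memory operation; log any failure and return a friendly message."""
    try:
        return fn(*args, **kwargs)
    except Exception:
        logger.exception("memory operation failed")
        return fallback

# Simulate an unreachable memory service.
def recall(query):
    raise ConnectionError("service down")

result = safe_call(recall, "user preferences")
print(result)  # → Memory is temporarily unavailable.
```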

Combining Tools + Memory

Use both patterns together for maximum flexibility:

from hindsight_llamaindex import create_hindsight_tools, HindsightMemory

# Automatic memory for context enrichment
memory = HindsightMemory.from_client(client=client, bank_id="user-123")

# Explicit tools for agent-driven reflect
tools = create_hindsight_tools(
    client=client,
    bank_id="user-123",
    include_retain=False,  # memory handles retain automatically
    include_recall=False,  # memory handles recall automatically
    include_reflect=True,  # agent can still explicitly reflect
)

agent = ReActAgent(tools=tools, llm=llm)

# Pass memory to run()
response = await agent.run("What should I prioritize?", memory=memory)

Requirements

  • Python 3.10+
  • llama-index-core >= 0.11.0
  • hindsight-client >= 0.4.0