
What's new in Hindsight 0.4.21

7 min read
Nicolò Boschi
Hindsight Team

Hindsight 0.4.21 adds three new framework integrations (LlamaIndex, AG2, Strands), a Codex CLI integration, delta retain to skip unchanged content, a LiteLLM provider for 100+ LLM backends, native Windows support, audit logging, and a batch of reliability fixes across the board.

LlamaIndex Integration

hindsight-llamaindex adds persistent memory to LlamaIndex agents with two complementary patterns in a single package.

pip install hindsight-llamaindex

Agent-driven tools — expose retain/recall/reflect as tools the agent decides when to call:

from hindsight_client import Hindsight
from hindsight_llamaindex import HindsightToolSpec
# Import paths for ReActAgent/OpenAI may vary with your LlamaIndex version
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

client = Hindsight(base_url="http://localhost:8888")
spec = HindsightToolSpec(
    client=client,
    bank_id="user-123",
    mission="Track user preferences",
)
tools = spec.to_tool_list()

agent = ReActAgent(tools=tools, llm=OpenAI(model="gpt-4o"))

Automatic memory — messages are stored on every turn and recalled as context automatically:

from hindsight_llamaindex import HindsightMemory

memory = HindsightMemory.from_client(
    client=client,
    bank_id="user-123",
    mission="Track user preferences and project context",
)

agent = ReActAgent(tools=tools, llm=llm, memory=memory)

Both patterns support bank auto-creation via mission=, tag-based memory scoping, and both sync and async agents. See the LlamaIndex integration documentation for the full API reference.

AG2 Integration

hindsight-ag2 brings persistent memory to AG2 multi-agent workflows.

pip install hindsight-ag2

from autogen import AssistantAgent, UserProxyAgent, LLMConfig
from hindsight_ag2 import register_hindsight_tools

llm_config = LLMConfig(api_type="openai", model="gpt-4o-mini")

with llm_config:
    assistant = AssistantAgent(
        name="assistant",
        system_message="You are a helpful assistant with long-term memory.",
    )
    user_proxy = UserProxyAgent(
        name="user",
        human_input_mode="NEVER",
    )

# Register Hindsight memory tools on both agents
register_hindsight_tools(
    assistant, user_proxy,
    bank_id="my-bank",
    hindsight_api_url="http://localhost:8888",
)

result = user_proxy.initiate_chat(
    assistant,
    message="Remember that I prefer Python over JavaScript.",
)

See the AG2 integration documentation for details.

Strands Integration

hindsight-strands adds retain, recall, and reflect tools to Strands Agents SDK agents.

pip install hindsight-strands

from strands import Agent
from hindsight_strands import create_hindsight_tools

tools = create_hindsight_tools(
    bank_id="user-123",
    hindsight_api_url="http://localhost:8888",
)

agent = Agent(tools=tools)
agent("Remember that I prefer dark mode")
agent("What are my preferences?")

See the Strands integration documentation for the full setup guide.

Codex CLI Integration

Hindsight now integrates with OpenAI's Codex CLI. Three Python hook scripts automatically recall relevant context before each prompt and retain conversations after each turn — no changes to your Codex workflow required.

curl -fsSL https://hindsight.vectorize.io/get-codex | bash

The installer guides you through local or cloud mode. Once installed, start a new Codex session and memory is live. See the Codex integration documentation for configuration details.

Delta Retain

Retain now supports delta mode — when upserting a document, Hindsight computes content hashes per chunk and skips LLM fact extraction for chunks that haven't changed. Only new or modified chunks go through the extraction pipeline, significantly reducing LLM costs and processing time.

This is particularly impactful for the most common use case: conversations that get updated in real time. When an integration retains the full conversation transcript on every turn (as Claude Code, Codex, and most chat integrations do), delta retain means only the new messages trigger fact extraction — previous turns are skipped entirely. The same applies to documents that change incrementally, like codebase files or evolving notes.

Delta mode activates automatically when you retain with a document_id that already exists. To force full reingestion of a document, delete it first and retain again.
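
The mechanism can be sketched in a few lines. This is an illustration of the per-chunk hashing idea only, not Hindsight's actual implementation:

```python
import hashlib

def chunks_to_process(chunks: list[str], stored_hashes: set[str]) -> list[str]:
    """Return only the chunks whose content hash is not already stored."""
    changed = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest not in stored_hashes:
            changed.append(chunk)
    return changed

# First retain: every chunk is new, so all of them go through fact extraction.
v1 = ["User prefers Python.", "User works on project Alpha."]
hashes = {hashlib.sha256(c.encode("utf-8")).hexdigest() for c in v1}

# Second retain of the updated document: only the new chunk is processed.
v2 = ["User prefers Python.", "User works on project Alpha.", "User likes dark mode."]
print(chunks_to_process(v2, hashes))  # only the third chunk
```

On a growing conversation transcript, this means the extraction cost of each turn is proportional to the new messages, not the full history.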

LiteLLM Provider

A new litellm LLM provider gives Hindsight access to 100+ LLM backends through LiteLLM, including AWS Bedrock, Azure OpenAI, Cohere, Together AI, and many more.

# Azure OpenAI via LiteLLM
export HINDSIGHT_API_LLM_PROVIDER=litellm
export HINDSIGHT_API_LLM_API_KEY=your-azure-api-key
export HINDSIGHT_API_LLM_MODEL=azure/gpt-4o

# Together AI via LiteLLM
export HINDSIGHT_API_LLM_PROVIDER=litellm
export HINDSIGHT_API_LLM_API_KEY=your-together-api-key
export HINDSIGHT_API_LLM_MODEL=together_ai/meta-llama/Llama-3-70b-chat-hf

This complements the existing native providers (OpenAI, Anthropic, Gemini, Groq, etc.) for cases where you need a backend that isn't directly supported. LiteLLM is also available for embeddings and reranking. Also new in this release: Ark and Volcano Engine providers for ByteDance's Doubao models.

Native Windows Support

Hindsight now runs natively on Windows without Docker. Install via pip and start the server directly:

pip install hindsight-all
hindsight-api

This uses the embedded PostgreSQL (pg0) and local models, same as on macOS and Linux. See the Windows installation guide for details.

Audit Logging

A new audit log tracks feature usage across your Hindsight deployment. Every retain, recall, and reflect operation is logged with request duration, bank ID, and operation metadata. Query audit logs via the API to understand usage patterns, identify slow operations, and track adoption across teams.
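
As a sketch of the kind of analysis this enables, here is how you might pick out slow operations from audit records. The field names below are an assumption based on the description above; the actual API response shape may differ:

```python
# Hypothetical audit records; field names (operation, bank_id, duration_ms)
# are assumptions for illustration, not the documented response schema.
records = [
    {"operation": "retain", "bank_id": "user-123", "duration_ms": 140},
    {"operation": "recall", "bank_id": "user-123", "duration_ms": 2300},
    {"operation": "reflect", "bank_id": "team-42", "duration_ms": 95},
]

def slow_operations(records: list[dict], threshold_ms: int = 1000) -> list[dict]:
    """Filter for operations that took longer than the threshold."""
    return [r for r in records if r["duration_ms"] > threshold_ms]

for r in slow_operations(records):
    print(r["operation"], r["bank_id"], r["duration_ms"])
```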

MCP Improvements

Three additions to the MCP server:

  • Per-user tool filtering — the new filter_mcp_tools hook lets extensions control which MCP tools are visible to each user. Useful for multi-tenant deployments where different users should see different capabilities.
  • Retain strategy selection — the MCP retain tool now accepts a strategy parameter so clients can choose the retain strategy (e.g., verbose, fast) per call.
  • Stateless HTTP mode — the MCP server can now be configured for stateless HTTP operation, improving compatibility with Claude Code and other clients that probe the server with GET requests.
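
A per-user filter hook might look like the following. The exact signature Hindsight expects for filter_mcp_tools is an assumption here; treat this as an illustration of the idea:

```python
# Sketch of a per-user MCP tool filter; the real hook signature may differ.
ALL_TOOLS = ["retain", "recall", "reflect"]
READ_ONLY_USERS = {"guest"}

def filter_mcp_tools(user_id: str, tools: list[str]) -> list[str]:
    """Hide write-capable tools from read-only users."""
    if user_id in READ_ONLY_USERS:
        return [t for t in tools if t == "recall"]
    return tools

print(filter_mcp_tools("guest", ALL_TOOLS))  # ['recall']
print(filter_mcp_tools("alice", ALL_TOOLS))  # all three tools
```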

Other Updates

Improvements

  • OpenClaw logging is now configurable and supports structured output.
  • Source fact inclusion in observation search results is now configurable.
  • Integrations no longer use hardcoded default models, relying on configured defaults instead.
  • Claude Code integration now retains full sessions with document upsert and configurable tags, and records tool calls as structured JSON.
  • Per-bank observation limits are now configurable via max_observations_per_scope.

Bug Fixes

  • Per-bank vector index creation now respects the configured vector extension setting.
  • Verbose retain extraction now correctly includes the retain mission context.
  • Codex integration no longer crashes on startup when the API quota is exhausted (HTTP 429).
  • OpenAI embeddings client now correctly parses query parameters included in base_url.
  • Fixed tool_choice handling for Codex and Claude Code when forcing specific tool calls.
  • Control plane UI fixes for recall and data viewing.
  • Recall responses now include associated metadata.
  • Python client update_bank_config() now exposes all configurable fields.
  • JSON-string tags are now coerced to lists for MemoryItem and MCP tools.
  • Docker containers now handle graceful shutdown properly to prevent pg0 data loss on restart.
  • Migration runner can now bypass PgBouncer for advisory locks via MIGRATION_DATABASE_URL.

Feedback and Community

Hindsight 0.4.21 is a drop-in replacement for 0.4.x with no breaking changes.

Share your feedback with the Hindsight team; we'd love to hear how this release works for you.

For detailed changes, see the full changelog.