What's new in Hindsight 0.4.20

Nicolò Boschi · Hindsight Team · 7 min read

Hindsight 0.4.20 adds Claude Code and LangGraph integrations, a NemoClaw setup CLI, independent versioning for integration packages, and reflect improvements including fact type filters, mental model exclusion, and a wall-clock timeout—plus a batch of reliability fixes.

Claude Code Integration

hindsight-memory is a new plugin that gives Claude Code persistent long-term memory. It works in interactive sessions and through Claude Code Channels (Telegram, Discord, Slack).

Install from the marketplace:

# Add the Hindsight marketplace and install the plugin
claude plugin marketplace add vectorize-io/hindsight
claude plugin install hindsight-memory

# Configure your LLM provider for memory extraction
# Option A: OpenAI (auto-detected)
export OPENAI_API_KEY="sk-your-key"

# Option B: Anthropic (auto-detected)
export ANTHROPIC_API_KEY="your-key"

# Option C: No API key needed (uses Claude Code's own model)
export HINDSIGHT_LLM_PROVIDER=claude-code

# Start Claude Code — the plugin activates automatically
claude

The plugin uses four Claude Code hooks to manage memory automatically:

  • SessionStart — verifies Hindsight is reachable and optionally starts a local daemon.
  • UserPromptSubmit — auto-recalls relevant memories and injects them as context.
  • Stop — auto-retains the conversation transcript to long-term memory.
  • SessionEnd — cleanup and optional daemon shutdown.
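As a rough mental model of the UserPromptSubmit step, the hook recalls memories relevant to the incoming prompt and prepends them as context. The sketch below is illustrative only — the function name, formatting, and field handling are assumptions, not the plugin's actual internals:

```python
def inject_memories(prompt: str, recalled: list[str]) -> str:
    """Prepend recalled memories to the user prompt as extra context.

    Hypothetical sketch of what a UserPromptSubmit hook might do; the
    real plugin's formatting may differ.
    """
    if not recalled:
        return prompt  # nothing relevant found; pass the prompt through
    context = "\n".join(f"- {m}" for m in recalled)
    return f"Relevant memories:\n{context}\n\n{prompt}"

augmented = inject_memories(
    "What editor settings do I use?",
    ["User prefers dark mode", "User's editor is Neovim"],
)
```

The Stop hook runs the inverse direction: the finished transcript goes back through memory extraction for future recall.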

Three connection modes are supported:

  1. External API — point hindsightApiUrl and hindsightApiToken at a remote Hindsight server.
  2. Local daemon — leave hindsightApiUrl empty and the plugin auto-starts a local instance via uvx hindsight-embed@latest. Just provide an LLM API key.
  3. Existing local server — set apiPort to match a server you already run.
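The selection between the three modes reduces to a precedence check over the settings named above. A sketch, not the plugin's actual code — the function and return values are hypothetical, only the setting names come from the documentation:

```python
def connection_mode(settings: dict) -> str:
    """Pick a connection mode from plugin settings (illustrative only)."""
    if settings.get("hindsightApiUrl"):
        return "external"        # remote Hindsight server, with hindsightApiToken
    if settings.get("apiPort"):
        return "existing-local"  # attach to a server you already run
    return "daemon"              # auto-start via uvx hindsight-embed

mode = connection_mode({})  # no URL, no port: auto-start the local daemon
```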

Dynamic bank IDs let you isolate memories per agent, project, session, channel, or user—controlled by the dynamicBankId and dynamicBankGranularity settings. The plugin ships as pure Python with zero external dependencies.
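To make the granularity idea concrete, a dynamic bank ID is essentially a name composed from the configured granularity and the matching field of the current context. This is a toy sketch — the naming scheme and context fields are assumptions, not the plugin's actual format:

```python
def resolve_bank_id(granularity: str, ctx: dict) -> str:
    """Compose a memory bank ID from the configured granularity.

    Illustrative only: the real plugin's dynamicBankGranularity values
    and naming scheme may differ.
    """
    key = ctx.get(granularity)
    if key is None:
        raise ValueError(f"context has no {granularity!r} field")
    return f"{ctx['agent']}-{granularity}-{key}"

bank = resolve_bank_id("session", {"agent": "claude", "session": "abc123"})
# -> "claude-session-abc123"
```

Per-session banks keep conversations isolated; per-user or per-project banks let memories accumulate across sessions.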

See the Claude Code integration documentation for the full configuration reference, and the Claude Code + Telegram + Hindsight blog post for a walkthrough of using it with Claude Code Channels.

LangGraph Integration

hindsight-langgraph adds Hindsight memory to LangGraph agents with three integration patterns.

pip install hindsight-langgraph

Tools — works with both LangChain and LangGraph:

from hindsight_client import Hindsight
from hindsight_langgraph import create_hindsight_tools
from langchain_openai import ChatOpenAI

client = Hindsight(base_url="http://localhost:8888")
tools = create_hindsight_tools(client=client, bank_id="user-123")

model = ChatOpenAI(model="gpt-4o").bind_tools(tools)

Memory nodes — dedicated recall and retain nodes for LangGraph graphs:

from hindsight_client import Hindsight
from hindsight_langgraph import create_recall_node, create_retain_node

client = Hindsight(base_url="http://localhost:8888")

recall = create_recall_node(client=client, bank_id="user-123")
retain = create_retain_node(client=client, bank_id="user-123")

builder.add_node("recall", recall)
builder.add_node("retain", retain)
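Conceptually, a recall node is just a state-transforming callable: it reads the latest message from graph state, queries the memory bank, and returns an update the graph merges back in. The stub below illustrates that shape with a fake client — the factory name, state keys, and return schema are assumptions, not the package's actual API:

```python
def make_recall_node(recall_fn, bank_id: str):
    """Build a LangGraph-style node that enriches state with memories.

    Sketch only: `recall_fn` stands in for a Hindsight client call.
    """
    def recall_node(state: dict) -> dict:
        query = state["messages"][-1]        # latest user message
        memories = recall_fn(bank_id, query) # hits the memory bank
        return {"memories": memories}        # merged into state by the graph
    return recall_node

# A fake client call standing in for Hindsight recall:
fake_recall = lambda bank, q: [f"[{bank}] remembered something about {q!r}"]
node = make_recall_node(fake_recall, "user-123")
update = node({"messages": ["favorite color?"]})
```

A retain node is the mirror image: it takes the conversation so far from state and writes it to the bank, typically as the last node before the graph ends.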

BaseStore — LangGraph-native checkpoint memory backed by Hindsight:

from hindsight_client import Hindsight
from hindsight_langgraph import HindsightStore

client = Hindsight(base_url="http://localhost:8888")
store = HindsightStore(client=client)

graph = builder.compile(checkpointer=checkpointer, store=store)

await store.aput(("user", "123", "prefs"), "theme", {"value": "dark mode"})
results = await store.asearch(("user", "123", "prefs"), query="theme preference")
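The store follows LangGraph's namespace-tuple-plus-key model. As a mental model only, the dict-backed toy below mimics the aput/asearch shape — HindsightStore actually persists to Hindsight and performs semantic search, not the substring match used here:

```python
import asyncio

class ToyStore:
    """Dict-backed toy mimicking the BaseStore aput/asearch shape."""

    def __init__(self):
        self._data = {}

    async def aput(self, namespace: tuple, key: str, value: dict):
        self._data[(namespace, key)] = value

    async def asearch(self, namespace: tuple, query: str):
        # Naive substring match; the real store does semantic retrieval.
        return [
            (k, v) for (ns, k), v in self._data.items()
            if ns == namespace and query.lower() in str(v).lower()
        ]

async def demo():
    store = ToyStore()
    await store.aput(("user", "123", "prefs"), "theme", {"value": "dark mode"})
    return await store.asearch(("user", "123", "prefs"), query="dark")

results = asyncio.run(demo())
```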

Multi-tenant routing is built in—pass bank_id_from_config="user_id" to any node or tool factory, and the bank ID is resolved dynamically from the LangGraph configurable dict at runtime.

See the LangGraph integration documentation for the full API reference.

NemoClaw Integration

hindsight-nemoclaw is a new integration that automates the five-step process of adding Hindsight memory to a NemoClaw sandbox:

npx @vectorize-io/hindsight-nemoclaw setup \
  --sandbox my-assistant \
  --api-url https://api.hindsight.vectorize.io \
  --api-token <your-api-key> \
  --bank-prefix my-sandbox

The CLI handles plugin installation, configuration, network policy updates (so the sandbox can reach the Hindsight API), and gateway restart. Use --dry-run to preview changes before applying.

See the NemoClaw integration documentation for the full setup guide.

Independent Integration Versioning

Integration packages—LiteLLM, Pydantic AI, CrewAI, AI SDK, Chat, OpenClaw, LangGraph, and NemoClaw—now have their own version numbers and release lifecycle, decoupled from the core Hindsight server.

Previously, every integration was bumped and published together with each core release, even when nothing changed. This made it hard to tell whether a new version of an integration contained actual changes or was just a no-op bump.

Starting with 0.4.20, each integration is versioned and released independently. What this means for you:

  • Version numbers reflect real changes: when you see a new version of hindsight-langgraph or hindsight-litellm on PyPI/npm, it contains actual updates to that package.
  • Faster integration fixes: a bug fix or feature in one integration ships as soon as it's ready, without waiting for the next core release.
  • Per-integration changelogs: each integration now has its own changelog page at /changelog/integrations/<name> so you can track what changed in the packages you use.

Reflect Improvements

Three additions make reflect more controllable and predictable.

Fact type filters — fact_types restricts which fact types the reflection agent retrieves. Pass a subset of ["world", "experience", "observation"] to limit the search scope and save LLM tokens:

{
"query": "What does the user prefer?",
"fact_types": ["experience", "observation"]
}

Mental model exclusion — exclude_mental_models skips mental model search entirely, and exclude_mental_model_ids excludes specific models by ID. These filters also work in mental model triggers, so you can persist them on auto-refreshing models. The reflect agent enforces filters at the tool level—if the LLM tries to call a disabled tool, it receives an error rather than silently returning results from excluded sources.
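Tool-level enforcement means a disabled tool fails loudly instead of quietly returning results from an excluded source. Roughly, as a sketch (not the reflect agent's actual dispatcher; names are illustrative):

```python
def dispatch_tool(name: str, disabled: set[str], tools: dict):
    """Route an LLM tool call, rejecting tools the request disabled.

    Sketch of tool-level filter enforcement: the error is surfaced
    back to the LLM instead of silently searching an excluded source.
    """
    if name in disabled:
        return {"error": f"tool {name!r} is disabled for this request"}
    return {"result": tools[name]()}

tools = {"search_mental_models": lambda: ["model-a"]}
resp = dispatch_tool("search_mental_models", {"search_mental_models"}, tools)
```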

Wall-clock timeout — reflect operations now enforce a configurable timeout (default: 300 seconds). Previously, a slow LLM provider or high iteration count could cause a reflect call to hang for up to 40 minutes. When the timeout fires, the API returns HTTP 504.

HINDSIGHT_API_REFLECT_WALL_TIMEOUT=300  # seconds
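The mechanism is a wall-clock guard wrapped around the whole reflect loop. A sketch of the pattern, not the server's code — the function and status mapping here are illustrative:

```python
import asyncio

async def reflect_with_timeout(reflect_coro, timeout_s: float = 300.0):
    """Run a reflect operation under a wall-clock deadline.

    Sketch: on timeout, the cancellation maps to an HTTP 504
    response, as described above.
    """
    try:
        return 200, await asyncio.wait_for(reflect_coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        return 504, None  # gateway timeout surfaced to the caller

async def slow_reflect():
    await asyncio.sleep(10)  # stands in for a hung LLM provider
    return "answer"

status, body = asyncio.run(reflect_with_timeout(slow_reflect(), timeout_s=0.01))
```

The key property: the deadline covers the entire operation, so no combination of slow provider calls and high iteration counts can exceed it.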

Other Updates

Improvements

  • The hindsight-api package is now runnable directly via uvx hindsight-api thanks to new script entry points.
  • The default MiniMax model has been upgraded from M2.5 to M2.7.
  • The OperationValidator extension now receives richer context when validating operations.
  • OpenAI-compatible client initialization now supports passing query parameters for broader provider compatibility.

Bug Fixes

  • Fixed a memory leak in entity resolution where _pending_stats and _pending_cooccurrences dicts could grow unbounded when exceptions occurred between fact extraction and flush.
  • Fixed startup crash and silent retain failures when the PostgreSQL pg_trgm extension is unavailable. The migration now handles missing pg_trgm gracefully, and entity resolution falls back to full table scan with a warning.
  • Fixed context overflow in reflect by disabling source facts in observation search results.
  • Markdown code fences are now stripped from LLM outputs across all providers, not just local models.
  • Empty recall queries now return a clear 400 error instead of failing with a SQL parameter gap.
  • File retain requests now include authentication headers so uploads work in authenticated deployments.
  • Fixed MCP tool calls failing when MCP_AUTH_TOKEN and TENANT_API_KEY differ.
  • Fixed claude-agent-sdk installation on Linux/Docker environments.
  • LiteLLM integration now falls back to the last user message when no explicit hindsight_query is provided.
  • Fixed non-atomic async operation creation that could produce inconsistent operation records.
  • Fixed orphaned parent operations when a batch retain child fails via unhandled exception.
  • Fixed failures for non-ASCII entity names by ensuring entity IDs are set correctly.
  • Fixed facts extracted from assistant messages being stored with the wrong fact type; they are now correctly typed as "experience".

Feedback and Community

Hindsight 0.4.20 is a drop-in replacement for 0.4.x with no breaking changes.

Share your feedback with the Hindsight team.

For detailed changes, see the full changelog.