What's new in Hindsight 0.5.0

April 7, 2026 · 8 min read

Hindsight Team

Hindsight 0.5.0 introduces the Bank Template Hub for portable configuration, a Constellation graph view in the Control Plane, major retain and recall performance improvements, new providers (built-in llama.cpp, OpenRouter, and Google embeddings and reranking), retain append mode, new framework integrations (AutoGen, OpenCode), and a breaking removal of the legacy graph retrieval strategies and the Hermes integration.

Bank Template Hub: Export and import full bank configurations as reusable manifests.
Constellation View: Interactive entity graph visualization in the Control Plane.
Retain and Recall Performance: A 3-phase retain pipeline that reduces write contention, and capped entity graph expansion for faster queries.
New LLM Providers: Built-in llama.cpp for local inference, OpenRouter, and Google embeddings/reranker.
Retain Append Mode: Concatenate new content onto existing documents with update_mode='append'.
AutoGen Integration: Persistent memory for AutoGen agents.
Breaking Changes: BFS/MPFP strategies removed; Hermes integration dropped.

Bank Template Hub

Banks can now be exported as portable template manifests that capture the full configuration — settings, mental models, and directives — in a single JSON document. Import a manifest into any bank to replicate the setup instantly.

The manifest format captures everything needed to reproduce a bank's behavior:

{
  "version": "1",
  "bank": {
    "retain_mission": "Extract customer issues, resolutions, and sentiment.",
    "enable_observations": true,
    "observations_mission": "Track recurring customer pain points."
  },
  "mental_models": [
    {
      "id": "sentiment-overview",
      "name": "Customer Sentiment Overview",
      "source_query": "What is the overall sentiment trend?",
      "trigger": { "refresh_after_consolidation": true }
    }
  ],
  "directives": [
    {
      "name": "Acknowledge frustration",
      "content": "Always acknowledge frustration before offering solutions.",
      "priority": 10
    }
  ]
}

A dry-run mode (dry_run=True) validates manifests without applying changes, and a schema endpoint returns the full JSON Schema for tooling integration. Mental models are matched by id and directives by name: existing entries are updated, and new ones are created.
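The matching rules above behave like an upsert. The following is a minimal sketch of those semantics on plain dictionaries. It is not Hindsight's import code. The helper name apply_manifest is hypothetical, and the choice to overwrite bank-level keys with the manifest's values is an assumption.

```python
import copy


def apply_manifest(bank: dict, manifest: dict, dry_run: bool = False) -> dict:
    """Hypothetical sketch: merge a template manifest into a bank config.

    Mental models are upserted by "id" and directives by "name". With
    dry_run=True the input bank is untouched and only the would-be result
    is returned. Overwriting bank-level keys is an assumption.
    """
    target = copy.deepcopy(bank) if dry_run else bank

    # Bank-level settings: manifest values replace existing keys (assumption).
    target.setdefault("bank", {}).update(manifest.get("bank", {}))

    for section, key in (("mental_models", "id"), ("directives", "name")):
        existing = {item[key]: item for item in target.setdefault(section, [])}
        for incoming in manifest.get(section, []):
            if incoming[key] in existing:
                existing[incoming[key]].update(incoming)  # matched: update in place
            else:
                target[section].append(dict(incoming))    # unmatched: create
    return target
```

Running with dry_run=True first mirrors the validate-then-apply workflow described above.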

This is particularly useful for teams standardizing agent configurations or sharing proven bank setups across projects.

See the Bank Template Hub documentation for the full manifest schema and API reference.

Constellation View

The Control Plane now includes an interactive Constellation view that renders memory entity graphs as a zoomable, pannable canvas. Nodes represent entities, and their positions are computed deterministically, so the layout stays stable across visits.

Links are color-coded by type — semantic (blue), temporal (teal), entity (amber), causal (purple) — and node color intensity maps to connectivity: brighter nodes have more connections. The view supports dark mode automatically and is built on the @chenglou/pretext layout engine for smooth text rendering at any zoom level.
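Two properties described above are stable layout and brightness that tracks connectivity. The following is a hypothetical Python illustration of one simple way to get both. It is not the Control Plane's implementation, which is a web UI built on @chenglou/pretext. Hashing entity ids to coordinates, the hex color values, and the brightness floor are all assumptions.

```python
import hashlib

# Link-type colors named above; the exact hex values are illustrative assumptions.
LINK_COLORS = {
    "semantic": "#3b82f6",  # blue
    "temporal": "#14b8a6",  # teal
    "entity": "#f59e0b",    # amber
    "causal": "#a855f7",    # purple
}


def node_position(entity_id: str, width: float = 1000.0, height: float = 1000.0) -> tuple:
    """Derive a stable (x, y) from the entity id, so every visit sees the same layout."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    x = int.from_bytes(digest[:4], "big") / 0xFFFFFFFF * width
    y = int.from_bytes(digest[4:8], "big") / 0xFFFFFFFF * height
    return x, y


def node_intensity(degree: int, max_degree: int, floor: float = 0.25) -> float:
    """Map a node's connection count to a brightness in [floor, 1.0]; more links means brighter."""
    if max_degree <= 0:
        return floor
    return floor + (1.0 - floor) * min(degree, max_degree) / max_degree
```

A real layout would also account for graph structure. The point here is only that a pure function of the input yields the same picture every time.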

Click any node to navigate to its entity detail page, or zoom out for a bird's-eye view of how memories connect across your bank.

Performance: Retain and Recall

3-Phase Retain Pipeline

The retain pipeline has been restructured into three distinct phases to eliminate database lock contention under concurrent load:

Pre-resolve — Entity resolution and semantic ANN search run outside a transaction on read-only connections, preventing slow reads from blocking writes.
Insert — Facts, temporal links, semantic links, and causal links are written atomically in a single transaction, ensuring retrieval consistency.
Post-link — Entity co-occurrence links (used only for UI visualization) are built after the transaction commits, as a best-effort background step.

Previously, the entire pipeline ran inside one long transaction, meaning concurrent agents would queue behind each other during the O(bank_size) ANN lookup. The new structure moves all read-heavy work out of the critical write path, resulting in dramatically lower latency when many agents write simultaneously.
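The three-phase ordering can be illustrated with the standard-library sqlite3 module. This is a toy sketch, not Hindsight's pipeline, which runs on a different database with real entity resolution and ANN search. The schema, the first-word entity lookup, and the rule that links facts sharing an entity are all hypothetical. What it shows is the ordering: reads happen before any write transaction, facts are committed atomically, and visualization-only links are written after commit on a best-effort basis.

```python
import sqlite3


def retain(conn: sqlite3.Connection, facts: list) -> list:
    """Hypothetical three-phase retain: read, then write atomically, then link after commit."""
    # Phase 1 (pre-resolve): read-only lookups; no write transaction is open yet.
    known = {name: eid for eid, name in conn.execute("SELECT id, name FROM entities")}
    # Toy resolution: treat the first word of each fact as its entity name.
    resolved = [(f, known.get(f.split()[0]) if f.strip() else None) for f in facts]

    # Phase 2 (insert): all facts land together or not at all.
    ids = []
    with conn:  # commits on success, rolls back on exception
        for text, entity_id in resolved:
            cur = conn.execute(
                "INSERT INTO facts (text, entity_id) VALUES (?, ?)", (text, entity_id)
            )
            ids.append(cur.lastrowid)

    # Phase 3 (post-link): best-effort, after commit, so a failure never undoes facts.
    by_entity = {}
    for fact_id, (_, entity_id) in zip(ids, resolved):
        if entity_id is not None:
            by_entity.setdefault(entity_id, []).append(fact_id)
    try:
        with conn:
            for fact_ids in by_entity.values():
                for a, b in zip(fact_ids, fact_ids[1:]):
                    conn.execute("INSERT INTO cooccurrence (a, b) VALUES (?, ?)", (a, b))
    except sqlite3.Error:
        pass  # visualization-only links can be rebuilt later
    return ids
```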

Capped Entity Graph Expansion

On large banks, the entity co-occurrence self-join in graph expansion could produce massive intermediate row counts when seed results reference high-fanout entities (e.g. an entity mentioned 25K+ times). This caused recall latency to spike unpredictably.

The graph expansion query now uses a LATERAL per-entity cap (graph_per_entity_limit, default 200), reducing intermediate rows from potentially millions to at most num_entities × 200. Results are recency-biased via ORDER BY unit_id DESC, which rides the primary key index with no extra sort cost. A timeout fallback (graph_expansion_timeout, default 10s) drops entity expansion entirely and falls back to semantic + causal signals if the query still takes too long.

A new composite index on (entity_id, unit_id) in unit_entities enables index-only scans for the capped subquery, keeping the expansion fast even on very large banks.
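In PostgreSQL, a per-entity cap of this kind is usually written as a LATERAL subquery. One hypothetical shape is SELECT ue.unit_id FROM unnest(:seed_entities) AS s(entity_id) CROSS JOIN LATERAL (SELECT unit_id FROM unit_entities WHERE entity_id = s.entity_id ORDER BY unit_id DESC LIMIT 200) ue. The exact query Hindsight runs may differ. The following pure-Python analogue, with a hypothetical helper name, shows why the row count is bounded by the number of seed entities times the limit.

```python
import heapq
from typing import Iterable


def capped_expansion(
    seed_entities: Iterable[int],
    entity_units: dict,
    per_entity_limit: int = 200,
) -> set:
    """Hypothetical analogue of a LATERAL per-entity cap.

    For each seed entity, keep only its `per_entity_limit` most recent unit ids
    (largest unit_id first). Before deduplication, that is at most
    len(seed_entities) * per_entity_limit rows, however high-fanout an entity is.
    """
    expanded = set()
    for entity_id in seed_entities:
        units = entity_units.get(entity_id, [])
        # nlargest mirrors ORDER BY unit_id DESC LIMIT n without sorting every row.
        expanded.update(heapq.nlargest(per_entity_limit, units))
    return expanded
```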

New LLM Providers

Built-in llama.cpp

Hindsight now ships with a built-in llama.cpp LLM provider, enabling fully local inference without any external API calls. Set the provider to llama-cpp and point it at a GGUF model file:

HINDSIGHT_API_LLM_PROVIDER=llama-cpp
HINDSIGHT_API_LLM_MODEL=/path/to/model.gguf

This is ideal for air-gapped environments, development setups, or anywhere you want to avoid external API costs. The provider uses the llama-cpp-python bindings and supports all standard Hindsight LLM operations (fact extraction, consolidation, reflect).

OpenRouter

Hindsight now supports OpenRouter as a provider for LLM, embeddings, and reranking. This gives you access to hundreds of models through a single API key:

HINDSIGHT_API_LLM_PROVIDER=openrouter
HINDSIGHT_API_LLM_API_KEY=sk-or-...
HINDSIGHT_API_LLM_MODEL=anthropic/claude-sonnet-4-20250514

OpenRouter is particularly useful for comparing models or accessing providers that don't have a direct Hindsight integration yet.

Google Embeddings and Reranker

Google is now supported as a provider for embeddings and reranking, complementing the existing Gemini LLM provider support.

Retain Append Mode

A new update_mode='append' option for retain lets you concatenate new content onto an existing document instead of replacing it. This is useful for streaming or incremental ingestion scenarios — for example, appending new log entries or conversation turns to an existing document:

from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# First retain creates the document
client.retain(bank_id="my-bank", content="Day 1 notes...", document_id="journal")

# Subsequent retains append instead of replacing
client.retain(bank_id="my-bank", content="Day 2 notes...", document_id="journal", update_mode="append")

The default update_mode remains 'replace' for backward compatibility.
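The difference between the two modes can be modeled with an in-memory dictionary. This is only a toy model of the behavior described above, not the client or server code. The newline separator and the choice that appending to a missing document simply creates it are assumptions; this post does not say how Hindsight joins appended content.

```python
def retain_document(store: dict, document_id: str, content: str,
                    update_mode: str = "replace", separator: str = "\n") -> str:
    """Toy model of update_mode='replace' vs. 'append'.

    The separator, and creating the document when appending to a missing id,
    are assumptions made for illustration.
    """
    if update_mode not in ("replace", "append"):
        raise ValueError(f"unknown update_mode: {update_mode!r}")
    if update_mode == "append" and document_id in store:
        store[document_id] = store[document_id] + separator + content
    else:
        store[document_id] = content  # replace (default) or first write
    return store[document_id]
```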

AutoGen Integration

hindsight-autogen provides persistent long-term memory for AutoGen agents via three FunctionTool wrappers: hindsight_retain, hindsight_recall, and hindsight_reflect.

pip install hindsight-autogen

import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from hindsight_client import Hindsight
from hindsight_autogen import create_hindsight_tools


async def main() -> None:
    client = Hindsight(base_url="http://localhost:8888")
    await client.acreate_bank(bank_id="user-123")

    # Reads OPENAI_API_KEY from the environment when no api_key is passed
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    tools = create_hindsight_tools(client=client, bank_id="user-123")

    agent = AssistantAgent(
        name="assistant",
        model_client=model_client,
        tools=tools,
    )

    # Store a preference, then ask about it; the memory lives in the "user-123" bank
    await agent.run(task="Remember that I prefer dark mode")
    result = await agent.run(task="What are my UI preferences?")
    print(result.messages[-1].content)

    await model_client.close()


asyncio.run(main())

The integration supports memory scoping via tags, fact type filtering, custom metadata, and selective tool inclusion. Use configure() to set global defaults like budget, max tokens, and tag filters.

See the AutoGen integration documentation for the full API reference.

Breaking Changes

Graph Retrieval Simplification

The BFS (breadth-first spreading activation) and MPFP (multi-path fact propagation) graph retrieval strategies have been removed. LinkExpansionRetriever is now the sole graph retrieval algorithm.

LinkExpansionRetriever operates on three precomputed, first-class signals — entity links, semantic kNN links, and causal links — without iterative graph walks or fan-out caps. It is simpler to maintain, faster at query time, and empirically more accurate in our benchmarks.
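To make "precomputed signals without iterative graph walks" concrete, here is a hypothetical one-hop expansion over three link tables. It is not LinkExpansionRetriever's actual scoring. The function name, the seed-score times link-weight product, and taking the maximum across signals are all assumptions. The key property is that each neighbor is reached in exactly one lookup, with no multi-hop traversal.

```python
from collections import defaultdict


def link_expansion(seeds: dict, links: dict) -> dict:
    """Hypothetical one-hop expansion over precomputed links.

    `seeds` maps fact id -> seed score. `links[signal][fact_id]` lists
    (neighbor_id, weight) pairs for each signal, e.g. "entity", "semantic",
    "causal". Neighbors are scored in a single hop (no iterative walk) and keep
    the best score seen across signals.
    """
    scores = defaultdict(float)
    for signal_links in links.values():
        for fact_id, seed_score in seeds.items():
            for neighbor, weight in signal_links.get(fact_id, []):
                if neighbor in seeds:
                    continue  # seeds are already part of the result set
                scores[neighbor] = max(scores[neighbor], seed_score * weight)
    return dict(scores)
```

Because the links are already materialized at retain time, query-time work is a fixed number of lookups rather than a traversal whose cost depends on graph depth.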

Migration: If you were explicitly selecting BFS or MPFP via configuration, remove that setting. The default has been LinkExpansionRetriever since 0.4.x, so most deployments require no changes.

Hermes Integration Dropped

The hindsight-hermes integration package has been removed. Hermes Agent now ships with a native Hindsight memory provider built into the framework itself, making the external integration package unnecessary. See the Hermes integration documentation for setup instructions with the native provider.

Other Updates

Features

Added OpenCode persistent memory plugin for the OpenCode editor.
Helm chart now supports persistent volumes for local model cache.
MCP server adds a sync_retain tool and validates UUID inputs.
OpenClaw now supports bankId for static bank configurations.
Recall combined scoring includes proof_count boost for better ranking.
Fact serialization in think-prompt now includes occurred_end and mentioned_at for richer temporal context.

Improvements

Consolidation observation quality improved with structured processing rules for better synthesis.
OpenClaw gains a JSONL-backed retain queue that buffers retain calls locally when the external API is unreachable, preventing data loss during outages.
LiteLLM SDK embeddings encoding_format is now configurable instead of hardcoded.
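The JSONL-backed retain queue in OpenClaw follows a common buffer-and-replay pattern. The following Python sketch illustrates that pattern. It is not OpenClaw's code. The class name and the `send` callable are hypothetical: `send` stands in for the real retain request and is assumed to raise ConnectionError while the API is unreachable.

```python
import json
from pathlib import Path
from typing import Callable


class JsonlRetainQueue:
    """Hypothetical buffer-and-replay queue for retain calls, one JSON object per line."""

    def __init__(self, path: Path, send: Callable[[dict], None]) -> None:
        self.path = path
        self.send = send  # assumed to raise ConnectionError when offline

    def retain(self, payload: dict) -> bool:
        """Send now if possible; otherwise append the payload to the JSONL file."""
        try:
            self.send(payload)
            return True
        except ConnectionError:
            with self.path.open("a", encoding="utf-8") as f:
                f.write(json.dumps(payload) + "\n")
            return False

    def flush(self) -> int:
        """Replay buffered payloads in order and return how many were sent."""
        if not self.path.exists():
            return 0
        lines = self.path.read_text(encoding="utf-8").splitlines()
        pending = [json.loads(line) for line in lines if line]
        sent = 0
        for payload in pending:
            try:
                self.send(payload)
            except ConnectionError:
                break  # still offline: stop so the original order is preserved
            sent += 1
        rest = pending[sent:]
        self.path.write_text("".join(json.dumps(p) + "\n" for p in rest), encoding="utf-8")
        return sent
```

Stopping at the first failure during replay keeps retains in their original order, which matters if later content depends on earlier content.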

Bug Fixes

Fixed out-of-range content_index crash in recall result mapping.
Experience fact types are now preserved correctly during normalization instead of being silently reclassified.
The clear-memories endpoint no longer deletes the bank profile along with the memories.
Embedding daemon clears stale processes on the port before starting, preventing startup failures.
Per-bank vector index migration now respects the configured vector extension.
Timeline group sort uses numeric date comparison instead of locale string comparison.
MCP server auto-coerces string-encoded JSON in tool arguments.
Entity labels structure validated on PATCH to prevent invalid configurations.
The bank_id metric label is now opt-in, preventing an OTel memory leak.
Fixed max_tokens handling for OpenAI-compatible endpoints with custom base URLs.
Query analyzer handles dateparser internal crashes gracefully.
Windows compatibility fix for hindsight-embed.
Addressed critical and high severity security vulnerabilities in dependencies.
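Two of the fixes above are easy to illustrate: sorting timeline groups by date instead of by string, and auto-coercing string-encoded JSON tool arguments. The following is a hypothetical Python sketch of both ideas, not the Control Plane or MCP server code. The "M/D/YYYY" label format and both helper names are assumptions.

```python
import json
from datetime import date


def sort_timeline_groups(labels: list) -> list:
    """Order "M/D/YYYY" group labels chronologically rather than as strings."""
    def key(label: str) -> date:
        month, day, year = (int(part) for part in label.split("/"))
        return date(year, month, day)
    return sorted(labels, key=key)


def coerce_json_arg(value):
    """Decode a tool argument that arrived as a JSON-encoded string; otherwise pass it through."""
    if isinstance(value, str):
        stripped = value.strip()
        if stripped.startswith(("{", "[")):
            try:
                return json.loads(stripped)
            except json.JSONDecodeError:
                return value  # looked like JSON but wasn't: keep the original string
    return value
```

With plain string sorting, "10/1/2025" sorts before "9/15/2025" because "1" < "9". Comparing real dates avoids that.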

Feedback and Community

Note: Hindsight 0.5.0 contains breaking changes (BFS/MPFP removal, Hermes integration dropped). If you were using the default graph retrieval strategy, no action is needed. If you were using hindsight-hermes, switch to the native Hermes memory provider.

Share your feedback:

For detailed changes, see the full changelog.
