What's new in Hindsight 0.7.0
Hindsight 0.7.0 is a major release that makes Hindsight faster, leaner, and more broadly deployable. Multilingual deployments get first-class language tuning and a polyglot CJK backend, the graph layer is roughly half the size it used to be and self-heals after deletes, the Control Plane is available in eight languages, consolidation gets meaningfully better at maintaining a clean observation set, and banks can now run on horizontally scaled Postgres clusters with true BM25 search.
- Better Multilingual Search: Per-language BM25 tuning on the native backend and a new polyglot PGroonga backend for mixed-language banks.
- Leaner, Self-Healing Graph: Roughly half the storage and an automatic graph repair after deletes.
- Control Plane in 8 Languages: English, Spanish, French, German, Portuguese, Japanese, Korean, and Chinese.
- Smarter Consolidation: Fewer duplicate observations, plus targeted re-consolidation by tag scope.
- New Embedding Options: ZeroEntropy and Codex-OAuth-backed OpenAI embeddings.
- ParadeDB pg_search for Citus: True BM25 on horizontally scaled Postgres clusters.
Better Multilingual Search
Hindsight already detected the language of incoming content and preserved it through retain/recall — but BM25 search was effectively English-only. The native tsvector index applied English stemming regardless of the bank's actual language, which hurt keyword recall on non-English content and was particularly painful for CJK languages that need bigram-style tokenization to work at all.
0.7.0 fixes this on two fronts:
- The native BM25 backend now uses a configurable language dictionary, so a Spanish, French, or German bank gets the right stemming out of the box.
- A new opt-in PGroonga backend uses a single polyglot index that handles English, Chinese, Japanese, Korean, and more simultaneously — the right choice for mixed-language banks. A docker-compose recipe is included.
You can also pin the fact extractor's output language independently of the index language, which is useful when you want to retain in one language and consolidate or recall in another.
See the Multilingual Support docs for the full configuration matrix and trade-offs between backends.
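To make the CJK point concrete, here is a minimal sketch of bigram-style tokenization. It is not the tokenizer that PGroonga or Hindsight actually uses; the `bigram_tokens` helper and the character ranges it checks are assumptions chosen for illustration. It shows why whitespace-and-stemming tokenization fails on text that has no spaces between words, and how overlapping bigrams recover searchable terms.

```python
import re

# Hypothetical helper for illustration only, not Hindsight's or PGroonga's tokenizer.
# Matches runs of Hiragana/Katakana, CJK ideographs, and Hangul syllables.
_CJK_RUN = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]+")


def bigram_tokens(text: str) -> list[str]:
    """Split CJK runs into overlapping bigrams; split everything else on whitespace."""
    tokens: list[str] = []
    pos = 0
    for match in _CJK_RUN.finditer(text):
        # Non-CJK text before this run: lowercase and split on whitespace.
        tokens.extend(text[pos:match.start()].lower().split())
        run = match.group()
        if len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        pos = match.end()
    tokens.extend(text[pos:].lower().split())
    return tokens


print(bigram_tokens("東京都 hotel"))  # ['東京', '京都', 'hotel']
```

A whitespace tokenizer would treat 東京都 as a single opaque term, so a query for 東京 would never match it. With bigrams, both the document and the query produce the 東京 token.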
Leaner, Self-Healing Graph
The graph layer that powers entity expansion and link-based recall gets two significant changes in 0.7.0. Together they cut its storage footprint roughly in half and keep it correct over time.
Smaller graph. A large portion of the graph was made up of pairwise entity links that retain wrote eagerly but recall never actually read — entity expansion has long traversed the underlying entity table directly. 0.7.0 stops materializing those links and derives them on demand for the /graph and /stats endpoints. On a 10k-unit bench, that alone removed about half of all graph rows and freed ~190 MB of table and index space. A separate index audit then dropped nine indexes that were either unused or fully covered by composite indexes the query planner already preferred. API response shapes don't change — no SDK regeneration needed.
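As a rough sketch of what deriving links on demand can look like, assume the entity table holds (unit_id, entity_id) rows and that a pairwise entity link connects two units that mention the same entity. Both the row shape and the `derive_entity_links` name are assumptions for illustration, not Hindsight's schema or code.

```python
from collections import defaultdict
from itertools import combinations


def derive_entity_links(unit_entity_rows):
    """Derive unit-to-unit links from (unit_id, entity_id) rows.

    Hypothetical sketch: two units are linked when they share an entity.
    """
    units_by_entity = defaultdict(set)
    for unit_id, entity_id in unit_entity_rows:
        units_by_entity[entity_id].add(unit_id)

    links = set()
    for units in units_by_entity.values():
        # Every pair of units sharing this entity becomes one link.
        links.update(combinations(sorted(units), 2))
    return links


rows = [("u1", "alice"), ("u2", "alice"), ("u3", "alice"),
        ("u3", "bob"), ("u4", "bob")]
print(len(derive_entity_links(rows)))  # 4
```

An entity shared by n units implies n(n-1)/2 pairs, which is why materializing those pairs eagerly grows much faster than the entity table they are derived from.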
Self-healing after deletes. When a memory unit is deleted, the units it was connected to used to silently lose links and stay permanently under-capped, which slowly eroded graph-expansion recall. 0.7.0 detects this and asynchronously re-fills those neighbours back up to the configured cap, with no impact on the hot retain path. Banks that see regular deletes — document re-ingest, GDPR erasure, periodic cleanup — keep a healthy graph indefinitely.
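The repair idea can be sketched in a few lines. This is a simplified, synchronous model under stated assumptions: links are kept as a per-unit neighbour list, the cap is a plain integer, and the ranked candidate list is supplied by the caller. The `delete_unit` and `refill` names are hypothetical, and the real repair runs asynchronously.

```python
from typing import Iterable

Links = dict[str, list[str]]  # assumed shape: unit_id -> capped neighbour list


def delete_unit(links: Links, deleted: str) -> list[str]:
    """Remove a unit and return the neighbours that lost a link to it."""
    links.pop(deleted, None)
    affected = []
    for unit, neighbours in links.items():
        if deleted in neighbours:
            neighbours.remove(deleted)
            affected.append(unit)
    return affected


def refill(links: Links, unit: str, ranked_candidates: Iterable[str], cap: int) -> None:
    """Top a unit's neighbour list back up to `cap` from ranked candidates."""
    neighbours = links[unit]
    for candidate in ranked_candidates:
        if len(neighbours) >= cap:
            break
        # Skip self-links, units that no longer exist, and existing links.
        if candidate != unit and candidate in links and candidate not in neighbours:
            neighbours.append(candidate)


links = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
for unit in delete_unit(links, "c"):
    refill(links, unit, ranked_candidates=["c", "a", "b", "d"], cap=2)
print(links["a"], links["b"])  # ['b', 'd'] ['a', 'd']
```

Without the refill step, "a" and "b" would each stay at one neighbour forever, which is the under-capped state described above.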
Control Plane in 8 Languages
The Control Plane UI is now fully internationalized, with a language switcher in the header. Pick from:
- English, Spanish, French, German, Portuguese
- Japanese, Korean, Chinese
Locale-prefixed routing keeps URLs predictable, and a CI check fails the build if a future change introduces an untranslated user-facing string — so the translations stay in sync as the UI evolves.
Smarter Consolidation
Observation consolidation gets better at maintaining a clean, deduplicated set — and gives you finer control over when it runs.
Fewer duplicate observations. The consolidator was occasionally producing near-duplicate sibling observations instead of merging them. 0.7.0 tunes its behaviour to strongly prefer updating an existing observation over creating a new one when the content overlaps. Per-bank observations_mission settings now cascade more cleanly into the consolidator's behaviour too, so banks with a custom mission consolidate the way you'd expect.
Targeted re-consolidation. The consolidate endpoint accepts a new observation_scopes parameter so you can re-consolidate only memories matching specific tag combinations, instead of rebuilding the entire bank. Useful when a tag-policy change invalidates a slice of observations.
Opt out of auto-consolidation. A new per-bank enable_auto_consolidation flag disables the automatic consolidation that normally runs after retain. Deployments that prefer to consolidate on a schedule or on-demand can now opt out cleanly.
New Embedding Options
Two new ways to embed:
- ZeroEntropy is a new embeddings provider: set HINDSIGHT_API_EMBEDDINGS_PROVIDER=zeroentropy and pick a model.
- Codex OAuth embeddings lets the existing OpenAI embeddings provider authenticate via your Codex OAuth token instead of a separate API key. Convenient if you're already running Codex and want to reuse the same credentials for embeddings.
ParadeDB pg_search for Citus
Hindsight now supports ParadeDB pg_search as a BM25 backend. This is the option to pick if you're running on a Citus distributed Postgres cluster — pg_search is the only true-BM25 extension that works across Citus shards, so horizontally scaled deployments no longer have to fall back to the simpler native tsvector backend.
Set it via:
HINDSIGHT_API_TEXT_SEARCH_EXTENSION=pg_search
A ready-to-run docker-compose recipe ships under docker/docker-compose/pg_search/. You can also tune tokenization for your corpus via HINDSIGHT_API_PG_SEARCH_TOKENIZER (default, whitespace, raw, etc.).
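For readers unfamiliar with what BM25 adds over plain term matching, here is a textbook scoring sketch. It is not pg_search's implementation: the `bm25_scores` function, the k1 and b defaults, and the IDF variant are assumptions picked from the commonly cited formulation. It shows the two ingredients that matter, corpus-wide inverse document frequency and document-length normalization.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with textbook BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Longer-than-average documents are penalized via the b term.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["citus", "shard", "search"],
    ["search", "search", "search", "index", "index", "index"],
    ["graph"],
]
print(bm25_scores(["citus"], docs))
```

Only the first document contains "citus", so it is the only one with a non-zero score for that query. Because the IDF term depends on statistics across the whole corpus, a distributed deployment needs a backend that can compute it across shards.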
Other Notable Changes
- Clear mental model endpoint resets a mental model's content so the next refresh re-synthesizes from scratch — useful for periodic compaction of delta-mode models that have drifted over many incremental refreshes. Exposed in the Control Plane and in the SDKs.
- Alibaba Qwen3 reranker support in the local cross-encoder path.
- Right Agent integration — a new entry in the integrations catalog.
- OpenCode Go LLM provider for the opencode-go runtime.
- Ollama Cloud provider with corrected authentication for cloud endpoints.
- Per-operation LLM concurrency caps keep parallel workloads from overloading a provider.
- Reranker score normalization auto-detects pre-normalized scores and switches strategies accordingly, fixing a class of subtle ranking regressions.
- Anthropic compatibility: temperature is no longer sent (the latest models reject it).
- Webhook deliveries are deduplicated so retain batches don't trigger duplicate notifications.
- Oversized retain items are split before the call, so very large payloads no longer fail the whole batch.
- Disabled reflect tools are hidden from the agent's system prompt, not just gated at invocation — so the agent never sees a tool it can't use.
- User-defined label entities are no longer subject to fuzzy resolution, so user-defined identities stay stable.
- Mental model history is capped to prevent JSONB overflow, and full refreshes correctly rebase pending delta baselines.
- Control Plane sessions now verify a signed cookie instead of just checking for its presence — a meaningful auth hardening.
- Windows embedded UI startup correctly resolves npx, fixing a long-standing launch failure.
- API access logging can be enabled via an env var without code changes.
- Gzip middleware keeps compressed graph responses parseable.
- Tag groups and triggers are preserved correctly when updating tags.
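On the session hardening item in the list above, the difference between checking that a cookie exists and verifying its signature can be sketched with Python's standard hmac module. The cookie layout (value, a dot, then a hex HMAC-SHA256) and the `sign` and `verify` helpers are assumptions for illustration; the Control Plane's actual cookie format is not described here.

```python
import hashlib
import hmac
from typing import Optional


def sign(value: str, secret: bytes) -> str:
    """Return 'value.signature' using HMAC-SHA256 (hypothetical cookie layout)."""
    mac = hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{mac}"


def verify(cookie: str, secret: bytes) -> Optional[str]:
    """Return the original value if the signature checks out, else None."""
    value, sep, mac = cookie.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(mac.encode(), expected.encode()):
        return value
    return None


secret = b"demo-secret"  # placeholder; a real deployment would load this from config
cookie = sign("session-123", secret)
print(verify(cookie, secret))                  # session-123
print(verify("session-123.forged", secret))    # None
```

A presence-only check would accept the forged cookie in the last line, while signature verification rejects it.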
