What's new in Hindsight 0.7.1
Hindsight 0.7.1 is a fast follow-up to 0.7.0 focused on reliability — including a critical fix for memory corruption on oversized retains and tougher consolidation — plus a few useful tuning knobs (LLM reasoning effort, per-provider reranker timeouts) and a bit more i18n polish. All users on 0.7.0 should upgrade, especially anyone ingesting large documents.
- Critical: Oversized Retain Corruption Fix: Large documents could lose memories because concurrent child operations cascade-deleted each other's memory_units.
- Tougher Consolidation: Indefinite retry with backoff, dedup-by-bank, and priority-based scheduling so the worker keeps making progress under pressure.
- New Tuning Knobs: Configure LLM reasoning effort and per-provider reranker HTTP timeouts from env.
- More Chinese Locales: Additional zh variants and a polish pass on the existing translation.
Critical: Oversized Retain Corruption Fix
When you retained a single document that was too large for one LLM call, 0.7.0 could end up saving only a fraction of the memories that should have been extracted — different chunks of the same document ran in parallel and overwrote each other's results. 0.7.1 makes large documents process sequentially within a single worker so all extracted memories are kept.
If you've been ingesting large documents on 0.7.0, upgrade and re-retain the affected documents to recover the lost memories.
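The failure mode described above can be illustrated with a toy model. This is only a sketch, not Hindsight's code: ToyStore, replace_document, retain_per_chunk, and retain_whole are hypothetical names, and the "delete then insert" behaviour is an assumption made to mimic child operations removing each other's units.

```python
from typing import Callable, Dict, List


class ToyStore:
    """Hypothetical in-memory store keyed by document id (not Hindsight's schema)."""

    def __init__(self) -> None:
        self.units: Dict[str, List[str]] = {}

    def replace_document(self, doc_id: str, units: List[str]) -> None:
        # Assumed behaviour: writing units for a document first drops
        # whatever was stored for that document before.
        self.units[doc_id] = list(units)


def retain_per_chunk(store: ToyStore, doc_id: str, chunks: List[str],
                     extract: Callable[[str], List[str]]) -> None:
    # Failure mode: every chunk acts as its own child operation and
    # replaces the document's units, so only the last writer survives.
    for chunk in chunks:
        store.replace_document(doc_id, extract(chunk))


def retain_whole(store: ToyStore, doc_id: str, chunks: List[str],
                 extract: Callable[[str], List[str]]) -> None:
    # One possible fix: process chunks one after another in a single
    # worker, collect everything, and write the document once.
    collected: List[str] = []
    for chunk in chunks:
        collected.extend(extract(chunk))
    store.replace_document(doc_id, collected)
```

In this model, retain_per_chunk leaves only the last chunk's memories in the store, while retain_whole keeps the memories from every chunk.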
Tougher Consolidation
The consolidation worker gets two related fixes that together keep it healthy under load:
- Indefinite retry with backoff. Previously, a transient failure during consolidation could cause the worker to give up on the bank until the next trigger. 0.7.1 retries consolidation indefinitely with exponential backoff and a per-bank deduplication guard so the same bank can't be picked up twice concurrently and waste cycles.
- Priority-based bank scheduling. When multiple banks have pending consolidation, the worker now pulls them in priority order. Higher-priority banks get their consolidation latency back under control in mixed-workload deployments instead of waiting behind large, low-priority banks.
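The two behaviours above can be sketched together as a small scheduler. This is a minimal illustration under stated assumptions, not the worker's actual implementation: BankQueue, backoff_delay, the base and cap values, and the "larger number means higher priority" convention are all hypothetical.

```python
import heapq
from typing import List, Optional, Set, Tuple


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: base * 2^attempt seconds, capped (values are assumptions)."""
    return min(cap, base * (2 ** attempt))


class BankQueue:
    """Hypothetical priority queue with a per-bank deduplication guard."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._known: Set[str] = set()  # banks that are queued or in flight
        self._seq = 0                  # tie-breaker keeps FIFO order per priority

    def enqueue(self, bank_id: str, priority: int) -> bool:
        # Refuse a bank that is already queued or being consolidated.
        if bank_id in self._known:
            return False
        self._known.add(bank_id)
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, self._seq, bank_id))
        self._seq += 1
        return True

    def pop(self) -> Optional[str]:
        # The bank stays in _known while in flight; call done() afterwards.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def done(self, bank_id: str) -> None:
        # Consolidation finished, so the bank may be queued again.
        self._known.discard(bank_id)
```

A worker loop built on this would pop a bank, attempt consolidation, and on failure sleep for backoff_delay(attempt) before trying again, calling done() only once the bank has been consolidated.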
New Tuning Knobs
Two new environment variables let you tune model behaviour without code changes:
- HINDSIGHT_API_LLM_REASONING_EFFORT: sets the reasoning effort for reasoning models (e.g. low, medium, high). Useful for trading latency against quality on extraction and reflect calls. Thanks to @s9rkn.
- Per-provider reranker HTTP timeouts: separate timeout env vars per reranker provider so you can give slower providers more headroom without globally relaxing the timeout for everyone.
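As a rough sketch of how settings like these might be read from the environment: only HINDSIGHT_API_LLM_REASONING_EFFORT comes from the release notes. The timeout variable names are not given here, so the EXAMPLE_RERANKER_<PROVIDER>_TIMEOUT pattern, the 30-second default, and both function names below are placeholders; check the configuration docs for the real names.

```python
import os
from typing import Mapping, Optional


def reasoning_effort(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured reasoning effort in lowercase, or None if unset."""
    value = env.get("HINDSIGHT_API_LLM_REASONING_EFFORT", "").strip().lower()
    return value or None


def reranker_timeout(provider: str,
                     env: Mapping[str, str] = os.environ,
                     default: float = 30.0) -> float:
    """Look up a per-provider timeout in seconds, falling back to a default.

    The variable name pattern is a placeholder, not a real Hindsight setting.
    """
    name = f"EXAMPLE_RERANKER_{provider.upper()}_TIMEOUT"
    raw = env.get(name)
    return float(raw) if raw else default
```

The point of the per-provider lookup is that a slow provider can be given a longer timeout while every other provider keeps the default.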
More Chinese Locales
Building on the 8-language Control Plane shipped in 0.7.0:
- Added additional Chinese locale variants. Thanks to @MapleEve.
- A polish pass on the existing zh translation for more natural phrasing.
Other Notable Changes
- Recall recency anchored to query timestamp so recall on historical queries returns the same results regardless of when you re-run them. Thanks to @Sanderhoff-alt.
- Codex OAuth embeddings token refresh — the Codex OAuth embeddings provider added in 0.7.0 now refreshes its access token automatically, so long-running embedding workloads don't fail when the token expires. Thanks to @DK09876.
- graph_maintenance filter in the Control Plane operations dropdown so you can filter for graph-repair runs alongside other operation types.
- OpenClaw: flushes un-retained turns on session_end, no longer silently skips dispatch on synthetic-main + static-banking setups, and labels "Current time" as UTC in injected memory context.
- Docs: a docker-compose example for running Hindsight against a local llama.cpp sidecar, a Codex Docker recipe for the Claude Code integration, and PGroonga and the pg_search tokenizer option added to the BM25 backends table.
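To illustrate the recall recency change in the list above, here is a minimal sketch of scoring a memory's age against the query's timestamp rather than the current clock. The exponential half-life decay, the 30-day default, and the recency_score name are assumptions for illustration; the notes do not describe the actual scoring function.

```python
from datetime import datetime, timezone


def recency_score(memory_time: datetime, anchor: datetime,
                  half_life_days: float = 30.0) -> float:
    """Score in (0, 1] that halves every half_life_days of age.

    Age is measured from the anchor (the query timestamp), not from
    datetime.now(), so re-running the same query later gives the same score.
    """
    age_days = max(0.0, (anchor - memory_time).total_seconds() / 86400.0)
    return 0.5 ** (age_days / half_life_days)


# Usage: the score depends only on the memory time and the query timestamp.
query_time = datetime(2025, 1, 31, tzinfo=timezone.utc)
memory_time = datetime(2025, 1, 1, tzinfo=timezone.utc)
score = recency_score(memory_time, query_time)
```

Because the wall clock never enters the calculation, a historical query ranks memories the same way whenever it is replayed.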
