What's new in Hindsight 0.7.1

Nicolò Boschi
Hindsight Team

Hindsight 0.7.1 is a fast follow-up to 0.7.0 focused on reliability: it ships a critical fix for memory corruption on oversized retains and a tougher consolidation worker, adds a few tuning knobs (LLM reasoning effort, per-provider reranker timeouts), and polishes i18n. All users on 0.7.0 should upgrade, especially anyone ingesting large documents.

  • Critical: Oversized Retain Corruption Fix: Large documents could lose memories because concurrent child operations cascade-deleted each other's memory_units.
  • Tougher Consolidation: Indefinite retry with backoff, dedup-by-bank, and priority-based scheduling so the worker keeps making progress under pressure.
  • New Tuning Knobs: Configure LLM reasoning effort and per-provider reranker HTTP timeouts from env.
  • More Chinese Locales: Additional zh variants and a polish pass on the existing translation.

Critical: Oversized Retain Corruption Fix

When you retained a single document that was too large for one LLM call, 0.7.0 could end up saving only a fraction of the memories that should have been extracted. Different chunks of the same document ran in parallel as child operations, and each could cascade-delete the memory_units written by the others. 0.7.1 processes the chunks of a large document sequentially within a single worker, so all extracted memories are kept.

If you've been ingesting large documents on 0.7.0, upgrade and re-retain the affected documents to recover the lost memories.
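To illustrate the idea behind the fix, here is a minimal sketch of processing an oversized document chunk by chunk in one worker. It is not Hindsight's implementation: `split_into_chunks`, `retain_sequentially`, and the `extract` callback are hypothetical names, and the character-based chunking is an assumption made only for the example.

```python
from typing import Callable, Iterable, List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Split text into consecutive pieces of at most max_chars characters.

    Character-based splitting is an assumption for this sketch only.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def retain_sequentially(
    chunks: Iterable[str],
    extract: Callable[[str], List[str]],
) -> List[str]:
    """Run extraction one chunk at a time and accumulate every result.

    Because only one chunk is processed at any moment, no chunk can
    delete or replace the memories produced by another chunk.
    """
    memories: List[str] = []
    for chunk in chunks:
        memories.extend(extract(chunk))
    return memories


# Hypothetical stand-in for an LLM extraction call.
chunks = split_into_chunks("abcdefghij", 4)
all_memories = retain_sequentially(chunks, lambda c: [c.upper()])
```

The trade-off in a design like this is throughput for correctness: a single large document takes longer to ingest, but every chunk's output survives.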

Tougher Consolidation

The consolidation worker gets two related fixes that together keep it healthy under load:

  • Indefinite retry with backoff. Previously, a transient failure during consolidation could cause the worker to give up on the bank until the next trigger. 0.7.1 retries consolidation indefinitely with exponential backoff, and adds a per-bank deduplication guard so the same bank can't be picked up twice concurrently, which would waste cycles.
  • Priority-based bank scheduling. When multiple banks have pending consolidation, the worker now pulls them in priority order. In mixed-workload deployments, higher-priority banks are consolidated promptly instead of waiting behind large, low-priority banks.
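The three mechanisms above (exponential backoff, a per-bank dedup guard, and priority ordering) can be sketched in a few lines. This is an illustrative model, not Hindsight's worker code: `ConsolidationQueue`, `backoff_delay`, and the base and cap values are all assumptions chosen for the example.

```python
import heapq
from typing import List, Optional, Set, Tuple


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped.

    The base and cap values are assumptions for this sketch.
    """
    return min(cap, base * (2 ** attempt))


class ConsolidationQueue:
    """Priority queue of bank IDs with a per-bank dedup guard."""

    def __init__(self) -> None:
        # Entries are (-priority, sequence, bank_id) so that the highest
        # priority pops first and ties resolve in insertion order.
        self._heap: List[Tuple[int, int, str]] = []
        # Banks that are queued or currently being consolidated.
        self._active: Set[str] = set()
        self._seq = 0

    def enqueue(self, bank_id: str, priority: int) -> bool:
        """Queue a bank; return False if it is already queued or running."""
        if bank_id in self._active:
            return False
        self._active.add(bank_id)
        heapq.heappush(self._heap, (-priority, self._seq, bank_id))
        self._seq += 1
        return True

    def pop(self) -> Optional[str]:
        """Return the highest-priority bank, or None if the queue is empty.

        The bank stays guarded until done() is called for it.
        """
        if not self._heap:
            return None
        _, _, bank_id = heapq.heappop(self._heap)
        return bank_id

    def done(self, bank_id: str) -> None:
        """Release the guard so the bank can be queued again."""
        self._active.discard(bank_id)
```

A worker loop built on this would call `pop()`, attempt consolidation, sleep for `backoff_delay(attempt)` after each transient failure, and call `done()` only once the bank finishes.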

New Tuning Knobs

Two new environment variables let you tune model behaviour without code changes:

  • HINDSIGHT_API_LLM_REASONING_EFFORT — sets the reasoning effort for reasoning models (e.g. low, medium, high). Useful for trading latency against quality on extraction and reflect calls. Thanks to @s9rkn.
  • Per-provider reranker HTTP timeouts — separate timeout env vars per reranker provider so you can give slower providers more headroom without globally relaxing the timeout for everyone.
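If you set the reasoning effort from a deploy script, a small pre-flight check can catch typos before the service starts. The sketch below reads `HINDSIGHT_API_LLM_REASONING_EFFORT` and validates it. The accepted set (low, medium, high) is an assumption based on the examples above, and `reasoning_effort` is a hypothetical helper, not part of Hindsight. The per-provider timeout variables are omitted because their names are not listed here.

```python
import os
from typing import Mapping, Optional

# Assumed set of values, taken from the examples in the release notes;
# the real service may accept others.
ALLOWED_EFFORTS = {"low", "medium", "high"}

ENV_VAR = "HINDSIGHT_API_LLM_REASONING_EFFORT"


def reasoning_effort(
    env: Mapping[str, str] = os.environ,
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the normalized reasoning effort, or `default` if unset.

    Raises ValueError when the variable is set to an unexpected value.
    """
    raw = env.get(ENV_VAR)
    if raw is None or not raw.strip():
        return default
    value = raw.strip().lower()
    if value not in ALLOWED_EFFORTS:
        raise ValueError(
            f"{ENV_VAR}={raw!r} is not one of {sorted(ALLOWED_EFFORTS)}"
        )
    return value
```

Passing the environment as a mapping keeps the helper easy to test without mutating `os.environ`.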

More Chinese Locales

Building on the 8-language Control Plane shipped in 0.7.0:

  • Added additional Chinese locale variants. Thanks to @MapleEve.
  • A polish pass on the existing zh translation for more natural phrasing.

Other Notable Changes

  • Recall recency is now anchored to the query timestamp, so recall on historical queries returns the same results regardless of when you re-run them. Thanks to @Sanderhoff-alt.
  • Codex OAuth embeddings token refresh — the Codex OAuth embeddings provider added in 0.7.0 now refreshes its access token automatically, so long-running embedding workloads don't fail when the token expires. Thanks to @DK09876.
  • graph_maintenance filter in the Control Plane operations dropdown so you can filter for graph-repair runs alongside other operation types.
  • OpenClaw: flushes un-retained turns on session_end, no longer silently skips dispatch on synthetic-main + static-banking setups, and labels "Current time" as UTC in injected memory context.
  • Docs: added a docker-compose example for running Hindsight against a local llama.cpp sidecar and a Codex Docker recipe for the Claude Code integration; the BM25 backends table now lists PGroonga and the pg_search tokenizer option.
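To see why anchoring recall recency to the query timestamp makes historical queries reproducible, consider the sketch below. The exponential half-life decay and the `recency_score` function are assumptions for illustration; the notes do not describe the scoring formula Hindsight actually uses.

```python
from datetime import datetime, timezone


def recency_score(
    memory_time: datetime,
    anchor: datetime,
    half_life_days: float = 30.0,
) -> float:
    """Score in (0, 1] that halves every `half_life_days` of age.

    Age is measured from `anchor`, not from the wall clock. Memories
    newer than the anchor are treated as age zero.
    """
    age_days = max(0.0, (anchor - memory_time).total_seconds() / 86400.0)
    return 0.5 ** (age_days / half_life_days)


memory_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
query_time = datetime(2024, 1, 31, tzinfo=timezone.utc)

# Anchored to the query timestamp: the same value on every run.
anchored = recency_score(memory_time, anchor=query_time)

# Anchored to the wall clock: the value drifts as time passes.
drifting = recency_score(memory_time, anchor=datetime.now(timezone.utc))
```

With the query timestamp as the anchor, the score depends only on the two stored times, so re-running the same historical query later produces the same ranking input.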