Skip to main content

What's new in Hindsight 0.8.4

· 5 min read
Nicolò Boschi
Hindsight Team

Hindsight 0.8.4 builds on 0.8.3 with a focus on reliability and control: keep memory operations running when a provider degrades with multi-LLM failover and round-robin routing, tune retrieval more precisely with finer recall controls, keep knowledge current with scheduled mental-model refresh, and get more accurate token accounting. It also lands a batch of data-integrity and robustness fixes. Self-managed deployments should upgrade.

Multi-LLM Failover & Routing

You can now configure multiple LLMs and have Hindsight route across them automatically, with failover when one provider errors and round-robin to spread load. A single provider outage or rate-limit no longer has to stall memory operations.

Each member of a multi-LLM configuration can be tuned individually — including LiteLLM Router settings, Vertex AI service account keys, and per-member Vertex project and region — so you can mix providers and accounts with the controls each one needs.

This release also adds two new OpenAI-compatible providers, Requesty and Atlas Cloud, so you can point Hindsight at them directly.

Finer Recall Control

Several changes give you more precise control over what recall returns and how it's ranked:

  • Per-stage scores and two-level filtering. Recall now exposes structured per-stage scores, and min_scores supports two-level filtering — so you can require minimum relevance at each retrieval stage instead of a single blunt cutoff. (Now correctly threaded through the maintained Python and TypeScript SDK wrappers, too.)
  • Configurable recency decay. Choose how strongly recency influences ranking — linear, exponential, or none.
  • Observation-aware dedup. A new prefer_observations option drops raw facts that have already been superseded by consolidated observations, so results favor the synthesized view.
  • Exact filtering of global observations. You can now exactly filter untagged/global observations.

Scheduled Mental-Model Refresh

Mental models can now be refreshed on a cron schedule. Instead of relying solely on activity-triggered refreshes, you can keep a bank's consolidated knowledge current on a cadence you define.

Sharper LLM Control & Accounting

  • Per-operation temperature. Set different temperatures per operation for tighter control over response style across retain, recall, and reflect.
  • Per-scope timeouts and retries. Per-scope LLM timeout and retry policies are now applied consistently across providers.
  • Richer token accounting. Usage totals now include cached and "thoughts" tokens where supported, and reasoning tokens are tracked for OpenAI-compatible providers — for more accurate cost reporting.
  • More reliable structured output. Anthropic strict structured output now goes through forced tool use, and reflect's structured-output retries are capped to avoid runaway retry loops.
  • Configurable upload size. The maximum upload size is now configurable in the control plane.
  • Faster, more scalable bank stats. Bank statistics now scale to large deployments, and an on-demand refresh lets you force an exact recount when you need it.
  • Full memory details in the explorer. The control plane's memory explorer now shows the complete details of each memory, making inspection and debugging easier.

Data-Integrity & Robustness Fixes

The reason to upgrade — a set of fixes that protect what gets stored and keep background work healthy:

  • Append-mode chunking. Retain append mode now merges JSON arrays so conversation-aware chunking is preserved.
  • Observation search vectors. Search vectors are now maintained on observation insert/update, keeping search and consolidation correct.
  • Consolidation safety. Consolidation now keeps items by default when a dedup action is missing, preventing unintended drops.
  • Migration bootstrap. Migration bootstrap now respects the configured vector extension.
  • Accurate bank stats. Bank stats cache is correctly invalidated after deletes and clears, so counts stay accurate.
  • Graph maintenance deadlocks. Concurrent inserts no longer trigger enqueue deadlocks during graph maintenance.

Other notable fixes:

  • Token usage on parse failure. Provider token usage is now preserved even when tool-call argument parsing or validation fails, so cost reporting stays accurate.
  • Reflect JSON envelopes. Reflect now unwraps JSON-wrapped answers returned by some models.
  • Cleaner error paths. List endpoints reject negative limit/offset with a 422 instead of a server error, async operations return 404 when a bank doesn't exist, and PATCH bank / dry-run extract no longer create banks unintentionally.
  • Retain error messages. Retain error summaries now preserve the underlying exception message for easier debugging.