What's new in Hindsight 0.5.3

Nicolò Boschi
Hindsight Team

Hindsight 0.5.3 focuses on multi-tenant fairness and operational resilience. A new consolidation round limit prevents a single bank from monopolizing worker slots, mental model refreshes now apply structured deltas instead of regenerating the whole document, the OpenAI Agents SDK gains first-class support, and a batch of fixes hardens file retain, migration paths, and GPU reranking.

Consolidation Round Limit

When a bank has thousands of unconsolidated memories — after a bulk ingest, for example — the consolidation job could hold its worker slot for minutes, starving other banks. A new HINDSIGHT_API_CONSOLIDATION_MAX_MEMORIES_PER_ROUND setting (default: 100) caps how many memories a single consolidation round will process. When the limit is hit, the job yields its slot and re-queues itself so other banks get their turn. Mental model refreshes are deferred to the final round to avoid redundant work.

The setting is configurable per bank via the config API, so high-priority banks can set it to 0 (unlimited) while shared deployments keep the default cap of 100.
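To make the yield-and-re-queue behavior concrete, here is a minimal simulation. It is not Hindsight's scheduler: the function name `run_consolidation`, the single shared queue, and the bank names are assumptions made for illustration. It only shows how a per-round cap lets a small bank run between the rounds of a large one.

```python
from collections import deque


def run_consolidation(pending, max_per_round):
    """Simulate round-limited consolidation (illustrative only).

    pending: dict mapping bank id -> number of unconsolidated memories.
    max_per_round: cap on memories per round; 0 means unlimited.
    Returns the executed rounds as a list of (bank_id, processed_count).
    """
    remaining = dict(pending)
    queue = deque(bank for bank, count in remaining.items() if count > 0)
    rounds = []
    while queue:
        bank = queue.popleft()
        # A cap of 0 means the round may drain the whole backlog.
        limit = remaining[bank] if max_per_round == 0 else max_per_round
        processed = min(limit, remaining[bank])
        remaining[bank] -= processed
        rounds.append((bank, processed))
        if remaining[bank] > 0:
            # Limit hit: yield the slot and go to the back of the queue.
            queue.append(bank)
    return rounds


capped = run_consolidation({"bulk": 250, "small": 40}, max_per_round=100)
unlimited = run_consolidation({"bulk": 250, "small": 40}, max_per_round=0)
print(capped)     # "small" runs after the first "bulk" round
print(unlimited)  # "bulk" drains fully before "small" starts
```

With the cap, the large bank needs three rounds, but the small bank waits for only one of them.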

OpenAI Agents SDK Integration

The new hindsight-openai-agents package provides FunctionTool instances for retain, recall, and reflect that plug directly into the OpenAI Agents SDK Agent. Configure once with configure(), create tools anywhere, and the SDK's async runtime handles the rest. Selective tool inclusion, tag-based scoping, and global configuration are all supported out of the box.

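from agents import Agent  # OpenAI Agents SDK
from hindsight_openai_agents import configure, create_tools

# Point the integration at a Hindsight server and a default bank.
configure(base_url="http://localhost:8888", bank_id="my-bank")

# Build the retain/recall/reflect tools and attach them to an agent.
tools = create_tools()
agent = Agent(name="assistant", tools=tools)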

Mental Model Delta Refresh

Mental model refreshes now use structured delta operations (insert, update, delete, reorder sections) instead of regenerating the entire document from scratch. This means smaller, cheaper LLM calls, less churn in observation-backed models, and a clearer history trail showing exactly what changed and why. The UI in the control plane has also been updated to show staleness signals and per-refresh history snapshots.
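As a rough illustration of the idea, the sketch below applies a list of delta operations to an ordered set of sections. The operation schema (`op`, `id`, `text`, `index`, `order`) and the function name `apply_delta` are assumptions made for this example and are not Hindsight's actual format.

```python
def apply_delta(sections, ops):
    """Apply delta operations to an ordered list of (section_id, text) pairs.

    Hypothetical op shapes:
      {"op": "insert", "id": ..., "text": ..., "index": optional position}
      {"op": "update", "id": ..., "text": ...}
      {"op": "delete", "id": ...}
      {"op": "reorder", "order": [all section ids in the new order]}
    """
    order = [section_id for section_id, _ in sections]
    text = dict(sections)
    for op in ops:
        kind = op["op"]
        if kind == "insert":
            if op["id"] in text:
                raise ValueError(f"section already exists: {op['id']}")
            order.insert(op.get("index", len(order)), op["id"])
            text[op["id"]] = op["text"]
        elif kind == "update":
            if op["id"] not in text:
                raise KeyError(op["id"])
            text[op["id"]] = op["text"]
        elif kind == "delete":
            order.remove(op["id"])
            del text[op["id"]]
        elif kind == "reorder":
            # A reorder must mention every existing section exactly once.
            if sorted(op["order"]) != sorted(order):
                raise ValueError("reorder must list every section once")
            order = list(op["order"])
        else:
            raise ValueError(f"unknown op: {kind}")
    return [(section_id, text[section_id]) for section_id in order]


model = [("goals", "Ship v1"), ("risks", "None known"), ("team", "Two engineers")]
delta = [
    {"op": "update", "id": "risks", "text": "Vendor delay"},
    {"op": "delete", "id": "team"},
    {"op": "insert", "id": "timeline", "text": "Q3", "index": 1},
    {"op": "reorder", "order": ["timeline", "goals", "risks"]},
]
refreshed = apply_delta(model, delta)
print(refreshed)
```

Because each operation names the section it touches, the list of operations doubles as a record of what changed in a refresh.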

CLI Connection Profiles

The CLI now supports named connection profiles via -p/--profile. Define profiles in ~/.hindsight/profiles.toml with different base URLs, API keys, and default banks, then switch between local, staging, and production with a single flag. No more juggling environment variables.

Worker Fairness & Reliability

Three changes improve how the worker distributes and handles tasks:

  • Per-tenant fair rotation. claim_batch now rotates through tenant schemas round-robin so no single tenant monopolizes worker slots, even under skewed load.
  • Idempotent task submission. submit_task gracefully handles the case where a payload is already set, preventing duplicate or invalid submissions from concurrent callers.
  • Extension-driven requeue. A new DeferOperation exception lets extensions explicitly request that a task be re-queued instead of failing, useful for rate-limit back-off or dependency-wait patterns.
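The per-tenant rotation can be pictured with a small in-memory sketch. The real claim_batch works against tenant schemas in the database; the function `fair_claim`, the dict-of-deques layout, and the tenant names below are hypothetical and only illustrate the round-robin order.

```python
from collections import deque


def fair_claim(queues, batch_size):
    """Claim up to batch_size tasks, taking one task per tenant per turn.

    queues: dict mapping tenant -> deque of pending tasks (mutated in place).
    Returns the claimed tasks as a list of (tenant, task) in claim order.
    """
    claimed = []
    # Only tenants with pending work take part in the rotation.
    rotation = deque(tenant for tenant, tasks in queues.items() if tasks)
    while rotation and len(claimed) < batch_size:
        tenant = rotation.popleft()
        claimed.append((tenant, queues[tenant].popleft()))
        if queues[tenant]:
            # Still has work: rejoin at the back of the rotation.
            rotation.append(tenant)
    return claimed


queues = {
    "a": deque(["a1", "a2", "a3", "a4"]),  # heavily loaded tenant
    "b": deque(["b1"]),
    "c": deque(["c1", "c2"]),
}
batch = fair_claim(queues, batch_size=5)
print(batch)
```

Even though tenant "a" has the most pending tasks, it receives only two of the five slots in this batch; the rest go to "b" and "c".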

Reliability Fixes

  • Migration chain restored. Upgrades from v0.4.22 to v0.5.x now follow the correct migration path without manual intervention.
  • Apple Metal GPU crash. The jina-mlx reranker now serializes Metal GPU operations to prevent SIGSEGV crashes on macOS.
  • Ollama think mode. Native Ollama calls now explicitly disable "think" mode, fixing responses that included reasoning traces in the output.
  • TEI reranker timeout. The TEI reranker timeout is now configurable (HINDSIGHT_API_RERANKER_TEI_HTTP_TIMEOUT), with better error messages when it fires.
  • File retain. Fixed upload failures and orphaned retains, including proper handling of the timestamp field in the file retain API.
  • Orphan observations. Consolidation no longer creates orphan observations when a source memory is deleted mid-run.
  • Mental model max_tokens. The configured max_tokens setting is now correctly forwarded during mental model refresh.
  • Control plane bank IDs. Bank IDs with special characters are now correctly encoded in URLs end-to-end.
  • LLM retry defaults. The default maximum number of LLM retries is reduced from 10 to 3 to avoid long delays when a provider is failing.
  • Recall budget per bank. The recall thinking-budget mapping is now configurable per bank for fine-grained cost control.
  • Consolidation drilldown. Failed consolidation counts are now visible with drilldown in the control plane.
  • JSON log tenant field. JSON logs now include a tenant identifier and support a configurable field allowlist.

Feedback and Community

Hindsight 0.5.3 is a drop-in replacement for 0.5.2 with no breaking changes to the core API.

For detailed changes, see the full changelog.