What's new in Hindsight 0.5.3
Hindsight 0.5.3 focuses on multi-tenant fairness and operational resilience. A new consolidation round limit prevents a single bank from monopolizing worker slots, mental models get smarter delta-based refreshes, the OpenAI Agents SDK gains first-class support, and a batch of fixes hardens file retain, migration paths, and GPU reranking.
- Consolidation Round Limit: Cap memories per round so no single bank hogs a worker.
- OpenAI Agents SDK Integration: First-class memory tools for the OpenAI Agents SDK.
- Mental Model Delta Refresh: Smaller, structured updates instead of full rewrites.
- CLI Connection Profiles: Switch between environments with named profiles.
- Worker Fairness & Reliability: Tenant-fair batch claiming, idempotent task submission, and extension-driven requeue.
- Reliability Fixes: Migration chain, GPU crashes, Ollama, file retain, and more.
Consolidation Round Limit
When a bank has thousands of unconsolidated memories (after a bulk ingest, for example), the consolidation job could previously hold its worker slot for minutes, starving other banks. A new HINDSIGHT_API_CONSOLIDATION_MAX_MEMORIES_PER_ROUND setting (default: 100) caps how many memories a single consolidation round will process. When the limit is hit, the job yields its slot and re-queues itself so other banks get their turn. Mental model refreshes are deferred to the final round to avoid redundant work.
The setting is configurable per bank via the config API, so high-priority banks can set it to 0 (unlimited) while shared deployments keep the default cap.
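The yield-and-requeue behavior can be pictured with a small simulation. This is only a sketch under assumptions: the function name, the in-memory queue, and the tuple layout are illustrative and are not Hindsight's worker implementation. It shows how a cap of 100 interleaves a large bank with a small one, and marks the final round for each bank, which is where a deferred mental model refresh would run.

```python
from collections import deque

def run_consolidation(pending, max_per_round=100):
    """Simulate round-limited consolidation across banks (illustrative only).

    pending: dict mapping bank id -> number of unconsolidated memories.
    max_per_round: cap per round; 0 means unlimited.
    Returns (bank_id, processed_count, is_final_round) tuples in execution order.
    """
    queue = deque(pending.items())
    rounds = []
    while queue:
        bank_id, remaining = queue.popleft()
        if remaining <= 0:
            continue
        batch = remaining if max_per_round == 0 else min(remaining, max_per_round)
        remaining -= batch
        # The final round is where a deferred mental model refresh would happen.
        rounds.append((bank_id, batch, remaining == 0))
        if remaining > 0:
            # Limit hit: yield the slot and go to the back of the queue.
            queue.append((bank_id, remaining))
    return rounds

schedule = run_consolidation({"bulk": 250, "small": 40}, max_per_round=100)
for bank_id, count, is_final in schedule:
    print(bank_id, count, "final" if is_final else "requeued")
```

With the cap in place, the small bank is served after the large bank's first round instead of waiting for all 250 memories to finish.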
OpenAI Agents SDK Integration
The new hindsight-openai-agents package provides FunctionTool instances for retain, recall, and reflect that plug directly into the OpenAI Agents SDK Agent. Configure once with configure(), create tools anywhere, and the SDK's async runtime handles the rest. Selective tool inclusion, tag-based scoping, and global configuration are all supported out of the box.
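from agents import Agent  # OpenAI Agents SDK
from hindsight_openai_agents import configure, create_tools

# Configure the Hindsight connection once, then build the memory tools.
configure(base_url="http://localhost:8888", bank_id="my-bank")
tools = create_tools()
agent = Agent(name="assistant", tools=tools)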
Mental Model Delta Refresh
Mental model refreshes now use structured delta operations (insert, update, delete, reorder sections) instead of regenerating the entire document from scratch. This means smaller, cheaper LLM calls, less churn in observation-backed models, and a clearer history trail showing exactly what changed and why. The UI in the control plane has also been updated to show staleness signals and per-refresh history snapshots.
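To make the idea of structured delta operations concrete, here is a minimal sketch of applying a list of operations to an ordered set of sections. The operation schema (the "op", "id", "text", "index", and "order" keys) and the apply_delta helper are assumptions made for illustration; the actual format Hindsight uses is not described here.

```python
def apply_delta(sections, ops):
    """Apply delta operations to an ordered list of sections (hypothetical schema).

    sections: list of {"id": str, "text": str} dicts.
    ops: list of dicts with an "op" key: insert, update, delete, or reorder.
    Returns a new list; the input is not mutated.
    """
    result = [dict(s) for s in sections]
    for op in ops:
        kind = op["op"]
        if kind == "insert":
            # Insert at the given index, or append when no index is given.
            index = op.get("index", len(result))
            result.insert(index, {"id": op["id"], "text": op["text"]})
        elif kind == "update":
            for section in result:
                if section["id"] == op["id"]:
                    section["text"] = op["text"]
        elif kind == "delete":
            result = [s for s in result if s["id"] != op["id"]]
        elif kind == "reorder":
            # "order" lists every remaining section id in its new position.
            by_id = {s["id"]: s for s in result}
            result = [by_id[i] for i in op["order"]]
        else:
            raise ValueError(f"unknown delta op: {kind}")
    return result

model = [
    {"id": "goals", "text": "Ship v1"},
    {"id": "risks", "text": "None known"},
]
delta = [
    {"op": "update", "id": "risks", "text": "Vendor delay"},
    {"op": "insert", "id": "team", "text": "Three engineers"},
    {"op": "reorder", "order": ["team", "goals", "risks"]},
    {"op": "delete", "id": "goals"},
]
updated = apply_delta(model, delta)
```

Because each operation names the section it touches, the list of operations doubles as a record of what changed in a refresh, which is the property the history trail relies on.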
CLI Connection Profiles
The CLI now supports named connection profiles via -p/--profile. Define profiles in ~/.hindsight/profiles.toml with different base URLs, API keys, and default banks, then switch between local, staging, and production with a single flag. No more juggling environment variables.
Worker Fairness & Reliability
Three changes improve how the worker distributes and handles tasks:
- Per-tenant fair rotation. claim_batch now rotates through tenant schemas round-robin so no single tenant monopolizes worker slots, even under skewed load.
- Idempotent task submission. submit_task gracefully handles the case where a payload is already set, preventing duplicate or invalid submissions from concurrent callers.
- Extension-driven requeue. A new DeferOperation exception lets extensions explicitly request that a task be re-queued instead of failing, useful for rate-limit back-off or dependency-wait patterns.
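The round-robin rotation across tenants can be sketched in a few lines. This is an in-memory illustration under assumptions: claim_batch_fair and its arguments are hypothetical names, and the real claim_batch works against tenant schemas in the database rather than Python lists.

```python
from collections import deque

def claim_batch_fair(pending_by_tenant, batch_size):
    """Claim up to batch_size tasks, taking one per tenant per pass.

    pending_by_tenant: dict mapping tenant -> list of task ids (oldest first).
    Returns the claimed task ids in claim order; the input is not mutated.
    """
    queues = deque(
        (tenant, deque(tasks)) for tenant, tasks in pending_by_tenant.items() if tasks
    )
    claimed = []
    while queues and len(claimed) < batch_size:
        tenant, tasks = queues.popleft()
        claimed.append(tasks.popleft())
        if tasks:
            # Tenant still has work: send it to the back of the rotation.
            queues.append((tenant, tasks))
    return claimed

pending = {
    "tenant_a": ["a1", "a2", "a3", "a4", "a5"],
    "tenant_b": ["b1"],
    "tenant_c": ["c1", "c2"],
}
batch = claim_batch_fair(pending, batch_size=5)
print(batch)  # ['a1', 'b1', 'c1', 'a2', 'c2']
```

A first-come-first-served claim over the same input would hand all five slots to tenant_a; the rotation gives every tenant with pending work a slot before any tenant gets a second one.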
Reliability Fixes
- Migration chain restored. Upgrades from v0.4.22 to v0.5.x now follow the correct migration path without manual intervention.
- Apple Metal GPU crash. The jina-mlx reranker serializes Metal GPU operations to prevent SIGSEGV crashes on macOS.
- Ollama think mode. Native Ollama calls now explicitly disable "think" mode, fixing responses that included reasoning traces in the output.
- TEI reranker timeout. The TEI reranker timeout is now configurable (HINDSIGHT_API_RERANKER_TEI_HTTP_TIMEOUT), with better error messages when it fires.
- File retain. Fixed upload failures and orphaned retains, including proper handling of the timestamp field in the file retain API.
- Orphan observations. Consolidation no longer creates orphan observations when a source memory is deleted mid-run.
- Mental model max_tokens. The configured max_tokens setting is now correctly forwarded during mental model refresh.
- Control plane bank IDs. Bank IDs with special characters are now correctly encoded in URLs end-to-end.
- LLM retry defaults. Default LLM max retries reduced from 10 to 3 to avoid long delays when a provider is failing.
- Recall budget per bank. The recall thinking-budget mapping is now configurable per bank for fine-grained cost control.
- Consolidation drilldown. Failed consolidation counts are now visible with drilldown in the control plane.
- JSON log tenant field. JSON logs now include a tenant identifier and support a configurable field allowlist.
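To see why a lower retry default matters when a provider is down, the arithmetic below totals the worst-case wait under exponential backoff. The schedule (1 second base delay, doubling each attempt, no cap or jitter) is an assumption chosen for illustration; Hindsight's actual backoff parameters are not stated in these notes.

```python
def worst_case_backoff_seconds(max_retries, base_delay=1.0, factor=2.0):
    """Total sleep time across all retries, excluding the request time itself.

    Assumes an uncapped exponential schedule: base_delay * factor**attempt.
    """
    return sum(base_delay * factor**attempt for attempt in range(max_retries))

old_wait = worst_case_backoff_seconds(10)  # 1 + 2 + ... + 512 = 1023 seconds
new_wait = worst_case_backoff_seconds(3)   # 1 + 2 + 4 = 7 seconds
print(old_wait, new_wait)
```

Under this assumed schedule, a failing call gives up after about 7 seconds of waiting instead of roughly 17 minutes.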
Feedback and Community
Hindsight 0.5.3 is a drop-in replacement for 0.5.2 with no breaking changes to the core API.
Share your feedback:
For detailed changes, see the full changelog.
