What's new in Hindsight 0.6.1

May 8, 2026 · 5 min read

Hindsight Team

Hindsight 0.6.1 is a follow-up to 0.6.0 with two new LLM provider integrations (z.ai and a LiteLLM router with fallback chains), an AlloyDB ScaNN vector index for high-performance recall, map-type entity labels for structured entity extraction, an optional read-only backend for splitting recall traffic from writes, and access-key login for the Control Plane. Several reliability fixes also land — most notably retry-storm prevention in fact extraction and macOS daemon-mode compatibility with PyTorch MPS.

New LLM Providers: z.ai (智谱) and a LiteLLM router with automatic fallback chains.
AlloyDB ScaNN Vector Index: High-performance vector index for AlloyDB deployments.
Map-Type Entity Labels: Structured entity groups instead of flat string labels.
Read-Only Recall Backend: Route recall to a replica while writes stay on the primary.
Control Plane Access-Key Login: Lightweight auth for the admin UI without full SSO.
Bank Dropdown Memory Stats: The bank selector now shows per-bank memory counts at a glance.
Reliability Fixes: Fact extraction retries, macOS daemon, retain memory pressure, and more.

New LLM Providers

Two LLM-side improvements ship in 0.6.1:

z.ai (智谱) is now a first-class provider. Set HINDSIGHT_API_LLM_PROVIDER=zai and pick a model — the default uses the free-tier-friendly glm-4.5-flash so you can try it without a paid plan. Thanks to @Burgunthy for the contribution.
LiteLLM router is a new provider that wraps multiple LLM endpoints behind a single configuration with automatic fallback chains. If your primary model is rate-limited or fails, the router transparently fails over to the next candidate, so retain and reflect stay available without you wiring retry logic into your own application.
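To make the fallback-chain idea concrete, here is a minimal sketch of the control flow in plain Python. It is not the LiteLLM router or Hindsight's provider code; the function names, the provider stand-ins, and the decision to catch every exception are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order and return (name, response) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # assumption: a real router would catch only retryable errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real LLM endpoints.
def rate_limited_primary(prompt: str) -> str:
    raise RuntimeError("429 rate limited")


def healthy_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = complete_with_fallback(
    "summarize this memory",
    [("primary", rate_limited_primary), ("secondary", healthy_secondary)],
)
print(used, reply)  # secondary echo: summarize this memory
```

The caller sees a single call that either returns a response or fails once with a combined error, which is the property that keeps retry logic out of application code.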

AlloyDB ScaNN Vector Index

Google's AlloyDB ScaNN extension brings a tree-based ANN index with strong recall/throughput trade-offs at scale. Hindsight now detects ScaNN at install time and uses it as the vector index when available — large deployments running on AlloyDB get faster recall without changing schemas or queries. Thanks to @can1357 for landing this.
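As a rough sketch of what "use ScaNN when available" could look like, the function below picks index DDL based on the set of extensions the server offers (for example, names read from the pg_available_extensions view). The table and column names, the num_leaves value, and the pgvector HNSW fallback are assumptions for illustration, not Hindsight's actual migration logic.

```python
from typing import List, Set


def choose_vector_index_ddl(available_extensions: Set[str]) -> List[str]:
    """Return DDL statements for the vector index to create on this server."""
    table, column = "memory_units", "embedding"  # hypothetical table and column names
    if "alloydb_scann" in available_extensions:
        return [
            "CREATE EXTENSION IF NOT EXISTS alloydb_scann CASCADE",
            f"CREATE INDEX IF NOT EXISTS {table}_{column}_scann "
            # num_leaves is an illustrative value, not a tuned recommendation
            f"ON {table} USING scann ({column} cosine) WITH (num_leaves = 1000)",
        ]
    # Assumed fallback: a pgvector HNSW index on plain PostgreSQL.
    return [
        "CREATE EXTENSION IF NOT EXISTS vector",
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_hnsw "
        f"ON {table} USING hnsw ({column} vector_cosine_ops)",
    ]


for statement in choose_vector_index_ddl({"vector", "alloydb_scann"}):
    print(statement)
```

Because the choice is made when the index is created, application queries do not need to change, which matches the "without changing schemas or queries" behaviour described above.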

Map-Type Entity Labels

Entity extraction can now produce structured groups in addition to flat string labels. For example, an address entity can be extracted as a structured map of {street, city, country, postal_code} fields instead of a single concatenated string. This makes downstream filtering and joining far cleaner — especially for facts about people, places, and organizations that have well-defined sub-fields.

Set the entity label type to map in your bank's entity-label config to opt in.
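The difference between the two shapes is easiest to see side by side. The sketch below uses the address fields named above; the sample values and the in_city helper are hypothetical, and the way Hindsight actually stores or returns map-type labels may differ.

```python
from typing import Dict, List

# Flat label: everything concatenated into one string.
flat_label = "12 Example Street, Paris, France, 75001"

# Map-type label: the same entity as named sub-fields.
map_label: Dict[str, str] = {
    "street": "12 Example Street",
    "city": "Paris",
    "country": "France",
    "postal_code": "75001",
}

addresses: List[Dict[str, str]] = [
    map_label,
    {"street": "5 Sample Road", "city": "Lyon", "country": "France", "postal_code": "69001"},
]


def in_city(entities: List[Dict[str, str]], city: str) -> List[Dict[str, str]]:
    """Filter on one sub-field without parsing a concatenated string."""
    return [entity for entity in entities if entity["city"] == city]


print(in_city(addresses, "Paris"))
```

With the flat form, the same filter would require splitting the string and hoping every value follows the same ordering.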

Read-Only Recall Backend

Recall can now be configured to use a separate, read-only database backend, while retain and reflect continue to write through the primary. For deployments running PostgreSQL with read replicas, this lets recall traffic scale horizontally without contending with the write workload, and keeps the read path resilient to primary failovers.

Configure it via HINDSIGHT_API_READ_DATABASE_URL. When unset, recall uses the primary as before.
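The routing rule described here can be sketched as a small function. Only the HINDSIGHT_API_READ_DATABASE_URL variable comes from the release notes; the function name, the operation names as strings, and the example URLs are assumptions for illustration.

```python
from typing import Mapping

READ_ONLY_OPERATIONS = {"recall"}  # retain and reflect keep using the primary


def database_url_for(operation: str, primary_url: str, env: Mapping[str, str]) -> str:
    """Pick the database URL for an operation, falling back to the primary."""
    read_url = env.get("HINDSIGHT_API_READ_DATABASE_URL")
    if operation in READ_ONLY_OPERATIONS and read_url:
        return read_url
    return primary_url


primary = "postgresql://primary.internal/hindsight"  # hypothetical URL
env = {"HINDSIGHT_API_READ_DATABASE_URL": "postgresql://replica.internal/hindsight"}

print(database_url_for("recall", primary, env))  # replica URL
print(database_url_for("retain", primary, env))  # primary URL
print(database_url_for("recall", primary, {}))   # primary URL, variable unset
```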

Control Plane Access-Key Login

The Control Plane now supports an optional access-key login flow. This is a lightweight alternative to setting up full SSO when you just want to gate the admin UI behind a shared secret, which is useful for staging environments, internal tooling, or single-team deployments. Middleware enforces the access key on every protected route, and the login UX has been hardened with clearer errors and proper redirect handling.
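For readers unfamiliar with shared-secret gating, the check a middleware performs can be sketched as follows. This is an illustrative Python sketch, not the Control Plane's code; the public-path list, the function name, and the way the key reaches the server are all assumptions.

```python
import hmac
from typing import Optional

PUBLIC_PATHS = {"/login"}  # hypothetical: routes reachable without a key


def is_request_allowed(path: str, presented_key: Optional[str], expected_key: str) -> bool:
    """Allow public routes; otherwise require a key that matches the shared secret."""
    if path in PUBLIC_PATHS:
        return True
    if presented_key is None:
        return False
    # compare_digest runs in constant time, so timing does not leak
    # how many leading characters of the key were correct.
    return hmac.compare_digest(presented_key.encode(), expected_key.encode())


print(is_request_allowed("/banks", "s3cret", "s3cret"))  # True
print(is_request_allowed("/banks", "guess", "s3cret"))   # False
print(is_request_allowed("/login", None, "s3cret"))      # True
```

A request that fails this check would typically be redirected to the login page rather than shown an error, which is where redirect handling matters.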

Bank Dropdown Memory Stats

The Control Plane's bank selector now surfaces per-bank memory counts directly in the dropdown, so you can pick the right bank without first navigating into each one. This is useful when you're juggling many banks across users, agents, or environments: the counts give you an immediate sense of which banks are active and which are empty.

Figure: Bank dropdown with memory stats

Reliability Fixes

Several fixes improve correctness and robustness:

Fact extraction retry storms: Removed multiplicative retry layers in fact extraction. Previously, a single transient LLM error could cascade into many retries; now retries are bounded at a single layer.
macOS daemon + PyTorch MPS: Daemon mode replaces os.fork() with subprocess.Popen, fixing a long-standing crash when running with PyTorch MPS on Apple Silicon.
Retain memory pressure: Content references are now cleared after use, helping long-running workers avoid OOM on large retain batches.
batch_retain error propagation: Failed child operations now propagate their error_message to the parent, so failures surface a meaningful reason instead of a generic error.
Reflect document metadata: Reflect now reads document metadata from the original retain parameters, so reflect outputs include the correct document context.
TypeScript client version: CLIENT_VERSION is now derived via tsup define so the published package always reports the correct version.
Docker --user overrides: /home/hindsight is now chmod 755, so containers run cleanly under --user UID:GID.
Worker compatibility with older databases: The worker probes pg_proc before calling the optional schemas_with_pending_work() function, so older PostgreSQL deployments don't crash the worker.
Meta package pinning: hindsight-api, hindsight-all, and hindsight-all-slim are now hard-pinned to the matching hindsight-api-slim version, preventing stale slim installs after pip install -U.
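To see why stacked retry layers matter for the fact-extraction fix, consider the worst-case number of LLM calls when each layer retries independently. The layer counts below are hypothetical and chosen only to show the arithmetic; the release notes do not state how many layers or attempts existed before the fix.

```python
from typing import List


def worst_case_calls(attempts_per_layer: List[int]) -> int:
    """Each layer re-runs everything beneath it, so attempts multiply."""
    total = 1
    for attempts in attempts_per_layer:
        total *= attempts
    return total


# Hypothetical: three nested layers, each allowing 3 attempts.
nested = worst_case_calls([3, 3, 3])
# A single retry layer with the same per-layer budget.
single = worst_case_calls([3])

print(nested)  # 27
print(single)  # 3
```

Under these assumed numbers, one persistent transient error costs 27 calls with nested layers but only 3 with a single layer, which is the kind of amplification that bounding retries at one layer removes.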

Other Notable Changes

CLI --strategy flag: memory retain-files accepts a new --strategy flag for selecting how files are split before retention.
Worker fanout scoping: Progress-stats fanout is now scoped to schemas with pending work, eliminating wasted polling on idle tenants.
