What's new in Hindsight 0.5.1
Hindsight 0.5.1 is a polish release focused on the CLI and reliability fixes across retain, recall, and the embedded daemon. It also introduces a Cloudflare OAuth proxy for self-hosted deployments, SiliconFlow as a reranker provider, a default bank template applied at bank creation, and a new @vectorize-io/hindsight-all daemon lifecycle package.
- CLI Coverage: Every OpenAPI endpoint and request-body parameter is now reachable from the hindsight CLI.
- Cloudflare OAuth Proxy: Drop-in OAuth front door for self-hosted Hindsight.
- Default Bank Template: Apply a template manifest to every newly created bank via a single env var.
- SiliconFlow Reranker: New reranker provider option.
- Daemon Lifecycle Package: @vectorize-io/hindsight-all for programmatic daemon start/stop.
- Reliability Fixes: Recall, retain, consolidation worker, and embedded daemon fixes.
CLI Coverage
The hindsight CLI has been expanded to cover every endpoint in the OpenAPI spec, including every request-body parameter — no more dropping down to curl for operations the CLI didn't implement yet. Webhook, audit, operation, and memory-history subcommands are now fully documented.
Error output is also easier to debug: API errors now surface the HTTP response body, and the memory list command correctly labels fact types instead of showing [UNKNOWN] for every row.
Cloudflare OAuth Proxy
Self-hosted Hindsight deployments often want OAuth in front of the API without running a full identity provider. The new Cloudflare OAuth proxy is a Cloudflare Worker that sits in front of Hindsight, handles the OAuth handshake, and forwards authenticated requests downstream. It ships with tests, CI, and security hardening, and is intended as a drop-in option for teams already using Cloudflare as their edge.
Default Bank Template
A new HINDSIGHT_API_DEFAULT_BANK_TEMPLATE environment variable applies a template manifest to every newly created bank. This makes it easy to standardize bank configuration — retain mission, observations, mental models, directives — across a deployment without touching application code or post-provisioning hooks.
Pair it with the Bank Template Hub introduced in 0.5.0 to export a known-good bank once and have every new bank come up with the same baseline.
SiliconFlow Reranker
SiliconFlow is now a supported reranker provider alongside the existing options, giving self-hosted and cost-sensitive deployments another hosted reranker to choose from. Configure it like any other provider via HINDSIGHT_API_RERANKER_PROVIDER.
Daemon Lifecycle Package
@vectorize-io/hindsight-all is a new npm package that manages the lifecycle of the all-in-one daemon — start, stop, health checks — so Node-based applications can boot and shut down Hindsight locally without shelling out. Useful for local development, embedded deployments, and test harnesses that need a fresh Hindsight instance per run.
Reliability Fixes
A cluster of fixes addresses edge cases spotted in production:
- Recall RRF ranking. When the reranker is configured as a passthrough, RRF ordering is now preserved end-to-end instead of being silently replaced.
- Async query embeddings. Recall now uses the async batch embeddings path for the query, removing a blocking call on the event loop.
- Idempotent retain chunk inserts. Chunk insertion no longer retries on integrity errors, eliminating a class of duplicate-insert loops under concurrent retain.
- ANN seeds inside a transaction. The retain ANN seeds temp table is now created inside its transaction, fixing an intermittent "relation does not exist" under concurrent load.
- Consolidation slot accounting. The worker now reserves consolidation slots within the configured max_slots instead of occasionally exceeding it.
- Embedded daemon lifecycle. Daemon start and stop are now serialized so that starting a new instance can no longer kill a healthy one mid-request.
- macOS local embeddings/reranker. The macOS FORCE_CPU default for local embeddings and reranker has been restored after a regression caused GPU initialization to fail on some Apple Silicon setups.
- Reranker error surfacing. Reranker initialization now surfaces the real import error instead of a generic failure, and works around a Transformers 5.x race in the jina-mlx path.
- Reasoning models & Azure OpenAI. LLM calls now send max_completion_tokens (instead of max_tokens) for reasoning models and Azure OpenAI, fixing request failures for recent OpenAI reasoning-family models.
- Billing for reflect sub-recalls. Internal recall calls made by reflect are now marked as internal so they don't double-bill against the caller's usage.
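For readers unfamiliar with the RRF item above: reciprocal rank fusion merges several ranked lists by giving each item a score of 1/(k + rank) per list and summing. The sketch below is a generic illustration, not Hindsight's implementation. The function names, the k = 60 constant, and the sample memory IDs are all assumptions made for the example. It shows what "preserved end-to-end" means when the reranker is a passthrough.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists with reciprocal rank fusion (generic sketch)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) to the item's fused score.
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


def passthrough_rerank(candidates: Sequence[str]) -> List[str]:
    """A passthrough reranker returns candidates in the order it received them."""
    return list(candidates)


# Hypothetical rankings from two retrieval strategies.
semantic = ["m1", "m2", "m3"]
keyword = ["m2", "m3", "m4"]

fused = rrf_fuse([semantic, keyword])
final = passthrough_rerank(fused)
```

Here `final` is identical to `fused`: items that appear in both lists (m2, m3) outrank items that appear in only one, and the passthrough step leaves that order untouched.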
Documentation
- Mental model tag filtering now clarifies that tags filter the source memories used during refresh.
- Audit logging is clearly documented as off by default.
- The retain update_mode parameter is fully documented.
- CLI reference gains docs for webhook, audit, operation, and memory history subcommands.
Feedback and Community
Hindsight 0.5.1 is a drop-in replacement for 0.5.0 with no breaking changes to the core API.
Share your feedback:
For detailed changes, see the full changelog.
