What's new in Hindsight 0.5.5

April 28, 2026 · 6 min read

Hindsight Team

Warning: 0.5.5 contains a regression that can cause retain to extract 0 facts. The cause, the JSON schema simplification described under LLM compatibility fixes, was reverted in 0.5.6. Please upgrade to 0.5.6.

Hindsight 0.5.5 ships a brand‑new LlamaParse file parser for PDFs/DOCX/etc., a redesigned Mental Models page in the control plane, a new Pipecat voice‑agent integration, full Windows support for the embedded runtime, and a wave of LLM‑provider compatibility fixes (DeepSeek, Bedrock Converse, Ollama, LiteLLM streaming). It also includes one small breaking change worth a 30‑second read before you upgrade.

Breaking change: GET /v1/default/banks/{bank_id}/profile no longer auto‑creates banks.
LlamaParse file parser: high‑fidelity PDF/DOCX → markdown via LlamaIndex Cloud.
Mental Models — List view: new default split‑pane view with server‑side tag filtering.
Pipecat integration: persistent memory for voice agents.
Windows support for embedded mode.
LLM compatibility fixes: DeepSeek, Bedrock Converse, Ollama, LiteLLM streaming.

Breaking change: bank profile no longer auto‑creates banks

GET /v1/default/banks/{bank_id}/profile previously created the bank if it didn't exist and returned a default profile. That side effect made it easy to accidentally produce empty banks (e.g. from a typo in bank_id) and complicated tenancy / lifecycle reasoning. As of 0.5.5, the endpoint returns 404 when the bank doesn't exist — same as every other read endpoint.

Retain still works on non‑existing banks. This change does not affect the write path: calling retain (or any of the memory‑write endpoints) against a bank that doesn't exist still creates it implicitly, exactly as before. You don't need to call POST /v1/default/banks first, and you don't need to "warm up" a bank with a profile call. If you want a bank to exist, just write to it.

What you might need to change. Only callers that treated GET /profile as an idempotent "create or fetch" call need to update. Replace the bootstrap profile call with either an explicit POST /v1/default/banks (if you specifically want a bank with no memories) or simply your first retain call (recommended — same end state, one fewer round trip).
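For callers that still want to read the profile, the 404 can be treated as "this bank does not exist" rather than as a failure. The following is a minimal sketch using only the Python standard library, not the official client. The endpoint path comes from this post; the base URL, the JSON response body, and the names `get_bank_profile` and `opener` are assumptions made for illustration.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote


def get_bank_profile(base_url, bank_id, opener=urllib.request.urlopen):
    """Return the bank profile as a dict, or None if the bank does not exist.

    `opener` is a hypothetical injection point so the function can be
    exercised without a running server.
    """
    url = f"{base_url.rstrip('/')}/v1/default/banks/{quote(bank_id, safe='')}/profile"
    try:
        with opener(url) as response:
            # Assumption: the profile is returned as a JSON document.
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # 0.5.5 and later: the bank is missing and was NOT created.
            return None
        raise  # any other HTTP error is still an error
```

A `None` result means no bank was created as a side effect, so a typo in `bank_id` shows up immediately instead of leaving an empty bank behind.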

LlamaParse file parser

Hindsight already supported file uploads — every file you retain is converted to markdown, chunked, and stored as memories. Until now you had two parser choices: the bundled markitdown parser (good for simple files, runs in‑process) and Iris (Vectorize's hosted parser, better for complex layouts). 0.5.5 adds a third option: LlamaParse, the LlamaIndex Cloud document parsing API.

LlamaParse handles the messy real‑world PDFs that markitdown stumbles on — multi‑column papers, scanned receipts, slide decks with embedded tables, contracts with form fields. It's a paid service (you bring your own LlamaIndex Cloud API key), but it's the highest‑fidelity option for these kinds of documents.

Enabling it. Set the parser via configuration (env var HINDSIGHT_API_FILE_PARSER plus your LlamaParse API key):

HINDSIGHT_API_FILE_PARSER=llama_parse
HINDSIGHT_API_FILE_PARSER_LLAMA_PARSE_API_KEY=llx-...

Then upload files exactly like before — the parser is selected at the engine level, so no client changes are needed. Polling interval and timeout are tunable for very large or slow documents.
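Because the parser is chosen by configuration rather than by the client, a missing key is easy to overlook until the first upload. A small preflight check like the one below can catch it earlier. This is only a sketch: the two variable names come from this post, while the function name `llamaparse_config_problems` is hypothetical and not part of Hindsight.

```python
import os


def llamaparse_config_problems(env=os.environ):
    """Return a list of human-readable problems with the LlamaParse settings.

    An empty list means both variables named in the release notes are set.
    """
    problems = []
    parser = env.get("HINDSIGHT_API_FILE_PARSER", "")
    if parser != "llama_parse":
        problems.append(
            f"HINDSIGHT_API_FILE_PARSER is {parser!r}, expected 'llama_parse'"
        )
    if not env.get("HINDSIGHT_API_FILE_PARSER_LLAMA_PARSE_API_KEY"):
        problems.append("HINDSIGHT_API_FILE_PARSER_LLAMA_PARSE_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in llamaparse_config_problems():
        print("config problem:", problem)
```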

Mental Models — List view

The Mental Models page in the control plane has a new default List view: a sidebar of mental models on the left (each entry shows name, source query, and last‑refreshed time) and the rendered markdown content on the right — file‑browser style.

The page also gains a server‑side tag filter with autocomplete from the actual tags present on your mental models, plus an any / all match toggle when you select more than one tag. The autocomplete is powered by a new ?source=mental_models parameter on the existing GET /v1/default/banks/{bank_id}/tags endpoint, so the same primitive can list memory tags or mental‑model tags depending on caller intent.
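As a rough illustration of the tag primitive described above, the sketch below builds the request URL for either tag source and spells out one reading of the any / all toggle. The endpoint path and the `source=mental_models` value come from this post. The base URL and the function names are hypothetical, and `matches_tags` only mirrors the assumed semantics; the real filtering happens on the server.

```python
from urllib.parse import quote, urlencode


def tags_url(base_url, bank_id, source=None):
    """Build the tags endpoint URL; source=None means no source parameter is sent."""
    url = f"{base_url.rstrip('/')}/v1/default/banks/{quote(bank_id, safe='')}/tags"
    if source is not None:
        url += "?" + urlencode({"source": source})
    return url


def matches_tags(model_tags, selected, mode="any"):
    """Assumed toggle semantics: 'any' = at least one selected tag is present,
    'all' = every selected tag is present. No selection matches everything."""
    selected = set(selected)
    if not selected:
        return True
    model_tags = set(model_tags)
    if mode == "all":
        return selected <= model_tags
    if mode == "any":
        return bool(selected & model_tags)
    raise ValueError(f"unknown match mode: {mode!r}")
```

With two tags selected, a mental model tagged only with the first one would appear under "any" but not under "all".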

The previous "Table" view has been removed; the card "Dashboard" view stays as a secondary toggle.

Pipecat voice agent integration

A new Pipecat integration brings Hindsight memory into the voice‑AI pipeline: store every turn, recall relevant context, and inject it into the LLM prompt. Available as hindsight-pipecat on PyPI. Thanks to @benfrank241 for the contribution.

Windows support for embedded mode

hindsight-embed (and the embedded hindsight-all) now run cleanly on Windows. The daemon manager handles Windows process groups and atomic metadata writes, and CI installs the Rust CLI alongside the Python wheel for sibling‑binary verification. If you've been waiting on this to evaluate Hindsight on a Windows dev box, give it a try.

LLM compatibility fixes

A handful of fixes make Hindsight play nicer with a wider range of LLM providers:

DeepSeek is now a first‑class provider; tool‑calling quirks are handled.
Amazon Bedrock Converse previously rejected our causal‑relations Pydantic model because of unsupported numeric constraints; that's been simplified.
Ollama / Hermes / MiniMax sometimes wrap JSON responses in markdown code fences even with json_object response format. The fact‑extraction parser now strips those before parsing, so retain stops silently producing 0 facts on these providers.
LiteLLM streaming conversations are stored correctly (a streaming chunk‑aggregation bug was breaking retain on streamed responses).
JSON schemas sent to LLMs were simplified to drop $ref/$defs/anyOf/const/oneOf constructs that Ollama's grammar engine, and other strict structured‑output engines, don't fully support. This change was reverted in 0.5.6: the inlined schema caused some LLMs to echo the schema structure instead of producing valid responses, resulting in 0 facts extracted.
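The code‑fence fix for Ollama / Hermes / MiniMax is the easiest of these to picture. The sketch below shows the general technique of unwrapping a fenced response before JSON parsing. It is not Hindsight's actual parser; the names `parse_llm_json` and `FENCE` are made up for this example, and the `facts` key in the usage note is just sample data.

```python
import json
import re

# Three backticks, built programmatically so this example does not
# contain a literal fence of its own.
FENCE = "`" * 3

# Matches a response that consists solely of one fenced block, with an
# optional language tag (for example "json") after the opening fence.
_FENCED = re.compile(
    r"^\s*" + FENCE + r"[A-Za-z0-9_-]*[ \t]*\n(.*?)\n?[ \t]*" + FENCE + r"\s*$",
    re.DOTALL,
)


def parse_llm_json(raw):
    """Parse a JSON response, unwrapping a surrounding code fence if present."""
    match = _FENCED.match(raw)
    payload = match.group(1) if match else raw
    return json.loads(payload)
```

Usage: `parse_llm_json(FENCE + 'json\n{"facts": []}\n' + FENCE)` and `parse_llm_json('{"facts": []}')` both return the same dictionary, so providers that ignore the json_object response format no longer fail at the parsing step.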

Other notable changes

Cohere embedding output dimensions are now configurable.
Force IPv4 for Gemini embeddings in restricted network environments.
Statement timeout for PostgreSQL connections, configurable per deployment.
Operation status API and UI now expose processing and cancelled states; responses include retry count and next retry time.
Reduced retrieval fan‑out during consolidation recall to keep memory usage and latency in check on large banks.
exclude_parents filter on list_operations to suppress parent rollups when scanning leaves.
Document Chunks API + reprocessing UI for fine‑grained inspection and re‑ingestion of uploaded files.
Asyncpg‑style PostgreSQL URLs are accepted by the external DB connection string.
First‑time UI launch no longer asks you to manually confirm the control‑plane install.
MCP delete_memory removed to close an authorization bypass.

Feedback and Community

For detailed changes, see the full changelog.