What's new in Hindsight 0.4.19
Hindsight 0.4.19 adds Agno and Hermes Agent integrations, three new retain extraction modes with named per-call strategies, Deno support for the TypeScript client, and a recovery mechanism for consolidation failures.
- Agno Integration: Add persistent memory to Agno agents with a native Toolkit.
- Hermes Agent Integration: Give Hermes agents long-term memory via a zero-config plugin.
- New Retain Modes: verbatim, chunks, and named strategies for mixed-content banks.
- TypeScript Deno Compatibility: Use the TypeScript client in Deno environments.
Agno Integration
hindsight-agno is a new integration package that adds persistent memory to Agno agents using Hindsight's native Toolkit pattern—the same pattern as Agno's built-in Mem0Tools.
pip install hindsight-agno
Pass HindsightTools directly to your agent's tools list:
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from hindsight_agno import HindsightTools
agent = Agent(
    model=OpenAIChat(id="gpt-4o-mini"),
    tools=[HindsightTools(
        bank_id="user-123",
        hindsight_api_url="http://localhost:8888",
    )],
)
agent.print_response("Remember that I prefer dark mode")
agent.print_response("What are my preferences?")
The toolkit registers three tools the agent can call: retain_memory (store), recall_memory (search), and reflect_on_memory (synthesize). You can include any combination by toggling enable_retain, enable_recall, and enable_reflect.
Per-user bank isolation is built in. If you don't pass a bank_id, the toolkit resolves it from RunContext.user_id automatically—so each user gets their own isolated memory bank with no extra code. A custom bank_resolver callable is also supported for more complex routing:
tools=[HindsightTools(bank_resolver=lambda ctx: f"team-{ctx.user_id}")]
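For illustration, a resolver can branch on whatever the run context carries. Here is a hypothetical one that prefers a `team_id` attribute (`team_id` is an assumption for this sketch; `RunContext.user_id` is the documented fallback):

```python
# Hypothetical resolver: route to a per-team bank when the run context
# carries a team_id (an assumed attribute), otherwise fall back to a
# per-user bank keyed on the documented RunContext.user_id.
def team_bank_resolver(ctx):
    team = getattr(ctx, "team_id", None)
    return f"team-{team}" if team else f"user-{ctx.user_id}"

# tools=[HindsightTools(bank_resolver=team_bank_resolver)]
```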
Memory instructions let you pre-recall relevant context at startup and inject it into the agent's system prompt, so the agent starts every conversation with relevant memories already loaded:
from hindsight_agno import HindsightTools, memory_instructions
agent = Agent(
    model=OpenAIChat(id="gpt-4o-mini"),
    tools=[HindsightTools(bank_id="user-123", hindsight_api_url="http://localhost:8888")],
    instructions=[memory_instructions(
        bank_id="user-123",
        hindsight_api_url="http://localhost:8888",
    )],
)
See the Agno integration documentation for the full API reference.
Hermes Agent Integration
hindsight-hermes is a new plugin package for Hermes Agent (NousResearch). It uses Hermes's plugin discovery system—no code changes required—and registers three tools under a [hindsight] toolset.
uv pip install hindsight-hermes --python $HOME/.hermes/hermes-agent/venv/bin/python
Set environment variables and start Hermes:
export HINDSIGHT_API_URL=http://localhost:8888
export HINDSIGHT_BANK_ID=my-agent
hermes
Type /tools to verify the plugin loaded:
[hindsight]
* hindsight_recall - Search long-term memory for relevant information.
* hindsight_reflect - Synthesize a thoughtful answer from long-term memories.
* hindsight_retain - Store information to long-term memory for later retrieval.
Hermes has its own built-in memory tool that writes to local files. Since the LLM will prefer the one it's most familiar with, disable it so Hermes uses Hindsight instead:
hermes tools disable memory
If neither HINDSIGHT_API_URL nor HINDSIGHT_API_KEY is set, the plugin silently skips registration and Hermes starts normally.
See the Hermes integration documentation for setup and troubleshooting.
New Retain Modes
Three additions to retain_extraction_mode give you control over how much LLM work happens per ingested document—from full extraction down to zero.
verbatim — preserve original text, extract metadata only
In the default (concise) mode, the LLM rewrites each chunk into compact extracted facts. verbatim mode skips that rewriting step: each chunk is stored exactly as it appears in the source, one memory per chunk. The LLM still runs, but only to extract entities, temporal information, and location—not to paraphrase the content. The fact text is the original chunk verbatim.
This is useful for RAG-style indexing and benchmarks where preserving the source wording matters, or when downstream systems need to display the exact original text in recall results.
HINDSIGHT_API_RETAIN_EXTRACTION_MODE=verbatim
chunks — zero LLM cost
chunks skips the LLM entirely. Chunks are stored as-is with no entity extraction and no temporal indexing—only embeddings are generated for semantic search. Any entities you pass via RetainContent.entities are used directly. This is the fastest and cheapest retain mode; use it when ingestion speed and cost matter more than structured metadata.
HINDSIGHT_API_RETAIN_EXTRACTION_MODE=chunks
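Since chunks mode skips entity extraction, any entities must arrive with the payload. A minimal sketch, assuming the item-dict shape used by the retain calls elsewhere in these notes (the key names mirror RetainContent):

```python
# In chunks mode, entities supplied on the item are indexed as-is;
# nothing is extracted by an LLM, and only embeddings are generated.
item = {
    "content": "Ticket closed by Alice after the login fix shipped.",
    "entities": ["Alice"],
}
# client.retain(bank_id, items=[item])
```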
Named retain strategies — mix content types in one bank
Named strategies let you configure multiple extraction profiles on a single bank and select among them at retain time. Any hierarchical config field can be overridden per strategy, including retain_extraction_mode, retain_chunk_size, entity_labels, retain_mission, and more.
Configure strategies via the bank config API:
{
  "retain_default_strategy": "conversations",
  "retain_strategies": {
    "conversations": {
      "retain_extraction_mode": "concise",
      "retain_chunk_size": 3000
    },
    "documents": {
      "retain_extraction_mode": "verbatim",
      "retain_chunk_size": 800
    }
  }
}
Then specify the strategy per retain call:
# Uses default strategy ("conversations")
client.retain(bank_id, items=[{"content": "Alice joined the team today"}])
# Use document strategy for this item
client.retain(bank_id, items=[{"content": "...document text..."}], strategy="documents")
If no strategy is specified, retain_default_strategy is used. If neither is set, the bank/global config applies directly. Each item in a batch can also carry its own strategy field for fine-grained control.
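That selection order can be modeled as a small pure function (illustrative only, not the server's actual implementation):

```python
def resolve_strategy(item: dict, call_strategy=None, default_strategy=None):
    # Precedence: the item's own strategy field, then the per-call strategy
    # argument, then the bank's retain_default_strategy. None means the
    # bank/global config applies directly.
    return item.get("strategy") or call_strategy or default_strategy
```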
TypeScript Deno Compatibility
The TypeScript client (hindsight-client) now works in Deno environments. The build was migrated from tsc to tsup, producing dual CJS + ESM output with a proper exports field. The AI SDK and chat integrations were updated in the same pass.
No installation needed — import directly via the npm: specifier:
import { HindsightClient } from "npm:@vectorize-io/hindsight-client";
Import paths and API are otherwise unchanged. No code changes are required if you're migrating an existing project from Node.js to Deno.
Other Updates
Improvements
- Local reranker performance: FP16 inference (HINDSIGHT_API_RERANKER_LOCAL_FP16=true) and length-sorted bucket batching (HINDSIGHT_API_RERANKER_LOCAL_BUCKET_BATCHING=true) are now available as opt-in flags. Both default to off to preserve existing behavior. FP16 benefits GPU/MPS hardware; bucket batching reduces padding overhead and produces a 36–54% speedup in benchmarks. This release also fixes XLM-RoBERTa model loading under Transformers 5.x.
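To try both opt-in flags together (values as documented above):

```shell
# Opt in to FP16 inference (GPU/MPS) and length-sorted bucket batching.
export HINDSIGHT_API_RERANKER_LOCAL_FP16=true
export HINDSIGHT_API_RERANKER_LOCAL_BUCKET_BATCHING=true
```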
Bug Fixes
- Prevented silent memory loss when consolidation LLM calls fail. Previously, a batch that exhausted all retries was marked consolidated and permanently excluded from future runs—losing those memories silently. Now the batch is split in half and retried recursively down to a single item (recovering most transient failures); only individual items that still fail all retries are tracked in a new consolidation_failed_at column. A new API endpoint, POST /v1/default/banks/{bank_id}/consolidation/recover, resets failed memories so they are picked up on the next consolidation run.
- Fixed Docker control-plane startup to respect HINDSIGHT_CP_HOSTNAME when constructing the API URL.
- Database cleanup migration removes orphaned observation memory units—rows left behind before the document delete cascade was fixed in 0.4.18—to prevent inconsistent memory state on existing installations.
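As a sketch, the recovery endpoint can be invoked from Python's standard library; the URL path is as documented above, and error handling is omitted:

```python
import urllib.request

def consolidation_recover_request(api_url: str, bank_id: str) -> urllib.request.Request:
    # Build a POST to the consolidation recovery endpoint; pass the result
    # to urllib.request.urlopen(...) to execute it.
    url = f"{api_url.rstrip('/')}/v1/default/banks/{bank_id}/consolidation/recover"
    return urllib.request.Request(url, method="POST")
```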
Feedback and Community
Hindsight 0.4.19 is a drop-in replacement for 0.4.x with no breaking changes.
Share your feedback:
For detailed changes, see the full changelog.
