
What's new in Hindsight 0.4.19

6 min read
Nicolò Boschi
Hindsight Team

Hindsight 0.4.19 adds Agno and Hermes Agent integrations, three new retain extraction modes with named per-call strategies, Deno support for the TypeScript client, and a recovery mechanism for consolidation failures.

Agno Integration

hindsight-agno is a new integration package that adds persistent memory to Agno agents using Hindsight's native Toolkit pattern—the same pattern as Agno's built-in Mem0Tools.

pip install hindsight-agno

Pass HindsightTools directly to your agent's tools list:

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from hindsight_agno import HindsightTools

agent = Agent(
    model=OpenAIChat(id="gpt-4o-mini"),
    tools=[HindsightTools(
        bank_id="user-123",
        hindsight_api_url="http://localhost:8888",
    )],
)

agent.print_response("Remember that I prefer dark mode")
agent.print_response("What are my preferences?")

The toolkit registers three tools the agent can call: retain_memory (store), recall_memory (search), and reflect_on_memory (synthesize). You can include any combination by toggling enable_retain, enable_recall, and enable_reflect.
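
As a minimal sketch of that flag-to-tool mapping (the `registered_tools` helper is hypothetical; only the flag and tool names come from the toolkit described above):

```python
def registered_tools(enable_retain=True, enable_recall=True, enable_reflect=True):
    """Return the tool names an agent would see for a given flag combination."""
    tools = []
    if enable_retain:
        tools.append("retain_memory")   # store
    if enable_recall:
        tools.append("recall_memory")   # search
    if enable_reflect:
        tools.append("reflect_on_memory")  # synthesize
    return tools
```

A read-only agent, for example, would pass `enable_retain=False` and end up with only the recall and reflect tools.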

Per-user bank isolation is built in. If you don't pass a bank_id, the toolkit resolves it from RunContext.user_id automatically—so each user gets their own isolated memory bank with no extra code. A custom bank_resolver callable is also supported for more complex routing:

tools=[HindsightTools(bank_resolver=lambda ctx: f"team-{ctx.user_id}")]

Memory instructions let you pre-recall relevant context at startup and inject it into the agent's system prompt, so the agent starts every conversation with relevant memories already loaded:

from hindsight_agno import HindsightTools, memory_instructions

agent = Agent(
    model=OpenAIChat(id="gpt-4o-mini"),
    tools=[HindsightTools(bank_id="user-123", hindsight_api_url="http://localhost:8888")],
    instructions=[memory_instructions(
        bank_id="user-123",
        hindsight_api_url="http://localhost:8888",
    )],
)

See the Agno integration documentation for the full API reference.

Hermes Agent Integration

hindsight-hermes is a new plugin package for Hermes Agent (NousResearch). It uses Hermes's plugin discovery system—no code changes required—and registers three tools under a [hindsight] toolset.

uv pip install hindsight-hermes --python $HOME/.hermes/hermes-agent/venv/bin/python

Set environment variables and start Hermes:

export HINDSIGHT_API_URL=http://localhost:8888
export HINDSIGHT_BANK_ID=my-agent
hermes

Type /tools to verify the plugin loaded:

[hindsight]
* hindsight_recall - Search long-term memory for relevant information.
* hindsight_reflect - Synthesize a thoughtful answer from long-term memories.
* hindsight_retain - Store information to long-term memory for later retrieval.

Hermes has its own built-in memory tool that writes to local files. Since the LLM will prefer the one it's most familiar with, disable it so Hermes uses Hindsight instead:

hermes tools disable memory

If neither HINDSIGHT_API_URL nor HINDSIGHT_API_KEY is set, the plugin silently skips registration and Hermes starts normally.
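
The guard amounts to a simple environment check. A minimal sketch (the `should_register` helper is hypothetical; only the two variable names come from the plugin described above):

```python
import os

def should_register() -> bool:
    """Register the Hindsight tools only when at least one of the
    two environment variables is present; otherwise skip silently."""
    return bool(os.getenv("HINDSIGHT_API_URL") or os.getenv("HINDSIGHT_API_KEY"))
```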

See the Hermes integration documentation for setup and troubleshooting.

New Retain Modes

Three additions to retain_extraction_mode give you control over how much LLM work happens per ingested document—from full extraction down to zero.

verbatim — preserve original text, extract metadata only

In the default (concise) mode, the LLM rewrites each chunk into compact extracted facts. verbatim mode skips that rewriting step: each chunk is stored exactly as it appears in the source, one memory per chunk. The LLM still runs, but only to extract entities, temporal information, and location—not to paraphrase the content. The fact text is the original chunk verbatim.

This is useful for RAG-style indexing and benchmarks where preserving the source wording matters, or when downstream systems need to display the exact original text in recall results.

HINDSIGHT_API_RETAIN_EXTRACTION_MODE=verbatim

chunks — zero LLM cost

chunks skips the LLM entirely. Chunks are stored as-is with no entity extraction and no temporal indexing—only embeddings are generated for semantic search. Any entities you pass via RetainContent.entities are used directly. This is the fastest and cheapest retain mode; use it when ingestion speed and cost matter more than structured metadata.

HINDSIGHT_API_RETAIN_EXTRACTION_MODE=chunks
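
To contrast the three modes, here is a minimal sketch of what each one stores per chunk (the `retain_chunk` helper and its LLM callables are hypothetical; the mode semantics come from the descriptions above):

```python
def retain_chunk(chunk, mode, llm_extract=None, llm_rewrite=None):
    """Illustrate per-chunk storage for the three extraction modes."""
    if mode == "chunks":
        # No LLM at all: store the chunk as-is, embeddings only.
        return {"fact": chunk, "metadata": None}
    if mode == "verbatim":
        # LLM extracts metadata (entities, time, location) but the
        # fact text is the original chunk, word for word.
        return {"fact": chunk, "metadata": llm_extract(chunk)}
    # Default "concise": the LLM both rewrites and extracts.
    return {"fact": llm_rewrite(chunk), "metadata": llm_extract(chunk)}
```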

Named retain strategies — mix content types in one bank

Named strategies let you configure multiple extraction profiles on a single bank and select among them at retain time. Any hierarchical config field can be overridden per strategy, including retain_extraction_mode, retain_chunk_size, entity_labels, retain_mission, and more.

Configure strategies via the bank config API:

{
  "retain_default_strategy": "conversations",
  "retain_strategies": {
    "conversations": {
      "retain_extraction_mode": "concise",
      "retain_chunk_size": 3000
    },
    "documents": {
      "retain_extraction_mode": "verbatim",
      "retain_chunk_size": 800
    }
  }
}

Then specify the strategy per retain call:

# Uses default strategy ("conversations")
client.retain(bank_id, items=[{"content": "Alice joined the team today"}])

# Use document strategy for this item
client.retain(bank_id, items=[{"content": "...document text..."}], strategy="documents")

If no strategy is specified, retain_default_strategy is used. If neither is set, the bank/global config applies directly. Each item in a batch can also carry its own strategy field for fine-grained control.
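
That precedence can be sketched in a few lines (the `resolve_strategy` helper is hypothetical; the resolution order comes from the paragraph above):

```python
def resolve_strategy(item, call_strategy=None, default_strategy=None):
    """Pick the strategy for one item: item-level strategy wins,
    then the per-call strategy, then retain_default_strategy;
    None means the bank/global config applies directly."""
    return item.get("strategy") or call_strategy or default_strategy
```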

TypeScript Deno Compatibility

The TypeScript client (hindsight-client) now works in Deno environments. The build was migrated from tsc to tsup, producing dual CJS + ESM output with a proper exports field. The AI SDK and chat integrations were updated in the same pass.

No installation needed — import directly via the npm: specifier:

import { HindsightClient } from "npm:@vectorize-io/hindsight-client";

Import paths and API are otherwise unchanged. No code changes are required if you're migrating an existing project from Node.js to Deno.

Other Updates

Improvements

  • Local reranker performance: FP16 inference (HINDSIGHT_API_RERANKER_LOCAL_FP16=true) and length-sorted bucket batching (HINDSIGHT_API_RERANKER_LOCAL_BUCKET_BATCHING=true) are now available as opt-in flags. Both default to off to preserve existing behavior. FP16 benefits GPU/MPS hardware; bucket batching reduces padding overhead and produces a 36–54% speedup in benchmarks. Also fixes XLM-RoBERTa model loading under Transformers 5.x.

Bug Fixes

  • Prevented silent memory loss when consolidation LLM calls fail. Previously, a batch that exhausted all retries was marked consolidated and permanently excluded from future runs—losing those memories silently. Now: the batch is split in half and retried recursively down to a single item (recovering most transient failures); only individual items that still fail all retries are tracked in a new consolidation_failed_at column. A new API endpoint POST /v1/default/banks/{bank_id}/consolidation/recover resets failed memories so they are picked up on the next consolidation run.
  • Fixed Docker control-plane startup to respect HINDSIGHT_CP_HOSTNAME when constructing the API URL.
  • Database cleanup migration removes orphaned observation memory units—rows left behind before the document delete cascade was fixed in 0.4.18—to prevent inconsistent memory state on existing installations.
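
The consolidation recovery in the first fix above is essentially a recursive bisection over the failing batch. A minimal sketch (the function and its callbacks are hypothetical, and per-batch retries are elided; the halving-down-to-one-item behavior comes from the description above):

```python
def consolidate_with_recovery(batch, consolidate, failed):
    """Try to consolidate a batch; on failure, split it in half and
    recurse, so only the individual items that still fail are lost
    to the failed list (tracked via consolidation_failed_at)."""
    try:
        consolidate(batch)
    except Exception:
        if len(batch) == 1:
            failed.append(batch[0])
        else:
            mid = len(batch) // 2
            consolidate_with_recovery(batch[:mid], consolidate, failed)
            consolidate_with_recovery(batch[mid:], consolidate, failed)
```

With this shape, one poisoned item no longer takes its whole batch down with it.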

Feedback and Community

Hindsight 0.4.19 is a drop-in replacement for 0.4.x with no breaking changes.

Share your feedback with the Hindsight team.

For detailed changes, see the full changelog.