What's new in Hindsight 0.4.18
Hindsight 0.4.18 adds compound tag filtering with boolean logic, slimmer Python packages, MiniMax and Jina MLX (Apple Silicon) provider support, per-bank HNSW vector indexes, and a configurable recall query token limit.
- Compound Tag Filtering: Express complex AND/OR/NOT tag predicates with tag_groups.
- Slim Python Packages: Install only what you need with optional ML and embedded-DB extras.
- MiniMax LLM Provider: Use MiniMax-M2.5 (204K context) as your LLM backend.
- Jina MLX Reranker: Native Apple Silicon reranking with no external API required.
- Per-Bank HNSW Indexes: Faster semantic search via dedicated vector indexes per bank.
Compound Tag Filtering
Previously, tag filtering supported a flat list of tags with a single match mode (any, all, any_strict, or all_strict). That works well for simple scoping but falls short when you need a predicate like "user A, in one of these workflow steps, but not archived."
tag_groups adds a recursive boolean filter language to recall and reflect. Each group is either a leaf node with a tag list and match mode, or a compound node using and, or, or not. Top-level groups are AND-ed together.
Step filter AND user scope — two groups AND-ed at the top level:
```json
{
  "tag_groups": [
    { "tags": ["step:5", "step:8", "step:12"], "match": "any_strict" },
    { "tags": ["user:ep_42"], "match": "all_strict" }
  ]
}
```
Nested OR — user must match, plus either step OR priority flag:
```json
{
  "tag_groups": [
    { "tags": ["user:alice"], "match": "all_strict" },
    { "or": [
      { "tags": ["step:5"], "match": "any_strict" },
      { "tags": ["priority:high"], "match": "all_strict" }
    ]}
  ]
}
```
Exclusion — user must match, but archived memories are excluded:
```json
{
  "tag_groups": [
    { "tags": ["user:alice"], "match": "all_strict" },
    { "not": { "tags": ["archived"], "match": "any_strict" } }
  ]
}
```
tag_groups can also be combined with the existing tags / tags_match fields — they are AND-ed together. This makes it easy to add compound filtering incrementally to existing recall calls without rewriting the simpler filters you already have.
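For instance, a recall request that keeps an existing flat filter and layers a compound exclusion on top might look like this (both filters are AND-ed):

```json
{
  "tags": ["user:alice"],
  "tags_match": "all_strict",
  "tag_groups": [
    { "not": { "tags": ["archived"], "match": "any_strict" } }
  ]
}
```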
See the recall documentation for the full reference.
Slim Python Packages
By default, hindsight-api bundles everything needed to run embeddings and reranking locally—PyTorch, sentence-transformers, transformers, and MLX. These local ML libraries are what make zero-config deployments work out of the box, but they're heavy (several GB of dependencies). If you're already pointing Hindsight at an external embedding or reranking service (TEI, Cohere, ZeroEntropy, LiteLLM), you don't need any of that.
We already shipped a slim Docker image variant for this use case. That same split is now available on PyPI.
The package is now split into optional extras:
| Install | What you get |
|---|---|
| pip install hindsight-api-slim | Core API server; no local ML models |
| pip install "hindsight-api-slim[local-ml]" | Adds PyTorch, sentence-transformers, MLX (Apple Silicon) |
| pip install "hindsight-api-slim[embedded-db]" | Adds pg0-embedded (local PostgreSQL) |
| pip install "hindsight-api-slim[all]" | Local ML + embedded DB; equivalent to the old hindsight-api |
| pip install hindsight-api | Unchanged; still installs the full stack for backward compatibility |
The base hindsight-api-slim package includes the FastAPI server, all LLM provider integrations (OpenAI, Anthropic, Gemini, Groq, MiniMax, and more), Cohere cloud embeddings/reranking, LiteLLM, MCP, and all API functionality. Only the local ML model dependencies are gated behind the [local-ml] extra.
If you use external embedding and reranking providers, you can now get a significantly leaner Python environment with pip install hindsight-api-slim.
MiniMax LLM Provider
MiniMax is now a supported LLM backend. MiniMax-M2.5 is the default model for this provider, with a 204K-token context window.
```
HINDSIGHT_API_LLM_PROVIDER=minimax
HINDSIGHT_API_LLM_API_KEY=your_minimax_key
HINDSIGHT_API_LLM_MODEL=MiniMax-M2.5  # optional; this is the default
```
MiniMax uses an OpenAI-compatible API, so no additional dependencies are required. One provider-specific detail: MiniMax requires temperature values in the (0.0, 1.0] range, so values outside that range are automatically clamped.
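The clamping behavior can be illustrated with a small helper. This is a hypothetical sketch, not the server's actual code; the epsilon used for zero temperatures is an assumption.

```python
def clamp_minimax_temperature(temp: float, eps: float = 0.01) -> float:
    """Clamp a temperature into MiniMax's accepted (0.0, 1.0] range.

    Illustrative only: 0.0 and negative values are bumped to a small
    positive epsilon (the interval is open at 0.0), and values above
    1.0 are capped at 1.0.
    """
    if temp <= 0.0:
        return eps
    return min(temp, 1.0)
```

So a caller passing `temperature=0.0` or `temperature=1.5` would still get a valid request rather than a provider error.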
Jina MLX Reranker
A large share of Hindsight users run locally on Mac. The default local reranker works fine at low concurrency, but under higher load it becomes the bottleneck—it runs on CPU via PyTorch, which doesn't take advantage of the GPU cores in Apple Silicon at all. This is especially noticeable when many recall requests are in flight simultaneously.
The jina-mlx provider solves this by running the reranker natively on MLX, Apple's ML framework that targets the unified memory and GPU/Neural Engine of M-series chips directly. The result is significantly lower latency per rerank call and much better throughput under concurrent load—without needing an external reranking API.
```
HINDSIGHT_API_RERANKER_PROVIDER=jina-mlx
```
The provider runs jinaai/jina-reranker-v3-mlx, a 0.6B multilingual cross-encoder. The model (~1.2 GB) is downloaded from HuggingFace Hub automatically on first startup and cached locally.
It's also a listwise reranker, meaning it scores all candidates in a single forward pass rather than one document at a time. Latency scales sub-linearly with the number of candidates, which helps further under high recall load.
The jina-reranker-v3-mlx model is licensed under CC BY-NC 4.0. Contact Jina AI for commercial usage.
Per-Bank HNSW Indexes
Semantic search now uses dedicated partial HNSW vector indexes per bank and fact type, rather than a single global index.
Previously, when a recall query filtered by bank_id, PostgreSQL's planner would use a B-tree index on that column and then scan the embedding values sequentially. The fact-type-only partial HNSW indexes that existed were never selected because the planner had a cheaper option for the bank_id filter. The result: semantic search on large banks was slower than it should be.
0.4.18 creates three partial HNSW indexes per bank—one each for world, experience, and observation fact types—and rewrites the retrieval query as a UNION ALL of per-fact-type subqueries. Each arm can now use its dedicated partial index, and the planner consistently chooses an HNSW index scan over a sequential scan.
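The rewritten retrieval query has roughly this shape. The snippet below assembles illustrative SQL in Python; the table, column names, and pgvector `<=>` distance operator usage are assumptions about the schema, shown only to convey the UNION ALL structure.

```python
FACT_TYPES = ("world", "experience", "observation")

def build_recall_sql(limit: int = 20) -> str:
    """Assemble a per-fact-type UNION ALL query so each arm can use its
    own partial HNSW index (illustrative shape; real schema differs)."""
    arms = [
        f"(SELECT id, embedding <=> %(q)s AS distance\n"
        f"   FROM facts\n"
        f"  WHERE bank_id = %(bank_id)s AND fact_type = '{ft}'\n"
        f"  ORDER BY embedding <=> %(q)s\n"
        f"  LIMIT {limit})"
        for ft in FACT_TYPES
    ]
    return "\nUNION ALL\n".join(arms) + f"\nORDER BY distance\nLIMIT {limit};"
```

Because each arm filters on a single fact type and bank, its predicate lines up with one partial index, which is what lets the planner pick an HNSW index scan instead of a sequential scan.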
The migration runs automatically on startup and creates indexes for all existing banks. New banks get their indexes immediately at creation time, and indexes are dropped when a bank is deleted. No configuration is required.
Other Updates
Improvements
- Recall query token limit is now configurable via HINDSIGHT_API_RECALL_MAX_QUERY_TOKENS (default: 500). Useful when you want to pass longer multi-turn context as the recall query.
- LiteLLM reranker now truncates documents that exceed the provider's context limit, preventing reranking failures when individual fact chunks are unusually long.
Bug Fixes
- Fixed recalled memories not being injected as system context for OpenClaw.
- Fixed embedded profiles not being registered in CLI metadata when the daemon starts.
- Canceled in-flight async operations when a bank is deleted to avoid dangling background work.
Feedback and Community
Hindsight 0.4.18 is a drop-in replacement for 0.4.x with no breaking changes.
Share your feedback:
For detailed changes, see the full changelog.
