Hermes Memory Modes with Hindsight: Hybrid, Context, Tools

If you are choosing between Hermes memory modes with Hindsight, the decision comes down to one question: should memory be injected automatically, exposed as tools, or both? Hermes's native Hindsight provider supports three integration modes (hybrid, context, and tools), and each one changes how the model experiences memory during a conversation.
This matters more than it sounds. Teams often turn on Hindsight, see that memory works, and stop there. But the mode determines whether recall happens before every turn, whether the model can call hindsight_recall and hindsight_reflect directly, and whether your assistant behaves like a silent memory system or an explicit tool-using agent. Pick the wrong mode and the setup still looks healthy, but the behavior feels off.
This guide explains what each mode does, when to use it, how to switch safely, how prefetch_method changes the experience, and how to verify that your chosen mode is actually doing what you think it is. For the complete reference, keep the Hermes integration docs, the docs home, and the quickstart guide nearby.
Quick answer
- Use hybrid if you want automatic recall and explicit Hindsight tools.
- Use context if you want invisible auto-recall with no memory tools exposed.
- Use tools if you want the model to call memory deliberately, not automatically.
- Keep prefetch_method="recall" for speed; switch to reflect only when synthesized context is worth the latency.
- Verify your choice with hermes memory status, /tools, and a real next-turn recall test.
Prerequisites
Before you change modes, make sure:
- Hermes is already configured to use the native Hindsight provider.
- hermes memory status reports a healthy memory setup.
- You know where your Hermes Hindsight config file lives.
- You understand that memory mode affects recall behavior, not whether your bank exists.
Hermes stores the native Hindsight config at ~/.hermes/hindsight/config.json by default. You can print it with:
python - <<'PY'
import json, os, pathlib
base = pathlib.Path(os.environ.get("HERMES_HOME", pathlib.Path.home() / ".hermes"))
path = base / "hindsight" / "config.json"
print(path.read_text())
PY
If you have not validated the provider yet, do that first:
hermes memory status
It also helps to skim the Recall API reference and Retain API reference, because they make the behavior of each mode easier to reason about. memory_mode controls how recall enters the conversation, while retain behavior continues to follow the provider's retention settings.
Step by step
1. Understand what each mode actually changes
The three modes are not cosmetic. They shape how memory enters the agent loop.
| Mode | Auto-recall before each turn | Explicit Hindsight tools visible | Best for |
|---|---|---|---|
| hybrid | Yes | Yes | Most users, assistants that need both convenience and control |
| context | Yes | No | Clean UX, consumer-facing assistants, less tool noise |
| tools | No | Yes | Agents that should decide when to query memory explicitly |
Think of them like this:
- hybrid is the default general-purpose mode. Relevant memories are injected before the LLM sees the new user message, and the model can still call hindsight_recall, hindsight_retain, and hindsight_reflect explicitly if needed.
- context hides the tools and relies on automatic injection only. This is ideal when you want the assistant to feel seamless and not expose extra memory actions.
- tools removes automatic recall from the prompt-building path. The model has to choose to call memory tools. This is more deliberate, but it also means a poorly instructed model can forget to look.
If you remember one rule, make it this one: tools mode is not broken when auto-recall disappears. That is the design.
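The mode table can also be restated as a small lookup, which is convenient when writing checks around a deployment. This is a sketch for illustration only: ModeBehavior, MODE_BEHAVIOR, and describe are hypothetical names, not part of Hermes or Hindsight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeBehavior:
    auto_recall: bool    # memories injected before each turn
    tools_visible: bool  # Hindsight tools exposed to the model


# Hypothetical lookup that restates the mode table; not a Hermes API.
MODE_BEHAVIOR = {
    "hybrid": ModeBehavior(auto_recall=True, tools_visible=True),
    "context": ModeBehavior(auto_recall=True, tools_visible=False),
    "tools": ModeBehavior(auto_recall=False, tools_visible=True),
}


def describe(mode: str) -> str:
    """Return a one-line summary of what a mode is expected to do."""
    behavior = MODE_BEHAVIOR[mode]
    recall = "auto-recall on" if behavior.auto_recall else "auto-recall off"
    tools = "tools visible" if behavior.tools_visible else "tools hidden"
    return f"{mode}: {recall}, {tools}"


for name in MODE_BEHAVIOR:
    print(describe(name))
```

Keeping the expectations in one place makes it harder to mistake the absence of auto-recall in tools mode for a fault.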
2. Inspect your current mode and prefetch method
Before you change anything, look at the current values:
python - <<'PY'
import json, os, pathlib
base = pathlib.Path(os.environ.get("HERMES_HOME", pathlib.Path.home() / ".hermes"))
path = base / "hindsight" / "config.json"
cfg = json.loads(path.read_text())
print("memory_mode:", cfg.get("memory_mode", "hybrid"))
print("prefetch_method:", cfg.get("prefetch_method", "recall"))
PY
prefetch_method matters because it changes what automatic injection looks like:
- recall injects raw memory facts. It is faster and usually the right default.
- reflect injects a synthesized answer built from relevant memories. It is slower, but can be better when the assistant needs a coherent summary instead of a list of facts.
The simplest reliable starting point is:
- memory_mode="hybrid"
- prefetch_method="recall"
That gives you automatic context plus explicit tools, with the least latency overhead.
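Because both settings are plain strings in a JSON file, a typo is easy to miss. The sketch below checks a parsed config against the values this guide names. It assumes the same fallbacks the inspection script uses (hybrid and recall); how Hermes itself treats an unrecognized value is not covered here, and validate is a hypothetical helper.

```python
VALID_MODES = {"hybrid", "context", "tools"}
VALID_PREFETCH = {"recall", "reflect"}


def validate(cfg: dict) -> list[str]:
    """Return a list of problems found in a parsed config dict (empty if none)."""
    problems = []
    # Assumed defaults, matching the inspection script in this guide.
    mode = cfg.get("memory_mode", "hybrid")
    prefetch = cfg.get("prefetch_method", "recall")
    if mode not in VALID_MODES:
        problems.append(f"unknown memory_mode: {mode!r}")
    if prefetch not in VALID_PREFETCH:
        problems.append(f"unknown prefetch_method: {prefetch!r}")
    return problems


print(validate({"memory_mode": "hybrid", "prefetch_method": "recall"}))
print(validate({"memory_mode": "hybird"}))  # misspelled on purpose
```

Running a check like this before restarting Hermes catches misspellings that would otherwise surface only as unexpected behavior.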
3. Switch to hybrid when you want the safest default
If you want both automatic recall and explicit tool access, set hybrid:
python - <<'PY'
import json, os, pathlib
base = pathlib.Path(os.environ.get("HERMES_HOME", pathlib.Path.home() / ".hermes"))
path = base / "hindsight" / "config.json"
cfg = json.loads(path.read_text())
cfg["memory_mode"] = "hybrid"
cfg["prefetch_method"] = "recall"
path.write_text(json.dumps(cfg, indent=2) + "\n")
print(f"Updated {path} to hybrid mode")
PY
Choose hybrid if any of these are true:
- you want the assistant to remember relevant context without being prompted
- you also want power users or the model itself to call hindsight_reflect explicitly for deeper synthesis
- you are still evaluating Hindsight and want the easiest mode to debug
This is also the best mode for internal assistants and technical users, where tool visibility is not a UX problem.
4. Switch to context when you want invisible memory
If you want recall to happen automatically but do not want the memory tools visible to the model, use context:
python - <<'PY'
import json, os, pathlib
base = pathlib.Path(os.environ.get("HERMES_HOME", pathlib.Path.home() / ".hermes"))
path = base / "hindsight" / "config.json"
cfg = json.loads(path.read_text())
cfg["memory_mode"] = "context"
cfg.setdefault("prefetch_method", "recall")
path.write_text(json.dumps(cfg, indent=2) + "\n")
print(f"Updated {path} to context mode")
PY
Use context when:
- you want a cleaner tool surface
- you are running a customer-facing assistant where fewer visible tools means less behavioral noise
- you trust automatic injection more than tool planning
For many production assistants, context is the best user experience. The model simply starts with relevant history. It does not need to remember that memory exists as a tool category.
5. Switch to tools when you want deliberate memory access
If you do not want automatic recall at all, use tools:
python - <<'PY'
import json, os, pathlib
base = pathlib.Path(os.environ.get("HERMES_HOME", pathlib.Path.home() / ".hermes"))
path = base / "hindsight" / "config.json"
cfg = json.loads(path.read_text())
cfg["memory_mode"] = "tools"
path.write_text(json.dumps(cfg, indent=2) + "\n")
print(f"Updated {path} to tools mode")
PY
This is the right choice when:
- you want the model to decide when memory is relevant
- you are optimizing for tighter prompt control
- you are experimenting with explicit agent strategies that use hindsight_recall and hindsight_reflect as first-class reasoning tools
The tradeoff is obvious but important: if the model is not prompted well, it may simply forget to use memory. tools mode gives you control, but less safety.
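If you switch modes often, the three scripts above can be folded into one helper that rejects unknown values and keeps a copy of the previous file. This is a minimal sketch, not a Hermes utility: switch_mode and the .bak and .tmp suffixes are choices made for this example.

```python
import json
import pathlib
import shutil

VALID_MODES = {"hybrid", "context", "tools"}


def switch_mode(path: pathlib.Path, mode: str) -> pathlib.Path:
    """Set memory_mode in the config at `path` and return the backup file path."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown memory_mode: {mode!r}")
    cfg = json.loads(path.read_text())        # fails loudly on invalid JSON
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)                # keep the previous config
    cfg["memory_mode"] = mode                 # other keys are left untouched
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(cfg, indent=2) + "\n")
    tmp.replace(path)                         # swap the new file in with one rename
    return backup
```

A call such as switch_mode(pathlib.Path.home() / ".hermes" / "hindsight" / "config.json", "context") would update the default config location. Restart Hermes afterwards so the change is loaded, and copy the .bak file back if you need to revert.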
6. Decide whether prefetch_method should stay on recall or move to reflect
prefetch_method only matters when auto-recall is active, so it applies to hybrid and context.
Use this script to switch methods:
python - <<'PY'
import json, os, pathlib
base = pathlib.Path(os.environ.get("HERMES_HOME", pathlib.Path.home() / ".hermes"))
path = base / "hindsight" / "config.json"
cfg = json.loads(path.read_text())
cfg["prefetch_method"] = "reflect"
path.write_text(json.dumps(cfg, indent=2) + "\n")
print(f"Updated {path} to reflect prefetch")
PY
Here is the practical difference:
| Prefetch method | What Hermes gets | Speed | Best for |
|---|---|---|---|
| recall | raw relevant facts | faster | coding assistants, support, most chat workflows |
| reflect | synthesized summary across memories | slower | complex planning, open-ended reasoning, summarization |
If you are unsure, stay on recall. It is easier to reason about, easier to debug, and usually enough. Reach for reflect only when the model keeps needing a coherent memory summary rather than point facts.
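The rule that prefetch_method applies only when auto-recall is active can be written as a tiny function. This is an illustrative sketch with a hypothetical name (effective_prefetch) and the same assumed defaults as the inspection script. It describes the expected behavior and does not query Hermes.

```python
from typing import Optional

AUTO_RECALL_MODES = {"hybrid", "context"}


def effective_prefetch(cfg: dict) -> Optional[str]:
    """Return the prefetch method that applies, or None when auto-recall is off."""
    mode = cfg.get("memory_mode", "hybrid")
    if mode not in AUTO_RECALL_MODES:
        return None  # tools mode: nothing is prefetched, so the setting is moot
    return cfg.get("prefetch_method", "recall")


print(effective_prefetch({"memory_mode": "context", "prefetch_method": "reflect"}))
print(effective_prefetch({"memory_mode": "tools", "prefetch_method": "reflect"}))
```

A None result for a config that still carries prefetch_method="reflect" is a hint that the latency tradeoff being tuned is not in play at all.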
7. Match the mode to the job
A quick practical map:
- Personal assistant with recurring preferences and projects: hybrid
- Customer-facing assistant where you want seamless personalization: context
- Research or coding agent that should query memory intentionally: tools
- Planning-heavy assistant with deep history: hybrid or context plus prefetch_method="reflect"
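If you manage several assistants, the map above can be kept as named presets and merged into a config dict. The preset names and apply_preset are hypothetical and exist only in this sketch. The planning-heavy entry picks hybrid, although context is an equally valid choice according to the list.

```python
# Hypothetical preset names; the settings restate the practical map above.
PRESETS = {
    "personal_assistant": {"memory_mode": "hybrid"},
    "customer_facing": {"memory_mode": "context"},
    "research_or_coding_agent": {"memory_mode": "tools"},
    # "context" would work here too; hybrid is chosen only for the example.
    "planning_heavy": {"memory_mode": "hybrid", "prefetch_method": "reflect"},
}


def apply_preset(cfg: dict, preset: str) -> dict:
    """Return a copy of cfg with the preset's settings merged in."""
    return {**cfg, **PRESETS[preset]}


print(apply_preset({"prefetch_method": "recall"}, "planning_heavy"))
```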
This is also where adjacent integrations are useful reference points. If you like the silent memory pattern, the OpenClaw integration docs show the same idea in a different agent architecture. If you want a tool-centric workflow, Adding memory to Codex with Hindsight is a good comparison.
Verifying it works
Do not trust the config file alone. Verify behavior.
Check provider status
hermes memory status
This confirms that the provider is active, not that the chosen mode behaves the way you expect. For that, you need live testing.
Check tool visibility
Launch Hermes and inspect the tool list.
- In hybrid, you should see the Hindsight tools.
- In context, you should not.
- In tools, you should see the tools, but no automatic recall should happen before a turn.
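One way to make this check repeatable is to paste the tool names you see into a short script and compare them with what the mode should expose. The sketch below assumes you copy the list by hand. It does not call Hermes, check_tool_visibility is a hypothetical helper, and read_file is a made-up example of a non-Hindsight tool.

```python
HINDSIGHT_TOOLS = {"hindsight_recall", "hindsight_retain", "hindsight_reflect"}
EXPECT_TOOLS = {"hybrid": True, "context": False, "tools": True}


def check_tool_visibility(mode: str, visible_tools: list[str]) -> str:
    """Compare a hand-copied tool list with what the mode should expose."""
    found = HINDSIGHT_TOOLS & set(visible_tools)
    if EXPECT_TOOLS[mode] and not found:
        return f"mismatch: {mode} should expose Hindsight tools, none found"
    if not EXPECT_TOOLS[mode] and found:
        return f"mismatch: {mode} should hide Hindsight tools, found {sorted(found)}"
    return "ok"


print(check_tool_visibility("context", ["read_file", "hindsight_recall"]))
print(check_tool_visibility("hybrid", ["read_file", "hindsight_recall"]))
```

This covers tool visibility only. Whether auto-recall actually runs still has to be confirmed with a live next-turn test.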
Run a next-turn recall test
Tell Hermes something it can remember:
Remember that the design review is Thursday at 3 PM and we are prioritizing the mobile onboarding flow.
Then, on the next turn, ask:
What do you remember about the design review?
Expected behavior:
- hybrid and context: relevant memory should already be in context before the model answers.
- tools: the model must choose to call a Hindsight tool, or it may answer without using memory.
Verify your expectations against the chosen mode
Many false bug reports are really expectation mismatches:
- a user chooses tools, then expects auto-recall
- a user chooses context, then expects Hindsight tools in /tools
- a user chooses reflect prefetch, then is surprised by higher latency
The behavior has to match the mode you selected.
Troubleshooting common mistakes
Problem: tools mode seems to forget everything
That is often just a prompt issue. In tools mode, the model has to decide to use memory tools. If you want reliable automatic memory use, switch to hybrid or context.
Problem: context mode works, but I cannot find hindsight_reflect
That is expected. context hides the explicit tools. Automatic recall is the feature, not tool exposure.
Problem: hybrid shows tools, but no memory is injected
Check whether your Hermes build supports the required lifecycle hooks. The Hermes docs note that on older builds, only tools are registered and auto-injection is skipped.
Problem: reflect prefetch feels slow
It is slower by design because it synthesizes context instead of injecting raw recalled facts. Switch back to recall if responsiveness matters more than coherence.
Problem: nothing changes after editing the config file
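Make sure you edited the correct file, which is ~/.hermes/hindsight/config.json by default (or the matching path under HERMES_HOME, if you set that variable as the scripts in this guide assume), then restart Hermes so the new config is loaded.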
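To confirm which file the scripts in this guide are reading and writing, you can resolve the path the same way they do. This sketch assumes Hermes honors HERMES_HOME in the same manner as those scripts. resolve_config_path is a hypothetical helper, not a Hermes command.

```python
import os
import pathlib


def resolve_config_path(env=None) -> pathlib.Path:
    """Return the config path the scripts in this guide would read."""
    env = os.environ if env is None else env
    base = pathlib.Path(env.get("HERMES_HOME", pathlib.Path.home() / ".hermes"))
    return base / "hindsight" / "config.json"


if __name__ == "__main__":
    path = resolve_config_path()
    print("config path:", path)
    print("HERMES_HOME set:", "HERMES_HOME" in os.environ)
    print("file exists:", path.exists())
```

If HERMES_HOME is set in one shell but not in the one that launches Hermes, the edit and the running process may be looking at different files.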
Problem: memory appears stale or incomplete right after a user shares a new fact
Remember the async flow: retention happens after the response, then the new memory is available on the next turn. Test next-turn behavior, not same-turn behavior.
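The ordering is easier to see in a toy model. The class below is not how Hindsight is implemented: it returns every stored fact rather than ranking by relevance, and ToyMemory is an invented name. It only illustrates why a fact shared in one turn is recalled in the next.

```python
class ToyMemory:
    """Toy model of retain-after-response; not Hindsight's implementation."""

    def __init__(self):
        self.stored = []   # facts available to recall
        self.pending = []  # facts waiting to be retained

    def turn(self, user_message: str) -> list[str]:
        recalled = list(self.stored)       # recall runs before the response
        self.pending.append(user_message)  # the new message is queued
        # ... the model would answer here using `recalled` ...
        self.stored.extend(self.pending)   # retention completes after the response
        self.pending.clear()
        return recalled


memory = ToyMemory()
first = memory.turn("The design review is Thursday at 3 PM.")
second = memory.turn("What do you remember about the design review?")
print(first)   # nothing recalled on the turn that introduced the fact
print(second)  # the fact is available one turn later
```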
FAQ
Which mode should most people start with?
hybrid. It is the safest default and easiest to debug.
When is context better than hybrid?
When you want invisible personalization and a cleaner tool surface. It is especially good for assistants that should feel simple and conversational.
When is tools better than hybrid?
When you explicitly want the model to reason about whether memory should be consulted. It gives you more control, but less automation.
Does tools mode disable retention too?
No. tools mode changes how recall enters the agent experience. Retention behavior is still governed by the provider's retention settings and lifecycle hooks.
Should I use prefetch_method="reflect" everywhere?
Usually not. Use recall by default. Move to reflect when you have a real need for synthesized memory context.
Where can I learn more about what Hindsight is injecting?
The Recall API reference is the best place to understand fact retrieval, and the Retain API reference explains how new information becomes memory in the first place. The Hermes integration docs tie those concepts back to Hermes configuration.
Next Steps
- Create a Hindsight Cloud account if you want the fastest way to test mode changes without local infrastructure work.
- Read the Hermes integration docs for the complete config surface.
- Keep the quickstart guide open if you want a cleaner end-to-end setup reference.
- Read the Recall API reference to understand what recall prefetch actually returns.
- Read the Retain API reference if you want to tighten what becomes memory.
- Compare the OpenClaw integration docs and Adding memory to Codex with Hindsight if you want examples of other memory interaction styles.
