Skip to main content

Hermes Agent

Persistent long-term memory for Hermes Agent using Hindsight. Automatically recalls relevant context before every LLM call and retains conversations for future sessions — plus explicit retain/recall/reflect tools.

Quick Start

1. Get an API key at ui.hindsight.vectorize.io/connect. The API endpoint is https://api.hindsight.vectorize.io.

2. Run the setup wizard:

hermes memory setup    # select "hindsight"

The wizard will prompt for your API key and API URL, and configure everything automatically.

Or configure manually:

hermes config set memory.provider hindsight
# Add your key and the API endpoint
echo "HINDSIGHT_API_KEY=your-key" >> ~/.hermes/.env
echo "HINDSIGHT_API_URL=https://api.hindsight.vectorize.io" >> ~/.hermes/.env

3. Confirm memory is active:

hermes memory status

Features

  • Auto-recall — on every turn, queries Hindsight for relevant memories and injects them into the system prompt (via pre_llm_call hook)
  • Auto-retain — after every response, retains the user/assistant exchange to Hindsight (via post_llm_call hook)
  • Explicit toolshindsight_retain, hindsight_recall, hindsight_reflect for direct model control
  • Memory modes — choose between automatic injection, tools-only, or hybrid
  • Zero config overhead — env vars work as overrides for CI/automation
note

The lifecycle hooks (pre_llm_call/post_llm_call) require hermes-agent with PR #2823 or later. On older versions, only the three tools are registered — hooks are silently skipped.

Architecture

The plugin registers via Hermes's hermes_agent.plugins entry point system:

ComponentPurpose
pre_llm_call hookAuto-recall — query memories, inject as ephemeral system prompt context
post_llm_call hookAuto-retain — store user/assistant exchange to Hindsight
hindsight_retain toolExplicit memory storage (model-initiated)
hindsight_recall toolExplicit memory search (model-initiated)
hindsight_reflect toolLLM-synthesized answer from stored memories

Connection Modes

Connect to Hindsight Cloud at https://api.hindsight.vectorize.io. Get an API key at ui.hindsight.vectorize.io/connect.

{
"mode": "cloud",
"api_url": "https://api.hindsight.vectorize.io",
"api_key": "hsk_your_token",
"bank_id": "hermes"
}

2. Local (embedded)

Runs an embedded Hindsight server with built-in PostgreSQL. Requires an LLM API key for memory extraction and synthesis. The daemon starts automatically in the background on first use.

{
"mode": "local",
"llm_provider": "groq",
"llm_api_key": "your-groq-key"
}
note

The embedded server starts on the first message when Hermes says "starting agent". On a fresh system this can take over a minute while the embedded PostgreSQL initializes. Subsequent startups are fast.

Daemon startup logs: ~/.hermes/logs/hindsight-embed.log
Daemon runtime logs: ~/.hindsight/profiles/<profile>.log

Configuration

All settings are in ~/.hermes/hindsight/config.json. Every setting can also be overridden via environment variables (env vars take priority).

Connection & Daemon

SettingDefaultEnv VarDescription
modecloudHINDSIGHT_MODEcloud or local
api_urlhttps://api.hindsight.vectorize.ioHINDSIGHT_API_URLHindsight API URL
api_keynullHINDSIGHT_API_KEYAuth token for Hindsight Cloud
apiPort9077HINDSIGHT_API_PORTPort for local Hindsight daemon
daemonIdleTimeout0HINDSIGHT_DAEMON_IDLE_TIMEOUTSeconds before idle daemon shuts down (0 = never)
embedVersion"latest"HINDSIGHT_EMBED_VERSIONhindsight-embed version for uvx

LLM Provider (local mode only)

SettingDefaultEnv VarDescription
llm_provideropenaiHINDSIGHT_LLM_PROVIDERLLM provider: openai, anthropic, gemini, groq, minimax, ollama, lmstudio
llm_api_keyHINDSIGHT_LLM_API_KEYAPI key for the chosen LLM provider
llm_modelprovider defaultHINDSIGHT_LLM_MODELModel override (auto-defaults per provider)

Default models per provider: openaigpt-4o-mini, anthropicclaude-haiku-4-5, geminigemini-2.5-flash, groqopenai/gpt-oss-120b, minimaxMiniMax-M2.7, ollamagemma3:12b.

Memory Bank

SettingDefaultEnv VarDescription
bank_idhermesHINDSIGHT_BANK_IDMemory bank ID
bankMission""HINDSIGHT_BANK_MISSIONAgent identity/purpose for the memory bank
retainMissionnullCustom retain mission (what to extract from conversations)

Auto-Recall

SettingDefaultEnv VarDescription
autoRecalltrueHINDSIGHT_AUTO_RECALLEnable automatic memory recall via pre_llm_call hook
recallBudget"mid"HINDSIGHT_RECALL_BUDGETRecall effort: low, mid, high
recallMaxTokens4096HINDSIGHT_RECALL_MAX_TOKENSMax tokens in recall response
recallMaxQueryChars800HINDSIGHT_RECALL_MAX_QUERY_CHARSMax chars of user message used as query
recallPromptPreamblesee belowHeader text injected before recalled memories

Default preamble:

Relevant memories from past conversations (prioritize recent when conflicting). Only use memories that are directly useful to continue this conversation; ignore the rest:

Auto-Retain

SettingDefaultEnv VarDescription
autoRetaintrueHINDSIGHT_AUTO_RETAINEnable automatic retention via post_llm_call hook
retainEveryNTurns1Retain every Nth turn
retainOverlapTurns2Extra overlap turns for continuity
retainRoles["user", "assistant"]Which message roles to retain

Integration Mode

SettingDefaultEnv VarDescription
memory_modehybridHow memories are integrated into the agent (see below)
prefetch_methodrecallMethod used for automatic context injection (see below)

memory_mode:

  • hybrid — automatic context injection before each turn, plus tools available to the LLM
  • context — automatic injection only; no tools exposed to the model
  • tools — tools only (hindsight_retain, hindsight_recall, hindsight_reflect); no automatic injection

prefetch_method:

  • recall — injects raw memory facts into the system prompt (fast)
  • reflect — injects an LLM-synthesized summary of relevant memories (slower, more coherent)

Miscellaneous

SettingDefaultEnv VarDescription
debugfalseHINDSIGHT_DEBUGEnable debug logging to stderr

Hermes Gateway (Telegram, Discord, Slack)

When using Hermes in gateway mode (multi-platform messaging), the plugin works across all platforms. Hermes creates a fresh AIAgent per message, and the plugin's pre_llm_call hook ensures relevant memories are recalled for each turn regardless of platform.

Disabling Hermes's Built-in Memory

Hermes has a built-in memory tool that saves to local markdown files. If both are active, the LLM may prefer the built-in one. Disable it:

hermes tools disable memory

Re-enable later with hermes tools enable memory.

Troubleshooting

Plugin not loading: Verify the entry point is registered:

python -c "
import importlib.metadata
eps = importlib.metadata.entry_points(group='hermes_agent.plugins')
print(list(eps))
"

You should see EntryPoint(name='hindsight', value='hindsight_hermes', ...).

Tools don't appear in /tools: Check that api_url (or HINDSIGHT_API_URL) is set, or that HINDSIGHT_API_KEY is set for cloud mode. The plugin silently skips tool registration when unconfigured.

Connection refused: Verify the Hindsight API is running:

curl http://localhost:9077/health

Local daemon not starting: Check the daemon log for errors:

cat ~/.hermes/logs/hindsight-embed.log

Recall returning no memories: Memories need at least one retain cycle. Try storing a fact first, then asking about it in a new session.