Adding Persistent Memory to OpenAI Codex with Hindsight

TL;DR
- Codex has no persistent memory built in. Every session starts fresh — no recollection of past decisions, preferences, or codebase context.
- The Hindsight plugin hooks into Codex via three Python scripts with zero
pip installrequired. One curl command gets you running. - Auto-recall queries your memory bank before every prompt and injects relevant facts as invisible context. Codex sees them; you don't have to repeat them.
- Auto-retain fires at the end of every session, extracts facts from the transcript, and stores them for future recall.
- For teams, point everyone's config at a shared Hindsight server with a fixed
bankId. See Shared Memory for AI Coding Agents. - Hindsight Cloud stores your memory bank server-side — no local daemon, and memory follows you across machines. Sign up free.
The Problem: Codex Has No Persistent Memory
Codex is OpenAI's open-source coding agent CLI. You give it a task, it reads your files, runs commands, and iterates until it's done. It's capable and fast — but it has no memory.
Every session starts from nothing. Codex doesn't know which libraries your project uses, which patterns you've standardized on, which areas of the codebase are fragile, or what you were working on yesterday. You re-establish this context at the start of every session, either by explaining it directly or by pointing at an AGENTS.md file you've manually maintained.
AGENTS.md helps — it's a static markdown file that tells Codex baseline facts about your project on startup. But it captures what you remembered to write down, not what you actually encountered. The Redis TTL discrepancy you noticed Tuesday at 3pm, the JWT edge case that surfaced during code review, the reason you stopped using SQLAlchemy — these live in session transcripts that vanish when the window closes. Nobody updated AGENTS.md. Next session, that knowledge is gone.
How Hindsight Adds Persistent Memory to Codex
Hindsight adds a persistent memory layer to Codex by hooking into its lifecycle at two points: before every prompt and after every session.
Auto-recall. On every user prompt, the recall hook queries Hindsight for memories relevant to what you're about to ask. The results are injected into Codex as additionalContext — prepended to the conversation before the model sees it, but not visible in your terminal output. Codex has the context; you didn't have to repeat it.
Auto-retain. When a Codex session ends (the Stop hook fires), the retain hook takes the session transcript, strips any injected memory tags to prevent feedback loops, and sends it to Hindsight. The extraction model reads it and pulls out discrete facts — decisions made, patterns observed, bugs found. These land in your memory bank, available for every future session.
Full-session upsert. The transcript is stored using the session ID as the document key. If a session is retained multiple times (in chunked mode), the content is upserted rather than duplicated. No accumulation of near-identical entries.
Minimal dependencies. The hook scripts use Python stdlib only — no pip install, no virtualenv, no version conflicts. Local daemon mode requires uvx to run hindsight-embed; Cloud mode has no local prerequisites at all.
┌──────────────┐ UserPromptSubmit ┌─────────────────────────┐
│ Codex │ ─────────────────────────▶ │ recall.py │
│ CLI │ ◀───────────────────────── │ queries Hindsight, │
│ │ additionalContext inject │ injects as context │
│ │ └─────────────────────────┘
│ │ Stop ┌─────────────────────────┐
│ │ ─────────────────────────▶ │ retain.py │
└──────────────┘ │ strips tags, sends │
│ transcript to Hindsight│
└─────────────────────────┘
Installing
curl -fsSL https://hindsight.vectorize.io/get-codex | bash
The installer guides you through choosing local or cloud mode and writes the hook scripts to ~/.hindsight/codex/. It also enables codex_hooks = true in your ~/.codex/config.toml automatically.
If you choose Hindsight Cloud during setup, there's nothing else to install — no local daemon, no uvx required.
Start a new Codex session — memory is live immediately. The first few sessions build up your bank. By the third or fourth, recall starts surfacing useful context you didn't have to re-explain.
To uninstall:
curl -fsSL https://hindsight.vectorize.io/get-codex | bash -s -- --uninstall
Hindsight Cloud (Recommended)
For most users, Hindsight Cloud is the easier option: no daemon to manage, memory syncs across machines, and setup is two lines of JSON. The default setup runs a local hindsight-embed daemon, but switching to Cloud means you never have to think about it again. Edit ~/.hindsight/codex.json:
{
"hindsightApiUrl": "https://api.hindsight.vectorize.io",
"hindsightApiToken": "hsk_your_token"
}
Create an account and API key at hindsight.vectorize.io. No daemon to manage — the cloud server handles extraction.
What Gets Recalled
The recall hook fires on every UserPromptSubmit event. It takes your prompt (and optionally the previous turn for context), queries Hindsight, and injects the most relevant memories as a block at the top of the conversation:
<hindsight_memories>
Relevant memories from past conversations...
Current time - 2026-03-28 09:14
- Project uses FastAPI with asyncpg — not SQLAlchemy [world] (2026-03-26)
- Preferred testing framework: pytest with pytest-asyncio [world] (2026-03-26)
- Redis TTL in production is 15 minutes — README says 30, README is wrong [world] (2026-03-27)
</hindsight_memories>
Codex sees this block; it doesn't appear in your terminal output. The result: Codex starts every response with relevant context from past sessions, without you having to provide it.
You can tune how much to inject with recallBudget ("low", "mid", "high") and recallMaxTokens.
Before and after
Without persistent memory, you open a session and type: "Continue working on the payments module." Codex has no context. It reads the files, makes reasonable guesses, and may ask clarifying questions you've answered before, or worse, contradict decisions you made last week.
With Hindsight, the same session opens with recalled facts already in context: which payment provider you're using, that you decided against webhooks in favor of polling, and that the staging environment has a known issue with idempotency keys. Codex starts from where you left off, not from zero.
Per-Project Memory
By default all Codex sessions share a single bank. To give each project its own isolated memory:
{
"dynamicBankId": true,
"dynamicBankGranularity": ["agent", "project"]
}
With this config, running Codex in ~/projects/api and ~/projects/frontend maintains separate banks. Bank IDs are derived from the working directory path — switching projects automatically switches memory context. Per-project isolation works with Hindsight Cloud the same way it does locally — the bank ID is just routed to the Cloud API instead of a local daemon.
Team Shared Memory
Individual persistent memory is useful. Shared memory across a team is transformative.
When everyone on a team points their Codex config at the same Hindsight bank, context accumulated by one developer becomes available to all. A bug discovered on Monday surfaces in recall on Tuesday, regardless of who's asking. Architecture decisions made in one session inform the next, without requiring anyone to update a shared doc.
To configure team shared memory, set a fixed bankId in each developer's config and point them at the same Hindsight Cloud endpoint:
{
"hindsightApiUrl": "https://api.hindsight.vectorize.io",
"hindsightApiToken": "hsk_your_token",
"bankId": "my-team-project"
}
See Shared Memory for AI Coding Agents for a full team setup guide including bank seeding, per-project isolation, and onboarding patterns.
Key Configuration Options
Settings live in ~/.hindsight/codex.json. Every setting can also be set via environment variable.
| Setting | Default | What it does |
|---|---|---|
retainMission | generic | Steers fact extraction — tell it what to focus on |
retainEveryNTurns | 10 | How often to retain mid-session in chunked mode |
recallBudget | "mid" | Search depth: "low" (fast) / "mid" / "high" (thorough) |
autoRecall | true | Master switch for recall |
autoRetain | true | Master switch for retention |
dynamicBankId | false | Enable per-project bank isolation |
debug | false | Logs all Hindsight activity to stderr |
Example retainMission for a focused memory bank:
{
"retainMission": "Extract technical decisions, known bugs and their workarounds, architecture choices, and coding conventions. Do not retain one-off debugging steps or editor preferences."
}
Pitfalls
Hooks not firing. The installer sets codex_hooks = true in ~/.codex/config.toml automatically, but if you installed manually or the file already existed, this may have been missed. Check the file and add the setting under [features] if it's missing.
No memories recalled in the first session. Recall returns results only after something has been retained. Complete one session first, or seed your bank manually using the cookbook example.
Retention seems delayed. retainEveryNTurns defaults to 10 in chunked mode — retain fires every 10 turns. In full-session mode (the default), retention fires once at session end. If you're testing, add "retainEveryNTurns": 1 to your config.
Nothing happening? Check ~/.hindsight/codex/error.log first — hook failures are written there. For a full trace of what each hook is doing, enable debug logging:
Debug mode. Add "debug": true to ~/.hindsight/codex.json to log detailed activity to ~/.hindsight/codex/debug.log:
[Hindsight] Recalling from bank 'codex', query length: 42
[Hindsight] Injecting 3 memories
[Hindsight] Retaining to bank 'codex', doc 'sess-abc123', 2 messages, 847 chars
Tradeoffs
A few things worth knowing before you commit.
Recall adds latency. Every prompt triggers a Hindsight query before Codex sees it. In practice this is 100–300ms with Hindsight Cloud on a fast connection. For interactive sessions it's imperceptible; for automated scripts it may matter. Use "recallBudget": "low" or "autoRecall": false if you need to skip it.
Retention is asynchronous. The retain.py hook fires asynchronously so it doesn't block your session from exiting. Facts are typically available within seconds of retention completing, but timing depends on server load — they may appear sooner or later than the next session.
Extraction quality depends on conversation quality. Hindsight extracts facts from what's actually in the transcript. If you work through a problem entirely in file edits without narrating what you're doing, there may be little for the extraction model to work with. Brief explanations in your prompts help.
Recap
| Codex default | With Hindsight | |
|---|---|---|
| Memory across sessions | None | Automatic |
| Memory setup | Manual AGENTS.md | Extracted from transcripts |
| Recall mechanism | File content on startup | Semantic search, injected per prompt |
| Per-project isolation | No | Optional via dynamicBankId |
| Team shared memory | No | Shared bank via Hindsight Cloud |
Next Steps
- Hindsight Cloud: ui.hindsight.vectorize.io
- Install:
curl -fsSL https://hindsight.vectorize.io/get-codex | bash - Config reference: Codex integration docs
- Team memory: Shared Memory for AI Coding Agents
- Cookbook: applications/codex-memory
