Skip to main content

Building an OpenClaw Coding Agent That Remembers Your Codebase

· 12 min read
Ben Bartholomew
Hindsight Team

Building an OpenClaw Coding Agent That Remembers Your Codebase

Every AI coding session starts from zero.

You open OpenClaw, paste in context — your stack, your conventions, the architectural decision you made last week, the bug you spent two days on in March. Then you do it again next session. And the one after that. The problem isn't that OpenClaw is bad at coding. It's that it has no memory of your codebase.

OpenClaw with Hindsight is the exception. Each session adds to what it knows about your project. By session 20, OpenClaw already knows your module boundaries, naming conventions, known fragile areas, and the root cause of that recurring auth issue. You stopped explaining; it started knowing.

This post covers what OpenClaw actually extracts from coding sessions, how to set it up in two minutes, and the three workflows where persistent codebase memory has the highest leverage.


What OpenClaw Remembers About Your Codebase

OpenClaw doesn't store transcripts. What Hindsight extracts and retains are facts — atomic, retrievable pieces of knowledge pulled from your conversations.

After a typical coding session, facts like these enter memory automatically:

  • "Project uses ESM modules, not CommonJS — always use .js extensions in imports"
  • "The auth middleware fails silently on expired refresh tokens, known issue as of March 14"
  • "SQLAlchemy was removed in favor of raw asyncpg after performance testing in February"
  • "Team convention: all async handlers wrapped in handle_errors() decorator"

None of these require you to explicitly tell OpenClaw to remember them. Hindsight's write pipeline extracts them from the natural flow of your session — from the questions you ask, the bugs you describe, the decisions you explain along the way.

What doesn't become memory: raw file contents, line-by-line code, verbose terminal output. The extraction step is itself a filter. Conversational filler, repeated context-setting, procedural noise — none of it survives. What remains is a growing index of codebase facts that OpenClaw carries into every future session.

The lifecycle runs at both ends of each turn:

Before each turn: Hindsight prefetches the most relevant memories from your history and injects them into the system prompt. OpenClaw sees that context before it sees your message.

After each response: Your conversation is retained asynchronously. Hindsight extracts facts in the background. What you discuss this turn becomes searchable starting next turn.


Two-Minute Setup

Install the plugin and run the setup wizard:

openclaw plugins install @vectorize-io/hindsight-openclaw
npx --package @vectorize-io/hindsight-openclaw hindsight-openclaw-setup

The wizard walks you through three install modes:

  • Cloud — managed Hindsight. Paste your cloud API token, done.
  • External API — your own Hindsight deployment. Prompts for the URL and optional token.
  • Embedded daemon — runs Hindsight locally on your machine. Prompts for the LLM provider (OpenAI, Anthropic, Gemini, Groq, Ollama, Claude Code, Codex) and its API key.

Confirm memory is active:

openclaw gateway
# Check logs:
tail -f /tmp/openclaw/openclaw-*.log | grep Hindsight
# Should see:
# [Hindsight] ✓ Using provider: openai, model: gpt-4o-mini

Config lives in ~/.openclaw/openclaw.json. The defaults work for most coding workflows:

KeyDefaultDescription
autoRecalltrueInject memories before every turn
autoRetaintrueCapture conversations after every turn
recallBudgetmidRecall thoroughness: low / mid / high
recallMaxTokens1024How much memory context injected per turn

Both autoRecall and autoRetain default to true, which means memory works out of the box with no agent-side changes. OpenClaw doesn't need to know memory exists — it just has context. Memories are injected into the system prompt before each turn, and conversations are retained in the background after each response.

For deployment: if you work across machines or share memory with a team, use cloud. If you want everything local with no external dependencies, the embedded daemon runs a local PostgreSQL instance in the background. First startup takes about a minute; subsequent starts are fast. Startup logs land at ~/.hindsight/profiles/openclaw.log.

Migrating from 0.5.x? The 0.6.0 plugin removes all process-environment reads. Configuration that previously came from shell env vars (OPENAI_API_KEY, HINDSIGHT_API_LLM_PROVIDER, etc.) must now go through openclaw config set with SecretRef for credentials. Run openclaw config validate after migrating. Full mapping in the integration docs.


Three Workflows Where Codebase Memory Matters

Starting a New Session on Existing Work

Without memory, resuming work on an existing project means context-setting before any actual work happens: paste the README, explain the tech stack, re-establish what you were doing last time, remind OpenClaw about the convention it should already know. On a complex project, that overhead eats 10–15 minutes of every session.

With memory, OpenClaw starts each session with the accumulated facts from your previous sessions already injected into its context. It knows the stack. It knows the conventions. It knows what you were debugging last week.

The first message of a session becomes the actual work.

What gets injected is controlled by the recallBudget setting. At mid (the default), Hindsight fetches the 10–15 most relevant memories — enough to cover your project's core facts without flooding the context window. At high, retrieval runs deeper: more context, more tokens. For most coding workflows, mid is the right balance. If you're jumping back into a complex investigation across many modules, high is worth the extra cost.

Debugging Recurring Issues

The highest-leverage value of codebase memory is pattern recognition across sessions. Some bugs aren't one-off — they're symptoms of a deeper architectural issue that surfaces in different forms over months.

Without memory, you debug each instance independently. You might trace the same root cause three separate times without connecting the dots.

With memory, OpenClaw recalls the previous instances. Describe a new failure mode, and it surfaces related context: the root cause it identified two months ago, the workaround that held until the next refactor, the component that keeps appearing in these failures.

The kinds of facts that pay off here:

  • "The rate limiter bypasses auth checks for requests with X-Internal: true header, source of two privilege escalation near-misses"
  • "Async task queue silently drops jobs when Redis connection resets, needs explicit ACK handling, not fire-and-forget"
  • "GraphQL resolver N+1 pattern reappears after every new schema addition, needs DataLoader enforcement flagged in code review"

These aren't facts you'd think to paste at the start of a debugging session. They're the institutional knowledge that separates debugging blindly from debugging with full context.

For this workflow, recallBudget: high is worth the extra cost. Instead of the default 10–15 memories, high-budget retrieval runs deeper — more context, more retrieval strategies. When you're chasing a bug that spans multiple subsystems and prior sessions, the additional recall surface pays for itself.

Onboarding to Someone Else's Code

If you join a project where a colleague has been working with OpenClaw and Hindsight using a shared bank, the memory bank already has context from their sessions.

Query what OpenClaw knows about a specific module:

What do you know about the payments module?
What quirks or known issues have come up in the auth service?
What were the reasons we moved off SQLAlchemy?

OpenClaw surfaces the accumulated facts from previous sessions: architectural context, known edge cases, past decisions and the reasoning behind them — without anyone having to write it down in a README, a wiki page, or a Slack thread that nobody can find.

This isn't documentation. It's the institutional knowledge that never makes it into documentation.


What Good Codebase Memory Looks Like

After 30+ sessions on a project, a well-built memory bank typically covers:

Project conventions: Module structure and import patterns, error handling requirements, naming conventions that aren't obvious from the code, linting rules that differ from the defaults.

Known fragile areas: Components that break under specific load or input conditions, integration points that have caused production incidents, edge cases the test suite doesn't cover.

Architectural history: Dependencies that were replaced and why, patterns considered and rejected, performance characteristics discovered through testing rather than docs.

Team preferences: Code review priorities, deployment gotchas, things that work differently than the official docs say.

Most of this accumulates automatically from normal sessions. The exception: major architectural decisions and team preferences benefit from explicit statement. When you make a significant call, tell OpenClaw the rationale:

We're switching to asyncpg from SQLAlchemy because connection pooling
under our load profile caused intermittent timeouts above ~200 concurrent
requests. The fix wasn't tuning, it was the ORM abstraction. Remember this.

The background extraction catches most of what matters without any explicit action on your part.

Before/after: what memory recall looks like in a session

Without memory, a session opening might look like:

"I'm working on a Python service that uses asyncpg for database access. We removed SQLAlchemy in February due to connection pool issues under load. All async handlers should be wrapped in handle_errors(). Help me debug this intermittent 500..."

With memory, OpenClaw already has these facts injected. You open with:

"Help me debug this intermittent 500 in the payment handler."

The stack, the convention, the architectural context — already there.


Team Codebases: Shared Memory Banks

By default, OpenClaw creates separate memory banks per agent + channel + user. For a team sharing a coding agent on the same project, set dynamicBankGranularity to share one bank:

{
"dynamicBankGranularity": ["agent"]
}

When multiple developers use OpenClaw on the same codebase with a shared bank, the memory compounds from all their sessions. Knowledge that one developer builds up — a tricky module's undocumented behavior, a hard-won debugging insight, a deployment gotcha — becomes available to the rest of the team automatically.

A few considerations:

  • What compounds well: Codebase facts, architectural decisions, known issues, conventions. These describe the codebase, not the person — safe to share.
  • What to keep separate: Personal workflow preferences, unrelated personal context. Use a separate bank for those.
  • Bank naming: Use bankIdPrefix to namespace banks per project. A shared bank with prefix payments-service that the payments team all uses turns into institutional memory. A default bank that three people use without coordinating turns into noise.

Advanced: Seeding a Structured Mental Model

Organic extraction from sessions is the primary way Hindsight builds codebase knowledge. But you can front-load context explicitly — useful when starting on an existing codebase, or when critical conventions should be in the bank before the first session runs.

Hindsight exposes two operations via the SDK and API outside of OpenClaw, feeding the same memory bank OpenClaw draws from:

Ingesting existing docs. Upload architecture notes, ADRs, or conventions files. Hindsight runs fact extraction on the content and stores the results as memories in the bank. Once ingested, those facts are available to OpenClaw on the next session — no waiting for organic extraction to catch up.

Creating a mental model. Define a curated summary built from a source query — "What are the coding conventions for this project?" Hindsight runs a reflect operation, synthesizes the answer from all ingested and session-extracted knowledge, and saves the result. Set refresh_after_consolidation to true and the model re-derives itself as new facts arrive. Mental models are checked first during reflect calls, before individual observations and raw facts, so the pre-computed answer is returned without re-deriving it on the fly.

Backfilling existing history. If you already have months of OpenClaw sessions, the plugin ships a backfill CLI that imports them into Hindsight:

npx --package @vectorize-io/hindsight-openclaw hindsight-openclaw-backfill \
--openclaw-root ~/.openclaw \
--dry-run

Remove --dry-run to execute. Use --agent proj-run to limit import to specific agents, and --resume to pick up where a previous backfill left off.

The combination of ingested project docs, session-extracted facts, and backfilled history gives OpenClaw a complete picture from three angles: what was deliberately documented, what was discovered through use, and what happened before the plugin was installed.

See the Hindsight mental models docs for the full API reference.


The Longer You Use It, the Less You Explain

OpenClaw with Hindsight is one of the few coding workflows where context accumulates across sessions. Every session adds to what it knows about your codebase. Other tools reset. This one doesn't.

The value compounds with use. Session one, OpenClaw knows nothing about your project. Session five, it knows the stack and conventions. Session 30, it knows the project's history, its fragile areas, the decisions that shaped its current shape. At that point you've stopped explaining those things — not because you skipped the context, but because you never needed to provide it again. And once that mental model is rich enough, you're not just talking to a coding assistant. You're working alongside an agent that knows the codebase as well as you do.

Set it up with openclaw plugins install @vectorize-io/hindsight-openclaw, or start with the Hindsight integration docs.


Further reading: