Stop Building Microsoft Agent Framework Agents That Forget

Microsoft Agent Framework is Microsoft's open-source successor to Semantic Kernel: a framework for building agents that plan, call tools, and hold a conversation. What it doesn't do out of the box is remember anything once a session ends. Start a new run and the agent is back to square one, with no recollection of who the user is or what was decided last time.
The new Hindsight integration fixes that. It plugs in as a context provider, so every agent run automatically recalls the memories relevant to the user's message and retains the conversation afterward. There's no MCP server in the loop and no memory tool the model has to decide to call. Memory just happens.
TL;DR
- Microsoft Agent Framework agents are stateless across runs. Each session starts cold.
- The Hindsight integration adds one context provider,
HindsightProvider.pip install hindsight-agent-framework, pass it to your agent, done. - Recall is automatic and deterministic. It runs on the framework's
before_runhook, so memory is injected before the model sees the prompt. There's no tool the model can forget to call. - After each run,
after_runretains the exchange, so the next run builds on it. - Memory lives in a Hindsight bank you choose per user, agent, or session, which makes per-user isolation a one-argument change.
- Hindsight Cloud means no infrastructure. Sign up free.
Why Microsoft Agent Framework Needs Persistent Memory
Microsoft Agent Framework gives you a clean way to stand up a capable agent: a chat client, instructions, tools, sessions. Within a single session the agent has working context. But that context evaporates when the session ends.
For anything you run more than once against the same user, that's the limitation you hit first. A support agent re-asks for the account details every conversation. A coding assistant re-learns the project's conventions on every reload. A personal assistant forgets the preferences the user stated yesterday. You either build a memory layer yourself (a datastore, embeddings, retrieval, deduplication) or you ship an agent with amnesia.
Hindsight is that memory layer, and the context provider wires it in without changing how you write your agent.
How It Works
Microsoft Agent Framework lets you attach context providers to an agent. A context provider gets two lifecycle hooks, and the Hindsight integration uses both:
| Hook | What Hindsight does |
|---|---|
before_run | Recall memories relevant to the user's message and inject them as a ## Memories block into the agent's instructions. |
after_run | Retain the user input plus the agent's response, so future runs build on them. |
Because recall runs on before_run, it is deterministic: the memories are in the agent's context before the model generates a token. Compare that to a tool-based approach, where the model has to decide to call a recall tool and might not. There's no decision to get wrong and no extra tool-use round trip.
The recalled memories arrive as a labeled block prepended to the agent's instructions:
## Memories
Consider the following memories when responding. Ignore any that are not relevant:
- User prefers vegetarian food
- User is allergic to peanuts
- User is cooking for four this week
The model reads it as part of its instructions; your application code never has to thread memory through by hand.
Setup
Install the package:
pip install hindsight-agent-framework
The recommended backend is Hindsight Cloud: sign up free, create an API key, and set it once:
export HINDSIGHT_API_KEY=hsk_your_token
Self-hosting works the same way; run the API locally and point the provider at it:
pip install hindsight-all
export HINDSIGHT_API_LLM_API_KEY=your-openai-key
hindsight-api # http://localhost:8888
HindsightProvider(bank_id="user-123", hindsight_api_url="http://localhost:8888")
Adding Memory to an Agent
This is the entire integration. Attach HindsightProvider to your agent's context_providers and it does the rest:
from agent_framework.openai import OpenAIChatClient
from hindsight_agent_framework import HindsightProvider
agent = OpenAIChatClient().as_agent(
name="assistant",
instructions="You are a helpful assistant.",
context_providers=[HindsightProvider(bank_id="user-123")],
)
session = agent.create_session()
await agent.run("Remember that I prefer vegetarian food.", session=session)
# ...later, even in a new process:
await agent.run("Suggest a recipe.", session=session) # recalls the preference
The second run, even from a cold process, recalls the vegetarian preference and the recipe suggestion respects it. Nothing in your agent loop changed; the provider handled recall before the run and retain after it.
Per-User Memory
The bank_id is the routing key for memory. A Hindsight bank is just a namespace, and you decide its granularity: one per user, one per agent, one per session, whatever fits your application.
The most common pattern is per-user. Set bank_id to the user's ID and each person gets an isolated memory that follows them across sessions and machines:
HindsightProvider(bank_id=f"user-{current_user.id}")
A support agent built this way recognizes a returning customer's history. A personal assistant remembers one user's preferences without leaking them into another's. Because the bank is server-side (on Cloud or your own instance), the same memory is available wherever the agent runs.
Configuration
HindsightProvider(bank_id, ...) takes a handful of optional arguments:
| Argument | Default | What it does |
|---|---|---|
budget | mid | Recall depth: low (fast) / mid / high (thorough). |
max_tokens | 4096 | Cap on the recalled-memory block injected into instructions. |
auto_recall | true | Recall and inject memories before each run. |
auto_retain | true | Retain the exchange after each run. |
recall_tags / recall_tags_match | none / any | Filter recall to tagged memories. |
tags | none | Tags applied to retained memories. |
mission | none | Creates the bank with a fact-extraction persona that steers what gets remembered. |
api_key / hindsight_api_url | env | Connection settings, if not using environment variables. |
You can also call configure(...) once to set process-wide defaults instead of repeating them on every provider.
Three Things It Gets Right
No tool-calling required. Memory is wired to lifecycle hooks, not exposed as a tool. The model can't forget to use it, and there's no tool-use latency. This is the same reason the Cline integration uses hooks instead of MCP: deterministic beats hopeful.
It never blocks your agent. Recall and retain are best-effort. If Hindsight is briefly unreachable, the failure is swallowed and logged, and the agent runs normally without memory for that turn. A memory hiccup never takes down the agent.
No feedback loop. The memories injected at the start of a run are not retained back at the end of it. Only the genuine user input and agent response are stored, so recalled context doesn't get re-ingested and amplified over time. (A regression test exists specifically to keep it that way.)
Recap
| Microsoft Agent Framework default | With Hindsight | |
|---|---|---|
| Memory across runs | None | Automatic, per bank |
| Setup | n/a | One context provider |
| Recall mechanism | n/a | before_run hook, injected as instructions |
| Retain mechanism | n/a | after_run hook |
| Model tool-calling needed | n/a | No |
| Per-user isolation | n/a | Set bank_id |
| Failure behavior | n/a | Best-effort, never blocks the agent |
Next Steps
- Hindsight Cloud: ui.hindsight.vectorize.io
- Integration docs: Microsoft Agent Framework + Hindsight
- Source: vectorize-io/hindsight/hindsight-integrations/agent-framework
- Why hooks beat tools: Cline Persistent Memory
