Give Your OpenAI App a Memory in 5 Minutes
Build a ChatGPT-style chatbot with persistent memory using the OpenAI SDK and Hindsight. Three API calls — retain(), recall(), reflect() — and your app remembers users across restarts, no vector database or RAG pipeline required.
TL;DR
You'll build a ChatGPT-style chatbot that:
- Remembers facts across restarts
- Recalls relevant context automatically
- Synthesizes long-term knowledge on demand
All with three API calls:
- `retain()` — store memories
- `recall()` — retrieve relevant ones
- `reflect()` — synthesize across everything
No vector database. No embedding pipeline. No RAG plumbing.
Copy, paste, run.
The Problem: Your Chatbot Has Amnesia
You build a chatbot with OpenAI:
```python
messages = [{"role": "system", "content": "You are a helpful assistant."}]
```
It works perfectly.
Until you restart the process.
Now it remembers nothing.
You can serialize messages to disk. But then:
- Context windows fill up
- Token costs explode
- You start truncating history
- The assistant forgets early decisions
We ran into this building Jerri, our internal AI project manager at Vectorize.
Jerri lives in Slack. It ingests meeting transcripts, tracks action items, and answers "what did we decide about X?" Without persistent memory, every session started from zero.
That's not memory. That's stateless autocomplete.
What you actually need:
- Store facts as they happen
- Retrieve only what's relevant
- Synthesize when necessary
That's what we're building.
Architecture
Here's the entire loop:
User message
↓
recall(query) ← pull relevant memories
↓
OpenAI completion ← inject memory into system prompt
↓
retain(exchange) ← store conversation
↓
Response
Three calls. Same pattern Jerri runs in production.
Step 1 — Start the Memory Layer
Install:
```shell
pip install hindsight-all
```
Start the server:
```shell
export HINDSIGHT_API_LLM_API_KEY=YOUR_OPENAI_KEY
hindsight-api
```
It runs locally at http://localhost:8888.
It includes:
- Embedded Postgres
- Fact extraction
- Semantic search
- Knowledge graph
- Synthesis engine
No external infrastructure.
Prefer not to self-host? Hindsight Cloud gives you the same API with no setup — just swap `base_url` for your Cloud endpoint.
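The swap can be as small as one environment variable. A minimal sketch — the `HINDSIGHT_BASE_URL` variable name is an assumption for illustration, not part of the Hindsight API:

```python
import os

# Assumed convention: HINDSIGHT_BASE_URL holds your Cloud endpoint;
# the local server stays the default when it is unset.
base_url = os.environ.get("HINDSIGHT_BASE_URL", "http://localhost:8888")

# Same client, same three calls -- only the endpoint changes:
# hindsight = Hindsight(base_url=base_url)
```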
Step 2 — Baseline Chat (No Memory)
Let's start with the broken version:
```python
from openai import OpenAI

openai = OpenAI()
messages = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("You: ")
    if user_input in ("quit", "exit"):
        break
    messages.append({"role": "user", "content": user_input})
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    print(reply)
```
Works great.
Restart it.
Ask: "What's my name?"
Blank stare.
Step 3 — Add retain()
Create a memory bank:
```python
from hindsight_client import Hindsight

hindsight = Hindsight(base_url="http://localhost:8888")
hindsight.create_bank(
    bank_id="chatbot",
    name="Chatbot Memory",
    reflect_mission="Remember user preferences and important facts.",
)
```
Now retain every exchange:
```python
hindsight.retain(
    bank_id="chatbot",
    content=f"User: {user_input}\nAssistant: {reply}",
)
```
That's it.
Hindsight extracts facts, identifies entities, builds relationships, and stores them in a knowledge graph. You don't manage any of that.
Step 4 — Add recall()
Before calling OpenAI, retrieve relevant memories:
```python
memories = hindsight.recall(
    bank_id="chatbot",
    query=user_input,
    budget="low",
)
memory_context = "\n".join(r.text for r in memories.results)
```
Inject into the system prompt:
```python
system_prompt = "You are a helpful assistant."
if memory_context:
    system_prompt += "\n\nRelevant past context:\n" + memory_context
```
Now:
- Tell it your name
- Restart
- Ask again
It remembers. Because recall injects relevant past facts into the prompt.
Step 5 — Add reflect() for Synthesis
`recall` returns facts. `reflect` returns reasoning.
For questions like:
- "What do you know about me?"
- "Summarize our conversations."
- "What patterns do you see?"
Use reflect:
```python
reflection = hindsight.reflect(
    bank_id="chatbot",
    query=user_input,
)
memory_context = reflection.text
```
Reflect traverses the knowledge graph, runs an LLM reasoning chain, and synthesizes across memories.
In Jerri, reflect powers weekly summaries, sprint reviews, and cross-meeting analysis. In your chatbot, it handles "step back and think" queries.
Full Working Example
Copy this into chat.py:
```python
from openai import OpenAI
from hindsight_client import Hindsight

openai = OpenAI()
hindsight = Hindsight(base_url="http://localhost:8888")

hindsight.create_bank(
    bank_id="chatbot",
    name="Chatbot Memory",
    reflect_mission="Remember user preferences and key facts.",
)

SYSTEM_PROMPT = "You are a helpful assistant with long-term memory."

SYNTHESIS_KEYWORDS = [
    "summarize",
    "what do you know about me",
    "what have we talked about",
]


def get_memory_context(user_input):
    # Route "step back and think" questions to reflect; everything else to recall.
    if any(k in user_input.lower() for k in SYNTHESIS_KEYWORDS):
        reflection = hindsight.reflect(
            bank_id="chatbot",
            query=user_input,
        )
        return reflection.text
    memories = hindsight.recall(
        bank_id="chatbot",
        query=user_input,
        budget="low",
    )
    return "\n".join(r.text for r in memories.results)


def main():
    conversation = []
    print("Chat with memory. Type 'quit' to exit.\n")
    while True:
        user_input = input("You: ")
        if user_input in ("quit", "exit"):
            break
        memory_context = get_memory_context(user_input)
        conversation.append({"role": "user", "content": user_input})
        system = SYSTEM_PROMPT
        if memory_context:
            system += "\n\nRelevant context:\n" + memory_context
        messages = [{"role": "system", "content": system}] + conversation
        response = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
        )
        reply = response.choices[0].message.content
        conversation.append({"role": "assistant", "content": reply})
        print(f"\nAssistant: {reply}\n")
        # Retain after responding so both sides of the exchange are stored.
        hindsight.retain(
            bank_id="chatbot",
            content=f"User: {user_input}\nAssistant: {reply}",
        )


if __name__ == "__main__":
    main()
```
Run:
```shell
export OPENAI_API_KEY=YOUR_KEY
python chat.py
```
Restart it. It still remembers.
Production Lessons (From Building Jerri)
1. Retain after responding. Otherwise the assistant remembers questions but not answers.
2. Use budget="low" for chat loops. Sub-second latency. Upgrade only when needed.
3. One bank per user in multi-user apps. Otherwise memories leak across users.
4. Set a mission. Fact extraction quality depends heavily on it.
5. Start by retaining everything. Optimize later.
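Lesson 3 in practice: derive the bank from the user's identity so each user's memories stay isolated. A minimal sketch — the `bank_for` helper and the `user-{id}` naming scheme are illustrative, not part of the Hindsight API:

```python
# Hypothetical per-user routing: one memory bank per user so facts
# never leak across users. Naming scheme is an assumption.
def bank_for(user_id: str) -> str:
    return f"user-{user_id}"

# Create once per user, then pass the same id to every call:
# hindsight.create_bank(bank_id=bank_for("alice"), name="Memory for alice",
#                       reflect_mission="Remember this user's preferences.")
# hindsight.recall(bank_id=bank_for("alice"), query=user_input, budget="low")
```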
When to Use This Pattern
Use it if:
- You need cross-session memory
- You want synthesis across time
- You don't want to build RAG infra
Don't use it if:
- You only need single-session context
- You're storing structured database records
This solves the space between "chat history" and "knowledge base."
The Pattern to Remember
- `retain` — after responding
- `recall` — before responding
- `reflect` — when synthesizing
That's the loop.
That's what powers Jerri across weeks of meetings.
That's what you just built in five minutes.
Next Steps
- Add per-user banks with a unique `bank_id` per user
- Use tags for scoped memory (`tags` on retain, `tags_match` on recall)
- Add structured JSON output to reflect with `response_schema`
- Inspect memories in the web UI at localhost:9999 via Docker
- Try hosted Hindsight instead of self-hosting
Persistent memory turns a chatbot into an agent.
Now yours remembers.
