Give Your OpenAI App a Memory in 5 Minutes
Build a ChatGPT-style chatbot with persistent memory using the OpenAI SDK and Hindsight. Three API calls — retain(), recall(), reflect() — and your app remembers users across restarts, no vector database or RAG pipeline required.
TL;DR
You'll build a ChatGPT-style chatbot that:
- Remembers facts across restarts
- Recalls relevant context automatically
- Synthesizes long-term knowledge on demand
All with three API calls:
- `retain()` — store memories
- `recall()` — retrieve relevant ones
- `reflect()` — synthesize across everything
No vector database. No embedding pipeline. No RAG plumbing.
Copy, paste, run.
The Problem: Your Chatbot Has Amnesia
You build a chatbot with OpenAI:
```python
messages = [{"role": "system", "content": "You are a helpful assistant."}]
```
It works perfectly.
Until you restart the process.
Now it remembers nothing.
You can serialize messages to disk. But then:
- Context windows fill up
- Token costs explode
- You start truncating history
- The assistant forgets early decisions
We ran into this building Jerri, our internal AI project manager at Vectorize.
Jerri lives in Slack. It ingests meeting transcripts, tracks action items, and answers "what did we decide about X?" Without persistent memory, every session started from zero.
That's not memory. That's stateless autocomplete.
What you actually need:
- Store facts as they happen
- Retrieve only what's relevant
- Synthesize when necessary
That's what we're building.
Architecture
Here's the entire loop:
User message
↓
recall(query) ← pull relevant memories
↓
OpenAI completion ← inject memory into system prompt
↓
retain(exchange) ← store conversation
↓
Response
Three calls. Same pattern Jerri runs in production.
Step 1 — Start the Memory Layer
Install:
```shell
pip install hindsight-all
```
Start the server:
```shell
export HINDSIGHT_API_LLM_API_KEY=YOUR_OPENAI_KEY
hindsight-api
```
It runs locally at http://localhost:8888.
It includes:
- Embedded Postgres
- Fact extraction
- Semantic search
- Knowledge graph
- Synthesis engine
No external infrastructure.
Prefer not to self-host? Hindsight Cloud gives you the same API with no setup — just swap `base_url` for your Cloud endpoint.
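The swap can be as small as one environment variable. A minimal sketch — the `HINDSIGHT_BASE_URL` variable name is an assumption for illustration, not part of the Hindsight API:

```python
import os

# Assumed convention: HINDSIGHT_BASE_URL holds your Cloud endpoint;
# the local server stays the default when it is unset.
base_url = os.environ.get("HINDSIGHT_BASE_URL", "http://localhost:8888")

# Same client, same three calls -- only the endpoint changes:
# hindsight = Hindsight(base_url=base_url)
```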
Step 2 — Baseline Chat (No Memory)
Let's start with the broken version:
```python
from openai import OpenAI

openai = OpenAI()
messages = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("You: ")
    if user_input in ("quit", "exit"):
        break
    messages.append({"role": "user", "content": user_input})
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    print(reply)
```
Works great.
Restart it.
Ask: "What's my name?"
Blank stare.
Step 3 — Add retain()
Create a memory bank:
```python
from hindsight_client import Hindsight

hindsight = Hindsight(base_url="http://localhost:8888")
hindsight.create_bank(
    bank_id="chatbot",
    name="Chatbot Memory",
    reflect_mission="Remember user preferences and important facts.",
)
```
Now retain every exchange:
```python
hindsight.retain(
    bank_id="chatbot",
    content=f"User: {user_input}\nAssistant: {reply}",
)
```
That's it.
Hindsight extracts facts, identifies entities, builds relationships, and stores them in a knowledge graph. You don't manage any of that.
Step 4 — Add recall()
Before calling OpenAI, retrieve relevant memories:
```python
memories = hindsight.recall(
    bank_id="chatbot",
    query=user_input,
    budget="low",
)
memory_context = "\n".join(r.text for r in memories.results)
```
Inject into the system prompt:
```python
system_prompt = "You are a helpful assistant."
if memory_context:
    system_prompt += "\n\nRelevant past context:\n" + memory_context
```
Now:
- Tell it your name
- Restart
- Ask again
It remembers. Because recall injects relevant past facts into the prompt.
Step 5 — Add reflect() for Synthesis
`recall` returns facts. `reflect` returns reasoning.
For questions like:
- "What do you know about me?"
- "Summarize our conversations."
- "What patterns do you see?"
Use reflect:
```python
reflection = hindsight.reflect(
    bank_id="chatbot",
    query=user_input,
)
memory_context = reflection.text
```
Reflect traverses the knowledge graph, runs an LLM reasoning chain, and synthesizes across memories.
In Jerri, reflect powers weekly summaries, sprint reviews, and cross-meeting analysis. In your chatbot, it handles "step back and think" queries.
Full Working Example
Copy this into chat.py:
```python
from openai import OpenAI
from hindsight_client import Hindsight

openai = OpenAI()
hindsight = Hindsight(base_url="http://localhost:8888")

hindsight.create_bank(
    bank_id="chatbot",
    name="Chatbot Memory",
    reflect_mission="Remember user preferences and key facts.",
)

SYSTEM_PROMPT = "You are a helpful assistant with long-term memory."

SYNTHESIS_KEYWORDS = [
    "summarize",
    "what do you know about me",
    "what have we talked about",
]


def get_memory_context(user_input):
    # Route "step back and think" questions to reflect; everything else to recall.
    if any(k in user_input.lower() for k in SYNTHESIS_KEYWORDS):
        reflection = hindsight.reflect(
            bank_id="chatbot",
            query=user_input,
        )
        return reflection.text
    memories = hindsight.recall(
        bank_id="chatbot",
        query=user_input,
        budget="low",
    )
    return "\n".join(r.text for r in memories.results)


def main():
    conversation = []
    print("Chat with memory. Type 'quit' to exit.\n")
    while True:
        user_input = input("You: ")
        if user_input in ("quit", "exit"):
            break
        memory_context = get_memory_context(user_input)
        conversation.append({"role": "user", "content": user_input})
        system = SYSTEM_PROMPT
        if memory_context:
            system += "\n\nRelevant context:\n" + memory_context
        messages = [{"role": "system", "content": system}] + conversation
        response = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
        )
        reply = response.choices[0].message.content
        conversation.append({"role": "assistant", "content": reply})
        print(f"\nAssistant: {reply}\n")
        # Retain after responding so both sides of the exchange are stored.
        hindsight.retain(
            bank_id="chatbot",
            content=f"User: {user_input}\nAssistant: {reply}",
        )


if __name__ == "__main__":
    main()
```
Run:
```shell
export OPENAI_API_KEY=YOUR_KEY
python chat.py
```
Restart it. It still remembers.
Production Lessons (From Building Jerri)
1. Retain after responding. Otherwise the assistant remembers questions but not answers.
2. Use budget="low" for chat loops. Sub-second latency. Upgrade only when needed.
3. One bank per user in multi-user apps. Otherwise memories leak across users.
4. Set a mission. Fact extraction quality depends heavily on it.
5. Start by retaining everything. Optimize later.
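Lesson 3 in practice: derive the bank from the user's identity so each user's memories stay isolated. A minimal sketch — the `bank_for` helper and the `user-{id}` naming scheme are illustrative, not part of the Hindsight API:

```python
# Hypothetical per-user routing: one memory bank per user so facts
# never leak across users. Naming scheme is an assumption.
def bank_for(user_id: str) -> str:
    return f"user-{user_id}"

# Create once per user, then pass the same id to every call:
# hindsight.create_bank(bank_id=bank_for("alice"), name="Memory for alice",
#                       reflect_mission="Remember this user's preferences.")
# hindsight.recall(bank_id=bank_for("alice"), query=user_input, budget="low")
```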
When to Use This Pattern
Use it if:
- You need cross-session memory
- You want synthesis across time
- You don't want to build RAG infra
Don't use it if:
- You only need single-session context
- You're storing structured database records
This solves the space between "chat history" and "knowledge base."
The Pattern to Remember
- `retain` — after responding
- `recall` — before responding
- `reflect` — when synthesizing
That's the loop.
That's what powers Jerri across weeks of meetings.
That's what you just built in five minutes.
Next Steps
- Add per-user banks with a unique `bank_id` per user
- Use tags for scoped memory (`tags` on retain, `tags_match` on recall)
- Add structured JSON output to reflect with `response_schema`
- Inspect memories in the web UI at localhost:9999 via Docker
- Try hosted Hindsight instead of self-hosting
Persistent memory turns a chatbot into an agent.
Now yours remembers.
