Memory with LiteLLM
This recipe is also available as an interactive Jupyter notebook on GitHub.
This notebook demonstrates how to add persistent memory to any LLM app using the hindsight-litellm package. Memory storage and injection happen automatically via LiteLLM callbacks - no manual memory management needed!
Key features demonstrated:
- configure() + enable() - Set up automatic memory integration
- Automatic storage - Conversations are stored after each LLM call
- Automatic injection - Relevant memories are injected into prompts
The hindsight-litellm package hooks into LiteLLM's callback system to:
- Store each conversation after successful LLM responses
- Inject relevant memories into the system prompt before LLM calls
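For orientation, here is a rough sketch of the LiteLLM hook points such an integration can attach to. This is an illustration only, not hindsight-litellm's actual internals; the class name and method bodies below are hypothetical.
# Illustrative sketch only: NOT hindsight-litellm's real implementation.
# It shows the LiteLLM callback hooks a memory integration can build on.
import litellm
from litellm.integrations.custom_logger import CustomLogger

class MemorySketchLogger(CustomLogger):  # hypothetical class name
    def log_pre_api_call(self, model, messages, kwargs):
        # A memory layer could fetch relevant facts here and prepend them
        # to the system prompt before the request goes out.
        pass

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        # ...and persist the completed exchange here for later recall.
        pass

# Registering the logger makes LiteLLM invoke these hooks around every call.
litellm.callbacks = [MemorySketchLogger()]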
Prerequisites
Make sure you have Hindsight running:
export OPENAI_API_KEY=your-key
docker run --rm -it --pull always -p 8888:8888 -p 9999:9999 \
-e HINDSIGHT_API_LLM_API_KEY=$OPENAI_API_KEY \
-e HINDSIGHT_API_LLM_MODEL=o3-mini \
-v $HOME/.hindsight-docker:/home/hindsight/.pg0 \
ghcr.io/vectorize-io/hindsight:latest
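Optionally, verify the container is reachable before continuing. This is a minimal connectivity check, not part of the recipe; what the root path returns depends on the Hindsight version.
import requests

# Quick sanity check: any HTTP response means the API container is up.
# (The exact status code and body are version-dependent; a connection
# error means nothing is listening on port 8888.)
try:
    r = requests.get("http://localhost:8888", timeout=5)
    print(f"Hindsight API reachable (HTTP {r.status_code})")
except requests.ConnectionError:
    print("Could not reach Hindsight on port 8888 - is the container running?")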
Installation
!pip install hindsight-litellm litellm nest_asyncio python-dotenv -U -q
Setup
import os
import uuid
import time
import logging
import nest_asyncio
from dotenv import load_dotenv
# Apply nest_asyncio for Jupyter compatibility
nest_asyncio.apply()
# Load environment variables
load_dotenv()
# Configure logging
logging.basicConfig(level=logging.INFO)
logging.getLogger("LiteLLM").setLevel(logging.WARNING)
logging.getLogger("LiteLLM Router").setLevel(logging.WARNING)
logging.getLogger("LiteLLM Proxy").setLevel(logging.WARNING)
# Import hindsight_litellm
import hindsight_litellm
# Configuration
HINDSIGHT_API_URL = os.getenv("HINDSIGHT_API_URL", "http://localhost:8888")
# Check for API key
if not os.getenv("OPENAI_API_KEY"):
print("Warning: OPENAI_API_KEY not set")
Configure and Enable Automatic Memory
This is all you need! After this, all LiteLLM calls will automatically:
- Have relevant memories injected into the prompt
- Store conversations to Hindsight after the response
# Generate a unique bank_id for this demo session
bank_id = f"demo-{uuid.uuid4().hex[:8]}"
print(f"Using bank_id: {bank_id}")
# Configure and enable hindsight
hindsight_litellm.configure(
hindsight_api_url=HINDSIGHT_API_URL,
bank_id=bank_id,
store_conversations=True, # Automatically store conversations
inject_memories=True, # Automatically inject relevant memories
verbose=True, # Enable logging to debug memory operations
)
hindsight_litellm.enable()
print("Hindsight memory integration enabled!")
Conversation 1: User Introduces Themselves
In this first conversation, the user shares some information about themselves. This will be automatically stored to Hindsight memory.
user_message_1 = "Hi! I'm Alex and I work at Google as a software engineer. I love Python and machine learning."
print(f"User: {user_message_1}\n")
# Use hindsight_litellm.completion() directly
response_1 = hindsight_litellm.completion(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": user_message_1}
],
)
assistant_response_1 = response_1.choices[0].message.content
print(f"Assistant: {assistant_response_1}")
print("\n(Conversation automatically stored to Hindsight)")
Wait for Memory Processing
Hindsight needs a few seconds to process and extract facts from the conversation.
print("Waiting 12 seconds for memory processing...")
time.sleep(12)
print("Done!")
Conversation 2: Test Memory-Augmented Response
Now we start a fresh conversation and ask what the assistant remembers. The memories from the previous conversation will be automatically injected into the prompt!
user_message_2 = "What do you know about me? What programming language should I use for my next project?"
print(f"User: {user_message_2}\n")
# Memories are automatically injected before this call!
response_2 = hindsight_litellm.completion(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": user_message_2}
],
)
print(f"Assistant: {response_2.choices[0].message.content}")
Summary
The assistant should have remembered that Alex:
- Works at Google as a software engineer
- Loves Python and machine learning
And it should have recommended Python based on that knowledge!
print(f"Memories stored in bank: {bank_id}")
print(f"View in UI: http://localhost:9999/banks/{bank_id}")
Cleanup
hindsight_litellm.cleanup()
# Optional: delete the bank
import requests
response = requests.delete(f"{HINDSIGHT_API_URL}/v1/default/banks/{bank_id}")
print(f"Deleted bank: {response.json()}")