Tool Learning Demo
This is a complete, runnable application demonstrating Hindsight integration.
An interactive Streamlit demo showing how Hindsight helps LLMs learn which tool to use when tool names are ambiguous.
The Problem
When building AI agents with tool/function calling, tool names and descriptions aren't always clear. If two tools look alike, the LLM has nothing to distinguish them and may effectively guess between them, which leads to incorrect behavior.
The Scenario
This demo simulates a customer service routing system with two channels:
| Tool | Description (What the LLM sees) | Actual Purpose (Hidden) |
|---|---|---|
| route_to_channel_alpha | "Routes to channel Alpha for appropriate request types" | Financial issues (refunds, billing, payments) |
| route_to_channel_omega | "Routes to channel Omega for appropriate request types" | Technical issues (bugs, features, errors) |
The descriptions are intentionally vague! Without prior knowledge, the LLM must guess which channel handles what.
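To make the ambiguity concrete, here is a minimal sketch of how the two tools might be declared in the OpenAI function-calling format. The helper `make_routing_tool`, the `message` parameter, and the `ACTUAL_PURPOSE` mapping are assumptions made for illustration; the demo's `app.py` may define its tools differently.

```python
# Hypothetical tool definitions; names and descriptions come from the table above,
# everything else (helper name, "message" parameter) is assumed for illustration.

def make_routing_tool(channel: str) -> dict:
    """Build a tool schema whose description reveals nothing about its purpose."""
    return {
        "type": "function",
        "function": {
            "name": f"route_to_channel_{channel.lower()}",
            "description": f"Routes to channel {channel} for appropriate request types",
            "parameters": {
                "type": "object",
                "properties": {
                    "message": {
                        "type": "string",
                        "description": "The customer request text",
                    },
                },
                "required": ["message"],
            },
        },
    }


TOOLS = [make_routing_tool("Alpha"), make_routing_tool("Omega")]

# Hidden ground truth, used only to score routing decisions; never sent to the LLM.
ACTUAL_PURPOSE = {
    "route_to_channel_alpha": "financial",
    "route_to_channel_omega": "technical",
}
```

Apart from the channel name, the two schemas are identical, so nothing in the prompt tells the model which one handles refunds.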
The Solution: Learning with Hindsight
With Hindsight memory:
- Store routing feedback about which channel handles which request type
- Retrieve learned knowledge when making routing decisions
- Consistently route correctly based on past experience
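The store-then-retrieve loop above can be sketched as follows. This is a toy, in-process stand-in rather than the Hindsight client API: the class `InMemoryRoutingMemory`, its `store` and `retrieve` methods, and `build_prompt` are hypothetical names, and the keyword matching only approximates what a real memory service would do.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryRoutingMemory:
    """Hypothetical stand-in for a memory backend; NOT the Hindsight client API."""

    facts: list[str] = field(default_factory=list)

    def store(self, feedback: str) -> None:
        # Step 1: keep routing feedback for later use
        self.facts.append(feedback)

    def retrieve(self, query: str) -> list[str]:
        # Step 2: naive keyword overlap; a real service would search semantically
        words = set(query.lower().split())
        return [fact for fact in self.facts if words & set(fact.lower().split())]


def build_prompt(request: str, memory: InMemoryRoutingMemory) -> str:
    """Step 3: put learned knowledge in front of the LLM before it picks a tool."""
    learned = memory.retrieve(request)
    context = "\n".join(f"- {fact}" for fact in learned) or "- (no prior routing knowledge)"
    return f"Known routing feedback:\n{context}\n\nCustomer request: {request}"


memory = InMemoryRoutingMemory()
memory.store("refund requests were handled correctly by route_to_channel_alpha")
prompt = build_prompt("I need a refund for my order", memory)
```

With an empty memory the prompt carries no routing hints, which corresponds to the baseline case where the model has to guess.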
Quick Start
Prerequisites
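- Hindsight Server running (Docker). The command below reads $OPENAI_API_KEY from your shell, so export the key (see the next item) before running it: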
docker run -d -p 8888:8888 -p 9999:9999 \
-e HINDSIGHT_API_LLM_PROVIDER=openai \
-e HINDSIGHT_API_LLM_API_KEY=$OPENAI_API_KEY \
-e HINDSIGHT_API_LLM_MODEL=gpt-4o-mini \
ghcr.io/vectorize-io/hindsight:latest
- OpenAI API Key:
export OPENAI_API_KEY=your-key-here
Run the Demo
./run.sh
Or manually:
pip install -r requirements.txt
streamlit run app.py
How to Use the Demo
Step 1: Test Without Memory (Baseline)
- Select a Financial Request (e.g., "I need a refund...")
- Click Route Request
- Observe: The "Without Hindsight" column may route incorrectly
Step 2: Route First Customer and Learn
- Route a customer request; the baseline LLM and the memory-enabled LLM both route it at the same time
- Feedback is automatically stored to Hindsight
- Wait ~5 seconds for Hindsight to index the memory
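One plausible shape for the stored feedback is a short plain-text statement of what happened and what should have happened. The function `format_feedback` and its wording are assumptions for illustration; the demo may phrase its feedback differently.

```python
def format_feedback(request: str, chosen_tool: str, correct_tool: str) -> str:
    """Turn one routing outcome into a plain-text memory entry (hypothetical format)."""
    if chosen_tool == correct_tool:
        verdict = f"{chosen_tool} was the CORRECT channel"
    else:
        verdict = f"{chosen_tool} was the WRONG channel; {correct_tool} is the right one"
    return f"Customer request: {request!r}. Routing result: {verdict}."


feedback = format_feedback(
    "I need a refund",
    chosen_tool="route_to_channel_omega",
    correct_tool="route_to_channel_alpha",
)
```

Naming both the chosen and the correct tool means a wrong guess still teaches the right mapping.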
Step 3: Test With Memory
- Select another request (financial or technical)
- Click Route Request
- Observe: The "With Hindsight" column should now route correctly!
Step 4: View Statistics
- See accuracy comparison between "Without Memory" vs "With Hindsight"
- Review test history to see the improvement over time
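The accuracy comparison boils down to counting correct routes in the test history. A minimal sketch, assuming each history record stores the expected tool alongside both choices; the field names and the sample records are placeholders for illustration, not measured results.

```python
def accuracy(history: list[dict], key: str) -> float:
    """Fraction of tests in which the tool chosen under `key` matched the expected tool."""
    if not history:
        return 0.0
    correct = sum(1 for test in history if test[key] == test["expected"])
    return correct / len(history)


# Placeholder records, made up for illustration only
history = [
    {
        "expected": "route_to_channel_alpha",
        "without_memory": "route_to_channel_omega",
        "with_hindsight": "route_to_channel_alpha",
    },
    {
        "expected": "route_to_channel_omega",
        "without_memory": "route_to_channel_omega",
        "with_hindsight": "route_to_channel_omega",
    },
]

baseline_accuracy = accuracy(history, "without_memory")
memory_accuracy = accuracy(history, "with_hindsight")
```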
Demo Features
- Side-by-side comparison: See routing results with and without memory
- Pre-defined test requests: Financial and technical scenarios
- Custom requests: Enter your own customer requests
- Memory Explorer: Query stored routing knowledge directly
- Live statistics: Track accuracy improvement
Key Insight
Even when tool names and descriptions don't reveal their purpose, Hindsight allows the LLM to learn from experience which tool to use for which type of request.
This is especially valuable for:
- Enterprise systems with legacy tool names
- Multi-tenant systems where tools have generic names
- Agents that need to learn organization-specific workflows
Configuration
| Setting | Default | Description |
|---|---|---|
| Model | gpt-4o-mini | LLM used for routing decisions |
| Temperature (No Memory) | 0.7 | Randomness for baseline tests |
| Hindsight API URL | http://localhost:8888 | Hindsight server URL |
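These defaults could be grouped in a small settings object. The class `DemoConfig` and its field names are hypothetical; only the default values come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoConfig:
    """Hypothetical container for the demo settings; defaults mirror the table."""

    model: str = "gpt-4o-mini"                        # LLM used for routing decisions
    baseline_temperature: float = 0.7                 # randomness for no-memory tests
    hindsight_api_url: str = "http://localhost:8888"  # Hindsight server URL


default_config = DemoConfig()
# Override a single setting, e.g. when the server runs on another host
remote_config = DemoConfig(hindsight_api_url="http://hindsight.internal:8888")
```

The hostname `hindsight.internal` is a made-up example value.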
Files
- app.py - Main Streamlit application
- requirements.txt - Python dependencies
- run.sh - Launch script with dependency checking
- README.md - This file