Monitoring

Hindsight provides comprehensive monitoring through Prometheus metrics and pre-built Grafana dashboards.

Local Development

For local metrics visualization, a convenience script downloads and runs Prometheus and Grafana:

./scripts/dev/start-monitoring.sh

This will start:

Production Deployment

The local monitoring script is for development only. In production, install and configure Prometheus and Grafana separately, then configure Prometheus to scrape the /metrics endpoint of your Hindsight API.

Grafana Dashboards

Pre-built dashboards are available in monitoring/grafana/dashboards/. Import these JSON files into your Grafana instance:

| Dashboard | Description |
| --- | --- |
| Hindsight Operations | Operation rates, latency percentiles, per-bank metrics |
| Hindsight LLM Metrics | LLM calls, token usage, latency by scope/provider |
| Hindsight API Service | HTTP requests, error rates, DB pool, process metrics |

When you use the local monitoring script, these dashboards are provisioned automatically.

Metrics Endpoint

Hindsight exposes Prometheus metrics at /metrics:

curl http://localhost:8888/metrics
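
The endpoint returns the Prometheus text exposition format, which is simple enough to inspect with a few lines of code. The following is a minimal sketch, not part of Hindsight: the `SAMPLE` text is a hypothetical excerpt (real output will contain more series and labels), and `sum_by_metric` is an illustrative helper that assumes samples carry no trailing timestamp.

```python
from collections import defaultdict

# Hypothetical excerpt of a /metrics response; real output will differ.
SAMPLE = """\
# HELP hindsight_operation_total Total number of operations executed
# TYPE hindsight_operation_total counter
hindsight_operation_total{operation="recall",source="api",success="true"} 12
hindsight_operation_total{operation="retain",source="api",success="true"} 5
hindsight_db_pool_size 10
"""


def sum_by_metric(text: str) -> dict[str, float]:
    """Sum sample values per metric name, ignoring comment lines and labels."""
    totals: dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumption: the value is the last field (no timestamp follows it).
        series, value = line.rsplit(" ", 1)
        name = series.split("{", 1)[0]  # drop the label set, if any
        totals[name] += float(value)
    return dict(totals)


if __name__ == "__main__":
    for name, total in sum_by_metric(SAMPLE).items():
        print(f"{name}: {total}")
```

To run it against a live instance, replace `SAMPLE` with the body fetched from http://localhost:8888/metrics.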

Available Metrics

Operation Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| hindsight.operation.duration | Histogram | operation, bank_id, source, budget, max_tokens, success | Duration of operations in seconds |
| hindsight.operation.total | Counter | operation, bank_id, source, budget, max_tokens, success | Total number of operations executed |

Labels:

  • operation: Operation type (retain, recall, reflect)
  • bank_id: Memory bank identifier
  • source: Where the operation was triggered from (api, reflect, internal)
  • budget: Budget level if specified (low, mid, high)
  • max_tokens: Max tokens if specified
  • success: Whether the operation succeeded (true, false)

The source label distinguishes between:

  • api: Direct API calls from clients
  • reflect: Internal recall calls made during reflect operations
  • internal: Other internal operations

LLM Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| hindsight.llm.duration | Histogram | provider, model, scope, success | Duration of LLM API calls in seconds |
| hindsight.llm.calls.total | Counter | provider, model, scope, success | Total number of LLM API calls |
| hindsight.llm.tokens.input | Counter | provider, model, scope, success, token_bucket | Input tokens for LLM calls |
| hindsight.llm.tokens.output | Counter | provider, model, scope, success, token_bucket | Output tokens from LLM calls |

Labels:

  • provider: LLM provider (openai, anthropic, gemini, groq, ollama, lmstudio)
  • model: Model name (e.g., gpt-4, claude-3-sonnet)
  • scope: What the LLM call is for (memory, reflect, entity_observation, answer)
  • success: Whether the call succeeded (true, false)
  • token_bucket: Token count bucket for cardinality control (0-100, 100-500, 500-1k, 1k-5k, 5k-10k, 10k-50k, 50k+)
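
The token_bucket label keeps cardinality bounded by collapsing raw token counts into a fixed set of ranges. A minimal sketch of such a mapping is shown below. The function name `token_bucket` and the handling of boundary values are assumptions: the list above does not say whether a count of exactly 100 belongs to 0-100 or 100-500, and this sketch places it in the higher bucket.

```python
import bisect

# Boundaries and labels follow the list above. Assumption: a count equal to a
# boundary (e.g. exactly 100) is placed in the higher bucket.
_BOUNDS = [100, 500, 1_000, 5_000, 10_000, 50_000]
_LABELS = ["0-100", "100-500", "500-1k", "1k-5k", "5k-10k", "10k-50k", "50k+"]


def token_bucket(count: int) -> str:
    """Map a raw token count to one of the seven bucket labels."""
    if count < 0:
        raise ValueError("token count must be non-negative")
    # bisect_right returns how many boundaries are <= count, which is the
    # index of the matching label.
    return _LABELS[bisect.bisect_right(_BOUNDS, count)]


if __name__ == "__main__":
    for n in (42, 750, 12_000, 80_000):
        print(n, "->", token_bucket(n))
```

Because the label can take only seven values, adding it multiplies the number of token series by at most seven, regardless of how varied the actual token counts are.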

HTTP Request Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| hindsight.http.duration | Histogram | method, endpoint, status_code, status_class | Duration of HTTP requests in seconds |
| hindsight.http.requests.total | Counter | method, endpoint, status_code, status_class | Total number of HTTP requests |
| hindsight.http.requests.in_progress | UpDownCounter | method, endpoint | Number of HTTP requests currently being processed |

Labels:

  • method: HTTP method (GET, POST, PUT, DELETE)
  • endpoint: Request path (normalized to reduce cardinality - UUIDs replaced with {id})
  • status_code: HTTP status code (200, 400, 500, etc.)
  • status_class: Status code class (2xx, 4xx, 5xx)
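
Endpoint normalization of this kind can be sketched with a single regular expression. This is an illustration of the idea, not Hindsight's actual implementation: `normalize_endpoint` and the example path are hypothetical, and the pattern assumes UUIDs in the canonical 8-4-4-4-12 hexadecimal form.

```python
import re

# Assumption: UUIDs appear in the canonical 8-4-4-4-12 hexadecimal form.
_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def normalize_endpoint(path: str) -> str:
    """Replace every UUID in a request path with the placeholder {id}."""
    return _UUID.sub("{id}", path)


if __name__ == "__main__":
    # Hypothetical path, used only to show the substitution.
    print(normalize_endpoint("/v1/banks/123e4567-e89b-12d3-a456-426614174000/memories"))
```

Without this step, every distinct UUID would create a new time series for each HTTP metric.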

Database Pool Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| hindsight.db.pool.size | Gauge | - | Current number of connections in the pool |
| hindsight.db.pool.idle | Gauge | - | Number of idle connections in the pool |
| hindsight.db.pool.min | Gauge | - | Minimum pool size |
| hindsight.db.pool.max | Gauge | - | Maximum pool size |

Process Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| hindsight.process.cpu.seconds | Gauge | type | Process CPU time in seconds |
| hindsight.process.memory.bytes | Gauge | type | Process memory usage in bytes |
| hindsight.process.open_fds | Gauge | - | Number of open file descriptors |
| hindsight.process.threads | Gauge | - | Number of active threads |

Labels:

  • type (CPU): user or system
  • type (Memory): rss_max (maximum resident set size)

Histogram Buckets

Custom bucket boundaries are configured for better percentile accuracy:

Operation Duration Buckets (seconds):

0.1, 0.25, 0.5, 0.75, 1.0, 2.0, 3.0, 5.0, 7.5, 10.0, 15.0, 20.0, 30.0, 60.0, 120.0

LLM Duration Buckets (seconds):

0.1, 0.25, 0.5, 1.0, 2.0, 3.0, 5.0, 10.0, 15.0, 30.0, 60.0, 120.0

HTTP Duration Buckets (seconds):

0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0
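
Bucket boundaries matter because percentiles are estimated from bucket counts, not from raw observations. The sketch below is a simplified version of the linear interpolation that Prometheus applies to classic histograms in histogram_quantile; `estimate_quantile` is an illustrative name and the bucket counts are invented for the example.

```python
import math


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative histogram buckets.

    buckets holds (upper_bound, cumulative_count) pairs sorted by upper
    bound, ending with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The rank falls past the last finite boundary.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


if __name__ == "__main__":
    # Invented counts over a subset of the LLM duration boundaries.
    llm = [(0.5, 10), (1.0, 60), (2.0, 95), (3.0, 100), (math.inf, 100)]
    print(estimate_quantile(0.5, llm))
    print(estimate_quantile(0.95, llm))
```

The estimate can never be more precise than the width of the bucket the rank falls into, so denser boundaries around typical latencies give tighter percentiles.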

Prometheus Configuration

scrape_configs:
  - job_name: 'hindsight'
    static_configs:
      - targets: ['localhost:8888']

Example Queries

Average operation latency by type

sum by (operation) (rate(hindsight_operation_duration_sum[5m])) / sum by (operation) (rate(hindsight_operation_duration_count[5m]))

LLM calls per minute by provider

sum by (provider) (rate(hindsight_llm_calls_total[1m])) * 60

P95 LLM latency

histogram_quantile(0.95, sum by (le) (rate(hindsight_llm_duration_bucket[5m])))

Total tokens consumed by model

sum by (model) (hindsight_llm_tokens_input_total) + sum by (model) (hindsight_llm_tokens_output_total)

Internal vs API recall operations

sum by (source) (rate(hindsight_operation_total{operation="recall"}[5m]))

HTTP requests per second by endpoint

sum by (endpoint) (rate(hindsight_http_requests_total[1m]))

HTTP error rate (5xx)

sum(rate(hindsight_http_requests_total{status_class="5xx"}[5m])) / sum(rate(hindsight_http_requests_total[5m]))

P95 HTTP latency

histogram_quantile(0.95, sum by (le) (rate(hindsight_http_duration_seconds_bucket[5m])))

Database pool utilization

hindsight_db_pool_size / hindsight_db_pool_max

Active database connections

hindsight_db_pool_size - hindsight_db_pool_idle

CPU usage rate

rate(hindsight_process_cpu_seconds{type="user"}[1m])
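
Any of these queries can also be run programmatically through the Prometheus HTTP API instant-query endpoint, /api/v1/query. The sketch below uses only the Python standard library. `PROMETHEUS_URL` assumes Prometheus listens on its default port 9090, which may not match your setup, and `build_query_url` and `run_query` are illustrative helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus listens on its default port; adjust for your setup.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def run_query(promql: str, base_url: str = PROMETHEUS_URL) -> list:
    """Run an instant query and return the list of result series."""
    with urllib.request.urlopen(build_query_url(promql, base_url), timeout=10) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


if __name__ == "__main__":
    # Requires a running Prometheus that scrapes Hindsight.
    for series in run_query("hindsight_db_pool_size - hindsight_db_pool_idle"):
        print(series["metric"], series["value"])
```

Each result series carries its label set under "metric" and a [timestamp, value] pair under "value".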