Query and analyze metrics from various monitoring systems directly in your runbooks.

Supported Data Sources

  • Prometheus
  • Datadog
  • CloudWatch
  • New Relic
  • Graphite

Query Format

PromQL (Prometheus)

# CPU usage (%) per instance, averaged across cores
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100)

# Memory usage (%)
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
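
The arithmetic behind the two queries above can be sketched in a few lines of Python. This is only an illustration of the formulas, not how Prometheus evaluates them; the function names and the sample values are hypothetical.

```python
def cpu_busy_percent(idle_rates):
    """Return CPU usage (%) from per-core idle rates.

    Each idle rate is the fraction of every second a core spent idle,
    which is what rate(node_cpu_seconds_total{mode="idle"}[5m]) yields.
    """
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100


def memory_used_percent(total_bytes, available_bytes):
    """Return memory usage (%) from total and available bytes."""
    return (total_bytes - available_bytes) / total_bytes * 100


# Hypothetical sample values, not real measurements.
print(cpu_busy_percent([0.5, 0.75]))               # two cores: prints 37.5
print(memory_used_percent(8 * 2**30, 2 * 2**30))   # 8 GiB total, 2 GiB free: prints 75.0
```

Subtracting the idle share from 100 counts every non-idle mode (user, system, iowait, and so on) as usage, which is why the query filters on mode="idle" rather than summing the other modes.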

Datadog

# Basic query
avg:system.cpu.user{*} by {host}

# Timeshift comparison: the same metric as it was 24 hours (86400 seconds) earlier
timeshift(avg:system.cpu.user{*}, -86400)
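
To make the idea of a timeshift comparison concrete, here is a minimal Python sketch that lines up a series from one day earlier with the current timestamps and takes the difference. It only illustrates the concept and does not call Datadog; the function names and data points are hypothetical.

```python
DAY_SECONDS = 86400


def shift_forward(points, seconds):
    """Move every timestamp forward so old values align with current ones."""
    return {ts + seconds: value for ts, value in points.items()}


def compare_to_shifted(current, earlier, seconds=DAY_SECONDS):
    """Return current minus earlier for each timestamp present in both series."""
    shifted = shift_forward(earlier, seconds)
    return {ts: current[ts] - shifted[ts] for ts in current if ts in shifted}


# Hypothetical points keyed by Unix timestamp (seconds).
yesterday = {0: 10.0, 60: 20.0}
today = {86400: 15.0, 86460: 20.0}
print(compare_to_shifted(today, yesterday))  # prints {86400: 5.0, 86460: 0.0}
```

A positive difference means the metric is higher now than it was at the same time one day earlier.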

Examples

CPU Usage by Pod

# CPU usage by Kubernetes pod
sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m])) by (pod)

Error Rate

# HTTP 5xx error ratio per service (a fraction from 0 to 1; multiply by 100 for a percentage)
sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
/
sum(rate(http_requests_total[5m])) by (service)
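
As a worked example of what this query computes, the Python sketch below derives an error ratio from two readings of each counter. It is a simplification: the real rate() also handles counter resets and extrapolates to the window edges. The function names and counter values are hypothetical, and returning None when there is no traffic is a choice made for this sketch.

```python
def per_second_rate(first_value, last_value, window_seconds):
    """Simplified rate(): average per-second increase over the window."""
    return (last_value - first_value) / window_seconds


def error_ratio(errors_first, errors_last, total_first, total_last, window_seconds):
    """Return the share of requests that failed, or None if there was no traffic."""
    error_rate = per_second_rate(errors_first, errors_last, window_seconds)
    total_rate = per_second_rate(total_first, total_last, window_seconds)
    if total_rate == 0:
        return None
    return error_rate / total_rate


# Hypothetical counters over a 5-minute (300 s) window:
# 5xx responses went from 100 to 130, all responses from 10000 to 10600.
ratio = error_ratio(100, 130, 10000, 10600, 300)
print(f"{ratio:.2%}")  # prints 5.00%
```

Dividing two rates taken over the same window cancels the time unit, so the result is a plain fraction of requests rather than a per-second value.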

Best Practices

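  • Match the time range to the question: short windows such as [5m] for live incidents, longer ranges for trends. A rate() window needs at least two samples, so keep it several times longer than the scrape interval
  • Add label filters (for example, namespace="production") to narrow results and keep queries fast
  • Use rate() for counters; a raw counter value only grows and drops back to zero on restart, so it says little on its own
  • Label your metrics consistently so that the same filters and aggregations (for example, by (service)) work across all of them
  • In Prometheus, use recording rules to precompute complex or expensive queries that you run often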
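
The advice on counters is easier to see with numbers. The Python sketch below, using hypothetical samples, shows why subtracting raw counter values breaks when a process restarts, and how a reset-aware calculation in the spirit of rate() and increase() avoids that. It assumes the counter restarts from zero and omits the extrapolation Prometheus applies.

```python
def counter_increase(samples):
    """Total increase across samples, treating any drop as a counter reset."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # The counter restarted from zero, so everything counted
            # since the reset is new.
            total += current
    return total


# Hypothetical samples: the process restarted between 150 and 20.
samples = [100, 150, 20, 60]
print(samples[-1] - samples[0])   # naive difference: prints -40
print(counter_increase(samples))  # reset-aware increase: prints 110.0
```

The naive difference goes negative after the restart, while the reset-aware total keeps counting: 50 before the reset, 20 at the first sample after it, then 40 more.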