Query and analyze metrics from various monitoring systems directly in your runbooks.
Supported Data Sources
- Prometheus
- Datadog
- CloudWatch
- New Relic
- Graphite
PromQL (Prometheus)
```
# CPU usage (%) per instance: 100 minus the average idle percentage
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100)

# Memory usage (%)
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
```
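Outside the runbook, you can send the same PromQL expressions to the instant-query endpoint of the Prometheus HTTP API, /api/v1/query. The following is a minimal Python sketch. It builds the request URL and parses an instant-vector response. The server address http://prometheus:9090 is a placeholder assumption, and the helpers build_instant_query_url and parse_instant_vector are hypothetical. The HTTP call itself is omitted so the example runs offline.

```python
import json
from urllib.parse import urlencode

# Placeholder address (assumption): point this at your Prometheus server.
PROMETHEUS_URL = "http://prometheus:9090"

CPU_QUERY = (
    '100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) '
    'by (instance) * 100)'
)


def build_instant_query_url(base_url: str, promql: str) -> str:
    """Return the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str, label: str = "instance") -> dict[str, float]:
    """Map each series' `label` value to its sample value.

    Instant-vector responses look like:
    {"status": "success", "data": {"resultType": "vector",
     "result": [{"metric": {...}, "value": [<unix_ts>, "<value>"]}]}}
    """
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    result = payload["data"]["result"]
    # Sample values are encoded as strings in the JSON, so convert to float.
    return {s["metric"].get(label, ""): float(s["value"][1]) for s in result}


if __name__ == "__main__":
    print(build_instant_query_url(PROMETHEUS_URL, CPU_QUERY))
    # Hand-written response, used only to exercise the parser.
    sample = json.dumps({
        "status": "success",
        "data": {"resultType": "vector", "result": [
            {"metric": {"instance": "node-a:9100"}, "value": [1700000000, "42.5"]},
        ]},
    })
    print(parse_instant_vector(sample))  # → {'node-a:9100': 42.5}
```

To query a live server, pass the URL to urllib.request.urlopen and hand the decoded response body to parse_instant_vector.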
Datadog
```
# Basic query: average user CPU, one series per host
avg:system.cpu.user{*} by {host}

# Timeshift comparison: the same metric one day (86400 s) earlier
timeshift(avg:system.cpu.user{*}, -86400)
```
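Conceptually, a timeshift comparison lines each point up with the value the same series had a fixed interval earlier. The plain-Python sketch below illustrates that idea on (timestamp_seconds, value) points with a one-day shift. It is not Datadog's implementation, and the helper name timeshift_delta is hypothetical. It only pairs timestamps that match exactly, whereas real series usually need to be rolled up to a common interval first.

```python
from typing import Iterable

DAY_SECONDS = 86400


def timeshift_delta(
    points: Iterable[tuple[int, float]], shift: int = DAY_SECONDS
) -> list[tuple[int, float]]:
    """Return (timestamp, current value - value at timestamp - shift) pairs.

    Hypothetical helper: only timestamps whose shifted counterpart exists
    in `points` are kept; nothing is interpolated.
    """
    by_ts = dict(points)
    return [
        (ts, value - by_ts[ts - shift])
        for ts, value in sorted(by_ts.items())
        if ts - shift in by_ts
    ]


if __name__ == "__main__":
    series = [(0, 10.0), (60, 12.0), (86400, 15.0), (86460, 11.0)]
    print(timeshift_delta(series))  # → [(86400, 5.0), (86460, -1.0)]
```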
Examples
CPU Usage by Pod
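```
# CPU usage (cores) by Kubernetes pod
# container!="" drops the pod-level cgroup series that some cAdvisor setups
# also report, which would otherwise be double counted
sum(rate(container_cpu_usage_seconds_total{namespace="production", container!=""}[5m])) by (pod)
```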
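The pod CPU query works in two steps. It first keeps only series whose labels match {namespace="production"}. Then sum(...) by (pod) adds together all series that share a pod label. The Python sketch below approximates those two steps, applied to per-series rates that are assumed to be already computed. The helpers select and sum_by and the sample label sets are hypothetical.

```python
from collections import defaultdict

# (label set, per-second rate) for one series; illustrative shape only.
Series = tuple[dict[str, str], float]


def select(series: list[Series], **matchers: str) -> list[Series]:
    """Keep series whose labels equal every matcher, like {namespace="production"}.

    A missing label is treated as "", as PromQL equality matchers do.
    """
    return [
        (labels, value)
        for labels, value in series
        if all(labels.get(key, "") == want for key, want in matchers.items())
    ]


def sum_by(series: list[Series], label: str) -> dict[str, float]:
    """Emulate `sum(...) by (label)`: add up values sharing a label value."""
    totals: dict[str, float] = defaultdict(float)
    for labels, value in series:
        totals[labels.get(label, "")] += value
    return dict(totals)


if __name__ == "__main__":
    rates = [
        ({"namespace": "production", "pod": "api-1", "container": "app"}, 0.25),
        ({"namespace": "production", "pod": "api-1", "container": "sidecar"}, 0.125),
        ({"namespace": "production", "pod": "web-1", "container": "app"}, 0.5),
        ({"namespace": "staging", "pod": "api-1", "container": "app"}, 1.0),
    ]
    print(sum_by(select(rates, namespace="production"), "pod"))
    # → {'api-1': 0.375, 'web-1': 0.5}
```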
Error Rate
```
# HTTP 5xx error rate
sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
/
sum(rate(http_requests_total[5m])) by (service)
```
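The error-rate query divides two per-service sums, and PromQL only produces a result for services present on both sides of the division. The sketch below mirrors that behavior in Python. It uses a fully anchored regex to match status=~"5..", as PromQL does. The error_ratio helper and its input shape are assumptions made for illustration.

```python
import re
from collections import defaultdict

# PromQL regex matchers are fully anchored, so status=~"5.." means fullmatch.
FIVE_XX = re.compile(r"5..")


def error_ratio(rates: list[tuple[dict[str, str], float]]) -> dict[str, float]:
    """Emulate sum(rate(5xx)) by (service) / sum(rate(all)) by (service).

    `rates` holds (labels, per-second request rate) pairs. As with PromQL
    vector matching, a service with no 5xx series gets no result at all;
    0/0 yields NaN, as it does in PromQL.
    """
    errors: dict[str, float] = defaultdict(float)
    totals: dict[str, float] = defaultdict(float)
    for labels, rate in rates:
        service = labels.get("service", "")
        totals[service] += rate
        if FIVE_XX.fullmatch(labels.get("status", "")):
            errors[service] += rate
    return {
        svc: (err / totals[svc] if totals[svc] else float("nan"))
        for svc, err in errors.items()
    }


if __name__ == "__main__":
    sample = [
        ({"service": "checkout", "status": "200"}, 9.5),
        ({"service": "checkout", "status": "503"}, 0.5),
        ({"service": "search", "status": "200"}, 4.0),
    ]
    print(error_ratio(sample))  # → {'checkout': 0.05}
```

In this example, search is absent from the output instead of showing 0, which is worth remembering when alerting on the ratio.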
Best Practices
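- Choose time ranges that fit the question, and make range windows such as [5m] span several scrape intervals so rate() has enough samples
- Add label filters (for example {namespace="production"}) to narrow results and reduce query cost
- Use rate() for counters instead of graphing their raw values; it returns a per-second rate and handles counter resets
- Label your metrics consistently
- Use recording rules to precompute complex or frequently run queries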
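Counters only ever increase until the process restarts, at which point they drop back to zero. rate() turns them into a per-second rate and treats any decrease as a counter reset. The following simplified Python sketch shows just those two ideas over (timestamp_seconds, value) samples. Unlike the real rate(), the hypothetical simple_rate helper does not extrapolate to the edges of the range window.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter over its samples, handling resets.

    Simplified sketch: unlike Prometheus rate(), no extrapolation to the
    boundaries of the range window is performed.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so count `cur` itself.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


if __name__ == "__main__":
    # Scraped every 15 s; the counter resets between t=15 and t=30.
    samples = [(0, 100.0), (15, 120.0), (30, 10.0), (45, 40.0), (60, 70.0)]
    print(simple_rate(samples))  # → 1.5
```

Comparing only the first and last raw values (100 and 70) would suggest the counter went down. The reset-aware sum gives an increase of 90 over 60 seconds, or 1.5 per second.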