🧠 How It Works
- Describe the issue in a prompt, just as you would to a teammate.
- The agent interprets the intent, identifies relevant data sources and tools, and begins fetching context.
- It analyzes logs, metrics, configurations, or infra data to identify potential causes.
- Based on findings, it surfaces:
  - Observations from the environment
  - A likely root cause (if any)
  - Suggested remediation actions
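The prompt → context → analysis → findings flow above can be sketched roughly as follows. This is an illustrative assumption about the shape of the loop, not the agent's actual API; the function names, the context dictionary, and the canned metrics are all hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Finding:
    observations: list       # what the agent saw in the environment
    root_cause: Optional[str]  # a likely root cause, if one was identified
    actions: list            # suggested remediation actions

def fetch_context(prompt: str) -> dict:
    # A real agent would pull logs, metrics, and config from its integrations;
    # here we return a canned metrics snapshot for illustration.
    return {"prompt": prompt, "metrics": {"avg_cpu_pct": 92.0}}

def analyze(context: dict) -> Finding:
    cpu = context["metrics"]["avg_cpu_pct"]
    observations = [f"average CPU at {cpu:.0f}% over the sample window"]
    root_cause = "CPU saturation on worker pods" if cpu > 80 else None
    actions = ["Increase CPU requests/limits"] if root_cause else []
    return Finding(observations, root_cause, actions)

def investigate(prompt: str) -> Finding:
    # prompt -> context -> analysis -> surfaced findings
    return analyze(fetch_context(prompt))
```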
📘 Example
Prompt: "CPU usage on checkout-service seems high. Can you take a look?"

🔍 Investigation Summary
- Queried CPU metrics from Datadog for checkout-service pods.
- Found average CPU usage at 92% over the last 15 minutes.
- Analyzed top CPU-consuming processes via the top command on the related nodes.
💡 Insight:
- The checkout-worker pods running Celery processes are using 55–75% CPU on each replica.
- Spikes correlate with batch job executions at the :00 and :30 marks.
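A correlation like the one above can be checked mechanically: flag samples whose CPU exceeds a threshold and whose timestamp falls within a few minutes of the half-hour marks. This is a sketch of the idea, and the threshold and window values are illustrative assumptions:

```python
from datetime import datetime

def near_half_hour(ts: datetime, window_min: int = 5) -> bool:
    # True if ts falls within window_min minutes of a :00 or :30 mark.
    minute = ts.minute % 30
    return minute <= window_min or minute >= 30 - window_min

def batch_aligned_spikes(samples, threshold=80.0, window_min=5):
    # samples: iterable of (timestamp, cpu_pct) pairs.
    # Returns the timestamps of high-CPU samples aligned with the batch schedule.
    return [ts for ts, cpu in samples
            if cpu > threshold and near_half_hour(ts, window_min)]
```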
📌 Root Cause:
- Worker pods overwhelmed during scheduled batch jobs due to insufficient CPU requests.
🛠️ Suggested Actions:
- Increase CPU requests/limits for checkout-worker pods.
- Consider staggering batch jobs or introducing queuing to smooth the load.
🧰 When to Use Prompt-based Debugging
- When no alert is raised but something "feels off"
- During exploratory analysis
- For recurring issues where you want a fresh look
- When you want a quick sanity check from the AI agent
📝 Notes
- Prompts work best when specific: include the service name, time window, and observed symptoms.
- The agent will fall back to general guidance if no integrations are configured.