Runbooks Overview
Runbooks in DrDroid help on-call engineers, SREs, and developers respond to production alerts through structured, repeatable workflows. They combine natural language reasoning, executable tasks, and contextual variables to standardize and accelerate debugging and remediation.
Whether you're manually investigating an incident or proactively automating issue detection, runbooks enable your team to take consistent, informed action, powered by AI and your operational knowledge.
What is a Runbook?
A runbook defines a sequence of investigative or corrective steps that can be:
- Written using natural language instructions
- Composed of reusable tasks
- Parameterized with variables to adapt across environments or alerts
Core Components
Component | Description |
---|---|
Natural Language Instructions | Written by users to describe the debugging logic. The AI interprets this to understand the flow and task dependencies. |
Tasks | Atomic, reusable steps that run commands, query metrics/logs, or trigger remediations. |
Variables | Dynamic placeholders (e.g., `$host_ip`, `$service_name`) used to generalize runbooks across alerts or services. |
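To make the idea of variables concrete, here is a minimal sketch of placeholder substitution. It does not show DrDroid's own mechanism, which this page does not describe; it uses Python's standard `string.Template`, which happens to share the `$name` syntax. The command, port, and context values are hypothetical.

```python
from string import Template

# Hypothetical task command; the $-prefixed names mirror the placeholders above.
command_template = Template(
    "curl -s http://$host_ip:8080/health?service=$service_name"
)

# Values that would, in practice, come from the alert context (made up here).
alert_context = {"host_ip": "10.0.0.12", "service_name": "checkout"}


def render(template: Template, context: dict) -> str:
    """Fill every placeholder; raises KeyError if the context lacks a variable."""
    return template.substitute(context)


rendered = render(command_template, alert_context)
print(rendered)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a missing variable fails loudly instead of leaving a literal `$host_ip` in a command that might then be executed.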
Runbook Execution Modes
Runbooks in DrDroid can be executed in two primary modes: manually by users or automatically through system triggers. This flexibility allows teams to respond to incidents in real time or proactively investigate issues in the background.
Manual Execution
Runbooks can be manually executed in the following ways:
- From the Alerts Inbox: Users can quickly select one or more alerts and apply a runbook directly without entering the alert details page. This is useful for bulk triaging or running standard checks.
- From the Alert Details Page: During an investigation, users can pick a relevant runbook, review its steps, and execute them sequentially or all at once.
- Task-level Execution: Users can also test and run individual tasks within a runbook without triggering the full workflow, which is especially useful during iteration or partial debugging.
Manual execution gives full control to the engineer, while still benefiting from AI-suggested steps and auto-filled variables based on the alert context.
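The difference between running a whole runbook and running a single task can be sketched as follows. This is an illustrative model only, not DrDroid's data model or API: the `Task` and `Runbook` classes, their method names, and the sample tasks are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Variables = Dict[str, str]


@dataclass
class Task:
    """One atomic step; `action` stands in for a command, query, or remediation."""
    name: str
    action: Callable[[Variables], str]


@dataclass
class Runbook:
    name: str
    tasks: List[Task] = field(default_factory=list)

    def run_task(self, task_name: str, variables: Variables) -> str:
        """Task-level execution: run one task without the rest of the workflow."""
        for task in self.tasks:
            if task.name == task_name:
                return task.action(variables)
        raise KeyError(f"no task named {task_name!r} in runbook {self.name!r}")

    def run_all(self, variables: Variables) -> Dict[str, str]:
        """Full execution: run every task in order, collecting output by task name."""
        return {task.name: task.action(variables) for task in self.tasks}


# Hypothetical tasks that only format strings; real tasks would call external systems.
runbook = Runbook(
    name="high-latency-triage",
    tasks=[
        Task("check_host", lambda v: f"pinged {v['host_ip']}"),
        Task("tail_logs", lambda v: f"fetched logs for {v['service_name']}"),
    ],
)

# Variables as they might be auto-filled from an alert (values are made up).
variables = {"host_ip": "10.0.0.12", "service_name": "checkout"}

single = runbook.run_task("tail_logs", variables)
everything = runbook.run_all(variables)
```

Here `run_task` corresponds to testing one step during iteration, while `run_all` corresponds to executing the runbook end to end with the same variables.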
Automatic Execution
DrDroid also supports automated runbook execution through:
- Schedules: Define recurring runbook executions (e.g., every 15 minutes, hourly) for proactive checks, health monitoring, or log scraping. Ideal for continuous observability without human intervention.
- Automation Rules: Trigger runbooks automatically when alerts meet specific conditions, such as matching service tags, alert fingerprints, or severity levels. This supports auto-investigation workflows where the agent begins analysis the moment an alert arrives.
In both cases, the system uses AI to interpret the natural language instructions and dynamically populate variables, enabling meaningful execution without user input.
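As a rough illustration of how an automation rule might decide whether an incoming alert should trigger a runbook, the sketch below matches on the three conditions mentioned above: service tags, fingerprint, and severity. The class names, field names, severity scale, and sample values are assumptions for this example and are not taken from DrDroid's rule schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Assumed severity scale, lowest to highest.
SEVERITY_ORDER = ["info", "warning", "critical"]


@dataclass(frozen=True)
class Alert:
    fingerprint: str
    severity: str
    tags: FrozenSet[str]


@dataclass(frozen=True)
class AutomationRule:
    runbook_name: str
    required_tags: FrozenSet[str] = frozenset()
    fingerprint: Optional[str] = None   # None means "any fingerprint"
    min_severity: Optional[str] = None  # None means "any severity"

    def matches(self, alert: Alert) -> bool:
        """True only if the alert satisfies every condition set on the rule."""
        if not self.required_tags <= alert.tags:
            return False
        if self.fingerprint is not None and self.fingerprint != alert.fingerprint:
            return False
        if self.min_severity is not None:
            # .index raises ValueError for a severity outside the assumed scale.
            alert_level = SEVERITY_ORDER.index(alert.severity)
            if alert_level < SEVERITY_ORDER.index(self.min_severity):
                return False
        return True


def runbooks_for(alert: Alert, rules: List[AutomationRule]) -> List[str]:
    """Names of the runbooks whose rules match the alert, in rule order."""
    return [rule.runbook_name for rule in rules if rule.matches(alert)]


rules = [
    AutomationRule(
        "db-connection-check",
        required_tags=frozenset({"service:checkout"}),
        min_severity="warning",
    ),
    AutomationRule("disk-cleanup", fingerprint="disk-full-7f3a"),
]
alert = Alert(
    fingerprint="latency-91bc",
    severity="critical",
    tags=frozenset({"service:checkout", "env:prod"}),
)
triggered = runbooks_for(alert, rules)
```

In this sketch all conditions on a rule are combined with AND, and an unset condition matches everything; a real rule engine may combine conditions differently.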
Benefits of Using Runbooks
- Consistency: Standardize how teams respond to common issues.
- Speed: Reduce MTTR through predefined actions and automation.
- Clarity: Capture and document tribal knowledge in one place.
- Flexibility: Blend automation with human-in-the-loop decision-making.
What's Next?
Explore more about how to build and use runbooks in DrDroid:
- Tasks: Define and reuse executable debugging actions
- Variables: Use dynamic values to generalize runbooks
- Natural Language Instructions: Guide the agent using prompts
- Example Runbooks: Learn from real-world templates
- Automation Rules: Set up auto-triggered investigations