Runbooks in DrDroid help on-call engineers, SREs, and developers respond to production alerts through structured, repeatable workflows. They combine natural language reasoning, executable tasks, and contextual variables to standardize and accelerate debugging and remediation.

Whether you’re manually investigating an incident or proactively automating issue detection, runbooks enable your team to take consistent, informed action, powered by AI and your operational knowledge.


🔧 What is a Runbook?

A runbook defines a sequence of investigative or corrective steps that can be:

  • Written using natural language instructions
  • Composed of reusable tasks
  • Parameterized with variables to adapt across environments or alerts

🏗️ Core Components

| Component | Description |
|---|---|
| Natural Language Instructions | Written by users to describe the debugging logic. The AI interprets these instructions to understand the flow and task dependencies. |
| Tasks | Atomic, reusable steps that run commands, query metrics/logs, or trigger remediations. |
| Variables | Dynamic placeholders (e.g., $host_ip, $service_name) used to generalize runbooks across alerts or services. |
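
As a rough illustration of how the three components fit together, the sketch below models a runbook in Python. The `Runbook` and `Task` classes, the task names, and the sample commands are hypothetical and are not part of DrDroid's API; only the `$host_ip` and `$service_name` placeholders come from the table above. Python's standard `string.Template` happens to use the same `$name` syntax, so it stands in for variable substitution here.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class Task:
    """Hypothetical task: a name plus a command template with $variables."""
    name: str
    command: str

    def render(self, variables: dict[str, str]) -> str:
        # Template.substitute raises KeyError when a variable is missing,
        # which surfaces incomplete alert context before anything runs.
        return Template(self.command).substitute(variables)


@dataclass
class Runbook:
    """Hypothetical runbook: free-text instructions plus ordered tasks."""
    instructions: str
    tasks: list[Task] = field(default_factory=list)

    def render_all(self, variables: dict[str, str]) -> list[str]:
        # Fill the same variable values into every task, in order.
        return [task.render(variables) for task in self.tasks]


runbook = Runbook(
    instructions="Check that the host is reachable, then inspect the service logs.",
    tasks=[
        Task("ping_host", "ping -c 3 $host_ip"),
        Task("tail_logs", "journalctl -u $service_name -n 100"),
    ],
)

# Values like these would come from the alert context (assumed here).
commands = runbook.render_all({"host_ip": "10.0.0.12", "service_name": "checkout"})
for command in commands:
    print(command)
```

The same runbook can be reused for another alert by passing a different variable mapping, which is the point of parameterizing tasks instead of hard-coding hosts or service names.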

⚙️ Runbook Execution Modes

Runbooks in DrDroid can be executed in two primary modes: manually by users or automatically through system triggers. This flexibility allows teams to respond to incidents in real time or proactively investigate issues in the background.

🖐️ Manual Execution

Runbooks can be manually executed in the following ways:

  • From the Alerts Inbox: Users can quickly select one or more alerts and apply a runbook directly without entering the alert details page. This is useful for bulk triaging or running standard checks.

  • From the Alert Details Page: During an investigation, users can pick a relevant runbook, review its steps, and execute them sequentially or all at once.

  • Task-level Execution: Users can also test and run individual tasks within a runbook without triggering the full workflow, which is especially useful during iteration or partial debugging.

Manual execution gives the engineer full control while still providing AI-suggested steps and variables auto-filled from the alert context.

🤖 Automatic Execution

DrDroid also supports automated runbook execution through:

  • Schedules: Define recurring runbook executions (e.g., every 15 minutes, hourly) for proactive checks, health monitoring, or log scraping. Ideal for continuous observability without human intervention.

  • Automation Rules: Trigger runbooks automatically when alerts meet specific conditions, such as matching service tags, alert fingerprints, or severity levels. This supports auto-investigation workflows where the agent begins analysis the moment an alert arrives.

In both cases, the system uses AI to interpret the natural language instructions and dynamically populate variables, enabling meaningful execution without user input.
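
To make the Automation Rules idea concrete, here is a minimal sketch of condition matching in Python. Everything in it is an assumption for illustration: the `Alert` and `AutomationRule` classes, the three severity levels, the tag format, and the runbook names are hypothetical and do not reflect DrDroid's actual rule schema or API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed severity scale, ordered from least to most severe.
SEVERITY_ORDER = ["info", "warning", "critical"]


@dataclass
class Alert:
    """Hypothetical alert with the fields the rules below inspect."""
    fingerprint: str
    severity: str
    tags: set[str] = field(default_factory=set)


@dataclass
class AutomationRule:
    """Hypothetical rule: run `runbook_name` when all conditions hold."""
    runbook_name: str
    required_tags: set[str] = field(default_factory=set)
    min_severity: str = "info"
    fingerprint: Optional[str] = None  # None means "any fingerprint"

    def matches(self, alert: Alert) -> bool:
        if self.fingerprint is not None and alert.fingerprint != self.fingerprint:
            return False
        # Every required tag must be present on the alert.
        if not self.required_tags <= alert.tags:
            return False
        # Note: a severity outside SEVERITY_ORDER raises ValueError here.
        return (SEVERITY_ORDER.index(alert.severity)
                >= SEVERITY_ORDER.index(self.min_severity))


def runbooks_for(alert: Alert, rules: list[AutomationRule]) -> list[str]:
    """Return the names of all runbooks whose rules match the alert."""
    return [rule.runbook_name for rule in rules if rule.matches(alert)]


rules = [
    AutomationRule("checkout_latency_triage",
                   required_tags={"service:checkout"},
                   min_severity="warning"),
    AutomationRule("generic_critical_triage", min_severity="critical"),
]

alert = Alert(fingerprint="abc123", severity="critical",
              tags={"service:checkout", "env:prod"})
triggered = runbooks_for(alert, rules)
print(triggered)
```

With these sample rules, the critical checkout alert satisfies both rules, so `triggered` lists both runbook names. A real system would then hand each matched runbook to the execution engine; that step is left out of this sketch.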


📈 Benefits of Using Runbooks

  • Consistency: Standardize how teams respond to common issues.
  • Speed: Reduce MTTR through predefined actions and automation.
  • Clarity: Capture and document tribal knowledge in one place.
  • Flexibility: Blend automation with human-in-the-loop decision-making.

📚 What’s Next?

Explore more about how to build and use runbooks in DrDroid: