Playbooks

Automate your repetitive, manual tasks

Welcome to Playbooks documentation

PlayBooks is an open source platform for investigation automation.



What are Playbooks?

PlayBooks helps you write executable notebooks for on-call investigations and remediations, in place of Google Docs or wikis.




Past context

In our previous jobs, we worked at a food delivery startup in India with a small tech team and a busy on-call routine for backend and DevOps engineers. Business-impacting issues (e.g. orders dropped by >5% in the last 15 minutes) would often escalate to Dipesh: he was the lead dev, had been around for a while, and always had 4-5 hypotheses on what might have failed. To avoid becoming the bottleneck, he wrote scripts that fetched custom metrics and order-related application logs every 5 minutes during peak traffic. When an issue was reported, engineers would first check the output of those scripts, which covered all the usual suspects, before diving into a generic exploration. This was the inspiration for PlayBooks.

We’ve put together a platform that helps any dev create such scripts flexibly, without needing to write much code.

Goals

Our goals for PlayBooks when we began were:

  • Collaborative / easy to write & share investigation progress
  • Automation first / AI friendly
  • With / without laptop (work with Slack + be phone friendly)

Integrations

Using PlayBooks, a user can configure the steps as data queries or actions within their observability stack. Here are the integrations we currently support:

  • Run bash commands on a remote server;
  • Fetch logs from AWS CloudWatch and Azure Log Analytics;
  • Fetch metrics from any PromQL-compatible database, AWS CloudWatch, Datadog and New Relic;
  • Query PostgreSQL, ClickHouse or any other JDBC-compatible database;
  • Write a custom API call;
  • Query events from EKS / GKE;
  • Add an iFrame.

The platform focuses not just on running the tasks but also on displaying information in a meaningful form, with relevant graphs, logs and text outputs alongside the steps in a notebook format. Some of our users have told us that decision-making overload during on-call has reduced with PlayBooks, as relevant data from multiple tools is presented upfront on one page.
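To make the notebook idea concrete, here is a minimal Python sketch of steps that run in order, with each output shown next to its step. It is an illustration only and not the PlayBooks API: the names Step, StepResult, run_playbook and render are hypothetical, and the two steps return stand-in values rather than querying a real metrics or log source.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One step: a label plus a callable that fetches data or runs an action."""
    name: str
    run: Callable[[], str]  # hypothetical contract: returns text to display


@dataclass
class StepResult:
    name: str
    ok: bool
    output: str


def run_playbook(steps: List[Step]) -> List[StepResult]:
    """Run every step in order, recording failures instead of stopping."""
    results = []
    for step in steps:
        try:
            results.append(StepResult(step.name, True, step.run()))
        except Exception as exc:  # keep going so later steps still report
            results.append(StepResult(step.name, False, f"error: {exc}"))
    return results


def render(results: List[StepResult]) -> str:
    """Lay out each step's output under its name, notebook style."""
    lines = []
    for i, r in enumerate(results, start=1):
        status = "ok" if r.ok else "failed"
        lines.append(f"[{i}] {r.name} ({status})")
        lines.append(f"    {r.output}")
    return "\n".join(lines)


def failing_query() -> str:
    """Stand-in for a data source that is unreachable."""
    raise RuntimeError("connection refused")


# Stand-in steps; a real step would query a metrics store, log source or database.
steps = [
    Step("Order count, last 15 min", lambda: "orders=950"),
    Step("Recent order errors", failing_query),
]
report = render(run_playbook(steps))
print(report)
```

A failed step is recorded rather than raised, so one unreachable tool does not hide the output of the remaining steps from the engineer on call.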

See full list of integrations here.

Capabilities

Here are some key features that we believe add further value for teams looking to improve the developer experience of their on-call engineers:

  • Links to relevant PlayBooks sent automatically with alerts, and alerts enriched with the above-mentioned data
  • AI-supported interpretation layer: connect with LLM or ML models to auto-analyze the data in the playbook
  • Logs of historical executions, to ease the effort of creating post-mortems and timelines or sharing information with peers

If this looks like something that would have been useful to you while on call, or would be in your current workspace, we welcome you to try our sandbox: https://sandbox.drdroid.io/. We have added a default playbook. Click one of the steps in the playbook and then the “Run” button to see the playbook in action.

We are excited to hear what you like about PlayBooks and what you think could improve the on-call developer experience for your team. Please drop your comments here; we will read them eagerly and respond!