Highlights:

Support for Builder View:

Playbooks can now be run in either a drag-and-drop builder view or a notebook-style view.

Support for AI-driven analysis of a metric:


New Features:

Improvements:

  • Workflows: The Slack response to an alert now contains the actual investigation data instead of only a link to the Playbook.
  • Workflows: View the list of past runs and edit existing configurations.
  • Workflows: Configure Schedules to define how frequently investigations should run.
  • Workflows: View execution logs of previous runs, including the value of each data point.

A workflow configures a playbook to trigger automatically on a specific event (API call, alert, etc.). In this release, we built and released Workflows with the following functionality:

  • Triggers: Set up an entry point for any alert arriving in your Slack channels.

  • Run: Select a playbook to run on workflow execution.

  • Notification: Receive the response in specific Slack channels.

  • Workflow Creation Pages
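
The three parts of a workflow listed above (trigger, run, notification) can be pictured as a small data structure. The sketch below is illustrative only: the class names, field names, and keyword-matching logic are assumptions made for this example, not the actual Playbooks schema or API.

```python
from dataclasses import dataclass


@dataclass
class SlackAlertTrigger:
    """Hypothetical trigger: fires for alerts posted in one Slack channel."""
    channel: str
    text_contains: str = ""  # optional keyword filter; empty matches every alert

    def matches(self, channel: str, text: str) -> bool:
        return channel == self.channel and self.text_contains.lower() in text.lower()


@dataclass
class Workflow:
    """Hypothetical workflow: trigger -> playbook run -> notification."""
    name: str
    trigger: SlackAlertTrigger
    playbook: str        # name of the playbook to run
    notify_channel: str  # Slack channel that receives the result

    def handle_alert(self, channel: str, text: str):
        """Return a description of the run, or None if the alert is ignored."""
        if not self.trigger.matches(channel, text):
            return None
        return {"run_playbook": self.playbook, "notify": self.notify_channel}


workflow = Workflow(
    name="checkout-latency",
    trigger=SlackAlertTrigger(channel="#alerts", text_contains="latency"),
    playbook="Investigate checkout latency",
    notify_channel="#oncall",
)
print(workflow.handle_alert("#alerts", "High latency on checkout-service"))
print(workflow.handle_alert("#random", "High latency on checkout-service"))
```

Keeping the trigger separate from the workflow means the same playbook could be attached to a different event source without changing the run or notification parts.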

Transitioning from Cloud to Open Source

We started by building a cloud-based MVP for Playbooks. We soon reached 8-10 integrations and user interest started to spike. After a few discussions with our users, however, we realised that the need to expose observability data to a cloud platform frequently kept them from trying the tool.

For the release v0.1 of the Open Source framework, we did the following:

  • Refactoring of the code for Open Source Playbooks.
  • Removal of any custom logic / code.
  • Enabled self-creation of a Slack app and manual integration of Datadog, replacing the previous OAuth flow.
  • Auth: Removed Google OAuth, the complexity around managing multiple accounts/clients and the need to verify email.
  • Created a file-server within the project to support image sending capabilities.

See more on GitHub: https://github.com/DrDroidLab/PlayBooks/releases/tag/v0.1.0-beta

In this cycle, we focused on adding integrations for Kubernetes clusters hosted on EKS and for self-hosted Grafana instances behind VPCs. We also worked on website performance:

Integrations:

  • Kubectl EKS integration.
  • Support for Grafana within a VPC via agent.

Test confirmation:

  • For any integration, there's now a way to validate and confirm the connection status after adding the API key/token/auth details in the integration tab.
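
One way to picture such a connection test is a small helper that runs a cheap probe against the tool and reports success or failure. This is a sketch under stated assumptions: `ConnectionStatus`, `check_connection`, and the two stand-in probes are hypothetical names for this example and do not reflect how Playbooks implements the check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConnectionStatus:
    ok: bool
    detail: str


def check_connection(probe: Callable[[], None]) -> ConnectionStatus:
    """Run a lightweight probe and report whether the integration is reachable.

    `probe` is any zero-argument callable that raises on failure, for example
    a function that makes one cheap authenticated request to the tool.
    """
    try:
        probe()
    except Exception as exc:  # report the failure instead of propagating it
        return ConnectionStatus(ok=False, detail=f"{type(exc).__name__}: {exc}")
    return ConnectionStatus(ok=True, detail="connected")


def good_probe() -> None:
    """Stand-in for a successful authenticated request."""


def bad_probe() -> None:
    """Stand-in for a request rejected because of a wrong API key."""
    raise PermissionError("invalid API key")


print(check_connection(good_probe))
print(check_connection(bad_probe))
```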

Performance improvement:

  • Code refactoring to reduce loading time.
  • Added alert insights to sandbox.
  • Datadog-specific performance improvement to load metadata faster.

In this changelog, we discuss the 3 newly added integrations, a Sandbox (which can be accessed without login) and a Slack integration that enables auto-triggering of a playbook.

Integrations added:

  • New Relic Integration

  • Datadog Integration

  • PostgreSQL


Feature work:

  • Editing an existing playbook (allowed only for its creator).

  • Playbook auto-triggering from a Slack alert.
Performance:

  • Reduced time to fetch tool assets from source page.

Metadata:

  • Improved metadata quality for PagerDuty & OpsGenie alerts.
  • Added a tabular data format with week-over-week analysis of noisy alerts.
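
As a rough illustration of what a week-over-week comparison of noisy alerts involves, the sketch below counts alerts per ISO week and reports the change per alert name. The sample data, function names, and table layout are assumptions made for this example and are not taken from the product.

```python
from collections import Counter
from datetime import date

# Hypothetical sample data: (day the alert fired, alert name).
alerts = [
    (date(2024, 1, 1), "HighCPU"),
    (date(2024, 1, 2), "HighCPU"),
    (date(2024, 1, 3), "DiskFull"),
    (date(2024, 1, 8), "HighCPU"),
    (date(2024, 1, 9), "HighCPU"),
    (date(2024, 1, 10), "HighCPU"),
    (date(2024, 1, 11), "HighCPU"),
]


def weekly_counts(alerts):
    """Count alerts per (ISO year, ISO week, alert name)."""
    counts = Counter()
    for day, name in alerts:
        iso_year, iso_week, _ = day.isocalendar()
        counts[(iso_year, iso_week, name)] += 1
    return counts


def week_over_week(alerts, prev_week, curr_week):
    """Return rows of (alert name, previous count, current count, change)."""
    counts = weekly_counts(alerts)
    names = sorted({name for (_, _, name) in counts})
    rows = []
    for name in names:
        prev = counts[(*prev_week, name)]
        curr = counts[(*curr_week, name)]
        rows.append((name, prev, curr, curr - prev))
    # Noisiest alerts in the current week come first.
    return sorted(rows, key=lambda row: row[2], reverse=True)


rows = week_over_week(alerts, prev_week=(2024, 1), curr_week=(2024, 2))
print(f"{'alert':<10}{'prev':>6}{'curr':>6}{'change':>8}")
for name, prev, curr, change in rows:
    print(f"{name:<10}{prev:>6}{curr:>6}{change:>+8}")
```

Grouping by ISO year and week keeps weeks that straddle a year boundary together in one bucket.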

WIP:

  • Automated RCA recommendation.

This changelog mentions newly added integrations (Cloudwatch Logs, Clickhouse DB) and sample playbooks as well as improvements to playbook metadata loading time and upgrades.

Integrations added:

  • Cloudwatch Logs
  • Clickhouse DB

Feature work:

  • Improved playbook metadata loading time
  • Enabling sample playbooks for all users
  • Playbook upgrade -- deletion, adding notes, external links & markdown

In the last two weeks, key updates include the launch of On-Call Playbooks with metadata integrations for New Relic, Datadog, Cloudwatch, and Grafana, as well as executions for Cloudwatch Metrics, Cloudwatch Logs, and Grafana Panels. Additionally, an Alert Insights Slack App has been submitted for Directory Publishing.

Here are some of the key updates:

  1. On-Call Playbooks are now LIVE with the following metadata integrations:

    1. New Relic

    2. Datadog

    3. Cloudwatch

    4. Grafana

  2. On-Call Playbooks are now LIVE with the following executions:

    1. Cloudwatch Metrics

    2. Cloudwatch Logs

    3. Grafana Panels (Prometheus)

  3. Alert Insights Slack App Submitted for Directory Publishing

The team achieved milestones including going live on Datadog Integrations and launching a private beta of Playbooks for faster issue investigation. They also integrated and supported Coralogix alerts on the Alert Insights dashboard.

In the last two weeks, our team has achieved a few critical milestones:

  1. Went live on Datadog Integrations -- you can read more about the integration and its capabilities here -- https://docs.datadoghq.com/integrations/doctordroid/

  2. Launching a private beta of Playbooks -- a faster way to investigate issues.

    1. Create a playbook with recommended steps to investigate an issue, or add new steps as per your context.

    2. Auto-discovery of metadata for your pre-connected stack -- get a full dictionary of all the assets accessible in the playbook.

  3. Integration & support for Coralogix alerts on the Alert Insights dashboard

In the last 2 weeks, new features were launched including Google Chat integration, deeper integrations with monitoring tools, a Datadog integrations plugin, and alert enrichment for Datadog. Personalization was also added to the User Experience.

Here are some of the things launched in the last 2 weeks:

  1. Added Google Chat -- an alternative to Slack channels popularly used by teams in the Google ecosystem.

  2. Enriching alert insights with:

    1. Deeper integrations with monitoring tools: Sentry, New Relic, Cloudwatch, Datadog.
    2. Improved annotation model for different tools. Added support for Robusta, Prometheus_AlertManager and Signoz.
    Some of the labels that we are now able to extract from the alerts based on the improved model


  3. Datadog integrations plugin submitted for automated diagnosis

  4. v0 of the alert enrichment for Datadog -- Imagine receiving an alert in your system and automatically receiving an analysis of recent deployments and metrics in the service.

    A sample analysis provided by our bot


  5. Adding personalisation into our User Experience, handling some dangling edge cases and strengthening our architecture. (no picture for this one ;) )