Added

Conditional Steps in PlayBooks

We previously discussed conditionals. Today, we are releasing support for hierarchical nodes, with conditions (made up of rules) connecting individual steps. We currently support the following rules:

  • Aggregation Queries on top of timeseries metrics data
  • Row count / column value evaluation for Tabular data
  • Coming soon: Regex matching on strings
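
To make the two supported rule types concrete, here is a minimal Python sketch of how such rules could be evaluated. The function names, the operator and aggregation tables, and the sample data are all hypothetical illustrations, not the PlayBooks implementation.

```python
import operator
from statistics import mean

# Hypothetical operator table; the comparisons PlayBooks supports may differ.
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
             "<=": operator.le, "==": operator.eq}

# Hypothetical aggregations applied to a list of metric values.
AGGREGATIONS = {"avg": mean, "min": min, "max": max, "sum": sum}


def timeseries_rule(values, aggregation, op, threshold):
    """Aggregate the metric values, then compare the result to a threshold."""
    aggregated = AGGREGATIONS[aggregation](values)
    return OPERATORS[op](aggregated, threshold)


def row_count_rule(rows, op, threshold):
    """Compare the number of rows in a tabular result to a threshold."""
    return OPERATORS[op](len(rows), threshold)


def column_value_rule(rows, column, op, threshold):
    """Return True if any row's value in `column` satisfies the comparison."""
    return any(OPERATORS[op](row[column], threshold) for row in rows)


# Made-up sample data for illustration only.
cpu_usage = [62.0, 71.5, 93.0, 88.5]
error_rows = [{"service": "api", "errors": 12}, {"service": "db", "errors": 0}]

print(timeseries_rule(cpu_usage, "avg", ">", 75))
print(row_count_rule(error_rows, ">=", 1))
print(column_value_rule(error_rows, "errors", ">", 10))
```

In this sketch a condition is just a boolean function over a step's output, which is what lets it decide whether the next step in the hierarchy should run.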


Execution Sessions

Going forward, all activity done within PlayBooks is logged in the platform for quick reference. Here are the scenarios where it can be helpful:

  • During incidents: Quick sharing of investigated data with team members without having to re-check.
  • Post incidents: Reference to data from the duration of an incident/investigation.
  • Audit: Both automated and manually run executions will be available -- ensuring that you know which user / bot has accessed which data through the platform.
  • Note: Executions done while creating or editing a playbook are not persisted currently.

Architectural Upgrades:

  • PlayBook architecture -- Support for sequential tasks with if/else blocks required upgrading the definition to support parent-child relationships within a PlayBook definition.
  • Workflow revamp -- Decoupling of the workflow backend into 3 independent components (Triggers, PlayBook Executions, Actions). We took this on to make it faster to contribute new integrations to the codebase. (Read this doc to learn more about how workflows work).
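
The parent-child relationship mentioned above can be pictured as a small tree of steps whose edges carry conditions. The following Python sketch is only an illustration under that assumption; the `Step` and `Edge` classes and the `execute` function are hypothetical names, not the actual PlayBook definition schema.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """A playbook step; `children` are the edges to steps that may run after it."""
    name: str
    run: Callable[[], dict]
    children: List["Edge"] = field(default_factory=list)


@dataclass
class Edge:
    """Connects a parent step to a child step, guarded by a condition."""
    condition: Callable[[dict], bool]
    child: Step


def execute(step, visited=None):
    """Run a step, then each child whose condition holds for the parent's output."""
    visited = [] if visited is None else visited
    output = step.run()
    visited.append(step.name)
    for edge in step.children:
        if edge.condition(output):
            execute(edge.child, visited)
    return visited


# Hypothetical if/else: inspect logs when CPU is high, otherwise check deployments.
check_logs = Step("check_logs", lambda: {})
check_deployments = Step("check_deployments", lambda: {})
check_cpu = Step(
    "check_cpu",
    lambda: {"cpu": 91},
    children=[
        Edge(lambda out: out["cpu"] > 80, check_logs),
        Edge(lambda out: out["cpu"] <= 80, check_deployments),
    ],
)

print(execute(check_cpu))
```

An if/else block is modelled here as two edges with complementary conditions, so only one branch runs for a given parent output.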

Integrations:

  • GKE: Fetching events, deployments & log information from your k8s clusters hosted using GKE.
  • MS Teams: Send notifications of playbook runs to your Microsoft Teams channels.

Improvements:

UI / UX Upgrades:

  • Added a Timeline view for quick sequential investigation.
  • Every data source addition page now links to the corresponding connector page.
  • Auto-fetching the hostname of the PlayBooks server (reducing the need to manually enter it during Slack integration).

Integrations

  • Slack -- Updated the permissions in the manifest to support private channels.
  • Remote server: You can now add multiple hosts against one PEM key, making it easier to add multiple integrations from one config.
  • Deployment: The Docker image now supports multiple chip architectures (Apple / Intel).

Bugfixes:

  • Integrations: Slack -- Now selects the active Slack connector instead of the first connector, whether active or inactive.
  • UI: Fixed the visibility bug affecting external links and notes.
  • Editing the name of an existing playbook: Previously, renaming a playbook created a copy with the new name instead of updating the original.

Unreleased:

  • Global Variable context -- from workflow to Playbooks.

Made with Changelog Generator

Added

  • Support for multiple connectors of a single data source
  • New Connectors: Azure Log Analytics, iFrame, Grafana custom query, Trino

Changed

  • Update to the Helm charts to make databases stateful
  • API handling enhancements -- Updated JSON handling as strings for API task payloads and headers
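
As an illustration of what handling payloads and headers as JSON strings can look like, here is a minimal Python sketch using only the standard library. The `build_request` helper, the URL, and the field names are hypothetical; the request is built but never sent.

```python
import json
import urllib.request


def build_request(url, method, headers_json, payload_json):
    """Build an HTTP request from headers and payload supplied as JSON strings."""
    headers = json.loads(headers_json) if headers_json else {}
    if not isinstance(headers, dict):
        raise ValueError("headers must be a JSON object")
    body = None
    if payload_json:
        # Parse and re-serialise so malformed JSON fails before any call is made.
        body = json.dumps(json.loads(payload_json)).encode("utf-8")
        headers.setdefault("Content-Type", "application/json")
    return urllib.request.Request(url, data=body, headers=headers, method=method)


# Hypothetical task configuration, as a user might type it into two text fields.
request = build_request(
    "https://example.com/api/health",
    "POST",
    '{"Authorization": "Bearer <token>"}',
    '{"service": "checkout", "window_minutes": 15}',
)
print(request.get_method(), request.full_url)
print(request.data)
```

Parsing the strings up front means a malformed payload surfaces as a `json.JSONDecodeError` at configuration time rather than as a failed API call.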

Fixed

  • Fixed Slack notification URL
  • Fixed MySQL integration
  • Minor and critical bug fixes in Grafana task definitions and query rendering
  • Fixed list view execution bug in UI
  • Resolved issues with Slack connector authentication and linking
  • Addressed minor UI bugs relating to back button and error displays
  • Handled exceptions and minor issues across EKS and Azure task interactions
  • PostgreSQL integration fixes regarding field mismatches and query rendering

Deprecated

  • Deprecated support for dashboard-driven queries from Grafana. (Steps you have already created this way will still work, but new ones cannot be created. Going forward, please use the Grafana custom query steps.)

Unreleased

  • Support for GKE


Highlights:


New Features:

  • Setup: Install the project using Helm charts
  • Setup: Install the project from the last release instead of doing it from the main branch
  • Playbooks: Support for running any query on a Mimir data source. Watch demo.
    • Feature to Disable SSL verification
  • Playbooks: API call improvement: Added JSON handling as strings. Watch demo.

Bugfixes:

  • Playbooks: Usage of global variables in tasks.
  • Playbooks: Interpretation: fixes to ensure the correct step interpretation selection in lists.
  • Playbooks: image generation using Plotly fix.
  • Playbooks: checks before user page load to verify user existence.
  • Playbooks: Updated transformers for both old and new Grafana task definitions.
    • Fixed multiple bugs in Grafana integrations including ssl verify flag, precreated PB execution, and more.
  • Playbooks: Fixed auth token issue in Slack connector.

Unreleased:

Two critical stories are in progress and will be completed soon:

  • Adding multiple connectors of same source type:
    Currently, the architecture allows only one connector per source type. With this update, multiple connectors of the same type can be added within any account. This makes it a 2-tier connector-source configuration.
  • De-linking tasks with connectors:
    A playbook comprises multiple steps. Each step is a set of tasks, which could be fetching metrics, logs, DB queries, bash commands or more.
    In the current architecture, any step within the playbook needs to be mapped to a pre-configured integration. This makes it hard for teams to copy steps between different playbooks or create steps for tools that have not been integrated yet but will be.
    To overcome this barrier, we will remove the requirement that a task in a playbook have a connected integration. If an integration is referenced in the step but is not configured, the user will now get a warning that the connector has not been added.
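
The planned warning behaviour could look roughly like the following Python sketch. The `resolve_connector` helper and the task and connector dictionaries are hypothetical stand-ins used only to illustrate the idea of warning instead of rejecting the step.

```python
import warnings


def resolve_connector(task, configured_connectors):
    """Return the connector for a task, or None with a warning if it is not configured."""
    source = task.get("source")
    connector = configured_connectors.get(source)
    if source and connector is None:
        # The step is still kept; the user is only told the connector is missing.
        warnings.warn(
            f"Connector for '{source}' is not added; "
            f"task '{task['name']}' cannot run yet."
        )
    return connector


# Hypothetical account with only a Grafana connector configured.
configured = {"grafana": {"url": "https://grafana.example.com"}}
tasks = [
    {"name": "fetch_latency", "source": "grafana"},
    {"name": "fetch_errors", "source": "datadog"},
]

for task in tasks:
    print(task["name"], "->", resolve_connector(task, configured))
```

With this shape, a step copied from another playbook stays in place and becomes runnable as soon as the missing connector is added.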

Highlights:

Support for Builder View:

Playbooks now support both a drag-and-drop view and a Notebook-style view.

Support for AI driven analysis of a metric:


New Features:

Improvements:

  • Workflows: The Slack response to an alert now contains the actual investigation data instead of only a link to the Playbook.
  • Workflows: View list of past runs, edit existing configurations.
  • Workflows: Configure Schedules to define how frequently investigations should run.
  • Workflows: View execution logs of previous runs, including the data values returned in each run.

A workflow is a way to configure playbooks to trigger automatically on a specific event (API call, alert, etc.). In this release, we built and released Workflows with the following functionality:

  • Triggers: Ability to set up an entry point against any alert coming into your Slack channels.

  • Run: Select a playbook to run on workflow execution

  • Notification: Receive response in specific channels in your Slack

  • Workflow Creation Pages
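
The trigger, run and notification pieces above fit together roughly as in this Python sketch. The `Workflow` fields and the `handle_alert`, `run_playbook` and `notify` names are hypothetical; the two callables stand in for the real playbook executor and Slack client.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    trigger_channel: str   # Slack channel to watch for alerts
    trigger_keyword: str   # text an alert must contain to match
    playbook: str          # name of the playbook to run
    notify_channel: str    # Slack channel that receives the result


def handle_alert(alert, workflow, run_playbook, notify):
    """Run the workflow's playbook if the alert matches its trigger."""
    matches = (
        alert["channel"] == workflow.trigger_channel
        and workflow.trigger_keyword.lower() in alert["text"].lower()
    )
    if not matches:
        return None
    result = run_playbook(workflow.playbook, alert)
    notify(workflow.notify_channel, result)
    return result


# Stand-ins for the playbook executor and the Slack notifier.
sent = []
workflow = Workflow("#alerts", "high latency", "latency-investigation", "#oncall")
alert = {"channel": "#alerts", "text": "High latency on checkout service"}

handle_alert(
    alert,
    workflow,
    run_playbook=lambda name, a: {"playbook": name, "alert": a["text"]},
    notify=lambda channel, result: sent.append((channel, result)),
)
print(sent)
```

Passing the executor and notifier in as callables keeps the trigger logic independent of how playbooks run or where results are sent.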

Transitioning from Cloud to Open Source

We started by building a cloud-based MVP for Playbooks. Very soon, we reached 8-10 integrations and interest from users started to spike. After a few discussions with our users, however, we realised that having to expose observability data to a cloud platform frequently kept users from trying the tool.

For the release v0.1 of the Open Source framework, we did the following:

  • Refactoring of the code for Open Source Playbooks.
  • Removal of any custom logic / code.
  • Enabled self-creation of an App in Slack and manual integration of Datadog, instead of the previously existing OAuth flow.
  • Auth: Removed Google OAuth, the complexity around managing multiple accounts/clients and the need to verify email.
  • Created a file-server within the project to support image sending capabilities.

See more on Github: https://github.com/DrDroidLab/PlayBooks/releases/tag/v0.1.0-beta

In this cycle, we focused on adding integrations with Kubernetes clusters hosted on EKS and with self-hosted Grafana instances behind VPCs. We also worked on website performance:

Integrations:

  • Kubectl EKS integration.
  • Support for Grafana within a VPC via agent.

Test confirmation:

  • For any integration, there's now a way to validate and confirm the connection status after adding the API key/token/auth details in the integration tab.

Performance improvement:

  • Code refactoring to reduce loading time.
  • Added alert insights to sandbox.
  • Datadog integration specific performance improvement to load metadata faster.

In this changelog, we discuss the 3 newly added integrations, a Sandbox (which can be accessed without login) and a Slack integration that enables auto-triggering of a playbook.

Integrations added:

  • New Relic Integration
  • Datadog Integration
  • PostgreSQL

Feature work:

  • Editing existing playbook (only allowed by creator).
  • Playbook auto-triggering from a Slack alert.

Performance:

  • Reduced time to fetch tool assets from source page.

Metadata:

  • Improved metadata quality for PagerDuty & OpsGenie alerts.
  • Added a tabular data format with week over week analysis of noisy alerts.

WIP:

  • Automated RCA recommendation.

This changelog mentions newly added integrations (Cloudwatch Logs, Clickhouse DB) and sample playbooks as well as improvements to playbook metadata loading time and upgrades.

Integrations added:

  • Cloudwatch Logs
  • Clickhouse DB

Feature work:

  • Playbook metadata loading time
  • Enabling sample playbooks for all users
  • Playbook upgrade -- deletion, adding notes, external links & markdown