This release cycle (v1.3.13 and v1.4.1) brings several significant upgrades to the platform:

  1. Context propagation: Capability to auto-extract relevant keywords from an alert and inject them as variables in any command / playbook task.
  2. kubectl commands: Run kubectl commands from PlayBooks against your Kubernetes clusters on AWS, GCP, or self-managed k8s.
  3. Google Cloud Monitoring integration: Fetch metrics and logs from Google Cloud Monitoring from within a playbook task.
  4. Remote Server Authentication: Support for different remote server key types and authentication via PEM, PEM+Passphrase, username+password.

New Features:

  • Context propagation: Capability to auto-extract relevant keywords from an alert and inject them as variables in any command / playbook task.
  • Heterogeneous tasks: Add different types of tasks within a single step. This is a significant upgrade for creating conditions, because the outputs of different tools can now be combined and used when evaluating a condition.
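
To illustrate the idea behind context propagation, here is a minimal sketch in Python. It is not the platform's implementation: the regex patterns, the function names (`extract_context`, `render_command`) and the alert text are all assumptions made for illustration. It extracts two keywords from an alert and injects them as variables into a command template.

```python
import re
from string import Template

# Hypothetical extraction patterns; the real extraction rules are not
# described in these release notes.
PATTERNS = {
    "pod": re.compile(r"pod[=:\s]+([\w.-]+)"),
    "namespace": re.compile(r"namespace[=:\s]+([\w.-]+)"),
}


def extract_context(alert_text: str) -> dict[str, str]:
    """Collect the first match of every known keyword from the alert text."""
    context = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(alert_text)
        if match:
            context[name] = match.group(1)
    return context


def render_command(template: str, context: dict[str, str]) -> str:
    """Inject extracted values into a command template.

    safe_substitute leaves unknown placeholders untouched instead of raising.
    """
    return Template(template).safe_substitute(context)


# Made-up alert text for the example.
alert = "CrashLoopBackOff detected: pod=checkout-7d9f namespace=payments"
context = extract_context(alert)
command = render_command("kubectl logs $pod -n $namespace --tail=50", context)
print(command)
```

With this sample alert, the rendered command targets the pod and namespace named in the alert, so the same playbook task can be reused across alerts.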

Integrations added:

  • Logs from Google Cloud
  • Metrics from Google Cloud
  • kubectl commands (self-serve, GKE, EKS)
  • Support for different key types and authentication via PEM, PEM+Passphrase, username+password

Conditions added:

  • Grep conditions
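
As a rough illustration of what a grep-style condition can look like, the sketch below checks whether enough lines of a task's output match a pattern. The function name `grep_condition`, its `min_matches` parameter and the sample log are assumptions for this example, not the platform's actual API.

```python
import re


def grep_condition(output: str, pattern: str, min_matches: int = 1) -> bool:
    """Return True when at least `min_matches` lines of `output` match `pattern`."""
    regex = re.compile(pattern)
    matching_lines = [line for line in output.splitlines() if regex.search(line)]
    return len(matching_lines) >= min_matches


# Example output from a hypothetical log-fetching task.
task_output = """INFO request served in 12ms
ERROR upstream timeout after 30s
INFO request served in 9ms
ERROR upstream timeout after 30s"""

has_timeouts = grep_condition(task_output, r"ERROR .*timeout", min_matches=2)
has_oom = grep_condition(task_output, r"OutOfMemory")
print(has_timeouts, has_oom)
```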

Upgrades to UI

  • Notes are now readable on hover
  • A WYSIWYG editor
  • Update to timerange selector
  • Timeline cards now include notes and external links.
  • Added search functionalities and advanced search text features in the UI.

Deployment practices

  • Image versioning and build timestamp environment variables in Docker context.
  • Added GitHub workflows for automatic deployment on Coruscant and Sandbox environments.

Fixed

  • General bug fixes in pattern matching and edge index issues.
  • Resolved issues in workflow and logs UI for better clarity on task execution logs.
  • Typo fixes and minor text corrections across various modules.
  • Resolved a missing environment variable issue.
  • Fixed search-engine ordering and enabled proper filtering.
  • Resolved issues related to settings pop-up message.

Removed

  • Removed old log assets from asset management.
  • Removed validations from test connection feature for optimization.

This release cycle was heavy on adding integrations at multiple layers within the platform:


Added

  • Okta SSO, Elasticsearch, and Loki.
  • New integrations for MS Teams and PagerDuty, enhancing the runbook automation platform.
  • New features for handling different data sources and external links in the system.

Improved

  • Introduced "Test connection" across integrations to enable testing while adding a data source.
  • UI Updates in Workflow and PlayBooks configuration pages.

Removed

  • Deprecated the list view in favor of the Builder view + Execution view.

Fixed

  • Minor fixes in workflow configuration logic and various UI bug fixes to improve overall stability and user experience.

Unreleased

  • Context Propagation -- Read more on this feature here

Made with Changelog Generator

Added

Conditional Steps in PlayBooks

We previously discussed Conditionals. Today, we are releasing support for hierarchical nodes, with conditions (made up of rules) connecting individual steps. We currently support the following rules:

  • Aggregation Queries on top of timeseries metrics data
  • Row count / column value evaluation for Tabular data
  • Coming soon: Regex matching on strings
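
The supported rule types can be sketched as simple predicates that decide whether the next step runs. This is a minimal illustration under assumptions: the rule vocabulary (`avg`, `>`, and so on), the function names and the sample data are hypothetical and do not reflect the platform's actual rule schema.

```python
import operator
from statistics import mean

# Hypothetical rule vocabulary, chosen for this example only.
AGGREGATIONS = {"avg": mean, "min": min, "max": max, "sum": sum}
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def timeseries_rule(values, aggregation, comparator, threshold):
    """Aggregate a metric series, then compare the result to a threshold."""
    aggregated = AGGREGATIONS[aggregation](values)
    return COMPARATORS[comparator](aggregated, threshold)


def row_count_rule(rows, comparator, threshold):
    """Compare the number of rows in a tabular result to a threshold."""
    return COMPARATORS[comparator](len(rows), threshold)


def column_value_rule(rows, column, comparator, threshold):
    """True when any row's `column` value satisfies the comparison."""
    return any(COMPARATORS[comparator](row[column], threshold) for row in rows)


# Made-up step outputs.
cpu_percent = [70, 95, 90]
failed_jobs = [{"job": "billing-sync", "retries": 4}]

run_cpu_branch = timeseries_rule(cpu_percent, "avg", ">", 80)
run_jobs_branch = row_count_rule(failed_jobs, ">=", 1)
run_retry_branch = column_value_rule(failed_jobs, "retries", ">", 5)
print(run_cpu_branch, run_jobs_branch, run_retry_branch)
```

Each predicate returns a boolean, so a child step connected through a condition would run only when its rule evaluates to true.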


Execution Sessions

Going forward, all activity done within PlayBooks is logged in the platform for quick reference. Here are the scenarios where it can be helpful:

  • During incidents: Quick sharing of investigated data with team members without having to re-check.
  • Post incidents: Reference to data from the duration of an incident/investigation.
  • Audit: Both automated as well as manually run executions will be available -- ensuring that you are aware which user / bot has accessed which data through the platform.
  • Note: Executions done while creating or editing a playbook are not persisted currently.

Architectural Upgrades:

  • PlayBook architecture -- Support for sequential tasks with if/else blocks required upgrading the definition to support parent-child relationships within a PlayBook definition.
  • Workflow revamp -- Decoupled the workflow backend into 3 independent components (Triggers, PlayBook Executions, Actions). This project was picked up to accelerate the contribution of new integrations to the codebase. (Read this doc to learn more about how workflows work).

Integrations:

  • GKE: Fetching events, deployments & log information from your k8s clusters hosted using GKE.
  • MS Teams: Send notifications of playbook runs to your Microsoft Teams channels.

Improvements:

UI / UX Upgrades:

  • Added a Timeline view for quick sequential investigation.
  • Every data source addition page now links to the connector page.
  • Auto-fetching the hostname of the PlayBooks server (reducing the need to manually enter it during Slack integration)

Integrations

  • Slack -- Updated the permissions in the manifest to support private channels.
  • Remote server: Now add multiple hosts against one PEM key, making it easier to add multiple integrations from one config.
  • Deployment: The Docker image now supports multiple chip types (Apple / Intel).

Bugfixes:

  • Integrations: Slack -- Bugfix to select the active Slack connector instead of first active/inactive connector.
  • UI: External links & Notes visibility bug fix.
  • Editing the name of an existing playbook: Previously, renaming a playbook created a copy with the new name.

Unreleased:

  • Global Variable context -- from workflow to Playbooks.


Added

  • Support for multiple connectors of a single data source
  • New Connectors: Azure Log Analytics, iFrame, Grafana custom query, Trino

Changed

  • Update to the Helm charts to make databases stateful
  • API handling enhancements -- Updated JSON handling as strings for API task payloads and headers
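
One possible reading of the API handling change is that payloads and headers are supplied as JSON strings and parsed before the request is built. The sketch below shows that reading only. The function `build_request`, its return shape and the example URL are hypothetical and are not taken from the codebase.

```python
import json


def build_request(url: str, headers_json: str, payload_json: str) -> dict:
    """Parse JSON strings for headers and payload into request arguments."""
    try:
        headers = json.loads(headers_json) if headers_json.strip() else {}
        payload = json.loads(payload_json) if payload_json.strip() else None
    except json.JSONDecodeError as exc:
        # Surface a readable error instead of a raw decoder traceback.
        raise ValueError(f"Invalid JSON in API task: {exc.msg}") from exc
    if not isinstance(headers, dict):
        raise ValueError("Headers must be a JSON object")
    return {"url": url, "headers": headers, "json": payload}


# Placeholder URL and token, for illustration only.
request = build_request(
    "https://example.com/api/restart",
    '{"Authorization": "Bearer <token>"}',
    '{"service": "checkout", "replicas": 3}',
)
print(request["json"])
```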

Fixed

  • Fixed Slack notification URL
  • Fixed MySQL integration
  • Minor and critical bug fixes in Grafana task definitions and query rendering
  • Fixed list view execution bug in UI
  • Resolved issues with Slack connector authentication and linking
  • Addressed minor UI bugs relating to back button and error displays
  • Handled exceptions and minor issues across EKS and Azure task interactions
  • PostgreSQL integration fixes regarding field mismatches and query rendering

Deprecated

  • Deprecated support for dashboard-driven queries from Grafana. (Steps you have already created this way will keep working, but new ones cannot be created. Going forward, please use the Grafana custom query steps.)

Unreleased

  • Support for GKE


Highlights:


New Features:

  • Setup: Install the project using Helm charts
  • Setup: Install the project from the last release instead of doing it from the main branch
  • Playbooks: Support for running any query on a Mimir data source. Watch Demo
    • Option to disable SSL verification
  • Playbooks: API call improvement: Added JSON handling as strings. Watch demo.

Bugfixes:

  • Playbooks: Usage of global variables in tasks.
  • Playbooks: Interpretation: fixes to ensure the correct step interpretation selection in lists.
  • Playbooks: image generation using Plotly fix.
  • Playbooks: checks before user page load to verify user existence.
  • Playbooks: Updated transformers for both old and new Grafana task definitions.
    • Fixed multiple bugs in Grafana integrations including ssl verify flag, precreated PB execution, and more.
  • Playbooks: Fixed auth token issue in Slack connector.

Unreleased:

Two critical stories have been picked up and will be completed soon:

  • Adding multiple connectors of same source type:
    Currently, the architecture allows only one connector per source type. With this update, multiple connectors of the same type can be added within any account, making this a two-tier connector-source configuration.
  • De-linking tasks with connectors:
    A playbook comprises multiple steps. Each step is a set of tasks, which could be fetching metrics, logs, DB queries, bash commands, or more.
    In the current architecture, any step within the playbook needs to be mapped to a pre-configured integration. This makes it hard for teams to copy steps between different playbooks or to create steps for tools that have not been integrated yet but will be.
    To overcome this barrier, we will remove the requirement for a connected integration within a playbook task. If an integration is marked in the step but is not configured, the user will see a warning that the connector has not been added.
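
The two planned changes can be sketched together. This is a minimal sketch under assumptions: the class names `Connector` and `Account`, the helper `check_task` and the warning text are hypothetical, not the project's data model. An account holds several connectors per source type, and a task that references an unconfigured source type produces a warning instead of being rejected.

```python
from dataclasses import dataclass, field


@dataclass
class Connector:
    source_type: str  # e.g. "grafana"
    name: str  # distinguishes connectors of the same type


@dataclass
class Account:
    # Two tiers: source type -> all connectors configured for that type.
    connectors: dict[str, list[Connector]] = field(default_factory=dict)

    def add_connector(self, connector: Connector) -> None:
        self.connectors.setdefault(connector.source_type, []).append(connector)


def check_task(account: Account, source_type: str) -> list[str]:
    """Return warnings for a task; an empty list means a connector exists."""
    if account.connectors.get(source_type):
        return []
    return [f"Connector for '{source_type}' is not added yet."]


account = Account()
account.add_connector(Connector("grafana", "grafana-prod"))
account.add_connector(Connector("grafana", "grafana-staging"))

grafana_warnings = check_task(account, "grafana")
datadog_warnings = check_task(account, "datadog")
print(grafana_warnings, datadog_warnings)
```

Returning warnings rather than raising an error keeps a step usable in a playbook before its integration is configured.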

Highlights:

Support for Builder View:

Playbooks now support both a drag-and-drop view and a Notebook-style view.

Support for AI driven analysis of a metric:


New Features:

Improvements:

  • Workflows: The Slack response to an alert now contains the actual investigation data instead of only a Playbook link.
  • Workflows: View list of past runs, edit existing configurations.
  • Workflows: Configure Schedules to define how frequently investigations should run.
  • Workflows: View execution logs of previous runs, including the values of each data.

A workflow is a way to configure playbooks to run automatically on a specific event (API call, alert, etc.). In this release, we built and released Workflows with the following functionality:

  • Triggers: Set up an entry point for any alert arriving in your Slack channels.

  • Run: Select a playbook to run on workflow execution.

  • Notification: Receive the response in specific Slack channels.

  • Workflow Creation Pages
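
The trigger, run and notification pieces described above can be pictured with a small sketch. All names here (`Workflow`, `handle`, `investigate_latency`, the channel names) are assumptions for illustration. The real Slack integration is not shown, and the playbook is stood in for by a plain function.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Workflow:
    trigger_channel: str  # Slack channel watched for alerts
    alert_keyword: str  # text an alert must contain to trigger
    playbook: Callable[[str], str]  # stand-in for a playbook run
    notify_channel: str  # Slack channel that receives the result

    def handle(self, channel: str, message: str) -> Optional[tuple[str, str]]:
        """Run the playbook for a matching alert and return (channel, result)."""
        if channel != self.trigger_channel or self.alert_keyword not in message:
            return None  # not an entry point for this workflow
        result = self.playbook(message)
        return (self.notify_channel, result)


def investigate_latency(alert: str) -> str:
    # Placeholder for the data a real playbook run would collect.
    return f"Investigation for: {alert}"


workflow = Workflow("#alerts", "latency", investigate_latency, "#oncall")
notification = workflow.handle("#alerts", "High latency on checkout")
ignored = workflow.handle("#random", "High latency on checkout")
print(notification, ignored)
```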

Transitioning from Cloud to Open Source

We started by building a cloud-based MVP for Playbooks. Very soon, we reached 8-10 integrations and user interest started to spike. After a few discussions with our users, however, we realised that having to expose observability data to a cloud platform frequently kept users from trying the tool.

For the release v0.1 of the Open Source framework, we did the following:

  • Refactoring of the code for Open Source Playbooks.
  • Removal of any custom logic / code.
  • Enabled self-creation of an App in Slack and manual integration of Datadog, instead of the previous OAuth flow.
  • Auth: Removed Google OAuth, the complexity around managing multiple accounts/clients and the need to verify email.
  • Created a file-server within the project to support image sending capabilities.

See more on GitHub: https://github.com/DrDroidLab/PlayBooks/releases/tag/v0.1.0-beta

In this cycle, we focused on adding integrations with Kubernetes clusters that teams host on EKS and with self-hosted Grafana instances behind VPCs. We also focused on website performance:

Integrations:

  • Kubectl EKS integration.
  • Support for Grafana within a VPC via agent.

Test confirmation:

  • For any integration, there's now a way to validate and confirm the connection status after adding the API key/token/auth details in the integration tab.

Performance improvement:

  • Code refactoring to reduce loading time.
  • Added alert insights to sandbox.
  • Datadog integration specific performance improvement to load metadata faster.