Custom Insights

How deeply does the Alert Insights Bot analyse alerts?

The Alert Insights you see after installing the Slack bot come from an in-depth analysis of the bot messages received in your alert channels.

In addition to the generic modules that extract relevant context, custom modules analyse alerts from the following tools:

Datadog:

Datadog is a popular full-stack observability platform, and teams often rely on it for critical alerts on data types such as APM metrics and infrastructure metrics. For all monitors that trigger alerts to your team, the following custom insights are visible in the Alert Insights dashboard:

  • Distribution of alerts by "monitor"
  • Count of alerts, distributed by "service" and "env" (e.g. the 10 most frequent services across all alerts)
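The "count of alerts, distributed by a field" insight is essentially a frequency count followed by a top-N cut. The sketch below assumes alerts have already been parsed from Slack messages into dictionaries keyed by fields such as "monitor", "service" and "env"; the `top_n` helper, the "unknown" fallback and the sample data are hypothetical, and this is not Doctor Droid's implementation.

```python
from collections import Counter
from typing import Iterable, Mapping


def top_n(alerts: Iterable[Mapping[str, str]], field: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent values of `field` across the alerts.

    Assumption: alerts that lack the field are counted under "unknown".
    """
    counts = Counter(alert.get(field, "unknown") for alert in alerts)
    # most_common() sorts by count, highest first
    return counts.most_common(n)


# Hypothetical parsed alerts; real field names depend on the alert source.
alerts = [
    {"monitor": "High p99 latency", "service": "checkout", "env": "prod"},
    {"monitor": "High p99 latency", "service": "checkout", "env": "prod"},
    {"monitor": "Disk usage > 90%", "service": "postgres", "env": "prod"},
    {"monitor": "Error rate spike", "service": "checkout", "env": "staging"},
]

print(top_n(alerts, "service"))       # [('checkout', 3), ('postgres', 1)]
print(top_n(alerts, "monitor", n=1))  # [('High p99 latency', 2)]
```

The same helper applies unchanged to the other sources below by swapping the field name, for example "culprit" for Sentry or "impacted_entity" for New Relic.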

Sentry:

Sentry alerts are typically caused either by the same issue recurring or by a new issue appearing. Either way, knowing the most common culprits across all received issues can help you prioritise tech debt on specific culprits/endpoints.

  • Distribution of alerts by "issue_id"
  • Count of alerts, distributed by "project" and "culprit" (e.g. the 10 most frequent culprits across all alerts)

New Relic:

New Relic is a popular full-stack observability platform. For all your New Relic-triggered incidents, the following custom insights are visible:

  • Distribution of alerts by "policy", "condition"
  • Count of alerts, distributed by "impacted_entity", "env" (E.g. 10 most frequent entities across all alerts)

Coralogix:

Users frequently rely on alert rules built on top of logs. If you use Coralogix to create such alerts, you can also visualise:

  • Distribution of alerts by "application", "subsystem"
  • Count of alerts, distributed by "alert query" (E.g. 10 most frequent queries across all alerts)

Prometheus AlertManager:

Prometheus AlertManager simplifies alert management by handling the alerts sent by client applications such as the Prometheus server. Key insights available in the Doctor Droid Alert Insights dashboard include:

  • Alert Classification: Alerts are classified by multiple dimensions such as alertname, device, job, container, endpoint, cluster, namespace, pod, and service. This classification helps pinpoint the source and nature of noisy alerts, making it quicker to update alert configurations.
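Classifying alerts across several label dimensions can be sketched as one counter per dimension, plus a counter over label combinations to surface the noisiest sources. This assumes each alert has already been reduced to its Prometheus label set (a plain dict); the `classify` and `noisiest` helpers, the "-" placeholder for missing labels and the sample alerts are hypothetical and not Doctor Droid's implementation.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence

# Dimensions listed above; which labels an alert actually carries depends
# on your alerting rules and exporters.
DIMENSIONS = ("alertname", "device", "job", "container", "endpoint",
              "cluster", "namespace", "pod", "service")


def classify(alerts: Iterable[Mapping[str, str]],
             dimensions: Sequence[str] = DIMENSIONS) -> dict[str, Counter]:
    """Count alerts per value of each label dimension, skipping absent labels."""
    result: dict[str, Counter] = {dim: Counter() for dim in dimensions}
    for labels in alerts:
        for dim in dimensions:
            if dim in labels:
                result[dim][labels[dim]] += 1
    return result


def noisiest(alerts: Iterable[Mapping[str, str]], keys: Sequence[str], n: int = 5):
    """Count alerts by a combination of labels, e.g. (alertname, namespace)."""
    combos = Counter(tuple(labels.get(k, "-") for k in keys) for labels in alerts)
    return combos.most_common(n)


# Sample label sets (hypothetical values).
alerts = [
    {"alertname": "KubePodCrashLooping", "namespace": "payments", "pod": "api-7f9c", "cluster": "prod-eu"},
    {"alertname": "KubePodCrashLooping", "namespace": "payments", "pod": "api-7f9c", "cluster": "prod-eu"},
    {"alertname": "NodeFilesystemSpaceFillingUp", "device": "/dev/sda1", "job": "node-exporter", "cluster": "prod-eu"},
]

by_dim = classify(alerts)
print(by_dim["alertname"].most_common())
print(noisiest(alerts, ("alertname", "namespace")))
```

Counting per combination rather than per single label is a design choice: a single noisy pod in one namespace stands out more clearly than it would in separate per-label totals.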

Grafana AlertManager:

Grafana AlertManager is the commonly adopted alert manager for Grafana users. To understand the noise generated by Grafana-managed alerts, you can see the following in the Doctor Droid Alert Insights dashboard:

  • Distribution of alert volume by "alertname", "service", "metric". These details are crucial for understanding the scope and impact of an issue, assisting in the prioritization and troubleshooting process.

Robusta:

Robusta extends Kubernetes and Prometheus monitoring with its automated incident response solutions. For alerts triggered from Robusta, users can expect insights like:

  • Similar to Prometheus, alerts are classified by "alertname", "device", "job", "container", "endpoint", "cluster", "namespace", "pod", and "service". This comprehensive categorization aids detailed analysis and quicker mitigation of issues.

AWS CloudWatch Alarms:

AWS CloudWatch Alarms help you efficiently monitor AWS resources and applications. To help make these alerts more actionable, the Doctor Droid Alert Insights dashboard shows:

  • Distribution of alerts by "alarm_name", "metric_name", "cache_cluster_id", "cache_node_id", "error", "DBInstanceIdentifier"

GCP Monitoring Alarms:

GCP Monitoring Alarms let you monitor Google Cloud resources and applications. Insights provided include:

  • Distribution of alerts by "resource" and "metric_name".

PagerDuty Incidents:

PagerDuty incidents make it easy to map an alert to the relevant team and Slack channel. Insights provided include:

  • Distribution of incidents by "title", "team", "service" or any other tag.

Opsgenie Incidents:

Opsgenie incidents make it easy to map an alert to the relevant team and Slack channel. Insights provided include:

  • Distribution of incidents by "title", "team", "service" or any other tag.