
Alert Definitions

Every alert streaming into DrDroid is first matched against an alert definition. A definition can match on the alert's source and on text in the alert message. Features such as auto investigation, escalation, and runbooks can be enabled on an alert definition.
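The matching step can be sketched as follows. This is a minimal illustration under stated assumptions, not DrDroid's implementation: the `Alert` and `AlertDefinition` classes, the regex-based `pattern` field, the per-definition feature flags, and the first-match-wins rule in the hypothetical `match_definition` helper are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    source: str   # e.g. the monitoring tool that fired the alert
    message: str  # alert title/body text


@dataclass
class AlertDefinition:
    # Hypothetical fields; not DrDroid's actual schema.
    name: str
    source: Optional[str] = None   # None matches alerts from any source
    pattern: Optional[str] = None  # regex searched in the message; None matches any text
    auto_investigate: bool = False
    escalate: bool = False
    runbook: Optional[str] = None  # name of a linked runbook, if any

    def matches(self, alert: Alert) -> bool:
        """True if the alert satisfies both the source and the text condition."""
        if self.source is not None and alert.source != self.source:
            return False
        if self.pattern is not None and not re.search(self.pattern, alert.message, re.IGNORECASE):
            return False
        return True


def match_definition(alert: Alert, definitions: list[AlertDefinition]) -> Optional[AlertDefinition]:
    """Return the first matching definition (assumed first-match-wins), or None."""
    for definition in definitions:
        if definition.matches(alert):
            return definition
    return None
```

Under this assumed ordering rule, listing definitions from most to least specific, with a catch-all definition last, keeps the outcome predictable.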

Runbooks

Runbooks are collections of executable steps, and they can include scripts that perform a task. If a runbook is linked to the alert definition, the agent can choose to execute it; otherwise, the agent searches for runbooks relevant to the user prompt or alert. Runbooks can also be executed manually.
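The selection logic described above can be sketched like this. It is a hypothetical illustration only: the `Runbook` class and `select_runbook` function are invented names, and simple word overlap stands in for whatever relevance search the agent actually performs.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Runbook:
    name: str
    description: str
    steps: list[str] = field(default_factory=list)  # ordered steps; may include scripts


def _tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric words, used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_runbook(text: str, runbooks: list[Runbook], linked: Optional[Runbook] = None) -> Optional[Runbook]:
    """Prefer the runbook linked to the alert definition; otherwise pick the
    runbook whose name/description shares the most words with the prompt or alert."""
    if linked is not None:
        return linked
    query = _tokens(text)
    best, best_score = None, 0
    for runbook in runbooks:
        score = len(query & _tokens(f"{runbook.name} {runbook.description}"))
        if score > best_score:  # ties keep the earlier runbook
            best, best_score = runbook, score
    return best  # None when no runbook shares any word with the text
```

Checking the linked runbook first mirrors the text: a runbook attached to the alert definition takes priority over any search.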

Teams

Teams group users and are linked to services. An alert linked to a service is assigned to the team that owns that service. Users within a team can rotate on-call and receive notifications over Slack, SMS, or phone.
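The service-to-team routing and on-call rotation can be pictured with a small sketch. It assumes a simple round-robin rotation with fixed-length shifts; the `Team` class, its `shift_days` parameter, and the `route_alert` helper are hypothetical and do not describe how DrDroid actually schedules rotations.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Team:
    name: str
    members: list[str]    # users in rotation order
    rotation_start: date  # first day of the first shift
    shift_days: int = 7   # assumed fixed-length shifts

    def on_call(self, day: date) -> str:
        """Member whose shift covers `day` in a simple round-robin rotation."""
        shifts_elapsed = (day - self.rotation_start).days // self.shift_days
        return self.members[shifts_elapsed % len(self.members)]


def route_alert(service: str, owners: dict[str, Team], day: date) -> Optional[tuple[str, str]]:
    """Map an alert's service to its owning team and that team's on-call user."""
    team = owners.get(service)
    if team is None:
        return None  # no owning team: the alert stays unassigned
    return team.name, team.on_call(day)
```

For a three-person team starting on 1 January 2024 with weekly shifts, the second member is on call from 8 January and the rotation wraps back to the first member on 22 January.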

Services

Every alert is linked to a service where possible, which helps identify the right owner for the alert. The service is auto-detected from the alert message or can be linked manually. A global list of services is built from the metadata extracted from connected sources, and services can also be added manually. A service's context can be expanded by linking a GitHub repo and telemetry destinations, so alerts for that service can be investigated with rich context.
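One way to picture auto-detection is a lookup of known service names in the alert message. This is a hedged sketch, not DrDroid's detection method: the `detect_service` function, the rule that a manual link overrides detection, and the longest-name-first matching are all assumptions.

```python
import re
from typing import Optional


def detect_service(message: str, known_services: list[str], manual_link: Optional[str] = None) -> Optional[str]:
    """Return the service for an alert: a manual link if given (assumed to win),
    otherwise the first known service name found as a whole word in the message."""
    if manual_link is not None:
        return manual_link
    text = message.lower()
    # Try longer names first so "orders-db" is preferred over "orders".
    for name in sorted(known_services, key=len, reverse=True):
        # Reject matches embedded in a longer word or hyphenated name.
        if re.search(rf"(?<![\w-]){re.escape(name.lower())}(?![\w-])", text):
            return name
    return None
```

Here `known_services` plays the role of the global service list; an alert that mentions no known service falls through to `None` and would need a manual link.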

Assets

Assets are fetched from connected sources. They include services, dashboards, log groups, component names, and similar entities. Assets feed a knowledge graph and overview that the agent uses to identify the right query parameters.
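A knowledge graph of assets can be as simple as typed nodes with links between related assets. The sketch below is a hypothetical model, not DrDroid's data structure: the `AssetGraph` class, its asset kinds, and the idea of querying neighbours of a given kind to find query parameters are assumptions for illustration.

```python
from collections import defaultdict


class AssetGraph:
    """Undirected graph of assets discovered in connected sources (hypothetical model)."""

    def __init__(self) -> None:
        self._kind: dict[str, str] = {}                      # asset name -> kind
        self._edges: dict[str, set[str]] = defaultdict(set)  # asset name -> linked assets

    def add_asset(self, name: str, kind: str) -> None:
        self._kind[name] = kind

    def link(self, a: str, b: str) -> None:
        """Record that two assets are related, e.g. a service and its log group."""
        self._edges[a].add(b)
        self._edges[b].add(a)

    def related(self, name: str, kind: str) -> list[str]:
        """Assets of the given kind linked to `name`, e.g. candidate query parameters."""
        return sorted(n for n in self._edges.get(name, set()) if self._kind.get(n) == kind)
```

Given a service name detected from an alert, `related(service, "log_group")` would yield the log groups to query.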

Guidelines & Overview

Droid Agent pre-creates an overview of the organisation’s architecture for later reference. Users can also give the agent custom guidelines to steer its investigations. These become part of the agent's system prompt.
  • Overview: Agent-generated overview of the organisation’s architecture, pre-created by Droid Agent for later reference.
  • Troubleshooting: User-added hints for debugging scenarios, used in every investigation.
  • Classification: User-added hints on how to group alerts into issues.
  • Escalation: User-added hints on how to escalate issues to owners or teams.
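To make "part of the system prompt" concrete, here is a minimal sketch of folding the four categories into a prompt string. The `build_system_prompt` function, the section headings, and the bullet layout are assumptions; DrDroid's actual prompt format is not described here.

```python
SECTION_ORDER = ("Overview", "Troubleshooting", "Classification", "Escalation")


def build_system_prompt(base_prompt: str, guidelines: dict[str, list[str]]) -> str:
    """Append each non-empty guideline category to the base prompt as a bulleted block."""
    sections = [base_prompt.strip()]
    for title in SECTION_ORDER:
        hints = guidelines.get(title, [])
        if hints:  # skip categories with no entries
            bullets = "\n".join(f"- {hint}" for hint in hints)
            sections.append(f"## {title}\n{bullets}")
    return "\n\n".join(sections)
```

Keeping a fixed section order means the same guidelines always produce the same prompt, which makes investigation behaviour easier to compare across runs.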

Memory

Droid Agent accumulates a daily memory of issues and their linked conversations, and reuses it in future investigations.

Code Repository Context

If given code access, Droid Agent analyses the code and writes context docs for itself. When an alert is triggered, it can then quickly identify the relevant repo and service and understand what to query, and how, to find the root cause.