How to use incidents
Context: you have received an alert notifying you that one of your monitors has failed. You want to start troubleshooting, but you do not know where to begin.
You should first take a look at the associated incident.
Typical workflow
A typical workflow for an open incident is:
- Assign it to someone in your team
- Use the Compromised assets section to alert any team that could be impacted
- Use the Lineage tab to get a better understanding of the upstream assets
- Comment on the resolution in the Timeline
- Close the report by providing a final status: Fixed, False Positive, Expected or Known issue
Assign the incident
Depending on the type of monitor that failed, or on who owns the corresponding tables, assign the incident to the member of your team responsible for handling the issue. Assigning a user to an incident, or unassigning them from it, automatically sends them an email notification.
You will find the incidents that have been assigned to you on your Dashboard.
Alert the impacted teams
The Overview page of an incident report lists all the compromised assets, that is, all the assets that depend on the one on which the monitor failed. To prevent "bad" data from propagating, alert the teams working on those assets.
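To make the idea of "compromised assets" concrete, the Python sketch below models lineage as a directed graph in which each asset points to the assets that consume it, and collects everything reachable downstream of the asset whose monitor failed. It assumes that indirect dependents also count as compromised. The asset names, the mapping format and the `downstream_of` helper are hypothetical illustrations, not the product's API or implementation.

```python
from collections import deque


def downstream_of(lineage: dict[str, list[str]], failed: str) -> set[str]:
    """Return every asset that depends, directly or indirectly, on `failed`.

    `lineage` maps an asset name to the assets that consume it
    (a hypothetical representation used for illustration only).
    """
    compromised: set[str] = set()
    queue = deque([failed])
    # Breadth-first walk along consumer edges; the set prevents revisits on cycles.
    while queue:
        asset = queue.popleft()
        for child in lineage.get(asset, []):
            if child != failed and child not in compromised:
                compromised.add(child)
                queue.append(child)
    return compromised


# Hypothetical lineage: raw_orders feeds stg_orders, which feeds two outputs.
lineage = {
    "raw_orders": ["stg_orders"],
    "stg_orders": ["orders_daily", "revenue_dashboard"],
    "orders_daily": ["revenue_dashboard"],
}

print(sorted(downstream_of(lineage, "stg_orders")))
# ['orders_daily', 'revenue_dashboard']
```

The teams owning the assets in the returned set are the ones worth alerting.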
Look for the root cause
Use the Lineage page to understand all the upstream dependencies of the asset. This should help you find the root cause and save a lot of troubleshooting time. Find more info here.
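The Lineage page does this exploration for you. The Python sketch below only illustrates the underlying idea, under the assumption that lineage is stored as a mapping from each asset to its consumers. It inverts that mapping and walks upstream breadth-first, so the dependencies closest to the failing asset are listed first, which is one reasonable order in which to look for a root cause. The data and the `upstream_by_distance` helper are hypothetical.

```python
from collections import deque


def upstream_by_distance(
    lineage: dict[str, list[str]], asset: str
) -> list[tuple[str, int]]:
    """List the upstream dependencies of `asset` with their hop distance, nearest first.

    `lineage` maps each asset to the assets that consume it (hypothetical format).
    """
    # Invert the mapping so the sources of each asset can be looked up.
    sources: dict[str, list[str]] = {}
    for parent, children in lineage.items():
        for child in children:
            sources.setdefault(child, []).append(parent)

    distances: dict[str, int] = {asset: 0}
    order: list[tuple[str, int]] = []
    queue = deque([asset])
    # Breadth-first search guarantees assets are discovered in order of distance.
    while queue:
        current = queue.popleft()
        for parent in sources.get(current, []):
            if parent not in distances:
                distances[parent] = distances[current] + 1
                order.append((parent, distances[parent]))
                queue.append(parent)
    return order


# Hypothetical lineage: two raw tables feed staging tables that feed orders_daily.
lineage = {
    "raw_orders": ["stg_orders"],
    "raw_customers": ["stg_customers"],
    "stg_orders": ["orders_daily"],
    "stg_customers": ["orders_daily"],
}

for name, hops in upstream_by_distance(lineage, "orders_daily"):
    print(hops, name)
# 1 stg_orders
# 1 stg_customers
# 2 raw_orders
# 2 raw_customers
```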
Comment on the resolution
To communicate easily within your team and keep a record of your troubleshooting actions, use the Timeline in the Overview section. Make sure each action is kept up to date and commented.
Close the report
Once the incident is resolved, close the report by selecting a final status from the dropdown list in the top-right corner of the page.