What's an Incident?

In Sifflet, an incident represents a specific, actionable problem with your data. Think of it as a centralized hub for investigating and resolving a data quality problem.

An incident is created when Sifflet detects a problem in the result of a Monitor run or a dbt test. Each incident contains one or more related failing checks, such as:

  • Monitor Failures: One or more of your configured monitors have failed.
  • dbt Test Failures: A dbt test has failed, indicating a problem with your data transformations.

To help you focus on the root cause and avoid alert fatigue, Sifflet intelligently groups related failures into a single incident based on grouping rules. This ensures that multiple symptoms of the same underlying problem are managed together.
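The grouping idea can be illustrated with a minimal sketch. It assumes a single, hypothetical grouping rule (failures on the same data asset belong together); the `Failure` class, its fields, and `group_failures` are illustrative names, not part of Sifflet's API, and the actual grouping rules may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """A failing check. The fields are illustrative, not Sifflet's schema."""
    check_name: str  # name of the monitor or dbt test that failed
    source: str      # "monitor" or "dbt_test"
    asset: str       # data asset the check ran against


def group_failures(failures, key=lambda f: f.asset):
    """Bucket failures that share the same key into one incident.

    The default key (same data asset) is a stand-in for a grouping rule.
    """
    incidents = defaultdict(list)
    for failure in failures:
        incidents[key(failure)].append(failure)
    return dict(incidents)


failures = [
    Failure("orders_freshness", "monitor", "analytics.orders"),
    Failure("orders_not_null_id", "dbt_test", "analytics.orders"),
    Failure("customers_volume", "monitor", "analytics.customers"),
]

incidents = group_failures(failures)
for asset, grouped in incidents.items():
    print(asset, [f.check_name for f in grouped])
```

Here the two failures on `analytics.orders` end up in one bucket, so they would be handled as a single incident rather than as two separate alerts.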

The Incidents List Page

The main Incidents page provides a high-level overview of all the data incidents that have been created in your Sifflet environment.

The Incidents list page


From this page, for every incident, you can quickly see:

  • Name: The name of the incident. This is automatically generated by Sifflet (and updated when new failing monitors are added to the incident), but can be changed manually.
  • Data assets: The list of data assets affected by the incident.
  • Severity: The severity level of the monitor that triggered the incident.
  • Status: The current status of the incident (Open, In Progress, or Closed).
  • Assignee(s): The users or team members assigned to investigate and resolve the incident.
  • Last failure timestamp: The time of the most recent failed run among the monitors that are part of this incident.
  • Number of compromised dashboards: The number of dashboards impacted by this incident.
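To make the columns above concrete, here is a small sketch that models one row of the list and orders unresolved incidents for triage. The `IncidentRow` class, the severity strings, and the `triage_order` helper are assumptions made for illustration; they do not reflect Sifflet's API or its default sort order.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Status(Enum):
    OPEN = "Open"
    IN_PROGRESS = "In Progress"
    CLOSED = "Closed"


@dataclass
class IncidentRow:
    """One row of the incidents list. Field names are illustrative."""
    name: str
    data_assets: list
    severity: str                # placeholder values, not Sifflet's severity scale
    status: Status
    assignees: list
    last_failure: datetime
    compromised_dashboards: int


def triage_order(rows):
    """Drop closed incidents, then sort by dashboards impacted and recency (both descending)."""
    active = [r for r in rows if r.status is not Status.CLOSED]
    return sorted(
        active,
        key=lambda r: (r.compromised_dashboards, r.last_failure),
        reverse=True,
    )


rows = [
    IncidentRow("Orders freshness", ["analytics.orders"], "high", Status.OPEN,
                ["alice"], datetime(2024, 5, 1, 8, 0), 3),
    IncidentRow("Old volume drop", ["analytics.events"], "high", Status.CLOSED,
                ["bob"], datetime(2024, 4, 1, 8, 0), 10),
    IncidentRow("Customers nulls", ["analytics.customers"], "low", Status.IN_PROGRESS,
                ["carol"], datetime(2024, 5, 2, 9, 0), 3),
    IncidentRow("Staging schema change", ["staging.orders"], "low", Status.OPEN,
                [], datetime(2024, 5, 3, 7, 0), 0),
]

for row in triage_order(rows):
    print(row.name, row.status.value, row.compromised_dashboards)
```

Sorting on dashboards impacted first is only one possible policy; a team might prefer severity or assignee as the primary key.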

Investigating a Specific Incident

To dig deeper into an issue, click an incident in the list. This opens the incident detail page, which contains the information you need for root cause analysis.

The Incident page


The incident detail page includes the following key sections:

  • Incident Details: This section includes key details about the incident, like its severity, assigned user(s), and the incident description.

    📘

    Sage-generated description

    If Sage, Sifflet's GenAI root cause analysis agent, is activated in your environment, the description contains the following sections:

    • Overview: A high-level overview of the incident and its failing monitors.
    • Details: An in-depth description of the incident.
    • Root cause analysis: Sage's analysis of the possible root cause of the incident.
  • Incident Scope: This section provides a summary of the incident, including the affected data assets and the specific failures that are part of this incident (e.g., a list of all the monitor failures).
  • Impacted Downstream Assets: To help you understand the full business impact of the incident, this section shows you which downstream assets (e.g., dashboards, reports) are affected by this issue.
  • Timeline: The timeline provides a chronological view of the incident, including when the issue was first detected, when the incident was created, and any status changes or comments made by your team. This helps you understand the history of the issue.

    📘

    Seeing events from incident/ticket management tools in the timeline

    If you have integrated Sifflet with an incident or ticket management tool (such as Jira or ServiceNow), events from that tool appear in the incident timeline in Sifflet.

  • Lineage: The data lineage graph is a powerful tool for root cause analysis. It visually maps the upstream and downstream dependencies of the affected data asset(s), allowing you to quickly trace the problem to its source.
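Tracing a problem upstream through lineage amounts to a graph traversal. The sketch below assumes a hypothetical lineage map (`UPSTREAMS`) with made-up asset names and walks it breadth-first, so the closest upstream assets are listed first. It illustrates the idea only and is not how Sifflet stores or queries lineage.

```python
from collections import deque

# Hypothetical lineage: each asset maps to its direct upstream assets.
UPSTREAMS = {
    "dashboard.revenue": ["mart.orders"],
    "mart.orders": ["staging.orders", "staging.customers"],
    "staging.orders": ["raw.orders"],
    "staging.customers": ["raw.customers"],
    "raw.orders": [],
    "raw.customers": [],
}


def upstream_assets(asset, upstreams):
    """Return every transitive upstream of `asset`, nearest first (breadth-first)."""
    seen = set()
    order = []
    queue = deque(upstreams.get(asset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # an asset reachable by several paths is listed once
        seen.add(current)
        order.append(current)
        queue.extend(upstreams.get(current, []))
    return order


print(upstream_assets("mart.orders", UPSTREAMS))
# ['staging.orders', 'staging.customers', 'raw.orders', 'raw.customers']
```

Checking candidates in this order mirrors how you would read the lineage graph during an investigation: start with the direct parents of the affected asset and move outward toward the raw sources.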

By using these different sections, you can efficiently investigate the root cause of the data issue, understand its impact, and collaborate with your team to resolve it.