Continuous Scan

Overview

Continuous Scan is a Monitor parameter that enables alerting on data quality issues occurring between consecutive Monitor Runs. When turned ON, the next Incident automatically includes every anomaly that has occurred since the last Monitor Run. This prevents periods of data from going unmonitored, for example when the run schedule changes frequently.

How to

Activation

The setting can be turned on in the Monitor Editing screen. It requires no further parameters.

Availability

Continuous Scan is currently available for ML-based Sifflet Monitor Templates and will soon be available for all templates. For the time being, it can only be configured through the UI; support for Monitors as Code will follow soon.

Use cases

Example #1 - misalignment between Data Aggregation and Schedule

Let's take an example of a monitor with the following parameters:

  • Template: Smart Metrics (Dynamic) >> Normalized Average
  • Time Window (Model Training Period): 365 days
  • Time-based Data Aggregation: hourly
  • Schedule: @daily

With these settings, 1 datapoint is created per hour and the Monitor runs once a day, so each run sees 24 new datapoints.

  • With Continuous Scan OFF, the Monitor raises alerts only on data quality issues affecting the last created datapoint (in this case, the hour of the run), potentially missing anomalies in the other 23 hourly datapoints.
Monitor Run from 5.01.24, reporting only 1 anomaly, on the last hour of the daily run.

  • With Continuous Scan ON, the Monitor raises alerts on data quality issues affecting any datapoint created since the previous daily Monitor Run (in this case: 24 datapoints, 1 per hour).
Monitor Run from 5.01.24, reporting anomalies in 9 timeslots across the last 24 hours.
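
The difference between the two modes in Example #1 can be sketched in a few lines of Python. This is only an illustrative model of the behaviour described above, not Sifflet code: the function `evaluated_datapoints` and its parameters are hypothetical names introduced for this sketch, and it assumes the previous run happened exactly one day earlier.

```python
from datetime import datetime, timedelta


def evaluated_datapoints(run_time, last_run_time, aggregation, continuous_scan):
    """Return the datapoint timestamps a run would evaluate (illustrative model only)."""
    if not continuous_scan:
        # Continuous Scan OFF: only the most recently created datapoint is checked.
        return [run_time]
    # Continuous Scan ON: every datapoint created after the previous run,
    # up to and including the datapoint of the current run.
    points = []
    current = last_run_time + aggregation
    while current <= run_time:
        points.append(current)
        current += aggregation
    return points


run = datetime(2024, 1, 5)               # daily run from the example
previous_run = run - timedelta(days=1)   # assumption: previous run was one day earlier
hourly = timedelta(hours=1)              # Time-based Data Aggregation: hourly

off = evaluated_datapoints(run, previous_run, hourly, continuous_scan=False)
on = evaluated_datapoints(run, previous_run, hourly, continuous_scan=True)
print(len(off), len(on))  # 1 24
```

With hourly aggregation and a daily schedule, the OFF case leaves 23 of the 24 datapoints unchecked, which is the gap Continuous Scan is meant to close.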

Example #2 - Changing Time Offset

Context: Following data pipeline updates, data is no longer expected to arrive with a 7-day delay; a 1-day delay now suffices. The Monitor's Time Offset setting can therefore be adjusted as follows:

  • Template: Smart Metrics (Dynamic) >> Normalized Average
  • Time Window (Model Training Period): 365 days
  • Offset: 7 days -> 1 day
  • Time-based Data Aggregation: 1 day
  • Schedule: @daily

With these settings, 1 datapoint is still created per day, but the analysis day is now 1 day before the Run Date (T-1) instead of 7 days before it (T-7).

Monitor Run from 5.01.24 with a 7-day offset, showing no anomalies on the analysis day (29.12.23).

  • With Continuous Scan OFF: during the first Monitor Run with the new offset, a 6-day window of data (from T-7 to T-1) would go unmonitored.
Monitor Run from 5.01.24 with a 1-day offset, reporting only 1 anomaly, on the analysis day (04.01.24).

  • With Continuous Scan ON: during the first Monitor Run with the new offset, the 6-day window (from T-7 to T-1) is covered within that Run, as shown in the screenshot below.
Monitor Run from 5.01.24 with a 1-day offset, reporting anomalies in 7 timeslots.
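
The date arithmetic behind Example #2 can be checked with a short Python sketch. It is a simplified model rather than Sifflet's implementation: `analysis_day` and `covered_days` are hypothetical helpers, and the sketch assumes the previous daily run (4.01.24) still used the 7-day offset, so the last day it analysed was 28.12.23.

```python
from datetime import date, timedelta


def analysis_day(run_date, offset_days):
    """Day whose datapoint a run analyses, given the Time Offset (illustrative)."""
    return run_date - timedelta(days=offset_days)


def covered_days(run_date, old_offset, new_offset, continuous_scan):
    """Days evaluated by the first run after the offset change (illustrative model)."""
    newest = analysis_day(run_date, new_offset)
    if not continuous_scan:
        # Continuous Scan OFF: only the new analysis day is evaluated.
        return [newest]
    # Assumption: the previous daily run used the old offset.
    last_analysed = analysis_day(run_date - timedelta(days=1), old_offset)
    gap = (newest - last_analysed).days
    # Continuous Scan ON: every day after the last analysed one, up to the new analysis day.
    return [last_analysed + timedelta(days=i) for i in range(1, gap + 1)]


run = date(2024, 1, 5)
off = covered_days(run, old_offset=7, new_offset=1, continuous_scan=False)
on = covered_days(run, old_offset=7, new_offset=1, continuous_scan=True)
skipped = [d for d in on if d not in off]
print(len(off), len(on), len(skipped))  # 1 7 6
```

Under these assumptions the ON case evaluates 7 daily timeslots (29.12.23 to 04.01.24), while the OFF case evaluates only 04.01.24 and skips the 6 days in between, matching the counts given in the example.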