Continuous Scan
Overview
Continuous Scan is a parameter that enables alerting on data quality issues that occur between consecutive Monitor Runs. When turned ON, it automatically includes all anomalies that have occurred since the last Monitor Run in the next Incident. This prevents unmonitored periods of data, for example when the run schedule changes frequently.
How to
Activation
The setting can be turned on in the Monitor Editing screen. It requires no further parameters.
Availability
Continuous Scan is currently available for ML-based Sifflet Monitor Templates and will soon be available for all templates. For the time being, it can be enabled through the UI only; support for Monitors as Code will follow soon.
Use cases
Example #1 - Misalignment between Data Aggregation and Schedule
Let's take an example of a monitor with the following parameters:
- Template: Smart Metrics (Dynamic) >> Normalized Average
- Time Window (Model Training Period): 365 days
- Time-based Data Aggregation: hourly
- Schedule: @daily
These settings result in 1 datapoint per hour, with 24 datapoints created at each daily Monitor Run.
- With Continuous Scan OFF, the Monitor raises alerts only on data quality issues affecting the last created datapoint (in this case, the hour of the run), potentially missing anomalies in the other 23 datapoints.
- With Continuous Scan ON, the Monitor raises alerts on data quality issues affecting any datapoint created during a daily Monitor Run (in this case, 24 datapoints, 1 per hour).
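The difference between the two modes can be illustrated with a small Python model. This is only a sketch of the behavior described above, not Sifflet's implementation: the function `evaluated_datapoints` and its parameters are hypothetical names, and the model assumes one datapoint per aggregation step between two consecutive runs.

```python
from datetime import datetime, timedelta


def evaluated_datapoints(previous_run, current_run, aggregation, continuous_scan):
    """Return the datapoint timestamps a run would check (illustrative model only)."""
    # Build every datapoint created since the previous run,
    # one per aggregation step, up to and including the current run.
    points = []
    t = previous_run + aggregation
    while t <= current_run:
        points.append(t)
        t += aggregation
    # Continuous Scan ON: check all of them.
    # Continuous Scan OFF: check only the most recent datapoint.
    return points if continuous_scan else points[-1:]


# Hourly aggregation with a daily schedule, as in Example #1.
previous_run = datetime(2024, 1, 1)
current_run = datetime(2024, 1, 2)
hourly = timedelta(hours=1)

off = evaluated_datapoints(previous_run, current_run, hourly, continuous_scan=False)
on = evaluated_datapoints(previous_run, current_run, hourly, continuous_scan=True)
print(len(off), len(on))  # 1 24
```

In this model, the 23 datapoints that appear in `on` but not in `off` are the ones whose anomalies would go unreported with Continuous Scan OFF.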
Example #2 - Changing Time Offset
Context: Following data pipeline updates, data is no longer expected to arrive with a 7-day delay; a 1-day delay will suffice. The Monitor's Time Offset setting can therefore be adjusted as follows:
- Template: Smart Metrics (Dynamic) >> Normalized Average
- Time Window (Model Training Period): 365 days
- Offset: 7 days -> 1 day
- Time-based Data Aggregation: 1 day
- Schedule: @daily
These settings still result in 1 datapoint being created per day, but now starting 1 day before the Run Date (T-1) instead of 7 days before the Run Date (T-7).
- With Continuous Scan OFF: during the first Monitor Run with the new offset, there would be a 6-day window of unmonitored data (from T-7 to T-1).
- With Continuous Scan ON: during the first Monitor Run with the new offset, the 6-day window (from T-7 to T-1) is covered within that Run, as depicted in the screenshots below.
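The offset arithmetic in this example can be sketched in Python as follows. The function `checked_days` and its parameters are hypothetical and only model the behavior described above. The sketch assumes that the 6-day gap consists of the days from T-7 up to, but not including, T-1, and that T-1 itself is the datapoint checked by the run in either mode.

```python
from datetime import date, timedelta


def checked_days(run_date, old_offset_days, new_offset_days, continuous_scan):
    """Return the days checked by the first run after an offset change (illustrative)."""
    # The datapoint the run targets with the new offset (T-1 in the example).
    latest = run_date - timedelta(days=new_offset_days)
    if not continuous_scan:
        # Continuous Scan OFF: only the latest datapoint is checked.
        return [latest]
    # Continuous Scan ON: also cover the days skipped by the offset change,
    # starting from the day the old offset would have targeted (T-7).
    first_gap_day = run_date - timedelta(days=old_offset_days)
    span = (latest - first_gap_day).days
    return [first_gap_day + timedelta(days=i) for i in range(span + 1)]


run_date = date(2024, 1, 10)  # arbitrary Run Date (T) chosen for illustration
off = checked_days(run_date, 7, 1, continuous_scan=False)
on = checked_days(run_date, 7, 1, continuous_scan=True)

# Days covered only when Continuous Scan is ON.
gap = [d for d in on if d not in off]
print(len(gap))  # 6
```

Under these assumptions, `gap` holds the 6 days that would remain unmonitored with Continuous Scan OFF.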