Time Parameters
Overview
In data monitoring and analysis, Time Parameters are crucial for ensuring that the data being considered is relevant for the specific goals of the analysis. Adjusting them can help focus on the most pertinent data, whether the goal is looking for real-time insights or longer-term trends.
Modes
Monitors support two modes with respect to Time Parameters:
- Full data scan: This mode applies when Time Window is Off. Each run of the monitor scans the entire table.
- Incremental scan: This mode applies when Time Window is On and a Time field is selected. Each run of the monitor scans an incremental portion of the data, e.g. running a monitor each day on the last day's worth of data.
Time Parameters in Incremental Mode
There are two main Time Parameters to consider while configuring Sifflet Monitors in incremental mode:
- Time Aggregation: the frequency of the points that are checked for anomalies, e.g. hourly, daily, weekly.
- Time Offset: the delay between the data in the table and the present.
Dataset- vs Monitor-Level Parameters
Time Offset parameters can also be set at the Dataset level. If defined at the Dataset level, their values are used as defaults for all newly created monitors that accept those parameters. Consult this table for details.
Time Aggregation
The Time Aggregation describes intervals between points that are checked for anomalies.
This is the first parameter you should consider when establishing a monitor based on time. When selecting daily time aggregation, one datapoint per day will be checked. Time aggregation also affects the maximum training window for ML monitors.
Read more about Time-based Data Aggregation
Schedule vs Time Aggregation
Monitor Run Schedule doesn't need to be equal to its Time Aggregation parameter. For example, you may choose to run a Monitor once a week (Schedule: @weekly), with a daily datapoint creation frequency (Time-based Data Aggregation: daily).
Read more about Time-based Data Aggregation.
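To make the distinction between schedule and aggregation concrete, here is a minimal Python sketch. It is not Sifflet's implementation: the helper `new_points` is hypothetical and assumes that an incremental run evaluates every daily point that became available since the previous run.

```python
from datetime import date, timedelta

def new_points(previous_run, current_run):
    """Daily points that appeared after previous_run, up to and including current_run.

    Hypothetical helper: assumes one point per day and that each run
    picks up every day not yet covered by the previous run.
    """
    days = (current_run - previous_run).days
    return [previous_run + timedelta(days=i) for i in range(1, days + 1)]

# Weekly schedule, daily aggregation: one run covers seven daily points.
points = new_points(date(2024, 3, 3), date(2024, 3, 10))
print(len(points), points[0], points[-1])
```

Under these assumptions, a weekly run still yields one point per day, so the graph keeps its daily granularity even though the monitor runs less often.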
Optimize your run frequency
You should adapt your frequency according to the refresh frequency of your data. If your data is updated by batch at 2am every day, running your monitors every hour would be suboptimal.
Time Offset
By default, Sifflet runs monitors on today's date. In some cases, pipelines are configured so that they update with a delay, e.g. a sales table updated every morning with the data for T-2 days. To take this into account, Sifflet allows the user to change the reference date by using an OFFSET parameter.
The offset typically represents the data delay. If the table contains only orders up to 2 days ago, setting an offset of 2 days ensures that the monitor doesn't alert on empty or partial days.
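The effect of the offset can be illustrated with a short Python sketch. The row counts and the helper `checked_day` are made up for illustration, and the sketch assumes that the offset is simply subtracted from the run date for a daily monitor.

```python
from datetime import date, timedelta

# Hypothetical daily row counts: the pipeline loads data with a 2-day delay,
# so on 2024-03-10 the latest populated day is 2024-03-08.
row_counts = {
    date(2024, 3, 6): 1040,
    date(2024, 3, 7): 985,
    date(2024, 3, 8): 1012,
}

def checked_day(run_date, offset_days):
    """Day a daily monitor would evaluate, assuming the offset shifts the reference date back."""
    return run_date - timedelta(days=offset_days)

run_date = date(2024, 3, 10)
for offset in (0, 2):
    day = checked_day(run_date, offset)
    rows = row_counts.get(day, 0)
    print(f"offset={offset}d -> checks {day}: {rows} rows")
# offset=0d -> checks 2024-03-10: 0 rows
# offset=2d -> checks 2024-03-08: 1012 rows
```

Without an offset, the run looks at a day that has not been loaded yet and would see an empty point. With a 2-day offset, it looks at the most recent complete day.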
Time Offset with other Time Parameters
Adding an Offset doesn't alter the duration of the time aggregation or of the rolling aggregation window. It only shifts the window into the past by the given Offset.
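A small Python sketch of this behaviour follows. The function `point_window` is hypothetical and assumes that a window is counted in whole days and ends on the offset-adjusted reference day, inclusive.

```python
from datetime import date, timedelta

def point_window(run_date, window_days, offset_days=0):
    """Return (first_day, last_day) of the data behind the newest point.

    Assumption: the window ends on the offset-adjusted day and includes it.
    """
    last_day = run_date - timedelta(days=offset_days)
    first_day = last_day - timedelta(days=window_days - 1)
    return first_day, last_day

run_date = date(2024, 3, 10)
for offset in (0, 2):
    first, last = point_window(run_date, window_days=7, offset_days=offset)
    length = (last - first).days + 1
    print(f"offset={offset}d: {first} to {last} ({length} days)")
# offset=0d: 2024-03-04 to 2024-03-10 (7 days)
# offset=2d: 2024-03-02 to 2024-03-08 (7 days)
```

In both cases the window spans 7 days. Only its position in time changes.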
Advanced parameters
Rolling Aggregation
Rolling Aggregation allows you to set a custom time interval for every point checked by Sifflet. When Rolling Aggregation is OFF, the time interval matches the Time Aggregation. This means that, typically, a daily point represents one day's worth of data.
However, there are scenarios where you might want a daily point that represents something other than a day, such as a daily point representing the rolling sum over the last 7 days.
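As an illustration, the Python sketch below computes daily points where each point is a rolling sum over 7 days. The data and the helper `rolling_points` are hypothetical, and the sketch assumes that the window includes the point's own day and that points without a full window are skipped.

```python
from datetime import date, timedelta

def rolling_points(daily_values, window_days):
    """Map each day to the sum of the window_days days ending on that day.

    Assumptions: the window includes the point's own day, and days
    without a complete window produce no point.
    """
    result = {}
    for day in sorted(daily_values):
        window = [day - timedelta(days=i) for i in range(window_days)]
        if all(d in daily_values for d in window):
            result[day] = sum(daily_values[d] for d in window)
    return result

# Ten days of made-up values: 10, 11, ..., 19.
start = date(2024, 3, 1)
daily = {start + timedelta(days=i): 10 + i for i in range(10)}

for day, total in rolling_points(daily, 7).items():
    print(day, total)
# 2024-03-07 91
# 2024-03-08 98
# 2024-03-09 105
# 2024-03-10 112
```

The output still has one point per day, but each value summarises a week of data, which smooths out day-to-day variation.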
Rolling Aggregation on Static Monitors
You may notice that Rolling Aggregation is missing for static monitors. Static monitors will match other monitors in terms of time settings in the upcoming release. Today, however, static monitors do not differentiate between a data point and a run and do not have a time aggregation parameter, which means that each run of the monitor represents one data point.
They do, however, have a time window parameter, which acts in the same way as the rolling aggregation parameter and specifies the time window being checked.
Lookback Period
By default, Sifflet monitors run incrementally and only query new data.
However, there are scenarios where past data is susceptible to change and you want today's run of the monitor to also check previous days (or other time aggregation units).
This is where the Lookback Period comes in. It specifies a time window in which previous data points are rechecked for changes. Points that were previously within range but whose value now appears as an anomaly are included in the run's result.
Anomalies in the lookback window that now have correct values are automatically qualified as fixed.
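The Python sketch below illustrates the idea for daily points. It is not Sifflet's implementation: `points_to_check` and `requalify`, as well as the status labels, are hypothetical and assume a lookback of N days means the last N daily points are evaluated on each run.

```python
from datetime import date, timedelta

def points_to_check(run_date, offset_days=0, lookback_days=0):
    """Daily points evaluated by one run, oldest first.

    Assumption: without a lookback only the newest point is checked;
    with a lookback of N days the last N daily points are checked.
    """
    newest = run_date - timedelta(days=offset_days)
    count = max(lookback_days, 1)
    return [newest - timedelta(days=i) for i in reversed(range(count))]

def requalify(previous_status, value, lower, upper):
    """Re-evaluate a point against its expected range (hypothetical labels)."""
    if not (lower <= value <= upper):
        return "anomaly"
    return "fixed" if previous_status == "anomaly" else "ok"

points = points_to_check(date(2024, 3, 10), offset_days=1, lookback_days=7)
print(points[0], points[-1], len(points))  # 2024-03-03 2024-03-09 7

print(requalify("anomaly", 100, 90, 110))  # fixed
print(requalify("ok", 150, 90, 110))       # anomaly
```

A longer lookback catches late corrections to past data, at the cost of querying more data on every run.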
Time Parameters as Code
Snapshot Mode
parameters:
  kind: <MonitorKind>
  threshold:
    sensitivity: Low
  # no timeWindow
Not setting a time window parameter makes the monitor scan the entire table on each run.
Incremental Mode
timeWindow: object that represents the time window configuration. When absent, the monitor acts in Snapshot Mode; when present, the monitor acts in Incremental Mode.
timeWindow.field: the time field that will be used to incrementally query data.
timeWindow.frequency: the time aggregation configuration, in ISO 8601 format:
- P1D: Daily.
- P1W: Weekly.
- P1M: Monthly.
- P1H: Hourly.
- P30M: every 30 minutes.
- P20M: every 20 minutes.
- P15M: every 15 minutes.
timeWindow.firstRun: the amount of data to query in the first run, in ISO 8601 format.
timeWindow.offset: the offset that represents the delay between the data and the present, in ISO 8601 format. Typically its unit should match the granularity of the aggregation: days for daily, hours for hourly, etc.
parameters:
  kind: <MonitorKind>
  timeWindow:
    field: auto
    firstRun: P365D
    offset: P1D
    frequency: P1D
This incremental configuration uses an automatically selected time field and produces 1 point per day, with an offset of 1 day so that only completed days are tracked. The first run queries 365 days to build the graph and the prediction model.
timeWindow.rollingTimeWindow: an ISO 8601 representation of the rolling time window that each point represents. P7D makes each point a rolling aggregation over the last 7 days.
parameters:
  kind: <MonitorKind>
  timeWindow:
    field: order_date
    firstRun: P365D
    offset: P1D
    frequency: P1D
    rollingTimeWindow: P7D
This incremental configuration uses the order_date time field and produces 1 point per day, where each point represents an aggregation over the previous 7 days from that point's date. The offset of 1 day means that only completed days are considered. The first run queries 365 days to build the graph and the prediction model.
timeWindow.deltaQuerying: an ISO 8601 representation of the lookback period of the monitor. P7D checks the last 7 daily datapoints on each run, rather than only a single datapoint.
parameters:
  kind: <MonitorKind>
  timeWindow:
    field: order_date
    firstRun: P365D
    offset: P1D
    frequency: P1D
    deltaQuerying: P7D