Time Parameters
Overview
In data monitoring and analysis, Time Parameters ensure that the data being considered is relevant to the specific goals of the analysis. Adjusting them helps focus on the most pertinent data, whether the goal is real-time insights or longer-term trends.
Standard Parameters
There are two main Time Parameters to consider while configuring Sifflet Monitors:
- Time Aggregation - The frequency of the points checked for anomalies, e.g. daily or hourly.
- Time Offset - The delay with which data arrives in the table.
Dataset- vs Monitor-Level Parameters
It's important to know that Time Offset parameters can also be set at the Dataset level. If defined at the Dataset level, their values will be used as defaults for all newly created monitors accepting those parameters. Consult this table for details.
Time Aggregation
The Time Aggregation describes intervals between points that are checked for anomalies.
This is the first parameter you should consider when establishing a monitor based on time. When selecting daily time aggregation, one datapoint per day will be checked. Time aggregation also affects the maximum training window for ML monitors.
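To make the idea of a datapoint concrete, here is a minimal Python sketch of time-based bucketing. It is an illustration only, not Sifflet's implementation: the function names `bucket_start` and `count_rows_per_point` and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime


def bucket_start(ts: datetime, aggregation: str) -> datetime:
    """Truncate a timestamp to the start of its aggregation bucket."""
    if aggregation == "hourly":
        return ts.replace(minute=0, second=0, microsecond=0)
    if aggregation == "daily":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported aggregation: {aggregation}")


def count_rows_per_point(timestamps, aggregation):
    """Count rows per bucket; each bucket is one point to check."""
    return Counter(bucket_start(ts, aggregation) for ts in timestamps)


# Hypothetical row timestamps from a monitored table.
rows = [
    datetime(2024, 1, 1, 9, 15),
    datetime(2024, 1, 1, 9, 45),
    datetime(2024, 1, 1, 17, 5),
    datetime(2024, 1, 2, 8, 30),
]

daily = count_rows_per_point(rows, "daily")    # 2 points, one per day
hourly = count_rows_per_point(rows, "hourly")  # 3 points, one per hour with data
```

The same four rows produce two points with daily aggregation and three with hourly aggregation, which is why this parameter determines how many points are checked.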
Read more about Time-based Data Aggregation
Schedule vs Time Aggregation
A Monitor's run schedule doesn't need to match its Time Aggregation parameter. For example, you may choose to run a Monitor once a week (Schedule: @weekly), with a daily datapoint creation frequency (Time-based Data Aggregation: daily).
Read more about Time-based Data Aggregation.
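The relationship between schedule and aggregation can be sketched as follows. This assumes, for illustration only, that each run evaluates every complete daily point since the previous run; the helper `daily_points_for_run` is hypothetical and not part of Sifflet.

```python
from datetime import date, timedelta


def daily_points_for_run(run_date: date, last_run_date: date) -> list[date]:
    """Return the daily points completed between the last run and this run.

    Assumption: the day of the current run is not yet complete and is excluded.
    """
    n_days = (run_date - last_run_date).days
    return [last_run_date + timedelta(days=i) for i in range(n_days)]


# A weekly schedule with daily aggregation: one run, seven daily points.
weekly_points = daily_points_for_run(date(2024, 1, 8), date(2024, 1, 1))
```

Under this assumption, a weekly run covers the seven days from 2024-01-01 through 2024-01-07, each checked as its own point.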
Optimize your run frequency
You should adapt your frequency according to the refresh frequency of your data. If your data is updated by batch at 2am every day, running your monitors every hour would be suboptimal.
Time Offset
By default, Sifflet runs monitors on today's date. In some cases, pipelines are configured so that they update with a delay, e.g. a sales table updated every morning with the data for T-2 days. To take this into account, Sifflet allows the user to change the reference date by using an OFFSET parameter.
Time Offset with other Time Parameters
Adding an Offset doesn't alter the duration of the time aggregation or the rolling aggregation window; it only shifts the window into the past by the given Offset.
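A minimal Python sketch of this shifting behavior, assuming day-level granularity and inclusive window bounds. The function `point_window` and its parameters are hypothetical names used for illustration, not Sifflet's API.

```python
from datetime import date, timedelta


def point_window(
    run_date: date, offset_days: int = 0, window_days: int = 1
) -> tuple[date, date]:
    """Return the inclusive (start, end) dates covered by the point being checked."""
    # The Offset moves the reference date into the past...
    reference = run_date - timedelta(days=offset_days)
    # ...while the window length stays the same.
    start = reference - timedelta(days=window_days - 1)
    return start, reference


run = date(2024, 1, 10)
no_offset = point_window(run, offset_days=0, window_days=7)
with_offset = point_window(run, offset_days=2, window_days=7)
```

Here `no_offset` spans 2024-01-04 to 2024-01-10 and `with_offset` spans 2024-01-02 to 2024-01-08: both cover seven days, and only the position of the window changes.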
Advanced parameters
Rolling Aggregation
Rolling Aggregation lets you set a custom time interval for every point checked by Sifflet. When Rolling Aggregation is OFF, the time interval matches the Time Aggregation. This means that, typically, a daily point represents one day's worth of data.
However, there are scenarios where you might want a daily point that represents something other than a day, such as a daily point representing the rolling sum over the last 7 days.
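The difference between a plain daily point and a rolling daily point can be sketched in Python. This is an illustrative example only: `rolling_sum` and the sample values are hypothetical, and days without data are assumed to contribute zero.

```python
from datetime import date, timedelta


def rolling_sum(
    daily_values: dict[date, float], day: date, window_days: int = 7
) -> float:
    """Sum the values of `day` and the preceding `window_days - 1` days."""
    return sum(
        daily_values.get(day - timedelta(days=i), 0.0) for i in range(window_days)
    )


# Hypothetical daily totals: 1.0 on Jan 1, 2.0 on Jan 2, ..., 10.0 on Jan 10.
daily_values = {date(2024, 1, d): float(d) for d in range(1, 11)}

plain_point = rolling_sum(daily_values, date(2024, 1, 10), window_days=1)
rolling_point = rolling_sum(daily_values, date(2024, 1, 10), window_days=7)
```

With a one-day window the point for 2024-01-10 is 10.0, the value of that day alone. With a 7-day window the same daily point is 49.0, the sum of 2024-01-04 through 2024-01-10.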
Rolling Aggregation on Static Monitors
You may notice that Rolling Aggregation is missing for static monitors. Static monitors will match other monitors in terms of time settings in the upcoming release. Today, however, static monitors do not differentiate between a data point and a run and do not have a Time Aggregation parameter, which means that each run of the monitor represents one data point.
They do, however, have a time window parameter, which acts in the same way as the Rolling Aggregation parameter and specifies the time window being checked.
Lookback Period
By default, Sifflet monitors run incrementally and only query new data.
However, there are scenarios where past data is susceptible to change and you want today's run of the monitor to also check previous days (or other time aggregation intervals).
This is where the Lookback Period comes in. It can be used to specify a time window for which previous data points should be rechecked for changes. Points that were previously within range but whose value now appears as an anomaly will be included in the run's result.
Anomalies that now have correct values are automatically qualified as fixed.
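The recheck logic described above can be sketched in Python. This is a simplified illustration under stated assumptions, not Sifflet's implementation: it uses daily points and a fixed expected range (`low`, `high`) in place of a real detection model, and the function `recheck` is hypothetical.

```python
from datetime import date, timedelta


def recheck(values, previous_anomalies, run_day, lookback_days, low, high):
    """Re-evaluate the run day plus `lookback_days` earlier days.

    Returns (new_anomalies, fixed), both as sorted lists of dates.
    """
    new_anomalies, fixed = [], []
    for i in range(lookback_days + 1):
        day = run_day - timedelta(days=i)
        if day not in values:
            continue  # no data for this day, nothing to recheck
        is_anomaly = not (low <= values[day] <= high)
        if is_anomaly and day not in previous_anomalies:
            new_anomalies.append(day)
        elif not is_anomaly and day in previous_anomalies:
            fixed.append(day)
    return sorted(new_anomalies), sorted(fixed)


# Hypothetical current values: Jan 8 was corrected, Jan 9 was restated upward.
values = {
    date(2024, 1, 8): 100.0,
    date(2024, 1, 9): 500.0,
    date(2024, 1, 10): 105.0,
}
previous_anomalies = {date(2024, 1, 8)}

new_anomalies, fixed = recheck(
    values, previous_anomalies, date(2024, 1, 10), lookback_days=2, low=90.0, high=110.0
)
```

With a lookback of two days, the run on 2024-01-10 reports 2024-01-09 as a new anomaly and marks 2024-01-08 as fixed. With a lookback of zero, only the run day itself would be evaluated and neither change would be picked up.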