Time Parameters

Overview

In data monitoring and analysis, Time Parameters are crucial for ensuring that the data being considered is relevant for the specific goals of the analysis. Adjusting them can help focus on the most pertinent data, whether the goal is looking for real-time insights or longer-term trends.


Modes

Monitors have two modes with respect to Time Parameters:

  • Full data scan: This is the mode that applies when Time Window is Off. Each run of the monitor scans the entire table.
  • Incremental scan: This is the mode that applies when Time Window is On and a Time field is selected. Each run of the monitor scans an incremental portion of the data, e.g. running a monitor each day on the last day's worth of data.
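
The difference between the two modes can be sketched as the date range that a single run reads. This is only an illustration, not Sifflet's implementation: `scan_range` and its arguments are hypothetical names, and a daily step is assumed.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def scan_range(run_date: date, time_window_on: bool,
               step: timedelta = timedelta(days=1)) -> Optional[Tuple[date, date]]:
    """Return the [start, end) range a run would read, or None for a full scan.

    Hypothetical helper used only to illustrate the two modes.
    """
    if not time_window_on:
        # Full data scan: no time filter, so the entire table is read.
        return None
    # Incremental scan: only the most recent step (here, the last day) is read.
    return (run_date - step, run_date)


full = scan_range(date(2024, 3, 10), time_window_on=False)
incremental = scan_range(date(2024, 3, 10), time_window_on=True)
print(full)
print(incremental)
```

In the incremental case the range covers one day ending at the run date, whereas the full scan applies no time filter at all.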

Time Parameters in Incremental Mode

There are two main Time Parameters to consider while configuring Sifflet Monitors in incremental mode:

  • Time Aggregation - The interval between the points that are checked for anomalies, e.g. daily, hourly, weekly.
  • Time Offset - The delay between the data in the table and the present.

📘

Dataset- vs Monitor-Level Parameters

It's important to know that Time Offset parameters can also be set at the Dataset level. If defined at the Dataset level, their values are used as defaults for all newly created monitors that accept those parameters. Consult this table for details.

Time Aggregation

The Time Aggregation describes intervals between points that are checked for anomalies.

This is the first parameter you should consider when establishing a monitor based on time. When selecting daily time aggregation, one datapoint per day will be checked. Time aggregation also affects the maximum training window for ML monitors.

Read more about Time-based Data Aggregation.

📘

Schedule vs Time Aggregation

Monitor Run Schedule doesn't need to be equal to its Time Aggregation parameter. For example, you may choose to run a Monitor once a week (Schedule: @weekly), with a daily datapoint creation frequency (Time-based Data Aggregation: daily).

Read more about Time-based Data Aggregation.
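
To make the distinction between schedule and aggregation concrete, the sketch below lists the daily data points that a single weekly run would have to cover. `points_for_run` is a hypothetical helper, and the assumption that a run covers every day since the previous run is ours, not a statement about Sifflet's internals.

```python
from datetime import date, timedelta
from typing import List


def points_for_run(run_date: date, days_since_last_run: int) -> List[date]:
    """Daily data points covered by one run, oldest first (hypothetical helper)."""
    return [run_date - timedelta(days=n)
            for n in range(days_since_last_run, 0, -1)]


# Schedule: weekly. Time aggregation: daily.
weekly_run = points_for_run(date(2024, 3, 11), days_since_last_run=7)
for point in weekly_run:
    print(point.isoformat())
```

One weekly run therefore produces seven daily points rather than a single weekly one.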

📘

Optimize your run frequency

You should adapt your frequency according to the refresh frequency of your data. If your data is updated by batch at 2am every day, running your monitors every hour would be suboptimal.

Time Offset

By default, Sifflet runs monitors on today's date. In some cases, pipelines are configured so that they update with a delay, e.g. a sales table updated every morning with the data for T-2 days. To take this into account, Sifflet allows the user to change the reference date by using an OFFSET parameter.

The offset typically represents the data delay. If the table contains only orders from 2 days ago, setting an offset of 2 days will ensure that the monitor doesn't alert on empty or partial days.
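
As a rough illustration of how the offset moves the reference date, consider the sketch below. `reference_date` is a hypothetical name and the offset is assumed to be expressed in whole days; it is not Sifflet's actual code.

```python
from datetime import date, timedelta


def reference_date(run_date: date, offset_days: int = 0) -> date:
    """Date that a run evaluates, shifted into the past by the offset.

    Hypothetical helper; assumes a day-granularity offset.
    """
    return run_date - timedelta(days=offset_days)


no_offset = reference_date(date(2024, 3, 10))
two_days = reference_date(date(2024, 3, 10), offset_days=2)
print(no_offset.isoformat(), two_days.isoformat())
```

With the T-2 sales table from the example above, a run on 10 March evaluates 8 March, the most recent day the table is expected to contain.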

🚧

Time Offset with other Time Parameters

Adding an Offset doesn't alter the duration of the time aggregation or the rolling aggregation window; it only shifts the window into the past by the given Offset.

Time offset of 1 day


Advanced parameters

Rolling Aggregation

Rolling Aggregation lets you set a custom time interval for every point checked by Sifflet. When Rolling Aggregation is OFF, the time interval matches the Time Aggregation. This means that, typically, a daily point represents one day's worth of data.

However, there are scenarios where you might want a daily point that represents something other than a day, such as a daily point representing the rolling sum over the last 7 days.
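
The following sketch shows what a daily point represents with and without a 7-day rolling window. It is a simplified illustration: `rolling_point` is a hypothetical function, the aggregation is assumed to be a sum, and the window is assumed to end on the point's own day (inclusive), which may differ from Sifflet's exact boundaries.

```python
from datetime import date, timedelta
from typing import Dict


def rolling_point(daily: Dict[date, float], day: date, window_days: int = 1) -> float:
    """Sum the daily values over the `window_days` days ending on `day`.

    Hypothetical helper; days missing from `daily` count as 0.
    """
    return sum(daily.get(day - timedelta(days=n), 0.0)
               for n in range(window_days))


# Ten days of data, 10 orders per day, from 1 March to 10 March.
daily_orders = {date(2024, 3, 1) + timedelta(days=i): 10.0 for i in range(10)}

plain = rolling_point(daily_orders, date(2024, 3, 10))                    # Rolling Aggregation OFF
rolling = rolling_point(daily_orders, date(2024, 3, 10), window_days=7)   # 7-day rolling sum
print(plain, rolling)
```

The point is still produced once per day in both cases; only the amount of data behind each point changes.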

📘

Rolling Aggregation on Static Monitors

You may notice that Rolling Aggregation is missing for static monitors. Static monitors will match other monitors in terms of time settings in the upcoming release. Today, however, they do not differentiate between a data point and a run and do not have a time aggregation parameter, which means that each run of the monitor represents one data point.

They do, however, have a time window parameter, which acts in the same way as the rolling aggregation parameter and specifies the time window being checked.

Lookback Period

By default, Sifflet monitors run incrementally and only query new data.

However, there are scenarios where past data is susceptible to change and you want today's run of the monitor to also check previous days (or other time aggregation intervals).

This is where the Lookback Period comes in. It specifies a time window within which earlier data points are rechecked for changes. Points that were previously within range but whose value now appears as an anomaly are included in the run's result.


👍

Anomalies in the lookback window that have correct values are automatically qualified as fixed.
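
The effect of a lookback period on a single run can be sketched as the list of daily points that the run evaluates. `points_to_check` is a hypothetical helper, and counting the newest point as part of the lookback window is an assumption made for this illustration.

```python
from datetime import date, timedelta
from typing import List


def points_to_check(reference: date, lookback_days: int = 1) -> List[date]:
    """Daily points evaluated by one run, oldest first.

    Hypothetical helper; a lookback of 1 means only the newest point is checked.
    """
    return [reference - timedelta(days=n)
            for n in range(lookback_days - 1, -1, -1)]


default_run = points_to_check(date(2024, 3, 10))
lookback_run = points_to_check(date(2024, 3, 10), lookback_days=7)
print(len(default_run), len(lookback_run))
```

Without a lookback, the run only evaluates the newest point; with a 7-day lookback, the six preceding daily points are re-evaluated as well.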


Time Parameters as Code


Snapshot Mode

parameters:
  kind: <MonitorKind>
  threshold:
    sensitivity: Low
  # no timeWindow

Not setting a time window parameter will make the monitor scan the entire table each run.

Incremental Mode

timeWindow: object that represents the time window configuration. When absent, the monitor acts in Snapshot mode; when present, the monitor acts in Incremental mode.

timeWindow.field: the time field that will be used to incrementally query data.

timeWindow.frequency: the time aggregation configuration, in ISO 8601 duration format. P1D: daily. P1W: weekly. P1M: monthly. PT1H: hourly. PT30M: every 30 minutes. PT20M: every 20 minutes. PT15M: every 15 minutes.

timeWindow.firstRun: the amount of data to query in the first run, in ISO 8601 duration format.

timeWindow.offset: the offset that represents the delay between the data and the present, in ISO 8601 duration format. Typically it should match the granularity of the aggregation: days for daily, hours for hourly, etc.

parameters:
  kind: <MonitorKind>
  timeWindow:
    field: auto
    firstRun: P365D
    offset: P1D
    frequency: P1D

This incremental configuration uses an automatically selected time field, with one point per day and an offset of 1 day so that only completed days are tracked. The first run queries 365 days to build the graph and the prediction model.

timeWindow.rollingTimeWindow: an ISO 8601 duration giving the rolling time window that each point represents. P7D makes each point a rolling aggregation over the last 7 days.

parameters:
  kind: <MonitorKind>
  timeWindow:
    field: order_date
    firstRun: P365D
    offset: P1D
    frequency: P1D
    rollingTimeWindow: P7D

This incremental configuration uses the order_date time field, with one point per day, where each point represents an aggregation over the previous 7 days from that point's date. The offset of 1 day means that points start only from completed days. The first run queries 365 days to build the graph and the prediction model.

timeWindow.deltaQuerying: an ISO 8601 duration giving the lookback period of the monitor. P7D checks the last 7 daily data points on each run, rather than only a single data point.

parameters:
  kind: <MonitorKind>
  timeWindow:
    field: order_date
    firstRun: P365D
    offset: P1D
    frequency: P1D
    deltaQuerying: P7D