Time Parameters

Overview

In data monitoring and analysis, Time Parameters determine which data is considered and how often, so that each analysis stays relevant to its goal. Adjusting them helps focus on the most pertinent data, whether you are looking for real-time insights or longer-term trends.

How to

Parameters

There are three main Time Parameters to consider when configuring Sifflet Monitors:

  • Run Frequency - how often a Monitor runs
  • Time Window - the time range of data analyzed each time a Monitor runs
  • Time Offset - a shift of the Time Window into the past

📘

Dataset- vs Monitor-Level Parameters

Time Window and/or Time Offset parameters can also be set at the Dataset level. If defined at the Dataset level, their values are used as defaults for all newly created Monitors that accept those parameters. Consult this table for details.

Run Frequency

Run Frequency is the interval at which a Monitor runs. It can be defined either with a shortcut (@daily, @hourly) or with a cron expression. See Monitor Scheduling for more information.
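As an illustration of the two notations, the sketch below maps shortcuts to their conventional five-field cron equivalents. The mapping follows the usual cron convention and is an assumption here, not taken from Sifflet; `SHORTCUTS` and `to_cron` are hypothetical names used only for this example.

```python
# Conventional cron equivalents of common shortcuts.
# Assumption: these follow the usual cron convention, not a documented Sifflet mapping.
SHORTCUTS = {
    "@hourly": "0 * * * *",   # minute 0 of every hour
    "@daily": "0 0 * * *",    # every day at 00:00
    "@weekly": "0 0 * * 0",   # every Sunday at 00:00
}


def to_cron(schedule: str) -> str:
    """Return a five-field cron expression for a shortcut, or pass a cron expression through."""
    schedule = schedule.strip()
    if schedule.startswith("@"):
        if schedule not in SHORTCUTS:
            raise ValueError(f"unknown shortcut: {schedule}")
        return SHORTCUTS[schedule]
    if len(schedule.split()) != 5:
        raise ValueError(f"expected five cron fields, got: {schedule!r}")
    return schedule


daily = to_cron("@daily")
after_batch = to_cron("0 3 * * *")  # e.g. once a day at 03:00
```

A cron expression is useful when a shortcut is too coarse, for example to run a Monitor once a day at a specific hour.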

📘

Run Frequency vs Datapoints creation frequency

A Monitor's Run Schedule doesn't need to match its Data Aggregation parameter (the frequency at which datapoints are created). For example, you may choose to run a Monitor once a week (Schedule: @weekly) with a daily datapoint creation frequency (Time-based Data Aggregation: daily).

Read more about Time-based Data Aggregation.
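To make the distinction concrete, here is a minimal sketch of a weekly run combined with daily aggregation. It assumes each run covers the seven days before the run date and excludes the run date itself; both choices, and the name `daily_datapoints`, are illustrative assumptions rather than documented Sifflet behavior.

```python
from datetime import date, timedelta


def daily_datapoints(run_date: date, days_covered: int) -> list[date]:
    """Dates of the daily datapoints covered by one run, oldest first.

    Assumption: the run covers the `days_covered` days before `run_date`,
    not including `run_date` itself.
    """
    return [run_date - timedelta(days=n) for n in range(days_covered, 0, -1)]


# One weekly run on 2024-01-08 with daily aggregation
points = daily_datapoints(date(2024, 1, 8), 7)
```

Under these assumptions a single weekly run evaluates seven daily datapoints, rather than one datapoint per run.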

📘

Optimize your run frequency

Adapt the Run Frequency to the refresh frequency of your data. If your data is updated by a batch at 2am every day, running your Monitors every hour would be suboptimal, because most runs would rescan data that has not changed since the previous run.

Time Window

Time Window is the time interval that is analyzed every time the Monitor runs. Sifflet scans only the data whose time field value falls within the Time Window. Unless your tables are frequently reloaded in full, setting a Time Window is recommended in order to optimize your resources.
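The filtering described above can be sketched as follows. This is a simplified illustration, not Sifflet's implementation: the half-open interval (start included, end excluded) is an assumption, and `time_window` and `rows_in_window` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def time_window(run_time: datetime, amount: int, unit: str) -> tuple[datetime, datetime]:
    """Return (start, end) of a window of `amount` `unit`s ending at `run_time`.

    `unit` must be a timedelta keyword such as "hours", "days" or "weeks";
    calendar months are not handled in this sketch.
    """
    delta = timedelta(**{unit: amount})
    return run_time - delta, run_time


def rows_in_window(rows: list[dict], field: str, start: datetime, end: datetime) -> list[dict]:
    """Keep only rows whose time field falls in [start, end) (boundary choice is an assumption)."""
    return [row for row in rows if start <= row[field] < end]


rows = [
    {"id": 1, "Date": datetime(2023, 12, 15)},
    {"id": 2, "Date": datetime(2024, 1, 20)},
    {"id": 3, "Date": datetime(2024, 2, 5)},
]
start, end = time_window(datetime(2024, 1, 31), 30, "days")
scanned = rows_in_window(rows, "Date", start, end)
```

Only the row dated 2024-01-20 falls inside the 30-day window, so it is the only one scanned; older and newer rows are skipped, which is where the resource savings come from.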

Setup

Let's say you want to monitor a table that is updated daily. Setting a Time Window on that table will require:

  • A specific data field that represents the time dimension (e.g. the creation time of a data entry)
  • A historical Time Window range that you want to consider for the monitor's scan (e.g. a week, a month)

Some templates require a Time Window whereas it's optional for others. You will be asked to select the field representing the time dimension, the unit of time and the numerical amount of that unit of time.

Time window on the last 30 days, on field "Date"

In this diagram, the Run Frequency and Time Window settings are optimal

📘

Optimize the Time Window setup

It's recommended to define the Time Window based on the ingestion frequency of the data being monitored. For example, if a table is updated daily, a Run Schedule of 1 day in combination with a Time Window of 1 day would be recommended.

Time Window (Model Training Period) in ML Monitor Templates

ML Monitor Templates are a special case: they use the Time Window parameter as the Model Training Period. The time range you define determines the history on which the ML model is trained.

Time Offset

By default, Sifflet runs Monitors using today's date as the reference date. In some cases, pipelines are configured so that they update with a delay: e.g. a sales table updated every morning with the data for T-2 days. To take this into account, Sifflet allows the user to change the reference date by using an Offset parameter.

🚧

Time Window vs Time Offset

Adding an Offset to a Time Window doesn't alter the duration of the Time Window; it only shifts the window into the past by the given Offset.

Time offset of 1 day

In this diagram, the Offset is equal to one Time Window
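The relationship between the two parameters can be sketched as below: the Offset moves the reference date back, and the Time Window is then measured from that shifted reference. This is a minimal sketch under assumed day-level granularity; `shifted_window` is a hypothetical name, and the exact boundary handling in Sifflet may differ.

```python
from datetime import date, timedelta


def shifted_window(reference: date, window_days: int, offset_days: int = 0) -> tuple[date, date]:
    """Return (start, end) of a Time Window shifted into the past by an Offset."""
    end = reference - timedelta(days=offset_days)   # Offset moves the reference date back
    start = end - timedelta(days=window_days)       # Window duration is unchanged
    return start, end


today = date(2024, 1, 10)
no_offset = shifted_window(today, window_days=1)
with_offset = shifted_window(today, window_days=1, offset_days=1)
```

In both cases the window spans exactly one day; with an Offset of one day, both bounds simply move one day earlier.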

Dataset vs Monitor Level

Time Window and Time Offset can also be configured at the Dataset level. When they are set, Monitors newly created on that asset inherit these parameters by default. Monitor-level values can, however, override the inherited settings.
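The default-and-override behavior can be pictured with the following sketch, which resolves the settings of a newly created Monitor. It is an illustration only: `TimeSettings`, `resolve` and the two `inherits_*` flags are hypothetical names, with the flags standing in for whether a given Monitor Template accepts the inherited parameter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeSettings:
    window_days: Optional[int] = None   # None means "not set"
    offset_days: Optional[int] = None


def resolve(dataset: TimeSettings, monitor: TimeSettings,
            inherits_window: bool, inherits_offset: bool) -> TimeSettings:
    """Settings for a newly created Monitor: Monitor-level values win,
    otherwise Dataset-level defaults apply if the template inherits them."""
    window = monitor.window_days
    if window is None and inherits_window:
        window = dataset.window_days
    offset = monitor.offset_days
    if offset is None and inherits_offset:
        offset = dataset.offset_days
    return TimeSettings(window, offset)


dataset_defaults = TimeSettings(window_days=30, offset_days=1)

# A template that inherits only the Time Window, with no Monitor-level value
inherited = resolve(dataset_defaults, TimeSettings(), inherits_window=True, inherits_offset=False)

# The same template with a Monitor-level Time Window overriding the default
overridden = resolve(dataset_defaults, TimeSettings(window_days=7), inherits_window=True, inherits_offset=False)
```

Because the resolution happens when the Monitor is created, later changes to the Dataset-level defaults would not alter Monitors that already exist.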

🚧

Inheritance only works for compatible Monitors

The Time Window parameter is inherited by Static Monitor Templates, while the Time Offset parameter is inherited by ML Monitor Templates. See the table below for details.

📘

Updating the Dataset-level setting will not affect existing Monitors

Adjusting the Dataset-level setting only changes the default settings of newly created Monitors; it doesn't impact existing ones. To update existing Monitors, adjust them individually.

To configure Dataset-level parameters, navigate to an Asset Page in the Data Catalog. If Sifflet detects at least one temporal field (e.g. a date), you can use it to configure a Time Window and Time Offset.

Inheritance overview

| Monitor Type | Monitor Template | Inherits Time Window? | Inherits Time Offset? |
|---|---|---|---|
| Metadata | Completeness | no | yes |
| Metadata | Duplicates | no | yes |
| Metadata | Freshness | no | yes |
| Metadata | Schema Change | no | no |
| Metrics | Standard Deviation (static thresholds) | yes | no |
| Metrics | Unique Values Count (static thresholds) | yes | no |
| Metrics | Average (static thresholds) | yes | no |
| Metrics | Sum (static thresholds) | yes | no |
| Metrics | Values (static thresholds) | yes | no |
| Metrics | Quantile (static thresholds) | yes | no |
| Metrics | Variance (static thresholds) | yes | no |
| Smart Metrics | Metrics (dynamic thresholds) | no | yes |
| Smart Metrics | Metrics Custom (dynamic thresholds) | no | yes |
| Smart Metrics | Interlinked Metrics | no | no |
| Field profiling | Distribution Change | no | no |
| Field profiling | Duplicates in % (static thresholds) | yes | no |
| Field profiling | Duplicates in % (dynamic thresholds) | no | yes |
| Field profiling | Duplicates in # (static thresholds) | yes | no |
| Field profiling | Duplicates in # (dynamic thresholds) | no | yes |
| Field profiling | Low Cardinality | yes | no |
| Field profiling | Not after date | yes | no |
| Field profiling | Not before date | yes | no |
| Field profiling | Not in the list | yes | no |
| Field profiling | Null in # (static thresholds) | yes | no |
| Field profiling | Null in # (dynamic thresholds) | no | yes |
| Field profiling | Null in % (static thresholds) | yes | no |
| Field profiling | Null in % (dynamic thresholds) | no | yes |
| Field profiling | Unique | yes | no |
| Format validation | Is an email | yes | no |
| Format validation | Is a phone number | yes | no |
| Format validation | Is UUID | yes | no |
| Format validation | Matches regex | yes | no |
| Custom | SQL | no | no |
| Custom | Conditional monitors | no | no |