Time Parameters
Overview
In data monitoring and analysis, Time Parameters ensure that the data being considered is relevant to the goals of the analysis. Adjusting them helps focus on the most pertinent data, whether the goal is real-time insight or longer-term trends.
How to
Parameters
There are 3 main Time Parameters to consider while configuring Sifflet Monitors:
- Run Frequency - how often a Monitor is being run
- Time Window - the data time-range to be analysed each time a Monitor is run
- Time Offset - a shift into the past of the Time Window
Dataset- vs Monitor-Level Parameters
Time Window and Time Offset can also be set at the Dataset level. When defined at the Dataset level, their values are used as defaults for all newly created Monitors that accept those parameters. See the Inheritance overview table for details.
Run Frequency
The Run Frequency describes the intervals at which a Monitor is run. It can be defined either with a shortcut keyword (@daily, @hourly) or with a cron expression. See Monitor Scheduling for more information.
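As a rough illustration of how the two notations relate, the sketch below maps a few shortcut keywords to their conventional cron equivalents. The mapping follows the standard cron nicknames and is an assumption here; the helper name `to_cron` is hypothetical and not part of Sifflet. Check Monitor Scheduling for the expressions Sifflet actually accepts.

```python
# Conventional cron equivalents of common scheduling shortcuts.
# Assumption: these follow the standard cron nicknames, not a Sifflet-specific list.
SHORTCUT_TO_CRON = {
    "@hourly": "0 * * * *",  # minute 0 of every hour
    "@daily": "0 0 * * *",   # every day at midnight
    "@weekly": "0 0 * * 0",  # every Sunday at midnight
}


def to_cron(schedule: str) -> str:
    """Return the cron expression for a known shortcut, or the input unchanged."""
    return SHORTCUT_TO_CRON.get(schedule, schedule)


print(to_cron("@daily"))
print(to_cron("0 3 * * *"))  # an explicit cron expression passes through as-is
```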
Run Frequency vs Datapoints creation frequency
A Monitor's Run Schedule doesn't need to match its Data Aggregation parameter (the frequency at which datapoints are created). For example, you may choose to run a Monitor once a week (Schedule: @weekly) with a daily datapoint creation frequency (Time-based Data Aggregation: daily).
Read more about Time-based Data Aggregation.
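To make the weekly-run, daily-datapoint example concrete, here is a minimal sketch that lists the daily datapoints a single weekly run could cover. It assumes the run covers the seven full days before the run date; that boundary choice and the function name `daily_datapoints` are illustrative assumptions, not documented Sifflet behaviour.

```python
from datetime import date, timedelta


def daily_datapoints(run_date: date, days_covered: int) -> list[date]:
    """Dates of the daily datapoints covered by one run, oldest first.

    Assumption: the run covers the `days_covered` full days before `run_date`.
    """
    return [run_date - timedelta(days=n) for n in range(days_covered, 0, -1)]


# One weekly run on Monday 2024-01-08 with daily aggregation.
points = daily_datapoints(date(2024, 1, 8), days_covered=7)
print(len(points), points[0], points[-1])
```

Under these assumptions, one run produces seven datapoints, one per day, rather than a single weekly value.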
Optimize your run frequency
Adapt the run frequency to the refresh frequency of your data. If your data is updated in a batch at 2am every day, running your monitors every hour would be suboptimal.
Time Window
The Time Window is the time interval analyzed every time the Monitor is run. Sifflet scans only data whose time field value falls within the Time Window range. Unless your tables are frequently reloaded in full, setting a Time Window is recommended in order to optimize your resources.
Setup
Let's say you want to monitor a table that is updated daily. Setting a Time Window on that table will require:
- A specific data field that represents the time dimension (e.g. the creation time of a data entry)
- A historical Time Window range that you want to consider for the monitor's scan (e.g. a week, a month)
Some templates require a Time Window, whereas it's optional for others. You will be asked to select the field representing the time dimension, the unit of time, and the number of those units.
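The effect of a Time Window can be sketched as a filter on the time-dimension field. The example below is only an illustration: the record layout, the field name `created_at`, the function `rows_in_time_window` and the half-open interval boundaries are all assumptions, not Sifflet's actual implementation.

```python
from datetime import datetime, timedelta


def rows_in_time_window(rows, time_field, run_time, window):
    """Keep only rows whose time field falls in [run_time - window, run_time)."""
    start = run_time - window
    return [row for row in rows if start <= row[time_field] < run_time]


# Hypothetical table content with a creation timestamp per entry.
rows = [
    {"id": 1, "created_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "created_at": datetime(2024, 1, 9, 9, 0)},
    {"id": 3, "created_at": datetime(2024, 1, 10, 9, 0)},
]

# A one-week Time Window evaluated for a run on 2024-01-11.
recent = rows_in_time_window(rows, "created_at", datetime(2024, 1, 11), timedelta(weeks=1))
print([row["id"] for row in recent])
```

Only the entries created within the last week are scanned; the older entry is skipped, which is where the resource saving comes from.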
Optimize the Time Window setup
It's recommended to define the Time Window based on the ingestion frequency of the data being monitored. For example, if a table is updated daily, a Run Schedule of 1 day in combination with a Time Window of 1 day would be recommended.
Time Window (Model Training Period) in ML Monitor Templates
ML Monitor Templates are a special case: the Time Window parameter is used as the Model Training Period. The time range you define determines how much history the ML model is trained on.
Time Offset
By default, Sifflet runs monitors using today's date as the reference date. In some cases, pipelines are configured so that they update with a delay: e.g. a sales table updated every morning with the data for T-2 days. To take this into account, Sifflet allows the user to change the reference date by using a Time Offset parameter.
Time Window vs Time Offset
Adding an Offset to a Time Window doesn't alter the duration of the Time Window; it only shifts the window into the past by the given Offset.
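The relationship between the two parameters can be sketched with simple date arithmetic. This is a minimal illustration, assuming both parameters are expressed in whole days; the function name `window_bounds` and its arguments are hypothetical.

```python
from datetime import date, timedelta


def window_bounds(reference: date, window_days: int, offset_days: int = 0) -> tuple[date, date]:
    """Return (start, end) of the Time Window after applying the Offset.

    The Offset moves the end of the window into the past; the window's
    duration is unchanged.
    """
    end = reference - timedelta(days=offset_days)
    start = end - timedelta(days=window_days)
    return start, end


today = date(2024, 1, 10)
no_offset = window_bounds(today, window_days=7)
with_offset = window_bounds(today, window_days=7, offset_days=2)  # e.g. data arriving at T-2
print(no_offset, with_offset)
```

Both windows span seven days; the second simply ends two days earlier, matching the T-2 sales table example.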
Dataset vs Monitor Level
Time Window and Time Offset can also be configured at the Dataset level. When they are, Monitors newly created on that asset inherit these parameters by default. Monitor-level values, however, override the inherited settings.
Inheritance only works for compatible Monitors
The Time Window parameter is inherited by Static Monitor Templates, while the Time Offset parameter is inherited by ML Monitor Templates. Details in the table below.
Updating the Dataset-level setting will not affect existing Monitors
Adjusting the Dataset-level setting only changes the defaults for newly created Monitors; it doesn't impact existing ones. To update existing Monitors, adjust them individually.
To configure Dataset-level parameters, navigate to an Asset Page in the Data Catalog. If Sifflet detects at least one temporal field (e.g. a date), you can use it to configure a Time Window and Time Offset.
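The inheritance rules described above can be summarised in a short sketch: at creation time, a Monitor takes its own value if one is set, otherwise the Dataset-level default, and only for parameters its template inherits. The class `TimeSettings`, the function `settings_at_creation` and the day-based units are hypothetical names chosen for illustration, not Sifflet's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimeSettings:
    window_days: Optional[int] = None
    offset_days: Optional[int] = None


def settings_at_creation(
    dataset: TimeSettings,
    overrides: TimeSettings,
    inherits_window: bool,
    inherits_offset: bool,
) -> TimeSettings:
    """Resolve a new Monitor's settings: Monitor-level values win over Dataset defaults."""
    window = overrides.window_days
    if window is None and inherits_window:
        window = dataset.window_days
    offset = overrides.offset_days
    if offset is None and inherits_offset:
        offset = dataset.offset_days
    return TimeSettings(window, offset)


dataset_defaults = TimeSettings(window_days=7, offset_days=2)

# A static-threshold template inherits the Time Window only.
static_monitor = settings_at_creation(dataset_defaults, TimeSettings(), True, False)
# The same template with a Monitor-level override of the Time Window.
overridden = settings_at_creation(dataset_defaults, TimeSettings(window_days=30), True, False)
# A dynamic-threshold (ML) template inherits the Time Offset only.
ml_monitor = settings_at_creation(dataset_defaults, TimeSettings(), False, True)
print(static_monitor, overridden, ml_monitor)
```

Because the result is computed once when the Monitor is created, later changes to `dataset_defaults` leave the already resolved settings untouched, which mirrors the rule that existing Monitors are not affected.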
Inheritance overview
Monitor Type | Monitor Template | Inherits Time Window? | Inherits Time Offset? |
---|---|---|---|
Metadata | Completeness | no | yes |
Metadata | Duplicates | no | yes |
Metadata | Freshness | no | yes |
Metadata | Schema Change | no | no |
Metrics | Standard Deviation (static thresholds) | yes | no |
Metrics | Unique Values Count (static thresholds) | yes | no |
Metrics | Average (static thresholds) | yes | no |
Metrics | Sum (static thresholds) | yes | no |
Metrics | Values (static thresholds) | yes | no |
Metrics | Quantile (static thresholds) | yes | no |
Metrics | Variance (static thresholds) | yes | no |
Smart Metrics | Metrics (dynamic thresholds) | no | yes |
Smart Metrics | Metrics Custom (dynamic thresholds) | no | yes |
Smart Metrics | Interlinked Metrics | no | no |
Field profiling | Distribution Change | no | no |
Field profiling | Duplicates in % (static thresholds) | yes | no |
Field profiling | Duplicates in % (dynamic thresholds) | no | yes |
Field profiling | Duplicates in # (static thresholds) | yes | no |
Field profiling | Duplicates in # (dynamic thresholds) | no | yes |
Field profiling | Low Cardinality | yes | no |
Field profiling | Not after date | yes | no |
Field profiling | Not before date | yes | no |
Field profiling | Not in the list | yes | no |
Field profiling | Null in # (static thresholds) | yes | no |
Field profiling | Null in # (dynamic thresholds) | no | yes |
Field profiling | Null in % (static thresholds) | yes | no |
Field profiling | Null in % (dynamic thresholds) | no | yes |
Field profiling | Unique | yes | no |
Format validation | Is an email | yes | no |
Format validation | Is a phone number | yes | no |
Format validation | Is UUID | yes | no |
Format validation | Matches regex | yes | no |
Custom | SQL | no | no |
Custom | Conditional monitors | no | no |