Freshness

Overview

The Freshness monitor tracks whether data is updated regularly. Given a monitoring frequency, it verifies that data has been ingested within each expected time interval.

Metadata Monitoring

Metadata is information about data, including its structure and the transformations applied to it. Metadata monitoring helps identify and address issues related to data integration and data transformations.

As data volumes and complexities continue to grow, metadata monitoring is becoming increasingly crucial for maintaining a reliable and trustworthy data ecosystem.

How to

Modes

The Sifflet Freshness Monitor Template offers two modes of operation.

Static Mode

In the Static Mode, the system expects data to be ingested within each time slot. If data is not found within a particular time interval, an incident is created to notify the user.
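The static check can be illustrated with a minimal sketch. This is not Sifflet's implementation: the function name `missing_slots`, its parameters, and the sample timestamps are hypothetical, and the slot width is assumed to be one day.

```python
from datetime import datetime, timedelta

def missing_slots(ingested_at, start, end, slot=timedelta(days=1)):
    """Return the start of every slot in [start, end) with no ingestion timestamp."""
    missing = []
    slot_start = start
    while slot_start < end:
        slot_end = slot_start + slot
        # A slot is satisfied as soon as one ingestion falls inside it.
        if not any(slot_start <= ts < slot_end for ts in ingested_at):
            missing.append(slot_start)
        slot_start = slot_end
    return missing

# Hypothetical ingestion times: nothing arrives on 2024-01-03.
ingested_at = [
    datetime(2024, 1, 1, 6),
    datetime(2024, 1, 2, 6),
    datetime(2024, 1, 4, 6),
]
gaps = missing_slots(ingested_at, datetime(2024, 1, 1), datetime(2024, 1, 5))
print(gaps)
```

In this sketch each returned slot corresponds to one incident that a static monitor would raise.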

Dynamic Mode

In the Dynamic Mode, the system automatically detects the data ingestion pattern based on past behaviour. An anomaly is detected only if data is expected to have arrived in the last time slot but has not. In this mode, a machine learning algorithm is trained to detect and predict ingestion patterns, which improves the accuracy of the alerts.

Example

Data is expected to arrive twice a week, every Monday and Thursday. A correctly configured Freshness Monitor with the parameters Mode: Dynamic, Run Schedule: weekly, and Time-based Data Aggregation: daily raises an alert only if data arrives on a day other than Monday or Thursday, or if data fails to arrive on one of those two days.
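The Monday and Thursday example can be mimicked with a simple frequency heuristic. This is only a stand-in for the machine learning model mentioned above, whose details are not documented here; the names `learn_expected_weekdays` and `is_anomaly`, and the `min_share` threshold, are assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta

def learn_expected_weekdays(history, weeks_observed, min_share=0.8):
    """Weekdays (0 = Monday) on which data arrived in at least min_share of the weeks."""
    counts = Counter(d.weekday() for d in set(history))
    return {wd for wd, n in counts.items() if n / weeks_observed >= min_share}

def is_anomaly(day, arrived, expected_weekdays):
    """Anomalous when actual arrival disagrees with the learned expectation."""
    return arrived != (day.weekday() in expected_weekdays)

# Four weeks of arrivals on Mondays and Thursdays, starting Monday 2024-01-01.
first_monday = date(2024, 1, 1)
history = [
    first_monday + timedelta(weeks=w, days=offset)
    for w in range(4)
    for offset in (0, 3)
]
expected = learn_expected_weekdays(history, weeks_observed=4)

print(is_anomaly(date(2024, 1, 29), arrived=False, expected_weekdays=expected))  # missed Monday
print(is_anomaly(date(2024, 1, 30), arrived=True, expected_weekdays=expected))   # surprise Tuesday
```

Both calls flag an anomaly: the first because an expected Monday delivery is missing, the second because data arrived on a day with no history of deliveries.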

Graph

The Freshness Monitoring Graph gives a visual representation of when data arrived, so users can quickly assess how fresh the ingested data is and track updates over time. The graph uses binary values, 0 and 1, to represent the data status as follows:

  • 0 - no updates in the table since the previous time slot
  • 1 - new rows ingested during the last time slot, indicating recent data updates

Freshness Monitor, static mode

Daily freshness monitoring

Freshness Monitor, dynamic mode

Time-based Data Aggregation

The frequency at which datapoints are created is defined by the Time-based Data Aggregation parameter, independently of the Monitor Run Schedule.
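To see how the aggregation setting shapes the 0/1 datapoints shown on the graph, consider the sketch below. It is an illustration under assumptions, not the product's code: `freshness_series` and its parameters are hypothetical names, and the observation window is assumed to be a whole number of aggregation buckets.

```python
from datetime import datetime, timedelta

def freshness_series(ingested_at, start, end, aggregation):
    """Return one 0/1 datapoint per aggregation bucket in [start, end)."""
    n_buckets = (end - start) // aggregation
    # Index of the bucket each in-window ingestion timestamp falls into.
    filled = {
        (ts - start) // aggregation
        for ts in ingested_at
        if start <= ts < end
    }
    return [1 if i in filled else 0 for i in range(n_buckets)]

# Hypothetical ingestion times over a three-day window.
ingested_at = [
    datetime(2024, 1, 1, 6),
    datetime(2024, 1, 1, 18),
    datetime(2024, 1, 3, 6),
]
start, end = datetime(2024, 1, 1), datetime(2024, 1, 4)

daily = freshness_series(ingested_at, start, end, timedelta(days=1))
half_daily = freshness_series(ingested_at, start, end, timedelta(hours=12))
print(daily)       # [1, 0, 1]
print(half_daily)  # [1, 1, 0, 0, 1, 0]
```

The same ingestion history yields three datapoints with daily aggregation and six with 12-hour aggregation, regardless of how often the monitor itself is scheduled to run.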