Volume (static)

Overview

The Sifflet Completeness Monitor is a table-level metadata monitor. It detects changes in the volume of data newly ingested into the system. A significant change in volume may indicate data duplication, data loss, or data corruption. Catching these issues early helps keep the data accurate and the analysis built on it reliable.

📘

Metadata Monitoring

Metadata is information about data, including its structure and the transformations applied to it. Metadata monitoring helps identify and address issues related to data integration and data transformations.

As data volumes and complexities continue to grow, metadata monitoring is becoming increasingly crucial for maintaining a reliable and trustworthy data ecosystem.

How to

How it works

The Completeness Monitor compares the actual volume of data ingested into a dataset per time interval (for example, an hour or a day) with the expected volume, as defined by a set threshold. It's especially useful for pipelines in which the data is re-uploaded in its entirety at every update. Use it to make sure that no data is lost between runs by comparing the number of rows to either an absolute threshold or the value from the previous run.

Comparison Modes

The static Completeness Monitor Template comes with two modes: Absolute and Difference with previous run value.

Absolute

In the Absolute Mode, the number of ingested rows is compared to a fixed threshold that you define.

Difference with previous run value

In the Difference with previous run value Mode, the number of ingested rows is compared to the number of rows from the previous monitor run.
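
The two comparison modes can be illustrated with a minimal Python sketch. This is not Sifflet's implementation or API: the function names `check_absolute` and `check_difference` and the parameters `min_rows`, `max_rows`, and `max_change` are hypothetical, chosen only to show the logic of each mode.

```python
from typing import Optional


def check_absolute(row_count: int,
                   min_rows: Optional[int] = None,
                   max_rows: Optional[int] = None) -> bool:
    """Absolute mode (hypothetical sketch): pass if the row count
    stays within the fixed bounds. A bound set to None is not checked."""
    if min_rows is not None and row_count < min_rows:
        return False
    if max_rows is not None and row_count > max_rows:
        return False
    return True


def check_difference(row_count: int,
                     previous_row_count: int,
                     max_change: int) -> bool:
    """Difference mode (hypothetical sketch): pass if the row count moved
    by at most max_change rows, up or down, since the previous run."""
    return abs(row_count - previous_row_count) <= max_change
```

In this sketch, the Absolute check needs no history, so it can flag a problem on the very first run, while the Difference check needs the row count recorded by the previous run.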

👍

Tips

Completeness monitors are often used with a "Group by" statement to identify volume drift at a finer granularity. Example: group by geography and product type for an international retail company.
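
To show why grouping helps, here is a minimal Python sketch that counts rows per group and applies an absolute minimum to each group. It is an illustration under stated assumptions, not how Sifflet evaluates a "Group by": the helper names, the `min_rows` parameter, and the `geography` and `product_type` fields are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def row_counts_by_group(rows: Iterable[Dict[str, str]],
                        keys: Sequence[str]) -> Counter:
    """Count rows for each combination of the grouping keys."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def groups_below_threshold(rows: Iterable[Dict[str, str]],
                           keys: Sequence[str],
                           min_rows: int) -> List[Tuple[str, ...]]:
    """Return the groups whose row count is below min_rows (hypothetical check).

    Groups with no rows at all never appear in the counts, so this sketch
    cannot flag a group that is missing entirely.
    """
    counts = row_counts_by_group(rows, keys)
    return sorted(group for group, n in counts.items() if n < min_rows)


# Hypothetical sample data for an international retailer.
sample_rows = [
    {"geography": "FR", "product_type": "shoes"},
    {"geography": "FR", "product_type": "shoes"},
    {"geography": "US", "product_type": "shoes"},
]
low_groups = groups_below_threshold(
    sample_rows, ["geography", "product_type"], min_rows=2
)
```

Here the total row count alone would not reveal that one geography and product type combination received fewer rows than the others, whereas the per-group counts do.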