Volume (dynamic)

Overview

Sifflet Completeness Monitor is a table-level metadata monitor. It detects changes in the volume of data newly ingested into a table. Significant changes in ingested volume may indicate data duplication, data loss, or data corruption. Identifying and addressing these issues early helps keep data accurate, which in turn supports more reliable analysis.
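To make "volume newly ingested" concrete: one way to derive it from table-level metadata is to take the difference between successive row-count snapshots. The sketch below assumes such snapshots are available and that the table is append-only. The function name `ingested_per_interval` and the sample counts are hypothetical and do not describe Sifflet's implementation.

```python
from datetime import date


def ingested_per_interval(snapshots):
    """Turn cumulative row-count snapshots into rows ingested per interval.

    `snapshots` is a list of (interval_end, total_row_count) pairs sorted by time.
    Assumes an append-only table: a negative delta would mean rows were deleted
    or the table was reloaded, so it is reported as-is rather than hidden.
    """
    volumes = []
    for (_, prev_count), (end, count) in zip(snapshots, snapshots[1:]):
        volumes.append((end, count - prev_count))
    return volumes


# Hypothetical daily snapshots of a table's total row count.
snapshots = [
    (date(2023, 6, 11), 10_000),
    (date(2023, 6, 12), 11_200),
    (date(2023, 6, 13), 12_350),
    (date(2023, 6, 14), 12_600),
]
for day, rows in ingested_per_interval(snapshots):
    print(day, rows)
```

Working from per-interval deltas rather than the total row count is what lets a sudden drop in ingestion stand out, even when the table as a whole keeps growing.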

📘

Metadata Monitoring

Metadata is information about data, including its structure and the transformations applied to it. Metadata monitoring helps identify and address issues related to data integration and data transformations.

As data volumes and complexity continue to grow, metadata monitoring is becoming increasingly crucial for maintaining a reliable and trustworthy data ecosystem.

How to

How it works

The Completeness Monitor compares the actual volume of data ingested into a dataset during each time interval (for example, an hour or a day) with the expected ingestion volume. Expected volumes are computed by machine learning models based on the historical behavior of the data.
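The comparison between actual and expected volume can be sketched as follows. Sifflet computes expectations with machine learning models. This sketch swaps in a much simpler statistical baseline (the mean of past intervals plus or minus k standard deviations) purely for illustration. The function `check_volume`, the threshold `k`, and the sample counts are hypothetical.

```python
from statistics import mean, stdev


def check_volume(history, actual, k=3.0):
    """Compare one interval's ingested volume with an expected range.

    Simplified stand-in for an ML model: the expected volume is the mean of
    past intervals, and the tolerated range is k standard deviations around it.
    Returns (is_anomaly, (low, high)).
    """
    expected = mean(history)
    spread = stdev(history) if len(history) > 1 else 0.0
    low, high = expected - k * spread, expected + k * spread
    return not (low <= actual <= high), (low, high)


# Hypothetical daily row counts for the previous week, then today's count.
history = [1200, 1150, 1180, 1230, 1210, 1190, 1170]
is_anomaly, (low, high) = check_volume(history, actual=250)
print(is_anomaly, round(low), round(high))
```

A count far below the range suggests missing data. A count far above it may point to duplicated loads.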

Example

A company aggregates all orders from its different selling platforms in a table called "Orders". Monitoring the daily volume of data ingested into that table can detect:

  • Missing data from one of the sources (technical issue)
  • Fewer orders placed on one selling platform (business issue)

A completeness monitor with an anomaly on the 14th of June

👍

Tips

Completeness monitors are often used with a "Group by" statement to detect volume drift at a finer granularity. For example, an international retail company might group by geography and product type.
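Grouping applies the same check to each group separately, so a drop in one segment is not hidden by stable totals elsewhere. The sketch below assumes rows carry hypothetical `day`, `region`, and `product_type` fields. Instead of Sifflet's ML models, it again uses a simple baseline of the mean plus or minus k standard deviations. A group that sends no rows today counts as zero and is flagged, which corresponds to the "missing data from one of the sources" case. For brevity, days on which a group had no rows are left out of its history rather than counted as zero.

```python
from collections import Counter, defaultdict
from statistics import mean, stdev


def daily_volume_by_group(rows, group_keys):
    """Count rows per (day, group), where group is a tuple of column values."""
    counts = Counter()
    for row in rows:
        group = tuple(row[key] for key in group_keys)
        counts[(row["day"], group)] += 1
    return counts


def anomalous_groups(counts, today, k=3.0):
    """Flag groups whose count today falls outside mean +/- k*stdev of earlier days."""
    history = defaultdict(list)
    for (day, group), n in counts.items():
        if day < today:
            history[group].append(n)
    flagged = []
    for group, past in history.items():
        if len(past) < 2:
            continue  # not enough history to estimate a spread
        expected, spread = mean(past), stdev(past)
        actual = counts.get((today, group), 0)  # a silent group counts as 0
        if abs(actual - expected) > k * spread:
            flagged.append(group)
    return flagged


# Hypothetical orders: days 1-3 are history, day 4 is "today".
rows = []
for day in range(1, 5):
    for region, n in [("EU", 100 + day), ("US", 80 + day % 2)]:
        if day == 4 and region == "EU":
            continue  # simulate a source that stopped sending EU orders
        rows.extend({"day": day, "region": region, "product_type": "shoes"} for _ in range(n))

counts = daily_volume_by_group(rows, ["region", "product_type"])
print(anomalous_groups(counts, today=4))
```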