Volume (dynamic)
Overview
The Sifflet Completeness Monitor is a table-level metadata monitor. It detects changes in the volume of data newly ingested into a table. A significant change in volume may indicate data duplication, data loss, or data corruption. Catching these issues early helps keep the data accurate and the analysis built on it reliable.
Metadata Monitoring
Metadata is information about data, including its structure and the transformations applied to it. Metadata monitoring helps identify and address issues related to data integration and data transformation.
As data volumes and complexities continue to grow, metadata monitoring is becoming increasingly crucial for maintaining a reliable and trustworthy data ecosystem.
How to
How it works
The Completeness Monitor compares the actual volume of data ingested into a dataset per time interval (for example, an hour or a day) with the expected volume for that interval. Expectations are computed by machine learning models based on the historical behavior of the data.
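The comparison of actual versus expected volume can be illustrated with a minimal sketch. This is not Sifflet's model: the machine learning expectation is replaced here by a simple robust baseline (median plus or minus a multiple of the median absolute deviation), and the names `expected_range`, `check_volume`, and `tolerance` are hypothetical, chosen only for illustration.

```python
from statistics import median


def expected_range(history, tolerance=3.0):
    """Return (low, high) bounds for the next interval's row count.

    Assumption: a stand-in for a learned model, using
    median +/- tolerance * MAD over past interval counts.
    """
    center = median(history)
    mad = median([abs(x - center) for x in history])
    # Avoid a zero-width band when the history is perfectly flat.
    spread = max(mad, 1.0)
    return center - tolerance * spread, center + tolerance * spread


def check_volume(history, actual, tolerance=3.0):
    """Classify the latest interval as 'ok', 'drop', or 'spike'."""
    low, high = expected_range(history, tolerance)
    if actual < low:
        return "drop"
    if actual > high:
        return "spike"
    return "ok"


# Hypothetical daily row counts for the last seven days.
daily_rows = [1020, 980, 1005, 995, 1010, 990, 1000]

print(check_volume(daily_rows, 1003))  # within the expected band
print(check_volume(daily_rows, 400))   # far fewer rows than expected
print(check_volume(daily_rows, 2100))  # far more rows than expected
```

A drop would point toward data loss and a spike toward duplication. A real model would also need to account for trend and seasonality, which this fixed band ignores.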
Example
A company aggregates the orders from its different selling platforms in a table called "Orders". Monitoring the daily volume of data ingested into that table can detect:
- Missing data from one of the sources (technical issue)
- Fewer orders placed on one selling platform (business issue)
Tips
Completeness monitors are often used with a "Group by" statement to identify volume drift at a finer granularity. Example: group by geography and product type for an international retail company.
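To see why grouping helps, the sketch below counts rows per (geography, product type) group and flags groups whose volume fell sharply, even though the table-level total barely moved. It is an illustration only: the column names `geography` and `product_type`, the helper names, and the fixed `min_ratio` threshold are all assumptions, and the fixed threshold stands in for the learned expectations the monitor actually uses.

```python
from collections import Counter


def volume_by_group(rows, keys):
    """Count rows per combination of the given key columns."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def drifting_groups(baseline, current, min_ratio=0.5):
    """Return groups whose current count is below min_ratio * baseline.

    Assumption: a fixed ratio replaces a learned expectation.
    """
    return sorted(
        group
        for group, expected in baseline.items()
        if current.get(group, 0) < min_ratio * expected
    )


def make_rows(geography, product_type, n):
    """Build n hypothetical order rows for one group."""
    return [{"geography": geography, "product_type": product_type}] * n


keys = ("geography", "product_type")
yesterday = make_rows("FR", "shoes", 40) + make_rows("FR", "bags", 10) + make_rows("US", "shoes", 50)
today = make_rows("FR", "shoes", 42) + make_rows("FR", "bags", 2) + make_rows("US", "shoes", 53)

# Table-level totals are close (100 vs 97), so the drop is easy to miss.
print(len(yesterday), len(today))

# Group-level counts expose the one group that collapsed.
print(drifting_groups(volume_by_group(yesterday, keys), volume_by_group(today, keys)))
```

In this hypothetical data the total changes by only three rows, while the ("FR", "bags") group loses most of its volume, which is the kind of drift a grouped monitor is meant to surface.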