Data quality rules library

A data quality rule can be set on either a table or a field to check specific data quality criteria. Rules can be run on a flexible schedule and can alert the user on Slack or by email when a breach occurs.

The available data quality rules are listed below, each with a brief description of what it does.

| Type | Rule | Applicable To | Description |
|---|---|---|---|
| Metadata | Completeness (ML-based) | Table | Counts the newly ingested rows and compares the count to expectations based on past behavior. |
| Metadata | Duplicates (ML-based) | Table | Computes the row-level duplication rate and compares it to expectations based on past behavior. |
| Metadata | Freshness (ML-based) | Table | Detects the frequency at which new rows are ingested into your table. |
| Metadata | Schema Change | Table | Detects any change to the schema: new field(s), removed field(s), existing field(s) with updated types or names. |
| Metrics | Average (static thresholds) | Fields: Numeric | The rule fails if the average of the field is outside of a given range. |
| Metrics | Values count (static thresholds) | Fields: All | The rule fails if the number of unique values of the field is outside of a given range. |
| Metrics | Quantile (static thresholds) | Fields: Numeric | The rule fails if a quantile of the field is outside of a given range. |
| Metrics | Values (static thresholds) | Fields: Numeric | The rule fails if the chosen field has one or more values outside of a given range. |
| Metrics | Standard Deviation (static thresholds) | Fields: Numeric | The rule fails if the standard deviation of the field is outside of a given range. |
| Metrics | Variance (static thresholds) | Fields: Numeric | The rule fails if the variance of the field is outside of a given range. |
| Smart Metrics | Metrics (dynamic thresholds) | Fields: Numeric | The rule fails if the selected statistical transformation of the field behaves differently than it did in the past. |
| Field profiling | Distribution Change | Fields: All | The rule fails if the distribution of a given field has changed abnormally compared to a given earlier run. |
| Field profiling | Duplicates in % (static threshold) | Fields: All | The rule fails if the duplicate rate of the chosen field exceeds a given threshold. |
| Field profiling | Duplicates in # (dynamic threshold) | Fields: All | The rule fails if the count of duplicate values of the field is abnormal compared to expectations based on past behavior. |
| Field profiling | Duplicates in % (dynamic threshold) | Fields: All | The rule fails if the % of duplicate values of the field is abnormal compared to expectations based on past behavior. |
| Field profiling | Low Cardinality | Fields: All | The rule fails if: (1) the number of different values of the chosen field is above a given threshold; (2) the different values of the field changed since the previous run, e.g. from ['dog', 'cat'] to ['dog', 'rabbit', 'turtle']. |
| Field profiling | Not after date | Fields: Timestamps, Dates | The rule fails if the table has rows after a given date. |
| Field profiling | Not before date | Fields: Timestamps, Dates | The rule fails if the table has rows before a given date. |
| Field profiling | Not in the list | Fields: String | The rule fails if the chosen field has values that are not present in the given list. |
| Field profiling | Null (or Empty) | Fields: All | The rule fails if the chosen field has values that are empty/null. |
| Field profiling | Null in # (dynamic threshold) | Fields: All | The rule fails if the count of null values of the field is abnormal compared to expectations based on past behavior. |
| Field profiling | Null in % (dynamic threshold) | Fields: All | The rule fails if the % of null values of the field is abnormal compared to expectations based on past behavior. |
| Field profiling | Unique | Fields: All | The rule fails if the chosen field has duplicate values. |
| Format validation | Is an email | Fields: String | The rule fails if the chosen field contains at least one row that does not have an email format. |
| Format validation | Is a phone number | Fields: String | The rule fails if the chosen field contains at least one row that does not have a phone number format. |
| Format validation | Is UUID | Fields: String | The rule fails if the chosen field contains at least one row that does not have a UUID format. |
| Custom | SQL | Table | Advanced template to write custom rules based on business specifics. The SQL query must describe a quality breach on one or more tables within the same data source. |
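
To make the pass/fail semantics of a few of the rules above more concrete, here is a minimal Python sketch. It is not the product's implementation or API: the function names and parameters are hypothetical, values are assumed to be already loaded into plain Python lists, and the Low Cardinality check assumes that either of its two conditions is enough to fail the rule.

```python
from statistics import mean


def average_in_range(values, low, high):
    """Sketch of 'Average (static thresholds)': pass if the mean is within [low, high]."""
    return low <= mean(values) <= high


def values_in_range(values, low, high):
    """Sketch of 'Values (static thresholds)': pass if every value is within [low, high]."""
    return all(low <= v <= high for v in values)


def is_unique(values):
    """Sketch of 'Unique': pass if the field contains no duplicate values."""
    return len(values) == len(set(values))


def low_cardinality(values, max_distinct, previous_values=None):
    """Sketch of 'Low Cardinality'.

    Assumption: the rule fails if EITHER the number of distinct values
    exceeds max_distinct OR the set of distinct values differs from the
    previous run (when a previous run is available).
    """
    distinct = set(values)
    if len(distinct) > max_distinct:
        return False
    if previous_values is not None and distinct != set(previous_values):
        return False
    return True


if __name__ == "__main__":
    prices = [10, 20, 30]
    print(average_in_range(prices, 15, 25))  # mean is 20, inside the range
    print(values_in_range(prices, 15, 25))   # 10 and 30 fall outside the range
    print(low_cardinality(["dog", "rabbit", "turtle"], 3, ["dog", "cat"]))
```

The same field can pass one rule and fail another: in the usage lines, the average of `prices` is inside the range while two individual values are not, which is the difference between the Average and Values rules.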