Parameters list and example for every monitor type WIP
Volume
The monitor fails if the number of data rows ingested behaves differently than it did in the past.
Parameters
kind: "Completeness" # (REQUIRED) Kind of monitor
threshold: DynamicThreshold
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindowWithOffsetAndDelta
partition: Partition
Examples
Simple monitor
The monitor can be used without any specific option by using the default sensitivity.
kind: Completeness
Complex monitor
kind: Completeness
threshold:
  sensitivity: Low
  bounds: MinAndMax
whereStatement: metricColumn > 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P30D
  offset: PT1H
  deltaQuerying: P2D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P30D
Duplicates
The monitor fails if the duplicate rate at a row level behaves differently than it did in the past.
Parameters
kind: "Duplicates" # (REQUIRED) Kind of monitor
threshold: DynamicThreshold
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindowWithOffsetAndDelta
partition: Partition
Examples
Simple monitor
The monitor can be used without any specific option by using the default sensitivity.
kind: Duplicates
Complex monitor
kind: Duplicates
threshold:
  sensitivity: Low
  bounds: MinAndMax
whereStatement: metricColumn > 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P30D
  offset: PT1H
  deltaQuerying: P2D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P30D
Freshness (Update time gap)
Available on the following data sources: BigQuery, Databricks, MySQL, Oracle, Snowflake.
The monitor fails when the duration since the last update deviates from historical norms.
Parameters
kind: MetadataFreshness # (REQUIRED) Kind of monitor
Examples
Simple monitor
The monitor can be used without any specific option by using the default sensitivity.
kind: MetadataFreshness
Complex monitor
kind: MetadataFreshness
threshold:
  kind: Dynamic
  sensitivity: Low
Freshness
The monitor fails if the ingestion frequency of new rows behaves differently than it did in the past.
Parameters
kind: "Freshness" # (REQUIRED) Kind of monitor
threshold: # (optional - default Static) Threshold to use for detection
  # Can be either Static or Dynamic
  # Static threshold:
  kind: "Static"
  # Dynamic threshold:
  kind: "Dynamic"
  sensitivity: # (optional - default *Normal*) Sensitivity for the detection
    "Low" | "Normal" | "High"
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindowWithOffsetAndDelta
partition: Partition
Examples
Simple monitor
The monitor can be used without any specific option by using the default sensitivity.
kind: Freshness
Complex monitor
kind: Freshness
threshold:
  kind: Dynamic
  sensitivity: Low
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P365D
  offset: PT3H
  deltaQuerying: P3D
  frequency: P1D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
Schema Change
The monitor fails if the dataset's schema has changed since the previous run.
Parameters
kind: SchemaChange # (REQUIRED) Kind of monitor
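Examples
No example is given for this monitor type. Since kind is the only parameter documented above, a minimal definition would presumably consist of that single line. This is a sketch based on the parameter list, not a confirmed configuration:

```yaml
# Schema Change monitor: no parameter other than kind is documented.
kind: SchemaChange
```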
Static Metrics
The monitor fails if the field’s aggregation is outside of a given range.
Parameters
kind: "StaticMetrics" # (REQUIRED) Kind of monitor
field: String # (REQUIRED) Name of the field to monitor
aggregation: # (REQUIRED) Aggregation to apply
  kind: # (REQUIRED) Kind of aggregation to use.
    "Average" | "DistinctCount" | "Range" | "Quantile" | "Sum" | "StandardDeviation" | "Variance"
  # For Quantile aggregation
  quantile: Number # (REQUIRED) Quantile. For instance, 0.5 for the median.
threshold: # (REQUIRED) Threshold
  isMinInclusive: Boolean # (optional - default *true*) Inclusive minimum
  min: Number # (optional - default *null*) Expected minimum for the aggregated value
              # *null* to disable detection on lower bound
  isMaxInclusive: Boolean # (optional - default *true*) Inclusive maximum
  max: Number # (optional - default *null*) Expected maximum for the aggregated value
              # *null* to disable detection on upper bound
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindow
partition: Partition
Examples
Simple monitor
A monitor validating that the average of the field myField is always at least 1000 (the minimum bound is inclusive by default).
kind: StaticMetrics
field: myField
aggregation:
  kind: Average
threshold:
  min: 1000
Complex monitor
kind: StaticMetrics
field: myField
aggregation:
  kind: Quantile
  quantile: 0.5
threshold:
  isMinInclusive: false
  min: 1000
  isMaxInclusive: false
  max: 2000
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P365D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
Additional details
- One of threshold.min or threshold.max must be set to a non-null value
- threshold.min must be below threshold.max if both are set
- Sum aggregation is not supported for Firebolt datasource
- Quantile aggregation is not supported for Hive datasource
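The rule that only one bound needs to be set can be illustrated with a monitor that checks an upper bound only. This is a sketch built from the parameters listed above; the field name and the value 5000 are placeholders:

```yaml
kind: StaticMetrics
field: myField            # placeholder field name
aggregation:
  kind: StandardDeviation
threshold:
  # min is omitted, so it keeps its documented default (null)
  # and no detection is done on the lower bound.
  isMaxInclusive: false   # the aggregated value must be strictly below max
  max: 5000               # placeholder upper bound
```

Setting only threshold.min works the same way in the other direction, as in the simple monitor above.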
Dynamic Metrics
The monitor fails if the selected field values (or their statistical transformations) behave differently than they did in the past.
Parameters
kind: "DynamicMetrics" # (REQUIRED) Kind of monitor
field: String # (REQUIRED) Name of the field to monitor
aggregation: # (REQUIRED) Aggregation to apply
  kind: # (REQUIRED) Kind of aggregation to use.
    "Average" | "NormalizedAverage" | "DistinctCount" | "Min" | "Max" | "Quantile" | "Sum" | "StandardDeviation" | "Variance"
  # For Quantile aggregation
  quantile: Number # (REQUIRED) Quantile. For instance, 0.5 for the median.
threshold: DynamicThreshold
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindowWithOffsetAndDelta
partition: Partition
Examples
Simple monitor
A monitor validating that the distinct count of myField behaves similarly to the past (detection is done with default sensitivity).
kind: DynamicMetrics
field: myField
aggregation:
  kind: DistinctCount
Complex monitor
kind: DynamicMetrics
field: myField
aggregation:
  kind: NormalizedAverage
threshold:
  sensitivity: Low
  bounds: Max
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: auto
  duration: P365D
  offset: PT3H
  deltaQuerying: P2D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
Custom Metrics
Description
Parameters
kind: "CustomMetrics" # (REQUIRED) Kind of monitor
sql: String # (REQUIRED) SQL of the monitor to execute (see Additional details below)
threshold: DynamicThreshold
groupBy: GroupBy
timeWindow:
  offset: Duration # (optional - default *null*) Delay the query of data by this duration
                   # *null* to disable using an offset
                   # Allowed duration units: Days, Hours
partition: Partition
Examples
Simple monitor
A monitor validating that the value of myField returned by the query behaves similarly to the past (detection is done with default sensitivity).
kind: CustomMetrics
sql: SELECT myField FROM SomeTable
Complex monitor
kind: CustomMetrics
sql: |
  SELECT
    myField = column1 * 100,
    time = COMPUTETIME(column2, column3),
    groupByField
  FROM SomeTable
threshold:
  sensitivity: Low
  bounds: Max
groupBy: groupByField
timeWindow:
  offset: PT1H
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P30D
Additional details
- The SQL query should return at least one field, as follows:
  - The metric value: this should be a numerical field.
  - The metric timing (optional): this should be a date/timestamp field. If not present in the query, the metric will be calculated over time by taking a snapshot at every run.
  - The monitoring dimensions (optional): this can be a categorical column to allow for multi-dimensional monitoring. If added, this should be an existing field in the table/view schema and must not be referred to with an alias.
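The three kinds of returned fields described above might be combined as in the sketch below. The column names amount, orderDate and country are hypothetical, and how each returned field is matched to its role is not specified on this page, so the comments only restate the rules above:

```yaml
kind: CustomMetrics
sql: |
  SELECT
    amount,      -- metric value: a numerical field (hypothetical column)
    orderDate,   -- metric timing (optional): a date/timestamp field (hypothetical column)
    country      -- monitoring dimension (optional): existing column, not aliased
  FROM SomeTable
groupBy: country
```

Dropping orderDate from the query would, per the rules above, make the metric be computed from snapshots taken at every run.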
Static Field Profiling
The monitor fails if the null or duplicate value rate of the selected field is higher than a given threshold.
Parameters
kind: "StaticFieldProfiling" # (REQUIRED) Kind of monitor
field: String # (REQUIRED) Name of the field to monitor
profiling: # (REQUIRED) Profiling to execute
  kind: # (REQUIRED) Kind of profiling to use.
    "NullCount" | "NullPercentage" | "DuplicateCount" | "DuplicatePercentage"
  nullValues: # (OPTIONAL - default Null) Null values to check in case of NullCount or NullPercentage field profiling
    "Null" | "NullAndEmpty" | "NullEmptyAndWhitespaces"
threshold: # (REQUIRED) Threshold
  max: Number # (optional - default 0) Expected maximum occurrences or percentage of nulls or duplicates
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindow
partition: Partition
Examples
Simple monitor
kind: StaticFieldProfiling
field: myField
profiling:
  kind: NullCount
Complex monitor
kind: StaticFieldProfiling
field: myField
profiling:
  kind: NullPercentage
  nullValues: NullAndEmpty
threshold:
  max: 32.2
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P365D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
Additional details
- DuplicateCount is not supported yet
- Only threshold.max = 0 is currently supported for NullCount
- threshold.max must be between 0 and 100 for NullPercentage or DuplicatePercentage
- threshold.max must be greater than or equal to 0 in all cases
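As an illustration of the percentage constraint, a monitor on the duplicate rate could look like the sketch below. It only uses parameters listed in this section; the field name and the value 5 are placeholders:

```yaml
kind: StaticFieldProfiling
field: myField                # placeholder field name
profiling:
  kind: DuplicatePercentage   # a percentage kind, so max must lie between 0 and 100
threshold:
  max: 5                      # placeholder: fail above a 5% duplicate rate
```

For NullCount, by contrast, the constraints above leave max: 0 (the default) as the only accepted value.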
Dynamic Field Profiling
The monitor fails if the null or duplicate value rate of the selected field behaves differently than it did in the past.
Parameters
kind: "DynamicFieldProfiling" # (REQUIRED) Kind of monitor
field: String # (REQUIRED) Name of the field to monitor
fieldProfiling: # (REQUIRED) Profiling to execute
  kind: # (REQUIRED) Kind of profiling to use.
    "NullCount" | "NullPercentage" | "DuplicateCount" | "DuplicatePercentage"
  nullValues: # (OPTIONAL - default Null) Null values to check in case of NullCount or NullPercentage field profiling
    "Null" | "NullAndEmpty" | "NullEmptyAndWhitespaces"
threshold: DynamicThreshold
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindowWithOffsetAndDelta
partition: Partition
Examples
Simple monitor
kind: DynamicFieldProfiling
field: myField
fieldProfiling:
  kind: NullCount
Complex monitor
kind: DynamicFieldProfiling
field: myField
fieldProfiling:
  kind: DuplicateCount
threshold:
  sensitivity: Low
  bounds: Max
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P365D
  deltaQuerying: P2D
  offset: PT3H
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
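Null-rate monitor with nullValues
Neither example above uses nullValues. The sketch below shows how it might be combined with a NullPercentage profiling, assuming (as its name suggests) that NullEmptyAndWhitespaces also treats empty and whitespace-only values as nulls; the field name is a placeholder:

```yaml
kind: DynamicFieldProfiling
field: myField                          # placeholder field name
fieldProfiling:
  kind: NullPercentage
  nullValues: NullEmptyAndWhitespaces   # assumed to widen what counts as null
threshold:
  sensitivity: High
```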
Distribution change
The monitor fails if the distribution of a given field has changed significantly compared to a fixed or rolling reference date.
Parameters
kind: "Distribution" # (REQUIRED) Kind of monitor
field: String # (REQUIRED) Name of the field to monitor
threshold: ... # (optional - default *Dynamic*) Threshold to use for detection
  # Can be either *Static* or *Dynamic*
  # *Static* threshold:
  kind: "Static"
  max: Number # (REQUIRED) Percentage, between 0 and 100 of allowed distribution change
  onAddedCategory: Boolean # (optional - default *true*) Fail if a new category appeared since the last snapshot
  onRemovedCategory: Boolean # (optional - default *false*) Fail if a category disappeared since the last snapshot
  # *Dynamic* threshold:
  kind: "Dynamic"
  sensitivity: # (optional - default *Normal*) Sensitivity for the detection
    "Low" | "Normal" | "High"
reference: # (optional - default *Rolling*) Time Reference for distribution comparison
  # Can be either Fixed or Rolling
  # Fixed reference
  kind: Fixed
  timestamp: Date # (REQUIRED) Reference date to use for distribution
  # Rolling reference
  kind: Rolling
  delay: Duration # (optional - default *1 day*) Delay between the reference snapshot and the new snapshot
                  # Allowed formats: PnD
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: # See common parameter elements - Time window
  field: String
  duration: Duration # Duration
                     # Allowed units: *Days*
  offset: Duration # Offset
partition: ... # See common parameter elements - Partition
Examples
Simple monitor
Check the distribution compared to the previous day using dynamic threshold with default sensitivity.
kind: Distribution
field: myField
Complex with Static Threshold and Fixed reference
kind: Distribution
field: myField
threshold:
  kind: Static
  percentage: 0.5
  onAddedCategory: false
  onRemovedCategory: true
reference:
  kind: Fixed
  timestamp: 2023-07-09
timeWindow:
  field: timeWindowField
  duration: P365D
  offset: PT3H
whereStatement: myColumn = 5
groupBy: groupByField
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
Complex with Dynamic Threshold and Rolling reference
kind: Distribution
field: myField
threshold:
  kind: Dynamic
  sensitivity: Low
reference:
  kind: Rolling
  delay: P4D
timeWindow:
  field: timeWindowField
  duration: P365D
  offset: PT3H
whereStatement: myColumn = 5
groupBy: groupByField
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
Field in List
The monitor fails if the selected field has values that are not in the given list.
Parameters
kind: "FieldInList" # (REQUIRED) Kind of monitor
field: String # (REQUIRED) Name of the field to monitor
values: # (REQUIRED) Allowed values
  - String
  - ...
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindow
partition: Partition
Examples
Simple monitor
kind: FieldInList
field: myField
values:
  - value1
  - value2
  - value3
Complex monitor
kind: FieldInList
field: myField
values:
  - value1
  - value2
  - value3
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P365D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
Field Uniqueness
Parameters
kind: "FieldUniqueness" # (REQUIRED) Kind of monitor
field: String # (REQUIRED) Name of the field to monitor
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindow
partition: Partition
Examples
Simple monitor
kind: FieldUniqueness
field: myField
Complex monitor
kind: FieldUniqueness
field: myField
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: PT10M
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
Field Format
The monitor fails if the selected field contains at least one row that does not match the format specified.
Parameters
kind: "FieldFormat" # (REQUIRED) Kind of monitor
field: String # (REQUIRED) Name of the field to monitor
format: # (REQUIRED) Expected format of the field values
  kind: # (REQUIRED) Kind of format to validate.
    "Email" | "Phone" | "UUID" | "Regex"
  # For Regex kind
  regex: String # (REQUIRED) Regex to use for validation
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindow
partition: Partition
Examples
Simple monitor
kind: FieldFormat
field: myField
format:
  kind: Email
Complex monitor
kind: FieldFormat
field: myField
format:
  kind: Regex
  regex: ^[a-zA-Z0-9]+$
whereStatement: myColumn != ''
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P30D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P30D
Additional details
- MS SQL is not supported for Regex format.
SQL
The monitor fails if the monitor query returns at least one row (row count greater than 0).
Parameters
kind: "Sql" # (REQUIRED) Kind of monitor
sql: String # (REQUIRED) SQL query to execute
partition: Partition
Example
kind: Sql
sql: SELECT * FROM SomeTable WHERE COMPLEX_CALCULATION(myColumn) = 42