Parameters list and example for every monitor type
Volume
The monitor fails if the number of data rows ingested behaves differently than in the past.
Parameters
kind: Volume # (REQUIRED) Kind of monitor
threshold: <Threshold>
whereStatement: <WhereStatement>
groupBy: <GroupBy>
timeWindow: <TimeWindow>
partition: <Partition>
Examples
Simple monitor
The monitor can be used without any specific option by using the default sensitivity. This will look at the total number of rows in the table each run.
kind: Volume
Complex monitor
An incremental Volume monitor that checks the daily number of rows. The first run uses 365 days of history, and a 1-day offset ensures that only completed days are included.
kind: Volume
threshold:
  kind: Dynamic
  sensitivity: 25
  bounds: Min
whereStatement: metricColumn > 5
groupBy:
  field: Category
timeWindow:
  field: auto
  firstRun: P365D
  offset: P1D
  frequency: P1D
Row-Level (Perfect) Duplicates
The monitor fails if the duplicate rate at a row level behaves differently than it did in the past.
Parameters
kind: "RowDuplicates" # (REQUIRED) Kind of monitor
threshold: <Threshold>
whereStatement: <WhereStatement>
groupBy: <GroupBy>
timeWindow: <TimeWindow>
partition: <Partition>
Examples
Simple monitor
The monitor can be used without any specific option by using the default sensitivity.
kind: RowDuplicates
Complex monitor
A daily incremental monitor that alerts whenever there is at least one row duplicate. It uses 365 days of history for the first run, re-checks the previous 2 days via lookback, and applies partitioning (BigQuery only).
kind: RowDuplicates
threshold:
  kind: Static
  max: 0
  isMaxInclusive: false
timeWindow:
  field: auto
  firstRun: P365D
  frequency: P1D
  deltaQuerying: P2D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P30D
Freshness (Update time gap)
Available on the following data sources: BigQuery, Databricks, MySQL, Oracle, Snowflake.
The monitor fails when the duration since the last update deviates from historical norms.
Parameters
kind: MetadataFreshness # (REQUIRED) Kind of monitor
Examples
Simple monitor
The monitor can be used without any specific option by using the default sensitivity.
kind: MetadataFreshness
Freshness
The monitor fails if the ingestion frequency of new rows behaves differently than it did in the past.
Parameters
kind: Freshness # (REQUIRED) Kind of monitor
threshold: <Threshold>
whereStatement: <WhereStatement>
groupBy: <GroupBy>
timeWindow: <TimeWindow>
partition: <Partition>
Examples
Simple monitor
The monitor can be used without any specific option by using the default sensitivity.
kind: Freshness
Complex monitor
kind: Freshness
threshold:
  kind: Static
timeWindow:
  field: auto
  firstRun: P365D
  frequency: P1D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
Schema Change
The monitor fails if the dataset's schema has changed since the previous run.
Parameters
kind: SchemaChange # (REQUIRED) Kind of monitor
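No example is listed for this monitor type. Since kind is the only documented parameter, a minimal configuration would presumably mirror the simple examples of the other monitor types. Treat the following as a sketch under that assumption, not a confirmed sample:

```yaml
# Assumed minimal Schema Change monitor: kind is the only documented parameter
kind: SchemaChange
```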
Metrics
The monitor fails if the field’s aggregation is outside of a given range.
Parameters
kind: Metrics # (REQUIRED) Kind of monitor
field: String # (REQUIRED) Name of the field to monitor
aggregation: # (REQUIRED) Aggregation to apply
  kind: # (REQUIRED) Kind of aggregation to use.
    "Average" | "DistinctCount" | "Range" | "Quantile" | "Sum" | "StandardDeviation" | "Variance"
  # For Quantile aggregation
  quantile: Number # (REQUIRED) Quantile. For instance, 0.5 for the median.
threshold: <Threshold> # (REQUIRED) Threshold
whereStatement: <WhereStatement>
groupBy: <GroupBy>
timeWindow: <TimeWindow>
partition: <Partition>
Examples
Simple monitor with Static Threshold
A monitor validating that the average of the field myField is always above 1000.
kind: Metrics
field: myField
aggregation:
  kind: Average
threshold:
  kind: Static
  min: 1000
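Because this monitor is described as failing when the aggregation falls outside a given range, a two-sided check may be useful. The sketch below assumes that min and max can be combined in a single Static threshold (max appears on its own in other examples on this page); the bound values are illustrative only.

```yaml
kind: Metrics
field: myField
aggregation:
  kind: Average
threshold:
  kind: Static
  min: 1000   # lower bound, as in the simple example
  max: 5000   # upper bound (assumed to be combinable with min)
```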
Complex monitor
kind: Metrics
field: myField
aggregation:
  kind: Quantile
  quantile: 0.5
threshold:
  kind: Static
  min: 1000
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  firstRun: P365D
  frequency: P1D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
Custom Metrics
A monitor allowing incremental and dynamic monitoring of custom SQL queries.
Parameters
kind: "CustomMetrics" # (REQUIRED) Kind of monitor
sql: String # (REQUIRED) SQL of the monitor to execute (see Additional details below)
threshold: <Threshold>
groupBy: <GroupBy>
timeWindow:
  offset: <Duration> # (optional - default *null*) Delay the query of data by this duration
  # *null* to disable using an offset
  # Allowed duration units: Days, Hours
partition: <Partition>
Examples
Simple monitor
A monitor validating that the distinct count of myField behaves as it did in the past (detection uses the default sensitivity).
kind: CustomMetrics
sql: SELECT myField FROM SomeTable
Complex monitor
kind: CustomMetrics
sql: |
  SELECT
    myField = column1 * 100,
    time = COMPUTETIME(column2, column3),
    groupByField
  FROM SomeTable
threshold:
  kind: Dynamic
  sensitivity: Low
  bounds: Max
timeWindow:
  offset: PT1H
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P30D
Additional details
- The SQL query must return at least one field. It can return the following:
  - The metric value: a numerical field.
  - The metric timing (optional): a date/timestamp field. If it is not present in the query, the metric is calculated over time by taking a snapshot at every run.
  - The monitoring dimensions (optional): a categorical column that enables multi-dimensional monitoring. If added, it must be an existing field in the table/view schema and must not be referred to by an alias.
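The three kinds of field described above could appear together in one query, as in the sketch below. The column names orderAmount, createdAt, and country are hypothetical, and how the monitor maps each returned field to its role is not specified here, so the inline comments only indicate the intended role.

```yaml
kind: CustomMetrics
sql: |
  SELECT
    orderAmount,  -- metric value: numerical field (hypothetical column)
    createdAt,    -- metric timing: optional date/timestamp field (hypothetical column)
    country       -- monitoring dimension: optional, existing column, no alias (hypothetical column)
  FROM SomeTable
```

Dropping createdAt from the query would, per the notes above, make the monitor compute the metric by taking a snapshot at every run.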
Nulls
The monitor fails if the null values of a field meet a threshold criterion.
Parameters
kind: "FieldNulls" # (REQUIRED) Kind of monitor
field: String # (REQUIRED) Name of the field to monitor
threshold: <Threshold> # (REQUIRED) Threshold
valueMode: String # (optional - default Count) "Count" or "Percentage"
whereStatement: <WhereStatement>
groupBy: <GroupBy>
timeWindow: <TimeWindow>
partition: <Partition>
Examples
Simple monitor
Triggers when at least one null value is detected in the field.
kind: FieldNulls
field: myField
Complex monitor
A percentage threshold with daily incremental mode turned on.
kind: FieldNulls
field: ACCOUNT_NAME
threshold:
  kind: Static
  valueMode: Percentage
  max: 0%
timeWindow:
  field: auto
  firstRun: P365D
  frequency: P1D
whereStatement: myColumn = 5
groupBy: groupByField
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
Duplicates
The monitor fails if the duplicate values of a field meet a threshold criterion.
Parameters
kind: "FieldDuplicates" # (REQUIRED) Kind of monitor
field: String # (REQUIRED) Name of the field to monitor
threshold: <Threshold> # (REQUIRED) Threshold
valueMode: String # (optional - default Count) "Count" or "Percentage"
whereStatement: <WhereStatement>
groupBy: <GroupBy>
timeWindow: <TimeWindow>
partition: <Partition>
Examples
Simple monitor
kind: FieldDuplicates
field: myField
Complex monitor
kind: FieldDuplicates
field: ACCOUNT_NAME
threshold:
  kind: Static
  valueMode: Percentage
  max: 0%
timeWindow:
  field: auto
  firstRun: P365D
  frequency: P1D
whereStatement: myColumn = 5
groupBy: groupByField
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
Distribution change
The monitor fails if the distribution of a given field has changed significantly compared to a fixed or rolling reference date.
Parameters
kind: "Distribution" # (REQUIRED) Kind of monitor
field: String # (REQUIRED) Name of the field to monitor
threshold: ... # (optional - default *Dynamic*) Threshold to use for detection
  # Can be either *Static* or *Dynamic*
  # *Static* threshold:
  kind: "Static"
  max: Number # (REQUIRED) Percentage, between 0 and 100 of allowed distribution change
  onAddedCategory: Boolean # (optional - default *true*) Fail if a new category appeared since the last snapshot
  onRemovedCategory: Boolean # (optional - default *false*) Fail if a category disappeared since the last snapshot
  # *Dynamic* threshold:
  kind: "Dynamic"
  sensitivity: # (optional - default *Normal*) Sensitivity for the detection
    "Low" | "Normal" | "High"
reference: # (optional - default *Rolling*) Time Reference for distribution comparison
  # Can be either Fixed or Rolling
  # Fixed reference
  kind: Fixed
  timestamp: Date # (REQUIRED) Reference date to use for distribution
  # Rolling reference
  kind: Rolling
  delay: Duration # (optional - default *1 day*) Delay between the reference snapshot and the new snapshot
  # Allowed formats: PnD
whereStatement: <WhereStatement>
groupBy: <GroupBy>
timeWindow: # See common parameter elements - Time window
  field: String
  duration: Duration # Duration
  # Allowed units: *Days*
  offset: Duration # Offset
partition: ... # See common parameter elements - Partition
Examples
Simple monitor
Check the distribution compared to the previous day using dynamic threshold with default sensitivity.
kind: Distribution
field: myField
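For reference, the simple monitor relies on the defaults listed in the parameters: a Dynamic threshold with Normal sensitivity and a Rolling reference with a 1-day delay. Written out explicitly, it would look roughly like this, assuming that explicitly set defaults behave the same as omitted ones:

```yaml
kind: Distribution
field: myField
threshold:
  kind: Dynamic        # documented default threshold kind
  sensitivity: Normal  # documented default sensitivity
reference:
  kind: Rolling        # documented default reference kind
  delay: P1D           # documented default delay of 1 day, in PnD format
```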
Complex with Static Threshold and Fixed reference
kind: Distribution
field: myField
threshold:
  kind: Static
  percentage: 0.5
  onAddedCategory: false
  onRemovedCategory: true
reference:
  kind: Fixed
  timestamp: 2023-07-09
timeWindow:
  field: timeWindowField
  duration: P365D
  offset: PT3H
whereStatement: myColumn = 5
groupBy: groupByField
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
Complex with Dynamic Threshold and Rolling reference
kind: Distribution
field: myField
threshold:
  kind: Dynamic
  sensitivity: Low
reference:
  kind: Rolling
  delay: P4D
timeWindow:
  field: timeWindowField
  duration: P365D
  offset: PT3H
whereStatement: myColumn = 5
groupBy: groupByField
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
Field in List
The monitor fails if the selected field has values that are not in the given list.
Parameters
kind: "FieldInList" # (REQUIRED) Kind of monitor
field: String # (REQUIRED) Name of the field to monitor
values: # (REQUIRED) Allowed values
  - String
  - ...
whereStatement: <WhereStatement>
groupBy: <GroupBy>
timeWindow: <TimeWindow>
partition: <Partition>
Examples
Simple monitor
kind: FieldInList
field: myField
values:
  - value1
  - value2
  - value3
Complex monitor
kind: FieldInList
field: myField
values:
  - value1
  - value2
  - value3
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P365D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
Field Format
The monitor fails if the selected field contains at least one row that does not match the format specified.
Parameters
kind: "FieldFormat" # (REQUIRED) Kind of monitor
field: String # (REQUIRED) Name of the field to monitor
format: # (REQUIRED) Expected format of the field values
  kind: # (REQUIRED) Kind of format to validate.
    "Email" | "Phone" | "UUID" | "Regex"
  # For Regex kind
  regex: String # (REQUIRED) Regex to use for validation
whereStatement: <WhereStatement>
groupBy: <GroupBy>
timeWindow: <TimeWindow>
partition: <Partition>
Examples
Simple monitor
kind: FieldFormat
field: myField
format:
  kind: Email
Complex monitor
kind: FieldFormat
field: myField
format:
  kind: Regex
  regex: ^[a-zA-Z0-9]+$
whereStatement: myColumn != ''
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P30D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P30D
Additional details
- MS SQL is not supported for Regex format.
SQL
The monitor fails if the monitor query returns one or more rows (row count greater than 0).
Parameters
kind: "Sql" # (REQUIRED) Kind of monitor
sql: String # (REQUIRED) SQL query to execute
partition: Partition
Example
kind: Sql
sql: SELECT * FROM SomeTable WHERE COMPLEX_CALCULATION(myColumn) = 42
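Because the monitor fails as soon as the query returns any row, a natural pattern is to select only the rows that violate a rule. The sketch below uses a hypothetical rule (myColumn must not be negative) and writes the query as a YAML block scalar, as in the Custom Metrics complex example:

```yaml
kind: Sql
sql: |
  -- Returns one row per violation; zero rows means the monitor passes
  SELECT *
  FROM SomeTable
  WHERE myColumn < 0
```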