Parameter list and examples for every monitor type

Volume

The monitor fails if the number of data rows ingested behaves differently than in the past.

Parameters

kind: Volume    # (REQUIRED) Kind of monitor
threshold: <Threshold>
whereStatement: <WhereStatement>
groupBy: <GroupBy>
timeWindow: <TimeWindow>
partition: <Partition>

Examples

Simple monitor

With no options set, the monitor uses the default sensitivity and checks the total number of rows in the table on each run.

kind: Volume

Complex monitor

Incremental Volume monitor that checks the daily number of rows, using 365 days of history for the first run and an offset of 1 day so that only completed days are included.

kind: Volume
threshold:
  kind: Dynamic
  sensitivity: 25
  bounds: Min
whereStatement: metricColumn > 5
groupBy:
  field: Category
timeWindow:
  field: auto
  firstRun: P365D
  offset: P1D
  frequency: P1D

Row-Level (Perfect) Duplicates

The monitor fails if the duplicate rate at a row level behaves differently than it did in the past.

Parameters

kind: "RowDuplicates"   # (REQUIRED) Kind of monitor
threshold: <Threshold>
whereStatement: <WhereStatement>
groupBy: <GroupBy>
timeWindow: <TimeWindow>
partition: <Partition>

Examples

Simple monitor

The monitor can be used without any specific option by using the default sensitivity.

kind: RowDuplicates

Complex monitor

Daily incremental monitor that alerts whenever there is at least 1 row duplicate, uses 365 days of history for the first run, checks the previous 2 days via lookback (deltaQuerying), and applies partitioning (BigQuery only).

kind: RowDuplicates
threshold:
  kind: Static
  max: 0
  isMaxInclusive: false
timeWindow:
  field: auto
  firstRun: P365D
  frequency: P1D
  deltaQuerying: P2D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P30D

Freshness (Update time gap)

Available on the following data sources: BigQuery, Databricks, MySQL, Oracle, Snowflake.

The monitor fails when the duration since the last update deviates from historical norms.

Parameters

kind: MetadataFreshness       # (REQUIRED) Kind of monitor

Examples

Simple monitor

The monitor can be used without any specific option by using the default sensitivity.

kind: MetadataFreshness

Freshness

The monitor fails if the ingestion frequency of new rows behaves differently than it did in the past.

Parameters

kind: Freshness       # (REQUIRED) Kind of monitor
threshold: <Threshold>
whereStatement: <WhereStatement>
groupBy: <GroupBy>
timeWindow: <TimeWindow>
partition: <Partition>

Examples

Simple monitor

The monitor can be used without any specific option by using the default sensitivity.

kind: Freshness

Complex monitor

kind: Freshness
threshold:
  kind: Static
timeWindow:
  field: auto
  firstRun: P365D
  frequency: P1D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D

Schema Change

The monitor fails if the dataset's schema has changed since the previous run.

Parameters

kind: SchemaChange      # (REQUIRED) Kind of monitor
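
No example is listed for this monitor. Since kind is the only parameter documented above, a minimal configuration would presumably consist of that single line (a sketch, not a verified configuration):

```yaml
# Only the documented `kind` parameter is set; no other options are listed for this monitor.
kind: SchemaChange
```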

Metrics

The monitor fails if the field’s aggregation is outside of a given range.

Parameters

kind: Metrics      # (REQUIRED) Kind of monitor
field: String              # (REQUIRED) Name of the field to monitor
aggregation:               # (REQUIRED) Aggregation to apply
  kind:                    # (REQUIRED) Kind of aggregation to use.
    "Average" | "DistinctCount" | "Range" | "Quantile" | "Sum" | "StandardDeviation" | "Variance"
  # For Quantile aggregation
  quantile: Number         # (REQUIRED) Quantile. For instance, 0.5 for the median.
threshold: <Threshold>                 # (REQUIRED) Threshold
whereStatement: <WhereStatement>
groupBy: <GroupBy>
timeWindow: <TimeWindow>
partition: <Partition>

Examples

Simple monitor with Static Threshold

A monitor validating that the average of the field myField is always above 1000.

kind: Metrics
field: myField
aggregation:
  kind: Average
threshold:
  kind: Static
  min: 1000

Complex monitor

kind: Metrics
field: myField
aggregation:
  kind: Quantile
  quantile: 0.5
threshold:
  kind: Static
  min: 1000
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  firstRun: P365D
  frequency: P1D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
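
Both examples above use a static threshold. As a sketch of the dynamic alternative, the following assumes that the Dynamic threshold options shown for the Volume and Custom Metrics monitors on this page (sensitivity, bounds) are accepted by the Metrics monitor as well, which this page does not state explicitly:

```yaml
kind: Metrics
field: myField
aggregation:
  kind: Sum            # any of the documented aggregation kinds could be used here
threshold:
  kind: Dynamic        # assumption: same Dynamic options as in the other monitors
  sensitivity: Low
  bounds: Max
```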

Custom Metrics

A monitor allowing incremental and dynamic monitoring of custom SQL queries.

Parameters

kind: "CustomMetrics"   # (REQUIRED) Kind of monitor
sql: String             # (REQUIRED) SQL of the monitor to execute (see Additional details below)
threshold: <Threshold>
groupBy: <GroupBy>
timeWindow:
  offset: <Duration>    # (optional - default *null*) Delay the query of data by this duration
                        # *null* to disable using an offset
                        # Allowed duration units: Days, Hours
partition: <Partition>

Examples

Simple monitor

A monitor validating that the distinct count of myField behaves as it did in the past (detection uses the default sensitivity).

kind: CustomMetrics
sql: SELECT myField FROM SomeTable

Complex monitor

kind: CustomMetrics
sql: |
  SELECT
    myField = column1 * 100,
    time = COMPUTETIME(column2, column3),
    groupByField
  FROM SomeTable
threshold:
  kind: Dynamic
  sensitivity: Low
  bounds: Max
timeWindow:
  offset: PT1H
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P30D

Additional details

  • The SQL query should return at least one field, as follows:
    • The metric value: a numerical field.
    • The metric timing (optional): a date/timestamp field. If it is not present in the query, the metric is calculated over time by taking a snapshot at every run.
    • The monitoring dimensions (optional): a categorical column that allows multi-dimensional monitoring. If added, it must be an existing field in the table/view schema and must not be referred to by an alias.
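
As an illustration of the snapshot case described above, the following sketch returns only a metric value and no date/timestamp field. It assumes that the query is expected to aggregate to a single numerical value itself; the table and field names are the placeholders used elsewhere on this page:

```yaml
kind: CustomMetrics
# Assumption: the query itself reduces the data to one numerical metric value.
# No date/timestamp field is returned, so the metric would be tracked via snapshots at every run.
sql: |
  SELECT COUNT(DISTINCT myField)
  FROM SomeTable
```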

Nulls

The monitor fails if the null values of a field meet a threshold criterion.

Parameters

kind: "FieldNulls"  # (REQUIRED) Kind of monitor
field: String                 # (REQUIRED) Name of the field to monitor
threshold: <Threshold>                  # (REQUIRED) Threshold
  valueMode: String                 # (optional - default Count) "Count" or "Percentage"
whereStatement: <WhereStatement>
groupBy: <GroupBy>
timeWindow: <TimeWindow>
partition: <Partition>

Examples

Simple monitor

Triggers when at least one null value is detected in the field.

kind: FieldNulls
field: myField

Complex monitor

Percentage threshold with daily incremental mode turned on.

kind: FieldNulls
field: ACCOUNT_NAME
threshold:
  kind: Static
  valueMode: Percentage
  max: 0%
timeWindow:
  field: auto
  firstRun: P365D
  frequency: P1D
whereStatement: myColumn = 5
groupBy: groupByField
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
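
The valueMode parameter defaults to Count, so a threshold on the absolute number of nulls can be sketched as follows. The limit of 10 is an arbitrary illustrative value, not a recommendation:

```yaml
kind: FieldNulls
field: myField
threshold:
  kind: Static
  valueMode: Count     # the documented default, written out for clarity
  max: 10              # illustrative value
```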

Duplicates

The monitor fails if the duplicate values of a field meet a threshold criterion.

Parameters

kind: "FieldDuplicates"  # (REQUIRED) Kind of monitor
field: String                  # (REQUIRED) Name of the field to monitor
threshold: <Threshold>                  # (REQUIRED) Threshold
  valueMode: String                 # (optional - default Count) "Count" or "Percentage"
whereStatement: <WhereStatement>
groupBy: <GroupBy>
timeWindow: <TimeWindow>
partition: <Partition>

Examples

Simple monitor

kind: FieldDuplicates
field: myField

Complex monitor

kind: FieldDuplicates
field: ACCOUNT_NAME
threshold:
  kind: Static
  valueMode: Percentage
  max: 0%
timeWindow:
  field: auto
  firstRun: P365D
  frequency: P1D
whereStatement: myColumn = 5
groupBy: groupByField
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D

Distribution change

The monitor fails if the distribution of a given field has changed significantly compared to a fixed or rolling reference date.

Parameters

kind: "Distribution"          # (REQUIRED) Kind of monitor
field: String                 # (REQUIRED) Name of the field to monitor
threshold: ...                # (optional - default *Dynamic*) Threshold to use for detection
  # Can be either *Static* or *Dynamic*
  # *Static* threshold:
  kind: "Static"
  max: Number                 # (REQUIRED) Percentage (between 0 and 100) of allowed distribution change
  onAddedCategory: Boolean    # (optional - default *true*) Fail if a new category appeared since the last snapshot
  onRemovedCategory: Boolean  # (optional - default *false*) Fail if a category disappeared since the last snapshot
  # *Dynamic* threshold:
  kind: "Dynamic"
  sensitivity:                # (optional - default *Normal*) Sensitivity for the detection 
    "Low" | "Normal" | "High"
reference:                    # (optional - default *Rolling*) Time Reference for distribution comparison
  # Can be either Fixed or Rolling
  # Fixed reference
  kind: Fixed
  timestamp: Date             # (REQUIRED) Reference date to use for distribution
  # Rolling reference
  kind: Rolling
  delay: Duration             # (optional - default *1 day*) Delay between the reference snapshot and the new snapshot
                              # Allowed formats: PnD
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow:                   # See common parameter elements - Time window
  field: String
  duration: Duration          # Duration
                              # Allowed units: *Days*
  offset: Duration            # Offset
partition: ...                # See common parameter elements - Partition

Examples

Simple monitor

Checks the distribution against the previous day using a dynamic threshold with default sensitivity.

kind: Distribution
field: myField

Complex monitor with Static Threshold and Fixed reference

kind: Distribution
field: myField
threshold:
  kind: Static
  percentage: 0.5
  onAddedCategory: false
  onRemovedCategory: true
reference:
  kind: Fixed
  timestamp: 2023-07-09
timeWindow:
  field: timeWindowField
  duration: P365D
  offset: PT3H
whereStatement: myColumn = 5
groupBy: groupByField
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D

Complex monitor with Dynamic Threshold and Rolling reference

kind: Distribution
field: myField
threshold:
  kind: Dynamic
  sensitivity: Low
reference:
  kind: Rolling
  delay: P4D
timeWindow:
  field: timeWindowField
  duration: P365D
  offset: PT3H
whereStatement: myColumn = 5
groupBy: groupByField
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D

Field in List

The monitor fails if the selected field has values that are not in the given list.

Parameters

kind: "FieldInList"        # (REQUIRED) Kind of monitor
field: String              # (REQUIRED) Name of the field to monitor
values:                    # (REQUIRED) Allowed values
  - String
  - ...
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindow
partition: Partition

Examples

Simple monitor

kind: FieldInList
field: myField
values:
  - value1
  - value2
  - value3

Complex monitor

kind: FieldInList
field: myField
values:
  - value1
  - value2
  - value3
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P365D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D

Field Format

The monitor fails if the selected field contains at least one row that does not match the format specified.

Parameters

kind: "FieldFormat"        # (REQUIRED) Kind of monitor
field: String              # (REQUIRED) Name of the field to monitor
format:                    # (REQUIRED) Expected format of the field values
  kind:                    # (REQUIRED) Kind of format to validate.
    "Email" | "Phone" | "UUID" | "Regex"
  # For Regex kind
  regex: String            # (REQUIRED) Regex to use for validation
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindow
partition: Partition

Examples

Simple monitor

kind: FieldFormat
field: myField
format:
  kind: Email

Complex monitor

kind: FieldFormat
field: myField
format:
  kind: Regex
  regex: ^[a-zA-Z0-9]+$
whereStatement: myColumn != ''
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P30D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P30D
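
The regex parameter is documented only for the Regex kind. For the built-in kinds, format.kind alone should be sufficient, as in the Email example above. A sketch with UUID:

```yaml
kind: FieldFormat
field: myField
format:
  kind: UUID     # built-in kind; the regex parameter applies only to the Regex kind
```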

Additional details

  • MS SQL is not supported for Regex format.

SQL

The monitor fails if the monitor query returns at least one row.

Parameters

kind: "Sql"        # (REQUIRED) Kind of monitor
sql: String        # (REQUIRED) SQL query to execute
partition: Partition

Example

kind: Sql
sql: SELECT * FROM SomeTable WHERE COMPLEX_CALCULATION(myColumn) = 42
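
The partition parameter is listed above but not illustrated. The following sketch combines it with a multi-line query. The table, column, and partition field names are placeholders, and the partition block simply mirrors the structure used by the other monitors on this page:

```yaml
kind: Sql
# The monitor fails if this query returns any rows.
sql: |
  SELECT *
  FROM SomeTable
  WHERE myColumn IS NULL
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P30D
```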