Parameters list and example for every monitor type WIP

Volume

The monitor fails if the number of data rows ingested behaves differently than in the past.

Parameters

kind: "Completeness"    # (REQUIRED) Kind of monitor
threshold: DynamicThreshold
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindowWithOffsetAndDelta
partition: Partition

Examples

Simple monitor

The monitor can be used without any specific option by using the default sensitivity.

kind: Completeness

Complex monitor

kind: Completeness
threshold:
  sensitivity: Low
  bounds: MinAndMax
whereStatement: metricColumn > 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P30D
  offset: PT1H
  deltaQuerying: P2D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P30D

Duplicates

The monitor fails if the duplicate rate at a row level behaves differently than it did in the past.

Parameters

kind: "Duplicates"      # (REQUIRED) Kind of monitor
threshold: DynamicThreshold
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindowWithOffsetAndDelta
partition: Partition

Examples

Simple monitor

The monitor can be used without any specific option by using the default sensitivity.

kind: Duplicates

Complex monitor

kind: Duplicates
threshold:
  sensitivity: Low
  bounds: MinAndMax
whereStatement: metricColumn > 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P30D
  offset: PT1H
  deltaQuerying: P2D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P30D

Freshness (Update time gap)

Available on the following data sources: BigQuery, Databricks, MySQL, Oracle, Snowflake.

The monitor fails when the duration since the last update deviates from historical norms.

Parameters

kind: MetadataFreshness       # (REQUIRED) Kind of monitor

Examples

Simple monitor

The monitor can be used without any specific option by using the default sensitivity.

kind: MetadataFreshness

Complex monitor

kind: MetadataFreshness
threshold:
  kind: Dynamic
  sensitivity: Low

Freshness

The monitor fails if the ingestion frequency of new rows behaves differently than it did in the past.

Parameters

kind: "Freshness"       # (REQUIRED) Kind of monitor
threshold:              # (optional - default Static) Threshold to use for detection
  # Can be either Static or Dynamic
  # Static threshold:
  kind: "Static"
  # Dynamic threshold:
  kind: "Dynamic"
  sensitivity:          # (optional - default *Normal*) Sensitivity for the detection 
    "Low" | "Normal" | "High"
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindowWithOffsetAndDelta
partition: Partition

Examples

Simple monitor

The monitor can be used without any specific option by using the default sensitivity.

kind: Freshness

Complex monitor

kind: Freshness
threshold:
  kind: Dynamic
  sensitivity: Low
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P365D
  offset: PT3H
  deltaQuerying: P3D
  frequency: P1D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D

Schema Change

The monitor fails if the dataset's schema has changed since the previous run.

Parameters

kind: SchemaChange      # (REQUIRED) Kind of monitor
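
Example

Unlike the other monitor types, this section has no example. Assuming no parameter other than the required kind is needed (none is listed above), a minimal configuration would be:

```yaml
# Minimal Schema Change monitor: only the required parameter is set
kind: SchemaChange
```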

Static Metrics

The monitor fails if the field’s aggregation is outside of a given range.

Parameters

kind: "StaticMetrics"      # (REQUIRED) Kind of monitor
field: String              # (REQUIRED) Name of the field to monitor
aggregation:               # (REQUIRED) Aggregation to apply
  kind:                    # (REQUIRED) Kind of aggregation to use.
    "Average" | "DistinctCount" | "Range" | "Quantile" | "Sum" | "StandardDeviation" | "Variance"
  # For Quantile aggregation
  quantile: Number         # (REQUIRED) Quantile. For instance, 0.5 for the median.
threshold:                 # (REQUIRED) Threshold
  isMinInclusive: Boolean  # (optional - default *true*) Inclusive minimum
  min: Number              # (optional - default *null*) Expected minimum for the aggregated value
                           # null to disable detection on lower bound
  isMaxInclusive: Boolean  # (optional - default *true*) Inclusive maximum
  max: Number              # (optional - default *null*) Expected maximum for the aggregated value
                           # *null* to disable detection on upper bound
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindow
partition: Partition

Examples

Simple monitor

A monitor validating that the average of the field myField is always above 1000.

kind: StaticMetrics
field: myField
aggregation:
  kind: Average
threshold:
  min: 1000

Complex monitor

kind: StaticMetrics
field: myField
aggregation:
  kind: Quantile
  quantile: 0.5
threshold:
  isMinInclusive: false
  min: 1000
  isMaxInclusive: false
  max: 2000
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P365D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D

Additional details

  • One of threshold.min or threshold.max must be set to a non-null value
  • threshold.min must be lower than threshold.max if both are set
  • Sum aggregation is not supported for Firebolt datasource
  • Quantile aggregation is not supported for Hive datasource
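
The rules above allow a one-sided range, since only one of the two bounds has to be set. As an illustrative sketch (the field name and the value 5000 are placeholders, not recommended settings), a monitor with an upper bound only could look like this:

```yaml
# Sketch: fail when the sum of myField goes above 5000
kind: StaticMetrics
field: myField
aggregation:
  kind: Sum            # per the list above, not supported for the Firebolt datasource
threshold:
  max: 5000            # upper bound; isMaxInclusive keeps its default (true)
  # min is omitted: its default is null, which disables detection on the lower bound
```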

Dynamic Metrics

The monitor fails if the selected field values (or their statistical transformations) behave differently than they did in the past.

Parameters

kind: "DynamicMetrics"  # (REQUIRED) Kind of monitor
field: String           # (REQUIRED) Name of the field to monitor
aggregation:            # (REQUIRED) Aggregation to apply
  kind:                 # (REQUIRED) Kind of aggregation to use.
    "Average" | "NormalizedAverage" | "DistinctCount" | "Min" | "Max" | "Quantile" | "Sum" | "StandardDeviation" | "Variance"
  # For Quantile aggregation
  quantile: Number      # (REQUIRED) Quantile. For instance, 0.5 for the median.
threshold: DynamicThreshold
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindowWithOffsetAndDelta
partition: Partition

Examples

Simple monitor

A monitor validating that the distinct count of myField behaves similarly to the past (detection is done with default sensitivity).

kind: DynamicMetrics
field: myField
aggregation:
  kind: DistinctCount

Complex monitor

kind: DynamicMetrics
field: myField
aggregation:
  kind: NormalizedAverage
threshold:
  sensitivity: Low
  bounds: Max
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: auto
  duration: P365D
  offset: PT3H
  deltaQuerying: P2D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
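
Neither example above uses the Quantile aggregation, which requires the additional quantile parameter. A minimal sketch, assuming the 90th percentile is the statistic of interest (0.9 is an arbitrary illustrative value):

```yaml
# Sketch: track the 90th percentile of myField with the default threshold settings
kind: DynamicMetrics
field: myField
aggregation:
  kind: Quantile
  quantile: 0.9        # required when kind is Quantile; 0.5 would be the median
```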

Custom Metrics

Description

Parameters

kind: "CustomMetrics"   # (REQUIRED) Kind of monitor
sql: String             # (REQUIRED) SQL of the monitor to execute (see Additional details below)
threshold: DynamicThreshold
groupBy: GroupBy
timeWindow:
  offset: Duration      # (optional - default *null*) Delay the query of data by this duration
                        # *null* to disable using an offset
                        # Allowed duration units: Days, Hours
partition: Partition

Examples

Simple monitor

A monitor validating that the values of myField returned by the query behave similarly to the past (detection is done with default sensitivity).

kind: CustomMetrics
sql: SELECT myField FROM SomeTable

Complex monitor

kind: CustomMetrics
sql: |
  SELECT 
    myField = column1 * 100, 
    time = COMPUTETIME(column2, column3), 
    groupByField 
  FROM SomeTable
threshold:
  sensitivity: Low
  bounds: Max
groupBy: groupByField
timeWindow:
  offset: PT1H
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P30D

Additional details

  • The SQL query should return at least one field, as follows:
    • The metric value: this should be a numerical field.
    • The metric timing (optional): this should be a date/timestamp field. If it is not present in the query, the metric will be calculated over time by taking a snapshot at every run.
    • The monitoring dimensions (optional): this can be a categorical column to allow for multi-dimensional monitoring. If added, this should be an existing field in the table/view schema and must not be referred to with an alias.
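
To make the field roles above concrete, here is a hedged sketch of a query that returns a metric value and a dimension but no timing field, so the metric would be built from snapshots at every run. SomeTable, column1 and groupByField are the placeholder names used in the examples above; the standard AS alias syntax is an assumption and may need adapting to the datasource's SQL dialect.

```yaml
# Sketch: metric value + dimension, no timing field (snapshot at every run)
kind: CustomMetrics
sql: |
  SELECT
    column1 * 100 AS myField,   -- metric value: numerical
    groupByField                -- dimension: existing column, not aliased
  FROM SomeTable
groupBy: groupByField
```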

Static Field Profiling

The monitor fails if the null or duplicate value rate of the selected field is higher than a given threshold.

Parameters

kind: "StaticFieldProfiling"  # (REQUIRED) Kind of monitor
field: String                 # (REQUIRED) Name of the field to monitor
profiling:                    # (REQUIRED) Profiling to execute
  kind:                       # (REQUIRED) Kind of profiling to use.
    "NullCount" | "NullPercentage" | "DuplicateCount" | "DuplicatePercentage"
  nullValues:                 # (OPTIONAL - default Null) Null values to check in case of NullCount or NullPercentage field profiling
    "Null" | "NullAndEmpty" | "NullEmptyAndWhitespaces"
threshold:                    # (REQUIRED) Threshold
  max: Number                 # (optional - default 0) Expected maximum occurrences or percentage of nulls or duplicates
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindow
partition: Partition

Examples

Simple monitor

kind: StaticFieldProfiling
field: myField
profiling:
  kind: NullCount

Complex monitor

kind: StaticFieldProfiling
field: myField
profiling:
  kind: NullPercentage
  nullValues: NullAndEmpty
threshold:
  max: 32.2
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P365D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D

Additional details

  • DuplicateCount is not supported yet
  • Only threshold.max = 0 is supported currently for NullCount
  • threshold.max must be between 0 and 100 for NullPercentage or DuplicatePercentage
  • threshold.max must be greater than or equal to 0 in all cases
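
As an illustration of the percentage constraint above, a duplicate-rate monitor might be configured as follows. The 5 percent limit is an arbitrary placeholder value.

```yaml
# Sketch: fail when more than 5% of myField values are duplicates
kind: StaticFieldProfiling
field: myField
profiling:
  kind: DuplicatePercentage
threshold:
  max: 5               # a percentage, so it must stay between 0 and 100
```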

Dynamic Field Profiling

The monitor fails if the null or duplicate value rate of the selected field behaves differently than it did in the past.

Parameters

kind: "DynamicFieldProfiling"  # (REQUIRED) Kind of monitor
field: String                  # (REQUIRED) Name of the field to monitor
fieldProfiling:                # (REQUIRED) Profiling to execute
  kind:                        # (REQUIRED) Kind of profiling to use.
    "NullCount" | "NullPercentage" | "DuplicateCount" | "DuplicatePercentage"
  nullValues:                 # (OPTIONAL - default Null) Null values to check in case of NullCount or NullPercentage field profiling
    "Null" | "NullAndEmpty" | "NullEmptyAndWhitespaces"
threshold: DynamicThreshold
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindowWithOffsetAndDelta
partition: Partition

Examples

Simple monitor

kind: DynamicFieldProfiling
field: myField
fieldProfiling:
  kind: NullCount

Complex monitor

kind: DynamicFieldProfiling
field: myField
fieldProfiling:
  kind: DuplicateCount
threshold:
  sensitivity: Low
  bounds: Max
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P365D
  deltaQuerying: P2D
  offset: PT3H
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D
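
The examples above do not show the nullValues option. A minimal sketch, assuming empty and whitespace-only strings should be counted as nulls:

```yaml
# Sketch: track the null rate of myField, counting empty and whitespace-only values as null
kind: DynamicFieldProfiling
field: myField
fieldProfiling:
  kind: NullPercentage
  nullValues: NullEmptyAndWhitespaces   # only relevant for NullCount or NullPercentage
```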

Distribution change

The monitor fails if the distribution of a given field has changed significantly compared to a fixed or rolling reference date.

Parameters

kind: "Distribution"          # (REQUIRED) Kind of monitor
field: String                 # (REQUIRED) Name of the field to monitor
threshold: ...                # (optional - default *Dynamic*) Threshold to use for detection
  # Can be either *Static* or *Dynamic*
  # *Static* threshold:
  kind: "Static"
  max: Number                 # (REQUIRED) Percentage, between 0 and 100 of allowed distribution change
  onAddedCategory: Boolean    # (optional - default *true*) Fail if a new category appeared since the last snapshot
  onRemovedCategory: Boolean  # (optional - default *false*) Fail if a category disappeared since the last snapshot
  # *Dynamic* threshold:
  kind: "Dynamic"
  sensitivity:                # (optional - default *Normal*) Sensitivity for the detection 
    "Low" | "Normal" | "High"
reference:                    # (optional - default *Rolling*) Time Reference for distribution comparison
  # Can be either Fixed or Rolling
  # Fixed reference
  kind: Fixed
  timestamp: Date             # (REQUIRED) Reference date to use for distribution
  # Rolling reference
  kind: Rolling
  delay: Duration             # (optional - default *1 day*) Delay between the reference snapshot and the new snapshot
                              # Allowed formats: PnD
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow:                   # See common parameter elements - Time window
  field: String
  duration: Duration          # Duration
                              # Allowed units: *Days*
  offset: Duration            # Offset
partition: ...                # See common parameter elements - Partition

Examples

Simple monitor

Check the distribution compared to the previous day using dynamic threshold with default sensitivity.

kind: Distribution
field: myField

Complex with Static Threshold and Fixed reference

kind: Distribution
field: myField
threshold:
  kind: Static
  max: 0.5
  onAddedCategory: false
  onRemovedCategory: true
reference:
  kind: Fixed
  timestamp: 2023-07-09
timeWindow:
  field: timeWindowField
  duration: P365D
  offset: PT3H
whereStatement: myColumn = 5
groupBy: groupByField
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D

Complex with Dynamic Threshold and Rolling reference

kind: Distribution
field: myField
threshold:
  kind: Dynamic
  sensitivity: Low
reference:
  kind: Rolling
  delay: P4D
timeWindow:
  field: timeWindowField
  duration: P365D
  offset: PT3H
whereStatement: myColumn = 5
groupBy: groupByField
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D

Field in List

The monitor fails if the selected field has values that are not in the given list.

Parameters

kind: "FieldInList"        # (REQUIRED) Kind of monitor
field: String              # (REQUIRED) Name of the field to monitor
values:                    # (REQUIRED) Allowed values
  - String
  - ...
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindow
partition: Partition

Examples

Simple monitor

kind: FieldInList
field: myField
values:
  - value1
  - value2
  - value3

Complex monitor

kind: FieldInList
field: myField
values:
  - value1
  - value2
  - value3
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P365D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D

Field Uniqueness

Parameters

kind: "FieldUniqueness"    # (REQUIRED) Kind of monitor
field: String              # (REQUIRED) Name of the field to monitor
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindow
partition: Partition

Examples

Simple monitor

kind: FieldUniqueness
field: myField

Complex monitor

kind: FieldUniqueness
field: myField
whereStatement: myColumn = 5
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: PT10M
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P365D

Field Format

The monitor fails if the selected field contains at least one row that does not match the format specified.

Parameters

kind: "FieldFormat"        # (REQUIRED) Kind of monitor
field: String              # (REQUIRED) Name of the field to monitor
format:                    # (REQUIRED) Expected format of the field values
  kind:                    # (REQUIRED) Kind of format to validate.
    "Email" | "Phone" | "UUID" | "Regex"
  # For Regex kind
  regex: String            # (REQUIRED) Regex to use for validation
whereStatement: WhereStatement
groupBy: GroupBy
timeWindow: TimeWindow
partition: Partition

Examples

Simple monitor

kind: FieldFormat
field: myField
format:
  kind: Email

Complex monitor

kind: FieldFormat
field: myField
format:
  kind: Regex
  regex: ^[a-zA-Z0-9]+$
whereStatement: myColumn != ''
groupBy: groupByField
timeWindow:
  field: timeWindowField
  duration: P30D
partition:
  field: partitionTimeField
  kind: TimeUnitColumn
  interval: P30D

Additional details

  • MS SQL is not supported for Regex format.
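
Regular expressions often contain characters that YAML treats specially (a leading [ or {, or a # preceded by a space), so quoting the pattern is a safe habit. A hedged sketch with a made-up reference-code format; the regex syntax actually accepted is assumed to depend on the datasource:

```yaml
# Sketch: values must look like two capital letters, a dash, then six digits (e.g. AB-123456)
kind: FieldFormat
field: myField
format:
  kind: Regex
  regex: '^[A-Z]{2}-[0-9]{6}$'   # single quotes make YAML read the pattern as a literal string
```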

SQL

The monitor fails if the row count returned by the monitor query is greater than 0.

Parameters

kind: "Sql"        # (REQUIRED) Kind of monitor
sql: String        # (REQUIRED) SQL query to execute
partition: Partition

Example

kind: Sql
sql: SELECT * FROM SomeTable WHERE COMPLEX_CALCULATION(myColumn) = 42
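
Longer queries are easier to read as a YAML block scalar, as in the Custom Metrics example. A sketch, reusing the placeholder names SomeTable and myColumn, of a monitor that fails as soon as one row has a NULL myColumn:

```yaml
# Sketch: the query selects the offending rows; any returned row makes the monitor fail
kind: Sql
sql: |
  SELECT *
  FROM SomeTable
  WHERE myColumn IS NULL
```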