Monitor schema

Schema

kind: Monitor
version: 1
id: UUID                    # (REQUIRED) ID of the monitor
name: String                # (REQUIRED) Name of the monitor
description: String         # (optional - default null) Description for the monitor
tags:
  - id: UUID                # (optional - default null) ID of the tag
    name: String            # (optional - default null) Name of the tag
    kind:                   # (optional - default null) Type of tag
      "Tag" | "Classification"
terms:
  - id: UUID                # (optional - default null) ID of the term
    name: String            # (optional - default null) Name of the term
schedule: String            # (optional - default null) Schedule for monitor execution, null for no schedule
                            # Defined as a cron expression
incident:                   # (REQUIRED) 
  severity:                 # (REQUIRED) Severity of the incident
    "Low" | "Moderate" | "High" | "Critical"
  message: String           # (optional - default null) Custom message to add to the incident and notifications
  createOnFailure: Boolean  # (optional - default true) Whether or not to create an incident on failure
notifications: 
  - kind:                   # (REQUIRED) Kind of notification to send
      "Slack" | "Email" | "MicrosoftTeams"
    id: UUID                # (optional - default null) ID of the notification 
    name: String            # (optional - default null) Name of the Slack Channel, Email address or Microsoft Teams channel
datasets:                   # (REQUIRED) List of datasets to monitor. Most monitors accept only a single dataset
  - id: UUID                # (optional - default null) ID of the dataset
    name: String            # (optional - default null) Name of the dataset
    datasource:             # (optional - default null) Data source of the dataset
      id: UUID              # (optional - default null) ID of the data source
      name: String          # (optional - default null) Name of the data source
parameters: ...             # (REQUIRED) See Parameters
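Before applying a definition, it can help to check the fields marked (REQUIRED) locally. The following Python sketch assumes the YAML has already been loaded into a dict (for example with a YAML library); the function `missing_required_fields` is hypothetical and not part of any Sifflet tool, and Sifflet still performs its own validation when the monitor is applied.

```python
# Hypothetical local check; Sifflet performs its own validation on apply.
SEVERITIES = {"Low", "Moderate", "High", "Critical"}
TOP_LEVEL_REQUIRED = ("id", "name", "incident", "datasets", "parameters")


def missing_required_fields(monitor: dict) -> list:
    """Return the paths of (REQUIRED) fields that are absent, empty, or invalid."""
    missing = [key for key in TOP_LEVEL_REQUIRED if not monitor.get(key)]
    incident = monitor.get("incident") or {}
    if incident.get("severity") not in SEVERITIES:
        missing.append("incident.severity")
    # Every notification entry must declare its kind.
    for i, notification in enumerate(monitor.get("notifications") or []):
        if "kind" not in notification:
            missing.append(f"notifications[{i}].kind")
    return missing


# The "Simple example" from this page, as a parsed dict.
simple = {
    "kind": "Monitor",
    "version": 1,
    "id": "998156a0-efc3-429b-b150-38525bc8f0cd",
    "name": "Uniqueness on customerEmail",
    "schedule": "@daily",
    "incident": {"severity": "Low"},
    "datasets": [{"name": "sales", "datasource": {"name": "mySqlDatabase"}}],
    "parameters": {"kind": "Unique", "field": "customerEmail"},
}
print(missing_required_fields(simple))  # []
```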


Referencing a dataset

You can reference a dataset by its name as long as the name identifies a single dataset. To avoid conflicts, use the dataset ID, or add the data source name or data source ID.

Referencing other Sifflet objects

The monitor definition references many other Sifflet objects, such as tags, classification tags, terms, datasets, Slack channels, Microsoft Teams channels, or email addresses.

To reference an object, you can use its ID. This ensures that applying your monitor links it to the correct object.

tags:
  - id: 20be41bc-9f0e-4a2f-b7d7-e737051589cd
  - id: 6d2d4dfd-8b86-4f7e-871b-7a2fcd29748b

However, retrieving and using these IDs is cumbersome, so you can also use a more natural identifier, such as the name of the object.

tags:
  - name: Production
  - name: Important Asset

Referencing by name works if a matching object exists and the name is unambiguous (no two objects share the same name).

If the name is ambiguous, add another property to distinguish between the objects.

For tags, this can be the kind property, for instance if a classification tag and a regular tag both use the name Production.

tags:
  - name: Production
    kind: Classification
  - name: Important Asset

Finally, using the id always guarantees an unambiguous reference.

You can also use the id with other properties to provide more context:

tags:
  - id: 20be41bc-9f0e-4a2f-b7d7-e737051589cd
    name: Production
    kind: Classification
  - id: 6d2d4dfd-8b86-4f7e-871b-7a2fcd29748b
    name: Important Asset

In this case, if the object is renamed or changes kind, the monitor can still be applied. However, a warning reporting the mismatch is generated.
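The reference rules above can be summarized in a short Python sketch. It is an illustration under assumptions, not Sifflet's implementation: `resolve` and the in-memory `tags` list are hypothetical, and a Python warning stands in for the warning Sifflet reports on a mismatch.

```python
import warnings


def resolve(ref: dict, candidates: list) -> dict:
    """Resolve a reference such as {"name": "Production", "kind": "Classification"}."""
    if "id" in ref:
        # An id is always unambiguous; other properties are only cross-checked.
        match = next((c for c in candidates if c["id"] == ref["id"]), None)
        if match is None:
            raise LookupError(f"no object with id {ref['id']}")
        for key, value in ref.items():
            if key != "id" and match.get(key) != value:
                warnings.warn(f"{key} mismatch: {value!r} in monitor, {match.get(key)!r} in Sifflet")
        return match
    # Without an id, every given property must match exactly one object.
    matches = [c for c in candidates if all(c.get(k) == v for k, v in ref.items())]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} objects match {ref}")
    return matches[0]


# Hypothetical tags: a classification tag and a regular tag share a name.
tags = [
    {"id": "20be41bc-9f0e-4a2f-b7d7-e737051589cd", "name": "Production", "kind": "Classification"},
    {"id": "7edf1177-1a3c-4d71-b85f-e38b773735b4", "name": "Production", "kind": "Tag"},
]
print(resolve({"name": "Production", "kind": "Classification"}, tags)["id"])
# resolve({"name": "Production"}, tags) raises LookupError: the name alone is ambiguous.
```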

For tags

For tags, you can use the id, name, and kind properties.

For instance:

tags:
  - name: To fix
  - id: 7edf1177-1a3c-4d71-b85f-e38b773735b4
  - id: 20be41bc-9f0e-4a2f-b7d7-e737051589cd
    name: Production
    kind: Classification
  - id: 6d2d4dfd-8b86-4f7e-871b-7a2fcd29748b
    name: Important Asset

For terms

For terms, you can use the id and name properties.

For instance:

terms:
  - name: ROI
  - id: 7edf1177-1a3c-4d71-b85f-e38b773735b4
  - id: 20be41bc-9f0e-4a2f-b7d7-e737051589cd
    name: Marketing

For notifications

For notifications, you can use the id, kind, and name properties.

For instance:

notifications:
- kind: Email
  name: [email protected]
- kind: Slack
  name: Alerts
  id: dd6f06ec-fab1-4a87-9544-b113f496d61d
- kind: Slack
  id: 8c2dbfe5-0911-4586-b937-c5f48b7c21d9
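A minimal Python sketch of a local sanity check for notification entries. The accepted kinds come from the schema; the rule that each entry needs at least an id or a name is an assumption (both properties are optional in the schema, but one of them is needed to identify a channel or address). `check_notification` is a hypothetical helper.

```python
NOTIFICATION_KINDS = {"Slack", "Email", "MicrosoftTeams"}


def check_notification(entry: dict) -> None:
    """Raise ValueError if a notification entry cannot identify a target."""
    if entry.get("kind") not in NOTIFICATION_KINDS:
        raise ValueError(f"unknown notification kind: {entry.get('kind')!r}")
    # Assumption: at least one of id or name is needed to find the target.
    if "id" not in entry and "name" not in entry:
        raise ValueError(f"{entry['kind']} notification needs an id or a name")


for entry in [
    {"kind": "Slack", "name": "Alerts", "id": "dd6f06ec-fab1-4a87-9544-b113f496d61d"},
    {"kind": "Slack", "id": "8c2dbfe5-0911-4586-b937-c5f48b7c21d9"},
]:
    check_notification(entry)  # both Slack entries from the example above pass
```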

For datasets

For datasets, you can use the id, name, datasource.id, and datasource.name properties.

For instance:

datasets:
- name: Sales
- name: Prices
  datasource: 
    name: BigQuery Data warehouse
- id: 70217023-1a89-4c0b-9b6a-c85192c918b3
- name: company_employees
  datasource:
    id: ce3e9dd9-b007-42b0-b884-8c419f7f6daa
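The following Python sketch shows how the datasource properties narrow down a dataset name that would otherwise be ambiguous. The catalog, its short IDs (real IDs are UUIDs), and `find_dataset` are hypothetical; unlike the general rules above, this simplified version treats a mismatching property as no match rather than as a warning.

```python
def find_dataset(ref: dict, datasets: list) -> dict:
    """Return the single dataset matching every property given in `ref`."""

    def matches(dataset: dict) -> bool:
        for key in ("id", "name"):
            if key in ref and dataset[key] != ref[key]:
                return False
        # datasource.id / datasource.name narrow the search to one data source.
        source = ref.get("datasource") or {}
        return all(dataset["datasource"].get(k) == v for k, v in source.items())

    found = [dataset for dataset in datasets if matches(dataset)]
    if len(found) != 1:
        raise LookupError(f"{len(found)} datasets match {ref}")
    return found[0]


# Hypothetical catalog: two "Prices" datasets in different data sources.
catalog = [
    {"id": "ds-1", "name": "Prices", "datasource": {"id": "src-1", "name": "BigQuery Data warehouse"}},
    {"id": "ds-2", "name": "Prices", "datasource": {"id": "src-2", "name": "mySqlDatabase"}},
]
ref = {"name": "Prices", "datasource": {"name": "BigQuery Data warehouse"}}
print(find_dataset(ref, catalog)["id"])  # ds-1
```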

Retrieving IDs from the UI

To get the ID of a dataset, go to the dataset page and click the Copy Data Asset ID button.

To get the ID of a data source, go to the data source page and click Copy Source ID.

Examples

Simple example

kind: Monitor
version: 1
id: 998156a0-efc3-429b-b150-38525bc8f0cd
name: Uniqueness on customerEmail
schedule: "@daily"
incident:
  severity: Low
datasets:
- name: sales
  datasource:
    name: mySqlDatabase
parameters:
  kind: Unique
  field: customerEmail

Complex example

kind: Monitor
version: 1
id: 998156a0-efc3-429b-b150-38525bc8f0cd
name: Average Monitor on price
description: The monitor fails if the field's average is outside of a given range.
tags:
- name: Low Cardinality
  kind: Classification
- name: Production
schedule: 5 4 * * *
incident:
  severity: Low
  message: Some message
notifications:
  - kind: Slack
    name: team-data-science
  - kind: Email
    name: [email protected]
  - kind: Email
    id: 41df515f-e5b6-4b4e-b684-d9108ac563bf
    name: [email protected]
datasets:
- id: 5a93a977-df0b-4f17-b4dd-d12ebe21549d
parameters:
  kind: StaticMetrics
  field: price
  aggregation:
    kind: Average
  threshold:
    min: 1000.0
    isMinInclusive: false
  groupBy:
    field: channel
  timeWindow:
    field: time
    duration: P100D
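To make the threshold semantics of the complex example concrete, the Python sketch below averages `price` per `channel` and reports the groups that fail. It assumes that `isMinInclusive: false` means the average must be strictly greater than `min` to pass, and it ignores the time window; `failing_groups` and the sample rows are hypothetical.

```python
from statistics import mean


def failing_groups(rows, field, group_field, minimum, is_min_inclusive=True):
    """Return the groups whose average of `field` is below the minimum."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_field], []).append(row[field])
    failing = []
    for group, values in groups.items():
        average = mean(values)
        # isMinInclusive: false means an average equal to the minimum fails too.
        passes = average >= minimum if is_min_inclusive else average > minimum
        if not passes:
            failing.append(group)
    return failing


rows = [  # hypothetical sample data
    {"channel": "web", "price": 1200.0},
    {"channel": "web", "price": 900.0},     # web average: 1050.0
    {"channel": "store", "price": 1000.0},  # store average: 1000.0
]
print(failing_groups(rows, "price", "channel", 1000.0, is_min_inclusive=False))  # ['store']
```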

Common types

Simple Types

Duration

Uses the ISO 8601 format, restricted to a single date or time element with a non-negative value.

Only the following formats are accepted, where n is a positive number or 0:

  • PnY (year unit)
  • PnM (month unit)
  • PnW (week unit)
  • PnD (day unit)
  • PTnH (hour unit)
  • PTnM (minute unit)
  • PTnS (second unit)

Examples:

  • P1W represents 1 week
  • P2D represents 2 days
  • PT24H represents 24 hours
  • P0D represents 0 days

Note: In most cases, only a subset of units is supported (for instance, only years, months, and days). The supported units are described in the schema of each monitor.
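The accepted Duration formats can be checked with a regular expression. This Python sketch assumes n is a whole number; `parse_duration` is a hypothetical helper. The date and time alternatives are kept separate because M means months before the T and minutes after it.

```python
import re

# One date element (PnY, PnM, PnW, PnD) or one time element (PTnH, PTnM, PTnS).
DURATION_RE = re.compile(r"P(?:([0-9]+)([YMWD])|T([0-9]+)([HMS]))")
DATE_UNITS = {"Y": "years", "M": "months", "W": "weeks", "D": "days"}
TIME_UNITS = {"H": "hours", "M": "minutes", "S": "seconds"}


def parse_duration(text: str) -> tuple:
    """Return (amount, unit), e.g. "P2D" -> (2, "days"); reject other forms."""
    match = DURATION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    date_amount, date_unit, time_amount, time_unit = match.groups()
    if date_unit is not None:
        return int(date_amount), DATE_UNITS[date_unit]
    return int(time_amount), TIME_UNITS[time_unit]


print(parse_duration("PT24H"))  # (24, 'hours')
```

Combined durations such as P1DT2H are rejected on purpose, matching the single-element rule above.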

Date

Uses the ISO 8601 format: YYYY-MM-DD

Examples:

  • 2023-01-03 represents January 3rd, 2023
  • 2023-12-31 represents December 31st, 2023
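A Date value can be validated with the Python standard library. The explicit pattern check is there because, in recent Python versions, `date.fromisoformat` also accepts other ISO 8601 forms (such as 20230103); `parse_date` is a hypothetical helper.

```python
import re
from datetime import date


def parse_date(text: str) -> date:
    """Parse a Date value (YYYY-MM-DD); impossible calendar dates raise ValueError."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text):
        raise ValueError(f"expected YYYY-MM-DD, got {text!r}")
    return date.fromisoformat(text)  # also rejects e.g. 2023-02-30


print(parse_date("2023-01-03"))  # 2023-01-03
```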

UUID

Identifier in the format xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, where each x is a hexadecimal character.

Examples:

  • 20be41bc-9f0e-4a2f-b7d7-e737051589cd
  • 6d2d4dfd-8b86-4f7e-871b-7a2fcd29748b
  • ecd00144-153e-479c-b51b-1273271e33ed
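A UUID in this format can be checked with a regular expression. The Python sketch below uses a pattern rather than `uuid.UUID`, because `uuid.UUID` also accepts forms without hyphens or wrapped in braces; accepting uppercase hexadecimal characters is an assumption. `is_uuid` is a hypothetical helper.

```python
import re

# 8-4-4-4-12 hexadecimal characters separated by hyphens.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def is_uuid(text: str) -> bool:
    """Return True if `text` has exactly the xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx form."""
    return UUID_RE.fullmatch(text) is not None


print(is_uuid("20be41bc-9f0e-4a2f-b7d7-e737051589cd"))  # True
print(is_uuid("20be41bc9f0e4a2fb7d7e737051589cd"))      # False
```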

Monitor specific types

DynamicThreshold

sensitivity:    # (optional - default *Normal*) Sensitivity of the ML model
  "Low" | "Normal" | "High"
bounds:         # (optional - default *LowerAndUpper*) Indicates on which bound to alert
                # *Lower* - Alert only if the value is below the expected range
                # *Upper* - Alert only if the value is above the expected range
                # *LowerAndUpper* - Alert if the value is below or above the expected range
  "Lower" | "Upper" | "LowerAndUpper"

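The effect of `bounds` can be sketched in Python as below. The expected range (`lower`, `upper`) is computed by Sifflet's ML model and depends on `sensitivity`; here it is passed in directly, and treating values equal to a bound as inside the range is an assumption. `should_alert` is a hypothetical helper.

```python
def should_alert(value: float, lower: float, upper: float, bounds: str = "LowerAndUpper") -> bool:
    """Decide whether `value` triggers an alert given the expected range [lower, upper]."""
    below = value < lower
    above = value > upper
    if bounds == "Lower":
        return below  # only values under the expected range alert
    if bounds == "Upper":
        return above  # only values over the expected range alert
    if bounds == "LowerAndUpper":
        return below or above
    raise ValueError(f"unknown bounds: {bounds!r}")


print(should_alert(25.0, 10.0, 20.0, "Lower"))  # False
print(should_alert(25.0, 10.0, 20.0))           # True
```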
WhereStatement

(optional - default null) SQL boolean expression used to filter the data of the input dataset.

null to disable this filtering.

Examples:

  • myColumn > 5
  • category LIKE 'production-%'
  • (myColumn > 5 AND myColumn < 25) OR COMPLEX_FUNCTION(category)

GroupBy

(optional - default null) Field to use for alerting per category.

null to disable categorical analysis.

Examples:

  • myColumn
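To illustrate how a WhereStatement and a GroupBy field relate to the query that is run, here is a hypothetical Python query builder. It is not the SQL that Sifflet generates; `build_query` and the table and metric names are assumptions. The WHERE expression is wrapped in parentheses so that an expression containing OR keeps its meaning if further conditions are added.

```python
from typing import Optional


def build_query(table: str, metric: str, where: Optional[str] = None,
                group_by: Optional[str] = None) -> str:
    """Compose an illustrative query; None disables filtering or grouping."""
    columns = f"{group_by}, {metric}" if group_by else metric
    sql = f"SELECT {columns} FROM {table}"
    if where:
        sql += f" WHERE ({where})"  # WhereStatement filters the input rows
    if group_by:
        sql += f" GROUP BY {group_by}"  # one metric value per category
    return sql


print(build_query("sales", "AVG(price)", where="myColumn > 5", group_by="channel"))
# SELECT channel, AVG(price) FROM sales WHERE (myColumn > 5) GROUP BY channel
```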

TimeWindow

(optional - default null) Time window parameters for the query.

null to disable the time window.

field: String       # (REQUIRED) Field to use for time window
duration: Duration  # (REQUIRED) Length of the time window to use
                    # Allowed duration units: Days, Hours, Minutes
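The Python sketch below computes the start of a TimeWindow, assuming the window ends when the monitor runs and covers rows whose `field` is at or after the returned start. It accepts only the allowed units (days, hours, minutes); `window_start` is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta

# TimeWindow durations may use days (PnD), hours (PTnH) or minutes (PTnM).
WINDOW_RE = re.compile(r"P([0-9]+)D|PT([0-9]+)([HM])")


def window_start(duration: str, now: datetime) -> datetime:
    """Return the start of a window of length `duration` that ends at `now`."""
    match = WINDOW_RE.fullmatch(duration)
    if match is None:
        raise ValueError(f"TimeWindow allows only days, hours or minutes: {duration!r}")
    days, amount, unit = match.groups()
    if days is not None:
        return now - timedelta(days=int(days))
    if unit == "H":
        return now - timedelta(hours=int(amount))
    return now - timedelta(minutes=int(amount))


run_time = datetime(2023, 12, 31, 12, 0)
print(window_start("P100D", run_time))  # 2023-09-22 12:00:00
```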

TimeWindowWithOffsetAndDelta

Same as TimeWindow, with the following properties added:

# Same as TimeWindow, with the following properties added
frequency: Duration           # (REQUIRED) Aggregation size
                              # Can be 1 month, 1 week, 1 day, 1 hour, 30 minutes, 20 minutes, 15 minutes, 10 minutes
                              # *null* to auto-select aggregation between 1 day and 1 hour
offset: Duration              # (optional - default *null*) Delay the query of data by this duration
                              # *null* to disable using an offset
                              # Allowed duration units: Days, Hours
disableDeltaQuerying: Boolean # (optional - default false) Disable the delta querying feature
deltaQuerying: Duration       # (optional - default 1 day) Length of data to re-query at each run
continuousScan: Boolean       # (optional - default false) Enable continuous scan
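A Python sketch of a local check for the frequency and offset values listed above. It assumes each allowed frequency is written as a single-element duration (P1M, P1W, P1D, PT1H, PT30M, PT20M, PT15M, PT10M), so an equivalent spelling such as PT24H is rejected; `check_window_options` is a hypothetical helper.

```python
import re

# Allowed frequencies from the list above, assumed to be written this way.
ALLOWED_FREQUENCIES = {"P1M", "P1W", "P1D", "PT1H", "PT30M", "PT20M", "PT15M", "PT10M"}
OFFSET_RE = re.compile(r"P[0-9]+D|PT[0-9]+H")  # offset: days or hours only


def check_window_options(options: dict) -> None:
    """Raise ValueError if frequency or offset use an unsupported value."""
    frequency = options.get("frequency")  # None means auto-select
    if frequency is not None and frequency not in ALLOWED_FREQUENCIES:
        raise ValueError(f"unsupported frequency: {frequency!r}")
    offset = options.get("offset")  # None means no offset
    if offset is not None and OFFSET_RE.fullmatch(offset) is None:
        raise ValueError(f"offset allows only days or hours: {offset!r}")


check_window_options({"field": "time", "duration": "P7D", "frequency": "P1D", "offset": "PT2H"})
```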

Partition

(optional - default null) If the input dataset is partitioned, the filter to apply on the partition column.

kind:                # (REQUIRED) Kind of partition to use
  "IngestionTime" | "TimeUnitColumn" | "IntegerRange"
# For *IngestionTime* kind
interval: Duration   # (REQUIRED)
# For *TimeUnitColumn* kind
field: String        # (REQUIRED)
interval: Duration   # (REQUIRED)
# For *IntegerRange* kind
field: String        # (REQUIRED)
min: Integer         # (REQUIRED)
max: Integer         # (REQUIRED)

The partition kind and field must match the partitioning of the input dataset.
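The required properties per partition kind can be checked with a small lookup table, as in this Python sketch. The `min <= max` check for IntegerRange is an assumption not stated in the schema; `check_partition` and the `created_at` column are hypothetical.

```python
REQUIRED_BY_KIND = {
    "IngestionTime": ("interval",),
    "TimeUnitColumn": ("field", "interval"),
    "IntegerRange": ("field", "min", "max"),
}


def check_partition(partition: dict) -> None:
    """Raise ValueError if a Partition block lacks the properties its kind needs."""
    kind = partition.get("kind")
    if kind not in REQUIRED_BY_KIND:
        raise ValueError(f"unknown partition kind: {kind!r}")
    missing = [key for key in REQUIRED_BY_KIND[kind] if key not in partition]
    if missing:
        raise ValueError(f"{kind} partition is missing: {', '.join(missing)}")
    # Assumption: an integer range must not be empty.
    if kind == "IntegerRange" and partition["min"] > partition["max"]:
        raise ValueError("min must not be greater than max")


check_partition({"kind": "TimeUnitColumn", "field": "created_at", "interval": "P1D"})
```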

kind: Monitor
version: 1
id: 9699f47e-bb01-4514-0001-451424fe4179  # (REQUIRED) ID of the monitor
name: My monitor name                     # (REQUIRED) Name of the monitor
description: First monitor using CLI      # Description of the monitor
tags:                                     # Tags to apply to the monitor
- name: Production
- name: IMPORTANT
terms:                                    # Business Terms to apply to the monitor
- name: ROI
schedule: 31 * * * *                      # Schedule for the monitor (using CRON format)
incident:
  severity: Low                           # Severity of the incident to create in case of alert
  message: ""                             # Message to add to the incident/alert notification
datasets:                                 # (REQUIRED) Datasets to monitor
- id: 6d8f779d-f6ac-41f7-be74-7237caa76967
parameters:                               # (REQUIRED) Parameters of the monitor
  kind: StaticMetrics                     # (REQUIRED) Kind of monitor to use
  field: price
  aggregation:
    kind: Average
  threshold:
    min: 1000
    isMinInclusive: false


Parameters list and example for every monitor type

You can find the exact list of parameters for every monitor type, with examples, in this section of the documentation.