📘
Use Monitor Version 2
Version 1 is now deprecated

Schema

kind: Monitor
version: 2
id: UUID                    # (friendlyId or id REQUIRED) ID of the monitor
friendlyId: String          # (friendlyId or id REQUIRED) friendly string to identify the monitor, unique per dataset
extends: 									  # (optional - default []) List of templates to include in the definition of the monitor. See Templates
- String
name: String                # (REQUIRED) Name of the monitor
description: String         # (optional - default null) Description for the monitor
tags:
- id: UUID                  # (optional - default null) ID of the tag
  name: String              # (optional - default null) Name of the tag
  kind:                     # (optional - default null) Type of tag
    "Tag" | "Classification"
terms:
- id: UUID                  # (optional - default null) ID of the term
  name: String              # (optional - default null) Name of the term
schedule: String            # (optional - default null) Schedule for monitor execution. null for no schedule 
                            # Defined as @hourly/@daily/@weekly/@monthly/@yearly (default to midnight UTC) or CRON expression.
scheduleTimezone: String    # (optional - default null) Schedule Time Zone, i.e. Europe/Paris
incident:                   # (REQUIRED) 
  severity:                 # (REQUIRED) Severity of the incident
    "Low" | "Moderate" | "High" | "Critical"
  message: String           # (optional - default null) Custom message to add to the incident and notifications
  createOnFailure: Boolean  # (optional - default true) Whether or not to create an incident on failure
notifications: 
- kind:                     # (REQUIRED) Kind of notification to send
    "Slack" | "Email" | "MicrosoftTeams" | "Jira"
  id: UUID                  # (optional - default null) ID of the notification 
  name: String              # (optional - default null) Name of the Slack Channel, Email address or Microsoft Teams channel
  projectKey: String        # (REQUIRED) Key of the Jira project
  issueTypeId: Integer      # (REQUIRED) ID of the Jira issue type
datasets:                   # (REQUIRED) List of datasets to monitor. Most monitors can only have a single dataset
- id: UUID                  # (optional - default null) ID of the dataset
  name: String              # (optional - default null) Name of the dataset
  datasource:
    id: UUID                # (optional - default null) ID of the dataset source
    name: String            # (optional - default null) Name of the dataset
  uri: String               # (optional - default null) URI of the dataset
parameters: ...             # (REQUIRED) See Parameters

📘
Referencing a dataset
Users can use the name of the dataset in case there are no conflicts identifying the dataset. Users can rely on the ID of the dataset, the data source name, or the data source ID to prevent any conflict.

Identifying Monitors

There are two options to identify a monitor and to ensure any changes to the monitor will not overwrite it and keep all run history.

With `id`

kind: Monitor  
id: 7edf1177-1a3c-4d71-b85f-e38b773735b4

id needs to be a completely unique UUID

With `friendlyId`

Coming up with UUIDs is sometimes not appropriate, friendlyId allows for an alternative. friendlyId Needs to be unique per dataset, this means a dataset cannot have two monitors with the same friendlyId.

kind: Monitor
version: 2
friendlyId: customerEmailUnique
datasets:
- name: sales
  datasource:
    name: mySqlDatabase
...

The above monitor has a friendlyId customerEmailUnique , only one of those monitors can be added to the sales table in our mysqlDatabase.

Referencing other Sifflet objects

The monitor definition references many other Sifflet objects such as: tags, classification tags, terms, datasets, slack channels, teams channels or emails.

To do so, you can use the ID of the object. This will ensure that applying your monitor will link to the correct object.

tags:
- id: 20be41bc-9f0e-4a2f-b7d7-e737051589cd
- id: 6d2d4dfd-8b86-4f7e-871b-7a2fcd29748b

However, this process of retrieving and using these IDs is very cumbersome. So you can also use more natural identifier like the name of the object.

tags:
- name: Production
- name: Important Asset

Referencing by name will work if there is a matching object and if there is no ambiguity (there are not 2 objects with the same name).

In case of ambiguity with the name, you need to use another property to distinguish the different objects.

For instance, for the tags, it can be kind property (if a classification tag and a classical tag are both using the name Production).

tags:
- name: Production
  kind: Classification
- name: Important Asset

Finally, using the id will always guarantee non-ambiguious reference.

You can also use the id with other properties to provide more context:

tags:
- id: 20be41bc-9f0e-4a2f-b7d7-e737051589cd
  name: Production
  kind: Classification
- id: 6d2d4dfd-8b86-4f7e-871b-7a2fcd29748b
  name: Important Asset

In this case, if the object is renamed or changes kind, the monitor can still be applied. However a Warning will be generated with the mismatch.

For tags

For tags, you can use the id, the name and the kind properties.

For instance:

tags:
- name: To fix
- id: 7edf1177-1a3c-4d71-b85f-e38b773735b4
- id: 20be41bc-9f0e-4a2f-b7d7-e737051589cd
  name: Production
  kind: Classification
- id: 6d2d4dfd-8b86-4f7e-871b-7a2fcd29748b
  name: Important Asset

For terms

For terms, you can use the id and the name properties.

For instance:

terms:
- name: ROI
- id: 7edf1177-1a3c-4d71-b85f-e38b773735b4
- id: 20be41bc-9f0e-4a2f-b7d7-e737051589cd
  name: Marketing

For notifications

Email, Slack, Microsoft Teams

For Email, Slack, and Microsoft Teams notifications, you can use the id, the kind and the name properties.

For instance:

notifications:
- kind: Email
  name: [email protected]
- kind: Slack
  name: Alerts
  id: dd6f06ec-fab1-4a87-9544-b113f496d61d
- kind: Slack
  id: 8c2dbfe5-0911-4586-b937-c5f48b7c21d9

Jira

For Jira notifications, you can use the kind, the projectKey, and the issueTypeId properties.

For instance:

notifications:
- kind: Jira
  projectKey: DATAENG
  issueTypeId: 10022

Finding the Issue Type ID

The ID of a Jira issue type can be found in the URL when editing your issue type. The issue type ID should be located right after the issuetypes/ string in the URL.

More details on how to find the issue type ID in Jira Cloud can be found here.

For datasets

For datasets, you can use the id, the name the datasource.id and datasource.name or uri properties.

For instance:

datasets:
- name: Sales
- name: Prices
  datasource: 
    name: BigQuery Data warehouse
- id: 70217023-1a89-4c0b-9b6a-c85192c918b3
- name: company_employees
  datasource:
    id: ce3e9dd9-b007-42b0-b884-8c419f7f6daa
- uri: snowflake://xyz12345.eu-central-1/DATABASE.SCHEMA.TABLE

With `uri`

URIs helps referencing assets in a unique way without depending on any Sifflet names or IDs. Find out more about URIs

- uri: bigquery:sifflet-demo-project.sandbox_dataset.cbsa_2008_1yr
- uri: snowflake://xyz12345.eu-central-1/DATABASE.SCHEMA.TABLE

Retrieving IDs from the UI

To get the ID of a dataset, go to the dataset page and click on Copy Data Asset ID button of :

You can get a data source ID from the data source page and when clicking on Copy Source ID

📘
URIs
You can also retrieve URIs from the Catalog's Asset Page.

Examples

Simple example

kind: Monitor
version: 2
id: 998156a0-efc3-429b-b150-38525bc8f0cd
name: Uniqueness on customerEmail
schedule: "@daily"
incident:
  severity: Low
datasets:
- name: sales
  datasource:
    name: mySqlDatabase
parameters:
  kind: FieldDuplicates
  field: customerEmail

Complex example

kind: Monitor
version: 2
id: 998156a0-efc3-429b-b150-38525bc8f0cd
name: Average Monitor on price
description: The monitor fails if the field's average is outside of a given range.
tags:
- name: Low Cardinality
  kind: Classification
- name: Production
schedule: 5 4 * * *
incident:
  severity: Low
  message: Some message
notifications:
  - kind: Slack
    name: team-data-science
  - kind: Email
    name: [email protected]
  - kind: Email
    id: 41df515f-e5b6-4b4e-b684-d9108ac563bf
    name: [email protected]
datasets:
- id: 5a93a977-df0b-4f17-b4dd-d12ebe21549d
parameters:
  kind: Metrics
  field: price
  aggregation:
    kind: Average
  threshold:
  	kind: Static
    min: 1000.0
    isMinInclusive: false
  groupBy:
    field: channel
  timeWindow:
    field: time
    firstRun: P100D

Multiple Monitors on the same file

kind: Monitor
version: 2
friendlyId: customerEmailUnique
name: Uniqueness on customerEmail
schedule: "@daily"
incident:
  severity: Low
datasets:
- name: sales
  datasource:
    name: mySqlDatabase
parameters:
  kind: FieldDuplicates
  field: customerEmail

---

kind: Monitor
version: 2
friendlyId: customerIdUnique
name: Uniqueness on customerEmail
schedule: "@daily"
incident:
  severity: Low
datasets:
- name: sales
  datasource:
    name: mySqlDatabase
parameters:
  kind: FieldDuplicates
  field: customerId

Common types

Simple Types

Duration

Applicable to firstRun, frequency,offset,deltaQuerying,rollingTimeWindow

Uses the ISO 8601 format. It only accept a single positive date/time element.

So only the following formats are accepted: PnY (Year unit), PnM (Month unit), PnW (Week unit), PnD (Day unit), PTnH (Hour unit), PTnM (Minute unit), PTnS (Second unit), where n is a positive number or 0.

Examples:

P1W represents 1 week
P2D represents 2 days
PT24H represents 24 hours
P0D represents 0 days

Note: In most cases, only a subset of units supported (for instance, only Year, Months and Days, but not the other ones). These cases are described in the schema of each monitor.

`Date`

Uses the ISO 8601 format: YYYY-MM-DD

Examples:

2023-01-03 represents January 3rd, 2023
2023-12-31 represents December 31st, 2023

`UUID`

Identifier using the format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx where x is an hexadecimal character

Examples:

20be41bc-9f0e-4a2f-b7d7-e737051589cd
6d2d4dfd-8b86-4f7e-871b-7a2fcd29748b
ecd00144-153e-479c-b51b-1273271e33ed

Monitor specific types

`DynamicThreshold`

sensitivity:    # (optional - default *Normal*) Sensitivity of the ML model
  "Low" | "Normal" | "High"
bounds:         # (optional - default *LowerAndUpper*) Indicates on which bound to alert
                # *Lower* - Alert only if the value is below the expected range
                # *Upper* - Alert only if the value is above the expected range
                # *LowerAndUpper* - Alert if the value is below or above the expected range
  "Lower" | "Upper" | "LowerAndUpper"

`WhereStatement`

(optional - default null) SQL boolean expression used to filter out data from input dataset.

null to disable this filtering.

Examples:

myColumn > 5
category LIKE 'production-%'
(myColumn > 5 AND myColumn < 25) OR COMPLEX_FUNCTION(category)

`GroupBy`

(optional - default null) Field to use for alerting per category.

null to disable categorical analysis.

Examples:

myColumn

`TimeWindow`

(optional - default null) Time Window parameters for the query

null to disable using time window

field: String       # (REQUIRED) Field to use for time window
firstRun: Duration  # (REQUIRED) How many past points to fetch
                    # Allowed duration units: Days, Hours, Minutes
frequency: Duration           # (REQUIRED) Aggregation size
                              # Can be 1 month, 1 week, 1 day, 1 hour, 30 minutes, 20 minutes, 15 minutes, 10 minutes
                              # *null* to auto-select aggregation between 1 day and 1 hour

`Offsets and Lookback`

Same as TimeWindow, with the following properties added:

# Same as TimeWindowWithOffset, with the following properties added
offsetPeriods: Integer              # (optional - default *null*) Number of time periods based on frequency offset by. 0 will query the current time period. 1 is recommended to query the last complete time period. 
                              # *null* to use the default value of 1 meaning it will query the last complete period
disableDeltaQuerying: Boolean # (optional - default false) Disable the Lookback/delta querying feature
deltaQuerying: Duration       # (optional - default 1 day) Lookback/Length of data to re-query at each run

`Partition`

(optional - default null) If input dataset is partitioned, filter to use on the partitioned column.

kind:                # (REQUIRED) Kind of partition to use
  "IngestionTime" | "TimeUnitColumn" | "IntegerRange"
# For *IngestionTime* kind, to use 
interval: Duration   # (REQUIRED)
# For *TimeUnitColumn* kind
field: String        # (REQUIRED) 
interval: Duration   # (REQUIRED) 
# For IntegerRange kind
field: String        # (REQUIRED) 
min: Integer         # (REQUIRED)
max: Integer         # (REQUIRED)

The partition kind and field must match the partitioning of the input dataset.

kind: Monitor
version: 2
id: 9699f47e-bb01-4514-0001-451424fe4179  # (REQUIRED) ID of the monitor
name: My monitor name                     # (REQUIRED) Name of the monitor
description: First monitor using CLI      # Description of the monitor
tags:                                     # Tags to apply to the monitor
- name: Production
- name: IMPORTANT
terms:                                    # Business Terms to apply to the monitor
- name: ROI
schedule: 31 * * * *                      # Schedule for the monitor (using CRON format)
incident:
  severity: Low														# Severity of the incident to create in case of alert
  message: ""															# Message to add to the incident/alert notification
datasets:																	# (REQUIRED) Datasets to monitor
- id: 6d8f779d-f6ac-41f7-be74-7237caa76967
parameters:																# (REQUIRED) Parameters of the monitor
  kind: Metrics											# (REQUIRED) Kind of monitor to use
  field: price
  aggregation:
    kind: Average
  aggregation:
    kind: Sum
  threshold:
    kind: Static
    min: 1000
    isMinInclusive: false

📘
Parameters list and example for every monitor type
You can find the exact list of parameters for every monitor type with examples in this section of the documentation

threshold -> excludedDates

In Monitors where a time window is available, it is also possible to exclude specific dates from alerting.

excludedDates Takes a list of values of either standardCalendaror name.

standardCalendar refers to out of the box calendars available. Possible values are : GERMANY_PUBLIC_HOLIDAYS, FRANCE_PUBLIC_HOLIDAYS, BELGIUM_PUBLIC_HOLIDAYS, NETHERLANDS_PUBLIC_HOLIDAYS, SPAIN_PUBLIC_HOLIDAYS, UK_PUBLIC_HOLIDAYS, US_PUBLIC_HOLIDAYS, SUNDAYS, WEEKENDS

namerefers to the name of the custom special dates calendar created via the API.

threshold:
  excludedDates:
  - standardCalendar: GERMANY_PUBLIC_HOLIDAYS
  - standardCalendar: FRANCE_PUBLIC_HOLIDAYS
  - name: test

Monitor schema

📘
Use Monitor Version 2

Schema

📘
Referencing a dataset

Identifying Monitors

With `id`

With `friendlyId`

Referencing other Sifflet objects

For tags

For terms

For notifications

Email, Slack, Microsoft Teams

Jira

Finding the Issue Type ID

For datasets

With `uri`

Retrieving IDs from the UI

📘
URIs

Examples

Simple example

Complex example

Multiple Monitors on the same file

Common types

Simple Types

Duration

`Date`

`UUID`

Monitor specific types

`DynamicThreshold`

`WhereStatement`

`GroupBy`

`TimeWindow`

`Offsets and Lookback`

`Partition`

📘
Parameters list and example for every monitor type

threshold -> excludedDates

📘Use Monitor Version 2

Schema

📘Referencing a dataset

Identifying Monitors

With id

With friendlyId

Referencing other Sifflet objects

For tags

For terms

For notifications

Email, Slack, Microsoft Teams

Jira

Finding the Issue Type ID

For datasets

With uri

Retrieving IDs from the UI

📘URIs

Examples

Simple example

Complex example

Multiple Monitors on the same file

Common types

Simple Types

Duration

Date

UUID

Monitor specific types

DynamicThreshold

WhereStatement

GroupBy

TimeWindow

Offsets and Lookback

Partition

📘Parameters list and example for every monitor type

threshold -> excludedDates

📘
Use Monitor Version 2

📘
Referencing a dataset

With `id`

With `friendlyId`

With `uri`

📘
URIs

`Date`

`UUID`

`DynamicThreshold`

`WhereStatement`

`GroupBy`

`TimeWindow`

`Offsets and Lookback`

`Partition`

📘
Parameters list and example for every monitor type