Monitor schema
Schema
kind: Monitor
version: 1
id: UUID  # (REQUIRED) ID of the monitor
name: String  # (REQUIRED) Name of the monitor
description: String  # (optional - default null) Description for the monitor
tags:
  - id: UUID  # (optional - default null) ID of the tag
    name: String  # (optional - default null) Name of the tag
    kind:  # (optional - default null) Type of tag
      "Tag" | "Classification"
terms:
  - id: UUID  # (optional - default null) ID of the term
    name: String  # (optional - default null) Name of the term
schedule: String  # (optional - default null) Schedule for monitor execution. null for no schedule
                  # Defined as a crontab (CRON) expression
incident:  # (REQUIRED)
  severity:  # (REQUIRED) Severity of the incident
    "Low" | "Moderate" | "High" | "Critical"
  message: String  # (optional - default null) Custom message to add to the incident and notifications
notifications:
  - kind:  # (REQUIRED) Kind of notification to send
      "Slack" | "Email" | "MicrosoftTeams"
    id: UUID  # (optional - default null) ID of the notification
    name: String  # (optional - default null) Name of the Slack channel, email address, or Microsoft Teams channel
datasets:  # (REQUIRED) List of datasets to monitor. Most monitors can only have a single dataset
  - id: UUID  # (optional - default null) ID of the dataset
    name: String  # (optional - default null) Name of the dataset
    datasource:
      id: UUID  # (optional - default null) ID of the data source
      name: String  # (optional - default null) Name of the data source
parameters: ...  # (REQUIRED) See Parameters
Referencing a dataset
You can reference a dataset by name alone when that name identifies it unambiguously. To prevent any conflict, use the ID of the dataset, or add the data source name or the data source ID.
Referencing other Sifflet objects
The monitor definition references many other Sifflet objects, such as tags, classification tags, terms, datasets, Slack channels, Microsoft Teams channels, and email addresses.
To reference an object, you can use its ID. This ensures that applying your monitor links to the correct object.
tags:
- id: 20be41bc-9f0e-4a2f-b7d7-e737051589cd
- id: 6d2d4dfd-8b86-4f7e-871b-7a2fcd29748b
However, retrieving and using these IDs is cumbersome, so you can also use a more natural identifier, such as the name of the object.
tags:
- name: Production
- name: Important Asset
Referencing by name works if there is a matching object and no ambiguity (no two objects share the same name).
If the name is ambiguous, you need to use another property to distinguish the objects.
For instance, for tags, this can be the kind property (if a classification tag and a regular tag both use the name Production).
tags:
  - name: Production
    kind: Classification
  - name: Important Asset
Finally, using the id always guarantees an unambiguous reference.
You can also combine the id with other properties to provide more context:
tags:
  - id: 20be41bc-9f0e-4a2f-b7d7-e737051589cd
    name: Production
    kind: Classification
  - id: 6d2d4dfd-8b86-4f7e-871b-7a2fcd29748b
    name: Important Asset
In this case, if the object is renamed or changes kind, the monitor can still be applied. However, a warning reporting the mismatch is generated.
For tags
For tags, you can use the id, the name, and the kind properties.
For instance:
tags:
  - name: To fix
  - id: 7edf1177-1a3c-4d71-b85f-e38b773735b4
  - id: 20be41bc-9f0e-4a2f-b7d7-e737051589cd
    name: Production
    kind: Classification
  - id: 6d2d4dfd-8b86-4f7e-871b-7a2fcd29748b
    name: Important Asset
For terms
For terms, you can use the id and the name properties.
For instance:
terms:
  - name: ROI
  - id: 7edf1177-1a3c-4d71-b85f-e38b773735b4
  - id: 20be41bc-9f0e-4a2f-b7d7-e737051589cd
    name: Marketing
For notifications
For notifications, you can use the id, the kind, and the name properties.
For instance:
notifications:
  - kind: Email
    name: [email protected]
  - kind: Slack
    name: Alerts
    id: dd6f06ec-fab1-4a87-9544-b113f496d61d
  - kind: Slack
    id: 8c2dbfe5-0911-4586-b937-c5f48b7c21d9
For datasets
For datasets, you can use the id, the name, the datasource.id, and the datasource.name properties.
For instance:
datasets:
  - name: Sales
  - name: Prices
    datasource:
      name: BigQuery Data warehouse
  - id: 70217023-1a89-4c0b-9b6a-c85192c918b3
  - name: company_employees
    datasource:
      id: ce3e9dd9-b007-42b0-b884-8c419f7f6daa
Retrieving IDs from the UI
To get the ID of a dataset, go to the dataset page and click the Copy Data Asset ID button.
To get the ID of a data source, go to the data source page and click Copy Source ID.
Examples
Simple example
kind: Monitor
version: 1
id: 998156a0-efc3-429b-b150-38525bc8f0cd
name: Uniqueness on customerEmail
schedule: "@daily"
incident:
  severity: Low
datasets:
  - name: sales
    datasource:
      name: mySqlDatabase
parameters:
  kind: Unique
  field: customerEmail
Complex example
kind: Monitor
version: 1
id: 998156a0-efc3-429b-b150-38525bc8f0cd
name: Average Monitor on price
description: The monitor fails if the field's average is outside of a given range.
tags:
  - name: Low Cardinality
    kind: Classification
  - name: Production
schedule: 5 4 * * *
incident:
  severity: Low
  message: Some message
notifications:
  - kind: Slack
    name: team-data-science
  - kind: Email
    name: [email protected]
  - kind: Email
    id: 41df515f-e5b6-4b4e-b684-d9108ac563bf
    name: [email protected]
datasets:
  - id: 5a93a977-df0b-4f17-b4dd-d12ebe21549d
parameters:
  kind: StaticMetrics
  field: price
  aggregation:
    kind: Average
  threshold:
    min: 1000.0
    isMinInclusive: false
  groupBy:
    field: channel
  timeWindow:
    field: time
    duration: P100D
Common types
Simple Types
Duration
Uses the ISO 8601 format. It only accepts a single date/time element.
Only the following formats are accepted: PnY (Year unit), PnM (Month unit), PnW (Week unit), PnD (Day unit), PTnH (Hour unit), PTnM (Minute unit), PTnS (Second unit), where n is a positive number or 0.
Examples:
P1W represents 1 week
P2D represents 2 days
PT24H represents 24 hours
P0D represents 0 days
Note: In most cases, only a subset of units is supported (for instance, only Years, Months, and Days). These cases are described in the schema of each monitor.
Date
Uses the ISO 8601 format: YYYY-MM-DD
Examples:
2023-01-03 represents January 3rd, 2023
2023-12-31 represents December 31st, 2023
UUID
Identifier using the format xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, where x is a hexadecimal character.
Examples:
20be41bc-9f0e-4a2f-b7d7-e737051589cd
6d2d4dfd-8b86-4f7e-871b-7a2fcd29748b
ecd00144-153e-479c-b51b-1273271e33ed
Monitor specific types
DynamicThreshold
sensitivity:  # (optional - default *Normal*) Sensitivity of the ML model
  "Low" | "Normal" | "High"
bounds:  # (optional - default *LowerAndUpper*) Indicates on which bound to alert
         # *Lower* - Alert only if the value is below the expected range
         # *Upper* - Alert only if the value is above the expected range
         # *LowerAndUpper* - Alert if the value is below or above the expected range
  "Lower" | "Upper" | "LowerAndUpper"
WhereStatement
(optional - default null) SQL boolean expression used to filter out data from input dataset.
null to disable this filtering.
Examples:
myColumn > 5
category LIKE 'production-%'
(myColumn > 5 AND myColumn < 25) OR COMPLEX_FUNCTION(category)
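As a minimal sketch of how such an expression might appear in a monitor's parameters: the key name `whereStatement` is an assumption here (the key is not shown in this section), so check the schema of the monitor you are configuring. Wrapping the expression in double quotes is a general YAML precaution, since characters such as `#` or `: ` inside an unquoted value can change how the line is parsed.

```yaml
parameters:
  kind: StaticMetrics
  field: price
  # `whereStatement` is an assumed key name, used for illustration only.
  whereStatement: "category LIKE 'production-%' AND myColumn > 5"
```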
GroupBy
(optional - default null) Field to use for alerting per category.
null to disable categorical analysis.
Examples:
myColumn
TimeWindow
(optional - default null) Time Window parameters for the query
null to disable using time window
field: String # (REQUIRED) Field to use for time window
duration: Duration # (REQUIRED) Length of the time window to use
# Allowed duration units: Days, Hours, Minutes
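The fragment below sketches a time window that respects the allowed units. It reuses the `timeWindow` key from the complex example above; the column name `created_at` is hypothetical.

```yaml
timeWindow:
  field: created_at  # hypothetical timestamp column
  duration: P7D      # 7 days; Days, Hours, and Minutes are the allowed units
  # duration: PT12H would also be accepted (Hours).
  # duration: P1W would not, since Weeks are not in the allowed units.
```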
TimeWindowWithOffset
Same as TimeWindow, with the following property added:
offset: Duration  # (optional - default *null*) Delay the query of data by this duration
                  # *null* to disable using an offset
                  # Allowed duration units: Days, Hours
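A possible fragment combining the window with an offset, assuming the same `timeWindow` key as in the complex example; the column name `event_time` is hypothetical.

```yaml
timeWindow:
  field: event_time  # hypothetical timestamp column
  duration: P30D     # length of the time window
  offset: P1D        # delay the query of data by one day
```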
TimeWindowWithOffsetAndDelta
Same as TimeWindowWithOffset, with the following properties added:
disableDeltaQuerying: Boolean  # (optional - default false) Disable the delta querying feature
deltaQuerying: Duration  # (optional - default 1 day) Length of data to re-query at each run
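As an illustrative sketch only, a window using these properties could look like the fragment below. The `timeWindow` key is taken from the complex example above; the column name is hypothetical, and the units accepted by `deltaQuerying` are not listed here, so a value in days (matching the documented default of 1 day) is an assumption.

```yaml
timeWindow:
  field: event_time            # hypothetical timestamp column
  duration: P30D
  offset: P1D
  disableDeltaQuerying: false  # keep delta querying enabled (the default)
  deltaQuerying: P2D           # re-query the last 2 days of data at each run
```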
Partition
(optional - default null) If the input dataset is partitioned, filter to use on the partitioned column.
kind:  # (REQUIRED) Kind of partition to use
  "IngestionTime" | "TimeUnitColumn" | "IntegerRange"
# For *IngestionTime* kind
interval: Duration  # (REQUIRED)
# For *TimeUnitColumn* kind
field: String  # (REQUIRED)
interval: Duration  # (REQUIRED)
# For *IntegerRange* kind
field: String  # (REQUIRED)
min: Integer  # (REQUIRED)
max: Integer  # (REQUIRED)
The partition kind and field must match the partitioning of the input dataset.
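To make the three kinds concrete, here is one hedged fragment per kind, written as separate YAML documents. The parent key `partition`, the column names, and the interval and range values are all assumptions for illustration; the schema of each monitor states where the partition is declared and which duration units are accepted.

```yaml
# IngestionTime: only an interval is required.
partition:                # assumed parent key
  kind: IngestionTime
  interval: P1D
---
# TimeUnitColumn: a column plus an interval.
partition:
  kind: TimeUnitColumn
  field: event_date       # hypothetical partitioning column
  interval: P1D
---
# IntegerRange: a column plus integer bounds.
partition:
  kind: IntegerRange
  field: customer_id      # hypothetical partitioning column
  min: 0
  max: 1000
```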
kind: Monitor
version: 1
id: 9699f47e-bb01-4514-0001-451424fe4179  # (REQUIRED) ID of the monitor
name: My monitor name  # (REQUIRED) Name of the monitor
description: First monitor using CLI  # Description of the monitor
tags:  # Tags to apply to the monitor
  - name: Production
  - name: IMPORTANT
terms:  # Business Terms to apply to the monitor
  - name: ROI
schedule: 31 * * * *  # Schedule for the monitor (using CRON format)
incident:
  severity: Low  # Severity of the incident to create in case of alert
  message: ""  # Message to add to the incident/alert notification
datasets:  # (REQUIRED) Datasets to monitor
  - id: 6d8f779d-f6ac-41f7-be74-7237caa76967
parameters:  # (REQUIRED) Parameters of the monitor
  kind: StaticMetrics  # (REQUIRED) Kind of monitor to use
  field: price
  aggregation:
    kind: Average
  threshold:
    min: 1000
    isMinInclusive: false
Parameters list and example for every monitor type
You can find the exact list of parameters for every monitor type with examples in this section of the documentation