Referential Integrity
Overview
The Sifflet Referential Monitor monitor provides tools to ensure the validity of columns between two different tables. Ensure that values in a table match the valid values of another.
Example: If we have anORDER
table with a customer_id
foreign key that points to the CUSTOMER
Table's id
key. We want to ensure that the customer_id
values in the ORDER
table are present in the CUSTOMER
Table and thus valid
How to
Use
The Referential Integrity Monitor requires two tables to be select. A common usecase is to select a fact table first and a dimension table second. When two tables are selected a field from each table must be selected to compare values between the tables.
Configuration
Comparison Settings
Left Records Match Right Dataset ( Default )
The most common referential integrity use case. This will check that the values in the first ( left ) table are present in the second ( right ) table.
Use case: Every Order must have a valid Customer that has an entry in the customer table
Right Records Match Left Dataset
This will check that the values in the second ( right ) table are present in the first ( left ) table.
Use case: Every Product must have an inventory line.
Left and Right Records fully match
Both tables should contain all the values present in their counterpart.
Usecase: Every product must have an inventory line and every inventory line must have a valid product
Time Settings
In the Referential integrity monitors the time settings only apply to the first ( left ) table.
Time settings allow for incremental checks on the first ( left ) table to avoid scanning the entire table.
For example Time Settings configured with a daily aggregation and no rollback settings will only scan one day's worth of the first table each run.
Use Case: I want to check every day that all the orders from the past day have valid customer ids.
Incremental Monitors can save costs
Make sure that the first table selected in your monitor is your fact table
Time Settings Off : When time settings are not activated, the entire table will be scanned each run. Use it when comparing two dimension tables! i.e. Every Product should have a valid Product Category!
As Code
Kind: ReferentialIntegrity
Params
left: References the first table selected
left.field: The field to match on the left table
left.whereStatement: SQL WHERE statement ( without the WHERE ) to apply to the left table.
left.timeWindow: Time Window Settings for the left table, to allow for incremental checks.
right: References the second table selected
right.field: The field to match on the right table.
right.whereStatement: SQL WHERE statement ( without the WHERE ) to apply to the left table.
matchType:
LeftOnRight
(default): Ensures the values in the left tables are present on the right tableRightOnLeft
: Ensures the values in the right table are present on the left tableFull
: Values on either side must be present on the other
Simple Example - Referential Integrity on Customer_Id foreign key
datasets:
- uri: snowflake://sifflet/DEMO.SE_ENV.ORDERS
- uri: snowflake://sifflet/DEMO.SE_ENV.CUSTOMERS
parameters:
kind: ReferentialIntegrity
left:
field: CUSTOMER_ID
right:
field: CUSTOMER_ID
Same example but with tables in the opposite order
datasets:
- uri: snowflake://sifflet/DEMO.SE_ENV.CUSTOMERS
- uri: snowflake://sifflet/DEMO.SE_ENV.ORDERS
parameters:
kind: ReferentialIntegrity
matchType: RightOnLeft
left:
field: CUSTOMER_ID
right:
field: CUSTOMER_ID
Complex Example - Incremental Referential Integrity on Customer_id
- Checks that orders in the last day have valid customer ids
- Includes where statements to filter out bad orders and expired customers.
datasets:
- uri: snowflake://sifflet/DEMO.SE_ENV.ORDERS
- uri: snowflake://sifflet/DEMO.SE_ENV.CUSTOMERS
parameters:
kind: ReferentialIntegrity
left:
field: CUSTOMER_ID
whereStatement: goodorder IS TRUE
timeWindow:
field: auto
firstRun: P365D
frequency: P1D
right:
field: CUSTOMER_ID
whereStatement: NOT expired
Updated about 1 month ago