Referential Integrity

Overview

The Sifflet Referential Monitor monitor provides tools to ensure the validity of columns between two different tables. Ensure that values in a table match the valid values of another.

Example: If we have anORDERtable with a customer_id foreign key that points to the CUSTOMER Table's id key. We want to ensure that the customer_id values in the ORDER table are present in the CUSTOMER Table and thus valid


How to

Use

The Referential Integrity Monitor requires two tables to be select. A common usecase is to select a fact table first and a dimension table second. When two tables are selected a field from each table must be selected to compare values between the tables.

Configuration

Comparison Settings

Left Records Match Right Dataset ( Default )

The most common referential integrity use case. This will check that the values in the first ( left ) table are present in the second ( right ) table.

Use case: Every Order must have a valid Customer that has an entry in the customer table

Right Records Match Left Dataset

This will check that the values in the second ( right ) table are present in the first ( left ) table.

Use case: Every Product must have an inventory line.

Left and Right Records fully match

Both tables should contain all the values present in their counterpart.

Usecase: Every product must have an inventory line and every inventory line must have a valid product

Time Settings

In the Referential integrity monitors the time settings only apply to the first ( left ) table.

Time settings allow for incremental checks on the first ( left ) table to avoid scanning the entire table.

For example Time Settings configured with a daily aggregation and no rollback settings will only scan one day's worth of the first table each run.

Use Case: I want to check every day that all the orders from the past day have valid customer ids.

📘

Incremental Monitors can save costs

Make sure that the first table selected in your monitor is your fact table

Time Settings documentation.

Time Settings Off : When time settings are not activated, the entire table will be scanned each run. Use it when comparing two dimension tables! i.e. Every Product should have a valid Product Category!


As Code

Kind: ReferentialIntegrity

Params

left: References the first table selected

left.field: The field to match on the left table

left.whereStatement: SQL WHERE statement ( without the WHERE ) to apply to the left table.

left.timeWindow: Time Window Settings for the left table, to allow for incremental checks.

right: References the second table selected

right.field: The field to match on the right table.

right.whereStatement: SQL WHERE statement ( without the WHERE ) to apply to the left table.

matchType:

  • LeftOnRight(default): Ensures the values in the left tables are present on the right table
  • RightOnLeft: Ensures the values in the right table are present on the left table
  • Full: Values on either side must be present on the other

Simple Example - Referential Integrity on Customer_Id foreign key

datasets:
- uri: snowflake://sifflet/DEMO.SE_ENV.ORDERS
- uri: snowflake://sifflet/DEMO.SE_ENV.CUSTOMERS
parameters:
  kind: ReferentialIntegrity
  left:
    field: CUSTOMER_ID
  right:
    field: CUSTOMER_ID

Same example but with tables in the opposite order

datasets:
- uri: snowflake://sifflet/DEMO.SE_ENV.CUSTOMERS
- uri: snowflake://sifflet/DEMO.SE_ENV.ORDERS
parameters:
  kind: ReferentialIntegrity
  matchType: RightOnLeft
  left:
    field: CUSTOMER_ID
  right:
    field: CUSTOMER_ID

Complex Example - Incremental Referential Integrity on Customer_id

  • Checks that orders in the last day have valid customer ids
  • Includes where statements to filter out bad orders and expired customers.
datasets:
- uri: snowflake://sifflet/DEMO.SE_ENV.ORDERS
- uri: snowflake://sifflet/DEMO.SE_ENV.CUSTOMERS
parameters:
  kind: ReferentialIntegrity
  left:
    field: CUSTOMER_ID
    whereStatement: goodorder IS TRUE
    timeWindow:
      field: auto
      firstRun: P365D
      frequency: P1D
  right:
    field: CUSTOMER_ID
    whereStatement: NOT expired