You can integrate with BigQuery by following these steps:

  1. Create a dedicated custom role with the minimal permissions required for Sifflet to operate
  2. Create a service account and generate a JSON Key file that you will import as a secret in Sifflet
  3. Connect to Sifflet

📘

You will need the following minimum GCP roles in order to set up BigQuery in Sifflet:

  • Project IAM Admin
  • Role Administrator
  • Service Account Admin
  • Service Account Key Admin
roles/resourcemanager.projectIamAdmin
roles/iam.organizationRoleAdmin
roles/iam.serviceAccountAdmin
roles/iam.serviceAccountKeyAdmin

1- Create a custom role in BigQuery

Create a dedicated custom role in GCP (GCP official documentation):

  • On your Google Cloud Platform console, under IAM & Admin, go to Roles, and create a new dedicated Sifflet Role
  • Set the following permissions for the role:
bigquery.datasets.get
bigquery.jobs.create
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.list
bigquery.jobs.listAll

📘

The last permissions enable our field-level lineage feature (see here for more details).

2- Create a service account

Create the service account (GCP official documentation) and assign the role created in the previous step:

  • In "IAM & Admin" --> "Service Accounts", create a dedicated service account
  • Select the Role you previously created (in the example, you can find it in "Custom"->"SiffletRole"
  • You can skip the last step, "Grant users access to this service account"
  • Once created, select it and on the right handside in "Actions", choose Manage Keys.
  • You can then create a New Key: Add Key-> Create new key -> select JSON format
  • Save the JSON file that you will need in Sifflet

3- Connect to Sifflet

To connect to BigQuery on Sifflet, you will need three items:

  • The connection details: your Project Id and Dataset
  • The secret which corresponds to the JSON key file previously saved
  • the frequency: how often you want the information to be refreshed

Create the Secret

To add the JSON key file to Sifflet secrets, please follow the below steps:

  • In "Integration" --> submenu "Secrets", create a new secret
  • In the "Secret" area, copy-paste the same content present in the JSON key file

Add the datasource

You can refer to this page on adding a data source in Sifflet for more detailed information.

  • In Integration --> click on "+ New"
  • Fill out the necessary information collected above
    The Billing Id is an optional parameter, it allows you to direct Sifflet's querying cost to a specific billing

FAQ

BigQuery external table

I have external tables on BQ, how does Sifflet handle them?

Sifflet treats your external table the same way as any regular table on BQ: you can search them in the catalog and setup monitoring rules.

Sifflet currently handles the below type of external tables. You will also need to grant additional access rights:

External Table typeSupportedAdditional rights required
GCS (Cloud Storage)You can grant the service account the following rights in order to have Viewer access for all GCS buckets/objects:
storage.buckets.list
storage.buckets.get
storage.objects.list
storage.objects.get

OR

You can choose to grant the service account Viewer access only on the buckets referenced by your external tables.
Google DriveYou can grant the service account Viewer access only on directories/files referenced by your external tables.
Cloud BigTable

Unable to add BigQuery dataset

While adding the datasource, after pressing "List Dataset" I cannot find the dataset I want to add, is there any configuration issue?

Sifflet will display only the datasets the service account has access to. If you do not see one specific dataset, you might need to review the permissions.