You can integrate with BigQuery by following these steps:
- Create a dedicated custom role with the minimal permissions required for Sifflet to operate
- Create a service account and generate a JSON Key file that you will import as a secret in Sifflet
- Connect to Sifflet
You will need the following minimum GCP roles in order to set up BigQuery in Sifflet:
- Project IAM Admin
- Role Administrator
- Service Account Admin
- Service Account Key Admin
roles/resourcemanager.projectIamAdmin roles/iam.organizationRoleAdmin roles/iam.serviceAccountAdmin roles/iam.serviceAccountKeyAdmin
Create a dedicated custom role in GCP (GCP official documentation):
- On your Google Cloud Platform console, under IAM & Admin, go to Roles, and create a new dedicated Sifflet Role
- Set the following permissions for the role:
bigquery.datasets.get bigquery.jobs.create bigquery.tables.get bigquery.tables.getData bigquery.tables.list bigquery.jobs.listAll
The last permissions enable our field-level lineage feature (see here for more details).
Create the service account (GCP official documentation) and assign the role created in the previous step:
- In "IAM & Admin" --> "Service Accounts", create a dedicated service account
- Select the Role you previously created (in the example, you can find it in "Custom"->"SiffletRole"
- You can skip the last step, "Grant users access to this service account"
- Once created, select it and on the right handside in "Actions", choose
- You can then create a New Key:
Create new key-> select JSON format
- Save the JSON file that you will need in Sifflet
To connect to BigQuery on Sifflet, you will need three items:
- The connection details: your Project Id and Dataset
- The secret which corresponds to the JSON key file previously saved
- the frequency: how often you want the information to be refreshed
To add the JSON key file to Sifflet secrets, please follow the below steps:
- In "Integration" --> submenu "Secrets", create a new secret
- In the "Secret" area, copy-paste the same content present in the JSON key file
You can refer to this page on adding a data source in Sifflet for more detailed information.
- In Integration --> click on "+ New"
- Fill out the necessary information collected above
I have external tables on BQ, how does Sifflet handle them?
Sifflet treats your external table the same way as any regular table on BQ: you can search them in the catalog and setup monitoring rules.
Sifflet currently handles the below type of external tables. You will also need to grant additional access rights:
|External Table type||Supported||Additional rights required|
|GCS (Cloud Storage)||✅||You can grant the service account the following rights in order to have Viewer access for all GCS buckets/objects:|
You can choose to grant the service account Viewer access only on the buckets referenced by your external tables.
|Google Drive||✅||You can grant the service account Viewer access only on directories/files referenced by your external tables.|
While adding the datasource, after pressing "List Dataset" I cannot find the dataset I want to add, is there any configuration issue?
Sifflet will display only the datasets the service account has access to. If you do not see one specific dataset, you might need to review the permissions.
Updated about 2 months ago