BigQuery
You can integrate with BigQuery by following these steps:
- Create a dedicated custom role with the minimal permissions required for Sifflet to operate
- Create a service account and generate a JSON Key file that you will import as a secret in Sifflet
- Connect to Sifflet
You will need the following minimum GCP roles in order to set up BigQuery in Sifflet:
- Project IAM Admin (roles/resourcemanager.projectIamAdmin)
- Role Administrator (roles/iam.organizationRoleAdmin)
- Service Account Admin (roles/iam.serviceAccountAdmin)
- Service Account Key Admin (roles/iam.serviceAccountKeyAdmin)
1- Create a custom role in BigQuery
Create a dedicated custom role in GCP (GCP official documentation):
- On your Google Cloud Platform console, under IAM & Admin, go to Roles, and create a new dedicated Sifflet Role
- Set the following permissions for the role:
bigquery.datasets.get
bigquery.jobs.create
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.list
bigquery.jobs.listAll
The last permission, bigquery.jobs.listAll, enables our field-level lineage feature (see here for more details).
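If you prefer to script this step rather than use the console, here is a minimal sketch using the IAM API via google-api-python-client. The project ID, role ID, and admin key file are placeholders to replace with your own values:

```python
from googleapiclient import discovery
from google.oauth2 import service_account

# Assumes an admin credential holding the roles listed at the top of this page.
admin_creds = service_account.Credentials.from_service_account_file("admin-key.json")
iam = discovery.build("iam", "v1", credentials=admin_creds)

# Create the dedicated Sifflet role with the minimal BigQuery permissions.
iam.projects().roles().create(
    parent="projects/my-gcp-project",  # placeholder project ID
    body={
        "roleId": "SiffletRole",
        "role": {
            "title": "Sifflet Role",
            "description": "Minimal permissions for Sifflet to monitor BigQuery",
            "includedPermissions": [
                "bigquery.datasets.get",
                "bigquery.jobs.create",
                "bigquery.tables.get",
                "bigquery.tables.getData",
                "bigquery.tables.list",
                "bigquery.jobs.listAll",
            ],
            "stage": "GA",
        },
    },
).execute()
```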
BigQuery Lineage
Sifflet can retrieve lineage information computed directly by BigQuery.
To enable this, grant the service account the following additional permissions (see the sketch after this list):
datalineage.events.get
datalineage.events.list
datalineage.locations.searchLinks
datalineage.processes.get
datalineage.processes.list
datalineage.runs.get
datalineage.runs.list
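As with the previous step, these permissions can be added programmatically instead of through the console. A hedged sketch that appends them to the custom role created above (the role path and admin key file are placeholders):

```python
from googleapiclient import discovery
from google.oauth2 import service_account

admin_creds = service_account.Credentials.from_service_account_file("admin-key.json")
iam = discovery.build("iam", "v1", credentials=admin_creds)

role_name = "projects/my-gcp-project/roles/SiffletRole"  # placeholder role path
role = iam.projects().roles().get(name=role_name).execute()

# Append the Data Lineage API permissions to the existing custom role.
role["includedPermissions"] = role.get("includedPermissions", []) + [
    "datalineage.events.get",
    "datalineage.events.list",
    "datalineage.locations.searchLinks",
    "datalineage.processes.get",
    "datalineage.processes.list",
    "datalineage.runs.get",
    "datalineage.runs.list",
]
iam.projects().roles().patch(
    name=role_name, updateMask="includedPermissions", body=role
).execute()
```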
2- Create a service account
Create the service account (GCP official documentation) and assign it the role created in the previous step (a scripted alternative is sketched after these steps):
- In "IAM & Admin" --> "Service Accounts", create a dedicated service account
- Select the role you previously created (in the example, you can find it under "Custom" -> "SiffletRole")
- You can skip the last step, "Grant users access to this service account"
- Once created, select it and, on the right-hand side under "Actions", choose "Manage Keys".
- You can then create a new key: "Add Key" -> "Create new key" -> select the JSON format. Save the JSON file; you will need it in Sifflet.
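For reference, the same service account and key can be created with the IAM API. This is only a hedged sketch with placeholder names; the console flow above works just as well:

```python
import base64

from googleapiclient import discovery
from google.oauth2 import service_account

admin_creds = service_account.Credentials.from_service_account_file("admin-key.json")
iam = discovery.build("iam", "v1", credentials=admin_creds)
project = "my-gcp-project"  # placeholder project ID

# Create the dedicated Sifflet service account.
sa = iam.projects().serviceAccounts().create(
    name=f"projects/{project}",
    body={"accountId": "sifflet-sa", "serviceAccount": {"displayName": "Sifflet"}},
).execute()

# Generate a JSON key and save it; this is the file you will import into Sifflet.
key = iam.projects().serviceAccounts().keys().create(name=sa["name"], body={}).execute()
with open("sifflet-sa-key.json", "wb") as f:
    f.write(base64.b64decode(key["privateKeyData"]))

# Note: assigning the custom role to this service account (the console's
# "grant access" step) is a project IAM policy change, done separately
# through the console or the Resource Manager API.
```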
3- Connect to Sifflet
To connect to BigQuery on Sifflet, you will need three items:
- The connection details: your Project Id and Dataset
- The secret which corresponds to the JSON key file previously saved
- The frequency: how often you want the information to be refreshed
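Before creating the secret, you can optionally check that the JSON key grants the access Sifflet will need. A minimal sketch with the google-cloud-bigquery client (the project ID and file name are placeholders):

```python
from google.cloud import bigquery

# Load the JSON key saved in the previous step and list what Sifflet will see.
client = bigquery.Client.from_service_account_json(
    "sifflet-sa-key.json", project="my-gcp-project"  # placeholders
)
for dataset in client.list_datasets():
    print(dataset.dataset_id)
    for table in client.list_tables(dataset.dataset_id):
        print("  ", table.table_id)
```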
Create the Secret
To add the JSON key file to Sifflet secrets, follow the steps below:
- In "Integration" -> submenu "Secrets", create a new secret
- In the "Secret" area, copy-paste the content of the JSON key file
Add the data source
You can refer to this page on adding a data source in Sifflet for more detailed information.
- In "Integration", click on "+ New"
- Fill out the necessary information collected above
The Billing Id is an optional parameter; it allows you to direct Sifflet's querying costs to a specific billing project (see the "Billing on another BigQuery project" FAQ below).
FAQ
BigQuery external table
I have external tables on BigQuery. How does Sifflet handle them?
Sifflet treats your external tables the same way as any regular table in BigQuery: you can search for them in the catalog and set up monitoring rules.
Sifflet currently handles the external table types below. You will also need to grant additional access rights, as listed in the table (a GCS example is sketched after it):
| External Table type | Supported | Additional rights required |
| --- | --- | --- |
| GCS (Cloud Storage) | ✅ | Either grant the service account the following rights for Viewer access to all GCS buckets/objects: storage.buckets.list, storage.buckets.get, storage.objects.list, storage.objects.get; or grant the service account Viewer access only on the buckets referenced by your external tables. |
| Google Drive | ✅ | Grant the service account Viewer access only on the directories/files referenced by your external tables. |
| Cloud BigTable | ❌ | |
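For the GCS case, one way to grant bucket-scoped read access is to add an IAM binding on each bucket referenced by your external tables. A hedged sketch with google-cloud-storage, assuming placeholder project, bucket, and service account names (roles/storage.objectViewer covers storage.objects.get and storage.objects.list; project-wide Viewer access would instead be granted at the project level):

```python
from google.cloud import storage

# Grant the Sifflet service account read access on one specific bucket
# referenced by an external table (all names are placeholders).
client = storage.Client(project="my-gcp-project")
bucket = client.bucket("my-external-table-bucket")

policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append(
    {
        "role": "roles/storage.objectViewer",
        "members": {"serviceAccount:sifflet-sa@my-gcp-project.iam.gserviceaccount.com"},
    }
)
bucket.set_iam_policy(policy)
```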
Unable to add BigQuery dataset
While adding the data source, after pressing "List Dataset", I cannot find the dataset I want to add. Is there a configuration issue?
Sifflet only displays the datasets the service account has access to. If you do not see a specific dataset, you might need to review the permissions.
Managing permissions at a dataset granularity
I would like to manage the permissions for the service account at the dataset level. How should I do this?
- Create a role as described above, but without bigquery.jobs.create, i.e. with only the following permissions:
bigquery.datasets.get
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.list
bigquery.jobs.listAll
- Create another Role with only the following permission:
bigquery.jobs.create
- Create a service account as described in Sifflet's doc and assign it the last role (the one with the bigquery.jobs.create permission). This gives the service account jobs.create at the project level, but it does not give it access to every dataset!
- For each dataset you wish to grant Sifflet access to, follow these steps (a scripted alternative is sketched after this list):
- In the BigQuery Explorer panel, select a dataset
- Click Sharing -> Permissions
- Click Add Principal
- In the "New principals" field, select or enter the service account you created for Sifflet
- In the "Select a role" list, select the custom role you created in the first step (the one with the 5 permissions)
- Click Save
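If you manage many datasets, the same grant can be scripted with the google-cloud-bigquery client by appending an access entry to each dataset. This is only a hedged sketch with placeholder names, and it assumes your custom role can be referenced in a dataset access entry; the console flow above is the documented path:

```python
from google.cloud import bigquery

# Placeholders: project, dataset, custom role path, and service account email.
project = "my-gcp-project"
client = bigquery.Client(project=project)

dataset = client.get_dataset(f"{project}.my_dataset")
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role=f"projects/{project}/roles/SiffletRole",  # the 5-permission custom role
        entity_type="userByEmail",
        entity_id="sifflet-sa@my-gcp-project.iam.gserviceaccount.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```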
Billing on another BigQuery project
I would like to bill my queries to a different project than the one I am using in production. Is this possible?
In order to better manage the costs associated with the queries run by Sifflet, you can bill your queries to a separate project, for instance an empty one.
The service account will need the following permissions on the billing project:
bigquery.jobs.create
On the queried project, the permissions needed are:
bigquery.datasets.get
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.list
bigquery.jobs.listAll
You will just need to specify the Id of the billing project in your data source parameters.
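To illustrate how the split works, here is a minimal sketch with the google-cloud-bigquery client: the client's project is the billing project where query jobs run, while the table is referenced by its fully qualified name in the production project (all names are placeholders). Sifflet itself only needs the Billing Id in the data source parameters:

```python
from google.cloud import bigquery

# Jobs run (and are billed) in the client's project; data is read from the
# production project via fully qualified table names.
client = bigquery.Client.from_service_account_json(
    "sifflet-sa-key.json", project="my-billing-project"  # placeholder billing project
)

query = "SELECT COUNT(*) AS n FROM `my-production-project.my_dataset.my_table`"
for row in client.query(query).result():
    print(row.n)
```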