You can integrate with BigQuery by following these steps:

  1. Create a dedicated custom role with the minimal permissions required for Sifflet to operate
  2. Create a service account and generate a JSON Key file that you will import as a secret in Sifflet
  3. Connect to Sifflet

πŸ“˜

You will need the following minimum GCP roles in order to set up BigQuery in Sifflet:

  • Project IAM Admin (roles/resourcemanager.projectIamAdmin)
  • Role Administrator (roles/iam.organizationRoleAdmin)
  • Service Account Admin (roles/iam.serviceAccountAdmin)
  • Service Account Key Admin (roles/iam.serviceAccountKeyAdmin)

1- Create a custom role in GCP

Create a dedicated custom role in GCP (GCP official documentation):

  • On your Google Cloud Platform console, under IAM & Admin, go to Roles, and create a new dedicated Sifflet Role
  • Set the following permissions for the role:
bigquery.datasets.get
bigquery.jobs.create
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.list
bigquery.jobs.listAll

πŸ“˜

The last permission, bigquery.jobs.listAll, enables our field-level lineage feature (see here for more details).
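If you prefer the command line to the console, the custom role above can be created with gcloud. This is a sketch: my-project is a placeholder for your GCP project ID, and "SiffletRole" is an example role ID.

```shell
# Create a dedicated custom role for Sifflet with the minimal permissions.
# Replace my-project with your GCP project ID; "SiffletRole" is an example ID.
gcloud iam roles create SiffletRole \
  --project=my-project \
  --title="Sifflet Role" \
  --permissions=bigquery.datasets.get,bigquery.jobs.create,bigquery.tables.get,bigquery.tables.getData,bigquery.tables.list,bigquery.jobs.listAll
```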

πŸ“˜

BigQuery Lineage

Sifflet can retrieve the lineage information directly computed by BigQuery.

You will need to:

  • Enable the Data Catalog, BigQuery, and Data Lineage APIs (see the BigQuery documentation)
  • Grant the Sifflet service account the following roles (see the BigQuery documentation): roles/datacatalog.viewer, roles/datalineage.viewer, and roles/bigquery.dataViewer,
    OR at least the following permissions:
datalineage.events.get
datalineage.events.list
datalineage.locations.searchLinks
datalineage.processes.get
datalineage.processes.list
datalineage.runs.get
datalineage.runs.list
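The API enablement and role grants above can also be done with gcloud. This is a sketch: my-project and the service account email are placeholders (the service account itself is created in the next step).

```shell
# Enable the APIs required for BigQuery lineage retrieval.
gcloud services enable \
  datacatalog.googleapis.com \
  bigquery.googleapis.com \
  datalineage.googleapis.com \
  --project=my-project

# Grant the three lineage-related roles to the Sifflet service account.
for role in roles/datacatalog.viewer roles/datalineage.viewer roles/bigquery.dataViewer; do
  gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:sifflet-sa@my-project.iam.gserviceaccount.com" \
    --role="$role"
done
```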

2- Create a service account

Create the service account (GCP official documentation) and assign the role created in the previous step:

  • In "IAM & Admin" --> "Service Accounts", create a dedicated service account
  • Select the role you previously created (in the example, you can find it under "Custom" -> "SiffletRole")
  • You can skip the last step, "Grant users access to this service account"
  • Once created, select it and, on the right-hand side under "Actions", choose "Manage Keys"
  • You can then create a new key: "Add Key" -> "Create new key" -> select the JSON format
  • Save the JSON file; you will need it in Sifflet
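The console steps above can be sketched with gcloud as follows. Assumptions: my-project is your GCP project ID, "sifflet-sa" and sifflet-key.json are example names, and "SiffletRole" is the custom role created in step 1.

```shell
# Create a dedicated service account for Sifflet.
gcloud iam service-accounts create sifflet-sa \
  --project=my-project \
  --display-name="Sifflet service account"

# Assign the custom role created in the previous step.
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:sifflet-sa@my-project.iam.gserviceaccount.com" \
  --role="projects/my-project/roles/SiffletRole"

# Generate the JSON key file that you will import as a secret in Sifflet.
gcloud iam service-accounts keys create sifflet-key.json \
  --iam-account=sifflet-sa@my-project.iam.gserviceaccount.com
```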

3- Connect to Sifflet

To connect to BigQuery on Sifflet, you will need three items:

  • The connection details: your Project ID and Dataset
  • The secret: the JSON key file you previously saved
  • The frequency: how often you want the information to be refreshed

Create the Secret

To add the JSON key file to Sifflet secrets, follow these steps:

  • In "Integration" --> submenu "Secrets", create a new secret
  • In the "Secret" area, copy-paste the same content present in the JSON key file

Add the datasource

You can refer to this page on adding a data source in Sifflet for more detailed information.

  • In "Integration", click on "+ New"
  • Fill out the necessary information collected above.
    The Billing ID is an optional parameter; it allows you to direct Sifflet's querying cost to a specific billing project.

FAQ

BigQuery external table

I have external tables on BQ, how does Sifflet handle them?

Sifflet treats external tables the same way as any regular table on BQ: you can search for them in the catalog and set up monitoring rules.

Sifflet currently handles the following types of external tables. You will also need to grant additional access rights:

  • GCS (Cloud Storage): βœ… supported. Grant the service account the following rights in order to have Viewer access on all GCS buckets/objects:
storage.buckets.list
storage.buckets.get
storage.objects.list
storage.objects.get
    OR grant the service account Viewer access only on the buckets referenced by your external tables.
  • Google Drive: βœ… supported. Grant the service account Viewer access only on the directories/files referenced by your external tables.
  • Cloud BigTable: ❌ not supported.

Unable to add BigQuery dataset

While adding the data source, after pressing "List Dataset" I cannot find the dataset I want to add. Is there a configuration issue?

Sifflet displays only the datasets the service account has access to. If you do not see a specific dataset, you might need to review the service account's permissions.

Managing permissions at a dataset granularity

I would like to manage the permissions for the service account at a dataset level. How should I do this?

  • Create a role as described above, but without bigquery.jobs.create:
bigquery.datasets.get  
bigquery.tables.get  
bigquery.tables.getData  
bigquery.tables.list  
bigquery.jobs.listAll
  • Create another Role with only the following permission:
bigquery.jobs.create
  • Create a service account as described in Sifflet's doc and assign the last role (the one with the bigquery.jobs.create permission). This gives the service account jobs.create at the project level; it does not give it access to every dataset!
  • For each dataset you wish to grant Sifflet access to, follow these steps:
    • In the BigQuery Explorer panel, select the dataset
    • Click Sharing -> Permissions
    • Click Add Principal
    • In the "New principals" field, select or enter the service account you created for Sifflet
    • In the "Select a role" list, select the custom role you created in the first step (the one with the five permissions)
    • Click Save
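The per-dataset console steps can also be sketched with the bq command-line tool, which edits the dataset's access list as JSON. Assumptions: my-project, my_dataset, "SiffletRole", and the service account email are placeholders for your own values.

```shell
# Dump the dataset's current metadata, including its "access" array.
bq show --format=prettyjson my-project:my_dataset > dataset.json

# Edit dataset.json and append an entry like the following to the "access" array:
#   {
#     "role": "projects/my-project/roles/SiffletRole",
#     "userByEmail": "sifflet-sa@my-project.iam.gserviceaccount.com"
#   }

# Apply the updated access list back to the dataset.
bq update --source dataset.json my-project:my_dataset
```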

Billing on another BigQuery project

I would like to bill my queries to a project other than the one I am using in production. Is that possible?

In order to better manage the costs associated with the queries Sifflet runs, you can bill your queries to another project, an empty one for instance.

The service account will need the following permissions on the billing project:

bigquery.jobs.create

On the queried project, the permissions needed are:

bigquery.datasets.get  
bigquery.tables.get  
bigquery.tables.getData  
bigquery.tables.list  
bigquery.jobs.listAll

You will just need to specify the ID of the billing project in your data source parameters.
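To illustrate how the billing separation works: a BigQuery job is billed to the project it runs in, not to the project that owns the queried data. The following sketch, with my-billing-project, my-prod-project, my_dataset, and my_table as example names, runs a query billed to one project against a table in another.

```shell
# The job runs in (and is billed to) my-billing-project, where the service
# account needs bigquery.jobs.create; the queried data lives in my-prod-project,
# where it needs the read permissions listed above.
bq query --project_id=my-billing-project --use_legacy_sql=false \
  'SELECT COUNT(*) FROM `my-prod-project.my_dataset.my_table`'
```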