Databricks

To connect Sifflet to Databricks, you need an account with admin rights so that you can create read-only access for Sifflet.
The integration involves the following steps:

  1. Create a dedicated Service Principal with its token or use a user's Personal Access Token
  2. Grant permissions
  3. Connect to Sifflet

📘

Unity Catalog and Databricks SQL

We currently support Databricks with Unity Catalog and Databricks SQL. For other configurations, don't hesitate to reach out.

1- Create a Service Principal or a user Personal Access Token

A service principal is a dedicated identity created for use with automated tools such as Sifflet. In line with Databricks' own recommendation, we recommend using a service principal rather than a Personal Access Token that belongs to a Databricks user.

Service Principal

  1. To create a service principal, you have two options:
    • Directly from the Databricks admin console: click your username in the top right, then go to Admin Console -> Service principals tab -> Add service principal -> Add new service principal. You can name it, for instance, "Sifflet service principal".
    • Or programmatically: refer to Databricks' documentation here.

Grant the service principal only one entitlement: Databricks SQL access.
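For the programmatic option, the request can be sketched as follows. This is a minimal sketch, assuming the workspace-level SCIM endpoint `/api/2.0/preview/scim/v2/ServicePrincipals` and the entitlement value `databricks-sql-access`; check Databricks' REST API reference for your cloud before relying on it. The function name and the `admin_token` parameter are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request

# Assumption: workspace-level SCIM endpoint for service principals.
SCIM_SP_PATH = "/api/2.0/preview/scim/v2/ServicePrincipals"


def build_create_service_principal_request(
    host: str, admin_token: str, display_name: str = "Sifflet service principal"
) -> urllib.request.Request:
    """Build (but do not send) a request creating a service principal
    whose only entitlement is Databricks SQL access (hypothetical helper)."""
    body = {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
        "displayName": display_name,
        "active": True,
        # Assumption: "databricks-sql-access" is the Databricks SQL access entitlement.
        "entitlements": [{"value": "databricks-sql-access"}],
    }
    return urllib.request.Request(
        url=f"https://{host}{SCIM_SP_PATH}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_create_service_principal_request(
        "xxxxx.cloud.databricks.com", "<admin_token>"
    )
    print(req.get_method(), req.full_url)
    # Sending it with urllib.request.urlopen(req) would create the principal;
    # its Application ID is expected in the "applicationId" field of the response.
```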

  2. Once it is created, allow this service principal to use tokens. More information here
  3. Finally, generate a token for it. More information here. Save the token, as you will need it to connect to Sifflet.
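Steps 2 and 3 can also be done through the Databricks REST API. The sketch below assumes the token permissions endpoint `/api/2.0/permissions/authorization/tokens` (a `PATCH` with `CAN_USE`, which adds to the existing ACL rather than replacing it) and the on-behalf-of endpoint `/api/2.0/token-management/on-behalf-of/tokens`, whose response is expected to contain the token in `token_value`. The helper names and the 90-day default lifetime are hypothetical choices, and the requests are built but not sent.

```python
import json
import urllib.request


def _json_request(host: str, token: str, path: str, body: dict, method: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON request."""
    return urllib.request.Request(
        url=f"https://{host}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method=method,
    )


def build_token_usage_request(host: str, admin_token: str, application_id: str) -> urllib.request.Request:
    """Step 2: let the service principal use tokens (assumed endpoint)."""
    body = {
        "access_control_list": [
            {"service_principal_name": application_id, "permission_level": "CAN_USE"}
        ]
    }
    # PATCH adds this entry to the token ACL; PUT would replace the whole ACL.
    return _json_request(host, admin_token, "/api/2.0/permissions/authorization/tokens", body, "PATCH")


def build_obo_token_request(
    host: str, admin_token: str, application_id: str, lifetime_days: int = 90
) -> urllib.request.Request:
    """Step 3: create a token on behalf of the service principal.
    The 90-day lifetime is a hypothetical default; pick your own policy."""
    body = {
        "application_id": application_id,
        "lifetime_seconds": lifetime_days * 24 * 3600,
        "comment": "Sifflet",
    }
    return _json_request(host, admin_token, "/api/2.0/token-management/on-behalf-of/tokens", body, "POST")
```

Calling `urllib.request.urlopen(...)` on each request in order would perform the two steps; the `token_value` returned by the second call is the token to save for Sifflet.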

User Personal Access Token

You can create a user Personal Access Token by following Databricks' documentation here. Save the token, as you will need it to connect to Sifflet.
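To check which identity a user token belongs to (that user name is what replaces the Application ID in the GRANT statements further down), you can query the SCIM "Me" endpoint. This is a minimal sketch, assuming the endpoint `/api/2.0/preview/scim/v2/Me` and a `userName` field in its JSON response; the function names are hypothetical and the request is built, not sent.

```python
import json
import urllib.request

# Assumption: SCIM endpoint returning the identity behind the token.
ME_PATH = "/api/2.0/preview/scim/v2/Me"


def build_whoami_request(host: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request identifying the token's owner."""
    return urllib.request.Request(
        url=f"https://{host}{ME_PATH}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def principal_from_me_response(raw: bytes) -> str:
    """Extract the user name to use as the grantee in GRANT statements."""
    return json.loads(raw)["userName"]


if __name__ == "__main__":
    req = build_whoami_request("xxxxx.cloud.databricks.com", "<personal_access_token>")
    print(req.get_method(), req.full_url)
    # with urllib.request.urlopen(req) as resp:
    #     print(principal_from_me_response(resp.read()))
```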

2- Grant permissions

You can grant permissions to the service principal (or user) at the Catalog, Schema, or Table level.

📘

Grant to existing and future tables

  • Granting permissions at the Catalog level automatically propagates them to existing and future Schemas (and therefore to their Tables)
  • Granting permissions at the Schema level automatically propagates them to existing and future Tables in that Schema
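The propagation rule can be illustrated with a small model. This is a simplified sketch, not Unity Catalog's actual implementation: it assumes a principal can read a table when it holds `USE_CATALOG` on the catalog, `USE_SCHEMA` on the catalog or schema, and `SELECT` on the catalog, schema, or table, and it deliberately ignores ownership, `ALL PRIVILEGES`, and group membership. The `Grants` class is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Grants:
    """Privileges held by one principal, keyed by securable name:
    "catalog", "catalog.schema" or "catalog.schema.table".
    Simplified model: ignores ownership, ALL PRIVILEGES and groups."""

    privileges: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, privilege: str, securable: str) -> None:
        self.privileges.setdefault(securable, set()).add(privilege)

    def _has(self, privilege: str, *securables: str) -> bool:
        # A privilege granted on a parent object applies to all its children,
        # including children created after the grant.
        return any(privilege in self.privileges.get(s, set()) for s in securables)

    def can_read(self, table: str) -> bool:
        catalog, schema, _ = table.split(".")
        schema_fqn = f"{catalog}.{schema}"
        return (
            self._has("USE_CATALOG", catalog)
            and self._has("USE_SCHEMA", catalog, schema_fqn)
            and self._has("SELECT", catalog, schema_fqn, table)
        )


if __name__ == "__main__":
    g = Grants()
    g.grant("USE_CATALOG", "main")
    g.grant("USE_SCHEMA", "main.sales")
    g.grant("SELECT", "main.sales")
    print(g.can_read("main.sales.orders"))   # schema-level grant covers its tables
    print(g.can_read("main.hr.salaries"))    # other schemas stay out of reach
```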

To grant access, you have two options:

  • Directly from the Databricks console: in Data Science & Engineering, navigate to the Catalog or Schema you want to add to Sifflet -> Permissions tab -> click Grant and select the Data Reader preset for the service principal or user
  • Or run the SQL queries below.
    For a service principal, you need its Application ID, which you can find in Admin Console -> Service principals tab.
    For a user Personal Access Token, replace the Application ID with the user name.

Granting permissions at Catalog level:

GRANT USE_CATALOG ON CATALOG <catalog_name> TO `<Application_ID>`;
GRANT USE_SCHEMA ON CATALOG <catalog_name> TO `<Application_ID>`;
GRANT SELECT ON CATALOG <catalog_name> TO `<Application_ID>`;

Or at Schema level:

GRANT USE_CATALOG ON CATALOG <catalog_name> TO `<Application_ID>`;
GRANT USE_SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO `<Application_ID>`;
GRANT SELECT ON SCHEMA <catalog_name>.<schema_name> TO `<Application_ID>`;

Or for specific tables:

GRANT USE_CATALOG ON CATALOG <catalog_name> TO `<Application_ID>`;
GRANT USE_SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO `<Application_ID>`;
GRANT SELECT ON TABLE <catalog_name>.<schema_name>.<table_name> TO `<Application_ID>`;
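If you grant access to several catalogs, schemas, or tables, generating the statements can avoid copy-paste mistakes. The sketch below simply reproduces the three sets of statements above for the narrowest level you specify; the function name is hypothetical, and it assumes plain identifiers that need no quoting (names with special characters would need backticks around each part).

```python
from typing import List, Optional


def grant_statements(
    principal: str, catalog: str, schema: Optional[str] = None, table: Optional[str] = None
) -> List[str]:
    """Return the GRANT statements for the catalog, schema or table level
    (hypothetical helper mirroring the statements in this guide)."""
    if table is not None and schema is None:
        raise ValueError("a table-level grant needs its schema")
    grantee = f"`{principal}`"  # Application ID or user name
    stmts = [f"GRANT USE_CATALOG ON CATALOG {catalog} TO {grantee};"]
    if schema is None:
        # Catalog level: propagates to existing and future schemas and tables.
        stmts += [
            f"GRANT USE_SCHEMA ON CATALOG {catalog} TO {grantee};",
            f"GRANT SELECT ON CATALOG {catalog} TO {grantee};",
        ]
    elif table is None:
        # Schema level: propagates to existing and future tables of the schema.
        stmts += [
            f"GRANT USE_SCHEMA ON SCHEMA {catalog}.{schema} TO {grantee};",
            f"GRANT SELECT ON SCHEMA {catalog}.{schema} TO {grantee};",
        ]
    else:
        # Table level: only this table.
        stmts += [
            f"GRANT USE_SCHEMA ON SCHEMA {catalog}.{schema} TO {grantee};",
            f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {grantee};",
        ]
    return stmts


if __name__ == "__main__":
    for stmt in grant_statements("<Application_ID>", "main", "sales"):
        print(stmt)
```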

3- Connect to Sifflet

Create the Secret

To add the newly created token to Sifflet, follow the steps below:

  • In "Integration" --> submenu "Secrets", create a new secret
  • In the "Secret" field, paste the token

Add the datasource

📘

Warehouse

You can use an existing Warehouse or create a new one dedicated to Sifflet by following the instructions here.
You can choose the cluster size depending on the number of data assets you want to monitor. As a reference, X-Small is enough for environments with thousands of tables or fewer.
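If you prefer to create the dedicated warehouse through the API, the request could look like the sketch below. It assumes the SQL Warehouses endpoint `POST /api/2.0/sql/warehouses` with the `name`, `cluster_size`, `min_num_clusters`, `max_num_clusters`, and `auto_stop_mins` fields, and the size labels listed in `CLUSTER_SIZES`; the helper name, the warehouse name, the single-cluster setting, and the 10-minute auto-stop are hypothetical choices. The request is built, not sent.

```python
import json
import urllib.request

# Assumption: size labels offered for SQL warehouses.
CLUSTER_SIZES = ("2X-Small", "X-Small", "Small", "Medium", "Large",
                 "X-Large", "2X-Large", "3X-Large", "4X-Large")


def build_create_warehouse_request(
    host: str,
    admin_token: str,
    name: str = "sifflet-warehouse",   # hypothetical name
    cluster_size: str = "X-Small",     # enough for thousands of tables or fewer
    auto_stop_mins: int = 10,          # hypothetical idle timeout
) -> urllib.request.Request:
    """Build (but do not send) a request creating a SQL warehouse."""
    if cluster_size not in CLUSTER_SIZES:
        raise ValueError(f"unknown cluster size: {cluster_size!r}")
    body = {
        "name": name,
        "cluster_size": cluster_size,
        # Assumption: a single cluster, no autoscaling.
        "min_num_clusters": 1,
        "max_num_clusters": 1,
        # Stop the warehouse when idle to limit cost.
        "auto_stop_mins": auto_stop_mins,
    }
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/sql/warehouses",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {admin_token}", "Content-Type": "application/json"},
        method="POST",
    )
```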

  1. First, collect the information Sifflet needs to connect. In your Databricks environment, go to SQL Warehouses -> choose the Warehouse Sifflet will use -> open the Connection Details tab.
  2. Back in Sifflet:
    • Go to Integration --> click "+ New"
    • Fill out the fields with the information collected in the previous step:
      • Host: the Server hostname in Databricks, in the format xxxxx.cloud.databricks.com
      • Port: 443
      • Http Path: the HTTP path in Databricks
      • Catalog: the Catalog you want to add to Sifflet
      • Schema: the Schema you want to add to Sifflet
      • Secret: the name of the secret containing the token
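The mapping between the Connection Details tab and the Sifflet form can be sketched as below. This assumes the warehouse description returned by the Databricks SQL Warehouses API (`GET /api/2.0/sql/warehouses/<id>`) exposes the same values under `odbc_params` (`hostname`, `path`, `port`); the function name and the output keys, chosen to match the form labels above, are hypothetical.

```python
from typing import Any, Dict


def sifflet_fields(warehouse: Dict[str, Any], catalog: str, schema: str, secret_name: str) -> Dict[str, Any]:
    """Map a SQL warehouse description to the Sifflet form fields
    (hypothetical helper; assumes the "odbc_params" layout)."""
    odbc = warehouse["odbc_params"]
    # The Host field expects the bare hostname, without scheme or trailing slash.
    host = odbc["hostname"].removeprefix("https://").rstrip("/")
    http_path = odbc["path"]
    if not http_path.startswith("/sql/"):
        raise ValueError(f"unexpected HTTP path: {http_path!r}")
    return {
        "Host": host,
        "Port": int(odbc.get("port", 443)),
        "Http Path": http_path,
        "Catalog": catalog,
        "Schema": schema,
        "Secret": secret_name,
    }


if __name__ == "__main__":
    example = {"odbc_params": {"hostname": "xxxxx.cloud.databricks.com",
                               "path": "/sql/1.0/warehouses/<warehouse_id>",
                               "protocol": "https", "port": 443}}
    print(sifflet_fields(example, "<catalog_name>", "<schema_name>", "<secret_name>"))
```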

You can refer to this page for more detailed information on adding a data source in Sifflet.