Databricks

To connect Sifflet to Databricks, you need an account with admin rights to set up read-only access.
You can integrate by following these steps:

  1. Create a dedicated Service Principal and its token, or use a user's Personal Access Token
  2. Grant permissions
  3. Connect to Sifflet

📘

Unity Catalog and Databricks SQL

We currently support Databricks with Unity Catalog and Databricks SQL. For other configurations, don't hesitate to reach out.

1- Create a Service Principal or a user Personal Access Token

A service principal is a dedicated identity created for use with automated tools such as Sifflet. Following Databricks' own guidance, we recommend using a service principal rather than a Personal Access Token belonging to a Databricks user.

Option 1 - Service Principal (recommended)

📘

Service principal token

When configuring a Service Principal, Sifflet only supports personal access tokens for service principals (which must be generated with the Databricks CLI), not OAuth machine-to-machine (M2M) authentication (which is available in the interface).

  1. Create a service principal using one of these two options:
    • Option 1.A - Directly from your Databricks admin console:
      • From your username on the top right, go to Settings
      • In Workspace admin -> Identity and access -> Service principals, select Manage.
      • Click Add service principal -> Add new.
      • Give your service principal a name, for example: Sifflet service principal
    • Option 1.B - Programmatically: please refer to Databricks' documentation here
  2. In the Configurations tab of the newly created service principal, enable the Databricks SQL access entitlement (this will be the only entitlement required).
  3. Allow this service principal to use tokens (more information here):
    • In Workspace admin -> Advanced -> Personal Access Tokens -> Permission Settings
    • Add the Can Use permission to the service principal previously created.
  4. Generate a personal access token for the service principal
    • This section requires the use of Databricks CLI
    • For a step-by-step description, follow the section "Databricks personal access tokens for service principals" in the official documentation.
    • Save the token as you will need it to connect to Sifflet.
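For the programmatic route, the steps above map onto a few Databricks REST calls (the Databricks CLI command used in step 4 calls the same Token Management API). The Python sketch below only builds and sends those requests. It assumes that the endpoint paths and payload fields shown (the SCIM `ServicePrincipals` endpoint, the `databricks-sql-access` entitlement, the token permissions endpoint and the on-behalf-of token endpoint) match your workspace's API version, and that `admin_token` belongs to a workspace admin. The helper names are hypothetical; check every call against the Databricks REST API reference before relying on it.

```python
import json
import urllib.request
from typing import Any, Dict, Optional, Tuple

# (HTTP method, path, JSON body) for one Databricks REST call.
ApiCall = Tuple[str, str, Dict[str, Any]]


def create_service_principal_request(display_name: str) -> ApiCall:
    """Create the service principal with only the Databricks SQL access entitlement."""
    body = {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
        "displayName": display_name,
        "entitlements": [{"value": "databricks-sql-access"}],
    }
    return "POST", "/api/2.0/preview/scim/v2/ServicePrincipals", body


def allow_token_usage_request(application_id: str) -> ApiCall:
    """Give the service principal the Can Use permission on personal access tokens.

    PATCH adds this entry to the existing token ACL instead of replacing it.
    """
    body = {
        "access_control_list": [
            {"service_principal_name": application_id, "permission_level": "CAN_USE"}
        ]
    }
    return "PATCH", "/api/2.0/permissions/authorization/tokens", body


def create_obo_token_request(
    application_id: str, comment: str = "Sifflet", lifetime_seconds: Optional[int] = None
) -> ApiCall:
    """Create a personal access token on behalf of the service principal."""
    body: Dict[str, Any] = {"application_id": application_id, "comment": comment}
    if lifetime_seconds is not None:
        body["lifetime_seconds"] = lifetime_seconds
    return "POST", "/api/2.0/token-management/on-behalf-of/tokens", body


def send(host: str, admin_token: str, call: ApiCall) -> Dict[str, Any]:
    """Send one call to https://<host> as a workspace admin and return the JSON response."""
    method, path, body = call
    req = urllib.request.Request(
        f"https://{host}{path}",
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The three calls run in order: the service principal creation response carries the new principal's `applicationId`, which the next two calls need, and the token creation response carries the token itself in `token_value`.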

Option 2 - User Personal Access Token

To create a user personal access token, please follow the section "Databricks personal access tokens for workspace users" from the official documentation.

Save the token as you will need it to connect to Sifflet.

2- Grant permissions

You can grant permissions to the service principal (or user) at the Catalog, Schema, or Table level.

📘

Grant to existing and future tables

  • Granting permissions at Catalog level will automatically propagate the permissions for the existing and future Schemas and Tables.
  • Granting permissions at Schema level will automatically propagate the permissions for the existing and future Tables.
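One way to picture this propagation is as a prefix match on fully qualified names: a grant on `catalog` or `catalog.schema` covers every table whose name starts with that path, including tables created after the grant. The Python sketch below is a simplified mental model, not how Unity Catalog actually evaluates privileges. It ignores the separate USE CATALOG / USE SCHEMA requirement and ownership, and it assumes simple, case-insensitive names without embedded dots. `is_covered` is a hypothetical helper.

```python
from typing import List


def _parts(name: str) -> List[str]:
    # Assumption: names are case-insensitive, so compare them in lowercase.
    return name.lower().split(".")


def is_covered(grant_on: str, table: str) -> bool:
    """Return True if a grant on `grant_on` (catalog, catalog.schema or
    catalog.schema.table) reaches the fully qualified `table`."""
    grant_parts = _parts(grant_on)
    table_parts = _parts(table)
    if len(table_parts) != 3:
        raise ValueError("table must be written as catalog.schema.table")
    if not 1 <= len(grant_parts) <= 3:
        raise ValueError("grant must target a catalog, schema or table")
    # Matching on the name path rather than on a list of existing tables is
    # what makes the grant apply to tables created later.
    return table_parts[: len(grant_parts)] == grant_parts
```

For example, `is_covered("main.sales", "main.sales.new_table")` is true even if `new_table` is created after the grant, while `is_covered("main.sales", "main.hr.people")` is false.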

You can grant access in either of the two ways below:

Grant from the Databricks console

In the left menu, select Catalog.

Granting permissions at Catalog level

Navigate to the Catalog you want to add to Sifflet -> Permissions tab -> Click on Grant -> Search the service principal/user in the Principals box -> Select the Data Reader privilege preset.

Granting permissions at Schema level

1 - Grant USE CATALOG on the parent catalog: the service principal/user needs it to perform any action in the target schema.

Navigate to the parent Catalog of the schema you want to add to Sifflet -> Permissions tab -> Click on Grant -> Search the service principal/user in the Principals box -> Select USE CATALOG.

2 - You can now grant Schema level permissions.

Navigate to the Schema you want to add to Sifflet -> Permissions tab -> Click on Grant -> Search the service principal/user in the Principals box -> Select the Data Reader privilege preset.

Grant by running SQL queries

For the service principal, you need its Application ID, which you can find in Admin Console -> Service principals tab.
If you use a user Personal Access Token, replace the Application ID with the user name.

Granting permissions at Catalog level:

GRANT USE CATALOG ON CATALOG <catalog_name> TO `<Application_ID>`;
GRANT USE SCHEMA ON CATALOG <catalog_name> TO `<Application_ID>`;
GRANT SELECT ON CATALOG <catalog_name> TO `<Application_ID>`;

Or at Schema level:

GRANT USE CATALOG ON CATALOG <catalog_name> TO `<Application_ID>`;
GRANT USE SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO `<Application_ID>`;
GRANT SELECT ON SCHEMA <catalog_name>.<schema_name> TO `<Application_ID>`;

Or for specific tables:

GRANT USE CATALOG ON CATALOG <catalog_name> TO `<Application_ID>`;
GRANT USE SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO `<Application_ID>`;
GRANT SELECT ON TABLE <catalog_name>.<schema_name>.<table_name> TO `<Application_ID>`;
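Since the three variants differ only in scope, the statements can also be generated from the object names. Below is a minimal Python sketch, assuming a hypothetical `grant_statements` helper (not part of Sifflet or Databricks) and the space-separated privilege names (`USE CATALOG`, `USE SCHEMA`) used in the Databricks privilege documentation. It wraps the principal in backticks, so either an Application ID or a user name (an email address) can be passed.

```python
from typing import List, Optional


def quote_principal(principal: str) -> str:
    # Backtick-quote the principal; an embedded backtick is escaped by doubling it.
    return "`" + principal.replace("`", "``") + "`"


def grant_statements(
    principal: str, catalog: str, schema: Optional[str] = None, table: Optional[str] = None
) -> List[str]:
    """Build read-only GRANT statements at catalog, schema or table level."""
    if table is not None and schema is None:
        raise ValueError("a table-level grant needs a schema")
    p = quote_principal(principal)
    stmts = [f"GRANT USE CATALOG ON CATALOG {catalog} TO {p};"]
    if schema is None:
        # Catalog level: covers every existing and future schema and table.
        stmts.append(f"GRANT USE SCHEMA ON CATALOG {catalog} TO {p};")
        stmts.append(f"GRANT SELECT ON CATALOG {catalog} TO {p};")
        return stmts
    fq_schema = f"{catalog}.{schema}"
    stmts.append(f"GRANT USE SCHEMA ON SCHEMA {fq_schema} TO {p};")
    if table is None:
        # Schema level: covers every existing and future table in the schema.
        stmts.append(f"GRANT SELECT ON SCHEMA {fq_schema} TO {p};")
    else:
        stmts.append(f"GRANT SELECT ON TABLE {fq_schema}.{table} TO {p};")
    return stmts


if __name__ == "__main__":
    # Hypothetical Application ID and schema, for illustration only.
    for statement in grant_statements("00000000-0000-0000-0000-000000000000", "main", "sales"):
        print(statement)
```

The printed statements can then be run in a Databricks SQL editor by a user who is allowed to grant privileges on those objects.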

3- Connect to Sifflet

Create the Secret

To add the newly created token to Sifflet, follow the steps below:

  • In "Integration" --> submenu "Credentials", create a new secret
  • In the "Secret" area, copy-paste the token

Add the datasource

📘

Warehouse

You can use an existing Warehouse or create a new one dedicated to Sifflet. You can follow the instructions here to create a new one.
Choose the cluster size based on the number of data assets you want to monitor. As a reference, X-Small is enough for environments with up to several thousand tables.

  1. First, find the information Sifflet requires to connect. In your Databricks environment, go to SQL Warehouses -> choose the Warehouse Sifflet will use -> open the Connection Details tab.
  2. Back in Sifflet:
    • Go to Integration --> click "+ New"
    • Fill out the required information collected in the previous step:
      • Host: corresponds to the Server hostname on Databricks, in the format xxxxx.cloud.databricks.com
      • Port: 443
      • Http Path: corresponds to the HTTP path on Databricks
      • Catalog: the Catalog you want to add to Sifflet
      • Schema: the Schema you want to add to Sifflet
      • Secret: the name of the secret containing the token
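Before saving the data source, it can help to check the connection details outside Sifflet. The Python sketch below assumes the `databricks-sql-connector` package (`pip install databricks-sql-connector`). It normalizes a Server hostname pasted as a full URL, runs a loose sanity check on the HTTP path (assumed to start with `/sql/`, as in `/sql/1.0/warehouses/<id>`), and runs two read-only queries with the same token Sifflet will use. The helper names are hypothetical, and Sifflet does not run this code itself.

```python
from typing import Tuple
from urllib.parse import urlparse

DATABRICKS_PORT = 443  # the Port value entered in Sifflet


def normalize_host(raw: str) -> str:
    """Return the bare Server hostname, e.g. 'xxxxx.cloud.databricks.com'.

    Tolerates values pasted as full URLs such as 'https://xxxxx.cloud.databricks.com/'.
    """
    raw = raw.strip()
    if "://" in raw:
        host = urlparse(raw).hostname
        if not host:
            raise ValueError(f"cannot parse a hostname from {raw!r}")
        return host
    return raw.rstrip("/")


def is_valid_http_path(path: str) -> bool:
    # Loose check only: SQL warehouse HTTP paths are assumed to start with /sql/.
    return path.strip().startswith("/sql/")


def check_connection(host: str, http_path: str, token: str, catalog: str, schema: str) -> Tuple[str, int]:
    """Connect with the token and return (current user, number of visible tables)."""
    # Imported here so the helpers above work without the connector installed.
    from databricks import sql

    with sql.connect(
        server_hostname=normalize_host(host),
        http_path=http_path.strip(),
        access_token=token,
    ) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT current_user()")
            user = cur.fetchone()[0]
            # Assumes simple catalog/schema names that need no backtick quoting.
            cur.execute(f"SHOW TABLES IN {catalog}.{schema}")
            tables = cur.fetchall()
    return user, len(tables)
```

If `check_connection` returns the service principal's Application ID (or the user name) and a non-zero table count, the token, warehouse and grants are likely in place.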

You can refer to this page for more detailed information on adding a data source in Sifflet.