Databricks
To connect Sifflet to Databricks, you need an account with admin rights to set up read-only access.
You can integrate by following these steps:
- Create a dedicated Service Principal with its token or use a user's Personal Access Token
- Grant permissions
- Connect to Sifflet
Unity Catalog and Databricks SQL
We currently support Databricks with Unity Catalog and Databricks SQL. For other configurations, don't hesitate to reach out.
1- Create a Service Principal or a user Personal Access Token
A service principal is a dedicated identity created for use with automated tools such as Sifflet. In line with Databricks' documentation, we recommend using a service principal rather than a personal access token belonging to a Databricks user.
Option 1 - Service Principal (recommended)
Service principal token
When you configure the integration with a service principal, Sifflet only supports personal access tokens for the service principal (which must be generated with the Databricks CLI), not OAuth machine-to-machine (M2M) authentication (available in the interface).
- Create a service principal using one of these two options:
  - Option 1.A - Directly from your Databricks admin console:
    - From your username on the top right, go to Settings.
    - In Workspace admin -> Identity and access -> Service principals, select Manage.
    - Click Add service principal -> Add new.
    - Provide a name for your service principal. Example: Sifflet service principal
  - Option 1.B - Programmatically: please refer to Databricks' documentation here
- In the Configurations tab of the newly created service principal, enable the Databricks SQL access entitlement (this is the only entitlement required).
- Allow this service principal to use tokens (more information here):
  - Go to Workspace admin -> Advanced -> Personal Access Tokens -> Permission Settings.
  - Add the Can Use permission for the service principal created previously.
- Generate a personal access token for the service principal:
  - This step requires the Databricks CLI.
  - For a step-by-step description, follow the "Databricks personal access tokens for service principals" section of the official documentation.
- Save the token, as you will need it to connect to Sifflet.
Option 2 - User Personal Access Token
To create a user personal access token, please follow the section "Databricks personal access tokens for workspace users" from the official documentation.
Save the token as you will need it to connect to Sifflet.
2- Grant permissions
You can grant permissions to the service principal (or user) at the catalog, schema, or table level.
Grant to existing and future tables
- Granting permissions at Catalog level automatically applies them to existing and future Schemas and Tables.
- Granting permissions at Schema level automatically applies them to existing and future Tables.
You can grant access in one of two ways:
Grant from the Databricks console
In the left menu, select Catalog.
Granting permissions at Catalog level
Navigate to the Catalog you want to add to Sifflet -> Permissions tab -> Click Grant -> Search for the service principal/user in the Principals box -> Select the Data Reader privilege preset.
Granting permissions at Schema level
1 - First, grant USE CATALOG on the parent catalog; it is required to perform any action in the target schema.
Navigate to the Catalog you want to add to Sifflet -> Permissions tab -> Click Grant -> Search for the service principal/user in the Principals box -> Select USE CATALOG.
2 - You can now grant Schema level permissions.
Navigate to the Schema you want to add to Sifflet -> Permissions tab -> Click Grant -> Search for the service principal/user in the Principals box -> Select the Data Reader privilege preset.
Grant by running SQL queries
For the service principal, you will need its Application ID, which can be found in Admin Console -> Service principals tab.
If you use a user personal access token instead, replace the Application ID with the user name.
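As an illustration of the user variant, the catalog-level statements would look as follows. `jane.doe@example.com` is a hypothetical user name used only for this sketch (Databricks user names are usually email addresses); like the Application ID, the principal is wrapped in backticks.

```sql
-- Catalog-level grants for a user personal access token.
-- jane.doe@example.com is a hypothetical user name: replace it with the real one.
GRANT USE_CATALOG ON CATALOG <catalog_name> TO `jane.doe@example.com`;
GRANT USE_SCHEMA ON CATALOG <catalog_name> TO `jane.doe@example.com`;
GRANT SELECT ON CATALOG <catalog_name> TO `jane.doe@example.com`;
```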
Granting permissions at Catalog level:
GRANT USE_CATALOG ON CATALOG <catalog_name> TO `<Application_ID>`;
GRANT USE_SCHEMA ON CATALOG <catalog_name> TO `<Application_ID>`;
GRANT SELECT ON CATALOG <catalog_name> TO `<Application_ID>`;
Or at Schema level:
GRANT USE_CATALOG ON CATALOG <catalog_name> TO `<Application_ID>`;
GRANT USE_SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO `<Application_ID>`;
GRANT SELECT ON SCHEMA <catalog_name>.<schema_name> TO `<Application_ID>`;
Or for specific tables:
GRANT USE_CATALOG ON CATALOG <catalog_name> TO `<Application_ID>`;
GRANT USE_SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO `<Application_ID>`;
GRANT SELECT ON TABLE <catalog_name>.<schema_name>.<table_name> TO `<Application_ID>`;
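To check that the grants were applied, you can list the privileges held by the principal with Databricks' SHOW GRANTS command. This is a minimal sketch reusing the placeholders of the statements above; run the line that matches the level at which you granted access, since privileges inherited from a parent catalog or schema may not be reported on the child object.

```sql
-- Catalog-level setup: privileges held by the Sifflet principal on the catalog.
SHOW GRANTS `<Application_ID>` ON CATALOG <catalog_name>;

-- Schema-level setup: privileges held on the schema.
SHOW GRANTS `<Application_ID>` ON SCHEMA <catalog_name>.<schema_name>;

-- Table-level setup: privileges held on a specific table.
SHOW GRANTS `<Application_ID>` ON TABLE <catalog_name>.<schema_name>.<table_name>;
```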
3- Connect to Sifflet
Create the Secret
To add the newly created token to Sifflet, follow the steps below:
- In "Integration" --> submenu "Credentials", create a new secret
- In the "Secret" area, copy-paste the token
Add the data source
Warehouse
You can use an existing Warehouse or create a new dedicated one for Sifflet. You can follow the instructions here to create a new one.
You can choose the cluster size depending on the number of data assets you want to monitor. As a reference, X-Small is enough for environments with thousands of tables or fewer.
- First, collect the information Sifflet needs to connect. In your Databricks environment, go to SQL Warehouse -> choose the warehouse Sifflet will use -> open the Connection Details tab.
- Back to Sifflet:
  - Go to Integration --> click "+ New"
  - Fill out the necessary information collected in the previous step:
    - Host: corresponds to the Server hostname on Databricks, with a format xxxxx.cloud.databricks.com
    - Port: 443
    - Http Path: corresponds to the HTTP path on Databricks
    - Catalog: the Catalog you want to add to Sifflet
    - Schema: the Schema you want to add to Sifflet
    - Secret: the name of the secret containing the token
You can refer to this page for more detailed information on adding a data source in Sifflet.
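Optionally, you can confirm that the token actually reaches your assets by running a few read-only queries while authenticated with that same token (for example from a SQL client configured with the Host, HTTP path and token). This is a minimal sketch, assuming the same placeholders as in the grant section; it is a manual sanity check, not a step Sifflet requires.

```sql
-- Returns the identity the token authenticates as
-- (for a service principal, expected to be its Application ID).
SELECT current_user();

-- Should list the schemas of the catalog that the principal can see.
SHOW SCHEMAS IN <catalog_name>;

-- Should list the tables of the schema added to Sifflet.
SHOW TABLES IN <catalog_name>.<schema_name>;

-- Reading a single row confirms the SELECT privilege on a table.
SELECT * FROM <catalog_name>.<schema_name>.<table_name> LIMIT 1;
```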