Hive on Cloudera (beta)
You can now integrate with Hive on Cloudera.
To connect Sifflet to Hive, you will need the following rights at least:
- rights to create a principal and keytab in Kerberos
- admin rights for Ranger
The main steps are the following:
- Create a principal on Kerberos for Sifflet
- Grant access rights for the principal via Ranger
- Find the JDBC URL used to connect to your Hive instance
- Add a Hive datasource in Sifflet
1- Create a Sifflet principal on Kerberos
- Create the principal and its keytab. See here for more details. One way to proceed is the following:
- Create a new principal, using
kadmin
(orkadmin.local
if you are logged in on the KDC)
After enteringkadmin
, create the principal withaddprinc principal-name
.
You can renameprincipal-name
for the actual name you want to give to this principal, for examplesifflet_principal
. Save the principal name as you will need it for step 4. It will have the following format:[email protected]
- Create a keytab file for this principal with
xst -k /path/to/keytab/file.keytab [email protected]
- Encode the keytab file in base64, with
openssl base64 -in /path/to/keytab/file.keytab -out /path/to/keytab/file.keytab.b64
- Save the content of the generated base64 file for step 4. To access the content:
cat /path/to/keytab/file.keytab.b64
- Create a new principal, using
- Get the krb5.conf configuration file. You can find an example on your cluster node.
Save the content of this file for step 4
2- Grant access rights for the principal via Ranger
Go to your Ranger Admin interface and configure authorisations.
- Create a new user with the same name as your principal, keeping only the part prior to
@YOURREALM
.[email protected]
→ the Ranger user should be namedprincipal-name
- Add a policy that grants this user
SELECT
permission on:
- the
sys
database - Any other Hive databases that you want to connect to Sifflet
This results in something similar to : database: <sys + list of target databases>, tables: *, columns: *
3- Find the JDBC URL of your Hive instance
The JDBC URL of your Hive instance will have the following format:
jdbc:hive2://<host>:<port>/<database>;principal=<principal>
. There could be additional optional parameters such as transportMode
and httpPath
. You can find more information here.
<host>
: the hostname or IP address of the server you connect to. It will usually point either to the host running HiveServer2 or a Zookeeper node (if using Zookeeper)<port>
: the port number to connect to (Usually 10000 or 10001 if connected directly to HiveServer, but can be something else for instance for Zookeeper)<database>
: the name of the Hive database you want to connect to<principal>
: a Kerberos user used by Hive (usually hive/[email protected])
4- Add a Hive datasource in Sifflet
Create the Secret
- In "Integration" --> submenu "Secrets", create a new secret
- In the "Secret" area, copy-paste the content of the generated base64 keytab file of step 1
Add the datasource
-
On the left panel, choose "Integration" and then the "Sources" submenu
-
Click "New" Datasource and choose "Hive"
-
Information required:
- Name: the Sifflet name of the datasource
- Database: the name of the Hive database you want to connect to
- Jdbc URL: the JDBC URL of step 3
- Principal: the principal name you saved at step 1
- Secret: the secret you just created in the previous step
- krb5 Configuration File Content: You can paste directly the content of the krb5.conf saved in step 1

Updated about 1 month ago