Athena
You can integrate with Athena to map the dependencies between S3 objects and data pipelines.
To connect Sifflet to Athena, you will need to log in to your AWS account as a user with admin permissions.
The main steps are the following:
- Configure a read-only role in your AWS account, with a trust policy allowing Sifflet to assume this role
- Add an Athena datasource in Sifflet
1- AWS configuration
Please find below the steps to configure your AWS account to allow Sifflet to connect with Athena:
a. Create an AWS policy granting read-only permissions on Athena resources
b. Create a role with a custom trust policy
a. Create an AWS policy granting read-only permissions
- In the AWS console, go to IAM -> Policies and click on Create policy
- Copy the following JSON and make the following changes:
- In the first three Resource sections, replace eu-west-1:111122223333 with your own region and AWS account ID
- In the last Resource section, replace [YOUR-BUCKET-TO-MONITOR] (and if necessary [YOUR-FOLDER-TO-MONITOR]) with your own values.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"athena:StartQueryExecution",
"athena:GetQueryExecution",
"athena:GetQueryResults",
"athena:GetQueryResultsStream",
"athena:ListQueryExecutions",
"athena:CreatePreparedStatement",
"athena:DeletePreparedStatement",
"athena:GetPreparedStatement"
],
"Resource": [
"arn:aws:athena:eu-west-1:111122223333:workgroup/*"
]
},
{
"Effect": "Allow",
"Action": [
"athena:ListDataCatalogs",
"athena:GetDatabase",
"athena:ListDatabases",
"athena:ListTableMetadata",
"athena:GetTableMetadata"
],
"Resource": [
"arn:aws:athena:eu-west-1:111122223333:datacatalog/*"
]
},
{
"Effect": "Allow",
"Action": [
"glue:GetDatabase",
"glue:GetDatabases",
"glue:GetTable",
"glue:GetTables",
"glue:GetPartition",
"glue:GetPartitions",
"glue:BatchGetPartition"
],
"Resource": [
"arn:aws:glue:eu-west-1:111122223333:catalog",
"arn:aws:glue:eu-west-1:111122223333:database/*",
"arn:aws:glue:eu-west-1:111122223333:table/*/*"
]
},
{
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:GetObject",
"s3:ListBucket",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::[YOUR-BUCKET-TO-MONITOR]",
"arn:aws:s3:::[YOUR-BUCKET-TO-MONITOR]/*",
"arn:aws:s3:::[YOUR-BUCKET-TO-MONITOR]/[YOUR-FOLDER-TO-MONITOR]/*"
]
}
]
}
- On the next screen, you can choose the name of the policy, for instance, sifflet_athena_read_policy
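The placeholder substitutions described above can also be scripted rather than edited by hand. The following is a minimal sketch, not an official Sifflet tool: the function name `fill_policy`, the abbreviated `POLICY_TEMPLATE`, and the sample region, account ID, and bucket are all illustrative assumptions. In practice you would paste the full policy JSON into the template.

```python
import json

# Abbreviated stand-in for the full policy shown above (assumption: in
# practice, paste the complete JSON here).
POLICY_TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["athena:GetQueryExecution"],
      "Resource": ["arn:aws:athena:eu-west-1:111122223333:workgroup/*"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::[YOUR-BUCKET-TO-MONITOR]",
        "arn:aws:s3:::[YOUR-BUCKET-TO-MONITOR]/*",
        "arn:aws:s3:::[YOUR-BUCKET-TO-MONITOR]/[YOUR-FOLDER-TO-MONITOR]/*"
      ]
    }
  ]
}
"""


def fill_policy(template, region, account_id, bucket, folder=None):
    """Substitute the documented placeholders and return the policy as a dict."""
    text = template.replace("eu-west-1:111122223333", f"{region}:{account_id}")
    text = text.replace("[YOUR-BUCKET-TO-MONITOR]", bucket)
    if folder is not None:
        text = text.replace("[YOUR-FOLDER-TO-MONITOR]", folder)
    policy = json.loads(text)  # raises if the edited JSON is malformed
    for statement in policy["Statement"]:
        # Drop the folder-scoped resource when no folder was supplied.
        statement["Resource"] = [
            r for r in statement["Resource"]
            if "[YOUR-FOLDER-TO-MONITOR]" not in r
        ]
    return policy


# Hypothetical values, for illustration only.
policy = fill_policy(POLICY_TEMPLATE, "us-east-1", "123456789012", "my-data-bucket")
print(json.dumps(policy, indent=2))
```

The printed JSON can then be pasted into the policy editor. Parsing the result with `json.loads` is a cheap way to catch a stray quote or comma before AWS rejects the document.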
b. Create the role with a custom trust policy
Please find the main steps below (you can also refer to AWS's official documentation):
- In the AWS console, go to IAM -> Roles and click on Create role
- Choose Custom trust policy and copy the following JSON. Do not forget to replace <tenant> with your tenant name. For example, if your instance is https://datacompany.siffletdata.com, then your tenant is datacompany.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": [
"arn:aws:iam::386396202409:role/sifflet-<tenant>-integration"
]
},
"Action": "sts:AssumeRole"
}
]
}
- On the next screen, choose the previously created policy, sifflet_athena_read_policy in our example
- On the next screen, you can choose the name of the role, for instance, sifflet_athena_read_role
- You can now access your newly created role. Take note of its ARN, as you will need it to connect Sifflet to Athena. It will look similar to the following: arn:aws:iam::123456789101:role/sifflet_athena_read_role
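As an illustration of the tenant substitution in the trust policy, the sketch below derives the tenant from an instance URL and fills in the principal ARN. It assumes that the tenant is always the first label of a `siffletdata.com` hostname, as in the example above. The function names `tenant_from_url` and `build_trust_policy` are hypothetical helpers, not part of any Sifflet or AWS library.

```python
import json
from urllib.parse import urlparse

# Account ID taken from the trust policy shown above.
SIFFLET_ACCOUNT_ID = "386396202409"


def tenant_from_url(instance_url):
    """Return the tenant name, assumed to be the first label of the hostname."""
    host = urlparse(instance_url).hostname
    if not host or not host.endswith(".siffletdata.com"):
        raise ValueError(f"not a Sifflet instance URL: {instance_url!r}")
    return host.split(".")[0]


def build_trust_policy(tenant):
    """Build the custom trust policy with <tenant> replaced."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": [
                        f"arn:aws:iam::{SIFFLET_ACCOUNT_ID}:role/sifflet-{tenant}-integration"
                    ]
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


tenant = tenant_from_url("https://datacompany.siffletdata.com")
print(json.dumps(build_trust_policy(tenant), indent=2))
```

The URL must include the `https://` scheme. Without it, `urlparse` does not recognise a hostname and the helper raises `ValueError`.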
2- Add an Athena datasource in Sifflet
- On the left panel, choose "Integration" and then the "Sources" submenu
- Click "New" Datasource and choose "Athena"
- Information required:
- Name: the Sifflet name of the datasource
- AWS Region: the AWS region of your Athena instance
- Athena Data source: the name of your Athena data source
- Database: the Athena database to ingest from
- S3 Output location: the S3 bucket where Athena stores the results of any query
- Workgroup: the name of your Athena Workgroup
- AWS Role Arn: the ARN of the previously created AWS role, for example arn:aws:iam::123456789101:role/sifflet_athena_read_role
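Before pasting the role ARN into the form, it can be worth checking that it has the expected shape. The snippet below is a rough sketch under the assumption that the value is an IAM role ARN with a 12-digit account ID. `parse_role_arn` is a hypothetical helper and does not verify that the role actually exists.

```python
import re

# IAM role ARN: arn:<partition>:iam::<12-digit account ID>:role/<role name or path>
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:iam::(\d{12}):role/(.+)$")


def parse_role_arn(arn):
    """Return (account_id, role_name) or raise ValueError if the shape is wrong."""
    match = ROLE_ARN_PATTERN.match(arn.strip())
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    return match.group(1), match.group(2)


account_id, role_name = parse_role_arn(
    "arn:aws:iam::123456789101:role/sifflet_athena_read_role"
)
print(account_id, role_name)
```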