You can integrate Sifflet with Athena to map the dependencies between S3 objects and data pipelines.

To connect Sifflet to Athena, you will need to log in to your AWS account as a user with admin permissions.

The main steps are the following:

  1. Configure a read-only role in your AWS account, with a trust policy allowing Sifflet to assume this role
  2. Add an Athena datasource in Sifflet

1- AWS configuration

Please find below the steps to configure your AWS account to allow Sifflet to connect with Athena:

a. Create an AWS policy granting read-only permissions on Athena resources

b. Create a role with a custom trust policy

a. Create an AWS policy granting read-only permissions

  • In the AWS console, go to IAM -> Policies and click Create policy
  • Copy the following JSON and make the following changes:
    • In the first three Resource sections, replace eu-west-1:111122223333 with your own region and AWS account ID
    • In the last Resource section, replace [YOUR-BUCKET-TO-MONITOR] (and if necessary [YOUR-FOLDER-TO-MONITOR]) with your own values.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryExecution",
                "athena:GetQueryResults",
                "athena:GetQueryResultsStream",
                "athena:ListQueryExecutions",
                "athena:CreatePreparedStatement",
                "athena:DeletePreparedStatement",
                "athena:GetPreparedStatement"
            ],
            "Resource": [
                "arn:aws:athena:eu-west-1:111122223333:workgroup/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "athena:ListDataCatalogs",
                "athena:GetDatabase",
                "athena:ListDatabases",
                "athena:ListTableMetadata",
                "athena:GetTableMetadata"
            ],
            "Resource": [
                "arn:aws:athena:eu-west-1:111122223333:datacatalog/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetTable",
                "glue:GetTables",
                "glue:GetPartition",
                "glue:GetPartitions",
                "glue:BatchGetPartition"
            ],
            "Resource": [
                "arn:aws:glue:eu-west-1:111122223333:catalog",
                "arn:aws:glue:eu-west-1:111122223333:database/*",
                "arn:aws:glue:eu-west-1:111122223333:table/*/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::[YOUR-BUCKET-TO-MONITOR]",
                "arn:aws:s3:::[YOUR-BUCKET-TO-MONITOR]/*",
                "arn:aws:s3:::[YOUR-BUCKET-TO-MONITOR]/[YOUR-FOLDER-TO-MONITOR]/*"
            ]
        }
    ]
}
  • On the next screen, you can choose the name of the policy, for instance, sifflet_athena_read_policy
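
If you prefer to script the substitutions rather than edit the JSON by hand, the following is a minimal sketch in Python (standard library only). It rebuilds the policy shown above from your region, AWS account ID, bucket, and optional folder. The helper name `build_athena_policy` and its parameters are hypothetical and only serve the illustration; they are not part of Sifflet or of any AWS SDK.

```python
import json


def build_athena_policy(region, account_id, bucket, folder=None):
    """Fill the placeholders of the read-only policy shown above.

    Hypothetical helper for illustration only; it is not part of Sifflet
    or of any AWS SDK.
    """
    scope = f"{region}:{account_id}"

    # Last Resource section: the bucket itself, its objects, and
    # (only if a folder is given) the folder-scoped prefix.
    s3_resources = [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    if folder:
        s3_resources.append(f"arn:aws:s3:::{bucket}/{folder.strip('/')}/*")

    def allow(actions, resources):
        return {"Effect": "Allow", "Action": actions, "Resource": resources}

    return {
        "Version": "2012-10-17",
        "Statement": [
            allow(
                [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:GetQueryResultsStream",
                    "athena:ListQueryExecutions",
                    "athena:CreatePreparedStatement",
                    "athena:DeletePreparedStatement",
                    "athena:GetPreparedStatement",
                ],
                [f"arn:aws:athena:{scope}:workgroup/*"],
            ),
            allow(
                [
                    "athena:ListDataCatalogs",
                    "athena:GetDatabase",
                    "athena:ListDatabases",
                    "athena:ListTableMetadata",
                    "athena:GetTableMetadata",
                ],
                [f"arn:aws:athena:{scope}:datacatalog/*"],
            ),
            allow(
                [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartition",
                    "glue:GetPartitions",
                    "glue:BatchGetPartition",
                ],
                [
                    f"arn:aws:glue:{scope}:catalog",
                    f"arn:aws:glue:{scope}:database/*",
                    f"arn:aws:glue:{scope}:table/*/*",
                ],
            ),
            allow(
                ["s3:GetBucketLocation", "s3:GetObject", "s3:ListBucket", "s3:PutObject"],
                s3_resources,
            ),
        ],
    }


if __name__ == "__main__":
    # Example values only; replace them with your own.
    policy = build_athena_policy("eu-west-1", "111122223333", "my-bucket", "my-folder")
    print(json.dumps(policy, indent=4))
```

The printed JSON can then be pasted into the policy editor in the AWS console.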

b. Create the role with a custom trust policy

Please find the main steps below (you can also refer to AWS's official documentation):

  • In the AWS console, go to IAM -> Roles and click Create role
  • Choose Custom trust policy and copy the following JSON.
    Do not forget to replace <tenant> with your tenant name.
    For example, if your instance is <https://datacompany.siffletdata.com>, then your tenant is datacompany.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::386396202409:role/sifflet-<tenant>-integration"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
  • On the next screen, choose the previously created policy (sifflet_athena_read_policy in our example)
  • On the next screen, you can choose the name of the role, for instance, sifflet_athena_read_role
  • You can now access your newly created role. Please take note of its ARN, as you will need it to connect Sifflet to Athena. It will be similar to the following: arn:aws:iam::123456789101:role/sifflet_athena_read_role
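
The tenant substitution in the trust policy can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `tenant_from_url` and `build_trust_policy` are hypothetical, and the code assumes that your instance URL follows the same pattern as the example above (the tenant is the part of the host name before `.siffletdata.com`). The account ID and role name pattern are copied from the trust policy JSON above.

```python
import json
from urllib.parse import urlparse

# Account ID taken from the trust policy shown above.
SIFFLET_ACCOUNT_ID = "386396202409"
# Assumption: instance host names look like <tenant>.siffletdata.com.
SIFFLET_DOMAIN_SUFFIX = ".siffletdata.com"


def tenant_from_url(instance_url):
    """Extract the tenant name from a Sifflet instance URL (hypothetical helper)."""
    host = urlparse(instance_url).hostname or ""
    if not host.endswith(SIFFLET_DOMAIN_SUFFIX):
        raise ValueError(f"not a recognised Sifflet instance URL: {instance_url!r}")
    return host[: -len(SIFFLET_DOMAIN_SUFFIX)]


def build_trust_policy(tenant):
    """Return the custom trust policy with <tenant> replaced (hypothetical helper)."""
    principal = f"arn:aws:iam::{SIFFLET_ACCOUNT_ID}:role/sifflet-{tenant}-integration"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": [principal]},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    tenant = tenant_from_url("https://datacompany.siffletdata.com")
    print(json.dumps(build_trust_policy(tenant), indent=4))
```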

2- Add an Athena datasource in Sifflet

  • On the left panel, choose "Integration" and then the "Sources" submenu
  • Click "New" to add a datasource and choose "Athena"
  • Information required:
    • Name: the Sifflet name of the datasource
    • AWS Region: the AWS region of your Athena instance
    • Athena Data source: the name of your Athena data source
    • Database: the Athena database to ingest from
    • S3 Output location: the S3 bucket where Athena stores the results of any query
    • Workgroup: the name of your Athena Workgroup
    • AWS Role Arn: the ARN of the previously created AWS role, for example arn:aws:iam::123456789101:role/sifflet_athena_read_role
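
Before pasting the role ARN into the form, it can help to check that it is well formed. The snippet below is a minimal sketch in Python; the helper name `parse_role_arn` is hypothetical, and the pattern is a simplified assumption (12-digit account ID, role name with an optional path) rather than an official AWS validator.

```python
import re

# Simplified pattern for an IAM role ARN: partition, 12-digit account ID,
# then the role name (optionally preceded by a path).
ROLE_ARN_PATTERN = re.compile(r"arn:(aws[a-z-]*):iam::(\d{12}):role/([\w+=,.@/-]+)")


def parse_role_arn(arn):
    """Return (account_id, role_name) or raise ValueError (hypothetical helper)."""
    match = ROLE_ARN_PATTERN.fullmatch(arn.strip())
    if match is None:
        raise ValueError(f"not a valid IAM role ARN: {arn!r}")
    return match.group(2), match.group(3)


if __name__ == "__main__":
    account_id, role_name = parse_role_arn(
        "arn:aws:iam::123456789101:role/sifflet_athena_read_role"
    )
    print(account_id, role_name)
```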