dbt Impact Analysis GitLab Component

When you raise a merge request containing changes to one or more dbt models, the impact analysis component uses Sifflet's lineage API to determine the potential impact of your dbt code changes and posts it as a merge request comment.

The potential impact consists of the list of assets downstream of the dbt model(s) you’re modifying. These assets can be other dbt models, BI dashboards, or any other asset cataloged in Sifflet.
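
To make "downstream" concrete, here is a minimal sketch of a breadth-first walk over a lineage graph. The graph, the asset names, and the `downstream_assets` helper are all hypothetical; the component obtains lineage from Sifflet's API, and its internals are not described here.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets that read from it.
LINEAGE = {
    "stg_orders": ["fct_orders"],
    "fct_orders": ["orders_dashboard", "mart_revenue"],
    "mart_revenue": ["revenue_dashboard"],
    "stg_customers": ["dim_customers"],
}


def downstream_assets(graph, changed):
    """Return every asset reachable downstream of the changed assets."""
    seen = set(changed)
    queue = deque(changed)
    impacted = []
    while queue:
        current = queue.popleft()
        for child in graph.get(current, []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


print(downstream_assets(LINEAGE, ["stg_orders"]))
# → ['fct_orders', 'orders_dashboard', 'mart_revenue', 'revenue_dashboard']
```

In this toy graph, changing stg_orders affects both dbt models and dashboards, while dim_customers is untouched because it sits on a separate branch.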

Using the component

To add the component to your repository, follow these steps:

  1. Generate a Sifflet API Access Token with the “Viewer” role. Once the token is generated, copy its value since it can only be viewed once.
  2. Add the token as a masked and hidden CI/CD variable to your GitLab repository. If you’re using an external secret manager with GitLab, you can use that instead.
  3. Generate a GitLab access token with the “Developer” role and the “api” scope.
  4. Add the GitLab token as a masked and hidden CI/CD variable to your GitLab repository. If you’re using an external secret manager with GitLab, you can use that instead.
  5. Add the following GitLab CI/CD component to your pipeline (the component is also available via the CI/CD catalog):
include:
  - component: $CI_SERVER_FQDN/siffletdata/dbt-impact-analysis-component/dbt-impact-analysis-component@v1
    inputs:
      gitlab_token: <THE VARIABLE CONTAINING YOUR GITLAB TOKEN GENERATED IN STEP #3>
      sifflet_instance_url: <THE FULL URL OF YOUR SIFFLET ENVIRONMENT> # This should have the following format: https://<your_instance_name>.siffletdata.com
      sifflet_api_token: <THE VARIABLE CONTAINING YOUR SIFFLET API TOKEN GENERATED IN STEP #1>
      impacted_asset_types: 'DASHBOARD, DATASET' # Optional, comma-separated list of impacted asset types that you want to appear in the impact report
      impacted_tags: 'tag1, tag2' # Optional, list of tags that you want to appear in the impact report (only impacted assets that have a tag from the list will appear in the report)
      gitlab_server_url: 'https://acme-self-hosted-gitlab.com' # Optional. Use it only if your repository is NOT hosted on https://gitlab.com
      stage: test # You can choose the stage at which you want the component to run

stages: [test]
  6. Configure the component:
    • gitlab_token: This should be the variable containing the GitLab token you generated in step #3.
    • sifflet_instance_url: This should be the full URL of your Sifflet environment in the following format: "https://<your_instance_name>.siffletdata.com".
    • sifflet_api_token: This should be the variable containing the Sifflet API token you generated in step #1.
    • impacted_asset_types: Optional, comma-separated list of asset types to include in the impact report. Supported values: DASHBOARD and DATASET. If not provided, all asset types are included in the impact report.
    • impacted_tags: Optional, comma-separated list of tags. Only impacted assets that have at least one tag from the list appear in the impact report. If not provided, impacted assets are not filtered by tag.
    • gitlab_server_url: Optional. Use it only if your repository is NOT hosted on https://gitlab.com.
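
The effect of the two optional filters, impacted_asset_types and impacted_tags, can be illustrated with a short sketch. The asset records and the `parse_list` and `filter_impacted` helpers are hypothetical, and combining the two filters with a logical AND is an assumption, not documented behavior.

```python
def parse_list(raw):
    """Split a comma-separated input such as 'DASHBOARD, DATASET' into a set."""
    if not raw:
        return set()
    return {item.strip() for item in raw.split(",") if item.strip()}


def filter_impacted(assets, impacted_asset_types=None, impacted_tags=None):
    """Keep assets matching the optional type and tag filters.

    Assumption: an omitted filter keeps everything, and when both filters
    are given an asset must satisfy both.
    """
    types = parse_list(impacted_asset_types)
    tags = parse_list(impacted_tags)
    kept = []
    for asset in assets:
        if types and asset["type"] not in types:
            continue
        if tags and not tags.intersection(asset["tags"]):
            continue
        kept.append(asset)
    return kept


# Hypothetical impacted assets, for illustration only.
ASSETS = [
    {"name": "fct_orders", "type": "DATASET", "tags": ["finance"]},
    {"name": "orders_dashboard", "type": "DASHBOARD", "tags": ["finance", "exec"]},
    {"name": "dim_customers", "type": "DATASET", "tags": []},
]

for asset in filter_impacted(ASSETS, impacted_asset_types="DASHBOARD", impacted_tags="exec, ops"):
    print(asset["name"])
```

With no filters, all three assets would be reported; with the filters shown, only orders_dashboard remains.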

After performing the above steps, whenever a merge request that modifies SQL files is raised, Sifflet will add a comment containing an impact analysis report.
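
The GitLab token needs the “api” scope because adding a merge request comment goes through GitLab's REST API. As a rough sketch, and not necessarily how the component itself is implemented, a comment can be created with a POST to the merge request notes endpoint. The `build_note_request` helper, the project ID, the merge request IID, and the token value below are hypothetical; the request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_note_request(server_url, project_id, mr_iid, token, body):
    """Build (but do not send) a request that adds a note to a merge request."""
    project = urllib.parse.quote(str(project_id), safe="")
    url = (
        f"{server_url.rstrip('/')}/api/v4/projects/{project}"
        f"/merge_requests/{mr_iid}/notes"
    )
    payload = json.dumps({"body": body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
    )


# Placeholder values; in a pipeline these would typically come from
# CI_SERVER_URL, CI_PROJECT_ID, CI_MERGE_REQUEST_IID and a CI/CD variable.
request = build_note_request(
    "https://gitlab.com", 1234, 7, "glpat-example", "Impact analysis report"
)
print(request.get_method(), request.full_url)
```

For a self-hosted instance, the first argument would be the same URL you pass as gitlab_server_url.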

Sample comment

The MR comment generated by the component will look as follows:

Known limitations

  • The impact is currently generated at the asset level, so the list of potentially impacted assets may include assets that don’t use the field(s) you’re modifying. We plan to transition this feature to field-level lineage in the coming months.

  • The impact is currently generated based on model file changes, so changes to macros won’t be detected by the feature. We plan to transition to manifest-based impact generation in the coming months.
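
The second limitation can be illustrated with a small sketch of file-based change detection. It assumes, hypothetically, that models live in a directory named models and that a model's name is its SQL file name without the extension; the `changed_models` helper is illustrative and not part of the component.

```python
from pathlib import PurePosixPath


def changed_models(changed_paths, models_dir="models"):
    """Map changed file paths to dbt model names, ignoring everything else.

    Assumption: a model is any .sql file located under a directory
    named `models_dir`.
    """
    names = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path.suffix == ".sql" and models_dir in path.parts[:-1]:
            names.append(path.stem)
    return names


# Hypothetical list of files touched by a merge request.
CHANGED = [
    "models/staging/stg_orders.sql",
    "macros/cents_to_dollars.sql",
    "models/staging/schema.yml",
]

print(changed_models(CHANGED))
# → ['stg_orders']
```

The macro file is skipped even though models that call it may change behavior, which is the gap a manifest-based approach is meant to close.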