Apache Airflow (Beta)

Overview

Connect Sifflet to Apache Airflow to bring your data orchestration metadata into Sifflet's unified data observability platform. This allows you to monitor pipeline execution alongside data quality and lineage.

Key Benefits:

  • Monitor DAG Status: Once integrated, Sifflet displays your Airflow DAGs in the catalog and lineage. You can check the status (success or failure) of each DAG's latest run wherever it appears in Sifflet: the catalog, the asset page, and the lineage.

    Airflow DAG status in the Sifflet catalog.

  • End-to-End View: Understand how your Airflow tasks impact data downstream in warehouses and BI tools through integrated lineage.

  • Centralized Overview: View critical Airflow metadata alongside other data assets without switching tools.

Airflow custom operators

This page focuses on bringing Airflow metadata into Sifflet. To trigger Sifflet actions from Airflow, refer to the documentation on our custom Airflow operators.

This page covers integrating Sifflet with a self-hosted Airflow instance. If you use a cloud-managed variant instead, refer to its dedicated page.

To integrate Airflow with Sifflet, follow these steps:

  1. Create a dedicated read-only user
  2. Enable basic authentication
  3. Connect to Sifflet

Supported Airflow versions

We currently support any self-hosted Airflow instance (version 2.0.0+) in addition to cloud-managed variations (Amazon MWAA on AWS and Cloud Composer on GCP).
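The [api] option used to enable basic authentication in step 2 changed name in Airflow 2.3: earlier 2.x releases use auth_backend (a single value), while 2.3 and later use auth_backends (a comma-separated list). The sketch below is a hypothetical helper, not part of Airflow or Sifflet. It parses a version string such as the one printed by airflow version, checks it against the 2.0.0 minimum stated above, and reports which option name applies. It only covers Airflow 2.x option names.

```python
import re


def parse_version(version: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from strings like '2.7.3' or '2.10.0rc1'."""
    match = re.match(r"^\s*(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized Airflow version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_supported(version: str) -> bool:
    """Self-hosted Airflow 2.0.0 or later is supported."""
    return parse_version(version) >= (2, 0, 0)


def api_auth_option(version: str) -> str:
    """Name of the [api] option that lists the authentication backend(s)."""
    parsed = parse_version(version)
    if parsed[0] != 2:
        # This sketch only knows the Airflow 2.x option names.
        raise ValueError(f"Only Airflow 2.x is handled here, got {version!r}")
    # Airflow 2.3 renamed auth_backend to auth_backends (comma-separated list).
    return "auth_backends" if parsed >= (2, 3, 0) else "auth_backend"


if __name__ == "__main__":
    for v in ("1.10.15", "2.2.5", "2.7.3"):
        print(v, is_supported(v), api_auth_option(v) if is_supported(v) else "-")
```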

Integration guide

1. Create a read-only user

Create a dedicated Sifflet user with the "Viewer" role.
Choose a "User Name" (for instance, "sifflet_user") and a strong password, and store them securely: you will need both when you configure the connection in Sifflet.

Sample configuration for a Sifflet user in Airflow
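If you prefer the command line to the web UI, Airflow 2 provides an airflow users create command. The sketch below assembles that command with the Viewer role. The email, first name, and last name values are placeholders (the command requires them), and actually running it assumes the airflow CLI is on your PATH with access to the Airflow metadata database. Passing the password as an argument can leave it in shell history or process listings, so the demo only prints the command with the password masked.

```python
import shlex
import subprocess


def build_create_user_cmd(username: str, password: str, email: str,
                          firstname: str = "Sifflet",
                          lastname: str = "Integration") -> list[str]:
    """Return the argv for 'airflow users create' with the read-only Viewer role."""
    return [
        "airflow", "users", "create",
        "--username", username,
        "--password", password,
        "--role", "Viewer",        # read-only role for the Sifflet user
        "--email", email,          # required by the CLI; placeholder value
        "--firstname", firstname,  # required by the CLI; placeholder value
        "--lastname", lastname,    # required by the CLI; placeholder value
    ]


def create_user(username: str, password: str, email: str) -> None:
    """Run the command; raises CalledProcessError if Airflow rejects it."""
    subprocess.run(build_create_user_cmd(username, password, email), check=True)


if __name__ == "__main__":
    # Print the command with a masked password instead of executing it.
    cmd = build_create_user_cmd("sifflet_user", "********", "sifflet@example.com")
    print(shlex.join(cmd))
```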

2. Enable basic authentication

By default, basic authentication is disabled in Airflow. Enabling it lets Sifflet authenticate against your Airflow instance's REST API with the user created in step 1.

To check which authentication backends are currently set, run the command below. On an instance where basic authentication is already enabled, the output looks like this:

$ airflow config get-value api auth_backends
airflow.api.auth.backend.basic_auth
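The value printed by this command is a comma-separated list. A minimal sketch, assuming you capture that output as a string (current_backends below does so through the Airflow CLI), that checks whether basic authentication is already enabled and otherwise computes a value that adds it while keeping existing backends such as airflow.api.auth.backend.session. The helper names are hypothetical.

```python
import subprocess

BASIC_AUTH = "airflow.api.auth.backend.basic_auth"


def parse_backends(value: str) -> list[str]:
    """Split the comma-separated auth_backends value into module paths."""
    return [item.strip() for item in value.split(",") if item.strip()]


def basic_auth_enabled(value: str) -> bool:
    return BASIC_AUTH in parse_backends(value)


def with_basic_auth(value: str) -> str:
    """Return an auth_backends value that includes basic_auth, keeping existing entries."""
    backends = parse_backends(value)
    if BASIC_AUTH not in backends:
        backends.insert(0, BASIC_AUTH)
    return ",".join(backends)


def current_backends() -> str:
    """Read the live value via the Airflow CLI (requires airflow on PATH)."""
    result = subprocess.run(
        ["airflow", "config", "get-value", "api", "auth_backends"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(with_basic_auth("airflow.api.auth.backend.session"))
    # → airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session
```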

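To enable basic authentication, set the following in the Airflow configuration (airflow.cfg), then restart the Airflow webserver. The option accepts a comma-separated list, so if other backends are already set, you can append basic_auth to them instead of replacing them. On Airflow versions earlier than 2.3, the option is named auth_backend (singular).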

[api]
auth_backends = airflow.api.auth.backend.basic_auth
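Once basic authentication is enabled (changes to airflow.cfg take effect after the webserver restarts), you can confirm that the Sifflet user is accepted by calling the stable REST API, which Airflow 2 serves under /api/v1. A minimal sketch using only the Python standard library; the base URL and credentials are placeholders. A successful call returns the DAG count visible to the user, while an HTTP 401 or 403 error usually points to the auth_backends setting or the user's role.

```python
import base64
import json
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_dags_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """GET request for the DAG list endpoint of the Airflow 2 stable REST API."""
    url = base_url.rstrip("/") + "/api/v1/dags?limit=1"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": basic_auth_header(username, password),
            "Accept": "application/json",
        },
    )


def check_access(base_url: str, username: str, password: str) -> int:
    """Return the number of DAGs visible to the user; raises HTTPError on 401/403."""
    request = build_dags_request(base_url, username, password)
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return payload["total_entries"]
```

For example, check_access("http://localhost:8080", "sifflet_user", "<password>") performs the call against a local instance; replace the URL and credentials with your own.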

3. Connect to Sifflet

Add an Airflow secret

To create the Airflow secret:

  • In "Integration", open the "Secrets" tab and create a new secret.
  • In the "Secret" field, paste the text below and replace the placeholders with the username and password you created in step 1:
{
  "user": "<username>",
  "password": "<password>"
}
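If the password contains characters such as double quotes or backslashes, editing the JSON by hand can produce an invalid secret. A minimal sketch that builds the secret text with Python's json module so escaping is handled for you; it uses the user and password keys shown above, and the credentials in the demo are placeholders.

```python
import json


def build_airflow_secret(username: str, password: str) -> str:
    """Serialize the Airflow secret for Sifflet, escaping special characters."""
    return json.dumps({"user": username, "password": password}, indent=2)


if __name__ == "__main__":
    # Placeholder credentials: paste the printed JSON into the "Secret" field.
    print(build_airflow_secret("sifflet_user", 'pa"ss\\word'))
```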

Create a new Airflow integration

To connect Airflow with Sifflet, you will need three items:

  • Connection details:
    • Host: the full URL of your Airflow instance, including the scheme. For instance, if your Airflow URL is http://xxxxx.yy, enter http://xxxxx.yy.
    • Port: the port used to interact with Airflow's REST API. By default, this is 8080.
  • Secret: the secret created above, which holds the username and password you chose in step 1.
  • Frequency: determines how often the information is refreshed.

Details to provide when configuring the integration
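To sanity-check the Host and Port values before saving, you can combine them the way a REST client typically would. This sketch assumes the API is reached at Host plus Port under Airflow 2's /api/v1 prefix; it illustrates the fields and is not Sifflet's actual logic. The function name is hypothetical, it rejects a Host without an http or https scheme or with a conflicting port, and it does not handle IPv6 literals.

```python
from urllib.parse import urlsplit, urlunsplit


def api_base_url(host: str, port: int = 8080) -> str:
    """Combine the integration's Host and Port fields into a REST API base URL."""
    parts = urlsplit(host.strip())
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"Host must be a full URL such as http://xxxxx.yy, got {host!r}")
    if parts.port is not None and parts.port != port:
        raise ValueError(f"Host already specifies port {parts.port}, but Port is {port}")
    # Keep any path prefix (e.g. Airflow behind a reverse proxy) before /api/v1.
    netloc = f"{parts.hostname}:{port}"
    path = parts.path.rstrip("/") + "/api/v1"
    return urlunsplit((parts.scheme, netloc, path, "", ""))
```

For example, api_base_url("http://xxxxx.yy", 8080) gives http://xxxxx.yy:8080/api/v1.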

For more general guidance, see the documentation on adding a data source in Sifflet.