Apache Airflow ᴮᴱᵀᴬ
This page covers Sifflet's Airflow integration, which allows you to see Airflow metadata in Sifflet. To trigger Sifflet operations from within your Airflow DAGs, you can find all of Sifflet's custom operators and their purposes here.
By integrating Sifflet with Airflow, you can leverage our full data stack approach to obtain a complete view of your data pipelines, from your orchestrator to the data warehouse and your BI tool.
Once the integration is configured, you will have at a glance:
- Various metadata about your DAGs, such as "Last/Next Execution Date" and "Last Updated Date".
- The latest status of your DAG runs, allowing you to detect failures as soon as they happen.
This page covers integrating Sifflet with a self-hosted Airflow instance. If you are using a cloud-managed variation instead (Amazon MWAA or Cloud Composer), refer to the dedicated page for that variation.
To integrate Airflow with Sifflet, follow these steps:
- Create a dedicated read-only user
- Connect to Sifflet
Supported Airflow versions
We currently support any self-hosted Airflow instance (version 2.0.0+) in addition to cloud-managed variations (Amazon MWAA on AWS and Cloud Composer on GCP).
1. Create a read-only user
In Airflow, create a dedicated user for Sifflet with the "Viewer" role.
Please choose a "User Name" (for instance, "sifflet_user") and a secure password. Store them carefully as you will need them when configuring the connection in Sifflet later.
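If you do not already have a password manager that generates passwords for you, the following is a minimal sketch of one way to produce a secure password with Python's standard library. The function name `generate_password` and the 24-byte length are illustrative assumptions, not Sifflet or Airflow requirements.

```python
import secrets


def generate_password(num_bytes: int = 24) -> str:
    """Return a random URL-safe string built from `num_bytes` random bytes."""
    # secrets draws from the operating system's cryptographic random source,
    # unlike the `random` module, which is not suitable for passwords.
    return secrets.token_urlsafe(num_bytes)


# Generate one password for the dedicated Airflow user (for instance "sifflet_user").
password = generate_password()
```

Store the generated value somewhere safe right away, since you will need it again when creating the secret in Sifflet.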
2. Connect to Sifflet
Add an Airflow secret
To create the Airflow secret, follow the steps below:
- Under "Integration", open the "Secrets" tab and create a new secret.
- In the "Secret" area, paste the text below, replacing <username> and <password> with the credentials you created in step 1:
{
"user": "<username>",
"password": "<password>"
}
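Passwords that contain quotes or backslashes must be escaped to remain valid JSON. As a hedged sketch, the secret text could be produced programmatically rather than edited by hand. The helper name `build_airflow_secret` is hypothetical. Only the "user" and "password" keys come from the template above.

```python
import json


def build_airflow_secret(user: str, password: str) -> str:
    """Return the secret as JSON text, with special characters escaped."""
    # json.dumps escapes quotes and backslashes, so the result stays valid JSON
    # even when the password contains them.
    return json.dumps({"user": user, "password": password}, indent=2)


# Example with a password containing a double quote and a backslash.
secret_text = build_airflow_secret("sifflet_user", 'p"ss\\word')
```

The resulting `secret_text` can then be pasted into the "Secret" area.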
Create a new Airflow integration
To connect Airflow with Sifflet, you will need three items:
- Connection details:
  - Host: You can add the entire URL. For instance, if your URL is http://xxxxx.yy, your Host value would be http://xxxxx.yy.
  - Port: The port used to interact with Airflow's REST API. By default, this is 8080.
- Secret: corresponds to the username and password you previously chose.
- Frequency: determines how often the information is refreshed.
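Before saving the integration, it can help to confirm that the host, port, and credentials actually reach Airflow's REST API. The sketch below is an assumption-laden illustration, not how Sifflet itself connects: it assumes your Airflow instance has the basic-auth API backend enabled and uses the stable REST API's `/api/v1/dags` endpoint. The function names `build_dags_request` and `check_connection` are hypothetical.

```python
import base64
import json
import urllib.request


def build_dags_request(host: str, port: int, user: str, password: str) -> urllib.request.Request:
    """Build a basic-auth GET request for the Airflow DAG list endpoint."""
    base = host.rstrip("/")  # tolerate a trailing slash in the Host value
    url = f"{base}:{port}/api/v1/dags?limit=1"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def check_connection(host: str, port: int, user: str, password: str, timeout: float = 10.0) -> int:
    """Call Airflow and return the number of DAGs visible to the user."""
    request = build_dags_request(host, port, user, password)
    # urlopen raises urllib.error.HTTPError on 401/403, which signals wrong
    # credentials or a role without read access to DAGs.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["total_entries"]


# Usage (performs a real network call, so it is left commented out):
# print(check_connection("http://xxxxx.yy", 8080, "sifflet_user", "<password>"))
```

If the call raises an HTTP 401 or 403 error, recheck the username, password, and role from step 1 before configuring the integration in Sifflet.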
You can also refer to this page on adding a data source in Sifflet.