Declarative lineage

By integrating Sifflet with your data stack, you automatically get a global view of your data lineage, from ingestion to consumption.
In some cases, Sifflet cannot be added to your environment or does not have enough access to retrieve and compute the lineage. In these cases, you can add nodes and links to your lineage programmatically.

One common case is to get visibility on which Airflow DAG triggers which dbt models. This allows you to:

  • get a detailed view of your data pipeline: orchestrator + transformation + data assets
  • speed up root cause investigation when a data anomaly alert fires

Current scope

As of today, you can link any orchestrator node to your dbt models in your lineage. If you need to link other information, do not hesitate to reach out. This page is updated as our feature coverage evolves.

Declarative lineage scope | dbt model | Other transformation node
Airflow DAG
Other

How to declare your lineage in Sifflet

Two steps are required to programmatically declare your lineage:

  1. Generate an API Token
  2. Add/Remove your nodes and links

1. Generate an API Token

You can find more information on how to generate an API Token here

2. Add/Remove your nodes and links

You can find the API reference here.


POST /api/v1/lineages/_create
{
    "linkType": "pipeline",
    "automationPlatform": {
        "type": "airflow",
        "metadata": {
            "pipelineName": "orchestrator_object_name"
        }
    },
    "job": {
        "type": "dbt",
        "metadata": {
            "project": "my_dbt_project",
            "model": "my_dbt_model",
            "target": "my_dbt_target"
        }
    }
}


POST /api/v1/lineages/_remove
{
    "linkType": "pipeline",
    "automationPlatform": {
        "type": "airflow",
        "metadata": {
            "pipelineName": "orchestrator_object_name"
        }
    },
    "job": {
        "type": "dbt",
        "metadata": {
            "project": "my_dbt_project",
            "model": "my_dbt_model",
            "target": "my_dbt_target"
        }
    }
}

Parameters:

  • linkType: pipeline
  • automationPlatform

    • type: airflow or other (for instance, a Prefect Flow)
    • metadata/pipelineName: name of your orchestrator node. It appears in Sifflet's lineage under this name.
  • job

    • type: dbt
    • metadata
      • project: your dbt project
      • target: your dbt target
      • model (optional): the dbt model triggered by your orchestrator.
        If no model is specified, Sifflet links the orchestrator node to all models within the specified dbt project and target.
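
To show how these parameters fit together, here is a minimal sketch of a helper that assembles the request body. The function name build_lineage_payload and the dag_to_models mapping are hypothetical and not part of Sifflet's API. The sketch also assumes that an unspecified model is expressed by leaving the "model" key out of the body; check the API reference if your setup differs.

```python
from typing import Optional


def build_lineage_payload(
    pipeline_name: str,
    project: str,
    target: str,
    model: Optional[str] = None,
    platform_type: str = "airflow",
) -> dict:
    """Assemble a lineage link body (hypothetical helper, not a Sifflet function)."""
    # The parameter list above documents two platform types.
    if platform_type not in ("airflow", "other"):
        raise ValueError("platform_type must be 'airflow' or 'other'")

    job_metadata = {"project": project, "target": target}
    # Assumption: omitting "model" means "link to every model in the
    # given project and target".
    if model is not None:
        job_metadata["model"] = model

    return {
        "linkType": "pipeline",
        "automationPlatform": {
            "type": platform_type,
            "metadata": {"pipelineName": pipeline_name},
        },
        "job": {"type": "dbt", "metadata": job_metadata},
    }


# Usage: one body per (DAG, model) pair, from an illustrative mapping.
dag_to_models = {"dag_1": ["my_dbt_model_a", "my_dbt_model_b"]}
payloads = [
    build_lineage_payload(dag, "my_dbt_project", "my_dbt_target", model=m)
    for dag, models in dag_to_models.items()
    for m in models
]
```

Each dictionary in payloads can then be sent as the JSON body of a separate request to the create endpoint.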

3. Examples

Please find below examples in Python and Bash to declare orchestrator nodes and link them to dbt models.

Python

Please find below an example of how to link your Airflow DAG to your dbt models.

Requirements: pip install requests

import requests
import json

tenant = "tenant_name" # change me
token = "your_token" # change with token created at step 1

url = f"https://{tenant}api.siffletdata.com/api/v1/lineages/_create"

payload = json.dumps({
  "linkType": "pipeline",
  "automationPlatform": {
    "type": "airflow",
    "metadata": {
      "pipelineName": "dag_1" # change me
    }
  },
  "job": {
    "type": "dbt",
    "metadata": {
      "project": "my_dbt_project", # change me
      "model": "my_dbt_model", # change me
      "target": "my_dbt_target" # change me
    }
  }
})
headers = {
  'Authorization': f'Bearer {token}',
  'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, data=payload)

# A 204 status code means the link was accepted; anything else is treated as an error
if response.status_code != 204:
    print("Error using the API")
    print(response.text)
else:
    print("Successfully pushed lineage")
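
Removing a link follows the same shape. The sketch below prepares the request with Python's standard library and does not send it. It assumes the remove endpoint lives at /api/v1/lineages/_remove, mirroring the create URL used above; the helper name build_remove_request is hypothetical.

```python
import json
import urllib.request


def build_remove_request(tenant: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that removes a lineage link."""
    # Assumption: same host and path prefix as the create call, with _remove.
    url = f"https://{tenant}api.siffletdata.com/api/v1/lineages/_remove"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "linkType": "pipeline",
    "automationPlatform": {
        "type": "airflow",
        "metadata": {"pipelineName": "dag_1"},  # change me
    },
    "job": {
        "type": "dbt",
        "metadata": {
            "project": "my_dbt_project",  # change me
            "model": "my_dbt_model",  # change me
            "target": "my_dbt_target",  # change me
        },
    },
}

request = build_remove_request("tenant_name", "your_token", payload)

# Sending is left commented out so that running the sketch has no side effects:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The body should identify the same orchestrator node and dbt model as the link you created earlier.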

Bash

Please find below an example of how to link your Prefect Flow to your dbt models.

TENANT="my_tenant" # change to your Sifflet deployment
TOKEN="my_token" # change with token created at step 1

BASE_URL="https://${TENANT}api.siffletdata.com"

curl --location "$BASE_URL/api/v1/lineages/_create" \
--header "Authorization: Bearer $TOKEN" \
--header "Content-Type: application/json" \
--data '{
    "linkType": "pipeline",
    "automationPlatform": {
        "type": "other",
        "metadata": {
            "pipelineName": "name_CHANGE_ME"
        }
    },
    "job": {
        "type": "dbt",
        "metadata": {
            "project": "my_dbt_project_CHANGE_ME",
            "model": "my_dbt_model_CHANGE_ME",
            "target": "my_dbt_target_CHANGE_ME"
        }
    }
}'

Sifflet's lineage examples

Example in the case of Airflow with dbt

Example in the case of any other Orchestrator with dbt