Declarative Assets & Lineage

Overview

Sifflet comes with a large number of built-in integrations spanning your entire data pipeline stack, from ingestion to consumption. Built-in integrations automatically collect metadata and lineage information and make it available in the Data Catalog.

In some cases, though, Sifflet cannot retrieve asset metadata and lineage information: for instance, the built-in integration doesn't exist yet, Sifflet cannot connect to your closed environment, or Sifflet doesn't have enough access to retrieve and compute the lineage.

For these cases, you can programmatically declare data pipeline assets and lineage to ensure an end-to-end observability experience. For instance, you can:

  • Reflect data sources such as CRMs (e.g. Salesforce, Pipedrive, etc.), ERPs (e.g. SAP, etc.), marketing automation tools (e.g. HubSpot, Marketo, etc.), etc.
  • Create assets from your preferred BI tools (e.g. Metabase, MicroStrategy, etc.)
  • Surface key API calls and custom scripts
  • Display machine learning models
  • Catalog and reflect orchestrators on lineage
  • Show custom data applications
  • Link assets Sifflet cannot compute lineage for

Getting Started

You can declare assets and their lineage using the declarative assets & lineage framework.

If you are looking to add orchestrators on top of your transformation assets, you can use a dedicated set of API endpoints.

Declarative Assets & Lineage Framework

To programmatically declare assets and their lineage, you need to leverage the declarative assets & lineage framework. This framework relies on two API endpoints that allow you to declare three main object types.

Assets

Assets are the smallest entity you can currently declare. They correspond to data-related components such as a dashboard, a table, or a machine learning model. Declared assets show up like regular assets in your Data Catalog.

Sources

Sources are a logical way to group your assets. For instance, you can have one source corresponding to your staging assets and another corresponding to your production assets. Declared sources show up alongside Sifflet sources. Declaring sources is optional, as Sifflet can create them automatically from asset URIs.

Workspaces

Workspaces are the highest-level entity you can declare; they contain declared sources and assets. Workspaces are isolated from each other, which lets you manage a collection of declared sources and assets without interfering with collections owned by other teams in other workspaces.

Prerequisites

See API prerequisites

Manage Declarative Assets & Lineage

Declare Assets & Lineage

To declare assets & lineage, you need to leverage the POST /v1/assets/sync endpoint.

You can declare:

  • Assets only, using the assets array of objects
  • Lineage links only, using the lineages array of objects
  • Both assets and lineage links
    • Using the assets array of objects that contains a lineages object
    • Using the assets array of objects and the lineages array of objects
    • Using a mix of the two above options
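As an illustration, here is a minimal Python sketch of a payload combining the assets array and the lineages array in a single call. The field names inside the objects (uri, name, description, from, to) are assumptions for illustration only; refer to the API reference for the authoritative schema.

```python
TENANT = "my_tenant"  # change me: your Sifflet tenant
TOKEN = "my_token"    # change me: API access token (see Prerequisites)


def build_sync_payload():
    """Build a hypothetical payload for POST /v1/assets/sync declaring one
    asset and one lineage link. Field names are illustrative assumptions;
    check the Sifflet API reference for the exact schema."""
    return {
        "assets": [
            {
                # The technology prefix of the URI ("metabase" here) is an
                # example; it drives the icon shown in the Sifflet application.
                "uri": "metabase://prod/dashboards/revenue_overview",
                "name": "Revenue overview",
                "description": "Weekly revenue dashboard",
            }
        ],
        "lineages": [
            {
                # Hypothetical link from an upstream table to the dashboard.
                "from": "snowflake://prod/analytics/fct_revenue",
                "to": "metabase://prod/dashboards/revenue_overview",
            }
        ],
    }


# Pushing the declaration (requires the `requests` package, network access,
# and a valid token):
# import json, requests
# response = requests.post(
#     f"https://{TENANT}api.siffletdata.com/api/v1/assets/sync",
#     headers={"Authorization": f"Bearer {TOKEN}",
#              "Content-Type": "application/json"},
#     data=json.dumps(build_sync_payload()),
# )
```

The same payload shape covers the assets-only and lineage-only cases: simply omit the array you don't need.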

Sifflet ships with a set of technology icons: if you use the appropriate technology name when defining your assets' URIs, your declarative assets show up with the proper technology icon in the Sifflet application. The list of expected technology names for URIs is available here.

Declare Sources

Declaring sources via the sources array of objects is optional. Declaring a source is useful if you want to attach specific metadata to it (e.g. a name, a description, etc.). If no source is declared, Sifflet automatically assigns declared assets to sources based on the declared assets' URIs.
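As a sketch, a declared source could be one entry in the optional sources array of the sync payload. The field names below (uri, name, description) are assumptions for illustration, not the authoritative schema.

```python
def build_source(uri, name, description=""):
    """Hypothetical entry for the optional `sources` array of
    POST /v1/assets/sync; field names are illustrative assumptions."""
    return {"uri": uri, "name": name, "description": description}


# Two example sources, one per environment; assets whose URIs fall under
# these prefixes would be grouped into the matching source.
sources = [
    build_source("snowflake://prod", "Production warehouse",
                 "Curated, production-grade assets"),
    build_source("snowflake://staging", "Staging warehouse"),
]
```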

Modify Declarative Assets & Lineage

Once assets, lineage links, and sources are created in a workspace, they become read-only in the User Interface (UI). Modifications and deletions can only be performed through the declarative assets & lineage framework. This will evolve in the future to allow you to manage your declarative assets & lineage both programmatically and from the UI.

To modify declarative assets and sources, just update your JSON payload and push the new version to the POST /v1/assets/sync endpoint.

Delete Declarative Assets & Lineage

To delete declarative assets & lineage, you need to leverage the DELETE /v1/assets/{name} endpoint.
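As a sketch (assuming the same tenant URL pattern used by the scripts further down this page), deleting a declared asset is a single DELETE call:

```python
TENANT = "my_tenant"  # change me: your Sifflet tenant
TOKEN = "my_token"    # change me: API access token


def delete_asset_url(asset_name):
    """Build the DELETE /v1/assets/{name} URL for a declared asset,
    following the tenant URL pattern used elsewhere on this page."""
    return f"https://{TENANT}api.siffletdata.com/api/v1/assets/{asset_name}"


# The actual call (requires the `requests` package and a valid token):
# import requests
# requests.delete(delete_asset_url("my_declared_asset"),
#                 headers={"Authorization": f"Bearer {TOKEN}"})
```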

How to create/remove an orchestrator node on your lineage

One common use case is getting visibility into which Airflow DAG triggers which dbt models. This allows you to:

  • Get a detailed view of your data pipeline: orchestrator + transformation + data assets
  • Accelerate root-cause investigation when a data anomaly alert fires

As of today, you can link any orchestrator information to your dbt models in your lineage. If you need to link other information, do not hesitate to reach out. This page is constantly updated with our feature coverage.

| Declarative lineage scope | dbt model | Other transformation node |
|---------------------------|-----------|---------------------------|
| Airflow DAG               | ✅        | ❌                        |
| Other                     | ✅        | ❌                        |

Two steps are required to programmatically declare an orchestrator node:

  1. Generate an Access Token
  2. Add/Remove your nodes and links

1. Generate an Access Token

You can find more information on how to generate an Access Token here

2. Add/Remove your nodes and links

You can find the API reference here.


POST /v1/lineages/_create
{
    "linkType": "pipeline",
    "automationPlatform": {
        "type": "airflow",
        "metadata": {
            "pipelineName": "orchestrator_object_name"
        }
    },
    "job": {
        "type": "dbt",
        "metadata": {
            "project": "your_dbt_project",
            "model": "your_dbt_model",
            "target": "your_dbt_target"
        }
    }
}


POST /v1/lineages/_remove
{
    "linkType": "pipeline",
    "automationPlatform": {
        "type": "airflow",
        "metadata": {
            "pipelineName": "orchestrator_object_name"
        }
    },
    "job": {
        "type": "dbt",
        "metadata": {
            "project": "your_dbt_project",
            "model": "your_dbt_model",
            "target": "your_dbt_target"
        }
    }
}

Parameters:

  • linkType: pipeline
  • automationPlatform

    • type: airflow or other (for instance, a Prefect Flow)
    • metadata/pipelineName: the name of your orchestrator node. It will appear on Sifflet's lineage with this name.
  • job

    • type: dbt
    • metadata
      • project: your dbt project
      • target: your dbt target
      • model (optional): the dbt model triggered by your orchestrator.
        If no model is specified, Sifflet will link the node to all models within the specified dbt project and target.

Scripts and Examples

Please find below examples in Python and Bash to declare orchestrator nodes and link them to dbt models.

Python

Please find below an example of how to link your Airflow DAG to your dbt models.

Requirements: pip install requests

import requests
import json

tenant = "tenant_name" # change me
token = "your_token" # change with token created at step 1

url = f"https://{tenant}api.siffletdata.com/api/v1/lineages/_create"

payload = json.dumps({
  "linkType": "pipeline",
  "automationPlatform": {
    "type": "airflow",
    "metadata": {
      "pipelineName": "dag_1" # change me
    }
  },
  "job": {
    "type": "dbt",
    "metadata": {
      "project": "my_dbt_project", # change me
      "model": "my_dbt_model", # change me
      "target": "my_dbt_target" # change me
    }
  }
})
headers = {
  'Authorization': f'Bearer {token}',
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

if response.status_code != 204:
    print("Error using the API")
    print(response.text)
else:
    print("Successfully pushed lineage")

Bash

Please find below an example of how to link your Prefect Flow to your dbt models.

TENANT="my_tenant" # change to your Sifflet deployment
TOKEN="my_token" # change with token created at step 1

BASE_URL="https://${TENANT}api.siffletdata.com"

curl --location "$BASE_URL/api/v1/lineages/_create" \
--header "Authorization: Bearer $TOKEN" \
--header "Content-Type: application/json" \
--data '{
    "linkType": "pipeline",
    "automationPlatform": {
        "type": "other",
        "metadata": {
            "pipelineName": "name_CHANGE_ME"
        }
    },
    "job": {
        "type": "dbt",
        "metadata": {
            "project": "my_dbt_project_CHANGE_ME",
            "model": "my_dbt_model_CHANGE_ME",
            "target": "my_dbt_target_CHANGE_ME"
        }
    }
}'

Examples

Example in the case of Airflow with dbt

Example in the case of any other Orchestrator with dbt