URI sources
Concept
URI stands for Uniform Resource Identifier. Sifflet uses URIs to identify any object, inside or outside the application, with a universal ID.
You can read more about the generic concept of URIs on the dedicated Wikipedia page: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
URI in Sifflet
OpenLineage designed a standard URI model for datasets and jobs. Sifflet adapts and extends this standard to meet its technology requirements.
Every asset in the Sifflet catalog can be identified by a URI in the following format:
URI = scheme ":" ["//" authority] uniqueName
- Scheme defines the identifier assignment specification. For data assets, it depends on the technology of the asset: for example, bigquery for BigQuery, mysql for MySQL, oracle for Oracle, etc.
- Authority defines the instance where the asset is located. Most commonly it combines the host and port of a service, but for some technologies it can simply be a workspace identifier or an account identifier. For a fully decentralized service, it can be omitted entirely. A few examples:
  - An Oracle asset authority is defined by the address of the host of the Oracle server and its attached port.
  - A PowerBI asset authority is defined using the id of the workspace where the asset is located.
  - BigQuery assets don’t have an authority: the BigQuery service is fully decentralized.
- Unique Name defines the path to the asset inside an instance. In some cases it is simply the identifier or name of the asset; in others it follows a more complex namespace hierarchy used by the system. Here are some examples:
  - A PostgreSQL asset unique name is defined using the hierarchy database → schema → table. It is written databaseName.schemaName.tableName.
  - A Looker dashboard unique name is defined using the id of the dashboard inside the Looker instance it belongs to. There is no hierarchy to take into account, since a dashboard id is unique within a given instance.
  - A BigQuery asset unique name is defined using the hierarchy project → dataset → table. It is written projectName.datasetName.tableName.
For the full URI definition of each technology, or for how to craft a generic specification URI, see the dedicated pages of the documentation.
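As a rough illustration of the format described in this section, the Python sketch below assembles a URI from its three parts. The helper names build_asset_uri and dotted_name, the host db.example.com:1521, and all sample asset names are hypothetical and not part of Sifflet. The sketch also assumes that a "/" separates the authority from the unique name, as in generic URI syntax; the per-technology pages remain the reference for the exact form.

```python
from typing import Optional


def dotted_name(*parts: str) -> str:
    """Join hierarchy levels (e.g. project, dataset, table) with dots."""
    return ".".join(parts)


def build_asset_uri(scheme: str, unique_name: str,
                    authority: Optional[str] = None) -> str:
    """Assemble: scheme ":" ["//" authority] uniqueName."""
    if not scheme or not unique_name:
        raise ValueError("scheme and unique_name are both required")
    if authority is None:
        # Decentralized services (such as BigQuery) have no authority.
        return f"{scheme}:{unique_name}"
    # ASSUMPTION: a "/" separates authority and unique name,
    # following generic URI syntax. Not confirmed by this page.
    return f"{scheme}://{authority}/{unique_name}"


# BigQuery: no authority, unique name is project.dataset.table.
bigquery_uri = build_asset_uri(
    "bigquery", dotted_name("my-project", "sales", "orders")
)

# Oracle: authority is host:port. The unique name here is a
# placeholder, since the Oracle hierarchy is defined elsewhere.
oracle_uri = build_asset_uri(
    "oracle", "SALES.ORDERS", authority="db.example.com:1521"
)

print(bigquery_uri)
print(oracle_uri)
```

Keeping the authority optional mirrors the bracketed part of the format, so the same helper covers both hosted and decentralized technologies.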
Finding assets' URI
You can get the URI of an existing asset from its asset page.
- Go to the details page of an asset
- Click the three-dot menu in the top-right corner of the page
- Click Copy Data Asset URI to copy the asset URI to your clipboard
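Once a URI has been copied, it can be useful to split it back into its components, for example to check which technology or instance an asset belongs to. The following Python sketch does this under stated assumptions: the names AssetUri and parse_asset_uri are hypothetical, the sample URIs are made up, and a "/" is assumed to separate the authority from the unique name, as in generic URI syntax.

```python
from typing import NamedTuple, Optional


class AssetUri(NamedTuple):
    scheme: str
    authority: Optional[str]  # None when the URI has no "//" part
    unique_name: str


def parse_asset_uri(uri: str) -> AssetUri:
    """Split scheme ":" ["//" authority] uniqueName into its parts."""
    scheme, sep, rest = uri.partition(":")
    if not sep or not scheme:
        raise ValueError(f"missing scheme in {uri!r}")
    if not rest.startswith("//"):
        # No authority: everything after the scheme is the unique name.
        if not rest:
            raise ValueError(f"missing unique name in {uri!r}")
        return AssetUri(scheme, None, rest)
    # ASSUMPTION: the authority ends at the first "/" after "//".
    authority, _, unique_name = rest[2:].partition("/")
    if not unique_name:
        raise ValueError(f"missing unique name in {uri!r}")
    return AssetUri(scheme, authority, unique_name)


# Hypothetical URIs, as they might look once pasted from the clipboard.
without_authority = parse_asset_uri("bigquery:my-project.sales.orders")
with_authority = parse_asset_uri("oracle://db.example.com:1521/SALES.ORDERS")

print(without_authority)
print(with_authority)
```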