Generic URI format

You can push assets from many technologies into Sifflet using a generic URI format. If the technology you are using is not covered in the other sections, use the generic format described in this section to identify your assets.

Identifier fragments:

  • Namespace: {technology name}://{authority} OR {technology name}:
    • Scheme = {technology name}
    • Authority = {authority}
  • Unique name: {namespace1}.{namespace2}.{…}.{namespaceN}.{asset identifier}

URI format:

  • {technology name}://{authority}/{namespace1}.{namespace2}.{…}.{namespaceN}.{asset identifier}
  • OR {technology name}:{namespace1}.{…}.{namespaceN}.{asset identifier}
  • OR {technology name}://{authority}/{asset identifier}
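As an illustration, the three forms above can be assembled from their fragments with a small helper. This is a minimal sketch, not part of any Sifflet client: the function name `build_uri` and its parameters are hypothetical, and the fragments are assumed to be already valid.

```python
from typing import Optional, Sequence


def build_uri(scheme: str, asset_identifier: str,
              namespaces: Sequence[str] = (),
              authority: Optional[str] = None) -> str:
    """Assemble a generic URI from fragments assumed to be already valid."""
    # Unique name: namespace levels (possibly none) followed by the asset identifier.
    unique_name = ".".join([*namespaces, asset_identifier])
    if authority is None:
        # Form without authority: {technology name}:{unique name}
        return f"{scheme}:{unique_name}"
    # Forms with authority: {technology name}://{authority}/{unique name}
    return f"{scheme}://{authority}/{unique_name}"
```

Passing no namespaces together with an authority yields the third form, and leaving the authority out yields the second.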

Parameter limitations:

  • technology name must start with a letter or a number, and include only lowercase letters, uppercase letters, numbers, dots, plus signs and dashes (regex: [a-zA-Z0-9][a-zA-Z0-9.+-]*)
  • authority must start with a lowercase letter and include only lowercase letters, numbers, dash, colon and dot (regex: [a-z][a-z0-9-.:]+)
  • namespace elements and the asset identifier can take one of two formats: quoted or unquoted
    • an unquoted identifier must include only lowercase and uppercase letters, numbers, underscores, dashes, dollar signs and characters in the Unicode range U+0080 to U+FFFF
    • a quoted identifier must be enclosed in double quotes and must include only characters in the Unicode range U+0001 to U+FFFF. If a double quote character is present in the name, it must be escaped by placing another double quote character in front of it.
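The limitations above can be checked before pushing an asset. The sketch below transcribes the two regexes and the identifier rules into Python; the helper names are hypothetical, and rejecting empty identifiers is an assumption, since the rules do not state a minimum length.

```python
import re

# Patterns transcribed from the limitations above.
SCHEME_RE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9.+-]*")
# Same character set as [a-z][a-z0-9-.:]+, with the dash moved last so that
# it is unambiguously read as a literal dash.
AUTHORITY_RE = re.compile(r"[a-z][a-z0-9.:-]+")
# Unquoted identifier; requiring at least one character is an assumption.
UNQUOTED_RE = re.compile(r"[A-Za-z0-9_$\u0080-\uffff-]+")


def is_valid_scheme(value: str) -> bool:
    return SCHEME_RE.fullmatch(value) is not None


def is_valid_authority(value: str) -> bool:
    return AUTHORITY_RE.fullmatch(value) is not None


def format_identifier(name: str) -> str:
    """Return the name unquoted when allowed, otherwise in quoted form."""
    if UNQUOTED_RE.fullmatch(name):
        return name
    if not name or any(not 0x0001 <= ord(ch) <= 0xFFFF for ch in name):
        raise ValueError(f"cannot represent identifier: {name!r}")
    # Escape each double quote by doubling it, then wrap in double quotes.
    return '"' + name.replace('"', '""') + '"'
```

For instance, `format_identifier("test_data")` leaves the name untouched, while a name containing a dot or a space comes back wrapped in double quotes.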

Crafting fragments of URI in practice:

1/ Scheme (“technology name” identifier)

  • Ideally, the technology name should use lowercase letters only:
    • MuleSoft → mulesoft
    • Amazon → amazon
    • Salesforce → salesforce
    • MongoDB → mongodb
  • If there are multiple words, separate them with a dash:
    • Amazon DynamoDB → amazon-dynamodb
    • Google Ad Manager → google-ad-manager
  • Use numbers when required:
    • S3 → s3
    • 360Learning → 360learning
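These conventions are mechanical enough to automate. A minimal sketch, assuming the product name is plain text with words separated by whitespace (the function name `to_scheme` is hypothetical):

```python
import re

# Scheme pattern from the parameter limitations of this section.
SCHEME_RE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9.+-]*")


def to_scheme(technology_name: str) -> str:
    """Lowercase a product name and join its words with dashes."""
    scheme = "-".join(technology_name.lower().split())
    if not SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"cannot derive a valid scheme from {technology_name!r}")
    return scheme
```

With this helper, `to_scheme("Google Ad Manager")` gives `google-ad-manager` and `to_scheme("360Learning")` gives `360learning`.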

2/ Authority

  • The authority identifies where your repository is located.
  • The OpenLineage definition very often uses the host address of the server as the authority, together with the port of the service, since multiple services of the same type can run on the same host.
  • It can also be an account identifier or a workspace identifier, depending on the service you are using.
  • You can also identify a local instance using a fixed IPv4 address or an IPv6 address.
  • Examples:
    • Host only: sifflet-dev-cloud.cfbungc.eu-west-1.rds.com
    • Host + port: sifflet-dev-cloud.cfbungc.eu-west-1.rds.com:5432
    • Account ID in UUID format for a cloud service: 5e7570e3-43b6-8011-1418bd77d45e
    • Local address: 2001:0db8:85a3:0000:0000:8a2e:0370:7334
  • The authority can be omitted when the service has no required location. For example, the BigQuery URI format doesn’t use an authority, as the service is seemingly serverless.

3/ Unique name

  • The asset identifier parameter identifies the asset uniquely within the namespace where it is defined. A unique ID or name for the asset is ideal.
  • The rest of the unique name, {namespace1}.{…}.{namespaceN}, represents the different levels of namespaces found in most technologies. For example, in many relational databases, a single instance contains databases, in which you can define schemas with unique names, and inside those schemas you can define tables or views with unique names. In that case the full unique name would be written: databaseName.schemaName.tableName.

Example:

  • hbase://sifflet-dev-hbase.eg56.eu-west:8080/platform-namespace.test_data
    • Scheme: hbase for Apache HBase
    • Authority: sifflet-dev-hbase.eg56.eu-west:8080, following the host:port format
    • Unique Name: platform-namespace.test_data
      • platform-namespace takes the place of {namespace1}.{…}.{namespaceN}. There is only one hierarchical level here. In other cases, you could define one, two, three or more levels, depending on your hierarchy.
      • test_data is the name of the table identified by this whole URI. It is defined inside the namespace platform-namespace on the server located at sifflet-dev-hbase.eg56.eu-west:8080.
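The same breakdown can be reproduced programmatically. The sketch below is illustrative only: `GenericUri` and `parse_uri` are hypothetical names, and the splitting assumes unquoted identifiers, since a quoted identifier may itself contain dots or slashes.

```python
from typing import List, NamedTuple, Optional


class GenericUri(NamedTuple):
    scheme: str
    authority: Optional[str]
    namespaces: List[str]
    asset_identifier: str


def parse_uri(uri: str) -> GenericUri:
    """Split a generic URI into its fragments (unquoted identifiers only)."""
    # The scheme cannot contain a colon, so the first colon ends it.
    scheme, _, rest = uri.partition(":")
    authority = None
    if rest.startswith("//"):
        # The authority runs up to the first slash after the "//" marker.
        authority, _, rest = rest[2:].partition("/")
    # The last dot separates the namespace levels from the asset identifier.
    namespace_path, _, asset_identifier = rest.rpartition(".")
    namespaces = namespace_path.split(".") if namespace_path else []
    return GenericUri(scheme, authority, namespaces, asset_identifier)


example = parse_uri(
    "hbase://sifflet-dev-hbase.eg56.eu-west:8080/platform-namespace.test_data"
)
```

Here `example.scheme` is `hbase`, `example.authority` is `sifflet-dev-hbase.eg56.eu-west:8080`, `example.namespaces` holds the single level `platform-namespace`, and `example.asset_identifier` is `test_data`.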