As your organisation grows and the need for data observability intensifies, making sure your team members have access to Sifflet with the appropriate permissions can be both critical and challenging.
With the new sifflet_user resource in our Terraform provider, you can now automate and manage user access seamlessly. This feature eliminates manual user management, ensuring consistent role assignments, improved security, and faster onboarding/offboarding—all through Infrastructure-as-Code.
You can now assign users on your team to two new domain roles:
Monitor Responder: Allows users to search through and view domain assets and monitors. It also includes the ability to view and download failing rows for the domain's monitors, as well as incident-related permissions. This ensures users can troubleshoot and respond to data quality issues without necessarily being allowed to edit domain monitors.
Catalog Editor: Allows users to view and edit assets as well as preview the data of domain assets. It also includes the ability to view domain monitors. This ensures individuals can manage the documentation of their domain assets without necessarily being able to edit the domain's monitors.
These two new roles complement the existing Domain Viewer and Domain Editor domain roles and bring enhanced flexibility and control, allowing you to better manage permissions and responsibilities within Sifflet.
We've recently made some changes to qualifications as we prepare for some big improvements to qualification propagation and batch qualifications.
Expected and False Positive have been merged into False Positive / Expected. This clears up the confusion around Expected, which behaved the same as False Positive but was not always used correctly. False Positive / Expected should be used for any point that should be considered a "good point" and not alerted on in the future.
Known Error has been renamed No action needed / Known Error, and its icon has changed. Use it to flag known issues that require no specific resolution or investigation. These points are ignored by the model when using Dynamic Thresholds, so the model does not train on anomalous values.
Reviewed has been added. This fully neutral qualification does not impact the anomaly model. Use it as the default qualification when a user has reviewed an anomaly, either by dealing with it or investigating it, and you don't want to impact the model in any way. It is the safest qualification, so feel free to use it whenever you like!
Quick recap of the final two, which have not changed:
Fixed: This marks the point as Fixed in the interface. Sifflet can apply this qualification automatically when a point previously in anomaly is rechecked as part of the lookback period. As a reminder, the lookback period can be set up in incremental monitors to recheck previous points, such as the last 7 days in pipelines where recent data can still be "patched" or "updated" a few days later!
False Negative: This is the only qualification intended for non-anomalous datapoints. If the model did not detect an anomaly when it should have, False Negative guides the model to detect an anomaly for similar values.
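To make the effect of qualifications on model training more concrete, here is a minimal Python sketch. It is an illustration only, not Sifflet's implementation: the Qualification enum, the Point class, and the training_values function are hypothetical names, and the only behaviour modelled is the one described above, where points flagged as No action needed / Known Error are left out of training.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Qualification(Enum):
    # Labels taken from the release note; the enum itself is hypothetical.
    FALSE_POSITIVE_EXPECTED = "False Positive / Expected"
    NO_ACTION_NEEDED_KNOWN_ERROR = "No action needed / Known Error"
    REVIEWED = "Reviewed"
    FIXED = "Fixed"
    FALSE_NEGATIVE = "False Negative"


@dataclass
class Point:
    value: float
    qualification: Optional[Qualification] = None


def training_values(points: List[Point]) -> List[float]:
    """Return the values a model would train on in this sketch.

    Assumption: only Known Error points are dropped; every other
    point, qualified or not, is kept unchanged.
    """
    excluded = Qualification.NO_ACTION_NEEDED_KNOWN_ERROR
    return [p.value for p in points if p.qualification is not excluded]


history = [
    Point(10.0),
    Point(500.0, Qualification.NO_ACTION_NEEDED_KNOWN_ERROR),
    Point(11.0, Qualification.REVIEWED),
]
kept = training_values(history)
```

In this sketch the Reviewed point stays in the training data untouched, which is one simple reading of "does not impact the model".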
What can we expect?
Reviewed and No action needed are the intended target qualifications for batch qualification processes. Sifflet plans to shortly add propagation of incident qualifications and qualifications of monitors themselves. For example, when closing an incident, you will be able to automatically classify all the anomalous datapoints related to it as Reviewed or No action needed, so that any user looking at the monitor knows which anomalies have been dealt with!
Some big changes happening when it comes to setting up monitors! We've merged a lot of monitors to make them even easier to set up!
Metrics - Replaces all static metric monitors and combines them with the Metrics (Dynamic) monitor: Sum (Static Metrics), Average (Static Metrics), ..., Metrics (Dynamic)
Nulls - Replaces all Null monitors, which had different versions for percentages and for static/dynamic monitors
Duplicates - Same as Nulls: replaces the 4 different monitors used to set up duplicate checks!
Improved Threshold Settings for nearly every Monitor
We've standardised threshold methodologies across most Sifflet monitors, adding threshold capabilities to monitors that did not have any! You can now configure format monitors to alert only if there are more than X infringements, or use relative thresholds for nearly every monitor!
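As an illustration of the two threshold styles mentioned above, the following Python sketch checks a count of infringements against an absolute limit and against a relative one. It assumes that "relative" means a share of the rows checked. The function name and its parameters are hypothetical and do not reflect Sifflet's configuration syntax.

```python
from typing import Optional


def breaches_threshold(
    infringements: int,
    total_rows: int,
    max_count: Optional[int] = None,
    max_ratio: Optional[float] = None,
) -> bool:
    """Return True when either configured threshold is exceeded."""
    # Absolute threshold: alert only above a fixed number of infringements.
    if max_count is not None and infringements > max_count:
        return True
    # Relative threshold (assumed meaning): alert above a share of all rows.
    if max_ratio is not None and total_rows > 0:
        if infringements / total_rows > max_ratio:
            return True
    return False


# 5 badly formatted rows out of 1000, tolerated up to 10 rows: no alert.
quiet = breaches_threshold(5, 1000, max_count=10)
# 5 badly formatted rows out of 100, tolerated up to 1 percent: alert.
noisy = breaches_threshold(5, 100, max_ratio=0.01)
```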
Standardised Time Settings!
In the past, Static Monitors and Dynamic Monitors had different time settings, with Static Monitors being somewhat limited when it comes to incremental checks! Now every monitor gets streamlined Time Window Settings, with rolling aggregations, offsets, and first-run configurations!
Stay tuned for a revamped Time Settings experience this year as well!
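For readers less familiar with rolling windows and offsets, here is a small Python sketch of how such a window could be derived from a run date. It only illustrates the general idea under simple assumptions (day granularity, half-open windows); the time_window function and its parameters are hypothetical and are not Sifflet settings.

```python
from datetime import date, timedelta
from typing import Tuple


def time_window(run_date: date, window_days: int, offset_days: int = 0) -> Tuple[date, date]:
    """Return (start, end) of a rolling window, with end exclusive.

    The offset shifts the window back in time, for example to skip
    the most recent day when its data may still be incomplete.
    """
    end = run_date - timedelta(days=offset_days)
    start = end - timedelta(days=window_days)
    return start, end


# A 7-day rolling window that ignores the most recent day.
start, end = time_window(date(2024, 1, 10), window_days=7, offset_days=1)
```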
Data Quality as code - Version 2
To go along with all these improvements, we've updated our Data Quality as Code spec to Version 2. Version 1 specs will still work for now, but we strongly recommend switching to Version 2.
We're excited to announce the release of our first CI workflow for both GitHub and GitLab: an impact analysis workflow that uses Sifflet's lineage API to generate the potential impact of your dbt code changes and posts it as a pull request comment. With this workflow, you can detect and resolve potential issues before code changes are merged.
The potential impact consists of the list of assets downstream of the dbt model(s) you’re modifying. These assets can be other dbt models, BI dashboards, or any other asset cataloged in Sifflet.
The workflow comes packed with two filtering capabilities, using either asset types or tags to minimize noise and focus on assets that matter.
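Conceptually, the impact analysis boils down to walking the lineage graph downstream from the modified models and then filtering the result. The Python sketch below shows that idea on in-memory data. It does not call Sifflet's lineage API: the edges and metadata dictionaries, the asset names, and the downstream_impact function are all hypothetical.

```python
from collections import deque

# Hypothetical lineage: each asset maps to its direct downstream assets.
edges = {
    "stg_orders": ["fct_orders"],
    "fct_orders": ["revenue_dashboard", "orders_export"],
}
# Hypothetical catalog metadata used by the filters.
metadata = {
    "fct_orders": {"type": "dbt_model", "tags": ["finance"]},
    "revenue_dashboard": {"type": "dashboard", "tags": ["finance", "exec"]},
    "orders_export": {"type": "table", "tags": []},
}


def downstream_impact(edges, metadata, changed_models, asset_types=None, tags=None):
    """Breadth-first walk downstream of the changed models, then filter."""
    seen = set(changed_models)
    queue = deque(changed_models)
    impacted = []
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
                impacted.append(child)

    def keep(name):
        info = metadata.get(name, {})
        if asset_types is not None and info.get("type") not in asset_types:
            return False
        if tags is not None and not set(info.get("tags", [])) & set(tags):
            return False
        return True

    return [name for name in impacted if keep(name)]


everything = downstream_impact(edges, metadata, ["stg_orders"])
dashboards = downstream_impact(edges, metadata, ["stg_orders"], asset_types={"dashboard"})
```

Filtering after the traversal, rather than during it, keeps assets that sit behind a filtered-out intermediate asset.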
To activate this new feature, refer to the dedicated documentation for GitHub and GitLab.
Over the coming months, we'll continue to release Sifflet-powered CI workflows that bring all of Sifflet's capabilities to your software development platform.
Manage Sifflet Credentials and Sources as Code With Sifflet Official Terraform Provider
Ensuring your data observability adapts to changes in your data environment can be challenging. With the official Sifflet Terraform provider, you can now manage Sifflet credentials and sources as code, making it easy to automate their creation and maintenance. Leveraging Terraform resources also improves governance and security by simplifying versioning and auditing of any changes performed on credentials and sources.
Secure and Private Data Observability With Private Link Support for AWS and Azure
We are excited to announce that Sifflet now supports Private Link for secure, private connectivity between Sifflet and your AWS and Azure sources. With Private Link, Sifflet can access sources located within your private network while ensuring that data traffic stays completely off the public internet. This feature allows your teams to benefit from the value of data observability regardless of your organisation's security and compliance requirements.
Extended Airflow Support to Amazon MWAA and GCP Cloud Composer
Sifflet can now integrate with managed Apache Airflow environments on Amazon MWAA and GCP Cloud Composer. For setup details, you can check out the dedicated documentation.
Since Sifflet now inserts a JSON comment at the top of its queries, we excluded all Sifflet queries from usage stats. This ensures that usage information is more accurate and only represents your own usage.
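If you want to apply the same kind of exclusion in your own query-log analysis, the Python sketch below shows one way to detect a JSON comment at the top of a query. The exact comment format Sifflet uses is not described here, so the block comment syntax and the "app" key with value "sifflet" are purely hypothetical placeholders, as is the is_tool_query function.

```python
import json
import re

# Matches a block comment at the very start of the query text.
LEADING_COMMENT = re.compile(r"^\s*/\*(.*?)\*/", re.DOTALL)


def is_tool_query(sql: str, marker_key: str = "app", marker_value: str = "sifflet") -> bool:
    """Return True if the query starts with a JSON comment carrying the marker.

    The marker key and value are assumptions made for this example.
    """
    match = LEADING_COMMENT.match(sql)
    if match is None:
        return False
    try:
        payload = json.loads(match.group(1))
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get(marker_key) == marker_value


queries = [
    '/* {"app": "sifflet"} */ SELECT COUNT(*) FROM orders',
    "SELECT * FROM orders",
    "/* nightly export */ SELECT * FROM orders",
]
own_usage = [q for q in queries if not is_tool_query(q)]
```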
Simplify Incident Management with Jira Status Changes Sync
You can now optionally enable status changes sync from Jira to Sifflet. This will ensure any Sifflet incident gets automatically moved to In progress or Closed • Fixed when its associated Jira issue gets transitioned to an In-progress or Done status. Enabling status changes sync allows you to benefit from Sifflet incident management troubleshooting and collaboration capabilities while ensuring a smooth incident management process thanks to automated closing of your Sifflet incidents.
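The sync rule described above can be summarised as a simple mapping. The Python sketch below is only a mental model: the dictionary, the function name, and the choice to leave the incident unchanged for any other Jira status are assumptions, not Sifflet's implementation.

```python
# Jira status category -> Sifflet incident status, as described above.
JIRA_TO_SIFFLET = {
    "In Progress": "In progress",
    "Done": "Closed • Fixed",
}


def synced_incident_status(jira_status_category: str, current_status: str) -> str:
    """Return the incident status after a Jira transition.

    Assumption: statuses outside the mapping leave the incident as it is.
    """
    return JIRA_TO_SIFFLET.get(jira_status_category, current_status)


after_done = synced_incident_status("Done", "In progress")
after_todo = synced_incident_status("To Do", "Open")
```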
Improve Data Trust With Sifflet Insights Browser Extension Support for Power BI Reports
Giving teams access to dashboard health status while they browse their BI tools is key to ensuring trust in your data. Sifflet is going one step further in improving data self-service reliability by now supporting the Sifflet Insights browser extension on Power BI reports.
Improved Model for Full Data Scan Monitors: When monitors query entire tables, the data often does not change much or show much seasonality. We've improved our models to be less noisy in these cases.
Improvements to Monitor Pages: More is coming, but you may have noticed that we've started to improve the UX of the monitor page! Enjoy!
Improved Jira Integration: We fixed a bug that prevented the manual creation of Jira issues when the Jira account had a large number of projects and issue types.
Improved Support: We've removed the ability to disable Sifflet support teams' access to SaaS accounts, to ensure an optimised support experience.
User creation: We fixed a bug that was preventing the creation of users in some specific cases.
Users with the Admin system role can now control whether or not users should be able to leverage the Sifflet AI Assistant to generate metadata on Data Catalog assets. This makes it simpler to ensure your Sifflet account is in compliance with your organisation's requirements.
SQL Table Tracer (STT): Sifflet's New Adaptive SQL Parser
With this release, we're introducing SQL Table Tracer (STT), our in-house-built adaptive SQL parser. STT, a generic ANTLR-based parser, will extract lineage for a wide range of integrations, starting with table-level lineage for Amazon Redshift, with Azure Synapse next in line.
Stay tuned for an upcoming blog post where we'll go over our journey building STT and the significant improvements it brings with it, both in terms of efficiency and query parsing accuracy, over our legacy parser.
Improvements
Improved Monitor Run Schedule and Timezone Selection
Monitors will now default to having a schedule suitable for the monitor. The default will be daily or hourly depending on the type of monitor and the interface will apply the simpler "Standard Options" format.
We've also added a Timezone selector to both Cron Expressions and Standard Options to allow worldwide businesses to easily schedule monitors around their business hours!
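To see why a timezone selector matters, consider a monitor that should run at 08:00 local time in Paris. The Python sketch below uses the standard library zoneinfo module to show that the equivalent UTC time shifts with daylight saving time. The run_time_in_utc helper is a hypothetical name used for illustration and is unrelated to how Sifflet schedules monitors internally.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def run_time_in_utc(year: int, month: int, day: int, hour: int, tz_name: str) -> datetime:
    """Convert a local wall-clock run time to UTC."""
    local = datetime(year, month, day, hour, tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc)


winter = run_time_in_utc(2024, 1, 15, 8, "Europe/Paris")
summer = run_time_in_utc(2024, 7, 15, 8, "Europe/Paris")
print(winter.hour, summer.hour)  # prints: 7 6
```

A schedule written in UTC would therefore drift by an hour relative to local business hours twice a year, which a schedule tied to a timezone avoids.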
Improved Support of Concurrent dbt Runs
Previously, if the artifacts of multiple dbt runs (within the same project) were submitted in a short timeframe, Sifflet could occasionally miss processing some files. With this release, we've enhanced our support for concurrent dbt runs to ensure all dbt artifacts are correctly processed, even when received simultaneously.
Improved Credentials Security
Once created, your credentials can still be edited, but they are no longer accessible from the UI, which improves the security of access to your sources.
Quickly Access Power BI Reports With the “View In” Button
You can now swiftly access your Power BI reports from Sifflet, thanks to the new View in button on Power BI reports' asset pages.
As with catalog assets, you can now add rich text descriptions to monitors.
Asset URIs from the Catalog Search page
Asset URIs are a new way of identifying assets, and they can be used in a variety of ways, whether in Data Quality as Code or through APIs. You can now easily retrieve the URIs from the Catalog Search page.
Track Data Quality Issues in Jira With the New Integration
You can now manually create Jira issues from Sifflet incidents using the new built-in Jira integration. Generating Jira issues ensures data quality problems detected by Sifflet fit into your existing incident and task management workflow and that no important issue gets accidentally overlooked.
If you're looking for a more automated workflow, note that you can still have Jira issues automatically created in case of Sifflet monitor failure.
Lineage: Extend Power BI native queries support to Snowflake connections
Sifflet now generates lineage between Power BI and Snowflake even in scenarios where you use native queries to interact with Snowflake data sources from Power BI.
🛠 Fixes
Row-level Duplicates monitor - Include date type columns for Databricks and Hive
Improve Your Data Documentation With Description Formatting
Take your data documentation to the next level by adding extensive formatting to the descriptions of your data assets and business terms. Data product owners and other members of the data team can now add all the relevant information to an asset's description and structure it in a way that makes it easy for other catalog users to understand.
Markdown-formatted descriptions collected from providers such as dbt are now also properly interpreted, making it simpler for catalog users to leverage those for data self-service.
We're currently in the process of improving the Monitor Creation Experience. Today's release impacts Threshold settings and makes them simpler and more intuitive to use!