Docly

Using Data Lineage Graphs

Estimated reading: 3 minutes

Use Data Stemma’s Lineage Graphs to:

Some examples

  • An analytics engineer builds a table using Dbt; that table becomes popular and several dashboards are built on top of it. It becomes well trusted, and the logic becomes a standard for calculating certain metrics. But the table is missing several important entity IDs (foreign key columns) which makes it difficult to join to other tables. The analytics engineer and a data engineer work together to complete a “micro migration” to a new, more performant and reusable table.
  • A machine-learning model is used to predict churn for a customer. Each day after the model runs, users that are predicted to churn are identified in a table in Snowflake and then sent an email to re-engage them. The product team is launching the second major product and wants to measure churn on a per-product basis. The engineering team updates their data pipeline to add a new column to the output table that identifies the product that churn is being predicted for.
  • An analyst created several views to consolidate the common logic that the team were using to query tables in their dashboards. But the performance of the views has deteriorated as the amount of data has grown. The analyst and the engineer work together to create materialized versions and to update all of the dashboards using the prior views.

Use Data Lineage in Stemma to Identify Affected Assets

The challenge with such changes is that it is hard to accurately curate a list of all assets that will be affected. If you cannot be certain of what will need to be changed or what could break, you cannot do risk analysis or even make a time estimate. Stemma enables you to do the analysis and create the appropriate messaging.

The key question for this type of change is “What tables and dashboards will I need to update if I rename this view?”

Stemma provides a lineage graph for each table and dashboard. From here you can see who and what is utilizing your resources downstream, and how popular your assets are. Data owners can use this in conjunction with their own knowledge to build roadmaps and plan how data should evolve.

Use Stemma Messaging to Communicate Expected Changes

Everyone affected needs to know what is going to change and when. After building your list of affected assets and their users, use Stemma messaging to provide advance notice and instructions.

How Stemma Discovers Data Lineage

When you execute SQL against your warehouse, the database captures the raw SQL and saves it in a historical format that includes the time the SQL command was executed and the user who executed it. Stemma reads these SQL logs and parses all of the commands that have been run, extracting several distinct pieces of information which appear in the Stemma UI on the Table Details page.

In the case of table and column lineage, any change to a table– via an insert, update or delete from another table– will create the appropriate table and column lineage tracking.

Note

When lineage is created via insert into table (select * from ...) statements, Stemma does not capture column level lineage.

See also: How Stemma Captures Data: SQL Parsing Overview and Caveats