DBT Integration

Stemma ingests DBT catalog and manifest data, automatically importing metadata such as table lineage and table and column descriptions:

DBT                  Stemma
Tables               Tables
Table lineage        Table lineage
Columns              Columns
Table definitions    Programmatic descriptions (DBT descriptions section)
Column definitions   Column definitions

To support ingesting these dbt artifacts, we require you to generate them daily and upload them to an AWS S3 cloud storage bucket provisioned by Stemma.

Stemma's DBT integration relies on the following pieces of information:

  • The database name (e.g., "snowflake", "postgresql")
  • manifest.json and catalog.json (produced via dbt docs generate)
  • Optional: The base URL for your dbt GitHub repository (e.g., "https://github.com/{customer}/dbt-{customer}/tree/main").
    This allows us to include GitHub links to your dbt models on Stemma's table details page.

We recommend testing the integration first by simply sending the two JSON files to your Stemma contact via Slack or email.
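
Before sending the files, it can help to confirm that both artifacts exist and parse as JSON. The following is a minimal sketch, not part of Stemma's tooling: the function name find_dbt_artifacts is hypothetical, and the default "target" directory assumes dbt's default output location for dbt docs generate.

```python
import json
from pathlib import Path

# The two artifacts the integration relies on.
REQUIRED_ARTIFACTS = ("manifest.json", "catalog.json")


def find_dbt_artifacts(target_dir="target"):
    """Return {filename: path} for both artifacts.

    Raises FileNotFoundError if a file is missing and
    json.JSONDecodeError if a file is not valid JSON.
    """
    paths = {}
    for name in REQUIRED_ARTIFACTS:
        path = Path(target_dir) / name
        if not path.is_file():
            raise FileNotFoundError(f"{path} not found; run `dbt docs generate` first")
        with path.open(encoding="utf-8") as handle:
            json.load(handle)  # parse only to confirm the file is well-formed
        paths[name] = path
    return paths
```

Calling find_dbt_artifacts() from the root of a dbt project after running dbt docs generate should return the paths of the two files to share.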

For an ongoing integration, we'll provide you with the following information for the daily delivery of the dbt artifacts:

  • AWS Access Key pair
  • S3 Bucket (e.g., "s3://{customer}-stemma-integrations")

The S3 prefix path will be "dbt/{date}", where "{date}" is the current day's date in UTC in the format "YYYY-MM-DD". For example, for a file delivery on April 1, 2022, we would expect to see the following files:

  • "s3://customerxyz-stemma-integrations/dbt/2022-04-01/manifest.json"
  • "s3://customerxyz-stemma-integrations/dbt/2022-04-01/catalog.json"
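
The daily delivery could be scripted along the following lines. This is a sketch under stated assumptions, not Stemma's reference implementation: the function names dbt_s3_key and upload_dbt_artifacts are hypothetical, the artifacts are assumed to be in dbt's default "target" directory, and the upload uses the third-party boto3 library, which reads the access key pair from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. Pass the bucket name without the "s3://" scheme.

```python
from datetime import datetime, timezone


def dbt_s3_key(filename, now=None):
    """Build the object key "dbt/{date}/{filename}" using the date in UTC.

    `now` should be a timezone-aware datetime; it defaults to the current time.
    """
    now = now or datetime.now(timezone.utc)
    date = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"dbt/{date}/{filename}"


def upload_dbt_artifacts(bucket, target_dir="target"):
    """Upload manifest.json and catalog.json under today's UTC date prefix."""
    import boto3  # third-party AWS SDK, imported lazily so dbt_s3_key works without it

    s3 = boto3.client("s3")
    for name in ("manifest.json", "catalog.json"):
        # upload_file(local_path, bucket_name, object_key)
        s3.upload_file(f"{target_dir}/{name}", bucket, dbt_s3_key(name))
```

For example, upload_dbt_artifacts("customerxyz-stemma-integrations") run after dbt docs generate would place both files under the current day's "dbt/{date}" prefix. Computing the date in UTC rather than local time keeps the prefix consistent with the expected path when the job runs near midnight.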
