dbt integration

Stemma integrates with your dbt Cloud or dbt Core projects and can ingest, and keep up to date, all of the metadata in the table below.

dbt                   Stemma
Tables                Tables
Table lineage         Table lineage
Columns               Columns
Table definitions     Programmatic descriptions (dbt descriptions section)
Column definitions    Column definitions

dbt Cloud

Integrating with dbt Cloud requires only a few pieces of information, access to your organization’s dbt Cloud account, and administrative access to Stemma.

The following 3 properties are needed from dbt Cloud:

  • Account ID
    • Usually visible in the browser URL on the account settings page: https://cloud.getdbt.com/next/settings/accounts/<account-id-number> or within any dbt Cloud project: https://cloud.getdbt.com/next/deploy/<account-id-number>/projects/<project-id>.
  • Service Token
    • Create a Service Token with the appropriate permissions:
      • For organizations on the Team Plan, the account-wide permissions required are Metadata and Read-Only.
      • For organizations on the Enterprise Plan, the necessary permission set is Account Viewer and Job Viewer for all projects that should have metadata ingestion.
  • Host
    • Unless organization-specific customization has taken place, the host is likely cloud.getdbt.com.
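
As a convenience, the Account ID can be pulled out of either URL shape described above. The following is a minimal sketch, not part of Stemma or dbt Cloud: the function name extract_account_id and the regular expressions are assumptions based only on the two URL patterns listed here, and other dbt Cloud URL layouts are not handled.

```python
import re
from typing import Optional

# Hypothetical patterns that mirror the two URL shapes described above.
_ACCOUNT_ID_PATTERNS = (
    re.compile(r"/settings/accounts/(\d+)"),      # account settings page
    re.compile(r"/deploy/(\d+)/projects/\d+"),    # any project page
)


def extract_account_id(url: str) -> Optional[str]:
    """Return the numeric account ID found in a dbt Cloud URL, or None."""
    for pattern in _ACCOUNT_ID_PATTERNS:
        match = pattern.search(url)
        if match:
            return match.group(1)
    return None
```

Calling extract_account_id with a URL copied from the browser returns the ID as a string, or None if the URL matches neither pattern.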

With these credentials in hand, all that’s left is adding the connection information to Stemma:

  1. Navigate to the Admin tab of the Stemma UI and choose Connections and Add New Connection.
  2. In the pop-up window, choose dbt as the Connection type, provide a Connection Name, and fill in the API Token (the Service Token created above), Host and Account ID collected from dbt Cloud.
  3. When you are finished, click Save Connection.

The initial ingestion from dbt Cloud may take up to 24 hours.



dbt Core

For dbt Core workflows, Stemma ingests metadata from dbt artifacts. You need to generate these artifacts daily and upload them to an AWS S3 bucket provisioned by Stemma.

Stemma’s dbt integration relies on the following pieces of information:

  • The database name (e.g., “snowflake”, “postgresql”)
  • manifest.json and catalog.json (produced via dbt docs generate)
  • Optional: The base URL for your dbt GitHub repository (e.g., “https://github.com/{customer}/dbt-{customer}/tree/main”). This allows us to include GitHub links to your dbt models on Stemma’s table details page.
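
Before sending the files, it can help to confirm that both artifacts exist and parse as JSON. The sketch below assumes the artifacts sit in dbt's default target directory; the function name find_artifact_problems is hypothetical and the check is illustrative, not something Stemma requires.

```python
import json
from pathlib import Path
from typing import List

# The two artifacts named above, produced by `dbt docs generate`.
REQUIRED_ARTIFACTS = ("manifest.json", "catalog.json")


def find_artifact_problems(target_dir: str = "target") -> List[str]:
    """Return one message per missing or unparseable artifact (empty if all is well)."""
    problems = []
    for name in REQUIRED_ARTIFACTS:
        path = Path(target_dir) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        try:
            with path.open(encoding="utf-8") as handle:
                json.load(handle)
        except json.JSONDecodeError:
            problems.append(f"{name}: not valid JSON")
    return problems
```

An empty list means both files are present and readable; otherwise each entry names the file and what is wrong with it.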

We recommend testing out the integration first by simply sending the two JSON files to your Stemma contact via Slack or email.

For an ongoing integration, we’ll provide you with the following information for the daily delivery of the dbt artifacts:

  • AWS Access Key pair
  • S3 Bucket (e.g., “s3://{customer}-stemma-integrations”)

The S3 prefix path will be “dbt/{date}”, where “{date}” is the current day’s date in UTC in the format “YYYY-MM-DD”. For example, for a file delivery on April 1, 2022 (2022-04-01), we would expect to see the following files:

  • “s3://customerxyz-stemma-integrations/dbt/2022-04-01/manifest.json”
  • “s3://customerxyz-stemma-integrations/dbt/2022-04-01/catalog.json”
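
One possible way to automate the daily delivery is sketched below. It assumes the third-party boto3 package is installed and that the artifacts are in dbt's default target directory; the function names build_s3_keys and upload_artifacts are hypothetical, and the bucket and credentials are the values supplied by Stemma.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Optional

ARTIFACTS = ("manifest.json", "catalog.json")


def build_s3_keys(now: Optional[datetime] = None) -> Dict[str, str]:
    """Map each artifact name to its key under dbt/{date}, with the date in UTC."""
    now = now or datetime.now(timezone.utc)
    # Convert to UTC so the prefix does not depend on the local time zone.
    date = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return {name: f"dbt/{date}/{name}" for name in ARTIFACTS}


def upload_artifacts(bucket: str, access_key_id: str, secret_access_key: str,
                     target_dir: str = "target") -> None:
    """Upload both artifacts; `bucket` is the bucket name without the s3:// prefix."""
    import boto3  # third-party dependency, imported lazily (assumption: installed)

    client = boto3.client(
        "s3",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )
    for name, key in build_s3_keys().items():
        client.upload_file(str(Path(target_dir) / name), bucket, key)
```

Running upload_artifacts once a day after dbt docs generate, for example from a scheduler, would place the files under that day's prefix.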