Docly

Redshift Integration

Estimated reading: 3 minutes

What we need from you

Stemma needs certain information and credentials to extract Redshift metadata into the catalog. To provide these, proceed as follows.

  1. Navigate to the Admin tab of the Stemma UI and choose Connections and Add New Connection:
  1. In the pop-up window, choose Redshift as the Connection type and then provide the following information in the dialog box:
    • A username and password for Stemma to use to access the INFORMATION_SCHEMA
    • The hostname of the Amazon Redshift cluster. You can find it in the Properties tab of your Redshift Cluster Details page. Under the Connection details section, copy the Endpoint up to but not including the colon.
    • A port number. The default port number for Redshift is 5439. From your Redshift Cluster Details page, you can find the port number under the Database configurations section of the Properties tab.
    • A list of databases: Stemma whitelists databases, so you will need to provide a list of the databases we will be importing.

Access

The Stemma Redshift integration requires access to two pieces of information within the Redshift install:

  • information_schema metadata
  • Query logs

information_schema

The specific tables and views we access include:

  • PG_GET_LATE_BINDING_VIEW_COLS
  • svv_external_columns
  • COLUMNS

Because of the way permissions work in Redshift, these tables are only populated with data to which the user has SELECT access. If you don’t want to use column stats, and therefore want to restrict access for this credential, Stemma will give you the SQL code required to create a materialized view. You create that materialized view as a superuser, and grant our credential the ability to read and refresh that materialized view.

Query logs

Stemma Redshift requires query logs. We can extract these in two ways: push or pull.

Push

The recommended integration involves configuring Redshift to dump user activity logs into S3.

There are multiple methods to achieve this, either using CloudTrail or directly from Redshift. the exact right way differs between organizations (many already have audit logging configured for other purposes).

Pull

Stemma can extract logs from STL_QUERY. However, because this option requires the credentials be a superuser, it is not the recommended integration method.

Column stats – Optional

The Stemma integration user must have full SELECT access to any tables or views for which you want to see column stats.