Athena Integration

Stemma’s Athena integration supports extracting metadata (tables, columns, etc.).

Proceed as follows to create an Athena connection to Stemma.

Create a user

To integrate with Stemma, we recommend creating a new IAM user whose access is scoped to only the actions and resources required. The following Terraform script creates the user and grants the required access:

Terraform

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}

provider "aws" {
  # This script only creates IAM resources, which are global. You may
  # change the region, but doing so should not affect this script.
  region = "us-east-1"
}

resource "aws_iam_user" "stemma_read_user" {
  name = "stemma-read-user"
}

resource "aws_iam_user_policy" "stemma_read_policy" {
  name = "stemma-read-policy"
  user = aws_iam_user.stemma_read_user.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "athena:GetDatabase",
          "athena:ListDataCatalogs",
          "athena:GetDataCatalog",
          "athena:ListDatabases",
          "athena:ListTableMetadata",
          "glue:GetDatabases",
          "glue:GetTables"
        ]
        Effect   = "Allow"
        Resource = "*"
      },
    ]
  })
}

Generate credentials

After creating the user, generate an AWS Access Key and Secret for the user.
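
If you prefer to script this step, the following is a minimal sketch using boto3's IAM `create_access_key` call. It assumes the user name `stemma-read-user` from the Terraform script above; the helper name `create_stemma_access_key` is hypothetical, and the caller is assumed to pass an IAM client with permission to create access keys.

```python
def create_stemma_access_key(iam_client, user_name="stemma-read-user"):
    """Create an access key for the Stemma IAM user.

    `iam_client` is assumed to be a boto3 IAM client, e.g. boto3.client("iam"),
    authenticated as a principal allowed to call iam:CreateAccessKey.
    Returns a tuple of (access key ID, secret access key).
    """
    response = iam_client.create_access_key(UserName=user_name)
    key = response["AccessKey"]
    # The secret is only returned at creation time, so store it securely now.
    return key["AccessKeyId"], key["SecretAccessKey"]
```

With boto3 installed and administrator credentials configured, calling `create_stemma_access_key(boto3.client("iam"))` should return the pair of values to enter in the Stemma UI in the next step.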

Provide credentials to Stemma

Now provide this information to Stemma:

  1. Navigate to the Admin tab of the Stemma UI and choose Connections, then Add New Connection.
  2. In the pop-up window, choose AWS as the Connection type, provide a Connection Name, choose Access Key as the Authentication type, provide the AWS Access Key and AWS Access Key Secret you have just created, and specify the AWS Region.
  3. When you are finished, click Save Connection.

Testing the access

Optionally, you can validate the access that Stemma will have to the catalogs, databases, and tables with the following code snippet. Make sure to replace ACCESS_KEY and ACCESS_KEY_SECRET with the values for the Stemma Athena user:

import json
import boto3
from botocore.config import Config


client = boto3.client(
    "athena",
    aws_access_key_id="{ACCESS_KEY}",
    aws_secret_access_key="{ACCESS_KEY_SECRET}",
    config=Config(region_name="us-east-1"),
)
# Walk every catalog, database, and table visible to these credentials.
# Each call below returns only the first page of results.
all_tables = []
for catalog in [cat['CatalogName'] for cat in client.list_data_catalogs()['DataCatalogsSummary']]:
    for database in [database["Name"] for database in client.list_databases(CatalogName=catalog)['DatabaseList']]:
        for table_name in [tbl['Name'] for tbl in client.list_table_metadata(CatalogName=catalog, DatabaseName=database)['TableMetadataList']]:
            all_tables.append(f'{catalog}.{database}.{table_name}')

print('All tables available:')
print(json.dumps(all_tables, indent=2))
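
For accounts with many databases or tables, a single call to each list operation may not return everything. The following is a minimal sketch of the same check using boto3 paginators; the function name `list_all_tables` is hypothetical, and `client` is assumed to be an Athena client created as in the snippet above.

```python
def list_all_tables(client):
    """Return 'catalog.database.table' names for every table the client can list.

    `client` is assumed to be a boto3 Athena client. Paginators follow
    NextToken automatically, so all pages of results are included.
    """
    catalogs = client.get_paginator("list_data_catalogs")
    databases = client.get_paginator("list_databases")
    tables = client.get_paginator("list_table_metadata")

    all_tables = []
    for catalog_page in catalogs.paginate():
        for cat in catalog_page["DataCatalogsSummary"]:
            catalog = cat["CatalogName"]
            for db_page in databases.paginate(CatalogName=catalog):
                for db in db_page["DatabaseList"]:
                    database = db["Name"]
                    for tbl_page in tables.paginate(
                        CatalogName=catalog, DatabaseName=database
                    ):
                        for tbl in tbl_page["TableMetadataList"]:
                            all_tables.append(
                                f"{catalog}.{database}.{tbl['Name']}"
                            )
    return all_tables
```

Passing the Athena client to `list_all_tables` and printing the result should give the same list as the snippet above when everything fits on one page, and a longer list otherwise.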