Athena Integration
Stemma’s Athena integration supports extracting metadata (tables, columns, etc.).
Proceed as follows to create an Athena connection to Stemma.
Create a user
To integrate with Stemma, we recommend creating a new IAM user whose access is scoped specifically to the actions and resources required. The following script enables the required access:
Terraform
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}

provider "aws" {
  # This script only creates IAM resources, which are global.
  # You may change the region; it should not affect this script.
  region = "us-east-1"
}

resource "aws_iam_user" "stemma_read_user" {
  name = "stemma-read-user"
}

resource "aws_iam_user_policy" "stemma_read_policy" {
  name = "stemma-read-policy"
  user = aws_iam_user.stemma_read_user.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "athena:GetDatabase",
          "athena:ListDataCatalogs",
          "athena:GetDataCatalog",
          "athena:ListDatabases",
          "athena:ListTableMetadata",
          "glue:GetDatabases",
          "glue:GetTables"
        ]
        Effect   = "Allow"
        Resource = "*"
      },
    ]
  })
}
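For teams that do not use Terraform, roughly the same user and inline policy could be created with boto3. This is a minimal sketch under stated assumptions, not an official Stemma script: the helper names `build_policy_document` and `create_stemma_read_user` are hypothetical, and the IAM client is passed in so that the caller decides which admin credentials are used. It relies on the IAM `create_user` and `put_user_policy` calls.

```python
import json

# Same actions as the Terraform policy above.
STEMMA_READ_ACTIONS = [
    "athena:GetDatabase",
    "athena:ListDataCatalogs",
    "athena:GetDataCatalog",
    "athena:ListDatabases",
    "athena:ListTableMetadata",
    "glue:GetDatabases",
    "glue:GetTables",
]


def build_policy_document():
    """Return the read-only policy as a JSON string (hypothetical helper)."""
    return json.dumps(
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Action": STEMMA_READ_ACTIONS,
                    "Resource": "*",
                }
            ],
        }
    )


def create_stemma_read_user(
    iam_client,
    user_name="stemma-read-user",
    policy_name="stemma-read-policy",
):
    """Create the IAM user and attach the inline policy to it.

    `iam_client` is assumed to be a boto3 IAM client,
    for example boto3.client("iam").
    """
    iam_client.create_user(UserName=user_name)
    iam_client.put_user_policy(
        UserName=user_name,
        PolicyName=policy_name,
        PolicyDocument=build_policy_document(),
    )
    return user_name
```

A call such as `create_stemma_read_user(boto3.client("iam"))` would then create the user with the default names used in the Terraform script.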
Generate credentials
After creating the user, generate an AWS Access Key and Secret for the user.
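The key pair can be generated in the AWS console or programmatically. As one possible approach, the sketch below uses the IAM `create_access_key` call through a boto3 client supplied by the caller. The function name `generate_access_key` is hypothetical, and the default user name is assumed to match the one created above.

```python
def generate_access_key(iam_client, user_name="stemma-read-user"):
    """Create an access key for the user and return (key_id, secret).

    `iam_client` is assumed to be a boto3 IAM client. The secret is only
    returned at creation time, so store it securely right away.
    """
    response = iam_client.create_access_key(UserName=user_name)
    key = response["AccessKey"]
    return key["AccessKeyId"], key["SecretAccessKey"]
```

The two returned values are what Stemma asks for as the AWS Access Key and AWS Access Key Secret.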
Provide credentials to Stemma

- Navigate to the Admin tab of the Stemma UI and choose Connections, then Add New Connection.
- In the pop-up window, choose AWS as the Connection type, provide a Connection Name, choose Access Key as the Authentication type, provide the AWS Access Key and AWS Access Key Secret you have just created, and specify the AWS Region.
- When you are finished, click Save Connection.
Testing the access
Optionally, you can validate the access that Stemma will have to the catalogs, databases, and tables with the following code snippet. Make sure to replace ACCESS_KEY and ACCESS_KEY_SECRET with the values for the Stemma Athena user:
import json

import boto3
from botocore.config import Config

client = boto3.client(
    "athena",
    aws_access_key_id="{ACCESS_KEY}",
    aws_secret_access_key="{ACCESS_KEY_SECRET}",
    config=Config(region_name="us-east-1"),
)

# Walk every catalog, database, and table visible to the Stemma user.
all_tables = []
for catalog in [cat['CatalogName'] for cat in client.list_data_catalogs()['DataCatalogsSummary']]:
    for database in [database["Name"] for database in client.list_databases(CatalogName=catalog)['DatabaseList']]:
        for table_name in [tbl['Name'] for tbl in client.list_table_metadata(CatalogName=catalog, DatabaseName=database)['TableMetadataList']]:
            all_tables.append(f'{catalog}.{database}.{table_name}')

print('All tables available:')
print(json.dumps(all_tables, indent=2))
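The Athena list calls used above return results in pages, so on accounts with many databases or tables a single call per level may not show everything. The following sketch is one way to follow pagination with boto3 paginators. The helper names `_items` and `list_all_tables` are hypothetical, and the client is assumed to be an Athena client created as in the snippet above.

```python
def _items(client, operation, result_key, **kwargs):
    """Yield every item under `result_key` across all pages of `operation`."""
    paginator = client.get_paginator(operation)
    for page in paginator.paginate(**kwargs):
        yield from page[result_key]


def list_all_tables(athena_client):
    """Return 'catalog.database.table' names, following pagination."""
    all_tables = []
    for cat in _items(athena_client, "list_data_catalogs", "DataCatalogsSummary"):
        catalog = cat["CatalogName"]
        for db in _items(
            athena_client, "list_databases", "DatabaseList", CatalogName=catalog
        ):
            database = db["Name"]
            for tbl in _items(
                athena_client,
                "list_table_metadata",
                "TableMetadataList",
                CatalogName=catalog,
                DatabaseName=database,
            ):
                all_tables.append(f"{catalog}.{database}.{tbl['Name']}")
    return all_tables
```

Calling `list_all_tables(client)` with the client from the snippet above should give the same kind of list, without stopping at the first page of each call.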