Hive Integration

What we need from you

Stemma needs certain information and credentials to extract Hive metadata into the catalog. Contact Stemma by email or Slack, and provide the following:

  • Metastore Host: The hostname or IP address of the Hive server that Stemma will connect to.
  • Username and Password: A username and password that Stemma can use to access the Hive schema.
  • Metastore Port: The server port used for accessing metadata about Hive tables and partitions. The default Hive metastore port is 9083.
  • List of Databases: Stemma whitelists databases, so provide a list of the databases you want Stemma to import.
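
Before sending these details, it can help to gather them in one place and check that nothing is missing. The following is a minimal sketch of that idea, not a format Stemma requires: the class name, field names, host, account, and database names are all hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass

DEFAULT_METASTORE_PORT = 9083  # default Hive metastore port, as noted in the list above


@dataclass
class HiveConnectionDetails:
    """Hypothetical container for the four items listed above (not a Stemma format)."""

    metastore_host: str
    username: str
    password: str
    databases: list[str]
    metastore_port: int = DEFAULT_METASTORE_PORT

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means nothing obvious is missing."""
        issues = []
        if not self.metastore_host.strip():
            issues.append("metastore host is empty")
        if not self.username or not self.password:
            issues.append("username and password are both required")
        if not 1 <= self.metastore_port <= 65535:
            issues.append(f"port {self.metastore_port} is out of range")
        if not self.databases:
            issues.append("at least one database must be listed")
        return issues


details = HiveConnectionDetails(
    metastore_host="hive-metastore.example.internal",  # placeholder host
    username="stemma_reader",                          # placeholder account
    password="change-me",                              # placeholder secret
    databases=["sales", "marketing"],                  # placeholder database names
)
print(details.problems())
```

The port falls back to 9083 when it is not given, so it only needs to be set for a metastore that listens elsewhere.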

Metadata extracted

Stemma extracts metadata from the following Hive metastore tables:

  • TBLS – stores basic information about Hive tables, views, and index tables.
  • DBS – stores basic information about all databases in Hive.
  • PARTITION_KEYS – stores the field information for each table's partition keys.
  • TABLE_PARAMS – stores the attribute information of the table/view.
  • SDS – stores basic information about file storage, such as INPUT_FORMAT, OUTPUT_FORMAT, and whether the data is compressed.
  • COLUMNS_V2 – stores the field information for each table.
  • PARTITIONS – stores the basic information of table partitions.
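
To illustrate how these tables relate to one another, here is a small sketch that builds a toy copy of five of them in an in-memory SQLite database and joins them to list each table's columns and partition keys for a given set of databases. This is not Stemma's extraction query. The column names (DB_ID, SD_ID, CD_ID, and so on) are a simplified subset based on the commonly deployed metastore schema and may differ between Hive versions, and the sample rows and the `list_columns` helper are invented for illustration.

```python
import sqlite3

# Toy stand-in for the metastore backing database (assumption: simplified columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, CD_ID INTEGER,
                  INPUT_FORMAT TEXT, OUTPUT_FORMAT TEXT, IS_COMPRESSED INTEGER);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, SD_ID INTEGER,
                   TBL_NAME TEXT, TBL_TYPE TEXT);
CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT, TYPE_NAME TEXT,
                         INTEGER_IDX INTEGER);
CREATE TABLE PARTITION_KEYS (TBL_ID INTEGER, PKEY_NAME TEXT, PKEY_TYPE TEXT,
                             INTEGER_IDX INTEGER);

-- Invented sample data: one database, one table, two columns, one partition key.
INSERT INTO DBS VALUES (1, 'sales');
INSERT INTO SDS VALUES (10, 100, 'org.apache.hadoop.mapred.TextInputFormat',
                        'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat', 0);
INSERT INTO TBLS VALUES (5, 1, 10, 'orders', 'MANAGED_TABLE');
INSERT INTO COLUMNS_V2 VALUES (100, 'order_id', 'bigint', 0), (100, 'amount', 'double', 1);
INSERT INTO PARTITION_KEYS VALUES (5, 'order_date', 'string', 0);
""")

# Regular columns are reached via TBLS -> SDS -> COLUMNS_V2;
# partition keys hang directly off TBLS via TBL_ID.
COLUMNS_QUERY = """
SELECT d.NAME, t.TBL_NAME, c.COLUMN_NAME, c.TYPE_NAME, 0 AS is_partition_key, c.INTEGER_IDX
FROM TBLS t
JOIN DBS d ON d.DB_ID = t.DB_ID
JOIN SDS s ON s.SD_ID = t.SD_ID
JOIN COLUMNS_V2 c ON c.CD_ID = s.CD_ID
WHERE d.NAME IN ({placeholders})
UNION ALL
SELECT d.NAME, t.TBL_NAME, p.PKEY_NAME, p.PKEY_TYPE, 1, p.INTEGER_IDX
FROM TBLS t
JOIN DBS d ON d.DB_ID = t.DB_ID
JOIN PARTITION_KEYS p ON p.TBL_ID = t.TBL_ID
WHERE d.NAME IN ({placeholders})
ORDER BY 1, 2, 5, 6
"""


def list_columns(connection, databases):
    """Return (db, table, column, type, is_partition_key, position) rows
    for the given databases only (hypothetical helper)."""
    placeholders = ", ".join("?" for _ in databases)
    sql = COLUMNS_QUERY.format(placeholders=placeholders)
    # The IN (...) list appears twice in the query, so bind the names twice.
    return connection.execute(sql, list(databases) * 2).fetchall()


rows = list_columns(conn, ["sales"])
for row in rows:
    print(row)
```

Restricting the query to a list of database names mirrors the whitelist described earlier: databases that are not on the list contribute no rows.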