Best Practices for Data Ownership
👍 The three types of ownership that work well
It works well to break down ownership of the most important data into three distinct roles:
- Delivery owner: Ensures that this particular data gets delivered on time, within the promised delivery SLA. Usually, this is a data engineer or analytics engineer responsible for developing and maintaining the pipeline that produces this data.
- Domain owner: Answers questions about meaning, such as what a particular value in a field (or column) means and when a particular event gets triggered. Usually, this is the product engineer who created the event, or the analysts and data scientists who use this data most frequently and understand the physical reality that the data represents.
- Policy owner: Ensures that this data gets used according to the classification and terms associated with it. Sometimes you acquire data from a source that should not be used for a certain category of use cases. For example, YouTube is permitted to show ads to kids, but not permitted to personalize them. Therefore, the personalization data can’t be used if the subject is a child. The person making these calls is usually not an engineer or data scientist, but someone on the Policy or Privacy team at the company.
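To make the three roles concrete, here is a minimal sketch of an ownership record with one slot per role. It is an illustration only and not Stemma's data model: the `DatasetOwnership` class, its field names, and the example dataset and team names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetOwnership:
    """Hypothetical ownership record: one slot per ownership role."""

    dataset: str
    delivery_owner: Optional[str] = None  # keeps delivery on time, per the SLA
    domain_owner: Optional[str] = None    # knows what fields and events mean
    policy_owner: Optional[str] = None    # decides which uses are permitted

    def missing_roles(self) -> List[str]:
        """Return the roles that have no owner assigned yet."""
        roles = {
            "delivery": self.delivery_owner,
            "domain": self.domain_owner,
            "policy": self.policy_owner,
        }
        return [role for role, owner in roles.items() if not owner]


# Example values are made up for illustration.
page_views = DatasetOwnership(
    dataset="analytics.page_views",
    delivery_owner="data-eng-oncall",
    domain_owner="web-product-team",
)
print(page_views.missing_roles())  # ['policy']
```

Keeping the roles as separate fields, rather than a single "owner" field, makes it visible when one kind of ownership has not been assigned.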
👎 Types of ownership that often don’t work as well
- Overall Quality owner: In practice it’s often hard to find people able and willing to take ownership of the end-to-end quality of a data set. This is because data engineers don’t consider themselves owners of the data that’s produced from the upstream application and don’t want to be responsible for hunting down a website bug that impacts the data in the warehouse. Product engineers, on the other hand, don’t have enough context about how data gets joined and transformed downstream to own the final derived data artifact. This may change as decentralized data management (or data mesh) is deployed more broadly.
- Shared ownership: This often does not work in practice, though it can work well if the owner group shares a good understanding of the different types of ownership and the group can efficiently redirect each question to the right person.
🏥 The data engineer’s role
In practice, the data engineer plays the role that a triage nurse would play in an emergency room. When an issue arises (like a patient arriving in the ER), the data engineer triages to see what’s going on. Sometimes it’s a problem the engineer can fix, so they fix it and resolve the issue (like the nurse treating an injury on the spot). In other cases, the engineer redirects the issue to the appropriate owner (as in a referral to a doctor or other health practitioner). In short, the issue determines who the owner should be.
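The triage flow can be sketched as a small routing function. This is a rough illustration under stated assumptions: the `IssueKind` categories, the `ROUTING` table, the `triage` function, and the owner names are hypothetical, and real issues rarely sort this cleanly.

```python
from enum import Enum
from typing import Dict


class IssueKind(Enum):
    """Hypothetical issue categories, one per ownership role."""

    LATE_DELIVERY = "late_delivery"      # data missed its delivery SLA
    UNCLEAR_MEANING = "unclear_meaning"  # what does this field or event mean?
    USAGE_QUESTION = "usage_question"    # is this use of the data permitted?


# Assumed mapping from issue category to the role that owns it.
ROUTING: Dict[IssueKind, str] = {
    IssueKind.LATE_DELIVERY: "delivery",
    IssueKind.UNCLEAR_MEANING: "domain",
    IssueKind.USAGE_QUESTION: "policy",
}


def triage(kind: IssueKind, owners: Dict[str, str], engineer_can_fix: bool = False) -> str:
    """Return who should handle the issue.

    If the data engineer can fix it on the spot, they keep it;
    otherwise the issue is redirected to the owner of the matching role.
    """
    if engineer_can_fix:
        return "data engineer"
    return owners[ROUTING[kind]]


# Example owners are made up for illustration.
owners = {
    "delivery": "data-eng-oncall",
    "domain": "web-product-team",
    "policy": "privacy-team",
}
print(triage(IssueKind.UNCLEAR_MEANING, owners))  # web-product-team
```

The point of the sketch is that the routing key is the issue, not the dataset: the same table can send different questions to different owners.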
Using Stemma
Use Stemma to assign and document ownership. Divide ownership as follows:
Stemma makes it easy to assign table ownership broadly; see Assigning owners in bulk.