Data governance is a combination of physical systems, data models and business processes. Learn the core concepts by exploring available libraries and tools.
This article is the first in a wider series about Data Governance and Metadata. In the series, I write about what I’ve learned on data platforms, how I think about it, and how I use that knowledge. I plan to release other posts in the future.
The majority of freely available content about data governance is vendor-specific. Understandably, Informatica, Collibra, Alation, and other vendors seek to create more demand and praise their features.
As a result, IMHO, many metadata management and data governance aspects are exaggerated and made more complex than they should be.
Secondly, research and advisory companies, like Gartner or McKinsey, publish many data governance articles too.
The issue there: these articles are too high-level and disconnected from the technology capabilities.
In this series, I want to focus primarily on explaining the concepts by using a specific library or tool.
Topics like privacy, governance, or security are very formal and get boring fast. I hope you don’t mind a funny picture or meme.
I use the Venn diagram below as a starting point. It was inspired by Willem Koenders’ LinkedIn post.
The physical systems cover data engineering and technical aspects.
The data models describe conceptual and data modelling techniques to bring the most value to the business.
Last but not least, business processes & compliance focus on the organization, its processes, privacy, and industry regulations.
Here’s my video about the topic recorded at Big Data Conference 2020