Data Platform Mastermind

By Valdas Maksimavicius
Published in Data Governance
September 05, 2021
Data Platform Mastermind is a community of data platform builders using Azure and/or Databricks. It consists of:

  • Live sessions where everyone can contribute and learn from each other
  • Data Platform School Slack channel (recently established)

Past meetups

Data Platform Mastermind #3 - Lakehouse

Here are a few points I noted down for myself during the meeting:

  • Many participants are running proofs of concept of data serving technologies
    • Databricks, even with caching, won’t replace semantic analytical cubes (e.g. Analysis Services, PBI Premium or Dremio)
    • Synapse, Snowflake, Databricks SQL - possible options for data serving
  • Databricks cluster sizing is still a challenge; there is no straightforward approach for big teams
  • Databricks Labs projects on GitHub
  • dbt is growing in popularity and is used by more and more data professionals - My video about dbt and Databricks

Recommended read and further investigation:

Data Platform Mastermind #2 - Data Ops

Here are a few points I noted down for myself during the meeting:

Recommended read and further investigation:

Data Platform Mastermind #1 - Data Governance - how to get started?

Data governance is a very broad subject, with many layers of abstraction, interpretations, and countless rabbit holes…

On the other hand, the number of messages and requests to start with this topic shows the need to look the devil in the eye ;)

Here are a few points I noted down for myself during the meeting:

  • Data Governance is a very loaded term with a lot of baggage. Metadata management is a better concept to start with.
  • Data Governance is like agile methodology in its early days. To achieve success, there has to be buy-in at the top of the organization, but also data practitioners’ commitment.
  • Until people see the benefits of governance, it will be perceived as a burden that no one wants to carry. Instead, focus on showcasing the value you get with proper governance in place.
  • Documenting measures & data is as important as the data itself.
  • Even the best data catalog product is useless if users don’t contribute. Get your community excited about it. Make the process of documenting data “sexy”. Enable crowdsourcing and reward users for contributions with gamification.
  • It’s better to build momentum and spark interest in a data catalog and the value of metadata before buying an expensive COTS offering. A wiki page describing data definitions and terms is a great place to start.
  • Data engineers can’t be left alone to take care of all governance nuances. They need support and close collaboration from the business, data stewards, and data consumers.
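
To make the "start small" advice a bit more concrete, here is a minimal Python sketch of a glossary that could back a wiki page, plus a simple contributor leaderboard for the gamification idea. Nothing here was shown in the meetup: `GlossaryEntry`, `render_entry`, `leaderboard`, and the sample terms and names are all hypothetical, made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class GlossaryEntry:
    """One documented term or measure (hypothetical structure)."""
    term: str
    definition: str
    owner: str
    contributors: list[str] = field(default_factory=list)


def render_entry(entry: GlossaryEntry) -> str:
    """Format an entry as plain text that could be pasted into a wiki page."""
    return (
        f"{entry.term}\n"
        f"  Definition: {entry.definition}\n"
        f"  Owner: {entry.owner}"
    )


def leaderboard(entries: list[GlossaryEntry]) -> list[tuple[str, int]]:
    """Count contributions per person, most active first (ties sorted by name)."""
    counts = Counter(name for e in entries for name in e.contributors)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample data, for illustration only.
entries = [
    GlossaryEntry(
        "Net revenue",
        "Gross revenue minus refunds and discounts.",
        "Finance",
        ["ana", "ben"],
    ),
    GlossaryEntry(
        "Active customer",
        "Customer with at least one order in the last 90 days.",
        "Sales",
        ["ana"],
    ),
]

for entry in entries:
    print(render_entry(entry))
print(leaderboard(entries))
```

Even something this small gives people a place to contribute and a visible way to recognize those who do, which may be enough to build interest before evaluating a full catalog product.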

Further reading:

  • “Data Management at Scale” by Piethein Strengholt
  • “Non-Invasive Data Governance” by Robert S. Seiner
  • “Data Management Body of Knowledge” DAMA-DMBOK

A few rules:

  1. My request is to keep the discussions vendor neutral.
  2. Focus on pragmatic examples - we are engineers, data scientists, architects, managers.
  3. No slides. I expect a free-flowing discussion. Though it’s totally OK if you just want to listen in.
  4. No recording. I value your privacy and it’s all about learning from each other.
