This project intends to collect, analyze and synthesize reference material about data management, i.e., how to build and operate a so-called Modern Data Stack (MDS) and a Modern Metadata Platform (MMP).
Although members of the GitHub organization may be employed by companies, they speak on their own behalf and do not represent those companies.
- Material for the Data platform - Metadata
- Material for the Data platform - Data-lakes, data warehouses, data lake-houses
- Material for the Data platform - Data contracts
- Material for the Data platform - Data quality
- Material for the Data platform - Modern Data Stack (MDS) in a box
- Architecture principles for data engineering pipelines on the Modern Data Stack (MDS)
- Specifications/principles for a data engineering pipeline deployment tool
- Title: Composable data management at Meta
- Date: May 2024
- Authors: Pedro Pedreira, Amit Purohit
- Link to the article: https://engineering.fb.com/2024/05/22/data-infrastructure/composable-data-management-at-meta/
- Publisher: Meta
- Title: Open sourcing Openhouse
- Author: Sumedh Sakdeo
- Date: March 2024
- Link to the article: https://www.linkedin.com/blog/engineering/open-source/open-sourcing-openhouse
- The Grand Rewrite of DataHub, by Mars Lan et al., Sep. 2023 - https://metaphor.io/blog/the-grand-rewrite-of-datahub
- The Modern Metadata Platform (MMP): What, Why, and How? by Mars Lan et al., Jan. 2022 - https://metaphor.io/blog/the-modern-metadata-platform-what-why-and-how
- DataHub: A generalized metadata search & discovery tool, by Mars Lan et al., Aug. 2019 - https://engineering.linkedin.com/blog/2019/data-hub
- Title: Delta Lake Universal Format (UniForm) for Iceberg compatibility, now Generally Available (GA)
- Authors: Jonathan Brito, Fred Liu and Susan Pierce
- Date: June 2024
- Link to the article: https://www.databricks.com/blog/delta-lake-universal-format-uniform-iceberg-compatibility-now-ga
See GitHub - Material for the Data platform - Metadata frameworks