deploy: 47d0f0a
RoelantVos committed Sep 3, 2023
0 parents commit 53d74ff
Showing 135 changed files with 14,916 additions and 0 deletions.
Empty file added .nojekyll
176 changes: 176 additions & 0 deletions Introduction.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Interface specification - Data Solution Automation Metadata | Schema for Data Warehouse Automation </title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="title" content="Interface specification - Data Solution Automation Metadata | Schema for Data Warehouse Automation ">

<link rel="icon" href="favicon.ico">
<link rel="stylesheet" href="public/docfx.min.css">
<link rel="stylesheet" href="public/main.css">
<meta name="docfx:navrel" content="toc.html">
<meta name="docfx:tocrel" content="toc.html">

<meta name="docfx:rel" content="">

<meta name="docfx:disabletocfilter" content="true">
<meta name="docfx:docurl" content="https://github.com/data-solution-automation-engine/data-warehouse-automation-metadata-schema/blob/dev/docs/Introduction.md/#L1">
</head>

<script type="module">
import options from './public/main.js'
import { init } from './public/docfx.min.js'
init(options)
</script>

<script>
const theme = localStorage.getItem('theme') || 'auto'
document.documentElement.setAttribute('data-bs-theme', theme === 'auto' ? (window.matchMedia('(prefers-color-scheme: dark)').matches ? 'dark' : 'light') : theme)
</script>


<body class="tex2jax_ignore" data-layout="" data-yaml-mime="">
<header class="bg-body border-bottom">
<nav id="autocollapse" class="navbar navbar-expand-md" role="navigation">
<div class="container-xxl flex-nowrap">
<a class="navbar-brand" href="index.html">
<img id="logo" class="svg" src="." alt="">

</a>
<button class="btn btn-lg d-md-none border-0" type="button" data-bs-toggle="collapse" data-bs-target="#navpanel" aria-controls="navpanel" aria-expanded="false" aria-label="Toggle navigation">
<i class="bi bi-three-dots"></i>
</button>
<div class="collapse navbar-collapse" id="navpanel">
<div id="navbar">
<form class="search" role="search" id="search">
<i class="bi bi-search"></i>
<input class="form-control" id="search-query" type="search" disabled="" placeholder="Search" autocomplete="off" aria-label="Search">
</form>
</div>
</div>
</div>
</nav>
</header>

<main class="container-xxl">
<div class="toc-offcanvas">
<div class="offcanvas-md offcanvas-start" tabindex="-1" id="tocOffcanvas" aria-labelledby="tocOffcanvasLabel">
<div class="offcanvas-header">
<h5 class="offcanvas-title" id="tocOffcanvasLabel">Table of Contents</h5>
<button type="button" class="btn-close" data-bs-dismiss="offcanvas" data-bs-target="#tocOffcanvas" aria-label="Close"></button>
</div>
<div class="offcanvas-body">
<nav class="toc" id="toc"></nav>
</div>
</div>
</div>

<div class="content">
<div class="actionbar">
<button class="btn btn-lg border-0 d-md-none" style="margin-top: -.65em; margin-left: -.8em" type="button" data-bs-toggle="offcanvas" data-bs-target="#tocOffcanvas" aria-controls="tocOffcanvas" aria-expanded="false" aria-label="Show table of contents">
<i class="bi bi-list"></i>
</button>

<nav id="breadcrumb"></nav>
</div>

<article data-uid="">
<h1 id="interface-specification---data-solution-automation-metadata">Interface specification - Data Solution Automation Metadata</h1>

<p>The <strong>interface for data solution automation metadata</strong> provides an agreed (canonical) format for exchanging the metadata relevant to data solution / data warehouse automation. The intent is to define a <em>sufficiently generic</em> format that can be used to record and share data solution automation metadata, so that more time can be spent on concepts, patterns, and solution ideas, instead of reinventing the wheel to work out what exactly is required to automate a data solution.</p>
<p>In itself, this aims to facilitate greater interoperability between the various data solution / data warehouse automation and data logistics generation approaches and ecosystems.</p>
<p>The schema definition can be viewed directly <a href="https://github.com/RoelantVos/Data_Warehouse_Automation_Metadata_Interface/blob/master/GenericInterface/interfaceDataWarehouseAutomationMetadata.json">here</a>, and is part of <a href="https://github.com/RoelantVos/Data_Warehouse_Automation_Metadata_Interface">this GitHub repository</a>. The repository contains various supporting components, such as:</p>
<ul>
<li>A simple Class Library (DLL) that implements the schema structure, as well as a validation function to test JSON files / messages against the schema.</li>
<li>Starter documentation.</li>
<li>A sample implementation that generates code using <a href="http://roelantvos.com/blog/using-handlebars-to-generate-data-vault-hub-load-processes/">Handlebars.Net</a>. This Handlebars example generates code from a sample JSON file that conforms to the interface schema.</li>
<li>A simple regression test application that demonstrates different usages of the schema.</li>
</ul>
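<p>The Handlebars.Net sample itself is not reproduced here, but the idea of rendering code from a mapping can be sketched in Python. This is a minimal sketch that swaps Handlebars for the standard library's <code>string.Template</code>; the SQL shape and the helper name <code>render_insert</code> are hypothetical, and the mapping uses the property names from the example at the end of this page.</p>
<pre><code class="lang-python">from string import Template

# Hypothetical SQL template; string.Template stands in for Handlebars.Net here.
INSERT_TEMPLATE = Template(
    "INSERT INTO $target ($target_columns)\n"
    "SELECT $source_columns\n"
    "FROM $source;"
)


def render_insert(mapping: dict) -> str:
    """Render an INSERT ... SELECT from one Data Object Mapping (hypothetical helper)."""
    item_mappings = mapping["dataItemMapping"]
    return INSERT_TEMPLATE.substitute(
        target=mapping["targetDataObject"]["name"],
        source=mapping["sourceDataObject"]["name"],
        target_columns=", ".join(m["targetDataItem"]["name"] for m in item_mappings),
        source_columns=", ".join(m["sourceDataItem"]["name"] for m in item_mappings),
    )


mapping = {
    "mappingName": "Mapping1",
    "sourceDataObject": {"name": "SourceTable"},
    "targetDataObject": {"name": "TargetTable"},
    "dataItemMapping": [
        {"sourceDataItem": {"name": "SourceColumn1"}, "targetDataItem": {"name": "TargetColumn1"}},
        {"sourceDataItem": {"name": "SourceColumn2"}, "targetDataItem": {"name": "TargetColumn2"}},
    ],
}

print(render_insert(mapping))
</code></pre>
<p>A template engine keeps the target syntax (here SQL) out of the metadata: the same mapping can be rendered into different technologies simply by swapping the template.</p>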
<p>The schema and examples are validated and extended using <a href="https://www.jsonschemavalidator.net/">https://www.jsonschemavalidator.net/</a>. The schema follows the standards published at <a href="http://json-schema.org/">json-schema.org</a>. Also see <a href="http://json-schema.org/learn/miscellaneous-examples.html">some miscellaneous examples</a>.</p>
<p>In principle, the schema can be used to generate an entire Data Warehouse, Data Lake, or similar data solution.</p>
<h2 id="schema">Schema</h2>
<p>The proposed JSON schema has standard components for table (DataObject) and column (DataItem) structures that are reused for both sources and targets. At the mapping level, only the classification, filter and load direction are added; the rest is generic reuse of definitions.</p>
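<p>To make this reuse concrete, the component structure can be sketched as Python dataclasses. This is an illustrative model under assumptions, not the published Class Library: the snake_case field names, and in particular the names chosen for the classification, filter and load direction attributes, are hypothetical.</p>
<pre><code class="lang-python">from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataItem:
    """A column (or calculation) belonging to a Data Object."""
    name: str


@dataclass
class DataObject:
    """A table, file or query; the same type serves as source and as target."""
    name: str
    data_items: List[DataItem] = field(default_factory=list)


@dataclass
class DataItemMapping:
    source_data_item: DataItem
    target_data_item: DataItem


@dataclass
class DataObjectMapping:
    # Generic definitions, reused for both ends of the mapping.
    source_data_object: DataObject
    target_data_object: DataObject
    data_item_mapping: List[DataItemMapping]
    # Mapping-level additions (hypothetical field names).
    classification: Optional[str] = None
    filter_criterion: Optional[str] = None
    load_direction: Optional[str] = None


mapping = DataObjectMapping(
    source_data_object=DataObject("SourceTable"),
    target_data_object=DataObject("TargetTable"),
    data_item_mapping=[DataItemMapping(DataItem("SourceColumn1"), DataItem("TargetColumn1"))],
)
print(type(mapping.source_data_object) is type(mapping.target_data_object))  # True
</code></pre>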
<p>The schema is available on GitHub at: <a href="https://github.com/RoelantVos/Data_Warehouse_Automation_Metadata_Interface">https://github.com/RoelantVos/Data_Warehouse_Automation_Metadata_Interface</a>.</p>
<p>The schema definition itself is located here: <a href="https://github.com/RoelantVos/Data_Warehouse_Automation_Metadata_Interface/blob/master/Generic%20interface/interfaceDataWarehouseAutomationMetadata.json">https://github.com/RoelantVos/Data_Warehouse_Automation_Metadata_Interface/blob/master/Generic%20interface/interfaceDataWarehouseAutomationMetadata.json</a>.</p>
<p>It is also referenced in the Class Library.</p>
<h2 id="how-does-the-interface-schema-work">How does the interface schema work?</h2>
<p>The interface is a JSON Schema definition designed following draft 7 of the JSON Schema specification. It contains a series of reusable object definitions (‘definitions’) that are combined into a source-to-target mapping object called a ‘Data Object Mapping’.</p>
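<p>In draft 7 JSON Schema, reusable definitions are conventionally declared under <code>definitions</code> and referenced through <code>$ref</code> JSON Pointers. The fragment below is a heavily trimmed, hypothetical illustration of that mechanism (it is not the actual interface schema), together with a small resolver for local references; the function name <code>resolve_local_ref</code> is an assumption.</p>
<pre><code class="lang-python"># Hypothetical, heavily trimmed schema fragment (not the published interface schema).
SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "definitions": {
        "dataItem": {"type": "object", "properties": {"name": {"type": "string"}}},
        "dataObject": {"type": "object", "properties": {"name": {"type": "string"}}},
        "dataObjectMapping": {
            "type": "object",
            "properties": {
                # The same definition is referenced for both source and target.
                "sourceDataObject": {"$ref": "#/definitions/dataObject"},
                "targetDataObject": {"$ref": "#/definitions/dataObject"},
            },
        },
    },
}


def resolve_local_ref(schema: dict, ref: str) -> dict:
    """Follow a local JSON Pointer such as '#/definitions/dataObject'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are supported: {ref}")
    node = schema
    for part in ref[2:].split("/"):
        # Unescape JSON Pointer tokens (RFC 6901): ~1 becomes '/', ~0 becomes '~'.
        node = node[part.replace("~1", "/").replace("~0", "~")]
    return node


props = SCHEMA["definitions"]["dataObjectMapping"]["properties"]
source = resolve_local_ref(SCHEMA, props["sourceDataObject"]["$ref"])
target = resolve_local_ref(SCHEMA, props["targetDataObject"]["$ref"])
print(source is target)  # True: both resolve to the same definition
</code></pre>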
<p>The Data Object Mapping is literally a mapping between Data Objects. It is a unique ETL mapping / transformation that moves, or interprets, data from a given source to a given destination.</p>
<p>At a high level, two elements form the core of a Data Object Mapping:</p>
<ul>
<li>Data Object, which defines the source and target of the Data Object Mapping. A Data Object can optionally have a connection defined as a string or token, and can be a query, file or table.</li>
<li>Data Item, which belongs to a Data Object and represents an individual column or calculation (query) in a Data Object Mapping.</li>
</ul>
<p><img src="http://roelantvos.com/blog/wp-content/uploads/2020/01/DataObject-3-1024x466.png" alt="Data Object and Data Item"></p>
<h2 id="mapping-metadata">Mapping metadata</h2>
<p>A Data Object Mapping reuses the definitions of the Data Object and Data Item. The Data Object is used twice: as the <em>SourceDataObject</em> and as the <em>TargetDataObject</em> – both instances of the DataObject class / type.</p>
<p>The other key component of a Data Object Mapping is the <em>Data Item Mapping</em>, which describes the column-to-column (or transformation-to-column) mapping and reuses the Data Item class.</p>
<p>The Source Data Object, Target Data Object and Data Item Mapping are the mandatory components of a Data Object Mapping.</p>
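<p>The Class Library's validation function checks messages against the full schema. As a rough, standard-library-only sketch of the idea, the function below checks only the three mandatory components named above, using the JSON property names from the example on this page; the function name <code>missing_components</code> is hypothetical, and a complete check should use a draft 7 JSON Schema validator against the real schema.</p>
<pre><code class="lang-python">import json

# Mandatory components of a Data Object Mapping (property names as in the example).
MANDATORY = ("sourceDataObject", "targetDataObject", "dataItemMapping")


def missing_components(message: str) -> dict:
    """Return {mappingName or list index: [missing keys]} for each incomplete mapping."""
    document = json.loads(message)
    problems = {}
    for index, mapping in enumerate(document.get("dataObjectMappingList", [])):
        missing = [key for key in MANDATORY if key not in mapping]
        if missing:
            problems[mapping.get("mappingName", index)] = missing
    return problems


incomplete = json.dumps({
    "dataObjectMappingList": [
        {"mappingName": "Mapping1", "sourceDataObject": {"name": "SourceTable"}}
    ]
})
print(missing_components(incomplete))  # {'Mapping1': ['targetDataObject', 'dataItemMapping']}
</code></pre>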
<p>There are many other attributes that can be set, and the Data Objects and Data Items have mandatory items of their own as well. These are described in the JSON schema, and the intent is that the validation functions make it easy to try out different uses of the schema.</p>
<p>One of the goals of defining this schema has been to find a good balance between being too generic and too specific (restrictive). For this reason there are only a few mandatory elements.</p>
<p>It is possible to add a specific class to a Data Object Mapping: the Business Key Definition. This construct again reuses the earlier definitions, and can optionally be added to the Data Object Mapping as a specially classified set of transformations.</p>
<p>Combining these elements, the Data Object Mapping looks as follows at a high level:</p>
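<p>Because the Business Key Definition reuses the existing definitions, a consumer can read its components with the same code it uses for ordinary Data Item Mappings. The sketch below assumes hypothetical property names (<code>businessKeyDefinition</code>, <code>businessKeyComponentMapping</code>), a hypothetical helper <code>business_key_columns</code>, and an invented <code>CustomerCode</code> column; check the schema for the actual names.</p>
<pre><code class="lang-python"># Hypothetical property names for the optional Business Key Definition; its
# components reuse the same shape as an ordinary Data Item Mapping.
mapping = {
    "mappingName": "Mapping1",
    "sourceDataObject": {"name": "SourceTable"},
    "targetDataObject": {"name": "TargetTable"},
    "dataItemMapping": [
        {"sourceDataItem": {"name": "SourceColumn1"}, "targetDataItem": {"name": "TargetColumn1"}}
    ],
    "businessKeyDefinition": {
        "businessKeyComponentMapping": [
            {"sourceDataItem": {"name": "CustomerCode"}, "targetDataItem": {"name": "CustomerCode"}}
        ]
    },
}


def business_key_columns(mapping: dict) -> list:
    """Return the target column names of the business key, or [] if none is defined."""
    definition = mapping.get("businessKeyDefinition")
    if definition is None:
        return []  # the Business Key Definition is optional
    return [
        component["targetDataItem"]["name"]
        for component in definition["businessKeyComponentMapping"]
    ]


print(business_key_columns(mapping))  # ['CustomerCode']
</code></pre>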
<p><img src="http://roelantvos.com/blog/wp-content/uploads/2020/01/DataObjectMapping-1024x453.png" alt="Data Object Mapping overview"></p>
<h2 id="mapping-collections">Mapping collections</h2>
<p>At the top level, one or more Data Object Mappings are grouped into a single Data Object Mapping List. The convention is that, even though only a single Data Object Mapping may be needed in a message or file, a Data Object Mapping is <em>always</em> part of a Data Object Mapping List.</p>
<p>In other words, the Data Object Mapping List is an array of individual Data Object Mappings. In code, this means a Data Object Mapping List is defined as a <code>List&lt;DataObjectMapping&gt;</code>.</p>
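<p>The same convention maps naturally onto a Python <code>list</code>. A minimal sketch, assuming the top-level property name <code>dataObjectMappingList</code> from the example on this page: the hypothetical helper <code>to_mapping_list_message</code> wraps a lone mapping so that a producer always emits a list, even when there is only one mapping.</p>
<pre><code class="lang-python">import json


def to_mapping_list_message(mappings) -> str:
    """Serialise one mapping (dict) or several (list) as a Data Object Mapping List."""
    if isinstance(mappings, dict):
        mappings = [mappings]  # a single mapping is still wrapped in a list
    return json.dumps({"dataObjectMappingList": list(mappings)}, indent=2)


single = {
    "mappingName": "Mapping1",
    "sourceDataObject": {"name": "SourceTable"},
    "targetDataObject": {"name": "TargetTable"},
    "dataItemMapping": [
        {"sourceDataItem": {"name": "SourceColumn1"}, "targetDataItem": {"name": "TargetColumn1"}}
    ],
}
message = json.loads(to_mapping_list_message(single))
print(len(message["dataObjectMappingList"]))  # 1
</code></pre>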
<p>The decision to start the format with an array (list) that can contain multiple Data Object Mappings relates to the Data Warehouse virtualisation use-case. In this style of implementation, multiple individual mappings together create a single view object. Testing showed that it is much harder to piece together the relationships between mappings at a later stage to create a single (view) object, whereas having the option to define a collection makes this straightforward.</p>
<p>For example, consider loading a Core Business Concept (‘Hub’) type entity from several different data sources. If you use these mappings to generate ETL processes, you create one physical ETL object for each mapping. However, if you want to generate a view that represents the target table, you use the collection (list) of mappings to generate separate statements that are unioned into a single view object.</p>
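<p>The virtualisation use-case can be sketched as follows: each mapping produces one <code>SELECT</code>, and the collection is combined into one view with <code>UNION</code>. This is a minimal Python sketch under assumptions; the helper names, the column-aliasing approach, and the <code>HUB_CUSTOMER</code>, <code>CRM_CUSTOMER</code> and <code>WEB_CUSTOMER</code> objects are hypothetical.</p>
<pre><code class="lang-python">def select_statement(mapping: dict) -> str:
    """One SELECT per mapping, aliasing source columns to target column names."""
    columns = ", ".join(
        f'{m["sourceDataItem"]["name"]} AS {m["targetDataItem"]["name"]}'
        for m in mapping["dataItemMapping"]
    )
    return f'SELECT {columns} FROM {mapping["sourceDataObject"]["name"]}'


def create_view(mappings: list) -> str:
    """Union the SELECTs of all mappings that share one target into a single view."""
    targets = {m["targetDataObject"]["name"] for m in mappings}
    if len(targets) != 1:
        raise ValueError("all mappings in the collection must share one target")
    body = "\nUNION\n".join(select_statement(m) for m in mappings)
    return f"CREATE VIEW {targets.pop()} AS\n{body};"


def hub_mapping(source: str) -> dict:
    # Hypothetical mappings loading the same Hub from different sources.
    return {
        "sourceDataObject": {"name": source},
        "targetDataObject": {"name": "HUB_CUSTOMER"},
        "dataItemMapping": [
            {"sourceDataItem": {"name": "CustomerCode"}, "targetDataItem": {"name": "CUSTOMER_CODE"}}
        ],
    }


print(create_view([hub_mapping("CRM_CUSTOMER"), hub_mapping("WEB_CUSTOMER")]))
</code></pre>
<p>Generating one ETL process per mapping would instead call <code>select_statement</code> (or an equivalent load template) once per mapping, without needing the collection.</p>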
<h2 id="example">Example</h2>
<p>This is a simple example using the schema definition. Various other examples and use-cases are available in the code sections of this GitHub repository. The example shows a single DataObjectMapping in a DataObjectMappingList.</p>
<pre><code class="lang-json">{
  &quot;dataObjectMappingList&quot;: [
    {
      &quot;mappingName&quot;: &quot;Mapping1&quot;,
      &quot;sourceDataObject&quot;: {
        &quot;name&quot;: &quot;SourceTable&quot;
      },
      &quot;targetDataObject&quot;: {
        &quot;name&quot;: &quot;TargetTable&quot;
      },
      &quot;dataItemMapping&quot;: [
        {
          &quot;sourceDataItem&quot;: {
            &quot;name&quot;: &quot;SourceColumn1&quot;
          },
          &quot;targetDataItem&quot;: {
            &quot;name&quot;: &quot;TargetColumn1&quot;
          }
        },
        {
          &quot;sourceDataItem&quot;: {
            &quot;name&quot;: &quot;SourceColumn2&quot;
          },
          &quot;targetDataItem&quot;: {
            &quot;name&quot;: &quot;TargetColumn2&quot;
          }
        }
      ]
    }
  ]
}
</code></pre>
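<p>Consuming a message like this only requires a JSON parser. The Python sketch below embeds the example above as a string and lists each column-to-column mapping; it relies only on the property names visible in the example, and the helper name <code>column_pairs</code> is hypothetical.</p>
<pre><code class="lang-python">import json

EXAMPLE = """
{
  "dataObjectMappingList": [
    {
      "mappingName": "Mapping1",
      "sourceDataObject": { "name": "SourceTable" },
      "targetDataObject": { "name": "TargetTable" },
      "dataItemMapping": [
        { "sourceDataItem": { "name": "SourceColumn1" }, "targetDataItem": { "name": "TargetColumn1" } },
        { "sourceDataItem": { "name": "SourceColumn2" }, "targetDataItem": { "name": "TargetColumn2" } }
      ]
    }
  ]
}
"""


def column_pairs(message: str) -> list:
    """Return (mappingName, source column, target column) for every Data Item Mapping."""
    pairs = []
    for mapping in json.loads(message)["dataObjectMappingList"]:
        source = mapping["sourceDataObject"]["name"]
        target = mapping["targetDataObject"]["name"]
        for item in mapping["dataItemMapping"]:
            pairs.append((
                mapping["mappingName"],
                f'{source}.{item["sourceDataItem"]["name"]}',
                f'{target}.{item["targetDataItem"]["name"]}',
            ))
    return pairs


for name, source_column, target_column in column_pairs(EXAMPLE):
    print(f"{name}: {source_column} -> {target_column}")
</code></pre>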
</article>


<div class="next-article d-print-none border-top" id="nextArticle"></div>

</div>

<div class="affix">
<nav id="affix"></nav>
</div>
</main>

<div class="container-xxl search-results" id="search-results"></div>

<footer class="border-top">
<div class="container-xxl">
<div class="flex-fill">
<span>Schema for Data Solution Automation</span>
</div>
</div>
</footer>
</body>
</html>