Huemul BigDataGovernance is a framework that runs on Spark, Hive, and HDFS. It enables the implementation of a corporate single-source-of-truth data strategy based on Data Governance best practices. It lets you implement tables with Primary Key and Foreign Key checks on insert and update through the library, null validation, the…
ETL process using Pentaho Data Integration (Kettle) for Sales and Purchases data marts from AdventureWorks, built as the final project for the Data Management course of the Big Data & Analytics Master's @ EAE, class of 2021.
A Python 3 package to retrieve ambient air monitoring data from the United States Environmental Protection Agency's (US EPA) Air Quality System (AQS) Data Mart API v2 interface.
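The description above is not specific about how requests are formed, so here is a minimal, hypothetical sketch of building a request URL for the public AQS Data Mart API; the endpoint (`sampleData/bySite`) and query parameters follow the EPA's published API conventions, but this is not the package's actual interface, and the email/key/site values shown are placeholders.

```python
from urllib.parse import urlencode

AQS_BASE = "https://aqs.epa.gov/data/api"

def build_sample_data_url(email, key, param, bdate, edate, state, county, site):
    """Build a request URL for the AQS sampleData/bySite endpoint.

    Dates are YYYYMMDD strings; state/county/site are zero-padded FIPS and
    monitor codes, per the AQS API documentation. This helper only constructs
    the URL; fetching it (e.g. with requests.get) is left to the caller.
    """
    query = urlencode({
        "email": email, "key": key, "param": param,
        "bdate": bdate, "edate": edate,
        "state": state, "county": county, "site": site,
    })
    return f"{AQS_BASE}/sampleData/bySite?{query}"

# Example: ozone (AQS parameter code 44201) at one monitor for one day
url = build_sample_data_url("user@example.com", "testkey", "44201",
                            "20170618", "20170618", "37", "183", "0014")
```

Keeping URL construction separate from the HTTP call makes the query logic easy to test without touching the network.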
This is the final project I completed for the Big Data Expert Program at U-TAD in September 2017. It uses the following technologies: Apache Spark v2.2.0, Python v2.7.3, Jupyter Notebook (PySpark), HDFS, Hive, Cloudera Impala, Cloudera HUE, and Tableau.
Automated data extraction using APIs, AWS functions to build ETL pipelines, and data modeling in SSIS. Performed data manipulation, profiling, cleansing, integration, and price history tracking.
This repository presents a data pipeline from a fictitious company with two endpoints: a Data Warehouse and a Data Mart. The focus of the project is to use Pentaho and Dimensional Modeling.
Dimensional modeling of AdventureWorks2017 sales data, creating a data mart. Includes an ETL pipeline that loads data from AdventureWorks2017 into AdventureWorksDM using SQL Server Integration Services (SSIS) and implements Slowly Changing Dimension (SCD) handling via the SCD wizard and a MERGE statement.
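As an illustration of the SCD handling mentioned above, here is a hedged Python sketch of the Type 2 merge logic that the SSIS SCD wizard / T-SQL MERGE pattern implements (expire the current row when a tracked attribute changes, then insert a new current row); the record layout (`business_key`, `attrs`, validity dates, `is_current`) is an assumption for illustration, not the repository's actual schema.

```python
from datetime import date

def scd2_merge(dimension, incoming, today=None):
    """Apply a Slowly Changing Dimension Type 2 merge.

    dimension: list of dicts with keys
        business_key, attrs (dict), valid_from, valid_to, is_current
    incoming:  list of dicts with keys business_key, attrs

    Unchanged rows are left alone; changed rows are expired (valid_to set,
    is_current cleared) and a fresh current version is appended.
    """
    today = today or date.today()
    current = {r["business_key"]: r for r in dimension if r["is_current"]}
    for row in incoming:
        key, attrs = row["business_key"], row["attrs"]
        existing = current.get(key)
        if existing and existing["attrs"] == attrs:
            continue  # no attribute change: keep the current row as-is
        if existing:
            # change detected: close out the old version
            existing["is_current"] = False
            existing["valid_to"] = today
        dimension.append({
            "business_key": key, "attrs": attrs,
            "valid_from": today, "valid_to": None, "is_current": True,
        })
    return dimension
```

In the warehouse itself this same branching (matched-and-changed vs. not-matched) is what the T-SQL MERGE statement expresses declaratively.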