    Repositories list

    • AWS Glue Configurable Test Data Generator
      Python
      MIT No Attribution
      7000 Updated Jun 2, 2023
    • JavaScript
      MIT No Attribution
      3100 Updated May 22, 2023
    • Python
      Other
      4100 Updated May 22, 2023
    • Python
      MIT No Attribution
      3000 Updated May 12, 2023
    • Free Data Engineering course!
      Jupyter Notebook
      5.3k000 Updated Apr 21, 2023
    • dbt-glue

      Public
      This repository contains the dbt-glue adapter
      Python
      Apache License 2.0
      68100 Updated Apr 7, 2023
    • ClickHouse® is a free analytics DBMS for big data
      C++
      Apache License 2.0
      6.8k000 Updated Mar 31, 2023
    • Scala
      MIT No Attribution
      4000 Updated Mar 20, 2023
    • you run a script to mimic multiple sensors publishing messages on an IoT MQTT topic, with one message published every second. The events get sent to AWS IoT, where an IoT rule is configured. The IoT rule captures all messages and sends them to Firehose. From there, Firehose writes the messages in batches to objects stored in S3. In S3, you set u…
      Python
      Apache License 2.0
      11031 Updated Mar 19, 2023
    • Kinesis Data Analytics Blueprints are a curated collection of Apache Flink applications. Each blueprint will walk you through how to solve a practical problem related to stream processing using Apache Flink. These blueprints can be leveraged to create more complex applications to solve your business challenges in Apache Flink.
      TypeScript
      MIT No Attribution
      7000 Updated Mar 14, 2023
    • nextflow

      Public
      A DSL for data-driven computational pipelines
      Groovy
      Apache License 2.0
      623000 Updated Mar 11, 2023
    • An Awesome List of Open-Source Data Engineering Projects
      Other
      335200 Updated Mar 7, 2023
    • AI and Machine Learning with Kubeflow, Amazon EKS, and SageMaker
      Jupyter Notebook
      Apache License 2.0
      1.1k000 Updated Feb 20, 2023
    • Python
      MIT No Attribution
      38000 Updated Feb 19, 2023
    • Python
      MIT No Attribution
      6000 Updated Dec 31, 2022
    • Java
      MIT No Attribution
      3000 Updated Nov 10, 2022
    • Alerting and notification in a serverless data lake during failures
      Python
      MIT No Attribution
      3000 Updated Nov 4, 2022
    • Build, Test and Deploy ETL solutions using AWS Glue and AWS CDK based CI/CD pipelines
      Python
      MIT No Attribution
      20000 Updated Oct 4, 2022
    • Python
      MIT No Attribution
      2000 Updated Oct 3, 2022
    • arvados

      Public
      An open source platform for managing and analyzing biomedical big data
      Go
      Other
      116000 Updated Sep 19, 2022
    • .github

      Public
      0000 Updated Sep 19, 2022
    • Construct a modern data stack and orchestrate the workflows to create high-quality data for analytics and ML applications.
      Jupyter Notebook
      33000 Updated Sep 12, 2022
    • HCL
      MIT No Attribution
      12000 Updated Sep 9, 2022
    • AWS Data Engineering Project using Lambda, S3 and SNS
      Python
      4200 Updated Aug 29, 2022
    • querypal

      Public
      Web UI for Amazon Athena
      Vue
      Apache License 2.0
      26000 Updated Aug 29, 2022
    • These Terraform modules aggregate Security Hub findings to a centralized account using Amazon Kinesis Firehose and AWS Glue
      HCL
      Apache License 2.0
      5000 Updated Aug 23, 2022
    • A cross-platform (Windows, macOS, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
      Java
      GNU General Public License v2.0
      54000 Updated Aug 12, 2022
    • Process to gather streaming data from an airline API using NiFi and batch data using AWS Redshift and Sqoop, build a data pipeline to analyse the data using Apache Hive and Druid and compare their performance, discuss Hive optimization techniques, and visualise the data using AWS QuickSight
      GNU General Public License v3.0
      11100 Updated Jul 20, 2022
    • This repository contains ready-to-use notebook examples for a wide variety of use cases in Amazon EMR Studio.
      MIT No Attribution
      40000 Updated Jul 18, 2022
    • Python
      MIT License
      0000 Updated Jul 18, 2022