diff --git a/.github/workflows/ci-feature.yml b/.github/workflows/ci-feature.yml
index 0f651d1..8f2365a 100644
--- a/.github/workflows/ci-feature.yml
+++ b/.github/workflows/ci-feature.yml
@@ -4,6 +4,7 @@ on:
   push:
     branches:
       - feature**
+      - '[0-9].[0-9].x'
 
 permissions:
   id-token: write
diff --git a/app/src/main.py b/app/src/main.py
index b0ca185..99665f3 100644
--- a/app/src/main.py
+++ b/app/src/main.py
@@ -161,6 +161,6 @@
         connection_type=spark_manager.args["CONNECTION_TYPE"],
         update_behavior=spark_manager.args["UPDATE_BEHAVIOR"],
         compression=spark_manager.args["COMPRESSION"],
-        enable_update_catalog=spark_manager.args["ENABLE_UPDATE_CATALOG"],
+        enable_update_catalog=bool(spark_manager.args["ENABLE_UPDATE_CATALOG"]),
         output_data_format=spark_manager.args["OUTPUT_DATA_FORMAT"]
     )
diff --git a/docs/assets/gifs/terraglue-learning-01-datasources.gif b/docs/assets/gifs/terraglue-learning-01-datasources.gif
new file mode 100644
index 0000000..9de7a04
Binary files /dev/null and b/docs/assets/gifs/terraglue-learning-01-datasources.gif differ
diff --git a/docs/assets/gifs/terraglue-learning-02-datadelivery.gif b/docs/assets/gifs/terraglue-learning-02-datadelivery.gif
new file mode 100644
index 0000000..c906466
Binary files /dev/null and b/docs/assets/gifs/terraglue-learning-02-datadelivery.gif differ
diff --git a/docs/assets/gifs/terraglue-learning-03-terraglue.gif b/docs/assets/gifs/terraglue-learning-03-terraglue.gif
new file mode 100644
index 0000000..88fe400
Binary files /dev/null and b/docs/assets/gifs/terraglue-learning-03-terraglue.gif differ
diff --git a/docs/assets/gifs/terraglue-learning-04-variables.gif b/docs/assets/gifs/terraglue-learning-04-variables.gif
new file mode 100644
index 0000000..39239b5
Binary files /dev/null and b/docs/assets/gifs/terraglue-learning-04-variables.gif differ
diff --git a/docs/assets/gifs/terraglue-learning-05-plan.gif b/docs/assets/gifs/terraglue-learning-05-plan.gif
new file mode 100644
index 0000000..37421d5
Binary files /dev/null and b/docs/assets/gifs/terraglue-learning-05-plan.gif differ
diff --git a/docs/assets/gifs/terraglue-learning-06-apply.gif b/docs/assets/gifs/terraglue-learning-06-apply.gif
new file mode 100644
index 0000000..e802793
Binary files /dev/null and b/docs/assets/gifs/terraglue-learning-06-apply.gif differ
diff --git a/docs/assets/gifs/terraglue-learning-07-resources.gif b/docs/assets/gifs/terraglue-learning-07-resources.gif
new file mode 100644
index 0000000..9a0d6ca
Binary files /dev/null and b/docs/assets/gifs/terraglue-learning-07-resources.gif differ
diff --git a/docs/mkdocs/demos/learning-mode.md b/docs/mkdocs/demos/learning-mode.md
index e38662f..557d272 100644
--- a/docs/mkdocs/demos/learning-mode.md
+++ b/docs/mkdocs/demos/learning-mode.md
@@ -1,4 +1,170 @@
 # Learning Mode
 
-???+ warning "Work in progress"
-    Content will be updated here as soon as possible!
\ No newline at end of file
+What if users don't have a custom Glue job to deploy but still want to see and learn more about all the pieces needed to make a Glue job run in AWS? Well, the **learning mode** in terraglue can be used to deploy a preconfigured Glue job with everything needed to see things running in practice.
+
+Check the [home page](../index.md) to see everything that happens in the target AWS account when we call terraglue in learning mode.
+
+
+## Structuring a Terraform Project
+
+If you checked the [production mode demo](production-mode.md) you saw that the Terraform project structure in that context was a little more complex. For this demo, since we are using terraglue to deploy a preconfigured Glue job, we only need a `main.tf` file to hold all the required Terraform code.
+
+??? question "Why do I only need a main.tf Terraform file when using terraglue in learning mode?"
+    Well, there is no need to have different folders in our project to hold Glue script files, policies or anything else. When using terraglue in learning mode, all those elements, files and folders are located inside the module.
+
+    You can check all of them in the `.terraform/` folder after running the `terraform init` command.
+
+If you need more information about the structure of a Terraform project you can check the [official HashiCorp documentation](https://developer.hashicorp.com/terraform/language/modules/develop/structure) about it.
+
+
+## Collecting Terraform Data Sources
+
+Once the Terraform project is structured, let's start by collecting some [Terraform data sources](https://developer.hashicorp.com/terraform/language/data-sources) that will be used throughout the project. Data sources can improve the development of a Terraform project in many ways. This is not a required step, but it can be considered good practice, depending on which resources are declared and which configurations are applied.
+
+So, let's take our `main.tf` file and get the two Terraform data sources stated below:
+
+- An [aws_caller_identity](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) data source to extract the user account ID
+- An [aws_region](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/region) data source to get the target AWS region
+
+??? example "Collecting Terraform data sources"
+    [![A video demo showing how to get Terraform data sources](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-learning-01-datasources.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-learning-01-datasources.gif?raw=true)
+
+    ___
+
+    💻 **Terraform code**:
+    ```terraform
+    # Collecting data sources
+    data "aws_caller_identity" "current" {}
+    data "aws_region" "current" {}
+    ```
+
+Before calling the terraglue module, let's call the [datadelivery](https://datadelivery.readthedocs.io/en/latest/) module to deploy the buckets, data files, catalog tables and other resources that are required to use terraglue in learning mode!
+
+
+## Configuring Datadelivery
+
+> datadelivery is an open source Terraform module that provides an infrastructure toolkit to be deployed in any AWS account in order to help users to explore analytics services like Athena, Glue, EMR, Redshift and others. It does that by uploading and cataloging public datasets that can be used for multiple purposes, either to create jobs or just to query data using AWS services.
+
+When we use terraglue in learning mode, the Glue job deployed in the target AWS account uses buckets and tables delivered by the datadelivery module. That's why we need to combine both solutions to reach the final goal.
+
+??? example "Calling datadelivery module"
+    [![A video demo showing how to call datadelivery Terraform module from GitHub](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-learning-02-datadelivery.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-learning-02-datadelivery.gif?raw=true)
+
+    ___
+
+    💻 **Terraform code**:
+    ```terraform
+    # Collecting data sources
+    data "aws_caller_identity" "current" {}
+    data "aws_region" "current" {}
+
+    # Calling datadelivery module
+    module "datadelivery" {
+      source = "git::https://github.com/ThiagoPanini/datadelivery?ref=main"
+    }
+    ```
+
+## Configuring Terraglue
+
+Now we're ready to call terraglue. Unlike production mode (the default one), learning mode just needs to be enabled through the `mode` module variable, plus a few bucket and database names set in the next steps.
+
+### Calling The Source Module
+
+This section is all about showing how to call the terraglue module directly from GitHub.
+
+??? example "Calling terraglue module"
+    [![A video demo showing how to call terraglue Terraform module from GitHub](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-learning-03-terraglue.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-learning-03-terraglue.gif?raw=true)
+
+    ___
+
+    💻 **Terraform code**:
+    ```terraform
+    # Collecting data sources
+    data "aws_caller_identity" "current" {}
+    data "aws_region" "current" {}
+
+    # Calling datadelivery module
+    module "datadelivery" {
+      source = "git::https://github.com/ThiagoPanini/datadelivery?ref=main"
+    }
+
+    # Calling terraglue module in learning mode
+    module "terraglue" {
+      source = "git::https://github.com/ThiagoPanini/terraglue?ref=main"
+
+      mode = "learning"
+    }
+    ```
+
+### Setting Up S3 and Job Outputs
+
+The only thing required when calling terraglue in learning mode is to set up three variables:
+
+- `glue_scripts_bucket_name`: to tell terraglue the name of the bucket where the script files are stored
+- `job_output_bucket_name`: to tell terraglue the name of the output bucket that will store the table generated by the job
+- `job_output_database`: to tell terraglue the output database that will handle the catalog process of the table generated by the job
+
+??? example "Configuring terraglue's required variables when using it in learning mode"
+    [![A video demo showing how to configure terraglue required variables when used in learning mode](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-learning-04-variables.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-learning-04-variables.gif?raw=true)
+
+    ___
+
+    💻 **Terraform code**:
+    ```terraform
+    # Collecting data sources
+    data "aws_caller_identity" "current" {}
+    data "aws_region" "current" {}
+
+    # Calling datadelivery module
+    module "datadelivery" {
+      source = "git::https://github.com/ThiagoPanini/datadelivery?ref=main"
+    }
+
+    # Calling terraglue module in learning mode
+    module "terraglue" {
+      source = "git::https://github.com/ThiagoPanini/terraglue?ref=main"
+
+      mode = "learning"
+
+      # Setting up the scripts bucket name
+      glue_scripts_bucket_name = "datadelivery-glue-assets-${data.aws_caller_identity.current.account_id}-${data.aws_region.current.name}"
+
+      # Setting up output variables
+      job_output_bucket_name = "datadelivery-sot-data-${data.aws_caller_identity.current.account_id}-${data.aws_region.current.name}"
+      job_output_database    = "db_datadelivery_sot"
+    }
+    ```
+
+And that's literally all! Learning mode was built to make things as easy as possible for users who don't have much experience deploying Glue jobs in AWS. The idea is to provide an end-to-end example of how things work.
+
+The next step is to run the Terraform commands to deploy the resources in the target AWS account.
+
+## Running Terraform Commands
+
+After all this configuration journey, we now just need to plan and apply the deployment using the respective Terraform commands.
+
+### Terraform plan
+
+With the `terraform plan` command, we can see all the resources that will be deployed with the configuration we chose.
+
+??? example "Running the terraform plan command"
+    [![A gif showing how to run the terraform plan command](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-learning-05-plan.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-learning-05-plan.gif?raw=true)
+
+### Terraform apply
+
+And now we can finally deploy the infrastructure declared using the `terraform apply` command.
+
+??? example "Running the terraform apply command"
+    [![A gif showing how to run the terraform apply command](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-learning-06-apply.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-learning-06-apply.gif?raw=true)
+
+
+## Deployed Resources
+
+To finish this demo, let's navigate through all the resources deployed in the target AWS account to see the preconfigured Glue job in action!
+
+??? example "A little tour through all resources deployed by terraglue"
+    [![A gif showing different AWS console pages in order to show all the resources deployed by terraglue](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-learning-07-resources.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-learning-07-resources.gif?raw=true)
+
+___
+
+✅ I hope these demos help you use terraglue to learn more about how a Glue job works in practice. Keep reading the docs to become a terraglue master user!
diff --git a/docs/mkdocs/demos/production-mode.md b/docs/mkdocs/demos/production-mode.md
index 05588d3..eafad1e 100644
--- a/docs/mkdocs/demos/production-mode.md
+++ b/docs/mkdocs/demos/production-mode.md
@@ -1,6 +1,6 @@
 # Production Mode
 
-So, let's take a deep dive on how an user can call the terraglue module to deploy it's own Glue job in AWS.
+Let's take a deep dive into how a user can call the terraglue module to deploy their own Glue job in AWS.
 
 For this task, let's suppose we want to:
 
@@ -43,7 +43,7 @@ If you need more information about the structure of a Terraform project you can
 
 ## Collecting Terraform Data Sources
 
-Once we structured the Terraform project, let's start by collecting some [Terraform data sources](https://developer.hashicorp.com/terraform/language/data-sources) that will be used along the project. To get and use Terraform data sources can improve the development of a Terraform project in a lot of aspects. In the end, this is not a required step, but it can be considered as a good practice according to which resources will be declared and which configurations will be applied.
+Once the Terraform project is structured, let's start by collecting some [Terraform data sources](https://developer.hashicorp.com/terraform/language/data-sources) that will be used throughout the project. Data sources can improve the development of a Terraform project in many ways. This is not a required step, but it can be considered good practice, depending on which resources are declared and which configurations are applied.
 
 So, let's take our `main.tf` file and get the three Terraform data sources stated balow:
 
@@ -52,7 +52,7 @@ So, let's take our `main.tf` file and get the three Terraform data sources state
 - A [aws_kms_key](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/kms_key) data source to get a KMS key by its alias (assuming that there is a KMS key alias in the target AWS account)
 
 ??? example "Collecting Terraform data sources"
-    [![A video demo showing how to get Terraform data sources](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-01-datasources.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-01-datasources.gif?raw=true)
+    [![A video demo showing how to get Terraform data sources](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-01-datasources.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-01-datasources.gif?raw=true)
 
     ___
 
@@ -86,7 +86,7 @@ By following all demos from each topic, users will be able to fully understand t
 This section is all about showing how to call the terraglue module directly from GitHub.
 
 ??? example "Calling the terraglue module directly from GitHub"
-    [![A gif showing how to call the terraglue module from the GitHub](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-02-module.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-02-module.gif?raw=true)
+    [![A gif showing how to call the terraglue module from GitHub](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-02-module.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-02-module.gif?raw=true)
 
     ___
 
@@ -112,7 +112,7 @@ This section is all about showing how to call the terraglue module directly from
 
 Optionally, users can initialize the terraglue module declared through `terraform init` command in order to get a simple but huge feature: the autocomplete text in variable names from the module. This can make things a lot easier whe configuring terraglue in the next sections.
 
-[![A demo gif showing the execution of the terraform init command](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-02b-init.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-02b-init.gif?raw=true)
+[![A demo gif showing the execution of the terraform init command](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-02b-init.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-02b-init.gif?raw=true)
 
 ### Setting Up IAM Variables
 
@@ -126,7 +126,7 @@ For this demo, let's set the following configurations:
 - Inform terraglue the name of the IAM role to be created
 
 ??? example "Setting up IAM variables on terraglue"
-    [![A gif showing how to configure IAM variables on terraglue](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-03-iam.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-03-iam.gif?raw=true)
+    [![A gif showing how to configure IAM variables on terraglue](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-03-iam.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-03-iam.gif?raw=true)
 
     ___
 
@@ -160,7 +160,7 @@ Well, the next step in this demo will handle KMS key configuration that affects
 - Inform terraglue the ARN of the existing KMS key (collected from the `aws_kms_key` Terraform data source declared at the beginning of the project)
 
 ??? example "Setting up KMS variables on terraglue"
-    [![A gif showing how to configure KMS variables on terraglue](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-04-kms.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-04-kms.gif?raw=true)
+    [![A gif showing how to configure KMS variables on terraglue](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-04-kms.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-04-kms.gif?raw=true)
 
     ___
 
@@ -199,7 +199,7 @@ Basically, this is the step where users provide a bucket name to host the files
 In this demo, we will use the `aws_caller_identity` and `aws_region` data sources collected at the beginning of the project to build a bucket name without hard coding informations such as account ID and AWS region.
 
 ??? example "Setting up a s3 bucket name to store scripts files"
-    [![A gif showing how to configure S3 variables on terraglue](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-05-s3.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-05-s3.gif?raw=true)
+    [![A gif showing how to configure S3 variables on terraglue](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-05-s3.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-05-s3.gif?raw=true)
 
     ___
 
@@ -244,7 +244,7 @@ The idea with this variables block is:
 - Inform terraglue to use 5 workers
 
 ??? example "Setting up a Glue job"
-    [![A gif showing how to configure Glue job variables on terraglue](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-06-gluejob.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-06-gluejob.gif?raw=true)
+    [![A gif showing how to configure Glue job variables on terraglue](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-06-gluejob.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-06-gluejob.gif?raw=true)
 
     ___
 
@@ -296,7 +296,7 @@ The main key points about the job arguments declared in this demo are:
 In this step, users are free to set all Glue acceptable arguments. A full list can be found in the [AWS official documentation about job parameters](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html).
 
 ??? example "Setting up Glue job arguments"
-    [![A gif showing how to configure Glue job arguments on terraglue](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-07-jobargs.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-07-jobargs.gif?raw=true)
+    [![A gif showing how to configure Glue job arguments on terraglue](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-07-jobargs.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-07-jobargs.gif?raw=true)
 
     ___
 
@@ -357,19 +357,17 @@ After all this configuration journey, we now just need to plan and apply the dep
 
 ### Terraform plan
 
-Well, now it's time to see the deployment plan using the `terraform plan` command.
-
-Here we will be able to see all the resources that will be deployed with the configuration we chose.
+With the `terraform plan` command, we can see all the resources that will be deployed with the configuration we chose.
 
 ??? example "Running the terraform plan command"
-    [![A gif showing how to run terraform plan Terraform comand](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-08-plan.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-08-plan.gif?raw=true)
+    [![A gif showing how to run the terraform plan command](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-08-plan.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-08-plan.gif?raw=true)
 
 ### Terraform apply
 
 And now we can finally deploy the infrastructure declared using the `terraform apply` command.
 
 ??? example "Running the terraform apply command"
-    [![A gif showing how to run terraform apply Terraform comand](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-09-apply.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-09-apply.gif?raw=true)
+    [![A gif showing how to run the terraform apply command](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-09-apply.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-09-apply.gif?raw=true)
 
 
 ## Deployed Resources
@@ -381,7 +379,7 @@ Well, to finish this demo page, let's see all the resources that were deployed b
 - A Glue job with parameters and arguments chosen by users
 
 ??? example "A little tour through all deployed resources by terraglue"
-    [![A gif showing different AWS console pages in order to show all the deployed resources by terraglue](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-10-resources.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/feature/improve-docs/docs/assets/gifs/terraglue-production-10-resources.gif?raw=true)
+    [![A gif showing different AWS console pages in order to show all the resources deployed by terraglue](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-10-resources.gif?raw=true)](https://github.com/ThiagoPanini/terraglue/blob/2.0.x/docs/assets/gifs/terraglue-production-10-resources.gif?raw=true)
 
     ___
 
diff --git a/docs/mkdocs/index.md b/docs/mkdocs/index.md
index 4c2e9a3..e3f4d97 100644
--- a/docs/mkdocs/index.md
+++ b/docs/mkdocs/index.md
@@ -73,8 +73,9 @@ The *terraglue* Terraform module isn't alone. There are other complementary open
 
 ## Read the Docs
 
-- If you like stories, check ouy the [Project Story](story.md) to see how terraglue was born
+- If you like stories, check out the [Project Story](story.md) to see how terraglue was born
 - To take the first steps on terraglue, don't forget to check the [Quickstart](./quickstart/gettingstarted.md) section
+- Everyone likes demos, right? Check the [Demos](./demos/about.md) section to see terraglue in practice
 - Don't forget to check the [Variables](./variables/variables.md) section to see different ways to customize terraglue
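
Review note on the `app/src/main.py` hunk: the new `bool(...)` cast may not behave as intended if `ENABLE_UPDATE_CATALOG` reaches the script as a string, which is an assumption here since the diff does not show how `spark_manager.args` is populated. In Python any non-empty string is truthy, so `bool("False")` evaluates to `True`. A minimal sketch of an explicit parser follows; `parse_bool_arg` is a hypothetical helper name, not something that exists in terraglue.

```python
def parse_bool_arg(value, default=False):
    """Interpret a job argument as a boolean.

    Hypothetical helper (not part of terraglue). It assumes the argument
    may arrive either as a real bool or as a string such as "true" or "False".
    """
    if isinstance(value, bool):
        return value
    if value is None:
        return default
    text = str(value).strip().lower()
    if text in {"true", "1", "yes", "y"}:
        return True
    if text in {"false", "0", "no", "n", ""}:
        return False
    raise ValueError(f"Cannot interpret {value!r} as a boolean")


# Plain bool() treats every non-empty string as True, including "False"
assert bool("False") is True

# The explicit parser maps the common spellings to the intended value
assert parse_bool_arg("False") is False
assert parse_bool_arg("true") is True
```

If the argument does arrive as a string, the call site could become `enable_update_catalog=parse_bool_arg(spark_manager.args["ENABLE_UPDATE_CATALOG"])`. Raising on unknown spellings is a design choice in this sketch: it surfaces a mistyped job argument at start-up instead of silently enabling or disabling catalog updates.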