Init EKS 1.29 upgrade
zacharyblasczyk committed Oct 2, 2024
1 parent 2f1982a commit 5c9bdcd
Showing 18 changed files with 65 additions and 339 deletions.
34 changes: 29 additions & 5 deletions README.md
@@ -124,7 +124,7 @@ Upgrades must be executed in step-wise fashion from one version to the next. You

| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | ~> 4.0 |
| <a name="provider_aws"></a> [aws](#provider\_aws) | 4.67.0 |

## Modules

@@ -164,12 +164,14 @@ Upgrades must be executed in step-wise fashion from one version to the next. You
| <a name="input_bucket_kms_key_arn"></a> [bucket\_kms\_key\_arn](#input\_bucket\_kms\_key\_arn) | n/a | `string` | `""` | no |
| <a name="input_bucket_name"></a> [bucket\_name](#input\_bucket\_name) | n/a | `string` | `""` | no |
| <a name="input_bucket_path"></a> [bucket\_path](#input\_bucket\_path) | path of where to store data for the instance-level bucket | `string` | `""` | no |
| <a name="input_clickhouse_endpoint_service_id"></a> [clickhouse\_endpoint\_service\_id](#input\_clickhouse\_endpoint\_service\_id) | The service ID of the VPC endpoint service for Clickhouse | `string` | `""` | no |
| <a name="input_controller_image_tag"></a> [controller\_image\_tag](#input\_controller\_image\_tag) | Tag of the controller image to deploy | `string` | `"1.14.0"` | no |
| <a name="input_create_bucket"></a> [create\_bucket](#input\_create\_bucket) | ######################################### External Bucket # ######################################### Most users will not need these settings. They are meant for users who want a bucket and SQS queue that are in a different account. | `bool` | `true` | no |
| <a name="input_create_elasticache"></a> [create\_elasticache](#input\_create\_elasticache) | Boolean indicating whether to provision an elasticache instance (true) or not (false). | `bool` | `true` | no |
| <a name="input_create_vpc"></a> [create\_vpc](#input\_create\_vpc) | Boolean indicating whether to deploy a VPC (true) or not (false). | `bool` | `true` | no |
| <a name="input_custom_domain_filter"></a> [custom\_domain\_filter](#input\_custom\_domain\_filter) | A custom domain filter to be used by external-dns instead of the default FQDN. If not set, the local FQDN is used. | `string` | `null` | no |
| <a name="input_database_binlog_format"></a> [database\_binlog\_format](#input\_database\_binlog\_format) | Specifies the binlog\_format value to set for the database | `string` | `"ROW"` | no |
| <a name="input_database_engine_version"></a> [database\_engine\_version](#input\_database\_engine\_version) | Version for MySQL Auora | `string` | `"8.0.mysql_aurora.3.05.2"` | no |
| <a name="input_database_engine_version"></a> [database\_engine\_version](#input\_database\_engine\_version) | Version for MySQL Aurora | `string` | `"8.0.mysql_aurora.3.07.1"` | no |
| <a name="input_database_innodb_lru_scan_depth"></a> [database\_innodb\_lru\_scan\_depth](#input\_database\_innodb\_lru\_scan\_depth) | Specifies the innodb\_lru\_scan\_depth value to set for the database | `number` | `128` | no |
| <a name="input_database_instance_class"></a> [database\_instance\_class](#input\_database\_instance\_class) | Instance type to use by database master instance. | `string` | `"db.r5.large"` | no |
| <a name="input_database_kms_key_arn"></a> [database\_kms\_key\_arn](#input\_database\_kms\_key\_arn) | n/a | `string` | `""` | no |
@@ -183,14 +185,16 @@ Upgrades must be executed in step-wise fashion from one version to the next. You
| <a name="input_eks_cluster_version"></a> [eks\_cluster\_version](#input\_eks\_cluster\_version) | EKS cluster kubernetes version | `string` | n/a | yes |
| <a name="input_eks_policy_arns"></a> [eks\_policy\_arns](#input\_eks\_policy\_arns) | Additional IAM policy to apply to the EKS cluster | `list(string)` | `[]` | no |
| <a name="input_elasticache_node_type"></a> [elasticache\_node\_type](#input\_elasticache\_node\_type) | The type of the redis cache node to deploy | `string` | `"cache.t2.medium"` | no |
| <a name="input_enable_dummy_dns"></a> [enable\_dummy\_dns](#input\_enable\_dummy\_dns) | Boolean indicating whether or not to enable dummy DNS for the old alb | `bool` | `false` | no |
| <a name="input_enable_operator_alb"></a> [enable\_operator\_alb](#input\_enable\_operator\_alb) | Boolean indicating whether to use operatore ALB (true) or not (false). | `bool` | `false` | no |
| <a name="input_enable_clickhouse"></a> [enable\_clickhouse](#input\_enable\_clickhouse) | Provision clickhouse resources | `bool` | `false` | no |
| <a name="input_enable_yace"></a> [enable\_yace](#input\_enable\_yace) | deploy yet another cloudwatch exporter to fetch aws resources metrics | `bool` | `true` | no |
| <a name="input_external_dns"></a> [external\_dns](#input\_external\_dns) | Using external DNS. A `subdomain` must also be specified if this value is true. | `bool` | `false` | no |
| <a name="input_extra_fqdn"></a> [extra\_fqdn](#input\_extra\_fqdn) | Additional fqdn's must be in the same hosted zone as `domain_name`. | `list(string)` | `[]` | no |
| <a name="input_kms_clickhouse_key_alias"></a> [kms\_clickhouse\_key\_alias](#input\_kms\_clickhouse\_key\_alias) | KMS key alias for AWS KMS Customer managed key used by Clickhouse CMEK. | `string` | `null` | no |
| <a name="input_kms_clickhouse_key_policy"></a> [kms\_clickhouse\_key\_policy](#input\_kms\_clickhouse\_key\_policy) | The policy that will define the permissions for the clickhouse kms key. | `string` | `""` | no |
| <a name="input_kms_key_alias"></a> [kms\_key\_alias](#input\_kms\_key\_alias) | KMS key alias for AWS KMS Customer managed key. | `string` | `null` | no |
| <a name="input_kms_key_deletion_window"></a> [kms\_key\_deletion\_window](#input\_kms\_key\_deletion\_window) | Duration in days to destroy the key after it is deleted. Must be between 7 and 30 days. | `number` | `7` | no |
| <a name="input_kms_key_policy"></a> [kms\_key\_policy](#input\_kms\_key\_policy) | The policy that will define the permissions for the kms key. | `string` | `""` | no |
| <a name="input_kms_key_policy_administrator_arn"></a> [kms\_key\_policy\_administrator\_arn](#input\_kms\_key\_policy\_administrator\_arn) | The principal that will be allowed to manage the kms key. | `string` | `""` | no |
| <a name="input_kubernetes_alb_internet_facing"></a> [kubernetes\_alb\_internet\_facing](#input\_kubernetes\_alb\_internet\_facing) | Indicates whether or not the ALB controlled by the Amazon ALB ingress controller is internet-facing or internal. | `bool` | `true` | no |
| <a name="input_kubernetes_alb_subnets"></a> [kubernetes\_alb\_subnets](#input\_kubernetes\_alb\_subnets) | List of subnet ID's the ALB will use for ingress traffic. | `list(string)` | `[]` | no |
| <a name="input_kubernetes_instance_types"></a> [kubernetes\_instance\_types](#input\_kubernetes\_instance\_types) | EC2 Instance type for primary node group. | `list(string)` | <pre>[<br> "m5.large"<br>]</pre> | no |
@@ -212,6 +216,7 @@ Upgrades must be executed in step-wise fashion from one version to the next. You
| <a name="input_network_private_subnets"></a> [network\_private\_subnets](#input\_network\_private\_subnets) | A list of the identities of the private subnetworks in which resources will be deployed. | `list(string)` | `[]` | no |
| <a name="input_network_public_subnet_cidrs"></a> [network\_public\_subnet\_cidrs](#input\_network\_public\_subnet\_cidrs) | List of private subnet CIDR ranges to create in VPC. | `list(string)` | <pre>[<br> "10.10.0.0/24",<br> "10.10.1.0/24"<br>]</pre> | no |
| <a name="input_network_public_subnets"></a> [network\_public\_subnets](#input\_network\_public\_subnets) | A list of the identities of the public subnetworks in which resources will be deployed. | `list(string)` | `[]` | no |
| <a name="input_operator_chart_version"></a> [operator\_chart\_version](#input\_operator\_chart\_version) | Version of the operator chart to deploy | `string` | `"1.3.4"` | no |
| <a name="input_other_wandb_env"></a> [other\_wandb\_env](#input\_other\_wandb\_env) | Extra environment variables for W&B | `map(any)` | `{}` | no |
| <a name="input_parquet_wandb_env"></a> [parquet\_wandb\_env](#input\_parquet\_wandb\_env) | Extra environment variables for W&B | `map(string)` | `{}` | no |
| <a name="input_private_link_allowed_account_ids"></a> [private\_link\_allowed\_account\_ids](#input\_private\_link\_allowed\_account\_ids) | List of AWS account IDs allowed to access the VPC Endpoint Service | `list(string)` | `[]` | no |
@@ -246,7 +251,7 @@ Upgrades must be executed in step-wise fashion from one version to the next. You
| <a name="output_eks_node_count"></a> [eks\_node\_count](#output\_eks\_node\_count) | n/a |
| <a name="output_eks_node_instance_type"></a> [eks\_node\_instance\_type](#output\_eks\_node\_instance\_type) | n/a |
| <a name="output_elasticache_connection_string"></a> [elasticache\_connection\_string](#output\_elasticache\_connection\_string) | n/a |
| <a name="output_internal_app_port"></a> [internal\_app\_port](#output\_internal\_app\_port) | n/a |
| <a name="output_kms_clickhouse_key_arn"></a> [kms\_clickhouse\_key\_arn](#output\_kms\_clickhouse\_key\_arn) | The Amazon Resource Name of the KMS key used to encrypt Weave data at rest in Clickhouse. |
| <a name="output_kms_key_arn"></a> [kms\_key\_arn](#output\_kms\_key\_arn) | The Amazon Resource Name of the KMS key used to encrypt data at rest. |
| <a name="output_network_id"></a> [network\_id](#output\_network\_id) | The identity of the VPC in which resources are deployed. |
| <a name="output_network_private_subnets"></a> [network\_private\_subnets](#output\_network\_private\_subnets) | The identities of the private subnetworks deployed within the VPC. |
@@ -263,6 +268,25 @@ Upgrades must be executed in step-wise fashion from one version to the next. You

See our upgrade guide [here](./docs/operator-migration/readme.md)

### Upgrading 4.x to 5.x

Use this upgrade when moving EKS to Kubernetes 1.29.

We have also upgraded the following components and EKS add-ons:

- MySQL Aurora (8.0.mysql_aurora.3.07.1)
- redis (7.1)
- external-dns helm chart (v1.15.0)
- aws-efs-csi-driver (v2.0.7-eksbuild.1)
- aws-ebs-csi-driver (v1.35.0-eksbuild.1)
- coredns (v1.11.3-eksbuild.1)
- kube-proxy (v1.29.7-eksbuild.9)
- vpc-cni (v1.18.3-eksbuild.3)

> :warning: Please remove the `enable_dummy_dns` and `enable_operator_alb` variables
> from your configuration, as they are no longer valid inputs. They were provided to
> support older versions of the module that relied on an ALB not created by the ingress controller.
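
As a rough illustration of the warning above, a module call after moving to 5.x might look like the sketch below. The `source` address, the `~> 5.0` constraint, and all placeholder values are assumptions made for illustration. Only `eks_cluster_version` and the two removed flags are taken from this README, and any other inputs your deployment already sets are omitted.

```hcl
module "wandb_infra" {
  # Assumed registry address and version constraint; keep whatever
  # source your configuration already uses for this module.
  source  = "wandb/wandb/aws"
  version = "~> 5.0"

  # Hypothetical placeholder values for inputs referenced elsewhere in this README.
  namespace   = "wandb"
  domain_name = "example.com"
  zone_id     = "Z0000000000000"

  # Target Kubernetes version for this upgrade.
  eks_cluster_version = "1.29"

  # No longer valid in 5.x; delete these lines if your configuration has them.
  # enable_dummy_dns    = false
  # enable_operator_alb = false

  # ... remaining inputs unchanged ...
}
```

If either flag is left in place, `terraform plan` would be expected to reject it as an unsupported argument, so removing both before planning is the safer order.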
### Upgrading from 3.x -> 4.x

- If egress access for retrieving the wandb/controller image is not available, Terraform apply may experience failures.
2 changes: 0 additions & 2 deletions examples/byo-vpc-eks/variables.tf
@@ -70,14 +70,12 @@ variable "bucket_kms_key_arn" {
default = ""
}


variable "allowed_inbound_cidr" {
default = ["0.0.0.0/0"]
nullable = false
type = list(string)
}


variable "allowed_inbound_ipv6_cidr" {
default = ["::/0"]
nullable = false
3 changes: 0 additions & 3 deletions examples/byo-vpc/main.tf
@@ -19,9 +19,6 @@ module "wandb_infra" {
public_access = true
external_dns = true

enable_dummy_dns = var.enable_dummy_dns
enable_operator_alb = var.enable_operator_alb

deletion_protection = true

create_vpc = false
12 changes: 0 additions & 12 deletions examples/byo-vpc/variables.tf
@@ -103,18 +103,6 @@ variable "other_wandb_env" {
default = {}
}

variable "enable_operator_alb" {
type = bool
default = false
description = "Boolean indicating whether to use operatore ALB (true) or not (false)."
}

variable "enable_dummy_dns" {
type = bool
default = false
description = "Boolean indicating whether or not to enable dummy DNS for the old alb"
}

variable "vpc_id" {
type = string
description = "VPC network ID"
31 changes: 1 addition & 30 deletions examples/public-dns-external/main.tf
@@ -28,7 +28,7 @@ module "wandb_infra" {
allowed_inbound_cidr = var.allowed_inbound_cidr
allowed_inbound_ipv6_cidr = ["::/0"]

eks_cluster_version = "1.26"
eks_cluster_version = "1.29"
kubernetes_public_access = true
kubernetes_public_access_cidrs = ["0.0.0.0/0"]

@@ -84,35 +84,6 @@ provider "helm" {
}
}

module "wandb_app" {
source = "wandb/wandb/kubernetes"
version = "1.12.0"

license = var.wandb_license

host = module.wandb_infra.url
bucket = "s3://${module.wandb_infra.bucket_name}"
bucket_path = var.bucket_path
bucket_aws_region = module.wandb_infra.bucket_region
bucket_queue = "internal://"
bucket_kms_key_arn = module.wandb_infra.kms_key_arn
database_connection_string = "mysql://${module.wandb_infra.database_connection_string}"
redis_connection_string = "redis://${module.wandb_infra.elasticache_connection_string}?tls=true&ttlInSeconds=604800"

wandb_image = var.wandb_image
wandb_version = var.wandb_version

service_port = module.wandb_infra.internal_app_port

# If we dont wait, tf will start trying to deploy while the work group is
# still spinning up
depends_on = [module.wandb_infra]

other_wandb_env = merge({
"GORILLA_CUSTOMER_SECRET_STORE_SOURCE" = "aws-secretmanager://${var.namespace}?namespace=${var.namespace}"
}, var.other_wandb_env)
}

output "bucket_name" {
value = module.wandb_infra.bucket_name
}
1 change: 0 additions & 1 deletion examples/standard/main.tf
@@ -73,7 +73,6 @@ provider "helm" {
}
}


output "bucket_name" {
value = module.wandb_infra.bucket_name
}
30 changes: 4 additions & 26 deletions main.tf
@@ -173,27 +173,15 @@ module "app_eks" {
aws_loadbalancer_controller_tags = var.aws_loadbalancer_controller_tags
}

locals {
full_fqdn = var.enable_dummy_dns ? "old.${local.fqdn}" : local.fqdn
extra_fqdn = var.enable_dummy_dns ? [for fqdn in var.extra_fqdn : "old.${fqdn}"] : var.extra_fqdn
}

module "app_lb" {
source = "./modules/app_lb"

namespace = var.namespace
load_balancing_scheme = var.public_access ? "PUBLIC" : "PRIVATE"
acm_certificate_arn = local.acm_certificate_arn
zone_id = var.zone_id
namespace = var.namespace

fqdn = local.full_fqdn
extra_fqdn = local.extra_fqdn
allowed_inbound_cidr = var.allowed_inbound_cidr
allowed_inbound_ipv6_cidr = var.allowed_inbound_ipv6_cidr
target_port = local.internal_app_port
network_id = local.network_id
network_private_subnets = local.network_private_subnets
network_public_subnets = local.network_public_subnets
enable_private_only_traffic = var.private_only_traffic
private_endpoint_cidr = var.allowed_private_endpoint_cidr

@@ -217,12 +205,6 @@ module "private_link" {
]
}

resource "aws_autoscaling_attachment" "autoscaling_attachment" {
for_each = module.app_eks.autoscaling_group_names
autoscaling_group_name = each.value
lb_target_group_arn = module.app_lb.tg_app_arn
}

locals {
network_elasticache_subnets = var.create_vpc ? module.networking.elasticache_subnets : var.network_elasticache_subnets
network_elasticache_subnet_cidrs = var.create_vpc ? module.networking.elasticache_subnet_cidrs : var.network_elasticache_subnet_cidrs
@@ -316,12 +298,12 @@ module "wandb" {
"alb.ingress.kubernetes.io/listen-ports" = "[{\\\"HTTPS\\\": 443}]"
"alb.ingress.kubernetes.io/certificate-arn" = local.acm_certificate_arn
},
length(var.extra_fqdn) > 0 && var.enable_dummy_dns ? {
length(var.extra_fqdn) > 0 ? {
"external-dns.alpha.kubernetes.io/hostname" = <<-EOF
${local.fqdn}\,${join("\\,", var.extra_fqdn)}\,${local.fqdn}
EOF
} : {
"external-dns.alpha.kubernetes.io/hostname" = var.enable_operator_alb ? local.fqdn : ""
"external-dns.alpha.kubernetes.io/hostname" = local.fqdn
},
length(var.kubernetes_alb_subnets) > 0 ? {
"alb.ingress.kubernetes.io/subnets" = <<-EOF
@@ -331,11 +313,7 @@ module "wandb" {

}

app = var.enable_operator_alb ? {} : {
extraEnv = merge({
"GORILLA_GLUE_LIST" = "true"
}, var.app_wandb_env)
}
app = {}

# To support otel rds and redis metrics, we need operator-wandb chart min version 0.13.8 (yace subchart)
yace = var.enable_yace ? {
45 changes: 21 additions & 24 deletions modules/app_eks/add-ons.tf
@@ -27,48 +27,45 @@ resource "aws_iam_role" "oidc" {
assume_role_policy = data.aws_iam_policy_document.oidc_assume_role.json
}



### add-ons for eks version 1.28

### add-ons for eks version 1.29
resource "aws_eks_addon" "aws_efs_csi_driver" {
depends_on = [
aws_eks_addon.vpc_cni
]
cluster_name = var.namespace
addon_name = "aws-efs-csi-driver"
addon_version = "v2.0.4-eksbuild.1"
resolve_conflicts = "OVERWRITE"
depends_on = [
aws_eks_addon.vpc_cni
]
cluster_name = var.namespace
addon_name = "aws-efs-csi-driver"
addon_version = "v2.0.7-eksbuild.1"
resolve_conflicts = "OVERWRITE"
}

resource "aws_eks_addon" "aws_ebs_csi_driver" {
depends_on = [
aws_eks_addon.vpc_cni
]
cluster_name = var.namespace
addon_name = "aws-ebs-csi-driver"
addon_version = "v1.31.0-eksbuild.1"
resolve_conflicts = "OVERWRITE"
cluster_name = var.namespace
addon_name = "aws-ebs-csi-driver"
addon_version = "v1.35.0-eksbuild.1"
resolve_conflicts = "OVERWRITE"
}

resource "aws_eks_addon" "coredns" {
depends_on = [
aws_eks_addon.vpc_cni
]
cluster_name = var.namespace
addon_name = "coredns"
addon_version = "v1.10.1-eksbuild.11"
resolve_conflicts = "OVERWRITE"
cluster_name = var.namespace
addon_name = "coredns"
addon_version = "v1.11.3-eksbuild.1"
resolve_conflicts = "OVERWRITE"
}

resource "aws_eks_addon" "kube_proxy" {
depends_on = [
aws_eks_addon.vpc_cni
]
cluster_name = var.namespace
addon_name = "kube-proxy"
addon_version = "v1.28.8-eksbuild.5"
resolve_conflicts = "OVERWRITE"
cluster_name = var.namespace
addon_name = "kube-proxy"
addon_version = "v1.29.7-eksbuild.9"
resolve_conflicts = "OVERWRITE"
}

resource "aws_eks_addon" "vpc_cni" {
@@ -77,7 +74,7 @@ resource "aws_eks_addon" "vpc_cni" {
]
cluster_name = var.namespace
addon_name = "vpc-cni"
addon_version = "v1.18.2-eksbuild.1"
addon_version = "v1.18.3-eksbuild.3"
resolve_conflicts = "OVERWRITE"
service_account_role_arn = aws_iam_role.oidc.arn
}
2 changes: 1 addition & 1 deletion modules/app_eks/external_dns/external_dns.tf
@@ -2,7 +2,7 @@ resource "helm_release" "external_dns" {
name = "external-dns"
namespace = "kube-system"
chart = "external-dns"
version = "1.14.1"
version = "1.15.0"
repository = "https://kubernetes-sigs.github.io/external-dns"

set {