Feat/existing eks example #233

Closed · wants to merge 2 commits
29 changes: 29 additions & 0 deletions examples/eks/eks_cluster_existing/README.MD
@@ -0,0 +1,29 @@
## Existing EKS cluster and CAST AI example with CAST AI Autoscaler policies and additional Node Configurations

The following example shows how to onboard an existing EKS cluster to CAST AI, configure [Autoscaler policies](https://docs.cast.ai/reference/policiesapi_upsertclusterpolicies) and additional [Node Configurations](https://docs.cast.ai/docs/node-configuration/).

IAM policies required to connect the cluster to CAST AI in the example are created by [castai/eks-role-iam/castai module](https://github.com/castai/terraform-castai-eks-role-iam).

The example configuration should be analysed in the following order:
1. `castai.tf` - creates IAM and other CAST AI-related resources to connect the EKS cluster to CAST AI, and configures the Autoscaler and Node Configurations.

# Usage
1. Rename `tf.vars.example` to `tf.vars`.
2. Update the `tf.vars` file with your cluster name, cluster region, `vpc_id`, `cluster_security_group_id`, `node_security_group_id`, `subnets` and CAST AI API token.
3. Initialize Terraform. From the example's root folder, run:
```
terraform init
```
4. Run Terraform apply:
```
terraform apply -var-file=tf.vars
```
5. To destroy resources created by this example:
```
terraform destroy -var-file=tf.vars
```
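As a reference, a filled-in `tf.vars` might look like the sketch below. All values shown are hypothetical placeholders; substitute your own cluster details and token.

```hcl
# Sketch only: every value below is a placeholder, not a real resource ID.
cluster_name              = "my-eks-cluster"
cluster_region            = "us-east-1"
castai_api_token          = "<castai-api-token>"
vpc_id                    = "vpc-0123456789abcdef0"
cluster_security_group_id = "sg-0123456789abcdef0"
node_security_group_id    = "sg-0fedcba9876543210"
subnets                   = ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"]
```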

> **Note**
>
> If you are onboarding an existing cluster to CAST AI, you also need to update the [aws-auth](https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html) configmap: the instance profile
> used by CAST AI has to be present in it. An example entry can be found [here](https://github.com/castai/terraform-provider-castai/blob/157babd57b0977f499eb162e9bee27bee51d292a/examples/eks/eks_cluster_autoscaler_polices/eks.tf#L28-L38).
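For illustration, a `mapRoles` entry granting node access to the role created by the `castai/eks-role-iam/castai` module might look like the sketch below. The account ID and role name are placeholders; use the actual IAM role ARN produced by the module in your account.

```yaml
# Sketch only: account ID and role name are hypothetical placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::111122223333:role/castai-eks-instance-my-eks-cluster
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
```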
172 changes: 172 additions & 0 deletions examples/eks/eks_cluster_existing/castai.tf
@@ -0,0 +1,172 @@
# Configure Data sources and providers required for CAST AI connection.
data "aws_caller_identity" "current" {}

data "aws_eks_cluster" "existing_cluster" {
name = var.cluster_name # Looks up the existing EKS cluster by the name set in tf.vars
}

resource "castai_eks_user_arn" "castai_user_arn" {
cluster_id = castai_eks_clusterid.cluster_id.id
}


provider "castai" {
api_url = var.castai_api_url
api_token = var.castai_api_token
}

provider "helm" {
kubernetes {
host = data.aws_eks_cluster.existing_cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.existing_cluster.certificate_authority[0].data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
# This requires the awscli to be installed locally where Terraform is executed.
args = ["eks", "get-token", "--cluster-name", var.cluster_name, "--region", var.cluster_region]
}
}
}

# Create AWS IAM policies and a user to connect to CAST AI.
module "castai-eks-role-iam" {
source = "castai/eks-role-iam/castai"

aws_account_id = data.aws_caller_identity.current.account_id
aws_cluster_region = var.cluster_region
aws_cluster_name = var.cluster_name
aws_cluster_vpc_id = var.vpc_id

castai_user_arn = castai_eks_user_arn.castai_user_arn.arn

create_iam_resources_per_cluster = true
}

# Configure EKS cluster connection using CAST AI eks-cluster module.
resource "castai_eks_clusterid" "cluster_id" {
account_id = data.aws_caller_identity.current.account_id
region = var.cluster_region
cluster_name = var.cluster_name
}

module "castai-eks-cluster" {
source = "castai/eks-cluster/castai"

api_url = var.castai_api_url
castai_api_token = var.castai_api_token
wait_for_cluster_ready = true

aws_account_id = data.aws_caller_identity.current.account_id
aws_cluster_region = var.cluster_region
aws_cluster_name = var.cluster_name

aws_assume_role_arn = module.castai-eks-role-iam.role_arn
delete_nodes_on_disconnect = var.delete_nodes_on_disconnect

default_node_configuration = module.castai-eks-cluster.castai_node_configurations["default"]

node_configurations = {
default = {
subnets = var.subnets
tags = var.tags
security_groups = [
var.cluster_security_group_id,
var.node_security_group_id
]
instance_profile_arn = module.castai-eks-role-iam.instance_profile_arn
}
}


node_templates = {
default_by_castai = {
name = "default-by-castai"
configuration_id = module.castai-eks-cluster.castai_node_configurations["default"]
is_default = true
should_taint = false

constraints = {
on_demand = true
spot = false
use_spot_fallbacks = false

enable_spot_diversity = false
spot_diversity_price_increase_limit_percent = 20

spot_interruption_predictions_enabled = false
spot_interruption_predictions_type = "aws-rebalance-recommendations"
}
}
}

# Configure Autoscaler policies as per API specification https://api.cast.ai/v1/spec/#/PoliciesAPI/PoliciesAPIUpsertClusterPolicies.
# Here:
# - unschedulablePods - Unscheduled pods policy
# - nodeDownscaler - Node deletion policy
autoscaler_policies_json = <<-EOT
{
"enabled" : false,
"isScopedMode" : false,
"unschedulablePods" : {
"enabled" : false
},
"clusterLimits" : {
"enabled" : false
},
"nodeDownscaler" : {
"enabled" : false,
"emptyNodes" : {
"enabled" : false,
"delaySeconds" : 300
},
"evictor" : {
"enabled" : false,
"aggressiveMode" : false,
"nodeGracePeriodMinutes" : 5
}
}
}

EOT

# depends_on helps Terraform build the correct dependency graph for both resource creation and destruction:
# module "castai-eks-cluster" has to be destroyed before module "castai-eks-role-iam".
depends_on = [module.castai-eks-role-iam]
}

resource "castai_rebalancing_schedule" "spots" {
name = "rebalance spots at every 30th minute"
schedule {
cron = "*/30 * * * *"
}
trigger_conditions {
savings_percentage = 20
}
launch_configuration {
# only consider instances older than 5 minutes
node_ttl_seconds = 300
num_targeted_nodes = 3
rebalancing_min_nodes = 2
keep_drain_timeout_nodes = false
selector = jsonencode({
nodeSelectorTerms = [{
matchExpressions = [
{
key = "scheduling.cast.ai/spot"
operator = "Exists"
}
]
}]
})
execution_conditions {
enabled = true
achieved_savings_percentage = 10
}
}
}

resource "castai_rebalancing_job" "spots" {
cluster_id = castai_eks_clusterid.cluster_id.id
rebalancing_schedule_id = castai_rebalancing_schedule.spots.id
enabled = true
}
15 changes: 15 additions & 0 deletions examples/eks/eks_cluster_existing/providers.tf
@@ -0,0 +1,15 @@
# The following providers are required to connect to the existing EKS cluster.
provider "aws" {
region = var.cluster_region
}

provider "kubernetes" {
host = data.aws_eks_cluster.existing_cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.existing_cluster.certificate_authority[0].data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
# This requires the awscli to be installed locally where Terraform is executed
args = ["eks", "get-token", "--cluster-name", var.cluster_name, "--region", var.cluster_region]
}
}
7 changes: 7 additions & 0 deletions examples/eks/eks_cluster_existing/tf.vars.example
@@ -0,0 +1,7 @@
cluster_name = ""
cluster_region = ""
castai_api_token = ""
vpc_id = ""
cluster_security_group_id = ""
node_security_group_id = ""
subnets = ["", ""]
55 changes: 55 additions & 0 deletions examples/eks/eks_cluster_existing/variables.tf
@@ -0,0 +1,55 @@
# EKS module variables.
variable "cluster_name" {
type = string
description = "EKS cluster name in AWS account."
}

variable "cluster_region" {
type = string
description = "AWS Region in which EKS cluster and supporting resources will be created."
}

variable "vpc_id" {
type = string
description = "EKS cluster VPC ID"
}

variable "castai_api_url" {
type = string
description = "URL of alternative CAST AI API to be used during development or testing"
default = "https://api.cast.ai"
}

# Variables required for connecting EKS cluster to CAST AI.
variable "castai_api_token" {
type = string
description = "CAST AI API token created in console.cast.ai API Access keys section"
}

variable "delete_nodes_on_disconnect" {
type = bool
description = "Optional parameter. If set to true, CAST AI-provisioned nodes will be deleted from the cloud on cluster disconnection. For production use it is recommended to set this to false."
default = true
}

variable "tags" {
type = map(any)
description = "Optional tags for new cluster nodes. This parameter applies only to new nodes - tags for old nodes are not reconciled."
default = {}
}


variable "cluster_security_group_id" {
type = string
description = "EKS cluster security group ID"
}

variable "node_security_group_id" {
type = string
description = "EKS cluster node security group ID"
}

variable "subnets" {
type = list(string)
description = "Subnet IDs used by CAST AI to provision nodes"
}
17 changes: 17 additions & 0 deletions examples/eks/eks_cluster_existing/versions.tf
@@ -0,0 +1,17 @@
terraform {
required_providers {
castai = {
source = "castai/castai"
}
kubernetes = {
source = "hashicorp/kubernetes"
}
helm = {
source = "hashicorp/helm"
}
aws = {
source = "hashicorp/aws"
}
}
required_version = ">= 0.13"
}