Unable to set property to null on AgentPool #2462

jaxxstorm · 2023-05-18T20:21:47Z

What happened?

When creating a container service agent pool, modifying the gpu_instance_profile doesn't work for null values

Expected Behavior

If you set the property to None or any other nullable value, it should modify the resource

Steps to reproduce

Create an agent pool with a gpu_instance_profile

containerservice.AgentPool(
  "example",
  name=pool_config.name,
  gpu_instance_profile="MIG1g"
)

Then modify this to remove the GPU instance profile:

containerservice.AgentPool(
  "example",
  name=pool_config.name,
  gpu_instance_profile="None"
)

Output of `pulumi about`

N/A

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

The text was updated successfully, but these errors were encountered:

kpitzen · 2023-05-19T15:33:30Z

Hey @jaxxstorm ! Thanks for opening this - is this reproducible in other languages as well, or is it only a Python issue?

thomas11 · 2024-01-05T16:42:07Z

In the line gpu_instance_profile="None", the value is simply the string "None" because it's quoted. I'm surprised Azure didn't respond with a 400 here since gpu_instance_profile is an enum with defined values.

Unless the example snippet was manually edited and None was quoted erroneously then.

thomas11 · 2024-01-06T07:34:00Z

I tried to repro but there's no auto-approvable quota for multi-GPU VM SKUs. sigh Submitted a support request.

import * as resources from "@pulumi/azure-native/resources";
import * as containerservice from "@pulumi/azure-native/containerservice";

const resourceGroup = new resources.ResourceGroup("resourceGroup");

const managedCluster = new containerservice.ManagedCluster("managedCluster", {
    addonProfiles: {},
    agentPoolProfiles: [{
        count: 3,
        enableNodePublicIP: true,
        mode: "System",
        name: "nodepool1",
        osType: "Linux",
        type: "VirtualMachineScaleSets",
        vmSize: "Standard_DS2_v2",
    }],
    dnsPrefix: "dnsprefix1",
    enableRBAC: true,
    kubernetesVersion: "",
    networkProfile: {
        loadBalancerProfile: {
            managedOutboundIPs: {
                count: 2,
            },
        },
        loadBalancerSku: "standard",
        outboundType: "loadBalancer",
    },
    resourceGroupName: resourceGroup.name,
    resourceName: "clustername1",
    servicePrincipalProfile: {
        clientId: REDACTED
        secret: REDACTED
    },
    sku: {
        name: "Base",
        tier: "Free",
    },
});

const pool = new containerservice.AgentPool("sa", {
    agentPoolName: "agentpool1",
    count: 1,
    gpuInstanceProfile: "MIG1g",
    orchestratorVersion: "",
    osType: "Linux",
    resourceGroupName: resourceGroup.name,
    resourceName: managedCluster.name,
    vmSize: "Standard_ND96asr_v4",
});

danielrbradley · 2024-10-16T16:54:21Z

Starting to familiarise myself with this resource. Here's the facts I think are true.

It looks like the AgentPool is a nested resource of the managed cluster. You can create AgentPoolProfiles inline with the parent, but the AgentPool itself is a standalone resource, though shares all of the same properties as the profiles - with the addition of the agentPoolName which forms part of the URL.

ARM resource docs: https://learn.microsoft.com/en-us/azure/templates/microsoft.containerservice/managedclusters/agentpools

Example VM Size supporting GPUs: https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/ndasra100v4-series

Our default version is using 2023-04-01 but the latest ARM docs are using 2024-06-02-preview.

The specs don't indicate how to unset this field. The empty string is not valid and as far as I can remember ARM update operations don't require all properties, so a missing field is not treated like a removal.

I expect this field might be set to a default value server-side for VMs with GPU support, but is removed if the VM doesn't support GPUs. There's no indication of this behaviour in the spec.

It's also possible that this is a field which Azure treats as immutable once created, but hasn't documented via the spec. The API might just discard the field it can't update as we've observed this behaviour in other resources.

Next step will be retrying the program above.

danielrbradley · 2024-10-17T13:21:53Z

Test Plan

Deploy the AgentPool with the smallest possible GPU count, and without specifying the gpuInstanceProfile.
- Does the create succeed without this value set explicitly?
- Does the created resource have this property set in its outputs? This would indicate that a default value is always set for GPU VMs and therefore attempting to clear the value might be impossible.
Attempt to update gpuInstanceProfile to another value (e.g. MIG1g or MIG2g)
- Does this update succeed?
- What is contained in the logs for the request being sent for the update?

Notes:

Using the standard_nc24ads_a100_v4 vmSize in EastUS results in the error: Insufficient regional vcpu quota left for location eastus. left regional vcpu quota 23, requested quota 24.
GPU availability really matters. The UK does not seem to have this feature. US East only has 23 gpus currently available which is thwarting our testing.
Use the command PULUMI_DEBUG_GRPC=grpc.log pulumi up --logtostderr --logflow -v=9 2> out.txt for capturing very verbose logs plus gRPC logs.

After 24 failed attempts to deploy an agent pool with GPUs by varying the vmSize I'm going to need to put this back on hold until we can find a provisionable VM Size.

danielrbradley · 2024-10-18T14:37:12Z

Docs on what multi-instance GPUs actually are: https://learn.microsoft.com/en-us/azure/aks/gpu-multi-instance

From the page above:

You can't change the applied GPU instance profile after node pool creation.

Therefore, it's not possible to update this value to either a new value or to remove it. Therefore, we'll mark this property as causing replacement in order to resolve this issue.

Also related in giving a little more insight on what can be updated: https://learn.microsoft.com/en-us/cli/azure/aks/nodepool?view=azure-cli-latest#az-aks-nodepool-update

As noted on https://learn.microsoft.com/en-us/azure/aks/gpu-multi-instance?tabs=azure-cli#gpu-instance-profiles > You can't change the applied GPU instance profile after node pool creation. As noted on https://learn.microsoft.com/en-us/azure/aks/manage-node-pools#limitations > You can't change the VM size of a node pool after you create it. Fixes #2462

pulumi-bot · 2024-10-22T19:55:53Z

This issue has been addressed in PR #3655 and shipped in release v2.68.0.

jaxxstorm added kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team labels May 18, 2023

kpitzen added impact/usability Something that impacts users' ability to use the product easily and intuitively awaiting-feedback Blocked on input from the author area/providers impact/quality and removed needs-triage Needs attention from the triage team labels May 19, 2023

mikhailshilkov added language/python and removed awaiting-feedback Blocked on input from the author labels Sep 14, 2023

danielrbradley self-assigned this Oct 17, 2024

danielrbradley removed their assignment Oct 17, 2024

danielrbradley mentioned this issue Oct 18, 2024

Make AgentPool updates cause replacement #3655

Merged

danielrbradley closed this as completed in #3655 Oct 18, 2024

pulumi-bot assigned danielrbradley Oct 18, 2024

pulumi-bot added the resolution/fixed This issue was fixed label Oct 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unable to set property to null on AgentPool #2462

Unable to set property to null on AgentPool #2462

jaxxstorm commented May 18, 2023

kpitzen commented May 19, 2023

thomas11 commented Jan 5, 2024

thomas11 commented Jan 6, 2024

danielrbradley commented Oct 16, 2024 •

edited

Loading

danielrbradley commented Oct 17, 2024 •

edited

Loading

danielrbradley commented Oct 18, 2024 •

edited

Loading

pulumi-bot commented Oct 22, 2024

Unable to set property to null on AgentPool #2462

Unable to set property to null on AgentPool #2462

Comments

jaxxstorm commented May 18, 2023

What happened?

Expected Behavior

Steps to reproduce

Output of pulumi about

Additional context

Contributing

kpitzen commented May 19, 2023

thomas11 commented Jan 5, 2024

thomas11 commented Jan 6, 2024

danielrbradley commented Oct 16, 2024 • edited Loading

danielrbradley commented Oct 17, 2024 • edited Loading

Test Plan

danielrbradley commented Oct 18, 2024 • edited Loading

pulumi-bot commented Oct 22, 2024

Output of `pulumi about`

danielrbradley commented Oct 16, 2024 •

edited

Loading

danielrbradley commented Oct 17, 2024 •

edited

Loading

danielrbradley commented Oct 18, 2024 •

edited

Loading