-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update vm delete to not wait for datadisk detachment #95
Conversation
@kon-angelo You need rebase this pull request with latest master branch. Please check. |
d853823
to
a7538d9
Compare
/test |
if deleteErr := DeleteVM(ctx, clients, resourceGroupName, VMName); deleteErr != nil && !NotFound(deleteErr) { | ||
return deleteErr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we don't wait for disk detachment (which could take 10 min)
then we'll directly move to the disk deletion below while disk are detaching
this could make things complicated as azure can't be trusted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry after taking a re-look , I realize that we are not even issuing a detach call now, and going for a direct delete.
this solution would be deterministic only if DeleteVM
returns after the detach and not before, otherwise we will go for disk deletion and it might fail and lead to inconsistencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you provide more context regarding this change, and how you think this helps?
An PR is already in progress by @unmarshall for issue #91 where this could be handled.
Do you feel this change is urgent ?
/assign @himanshu-kun |
Please note that when we move to the new Azure SDK in #91 , we will be changing the code to do cascade create and cascade delete. ie ONE call to create NIC+DISK+VM and ONE call to delete NIC+DISK+VM. |
We are currently testing these APIs. Tarun is right that there are options to cascade delete. For creation we are testing if we need to create NICs separately. Once we have done the testing then we will know more. |
Thank you all for the review/responses.
The DeleteVM should return after the VM is deleted in which case there is nothing to detach the disks from as the VM is already deleted. In practice, the Azure API should call the detach in the background but for some reason we chose to detach the disks in the foreground by ourselves.
I think that this is how it should be done (in an ideal world). The API allows to delete the machine with data disks so let it take care of the process by itself. If there is an actual reason that we did it this way (past incident or issues), then please point me to the documentation otherwise let's not optimize without a good reason/data.
Not urgent. Just curious why we handle VMs in a "special" way.
👍 . However the cascade options are part of the VM parameters and you have to consider existing VMs and how will you handle it when MCM does not support VM updates. Will you update existing VMs before deleting them ? How will you deal with failing VMs that do not allow for updates etc. But these are part of your future PR. It is relevant however if you can't reliably update all machines with cascading options and you are forced to have keep code paths (old and new) for a period of time. Regardless, merging this PR does not take away or stop you for proceeding with the "one call" methods in the future. TLDR; we don't "have to" proceed with this PR. But I would like to know why we do things the way we do since we go out of our way to do our own "optimization". |
I tried a search for the reason of this optimisation. I bumped into gardener/machine-controller-manager#248 , which introduced the change of
There is not much info in the issue A related issue was seen some time ago for Azure where in a particular situation we tried to delete a VM with PVs attached, but because of this wait for disk detachment , only downtime was seen , and not any other inconsistencies. So I think we need to try out the behaviour of Azure API , when we delete a VM with disks attached like I asked here to see if Azure API is better now or not. |
After discussing internally, we decided that this Azure APIs are not that reliable and we would not disturb the code flow which is working currently , until we move to the latest APIs and see them functioning correctly. |
What this PR does / why we need it:
Update VM delete to not wait until all data disks are detached. This may help to prevent situations where a failed VM cannot be deleted (because detach or any operation are prohibited).
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Release note: