Container restore fails due to rootfsImageID mismatch #24307
Comments
I don't think image IDs are meant to be stable with zstd:chunked, at least from reading @mtrmac's comments and looking at the pull output. So overall I think the assumption that we can keep the image ID in the dump and use it to identify images across hosts will no longer hold. Maybe it should instead store the FQDN + digest?
Yes, the code makes a best effort to reuse layers and images (incl. using the traditional values of image IDs), but, depending on the capabilities of the registry the image was pulled from, or on the images already present on the machine, pulls of the same image no longer result in the same image ID. I agree that using a repo@digest reference is the most natural alternative … except that the
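For illustration, a hedged sketch of how a repo@digest reference could identify the image independently of the local storage ID (the image name is a placeholder; `podman image inspect` exposes both fields):

```bash
# Pull by tag, then record the repo@digest reference rather than the local image ID.
podman pull registry.example.com/myapp:latest

# The local storage ID can differ between hosts (chunked vs. non-chunked pulls) ...
podman image inspect --format '{{.Id}}' registry.example.com/myapp:latest

# ... while the manifest digest refers to the same registry content on every host.
podman image inspect --format '{{.Digest}}' registry.example.com/myapp:latest
```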
What does that mean? How can we then identify an image across different hosts if this is not reliable; are there other ways? I must say the whole zstd:chunked switch seems like an even more problematic change to me if the meaning of image IDs is changed and they are no longer deterministic. If seemingly unrelated things like checkpoint/restore break, I wonder how many other things depend on a stable image ID. It doesn't even need to be in Podman, just external user scripts that we cannot know about.
After
Pulled image IDs should typically be stable when pulled from non-hostile registries with the same capabilities. But the “same capabilities” part can easily not be true (e.g. a mirror is quite likely a different implementation from the upstream registry), and of course the pre-chunked → chunked transition is a one-time ID change. But then, anyway, there is no standard way to pull an image by image ID. So it seems to me that the snapshot restore should always be preceded by a pull using a repo:tag/repo@digest. On the third hand, if the image were built by an intentionally malicious producer to be inconsistent in the partial / non-partial views, that could cause a restore to work with “the other” view.
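A hedged sketch of that pull-then-restore sequence, with a placeholder image reference and checkpoint path (`--import` is the existing `podman container restore` flag for exported checkpoint archives):

```bash
# Pull the exact image content by digest first, so the image is present locally
# regardless of which image ID this host's pull produces.
podman pull registry.example.com/myapp@sha256:<manifest-digest>   # placeholder reference

# Then restore from the exported checkpoint archive.
podman container restore --import /mnt/shared/myapp-checkpoint.tar.gz
```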
I agree that it’s unsatisfactory. It’s a natural consequence of retrofitting partial pulls into a system where layers are content-addressed by their full contents, but we explicitly don’t want to read the full contents; that breaks the assumptions of the existing content-addressable identification. So, now, when we pull “the same” layer from a partial-pull-supporting registry, and a partial-pull-not-supporting registry, we can’t deduplicate them and we must store them separately, with different layer IDs. And that implies different image IDs. Over the past year or so, I haven’t seen anything to suggest that there is a good alternative. We could do a partial pull, and then re-generate the whole layer to verify the uncompressed digest; then we would be justified in continuing to use the traditional IDs. From a consistency point of view I like that idea, but it would make partial pulls quite a bit more expensive (in CPU and, with a simple implementation, in disk I/O; not in network cost). Cc: @giuseppe
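For context, a rough sketch of what that verification would involve for a single layer blob (the file name is illustrative): the uncompressed digest, i.e. the layer’s DiffID, is a SHA-256 over the decompressed tar stream, so checking it means reading every byte of the layer:

```bash
# Decompress the zstd:chunked layer blob and hash the full tar stream.
# This reads 100% of the layer, even if most of its files were reused locally.
zstd -dc layer.tar.zst | sha256sum
```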
Some more discussion about options WRT image IDs happened; recording it here: Re-generating the whole layer and validating the uncompressed digest would actually be pretty expensive; right now, during partial pulls, we don’t digest reused files, only the newly pulled ones. So, with a 99% reuse, we don’t touch 99% of the files. To validate the digest, we would have to read all of them. Some other alternatives:
For now, the default recommendation is to proceed with the varying image IDs as they are, because returning to traditional image IDs would be way too costly, and the always-TOC alternative would not preserve the traditional image ID and it would force users to deal with an ID change anyway.
Issue Description
When migrating a container from one VM to another with the latest version of Podman, container restore fails with the following error:
This error occurs because the `rootfsImageID` value stored in the `config.dump` file in the checkpoint does not match the `rootfsImageID` value generated on the new VM.

In the following example, we can see that running `podman pull` results in `3417cee3cb7fa02b251c6157dd3d940172383b796b65aed93f3d82d490839270` on `srv1` and `01086e2bbcfab816ef337fa91d8b71862e2ded41c96261ed8a5bb605f92eb9c1` on `srv2`:
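A minimal sketch of the commands behind that comparison, assuming a placeholder image name (`registry.example.com/myapp:latest`):

```bash
# Run on srv1 and on srv2: pull the same reference ...
podman pull registry.example.com/myapp:latest

# ... and compare the resulting local image IDs; they differ when one host
# performs a zstd:chunked partial pull and the other a regular pull.
podman image inspect --format '{{.Id}}' registry.example.com/myapp:latest
```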
This checksum mismatch causes container restore to fail. This can be confirmed by manually editing the `rootfsImageID` value in `config.dump` as shown in this video.

Steps to reproduce the issue
Steps to reproduce the issue using two VMs (`A` and `B`); a sketch of the corresponding commands follows the list:
1. Run a container on VM `A`, and create a container checkpoint
2. Transfer the checkpoint from `A` to `B` (e.g., using NFS or scp)
3. Restore the container on VM `B`
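A hedged sketch of these steps with illustrative names and paths (container `demo`, archive `/tmp/demo.tar.gz`, host `vm-b`); checkpointing typically requires CRIU and root:

```bash
# On VM A: run a container and export a checkpoint archive.
podman run -d --name demo registry.example.com/myapp:latest
podman container checkpoint --export /tmp/demo.tar.gz demo

# Transfer the archive to VM B (scp shown; an NFS share works the same way).
scp /tmp/demo.tar.gz vm-b:/tmp/demo.tar.gz

# On VM B: restore from the archive; this is where the rootfsImageID stored in
# config.dump is compared against the local image and the restore fails.
podman container restore --import /tmp/demo.tar.gz
```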
Describe the results you received
Restore fails with the error message above.
Describe the results you expected
Restore should complete successfully.
podman info output
Podman installed as follows on Fedora Rawhide:
Podman in a container
No
Privileged Or Rootless
Privileged
Upstream Latest Release
Yes
Additional environment details
Additional information
No response