Container restore fails due to rootfsImageID mismatch #24307

Open

rst0git opened this issue Oct 17, 2024 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

rst0git (Contributor) commented Oct 17, 2024

Issue Description

When migrating a container from one VM to another with the latest version of Podman, container restore fails with the following error:

Error: creating container storage: reading image "01086e2bbcfab816ef337fa91d8b71862e2ded41c96261ed8a5bb605f92eb9c1": locating image with ID "01086e2bbcfab816ef337fa91d8b71862e2ded41c96261ed8a5bb605f92eb9c1": image not known

This error occurs because the rootfsImageID value stored in the config.dump file in the checkpoint does not match the rootfsImageID value generated on the new VM.

In the following example, we can see that running podman pull results in image ID 3417cee3cb7fa02b251c6157dd3d940172383b796b65aed93f3d82d490839270 on srv1 and image ID 01086e2bbcfab816ef337fa91d8b71862e2ded41c96261ed8a5bb605f92eb9c1 on srv2:

[root@srv1 ~]# podman pull quay.io/radostin/iperf3:latest
Trying to pull quay.io/radostin/iperf3:latest...
Getting image source signatures
Copying blob 4bfe8d9702cf done  2.7MiB / 2.7MiB (skipped: 224.0b = 0.01%)
Copying blob f7406d2490fe done  29.8MiB / 29.8MiB (skipped: 1.9KiB = 0.01%)
Copying config 3417cee3cb done   | 
Writing manifest to image destination
3417cee3cb7fa02b251c6157dd3d940172383b796b65aed93f3d82d490839270
[root@srv2 ~]# podman pull quay.io/radostin/iperf3:latest
Trying to pull quay.io/radostin/iperf3:latest...
Getting image source signatures
Copying blob 4bfe8d9702cf done  2.7MiB / 2.7MiB (skipped: 224.0b = 0.01%)
Copying blob f7406d2490fe done  29.8MiB / 29.8MiB (skipped: 1.9KiB = 0.01%)
Copying config 3417cee3cb done   | 
Writing manifest to image destination
01086e2bbcfab816ef337fa91d8b71862e2ded41c96261ed8a5bb605f92eb9c1

This checksum mismatch causes container restore to fail. This can be confirmed by manually editing the rootfsImageID value in config.dump as shown in this video.
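To compare the two values directly, one can extract config.dump from the exported checkpoint and check it against the local image store (a sketch, not from the report; it assumes config.dump sits at the top level of the archive, uses a hypothetical /tmp/chkpt.tar.gz path, and reuses the iperf3 image from the example above):

# Extract config.dump from the checkpoint archive and print the stored rootfsImageID
tar -xzf /tmp/chkpt.tar.gz config.dump
grep -o '"rootfsImageID":[^,]*' config.dump

# Compare against the image ID known to the destination host
podman image inspect --format '{{.Id}}' quay.io/radostin/iperf3:latest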

Steps to reproduce the issue

Steps to reproduce the issue using two VMs (A and B):

  1. Run a container in A, and create a container checkpoint:
podman run -d --name looper busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
podman container checkpoint -l --export=/tmp/chkpt.tar.gz
  2. Transfer the checkpoint from A to B (e.g., using NFS or scp)
  3. Restore the container from the checkpoint in B:
podman container restore --import=/tmp/chkpt.tar.gz

Describe the results you received

Restore fails with the error message above.

Describe the results you expected

Restore should complete successfully.

podman info output

Podman installed as follows on Fedora Rawhide:

dnf copr enable -y rhcontainerbot/podman-next
dnf --repo='copr*' update  -y
dnf install -y podman

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details


Additional information

No response

rst0git added the kind/bug label on Oct 17, 2024
Luap99 (Member) commented Oct 18, 2024

I don't think image IDs are meant to be stable with zstd:chunked, at least from reading @mtrmac's comments. Looking at the pull output, the "(skipped: 224.0b = 0.01%)" part indicates that the image is zstd:chunked compressed, and a skopeo inspect confirms that.
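For reference, one way to see this from the manifest itself (a sketch, not from the thread; it uses skopeo and jq, and assumes the zstd:chunked TOC annotation that containers/storage writes):

# zstd layers carrying the chunked TOC annotation indicate a zstd:chunked image
skopeo inspect --raw docker://quay.io/radostin/iperf3:latest \
  | jq '.layers[] | {mediaType, toc: .annotations["io.github.containers.zstd-chunked.manifest-checksum"]}'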

So overall I think the assumption that we can keep the image ID in the dump and use it to identify images across hosts will no longer hold. Maybe it should instead store the FQDN + digest?
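As an illustration of what could be stored instead (a sketch; quay.io/radostin/iperf3 is simply the image from the report, and the skopeo call assumes a version that supports --format):

# Repository@digest as recorded in the local store
podman image inspect --format '{{index .RepoDigests 0}}' quay.io/radostin/iperf3:latest

# Manifest digest as reported by the registry
skopeo inspect --format '{{.Digest}}' docker://quay.io/radostin/iperf3:latest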

mtrmac (Collaborator) commented Oct 18, 2024

Yes, the code makes a best effort to reuse layers and images (incl. using the traditional values of image IDs), but, depending on the capabilities of the registry the image was pulled from, or on the images already present on the machine, pulls of the same image no longer result in the same image ID.

I agree that using a repo@digest reference is the most natural alternative … except that the RepoDigests values returned by Podman are not reliable.

Luap99 (Member) commented Oct 21, 2024

I agree that using a repo@digest reference is the most natural alternative … except that the RepoDigests values returned by Podman are not reliable.

What does that mean? How can we then identify an image across different hosts if this is not reliable? Are there other ways?


I must say the whole zstd:chunked switch seems like an even more problematic change to me if the meaning of image IDs is changed and if they are no longer deterministic. If seemingly unrelated things like checkpoint/restore break, I wonder how many other things depend on a stable image ID. It doesn't even need to be in podman; there could be external user scripts that we cannot know about.

mtrmac (Collaborator) commented Oct 21, 2024

I agree that using a repo@digest reference is the most natural alternative … except that the RepoDigests values returned by Podman are not reliable.

What does that mean?

After podman pull repoA@digestA; podman pull repoB@digestB, RepoDigests can contain repoA@digestB, which does not exist. It is a fixable design issue (at least for new c/storage instances with no pre-existing images; I'm not sure about the transition from currently-existing stores); we just never got around to fixing it.
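A hypothetical sequence showing the pattern (repoA, repoB, and the digests are placeholders; this illustrates the design issue, it is not guaranteed output):

# Two references that happen to resolve to the same image content
podman pull registry.example.com/repoA@sha256:<digestA>
podman pull registry.example.com/repoB@sha256:<digestB>

# RepoDigests on the resulting image can then contain the cross-product entry
# repoA@<digestB>, a reference that was never pulled and may not exist.
podman image inspect --format '{{.RepoDigests}}' <image-id>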


How can we then identify an image across different hosts if this is not reliable? Are there other ways?

Pulled image IDs should typically be stable when pulled from non-hostile registries with the same capabilities. But the "same capabilities" part can easily not be true (e.g. a mirror is quite likely a different implementation from the upstream registry), and of course the pre-chunked → chunked transition is a one-time ID change.

But then, anyway, there is no standard way to pull an image by image ID. So it seems to me that the snapshot restore should always be preceded by a pull using a repo:tag/repo@digest.
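In practice that would look something like this on the destination host (a sketch; the digest is a placeholder for whatever reference the checkpoint would record):

podman pull quay.io/radostin/iperf3@sha256:<manifest-digest>
podman container restore --import=/tmp/chkpt.tar.gz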

On the third hand, if the image were built by an intentionally malicious producer to be inconsistent in the partial / non-partial views, that could cause a restore to work with “the other” view.


I must say the whole zstd:chunked switch seems like an even more problematic change to me if the meaning of image IDs is changed and if they are no longer deterministic. If seemingly unrelated things like checkpoint/restore break, I wonder how many other things depend on a stable image ID. It doesn't even need to be in podman; there could be external user scripts that we cannot know about.

I agree that it’s unsatisfactory. It’s a natural consequence of retrofitting partial pulls into a system where layers are content-addressed by their full contents, but we explicitly don’t want to read the full contents; that breaks the assumptions of the existing content-addressable identification.

So, now, when we pull "the same" layer from a partial-pull-supporting registry and from a partial-pull-not-supporting registry, we can't deduplicate them, and we must store them separately, with different layer IDs. And that implies different image IDs.

Over the past year or so, I haven’t seen anything to suggest that there is a good alternative. We could do a partial pull, and then re-generate the whole layer to verify the uncompressed digest; then we would be justified in continuing to use the traditional IDs. From a consistency point of view I like that idea, but it would make partial pulls quite a bit more expensive (in CPU and, with a simple implementation, in disk I/O; not in network cost).

Cc: @giuseppe

mtrmac (Collaborator) commented Oct 21, 2024

Some more discussion about options w.r.t. image IDs has happened; recording it here:

Re-generating the whole layer and validating the uncompressed digest would actually be pretty expensive; right now, during partial pulls, we don’t digest reused files, only the newly pulled ones. So, with a 99% reuse, we don’t touch 99% of the files. To validate the digest, we would have to read all of them.

Some other alternatives:

  • Add an option to re-generate the whole layer and verify the digest, so that users who want traditional image IDs can get them and opt into the cost. Downside: one more option, a larger testing matrix, and it's unclear whether we could ever remove that option.
  • Extend partial pulls so that we always use the partial pull mechanism and TOC layer identity for zstd:chunked layers (for registries which don’t support range requests, we would read the whole layer, which is not worse in network cost than not using the partial pull mechanism at all). Then the same manifest digest would always result in the same image ID — but the image ID would not be the traditional one for zstd:chunked images.

For now, the default recommendation is to proceed with the varying image IDs as they are, because returning to traditional image IDs would be way too costly, and the always-TOC alternative would not preserve the traditional image ID and it would force users to deal with an ID change anyway.
