runtime: fail when a poststart hook fails #1262

kolyshkin · 2024-07-17T22:35:44Z

Poststart hooks exist in runc since 2015 1, and since that time until today, if a hook returned an error, runc kills the container.

In 2020, commit c166268 (PR #1008) added the following text (which became part of runtime-spec release v1.0.2):

The poststart MUST be invoked by the runtime. If any
poststart hook fails, the runtime MUST log a warning, but the
remaining hooks and lifecycle continue as if the hook had succeeded.

Now, this text conflicted with the pre-existing runtime (runc) behavior, and it still conflicts with the current runc behavior.

At this point, we can either fix runtimes or the spec.

To my mind, fixing the spec is a better approach, because:

initial implementation predates the spec wording by a few years;
the wording in the spec was never implemented (in runc);
returning an error (and stopping the container) seems like a more versatile approach, since a hook can usually choose whether to return an error or not.

Poststart hooks exist in runc since 2015 [1], and since that time until today, if a hook returned an error, runc kills the container. In 2020, commit c166268 (PR opencontainers#1008) added the following text (which became part of runtime-spec release v1.0.2): > 9. The `poststart` MUST be invoked by the runtime. If any > `poststart` hook fails, the runtime MUST log a warning, but the > remaining hooks and lifecycle continue as if the hook had succeeded. Now, this text conflicted with the pre-existing runtime (runc) behavior, and it still conflicts with the current runc behavior. At this point, we can either fix runtimes or the spec. To my mind, fixing the spec is a better approach, because: - initial implementation predates the spec wording by a few years; - the wording in the spec was never implemented (in runc); - returning an error (and stopping the container) seems like a more versatile approach, since a hook can usually choose whether to return an error or not. [1]: opencontainers/runc#392 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

kolyshkin · 2024-07-17T22:43:47Z

OTOH crun does not kill the container if a poststart hook has failed (and the code is from about 2017). Cc @giuseppe

giuseppe · 2024-07-18T07:00:17Z

the spec behavior is validated by: https://github.com/opencontainers/runtime-tools/blob/master/validation/poststart_fail/poststart_fail.go and that is probably the reason I've implemented it this way for crun.

I've no preference on the two behaviors, with the spec behavior the user has more control to decide when to stop the container at the risk of just not seeing/ignoring the warning. The runc behavior instead makes easier to detect when the poststart hook fails, but it terminates the container process once it is already started (is it always safe to do so?)

EDIT: yes it is safe as we call the hooks before exec'ing the container init process

ningmingxiao · 2024-07-18T07:36:05Z

shall we refer k8s poststart https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/
The Container's status is not set to RUNNING until the postStart handler completes.

rata · 2024-07-18T15:11:59Z

@ningmingxiao I think k8s poststart is unrelated. IIRC the post-start is just the kubelet doing an exec in the container.

kolyshkin · 2024-07-18T15:18:14Z

we call the hooks before exec'ing the container init process

This is another thing to fix in spec. It currently says

The poststart hooks MUST be called after the user-specified process is executed but before the start operation returns.

but practically runc call hooks before exec'ing container init. If crun is doing the same, maybe we should revise the spec.

ningmingxiao · 2024-07-19T01:47:24Z

I make a test
user-specified process is "sleep 10"
post start is "sleep 6"

TIME(s) PCOMM            PID    PPID   RET ARGS
3.459   crun             2626   2534     0 /usr/bin/crun --debug run test119
3.528   sleep            2629   2626     0 /usr/bin/sleep 6
3.531   sleep            2628   2626     0 /bin/sleep 10

crun run poststart after user-specified process. because pid is bigger than 2628

h-vetinari · 2024-07-24T03:52:57Z

While you're taking a look at the whole hook situation, you might also want to take a glimpse at #1039.

abel-von · 2024-08-02T09:35:42Z

I think the same situation also happened in poststop hook #1265

lifubang · 2024-09-04T10:55:26Z

@opencontainers/runtime-spec-maintainers PTAL
Can we come to a conclusion? Thanks.

lifubang · 2024-09-04T11:44:26Z

If crun is doing the same, maybe we should revise the spec.

Quite agree, the other reason is that maybe we can’t know the user-specified process is really executed successfully because of ‘execve’(of cause we can know it fails).

kolyshkin requested review from AkihiroSuda, crosbymichael, cyphar, dqminh, giuseppe, hqhq, mrunalp, thaJeztah, tianon and utam0k as code owners July 17, 2024 22:35

kolyshkin mentioned this pull request Jul 17, 2024

fix runc's poststart behaviour doesn't match the runtime-spec opencontainers/runc#4348

Draft

kolyshkin removed request for mrunalp, giuseppe, thaJeztah and AkihiroSuda July 17, 2024 22:36

This comment was marked as off-topic.

Sign in to view

lifubang mentioned this pull request Aug 2, 2024

The removal of container root directory should be after poststop run successfully. opencontainers/runc#4363

Open

abel-von mentioned this pull request Aug 2, 2024

runtime: fail when a poststop hook fails #1265

Open

lifubang mentioned this pull request Sep 4, 2024

[1.1] fix runc's poststart behaviour doesn't match the runtime-spec opencontainers/runc#4351

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: fail when a poststart hook fails #1262

runtime: fail when a poststart hook fails #1262

kolyshkin commented Jul 17, 2024

kolyshkin commented Jul 17, 2024

This comment was marked as off-topic.

giuseppe commented Jul 18, 2024 •

edited

Loading

ningmingxiao commented Jul 18, 2024 •

edited

Loading

rata commented Jul 18, 2024 •

edited

Loading

kolyshkin commented Jul 18, 2024

ningmingxiao commented Jul 19, 2024 •

edited

Loading

h-vetinari commented Jul 24, 2024

abel-von commented Aug 2, 2024

lifubang commented Sep 4, 2024

lifubang commented Sep 4, 2024

runtime: fail when a poststart hook fails #1262

Are you sure you want to change the base?

runtime: fail when a poststart hook fails #1262

Conversation

kolyshkin commented Jul 17, 2024

kolyshkin commented Jul 17, 2024

This comment was marked as off-topic.

giuseppe commented Jul 18, 2024 • edited Loading

ningmingxiao commented Jul 18, 2024 • edited Loading

rata commented Jul 18, 2024 • edited Loading

kolyshkin commented Jul 18, 2024

ningmingxiao commented Jul 19, 2024 • edited Loading

h-vetinari commented Jul 24, 2024

abel-von commented Aug 2, 2024

lifubang commented Sep 4, 2024

lifubang commented Sep 4, 2024

giuseppe commented Jul 18, 2024 •

edited

Loading

ningmingxiao commented Jul 18, 2024 •

edited

Loading

rata commented Jul 18, 2024 •

edited

Loading

ningmingxiao commented Jul 19, 2024 •

edited

Loading