Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: fail when a poststart hook fails #1262

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kolyshkin
Copy link
Contributor

Poststart hooks exist in runc since 2015 1, and since that time until today, if a hook returned an error, runc kills the container.

In 2020, commit c166268 (PR #1008) added the following text (which became part of runtime-spec release v1.0.2):

  1. The poststart MUST be invoked by the runtime. If any
    poststart hook fails, the runtime MUST log a warning, but the
    remaining hooks and lifecycle continue as if the hook had succeeded.

Now, this text conflicted with the pre-existing runtime (runc) behavior, and it still conflicts with the current runc behavior.

At this point, we can either fix runtimes or the spec.

To my mind, fixing the spec is a better approach, because:

  • initial implementation predates the spec wording by a few years;
  • the wording in the spec was never implemented (in runc);
  • returning an error (and stopping the container) seems like a more versatile approach, since a hook can usually choose whether to return an error or not.

Poststart hooks exist in runc since 2015 [1], and since that time until
today, if a hook returned an error, runc kills the container.

In 2020, commit c166268 (PR opencontainers#1008) added the following text
(which became part of runtime-spec release v1.0.2):

> 9. The `poststart` MUST be invoked by the runtime. If any
> `poststart` hook fails, the runtime MUST log a warning, but the
> remaining hooks and lifecycle continue as if the hook had succeeded.

Now, this text conflicted with the pre-existing runtime (runc) behavior,
and it still conflicts with the current runc behavior.

At this point, we can either fix runtimes or the spec.

To my mind, fixing the spec is a better approach, because:
 - initial implementation predates the spec wording by a few years;
 - the wording in the spec was never implemented (in runc);
 - returning an error (and stopping the container) seems like a more
   versatile approach, since a hook can usually choose whether to
   return an error or not.

[1]: opencontainers/runc#392

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor Author

OTOH crun does not kill the container if a poststart hook has failed (and the code is from about 2017). Cc @giuseppe

lifubang

This comment was marked as off-topic.

@giuseppe
Copy link
Member

giuseppe commented Jul 18, 2024

the spec behavior is validated by: https://github.com/opencontainers/runtime-tools/blob/master/validation/poststart_fail/poststart_fail.go and that is probably the reason I've implemented it this way for crun.

I've no preference on the two behaviors, with the spec behavior the user has more control to decide when to stop the container at the risk of just not seeing/ignoring the warning. The runc behavior instead makes easier to detect when the poststart hook fails, but it terminates the container process once it is already started (is it always safe to do so?)

EDIT: yes it is safe as we call the hooks before exec'ing the container init process

@ningmingxiao
Copy link

ningmingxiao commented Jul 18, 2024

shall we refer k8s poststart https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/
The Container's status is not set to RUNNING until the postStart handler completes.

@rata
Copy link
Member

rata commented Jul 18, 2024

@ningmingxiao I think k8s poststart is unrelated. IIRC the post-start is just the kubelet doing an exec in the container.

@kolyshkin
Copy link
Contributor Author

we call the hooks before exec'ing the container init process

This is another thing to fix in spec. It currently says

The poststart hooks MUST be called after the user-specified process is executed but before the start operation returns.

but practically runc call hooks before exec'ing container init. If crun is doing the same, maybe we should revise the spec.

@ningmingxiao
Copy link

ningmingxiao commented Jul 19, 2024

I make a test
user-specified process is "sleep 10"
post start is "sleep 6"

TIME(s) PCOMM            PID    PPID   RET ARGS
3.459   crun             2626   2534     0 /usr/bin/crun --debug run test119
3.528   sleep            2629   2626     0 /usr/bin/sleep 6
3.531   sleep            2628   2626     0 /bin/sleep 10

crun run poststart after user-specified process. because pid is bigger than 2628

@h-vetinari
Copy link
Contributor

While you're taking a look at the whole hook situation, you might also want to take a glimpse at #1039.

@abel-von
Copy link

abel-von commented Aug 2, 2024

I think the same situation also happened in poststop hook #1265

@lifubang
Copy link
Member

lifubang commented Sep 4, 2024

@opencontainers/runtime-spec-maintainers PTAL
Can we come to a conclusion? Thanks.

@lifubang
Copy link
Member

lifubang commented Sep 4, 2024

If crun is doing the same, maybe we should revise the spec.

Quite agree, the other reason is that maybe we can’t know the user-specified process is really executed successfully because of ‘execve’(of cause we can know it fails).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants