Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature request: allow overriding command without modifying checks config file #62

Open
jameslamb opened this issue May 28, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@jameslamb
Copy link
Member

Description

container-canary should support setting the command: for running an image without modifying the file containing validation checks.

Benefits of this work

Would separate details of how the validation checks are run from what is being validated, making it easier to share validation manifests across many different images. See "Motivation" below for details.

Acceptance Criteria

  • it is possible to set / override the command: used when container-canary starts up a container without modifying the file that check configurations are stored in

Approach

This might be accomplished by adding new command-line arguments to container-canary, like --cmd to set the command.

canary validate \
    --file tests.yaml \
    --cmd "sh -c 'sleep 9999'" \
    ${IMAGE_URI}

The more Kubernetes-thonic (is that a word?) but also more difficult to implement approach would be to allow providing multiple files and merging them all together.

For example, something like this:

canary validate \
    --file https://raw.githubusercontent.com/NVIDIA/container-canary/main/examples/kubeflow.yaml \
    --file ./override-command.yaml \
    ubuntu:22.04

Where override-command.yaml just contains something like this:

command:
  - /bin/sh
  - -c
  - "sleep 1234"

And where the implication is that later files are patched on top of earlier files.

Following the way that some of these other tools work:

Notes

Motivation

container-canary starts up a container based on the image passed to it, then run checks inside that running container.

If that image's default CMD doesn't result in starting a process (e.g. like a webserver), then container-canary will fail to run on it.

Consider the following:

canary validate \
    --file https://raw.githubusercontent.com/NVIDIA/container-canary/main/examples/kubeflow.yaml \
    ubuntu:22.04

That kubeflow.yaml file doesn't have a command: entry, so the container will run whatever the default CMD is on the image.

In this case, it's just a shell.

docker inspect ubuntu:22.04 \
| jq '.[0].ContainerConfig.Cmd'
[
  "/bin/sh",
  "-c",
  "#(nop) ",
  "CMD [\"/bin/bash\"]"
]

And as a result, container-canary fails to run any checks.

Error: container failed to start

In the current state of container-canary, you have the following options:

  1. change the default CMD to something long-running, like python -m http.server 80
  2. copy the contents of that remote config and add a command: block tacking on some long-running command

If instead you could override just the command: but still reuse the remote config file, then that one config could be the source of truth for something like "what characteristics does a valid Kubeflow notebook image have" for many different images spread over many repositories, without them needing to manually hold their own copies of it or add unnecessary CMD entries in their images just for the sake of this validation.

@jameslamb jameslamb added the enhancement New feature or request label May 28, 2024
@jameslamb jameslamb changed the title feature request: allow command to be provided from command-line arguments feature request: allow overriding command without modifying checks config file May 28, 2024
@jacobtomlinson
Copy link
Member

This seems totally reasonable to me. Thanks for taking the time to think the UX through.

And where the implication is that later files are patched on top of earlier files.

I don't think this would be my default assumption. I would expect multiple files to just add more checks with potentially a different command and other config (we might want to implement this though!).

Another potential API might be to match the docker run and kubectl exec behavior and allow you to specify a command after a --.

canary validate --file tests.yaml ${IMAGE_URI} -- sh -c 'sleep 9999'

@jameslamb
Copy link
Member Author

I would expect multiple files to just add more checks

You know, I started this because I specifically was interested in overriding the command, but it doesn't have to be limited to that. Thinking about this comment, I like the "provide multiple files and they get resolved to a single set of configuration" even more.

There could be rules for resolving fields based on the nature of the field, just like Helm and Flux do. So for example:

  • command: = later values overwrite earlier values
  • checks: = lists are merged
  • ports: = lists are merged
  • volumes: = lists are merged
  • env: = lists are merged

And to be determined whether "lists are merged" means recursively... I could see a case for, for example, just wanting to update the timeoutSeconds of a probe provided by another config.

@jacobtomlinson
Copy link
Member

I'm not sure I would expect anything to be merged though. I would probably expect each file to be run independently in sequence. For example if I provided a file for binder and for kubeflow I would expect it to run both, not merge them into some binder/kubeflow hybrid.

@jameslamb
Copy link
Member Author

For example if I provided a file for binder and for kubeflow I would expect it to run both, not merge them into some binder/kubeflow hybrid

If container-canary did something like the manifest-merging I proposed, it wouldn't prevent you from running separate things separately.

Those would just be separate invocations of the tool, like

# do only the binder stuff
container-canary --file binder.yaml

# do only the kubeflow stuff
container-canary --file kubeflow.yaml

I'm thinking about this manifest-merging idea specifically as a way to get more coverage from the pattern of "one location holds the source of truth for what must be true for an image to be compliant with {deployment_pattern}, and that's re-used by many other repos".

For example, let's look at this kubeflow example:

- name: NB_PREFIX
description: 🧭 Correctly routes the NB_PREFIX
probe:
httpGet:
path: /hub/jovyan/lab
port: 8888
failureThreshold: 30

The 30-second timeout isn't really a part of the Kubeflow requirements, is it? I believe the main requirement is just that a successful response is eventually returned on requests to that URL.

What if, because of some feature of my image, it sometimes takes 31 seconds to get a response on /hub/jovyan/lab:8888?

In the current state of container-canary, I'd have to do one of the following:

  • refactor my image to try to bring that response time down
  • copy the entirety of the kubeflow.yaml and modify my local copy of it
    • or achieve the same effect by carrying around a path and applying that patch

It'd be a shame if I ended up copying the whole kubeflow.yaml and vendoring it in my repo... that means that if here in NVIDIA/container-canary we update that example in response to some changes in Kubeflow or learning something new about Kubeflow, my repo wouldn't automatically get the benefit of that knowledge.


All that said... my primary reason for opening this, and the thing I most immediately want to test the RAPIDS images, is customizing the command.

I like the proposed idea from #62 (comment) of using -- to pass a command. If I try to put together a PR implementing just that, would you consider it?

@jacobtomlinson
Copy link
Member

jacobtomlinson commented Jun 18, 2024

I think my primary concern is user confusion. I totally hear the logic of what you're saying, but it sounds a lot like what users in the docker ecosystem would expect.

For example docker compose merges manifests in the way you describe.

$ docker compose -f compose.yml -f compose.override.yml up

However if you look at Kubernetes tooling like kubectl then manifests are applied independently and sequentially.

$ kubectl apply -f foo.yaml -f bar.yaml
pod/foo created
pod/bar created

To achieve overriding things in the Kubernetes space you would use kustomize instead to merge manifests ahead of them being passed to kubectl. Or at runtime using the -k flag instead of the -f flag.

Given the goal of canary is to fit into the Kubernetes ecosystem I would argue that we should be leaning more down the kubectl/kustomize path rather than the docker path.


If I try to put together a PR implementing [the -- syntax], would you consider it?

Absolutely. Given that kubectl behaves this way it makes total sense to implement this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants