feature request: allow overriding command without modifying checks config file #62

jameslamb · 2024-05-28T22:30:50Z

Description

container-canary should support setting the command: for running an image without modifying the file containing validation checks.

Benefits of this work

Would separate details of how the validation checks are run from what is being validated, making it easier to share validation manifests across many different images. See "Motivation" below for details.

Acceptance Criteria

it is possible to set / override the command: used when container-canary starts up a container without modifying the file that check configurations are stored in

Approach

This might be accomplished by adding new command-line arguments to container-canary, like --cmd to set the command.

canary validate \
    --file tests.yaml \
    --cmd "sh -c 'sleep 9999'" \
    ${IMAGE_URI}

The more Kubernetes-thonic (is that a word?) but also more difficult to implement approach would be to allow providing multiple files and merging them all together.

For example, something like this:

canary validate \
    --file https://raw.githubusercontent.com/NVIDIA/container-canary/main/examples/kubeflow.yaml \
    --file ./override-command.yaml \
    ubuntu:22.04

Where override-command.yaml just contains something like this:

command:
  - /bin/sh
  - -c
  - "sleep 1234"

And where the implication is that later files are patched on top of earlier files.

Following the way that some of these other tools work:

helm values files: https://helm.sh/docs/chart_template_guide/values_files/
Kustomize overlays: https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/#bases-and-overlays

Notes

Motivation

container-canary starts up a container based on the image passed to it, then run checks inside that running container.

If that image's default CMD doesn't result in starting a process (e.g. like a webserver), then container-canary will fail to run on it.

Consider the following:

canary validate \
    --file https://raw.githubusercontent.com/NVIDIA/container-canary/main/examples/kubeflow.yaml \
    ubuntu:22.04

That kubeflow.yaml file doesn't have a command: entry, so the container will run whatever the default CMD is on the image.

In this case, it's just a shell.

docker inspect ubuntu:22.04 \
| jq '.[0].ContainerConfig.Cmd'

[
  "/bin/sh",
  "-c",
  "#(nop) ",
  "CMD [\"/bin/bash\"]"
]

And as a result, container-canary fails to run any checks.

Error: container failed to start

In the current state of container-canary, you have the following options:

change the default CMD to something long-running, like python -m http.server 80
copy the contents of that remote config and add a command: block tacking on some long-running command

If instead you could override just the command: but still reuse the remote config file, then that one config could be the source of truth for something like "what characteristics does a valid Kubeflow notebook image have" for many different images spread over many repositories, without them needing to manually hold their own copies of it or add unnecessary CMD entries in their images just for the sake of this validation.

The text was updated successfully, but these errors were encountered:

jacobtomlinson · 2024-06-03T09:59:27Z

This seems totally reasonable to me. Thanks for taking the time to think the UX through.

And where the implication is that later files are patched on top of earlier files.

I don't think this would be my default assumption. I would expect multiple files to just add more checks with potentially a different command and other config (we might want to implement this though!).

Another potential API might be to match the docker run and kubectl exec behavior and allow you to specify a command after a --.

canary validate --file tests.yaml ${IMAGE_URI} -- sh -c 'sleep 9999'

jameslamb · 2024-06-04T14:16:26Z

I would expect multiple files to just add more checks

You know, I started this because I specifically was interested in overriding the command, but it doesn't have to be limited to that. Thinking about this comment, I like the "provide multiple files and they get resolved to a single set of configuration" even more.

There could be rules for resolving fields based on the nature of the field, just like Helm and Flux do. So for example:

command: = later values overwrite earlier values
checks: = lists are merged
ports: = lists are merged
volumes: = lists are merged
env: = lists are merged

And to be determined whether "lists are merged" means recursively... I could see a case for, for example, just wanting to update the timeoutSeconds of a probe provided by another config.

jacobtomlinson · 2024-06-04T15:17:53Z

I'm not sure I would expect anything to be merged though. I would probably expect each file to be run independently in sequence. For example if I provided a file for binder and for kubeflow I would expect it to run both, not merge them into some binder/kubeflow hybrid.

jameslamb · 2024-06-17T19:56:55Z

For example if I provided a file for binder and for kubeflow I would expect it to run both, not merge them into some binder/kubeflow hybrid

If container-canary did something like the manifest-merging I proposed, it wouldn't prevent you from running separate things separately.

Those would just be separate invocations of the tool, like

# do only the binder stuff
container-canary --file binder.yaml

# do only the kubeflow stuff
container-canary --file kubeflow.yaml

I'm thinking about this manifest-merging idea specifically as a way to get more coverage from the pattern of "one location holds the source of truth for what must be true for an image to be compliant with {deployment_pattern}, and that's re-used by many other repos".

For example, let's look at this kubeflow example:

container-canary/examples/kubeflow.yaml

Lines 46 to 52 in c473896

    
           - name: NB_PREFIX 
        
             description: 🧭 Correctly routes the NB_PREFIX 
        
             probe: 
        
               httpGet: 
        
                 path: /hub/jovyan/lab 
        
                 port: 8888 
        
               failureThreshold: 30

The 30-second timeout isn't really a part of the Kubeflow requirements, is it? I believe the main requirement is just that a successful response is eventually returned on requests to that URL.

What if, because of some feature of my image, it sometimes takes 31 seconds to get a response on /hub/jovyan/lab:8888?

In the current state of container-canary, I'd have to do one of the following:

refactor my image to try to bring that response time down
copy the entirety of the kubeflow.yaml and modify my local copy of it
- or achieve the same effect by carrying around a path and applying that patch

It'd be a shame if I ended up copying the whole kubeflow.yaml and vendoring it in my repo... that means that if here in NVIDIA/container-canary we update that example in response to some changes in Kubeflow or learning something new about Kubeflow, my repo wouldn't automatically get the benefit of that knowledge.

All that said... my primary reason for opening this, and the thing I most immediately want to test the RAPIDS images, is customizing the command.

I like the proposed idea from #62 (comment) of using -- to pass a command. If I try to put together a PR implementing just that, would you consider it?

jacobtomlinson · 2024-06-18T11:07:42Z

I think my primary concern is user confusion. I totally hear the logic of what you're saying, but it sounds a lot like what users in the docker ecosystem would expect.

For example docker compose merges manifests in the way you describe.

$ docker compose -f compose.yml -f compose.override.yml up

However if you look at Kubernetes tooling like kubectl then manifests are applied independently and sequentially.

$ kubectl apply -f foo.yaml -f bar.yaml
pod/foo created
pod/bar created

To achieve overriding things in the Kubernetes space you would use kustomize instead to merge manifests ahead of them being passed to kubectl. Or at runtime using the -k flag instead of the -f flag.

Given the goal of canary is to fit into the Kubernetes ecosystem I would argue that we should be leaning more down the kubectl/kustomize path rather than the docker path.

If I try to put together a PR implementing [the -- syntax], would you consider it?

Absolutely. Given that kubectl behaves this way it makes total sense to implement this.

jameslamb added the enhancement New feature or request label May 28, 2024

jameslamb changed the title ~~feature request: allow command to be provided from command-line arguments~~ feature request: allow overriding command without modifying checks config file May 28, 2024

jameslamb mentioned this issue Jul 8, 2024

[FEA] add tests on image characteristics rapidsai/docker#667

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feature request: allow overriding command without modifying checks config file #62

feature request: allow overriding command without modifying checks config file #62

jameslamb commented May 28, 2024

jacobtomlinson commented Jun 3, 2024

jameslamb commented Jun 4, 2024

jacobtomlinson commented Jun 4, 2024

jameslamb commented Jun 17, 2024

jacobtomlinson commented Jun 18, 2024 •

edited

Loading

feature request: allow overriding command without modifying checks config file #62

feature request: allow overriding command without modifying checks config file #62

Comments

jameslamb commented May 28, 2024

Description

Benefits of this work

Acceptance Criteria

Approach

Notes

Motivation

jacobtomlinson commented Jun 3, 2024

jameslamb commented Jun 4, 2024

jacobtomlinson commented Jun 4, 2024

jameslamb commented Jun 17, 2024

jacobtomlinson commented Jun 18, 2024 • edited Loading

jacobtomlinson commented Jun 18, 2024 •

edited

Loading