Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend Step to surface whether a manifest is optional #2552

Closed
wants to merge 1 commit into from

Conversation

fgiloux
Copy link
Contributor

@fgiloux fgiloux commented Jan 6, 2022

Signed-off-by: Frederic Giloux fgiloux@redhat.com

Description of the change:

This pull request adds some logic for avoiding failure of the InstallPlan when a manifest provided by a bundle cannot be installed and has been flagged as optional through a bundle property. Here is the enhancement proposal

This PR relies on two other PRs for the api and the operator-registry repositories.
operator-framework/api#209
operator-framework/operator-registry#893

I was not too sure how to proceed in such a case and have created the PRs at roughly the same time. It also means that for the code in this PR to work it needs to have the code of the other PRs available. For my tests I used "replace" pointing to a local repository in go.mod

Motivation for the change:

VerticalPodAutoscaler, ServiceMonitors for instances are custom resources and may not be available on all clusters. Operator authors may want to take benefit of these mechanisms for clusters having them but not having the installation fail on clusters without them as they may be seen as nice to have. This is what allows this PR.

See the enhancement proposal for details.

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 6, 2022
@openshift-ci
Copy link

openshift-ci bot commented Jan 6, 2022

Hi @fgiloux. Thanks for your PR.

I'm waiting for a operator-framework member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@timflannagan
Copy link
Contributor

I was not too sure how to proceed in such a case and have created the PRs at roughly the same time. It also means that for the code in this PR to work it needs to have the code of the other PRs available. For my tests I used "replace" pointing to a local repository in go.mod

You can try and cut a release in your API fork and point the go.mod to your fork using the replace directive, and we can make a TODO comment that we remove that replace pin before merging this PR.

@fgiloux fgiloux force-pushed the master branch 2 times, most recently from 48d5a35 to 8820bd9 Compare January 7, 2022 10:36
@openshift-ci
Copy link

openshift-ci bot commented Jan 7, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fgiloux
To complete the pull request process, please assign perdasilva after the PR has been reviewed.
You can assign the PR to them by writing /assign @perdasilva in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fgiloux fgiloux force-pushed the master branch 2 times, most recently from 78e7735 to e9c0772 Compare January 7, 2022 11:05
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 9, 2022
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 11, 2022
Copy link
Contributor

@timflannagan timflannagan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we separate the go.mod + vendor changes into their own commit, so viewing the full diff is a bit easier? One of these days we'll remove the vendor directory entirely from OLM, so this wouldn't be needed.

@fgiloux
Copy link
Contributor Author

fgiloux commented Jan 14, 2022

/test unit

@openshift-ci
Copy link

openshift-ci bot commented Jan 14, 2022

@fgiloux: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test unit

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fgiloux
Copy link
Contributor Author

fgiloux commented Jan 14, 2022

Can we separate the go.mod + vendor changes into their own commit, so viewing the full diff is a bit easier? One of these days we'll remove the vendor directory entirely from OLM, so this wouldn't be needed.

I have separated the commits and done a rebase

@fgiloux
Copy link
Contributor Author

fgiloux commented Jan 18, 2022

Hi @timflannagan any further comment on this pull request?

@timflannagan
Copy link
Contributor

@fgiloux I'll try and get back to this sometime this week but I'm fairly tied up today. It looks like this PR is failing some of the linting and generation checks? You may need to fix the static validation errors locally and push up the results to make the checks happy.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 18, 2022
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 18, 2022
@fgiloux
Copy link
Contributor Author

fgiloux commented Jan 18, 2022

@timflannagan after the API rebase we now have: the API depending on k8s.io/api v0.23.0 whereas OLM was on k8s.io/api v0.22.2 and breaks when the dependency is bumped. This was introduced by: operator-framework/api@12a70c0
How do we move forward?

go.mod Outdated
)

replace (
// controller runtime
github.com/openshift/api => github.com/openshift/api v0.0.0-20211014063134-be2a7fb8aa44
github.com/openshift/client-go => github.com/openshift/client-go v0.0.0-20200326155132-2a6cd50aedd0 // release-4.5

// Temporary for optional manifests before the changes get merged in the API and operator-registry repo
github.com/operator-framework/api v0.11.1 => github.com/fgiloux/api v0.10.8-0.20220118162926-a11336b43231
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can likely update this now to pick up the new commit through a psuedoversion tag, or reach out to the team-operator-fw slack channel for cutting a new o-f/api minor version release.

@timflannagan
Copy link
Contributor

@fgiloux It looks like the CI failures are complaining about various breaking changes that happen in the v0.23.x kube dependencies. We can either tackle bumping OLM's kube dependencies, and making the requisite changes to the codebase in another PR, or update the root go.mod and utilize the replace stanza to ensure we're using the v0.22.x kube dependencies. WYDT?

@fgiloux
Copy link
Contributor Author

fgiloux commented Jan 19, 2022

It looks like the CI failures are complaining about various breaking changes that happen in the v0.23.x kube dependencies.

Yes it is what I was saying earlier.

We can either tackle bumping OLM's kube dependencies, and making the requisite changes to the codebase in another PR, or update the root go.mod and utilize the replace stanza to ensure we're using the v0.22.x kube dependencies. WYDT?

I think that bumping OLM dependencies is required by any way, not just for this PR but for anything that needs to import a newer version of the API and for compatibility with newer Kubernetes versions. I will open an issue and start looking at it.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 28, 2022
@fgiloux fgiloux force-pushed the master branch 2 times, most recently from 6a991a4 to 3162316 Compare February 3, 2022 11:03
@timflannagan
Copy link
Contributor

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 3, 2022
@fgiloux
Copy link
Contributor Author

fgiloux commented Feb 3, 2022

Hi @timflannagan based on o-f api v0.13.0 all e2e tests passed except 1. I ran it locally and it was successful.
One thing is that I had to downgrade grpc version to v1.40.0 in operator-registry due to the issue you identified.
Can you please take a look?

@timflannagan timflannagan linked an issue Feb 11, 2022 that may be closed by this pull request
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 26, 2022
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 22, 2022
@fgiloux fgiloux force-pushed the master branch 3 times, most recently from b444a91 to 25a9c6d Compare March 22, 2022 09:26
@fgiloux
Copy link
Contributor Author

fgiloux commented Mar 22, 2022

@timflannagan @joelanford could you please take a look?
I have rerun the failing e2e tests locally.
"can satisfy an unassociated ClusterServiceVersion's non-ownership requirement" was successful.
The other two were not: "should surface components in its status" and "should automatically adopt components". I cloned the master branch and run the same tests without my modifications with the same results.

This latest version applies changes requested by Joe:

  • the key for identifying an optional manifest is now: group/kind/namespace/name. It works around the issue that the filename may not be available. This also means that no change to operator-registry is required anymore.
    An example of a property file:
$ cat bundle/metadata/properties.yaml 
properties:
- type: 'olm.manifests.optional'
  value:
    manifests:
    - 'monitoring.coreos.com/PrometheusRule//rule'

Note that for a cluster scoped resource or a resource where the namespace is not specified the part of the key should be left empty, hence //. The key is case sensitive.

Copy link
Member

@dinhxuanvu dinhxuanvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed and added comments.

@@ -230,3 +284,39 @@ func NewServiceAccountStepResources(csv *v1alpha1.ClusterServiceVersion, catalog
}
return rbacSteps, nil
}

type optionalManifests struct {
Files []string `json:"manifests"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it supposed to be json:"files" according to the EP?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per previous review and comment below: this is implemented the change that was requested


// isOptional returns a func giving whether a resource is in the list of optional manifests
// or false if any issue occurs.
func isOptional(properties []*api.Property) func(key string) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This func is doing operation that is not matching to what is described in the EP. In the EP, it is file-name-based identifier but for this implementation, this is a domain-based identifier. If this implementation is the intended choice, then the EP should be updated so that reviewers can look back to EP for references. Otherwise, this is very confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back at a previous comment, it seems the domain-based identifier is a change that you intended to do. I think an update PR for the EP is in order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dinhxuanvu yes this is a change. It was requested by @joelanford during review. I will create an update PR for the EP but wanted to avoid too many round trips. We have already had a fair amount here.

}
steps = append(steps, operatorServiceAccountSteps...)
return steps, nil
saNamespaces := []string{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit unclear for why we need the namespace here. The catalogsource namespace is used because that where the bundle is unpacked but using it for identifier is really a safe choice? What happens if multiple operators have the same resources and all in the same catalogsource?

Copy link
Contributor Author

@fgiloux fgiloux Mar 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of an edge case and I am open to suggestions. The original intent of the enhancement is to allow operator installation when APIs of "optional resources" are not available.
ServiceAccounts are core resources and as such their API will always be available.
Wanting to make things consistent I kept what was populated as namespace in the steps for them. Another option would be to keep the namespace empty. Do you think something else would make more sense?

if step.Optional {
switch k8serrors.ReasonForError(err) {
case
metav1.StatusReasonUnauthorized,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we handle the case of StatusReasonAlreadyExists somewhere? Or do we need to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, this deviates from the EP. There should not be an existing resource (I don't see a reason for it) with same namespace/name when the operator gets installed. Failing the InstallPlan would give the opportunity to hte cluster admin to remove it and retry the installation. This would guarantee that the available resource is the intended one. In that case I would update the EP and align it with the implementation. Let me know if you think that it should be done differently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the EP. Having had a closer look, StatusReasonAlreadyExists should not pop up as the logic in the called function is to try an update after create if the resource already exists.

@fgiloux
Copy link
Contributor Author

fgiloux commented Mar 30, 2022

pushed a rebase and a fix for field and var names in unit tests that were not reflecting the new approach

@fgiloux
Copy link
Contributor Author

fgiloux commented Mar 31, 2022

Created a PR to update the enhancement proposal with the new approach

@fgiloux fgiloux force-pushed the master branch 2 times, most recently from 9fc127e to d50bafa Compare April 3, 2022 15:46
Signed-off-by: Frederic Giloux <fgiloux@redhat.com>
@openshift-ci
Copy link

openshift-ci bot commented Apr 15, 2022

@fgiloux: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 15, 2022
@perdasilva
Copy link
Collaborator

closing as we we'll likely want to put this effort into v1 instead

@perdasilva perdasilva closed this Feb 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optional OLM supported resources
4 participants