Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle the missing mlmd service-ca cert gracefully #728

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

HumairAK
Copy link
Contributor

@HumairAK HumairAK commented Oct 18, 2024

This will handle the following dspo log message more gracefully:

2024-10-18T16:25:29-04:00       INFO    Applying ML-Metadata (MLMD) Resources   {"namespace": "dspa2", "dspa_name": "sample"}
2024-10-18T16:25:29-04:00       INFO    Updating components endpoints   {"namespace": "dspa2", "dspa_name": "sample"}
2024-10-18T16:25:29-04:00       ERROR   Reconciler error        {"controller": "datasciencepipelinesapplication", "controllerGroup": "datasciencepipelinesapplications.opendatahub.io", "controllerKind": "DataSciencePipelinesApplication", "DataSciencePipelinesApplication": {"name":"sample","namespace":"dspa2"}, "namespace": "dspa2", "name": "sample", "reconcileID": "0a19a4b2-fd85-4094-8be8-c7a57eddd4c3", "error": "secret containing the certificate for MLMD gRPC Server was not created yet"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /home/hukhan/go/1.21.3/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /home/hukhan/go/1.21.3/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /home/hukhan/go/1.21.3/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226

This is an expected state, and should not print a stack trace, but we still want to stacktrace other errors. So I've introduced a custom error for this scenario. In the future we should handle this via states for components, for example "state: loading" or "state: waitingOnDependency", instead of errors, but this should be approached consistently for all components, this pr is a simple fix until then.

Result:

2024-10-18T16:29:32-04:00       INFO    Applying ML-Metadata (MLMD) Resources   {"namespace": "dspa2", "dspa_name": "sample"}
2024-10-18T16:29:32-04:00       INFO    MLMD gRPC Server cert secret not found, this is likely because it has not been created yet      {"namespace": "dspa2", "dspa_name": "sample"}
2024-10-18T16:29:32-04:00       INFO    Updating components endpoints   {"namespace": "dspa2", "dspa_name": "sample"}

Copy link
Contributor

openshift-ci bot commented Oct 18, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from humairak. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

When mlmd is being reconciled, it expects a secret with certs that is
created by the service-ca (when podtopodtls is enabled). This is an
expected outcome, so we should not be reporting stacktraces for this.
This change instead catches this scenario via a custom error for such
lagging dependencies, and logs it at info level without a stack trace.

Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
@dsp-developers
Copy link
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-728

1 similar comment
@dsp-developers
Copy link
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-728

@@ -313,7 +314,14 @@ func (r *DSPAReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.
err = r.ReconcileMLMD(ctx, dspa, params)
if err != nil {
r.setStatusAsNotReady(config.MLMDProxyReady, err, dspaStatus.SetMLMDProxyStatus)
return ctrl.Result{}, err
// TODO: this (and other components) should handle these scenarios via states or statuses instead of error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should link the existing issue here.
https://issues.redhat.com/browse/RHOAIENG-13321

@VaniHaripriya
Copy link
Contributor

/lgtm

@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants