
Add printing of kubectl error message #714

Open
MrMarvin wants to merge 1 commit into main
Conversation

MrMarvin

What are you trying to accomplish with this PR?
Currently, if an error occurs at the kubectl apply call while applying a manifest, that error gets swallowed by krane: it reports that an error occurred, but not what the error is.

This behaviour makes it practically impossible to debug the underlying issue, for example permission problems on the (service) account used to run krane in a CD environment.

Example:

[INFO][2020-04-20 14:21:46 +0000][staging][someapp-staging]	
[INFO][2020-04-20 14:21:46 +0000][staging][someapp-staging]	------------------------------------------Result: FAILURE-------------------------------------------
[FATAL][2020-04-20 14:21:46 +0000][staging][someapp-staging]	Command failed: apply -f /tmp/d20200420-9481-sf4wgt --prune --all --prune-whitelist\=/v1/configmap --prune-whitelist\=/v1/endpoints --prune-whitelist\=/v1/event --prune-whitelist\=/v1/limitrange --prune-whitelist\=/v1/persistentvolumeclaim --prune-whitelist\=/v1/pod --prune-whitelist\=/v1/podtemplate --prune-whitelist\=/v1/replicationcontroller --prune-whitelist\=/v1/resourcequota --prune-whitelist\=/v1/secret --prune-whitelist\=/v1/serviceaccount --prune-whitelist\=/v1/service --prune-whitelist\=apps/v1/daemonset --prune-whitelist\=apps/v1/deployment --prune-whitelist\=apps/v1/replicaset --prune-whitelist\=apps/v1/statefulset --prune-whitelist\=autoscaling/v2beta1/horizontalpodautoscaler --prune-whitelist\=batch/v1beta1/cronjob --prune-whitelist\=batch/v1/job --prune-whitelist\=certmanager.k8s.io/v1alpha1/certificaterequest --prune-whitelist\=certmanager.k8s.io/v1alpha1/certificate --prune-whitelist\=certmanager.k8s.io/v1alpha1/challenge --prune-whitelist\=certmanager.k8s.io/v1alpha1/issuer --prune-whitelist\=certmanager.k8s.io/v1alpha1/order --prune-whitelist\=cloud.google.com/v1beta1/backendconfig --prune-whitelist\=crd.projectcalico.org/v1/networkpolicy --prune-whitelist\=extensions/v1beta1/ingress --prune-whitelist\=internal.autoscaling.k8s.io/v1alpha1/capacityrequest --prune-whitelist\=networking.gke.io/v1beta1/managedcertificate --prune-whitelist\=nodemanagement.gke.io/v1alpha1/updateinfo --prune-whitelist\=policy/v1beta1/poddisruptionbudget --prune-whitelist\=rbac.authorization.k8s.io/v1/rolebinding --prune-whitelist\=rbac.authorization.k8s.io/v1/role --prune-whitelist\=scalingpolicy.kope.io/v1alpha1/scalingpolicy
[FATAL][2020-04-20 14:21:46 +0000][staging][someapp-staging]	

Not helpful: even if a user had access to the system/environment to replicate the kubectl apply call manually, krane has already cleaned up the temp dir by that point.

With this change:

[INFO][2020-04-20 15:44:40 +0000][staging][someapp-staging]	------------------------------------------Result: FAILURE-------------------------------------------
[FATAL][2020-04-20 15:44:40 +0000][staging][someapp-staging]	Command failed: apply -f /tmp/d20200420-757-1v2j9wa --prune --all --prune-whitelist\=/v1/configmap --prune-whitelist\=/v1/endpoints --prune-whitelist\=/v1/event --prune-whitelist\=/v1/limitrange --prune-whitelist\=/v1/persistentvolumeclaim --prune-whitelist\=/v1/pod --prune-whitelist\=/v1/podtemplate --prune-whitelist\=/v1/replicationcontroller --prune-whitelist\=/v1/resourcequota --prune-whitelist\=/v1/secret --prune-whitelist\=/v1/serviceaccount --prune-whitelist\=/v1/service --prune-whitelist\=apps/v1/daemonset --prune-whitelist\=apps/v1/deployment --prune-whitelist\=apps/v1/replicaset --prune-whitelist\=apps/v1/statefulset --prune-whitelist\=autoscaling/v2beta1/horizontalpodautoscaler --prune-whitelist\=batch/v1beta1/cronjob --prune-whitelist\=batch/v1/job --prune-whitelist\=certmanager.k8s.io/v1alpha1/certificaterequest --prune-whitelist\=certmanager.k8s.io/v1alpha1/certificate --prune-whitelist\=certmanager.k8s.io/v1alpha1/challenge --prune-whitelist\=certmanager.k8s.io/v1alpha1/issuer --prune-whitelist\=certmanager.k8s.io/v1alpha1/order --prune-whitelist\=cloud.google.com/v1beta1/backendconfig --prune-whitelist\=crd.projectcalico.org/v1/networkpolicy --prune-whitelist\=extensions/v1beta1/ingress --prune-whitelist\=internal.autoscaling.k8s.io/v1alpha1/capacityrequest --prune-whitelist\=networking.gke.io/v1beta1/managedcertificate --prune-whitelist\=nodemanagement.gke.io/v1alpha1/updateinfo --prune-whitelist\=policy/v1beta1/poddisruptionbudget --prune-whitelist\=rbac.authorization.k8s.io/v1/rolebinding --prune-whitelist\=rbac.authorization.k8s.io/v1/role --prune-whitelist\=scalingpolicy.kope.io/v1alpha1/scalingpolicy:
error: error pruning namespaced object networking.gke.io/v1beta1, kind=managedcertificate: managedcertificates.networking.gke.io is forbidden: user "someserviceaccount@shopify-someapp-staging.iam.gserviceaccount.com" cannot list resource "managedcertificates" in api group "networking.gke.io" in the namespace "someapp-staging"
[FATAL][2020-04-20 15:44:40 +0000][staging][someapp-staging]	

How is this accomplished?
Add the collected error to the message presented to the user.
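For illustration, the idea is roughly the following Ruby sketch, assuming a helper that shells out to kubectl (the method and variable names are made up for this example, not krane's actual internals): capture kubectl's stderr and append it to the failure message instead of discarding it.

```ruby
require "open3"

# Hypothetical sketch only: run `kubectl apply` and include the captured
# stderr in the error raised to the user, so the underlying cause
# (e.g. an RBAC "forbidden" error) shows up in the deploy log.
def apply_manifests(dir)
  cmd = ["kubectl", "apply", "-f", dir, "--prune", "--all"]
  out, err, status = Open3.capture3(*cmd)
  return out if status.success?

  raise "Command failed: #{cmd.join(' ')}:\n#{err}"
end
```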

What could go wrong?
Leaking sensitive information, maybe, somehow, eventually?

@MrMarvin MrMarvin requested a review from a team as a code owner April 21, 2020 09:16
@timothysmith0609
Contributor

Thanks for your contribution!

Error reporting is tricky in Krane, since we need to avoid the scenario where we leak secret information (this can happen if kubectl apply fails and dumps an unencrypted secret resource, for example). Krane tries to be smart about this, only suppressing output when there is a risk of exposing sensitive information, but there are fundamental difficulties that constrain our ability to guarantee correct behaviour. This logic exists in record_apply_failure (the line preceding your change). If you are not deploying any sensitive content, then the solution here is to figure out why output is being suppressed in the first place. Can you confirm whether or not output should be suppressed in your stated case? If not, we can explore that.

@MrMarvin
Author

Thanks @timothysmith0609, this helps clarify the situation.
While I agree that filtering secrets out of downstream kubectl output is a fundamentally hard problem, I do not completely agree with the assumptions made.
Not potentially exposing secrets has to be weighed against blocking users from using krane at all. In this particular case, the only option left would be to ditch krane and fall back to managing kubectl apply manually again (shell script or whatever), including the ability to shoot yourself in the foot by leaking secrets. As it stands, it is unusable.

We should at least

  1. Clearly state that additional error context is available but withheld for security reasons.
  2. Offer a way to get that output, or allow an administrator to re-run the exact same kubectl command manually, i.e. not wipe the temp dirs in the error case (rough sketch below).
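A rough sketch of both suggestions, using made-up names rather than anything from krane itself: clean up the rendered manifests only on success, and on failure say explicitly that more context exists but was withheld, and where the manifests were kept.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical sketch: keep the temp dir around on failure so an administrator
# can re-run the exact same `kubectl apply -f <dir>` by hand.
def with_manifest_dir
  dir = Dir.mktmpdir("krane")
  yield dir
  FileUtils.remove_entry(dir) # only reached on success
rescue StandardError
  warn "Apply failed. kubectl output was withheld for security reasons; " \
       "rendered manifests kept at #{dir} for manual inspection."
  raise
end
```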

> Can you confirm whether or not output should be suppressed in your stated case? If not, we can explore that.

To be honest, I don't know. We do deploy a secrets.ejson alongside our resource files with krane, so I guess yes? But with my understanding of how krane splits all resources into multiple steps, this particular step's only reference to 'secret' is in its prune whitelists.
The output I posted above is complete and unmodified, and it does not contain anything sensitive. So maybe not?
