Netpols block kubeapi in long lived EKS cluster #821

Open
ntwkninja opened this issue Sep 25, 2024 · 3 comments
Labels
possible-bug Something may not be working

Comments

ntwkninja (Member) commented Sep 25, 2024

Environment

Device and OS: Bottlerocket
App version: 1.30
Kubernetes distro being used: AWS EKS
Other:

Steps to reproduce

  1. Deploy UDS Core with standard accoutrements
  2. Wait a few days for API IPs to change
  3. Try to do something that triggers an api action
  4. Check metrics-server, NeuVector, monitoring, Promtail, etc. for errors (see the checks sketched below)
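
For step 4, one quick way to surface the failures is to scan Warning events and probe the resource metrics API directly. This is a rough sketch; the metrics-server namespace and deployment name are assumed from the events shown in the Visual Proof below and may differ in your install.

# recent Warning events across all namespaces, oldest last
kubectl get events -A --field-selector type=Warning --sort-by=.lastTimestamp
# the resource metrics API fails once metrics-server can no longer reach the apiserver
kubectl top nodes
# look for timeouts / "context deadline exceeded" against the apiserver in metrics-server logs
kubectl logs -n metrics-server deploy/metrics-server --tail=50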

Expected result

The kubeapi netpols are updated as AWS rotates the API server endpoint IPs.

Actual Result

kubeapi addresses are not updated after being initially set

Visual Proof (screenshots, videos, text, etc)

NAMESPACE              LAST SEEN                  TYPE      REASON                    OBJECT                                          MESSAGE
istio-admin-gateway    31m (x56 over 21d)         Normal    SuccessfullyReconciled    Service/admin-ingressgateway                    Successfully reconciled
istio-login-gateway    31m (x55 over 21d)         Normal    SuccessfullyReconciled    Service/login-ingressgateway                    Successfully reconciled
istio-tenant-gateway   31m (x56 over 21d)         Normal    SuccessfullyReconciled    Service/tenant-ingressgateway                   Successfully reconciled
metrics-server         29m (x2451 over 4d19h)     Warning   Unhealthy                 Pod/metrics-server-59c9dddf69-8l4fk             Liveness probe failed: Get "http://100.64.75.152:15020/app-health/metrics-server/livez": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
metrics-server         4m1s (x29499 over 4d19h)   Warning   BackOff                   Pod/metrics-server-59c9dddf69-8l4fk             Back-off restarting failed container metrics-server in pod metrics-server-59c9dddf69-8l4fk_metrics-server(f619eae8-61d7-420c-a104-0c786e51242a)
istio-admin-gateway    2m47s (x3259 over 13h)     Warning   FailedGetResourceMetric   HorizontalPodAutoscaler/admin-ingressgateway    failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
istio-login-gateway    2m47s (x3259 over 13h)     Warning   FailedGetResourceMetric   HorizontalPodAutoscaler/login-ingressgateway    failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
istio-system           2m47s (x3259 over 13h)     Warning   FailedGetResourceMetric   HorizontalPodAutoscaler/istiod                  failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
istio-tenant-gateway   2m47s (x3259 over 13h)     Warning   FailedGetResourceMetric   HorizontalPodAutoscaler/tenant-ingressgateway   failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
keycloak               2m47s (x3259 over 13h)     Warning   FailedGetResourceMetric   HorizontalPodAutoscaler/keycloak                failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
zarf                   2m47s (x3259 over 13h)     Warning   FailedGetResourceMetric   HorizontalPodAutoscaler/zarf-docker-registry    failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)

Severity/Priority

Additional Context

# get the current kube-apiserver endpoint IPs from the API server EndpointSlice
# (the default/kubernetes Service is where EKS publishes the control plane IPs)
IP1=$(kubectl get endpointslices.discovery.k8s.io -n default -l kubernetes.io/service-name=kubernetes -o json | jq -r '.items[0].endpoints[0] | select(.addresses != null) | .addresses[]' | head -n 1)
IP2=$(kubectl get endpointslices.discovery.k8s.io -n default -l kubernetes.io/service-name=kubernetes -o json | jq -r '.items[0].endpoints[1] | select(.addresses != null) | .addresses[]' | head -n 1)
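
With those fresh IPs in hand, the stale rules are easy to spot by dumping every ipBlock CIDR referenced by the cluster's NetworkPolicies and checking whether the new addresses are present. A minimal sketch, assuming the operator renders each API endpoint as a /32 egress CIDR (adjust if your policies use a broader range):

# dump every ipBlock CIDR referenced by any NetworkPolicy in the cluster
kubectl get networkpolicies -A -o json | jq -r '[.. | .cidr? // empty] | unique[]'
# the fresh apiserver IPs should appear in that list; if not, the kubeapi netpols are stale
echo "expected: ${IP1}/32 ${IP2}/32"
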
ntwkninja added the possible-bug label on Sep 25, 2024
mjnagel (Contributor) commented Sep 25, 2024

Does this resolve itself after a pepr watcher pod restart? I think in the past we've seen this issue when pepr "stops watching" the endpoints.

We have also floated the idea of adding a config option for end users to specify a CIDR range instead of relying on the pepr watch. We should probably just add that at this point given the inconsistency seen with the watch.
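
For anyone hitting this in the meantime, a rough sketch of the watcher-restart workaround. The pepr-system namespace and the pepr-uds-core-watcher deployment name are assumptions based on a default UDS Core install; confirm the actual name with the first command.

# confirm the watcher deployment name in your cluster
kubectl get deploy -n pepr-system
# bounce the watcher so it re-establishes its watch on the apiserver endpoints
kubectl rollout restart deploy/pepr-uds-core-watcher -n pepr-system
kubectl rollout status deploy/pepr-uds-core-watcher -n pepr-system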

ntwkninja (Member, Author) commented Sep 25, 2024

> Does this resolve itself after a pepr watcher pod restart? I think in the past we've seen this issue when pepr "stops watching" the endpoints.

I'll try restarting the watcher and report back. Update: that worked.

mjnagel modified the milestone: 0.29.0 (Sep 25, 2024)
joelmccoy (Contributor) commented:

Want to call out that we ran into this in our internal clusters as well during a k8s upgrade. Kicking the pepr watcher pod reconciled all the netpols.
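
For completeness, a quick post-restart sanity check, reusing IP1/IP2 from the snippet in the issue description, to confirm the policies were actually reconciled:

# after bouncing the watcher, the current apiserver IPs should show up in the generated egress CIDRs
kubectl get networkpolicies -A -o json | jq -r '[.. | .cidr? // empty] | unique[]' | grep -F -e "${IP1}" -e "${IP2}"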
