Crane-scheduler installed via Helm as a second scheduler: the test pod from the official example is never scheduled and stays "Pending" #50

xucq07 opened this issue Aug 24, 2023 · 13 comments

xucq07 commented Aug 24, 2023

I installed Crane-scheduler via Helm as a second scheduler. Testing with the example from the official docs, the pod is never scheduled and stays stuck in the "Pending" state:
1. Deployment YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stress
spec:
  selector:
    matchLabels:
      app: cpu-stress
  replicas: 1
  template:
    metadata:
      labels:
        app: cpu-stress
    spec:
      schedulerName: crane-scheduler
      hostNetwork: true
      tolerations:
      - key: node.kubernetes.io/network-unavailable
        operator: Exists
        effect: NoSchedule
      containers:
      - name: stress
        image: docker.io/gocrane/stress:latest
        command: ["stress", "-c", "1"]
        resources:
          requests:
            memory: "1Gi"
            cpu: "1"
          limits:
            memory: "1Gi"
            cpu: "1"
2. Pod details:
Name: cpu-stress-cc8656b6c-b5hhz
Namespace: default
Priority: 0
Node:
Labels: app=cpu-stress
pod-template-hash=cc8656b6c
Annotations:
Status: Pending
IP:
IPs:
Controlled By: ReplicaSet/cpu-stress-cc8656b6c
Containers:
stress:
Image: docker.io/gocrane/stress:latest
Port:
Host Port:
Command:
stress
-c
1
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 1
memory: 1Gi
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9nwd5 (ro)
Volumes:
kube-api-access-9nwd5:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Guaranteed
Node-Selectors:
Tolerations: node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
3. crane-scheduler logs:
I0824 00:50:47.247851 1 serving.go:331] Generated self-signed cert in-memory
W0824 00:50:48.025758 1 options.go:330] Neither --kubeconfig nor --master was specified. Using default API client. This might not work.
W0824 00:50:48.073470 1 authorization.go:47] Authorization is disabled
W0824 00:50:48.073495 1 authentication.go:40] Authentication is disabled
I0824 00:50:48.073517 1 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
I0824 00:50:48.080823 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0824 00:50:48.080862 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0824 00:50:48.080915 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0824 00:50:48.080927 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0824 00:50:48.080957 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0824 00:50:48.080968 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0824 00:50:48.081199 1 secure_serving.go:197] Serving securely on [::]:10259
I0824 00:50:48.081270 1 tlsconfig.go:240] Starting DynamicServingCertificateController
W0824 00:50:48.091287 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 00:50:48.146624 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
I0824 00:50:48.182865 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0824 00:50:48.183903 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0824 00:50:48.184059 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0824 00:50:48.284088 1 leaderelection.go:243] attempting to acquire leader lease kube-system/kube-scheduler...
W0824 00:57:30.128689 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 01:02:45.130884 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 01:08:48.133483 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 01:14:31.135801 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 01:20:24.138959 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0824 01:30:10.141873 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
4. crane-scheduler-controller logs:
I0824 08:46:16.647776 1 server.go:61] Starting Controller version v0.0.0-master+$Format:%H$
I0824 08:46:16.648237 1 leaderelection.go:248] attempting to acquire leader lease crane-system/crane-scheduler-controller...
I0824 08:46:16.706891 1 leaderelection.go:258] successfully acquired lease crane-system/crane-scheduler-controller
I0824 08:46:16.807546 1 controller.go:72] Caches are synced for controller
I0824 08:46:16.807631 1 node.go:46] Start to reconcile node events
I0824 08:46:16.807653 1 event.go:30] Start to reconcile EVENT events
I0824 08:46:16.885698 1 node.go:75] Finished syncing node event "node6/cpu_usage_avg_5m" (77.952416ms)
I0824 08:46:16.973162 1 node.go:75] Finished syncing node event "node4/cpu_usage_avg_5m" (87.371252ms)
I0824 08:46:17.045250 1 node.go:75] Finished syncing node event "master2/cpu_usage_avg_5m" (72.023298ms)
I0824 08:46:17.109260 1 node.go:75] Finished syncing node event "master3/cpu_usage_avg_5m" (63.673389ms)
I0824 08:46:17.192332 1 node.go:75] Finished syncing node event "node1/cpu_usage_avg_5m" (83.005155ms)
I0824 08:46:17.529495 1 node.go:75] Finished syncing node event "node2/cpu_usage_avg_5m" (337.099052ms)
I0824 08:46:17.927163 1 node.go:75] Finished syncing node event "node3/cpu_usage_avg_5m" (397.603044ms)
I0824 08:46:18.327978 1 node.go:75] Finished syncing node event "node5/cpu_usage_avg_5m" (400.749476ms)
I0824 08:46:18.746391 1 node.go:75] Finished syncing node event "master1/cpu_usage_avg_5m" (418.360885ms)
I0824 08:46:19.129081 1 node.go:75] Finished syncing node event "node6/cpu_usage_max_avg_1h" (382.635495ms)
I0824 08:46:19.524508 1 node.go:75] Finished syncing node event "node4/cpu_usage_max_avg_1h" (395.361539ms)
I0824 08:46:19.948035 1 node.go:75] Finished syncing node event "master2/cpu_usage_max_avg_1h" (423.453672ms)
I0824 08:46:20.332014 1 node.go:75] Finished syncing node event "master3/cpu_usage_max_avg_1h" (383.909395ms)
I0824 08:46:20.737296 1 node.go:75] Finished syncing node event "node1/cpu_usage_max_avg_1h" (405.102002ms)
I0824 08:46:21.245055 1 node.go:75] Finished syncing node event "node2/cpu_usage_max_avg_1h" (507.697871ms)
I0824 08:46:21.573490 1 node.go:75] Finished syncing node event "node3/cpu_usage_max_avg_1h" (328.368489ms)
I0824 08:46:21.937814 1 node.go:75] Finished syncing node event "node5/cpu_usage_max_avg_1h" (364.254837ms)
I0824 08:46:22.335988 1 node.go:75] Finished syncing node event "master1/cpu_usage_max_avg_1h" (397.952357ms)
I0824 08:46:22.724851 1 node.go:75] Finished syncing node event "master2/cpu_usage_max_avg_1d" (388.771915ms)
I0824 08:46:23.126059 1 node.go:75] Finished syncing node event "master3/cpu_usage_max_avg_1d" (401.156708ms)
I0824 08:46:23.528329 1 node.go:75] Finished syncing node event "node6/cpu_usage_max_avg_1d" (402.208827ms)
I0824 08:46:23.937560 1 node.go:75] Finished syncing node event "node4/cpu_usage_max_avg_1d" (409.165081ms)
I0824 08:46:24.331730 1 node.go:75] Finished syncing node event "node5/cpu_usage_max_avg_1d" (394.024206ms)
I0824 08:46:24.730137 1 node.go:75] Finished syncing node event "master1/cpu_usage_max_avg_1d" (398.33551ms)
I0824 08:46:25.127074 1 node.go:75] Finished syncing node event "node1/cpu_usage_max_avg_1d" (396.798913ms)
I0824 08:46:25.528844 1 node.go:75] Finished syncing node event "node2/cpu_usage_max_avg_1d" (401.701104ms)
I0824 08:46:25.932684 1 node.go:75] Finished syncing node event "node3/cpu_usage_max_avg_1d" (403.762529ms)
I0824 08:46:26.330458 1 node.go:75] Finished syncing node event "node4/mem_usage_avg_5m" (397.710372ms)
I0824 08:46:26.736576 1 node.go:75] Finished syncing node event "master2/mem_usage_avg_5m" (406.060927ms)

qmhu commented Aug 28, 2023

Please check the status of the crane-scheduler pod and confirm it is Running.

xucq07 commented Aug 28, 2023

kubectl get pods -n crane-system
NAME                                          READY   STATUS    RESTARTS   AGE
crane-scheduler-b84489958-6jdj6               1/1     Running   0          4d1h
crane-scheduler-controller-6987688d8d-6wr7c   1/1     Running   0          4d1h

Confirmed again that the pods are Running.

qmhu commented Aug 28, 2023

I don't see anything abnormal in the logs.
You could try clearing the pod's schedulerName and check whether the default scheduler can schedule it.
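(A minimal sketch of that test, based on the Deployment above: drop the schedulerName field so the pod falls back to the default scheduler.)

spec:
  # schedulerName omitted: the default scheduler handles the pod
  containers:
  - name: stress
    image: docker.io/gocrane/stress:latest
    command: ["stress", "-c", "1"]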

xucq07 commented Aug 28, 2023

Tested it; the default scheduler has no problems and schedules the pod normally.

qmhu commented Aug 28, 2023

Could you post the complete logs, including crane-scheduler-controller-6987688d8d-6wr7c and crane-scheduler-b84489958-6jdj6?

xucq07 commented Aug 28, 2023

The logs are attached:
crane-scheduler.log
crane-scheduler-controller.log

@mobeixiaoxin

I ran into the same problem, on Kubernetes 1.27.
The scheduler reports errors like the following:
E0905 05:42:20.346742 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
W0905 05:43:01.852683 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
E0905 05:43:01.852729 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
W0905 05:43:34.262887 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
E0905 05:43:34.262932 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
W0905 05:44:33.675140 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
E0905 05:44:33.675182 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
W0905 05:45:20.214073 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
E0905 05:45:20.214163 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
W0905 05:45:56.034526 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
E0905 05:45:56.034592 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
W0905 05:46:48.730711 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
E0905 05:46:48.730757 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
W0905 05:47:24.823783 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource
E0905 05:47:24.823828 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource

Could someone please point me in the right direction? Thanks.

qmhu commented Sep 6, 2023

This is likely a compatibility issue with newer Kubernetes versions. Clusters below 1.25 currently work fine; newer clusters may need additional support.
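(For context: CSIStorageCapacity moved to storage.k8s.io/v1, and the v1beta1 version is no longer served as of Kubernetes 1.27, while the scheduler here is built against client-go v0.23.3, which still expects v1beta1; that matches the errors above. A quick way to check what a cluster serves, assuming kubectl access:)

kubectl api-resources --api-group=storage.k8s.io
# on 1.27+ the deprecated group/version is gone and this returns NotFound:
kubectl get --raw /apis/storage.k8s.io/v1beta1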

@mobeixiaoxin

OK, thanks.

@redtee123

My Kubernetes version is 1.20.7 and I am using crane-scheduler image version 0.0.20 as a second scheduler. The aggregated metrics are already present in the node annotations. When I create a new pod to test scheduling, it stays in the Pending state.

crane-scheduler logs:
I1018 14:19:17.775925 1 serving.go:331] Generated self-signed cert in-memory
W1018 14:19:18.105223 1 options.go:330] Neither --kubeconfig nor --master was specified. Using default API client. This might not work.
W1018 14:19:18.116946 1 authorization.go:47] Authorization is disabled
W1018 14:19:18.116959 1 authentication.go:40] Authentication is disabled
I1018 14:19:18.116979 1 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
I1018 14:19:18.119411 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I1018 14:19:18.119430 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I1018 14:19:18.119461 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1018 14:19:18.119469 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1018 14:19:18.119489 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1018 14:19:18.119498 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1018 14:19:18.119562 1 secure_serving.go:197] Serving securely on [::]:10259
I1018 14:19:18.119635 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I1018 14:19:18.219523 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1018 14:19:18.219544 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1018 14:19:18.219982 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I1018 14:19:18.320414 1 leaderelection.go:243] attempting to acquire leader lease kube-system/kube-scheduler...

crane-scheduler-controller logs:
I1018 22:16:26.114291 1 node.go:75] Finished syncing node event "kube-node-02/mem_usage_avg_5m" (277.013756ms)
I1018 22:19:25.740401 1 node.go:75] Finished syncing node event "kube-node-02/mem_usage_avg_5m" (34.500361ms)
I1018 22:19:25.764618 1 node.go:75] Finished syncing node event "kube-master-01/mem_usage_avg_5m" (24.178999ms)
I1018 22:19:25.798566 1 node.go:75] Finished syncing node event "kube-node-01/mem_usage_avg_5m" (33.90647ms)
I1018 22:19:25.826773 1 node.go:75] Finished syncing node event "kube-node-02/cpu_usage_avg_5m" (28.169613ms)
I1018 22:19:25.848814 1 node.go:75] Finished syncing node event "kube-master-01/cpu_usage_avg_5m" (22.005738ms)
I1018 22:19:26.117118 1 node.go:75] Finished syncing node event "kube-node-01/cpu_usage_avg_5m" (268.264709ms)
I1018 22:22:25.737763 1 node.go:75] Finished syncing node event "kube-node-01/mem_usage_avg_5m" (32.338992ms)
I1018 22:22:25.765262 1 node.go:75] Finished syncing node event "kube-node-02/mem_usage_avg_5m" (27.45828ms)
I1018 22:22:25.794327 1 node.go:75] Finished syncing node event "kube-master-01/mem_usage_avg_5m" (29.029129ms)
I1018 22:22:25.818029 1 node.go:75] Finished syncing node event "kube-node-02/cpu_usage_avg_5m" (23.666818ms)
I1018 22:22:25.841672 1 node.go:75] Finished syncing node event "kube-master-01/cpu_usage_avg_5m" (23.603915ms)
I1018 22:22:26.125154 1 node.go:75] Finished syncing node event "kube-node-01/cpu_usage_avg_5m" (283.438566ms)

qmhu commented Oct 19, 2023

Leader election may not have been disabled on the second scheduler.
The scheduler installed by the Helm chart disables leader election; you can use it as a reference:
https://github.com/gocrane/helm-charts/blob/main/charts/scheduler/templates/scheduler-deployment.yaml#L23
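(A hypothetical excerpt of what that chart setting looks like in the scheduler Deployment, assuming crane-scheduler accepts the standard kube-scheduler flags:)

containers:
- name: crane-scheduler
  args:
  - --leader-elect=false   # disable leader election for the second scheduler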

@redtee123

It was indeed caused by leader election not being disabled on the second scheduler. But it was not the leaderElection in scheduler-deployment.yaml; it was the leaderElection in scheduler-configmap.yaml that had not been turned off.
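(The screenshot attached here is not preserved; judging from the configmap posted in the next comment, the relevant section of scheduler-config.yaml is:)

apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: false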

lesserror commented Dec 26, 2023

My Kubernetes version is 1.22.12 and I am using crane-scheduler image version scheduler-0.2.2 as a second scheduler. The aggregated metrics are already present in the node annotations, and I have changed leaderElect to false, but when I create a new pod to test scheduling, it still stays Pending.
Pod info:

Events:
  Type     Reason            Age   From             Message
  ----     ------            ----  ----             -------
  Warning  FailedScheduling  15s   crane-scheduler  0/1 nodes are available: 1 Insufficient cpu.
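(Note that this event differs from the earlier reports: crane-scheduler did run and filtered out the single node for lack of allocatable CPU, so this Pending looks like a capacity problem rather than a scheduler problem. Shrinking the request in the test Deployment, e.g. to the hypothetical values below, should confirm that:)

resources:
  requests:
    cpu: "100m"      # hypothetical smaller request for testing
    memory: "128Mi"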

leaderElection config:

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: false
    profiles:
    - schedulerName: crane-scheduler
      plugins:
        filter:
          enabled:
          - name: Dynamic
        score:
          enabled:
          - name: Dynamic
            weight: 3

crane-scheduler logs:

I1226 09:47:56.595597       1 serving.go:348] Generated self-signed cert in-memory
W1226 09:47:57.035592       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1226 09:47:57.041561       1 server.go:139] "Starting Kubernetes Scheduler" version="v0.0.0-master+$Format:%H$"
I1226 09:47:57.044642       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I1226 09:47:57.044658       1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I1226 09:47:57.044666       1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I1226 09:47:57.044679       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1226 09:47:57.044699       1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I1226 09:47:57.044715       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1226 09:47:57.045160       1 secure_serving.go:200] Serving securely on [::]:10259
I1226 09:47:57.045218       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I1226 09:47:57.145093       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file 
I1226 09:47:57.145152       1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController 
I1226 09:47:57.145100       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file

crane-scheduler-controller logs:

root@master:/home/ubuntu/kube-prometheus/manifests# kubectl logs -n crane-system crane-scheduler-controller-6f6b94c8f7-79vff 
I1226 17:47:56.187263       1 server.go:61] Starting Controller version v0.0.0-master+$Format:%H$
I1226 17:47:56.188316       1 leaderelection.go:248] attempting to acquire leader lease crane-system/crane-scheduler-controller...
I1226 17:48:12.646241       1 leaderelection.go:258] successfully acquired lease crane-system/crane-scheduler-controller
I1226 17:48:12.747072       1 controller.go:72] Caches are synced for controller
I1226 17:48:12.747174       1 node.go:46] Start to reconcile node events
I1226 17:48:12.747208       1 event.go:30] Start to reconcile EVENT events
I1226 17:48:12.773420       1 node.go:75] Finished syncing node event "master/cpu_usage_avg_5m" (26.154965ms)
I1226 17:48:12.794854       1 node.go:75] Finished syncing node event "master/cpu_usage_max_avg_1h" (21.278461ms)
I1226 17:48:12.818035       1 node.go:75] Finished syncing node event "master/cpu_usage_max_avg_1d" (23.146517ms)
I1226 17:48:12.837222       1 node.go:75] Finished syncing node event "master/mem_usage_avg_5m" (19.151134ms)
I1226 17:48:13.055018       1 node.go:75] Finished syncing node event "master/mem_usage_max_avg_1h" (217.762678ms)
I1226 17:48:13.455442       1 node.go:75] Finished syncing node event "master/mem_usage_max_avg_1d" (400.366453ms)
I1226 17:51:12.788539       1 node.go:75] Finished syncing node event "master/mem_usage_avg_5m" (41.092765ms)
I1226 17:51:12.810824       1 node.go:75] Finished syncing node event "master/cpu_usage_avg_5m" (22.248821ms)
I1226 17:54:12.771140       1 node.go:75] Finished syncing node event "master/mem_usage_avg_5m" (22.840662ms)
I1226 17:54:12.789918       1 node.go:75] Finished syncing node event "master/cpu_usage_avg_5m" (18.740179ms)
I1226 17:57:12.773735       1 node.go:75] Finished syncing node event "master/mem_usage_avg_5m" (26.395777ms)
I1226 17:57:12.792897       1 node.go:75] Finished syncing node event "master/cpu_usage_avg_5m" (19.124323ms)
I1226 18:00:12.772243       1 node.go:75] Finished syncing node event "master/mem_usage_avg_5m" (24.369461ms)
I1226 18:00:12.804297       1 node.go:75] Finished syncing node event "master/cpu_usage_avg_5m" (32.008004ms)
I1226 18:03:12.774690       1 node.go:75] Finished syncing node event "master/mem_usage_max_avg_1h" (27.291591ms)
I1226 18:03:12.795145       1 node.go:75] Finished syncing node event "master/mem_usage_avg_5m" (20.350165ms)
I1226 18:03:12.813508       1 node.go:75] Finished syncing node event "master/cpu_usage_avg_5m" (18.32638ms)
I1226 18:03:12.833109       1 node.go:75] Finished syncing node event "master/cpu_usage_max_avg_1h" (19.549029ms)
