Display the remaining number of gpu in node resources #477

Open
devenami opened this issue Sep 6, 2024 · 1 comment

Comments

@devenami
Contributor

devenami commented Sep 6, 2024

1. Issue or feature description

$ kubectl describe node gpu-000-001
Name:               gpu-000-001
Capacity:
  cpu:                128
  ephemeral-storage:  13119414984Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1055950504Ki
  nvidia.com/gpu:     80
  pods:               110
Allocatable:
  cpu:                124
  ephemeral-storage:  12907602632Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1017153192Ki
  nvidia.com/gpu:     70
  pods:               110
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                4885m (3%)    75 (60%)
  memory             68663Mi (6%)  151Gi (15%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
  nvidia.com/gpu     5             5

In the output above, the physical node has 8 GPUs in total. The GPU resources in Allocatable have been expanded by a factor of 10. I deployed 5 Pods on the node, and each Pod occupies a whole GPU.

I would like the node to show the remaining number of available GPUs, along with related information such as the maximum number of virtual GPUs still available per card and the number of whole GPUs remaining.

Then, when a Pod is stuck in the Pending state, we could determine the reason by checking the node information.
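As a rough illustration of the numbers being asked for, here is a minimal Python sketch of the arithmetic, using only the values from the `kubectl describe node` output above. The function names `remaining_gpu` and `whole_gpus_free` are hypothetical and are not part of HAMi or kubectl. The second function assumes each Pod holds one entire physical card, as described above.

```python
def remaining_gpu(allocatable: int, pod_requests: list[int]) -> int:
    """Allocatable nvidia.com/gpu minus the sum of pod requests, floored at 0."""
    return max(allocatable - sum(pod_requests), 0)


def whole_gpus_free(physical_gpus: int, pods_holding_whole_gpu: int) -> int:
    """Physical cards left untouched, assuming each such pod holds one full card."""
    return max(physical_gpus - pods_holding_whole_gpu, 0)


if __name__ == "__main__":
    # Values copied from the node description above.
    allocatable = 70      # Allocatable nvidia.com/gpu
    requests = [1] * 5    # 5 pods; their requests sum to the 5 shown under "Allocated resources"
    print("remaining nvidia.com/gpu:", remaining_gpu(allocatable, requests))
    print("whole physical GPUs free:", whole_gpus_free(8, 5))
```

With the figures above this gives 65 remaining `nvidia.com/gpu` units and 3 untouched physical GPUs, which is the kind of summary that would help explain a Pending Pod.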

@archlitchi
Collaborator

Refer to the monitoring section of the README: https://github.com/Project-HAMi/HAMi?tab=readme-ov-file#monitor
