scheduler: optimize GPU allocate logic #2221

Merged

ZiMengSheng
Contributor

Ⅰ. Describe what this PR does

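  1. Compute the Pod's GPU request requirements ahead of time and store them in cycleState.GPURequirements.
  2. Clarify the Joint Allocate logic: schedule the primary device first, then schedule the secondary devices on the same PCIe as the primary device. This path is no longer coupled to GPU topology-aware scheduling; that logic is now handled in GPUAllocator.
  3. Optimize the logic for allocating GPUs by partition to reduce performance overhead.
  4. Optimize the logic for allocating GPUs by topology to reduce performance overhead.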
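
A minimal sketch of items 1 and 2, not the code in this PR. Every name below (`cycleState` as a plain map, `GPURequirements.NumGPUs`, `Device`, `preFilter`, `jointAllocate`) is a hypothetical stand-in, and it assumes one secondary device is wanted per PCIe that hosts a chosen GPU. It only illustrates the ordering: requirements are computed once and cached, the primary devices are picked first, and the secondary devices are then picked from the same PCIe.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// GPURequirements is a hypothetical stand-in for the precomputed request
// that the PR stores in cycle state.
type GPURequirements struct {
	NumGPUs int // number of whole GPUs the Pod asks for (assumption)
}

// Device is a hypothetical, simplified device record.
type Device struct {
	Minor int
	PCIe  string // identifier of the PCIe the device hangs off
	Free  bool
}

// cycleState is a plain map standing in for the scheduler's per-cycle state.
type cycleState map[string]interface{}

const gpuRequirementsKey = "GPURequirements"

// preFilter computes the requirements once so later phases only read them.
func preFilter(state cycleState, requestedGPUs int) {
	state[gpuRequirementsKey] = &GPURequirements{NumGPUs: requestedGPUs}
}

// jointAllocate picks primary devices (GPUs) first, then one free secondary
// device on each PCIe that hosts a chosen GPU.
func jointAllocate(state cycleState, gpus, secondaries []Device) ([]int, []int, error) {
	req, ok := state[gpuRequirementsKey].(*GPURequirements)
	if !ok {
		return nil, nil, errors.New("GPU requirements were not precomputed")
	}

	// Step 1: primary devices, lowest minor first.
	sorted := append([]Device(nil), gpus...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Minor < sorted[j].Minor })
	var gpuMinors []int
	var pcieOrder []string
	seen := map[string]bool{}
	for _, d := range sorted {
		if len(gpuMinors) == req.NumGPUs {
			break
		}
		if !d.Free {
			continue
		}
		gpuMinors = append(gpuMinors, d.Minor)
		if !seen[d.PCIe] {
			seen[d.PCIe] = true
			pcieOrder = append(pcieOrder, d.PCIe)
		}
	}
	if len(gpuMinors) < req.NumGPUs {
		return nil, nil, fmt.Errorf("need %d GPUs, only %d free", req.NumGPUs, len(gpuMinors))
	}

	// Step 2: secondary devices, restricted to the PCIes chosen in step 1.
	var secondaryMinors []int
	for _, pcie := range pcieOrder {
		found := false
		for _, s := range secondaries {
			if s.Free && s.PCIe == pcie {
				secondaryMinors = append(secondaryMinors, s.Minor)
				found = true
				break
			}
		}
		if !found {
			return nil, nil, fmt.Errorf("no free secondary device on PCIe %q", pcie)
		}
	}
	return gpuMinors, secondaryMinors, nil
}

func main() {
	state := cycleState{}
	preFilter(state, 2)
	gpus := []Device{
		{Minor: 0, PCIe: "pcie-0", Free: true},
		{Minor: 1, PCIe: "pcie-0", Free: false},
		{Minor: 2, PCIe: "pcie-1", Free: true},
	}
	nics := []Device{
		{Minor: 0, PCIe: "pcie-0", Free: true},
		{Minor: 1, PCIe: "pcie-1", Free: true},
	}
	gpuMinors, nicMinors, err := jointAllocate(state, gpus, nics)
	fmt.Println(gpuMinors, nicMinors, err)
}
```

Keeping step 2 a pure "same PCIe" lookup is what lets topology-aware GPU selection live entirely inside the GPU allocator, as item 2 describes.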

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

codecov bot commented Oct 9, 2024

Codecov Report

Attention: Patch coverage is 90.27778% with 35 lines in your changes missing coverage. Please review.

Project coverage is 67.32%. Comparing base (0466bf3) to head (799393f).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/scheduler/plugins/deviceshare/allocator_gpu.go | 87.19% | 16 Missing and 10 partials ⚠️ |
| ...eduler/plugins/deviceshare/allocator_gpu_helper.go | 96.59% | 2 Missing and 1 partial ⚠️ |
| pkg/scheduler/plugins/deviceshare/device_cache.go | 81.81% | 1 Missing and 1 partial ⚠️ |
| pkg/scheduler/plugins/deviceshare/utils.go | 91.30% | 1 Missing and 1 partial ⚠️ |
| .../scheduler/plugins/deviceshare/device_allocator.go | 94.44% | 1 Missing ⚠️ |
| pkg/scheduler/plugins/deviceshare/plugin.go | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2221      +/-   ##
==========================================
+ Coverage   67.18%   67.32%   +0.13%     
==========================================
  Files         451      453       +2     
  Lines       43480    43681     +201     
==========================================
+ Hits        29214    29409     +195     
- Misses      11718    11726       +8     
+ Partials     2548     2546       -2     

| Flag | Coverage Δ |
|---|---|
| unittests | 67.32% <90.27%> (+0.13%) ⬆️ |


@ZiMengSheng ZiMengSheng force-pushed the p_gpu_allocate_algorithm branch 4 times, most recently from 5781a13 to 7ed6637 Compare October 11, 2024 14:22
Signed-off-by: wangjianyu.wjy <wangjianyu.wjy@alibaba-inc.com>
@ZiMengSheng ZiMengSheng force-pushed the p_gpu_allocate_algorithm branch 2 times, most recently from 3485ea3 to c476eb7 Compare October 11, 2024 15:24
Signed-off-by: wangjianyu.wjy <wangjianyu.wjy@alibaba-inc.com>
@hormes
Member

hormes commented Oct 14, 2024

/lgtm
/approve

@koordinator-bot koordinator-bot bot merged commit d272b4e into koordinator-sh:main Oct 15, 2024
22 checks passed