node-resource-reservation-proposal #3775

Open
molei20021 wants to merge 1 commit into master


molei20021

Volcano node resource reservation proposal

Signed-off-by: molei20021 <molei21st@hotmail.com>
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign kevin-wangzefeng
You can assign the PR to them by writing /assign @kevin-wangzefeng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 16, 2024
@JesseStutler
Contributor

Hi @molei20021, thanks for your contribution! /cc @Monokaix @hwdef @lowang-bh

reserve.define.label2: {"business_type": "computer"}
reserve.resources.label2: [{"start_hour": 7, "end_hour": 9, "cpu": 24, "memory": 96, "start_reserve_ago": "30m", "pod_num": 15, "cron": "weekly 1,2,5"}]
```
In the configuration, nodeLabel represents the list of nodes whose labels satisfy the nodeSelector, and resources represents a list of resource reservation configurations. For example, reserve.resources.label1 means that from hour 3 to hour 4 every day, 32 CPU and 64 memory need to be reserved for label1. Reservation starts 2h before the window and stops once 10 reserve pods have been scheduled or after hour 4.
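To make the timing rule concrete, here is a minimal sketch of how one reservation entry could be evaluated. It is not the proposal's implementation: the `label1` entry is rebuilt from the prose description above, the helper names `parse_ago` and `is_reserving` are hypothetical, the `cron` field is ignored because its semantics are not spelled out here, and a lead time that crosses midnight is not handled.

```python
from datetime import datetime, timedelta

def parse_ago(text):
    """Parse a lead time such as "30m" or "2h" into a timedelta."""
    units = {"m": "minutes", "h": "hours"}
    value, unit = int(text[:-1]), text[-1]
    return timedelta(**{units[unit]: value})

def is_reserving(entry, now, scheduled_reserve_pods):
    """Return True while resources should be held back for this entry.

    Assumption: the window is evaluated on the calendar day of `now`,
    and reservation ends exactly at end_hour.
    """
    day = now.replace(minute=0, second=0, microsecond=0)
    window_start = day.replace(hour=entry["start_hour"])
    window_end = day.replace(hour=entry["end_hour"])
    reserve_start = window_start - parse_ago(entry["start_reserve_ago"])
    # Enough reserve pods already scheduled: stop reserving early.
    if scheduled_reserve_pods >= entry["pod_num"]:
        return False
    return reserve_start <= now < window_end

# Entry rebuilt from the description of reserve.resources.label1 (illustrative).
label1 = {"start_hour": 3, "end_hour": 4, "cpu": 32, "memory": 64,
          "start_reserve_ago": "2h", "pod_num": 10}
```

With this entry, reservation would be active from 01:00 until 04:00, or until the tenth reserve pod is scheduled, whichever comes first.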
Member

@Monokaix Monokaix Oct 18, 2024

reserve, recover?

Author

Node reservation means that during the reservation time interval, if scheduling another task would leave fewer free resources than the reservation config requires, that task is temporarily denied scheduling until the reserve tasks have been scheduled.
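The rule above can be sketched as a single admission check. This is only an illustration under assumptions: `allows_non_reserve_task` is a hypothetical name, resources are plain dictionaries keyed by resource name, and `idle` stands for the idle capacity in whatever scope the reservation applies to. The usual "does the request fit" check is assumed to happen elsewhere.

```python
def allows_non_reserve_task(idle, request, reserved):
    """Return True if placing `request` still leaves `reserved` idle.

    Only resources named in `reserved` are checked; a missing key in
    `idle` or `request` is treated as zero.
    """
    return all(
        idle.get(name, 0) - request.get(name, 0) >= amount
        for name, amount in reserved.items()
    )

idle = {"cpu": 40, "memory": 128}
reserved = {"cpu": 32, "memory": 64}

# An 8-CPU task leaves exactly 32 CPU idle, so it is allowed.
small_ok = allows_non_reserve_task(idle, {"cpu": 8, "memory": 16}, reserved)
# A 9-CPU task would cut into the reservation, so it is denied for now.
too_big = allows_non_reserve_task(idle, {"cpu": 9}, reserved)
```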

@Monokaix
Member

Queues can also reserve resources. How about associating different jobs with different queues and adjusting each queue's guarantee resources regularly?

@molei20021
Author

> Queues can also reserve resources. How about associating different jobs with different queues and adjusting each queue's guarantee resources regularly?

A queue cannot reserve resources for a specified time interval. After that interval, when no more reserve tasks will be created, the guarantee would still limit the resources available to the non-guaranteed queues.

@Monokaix
Member

> > Queues can also reserve resources. How about associating different jobs with different queues and adjusting each queue's guarantee resources regularly?
>
> A queue cannot reserve resources for a specified time interval. After that interval, when no more reserve tasks will be created, the guarantee would still limit the resources available to the non-guaranteed queues.

How about adjusting the queue's guaranteed resources dynamically?

@molei20021
Author

> How about adjusting the queue's guaranteed resources dynamically?

We need queues to limit the quota of different users. If I put the big reserve tasks of different users into one queue, the quota cannot be limited per user. And even if a queue's quota is reserved, a big reserve task may still fail to schedule immediately when the cluster's free resources are heavily fragmented. In the document, I use the predicate plugin to rank the most idle nodes and forbid non-reserved tasks from scheduling onto them, which helps reduce resource fragmentation.
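One possible reading of the "top idle node" idea, sketched for a single resource: this is an assumption about the selection rule, not the plugin's actual logic, and `pick_reserved_nodes` is a hypothetical helper. It ranks nodes by idle CPU and fences off the most idle ones until their combined idle CPU covers the reservation, so a big reserve pod finds contiguous room.

```python
def pick_reserved_nodes(idle_cpu_by_node, reserve_cpu):
    """Pick the most idle nodes until their idle CPU covers reserve_cpu.

    Non-reserved tasks would then be kept off the returned nodes
    (assumed behavior, CPU only for brevity).
    """
    picked, covered = [], 0
    ranked = sorted(idle_cpu_by_node.items(), key=lambda kv: kv[1], reverse=True)
    for name, idle in ranked:
        if covered >= reserve_cpu:
            break
        picked.append(name)
        covered += idle
    return picked

# Illustrative cluster state: idle CPU per node.
idle_cpu = {"n1": 4, "n2": 20, "n3": 16, "n4": 2}
fenced = pick_reserved_nodes(idle_cpu, 24)
```

Concentrating the reservation on the emptiest nodes, rather than spreading it across all of them, is what keeps the held-back capacity unfragmented.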

> > > Queues can also reserve resources. How about associating different jobs with different queues and adjusting each queue's guarantee resources regularly?
> >
> > A queue cannot reserve resources for a specified time interval. After that interval, when no more reserve tasks will be created, the guarantee would still limit the resources available to the non-guaranteed queues.
>
> How about adjusting the queue's guaranteed resources dynamically?

I see that a queue's guarantee is used in the proportion plugin and the capacity plugin. In the proportion plugin it can influence the queue's deserved value, but if I set guarantees on many queues at the same time, the deserved value may end up less than the guarantee. In my situation, one queue holds not only small, unimportant tasks but also some big reserve tasks. Different departments should have different queues to restrict their quota, so we need to reserve resources for important big tasks globally, and the reserved resources should be unfragmented so that a big reserved pod can be scheduled onto them.

Labels
retest-not-required-docs-only size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
4 participants