Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Parallelism configuration option to lakectl #8283

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ItamarYuran
Copy link
Contributor

Closes #8159

Change Description

Added the option to configure the amount of parallelism for lakectl.

See feature request for more backround.

Currently, the configuration nests in a new section "Options"

@ItamarYuran ItamarYuran added area/lakectl Issues related to lakeFS' command line interface (lakectl) include-changelog PR description should be included in next release changelog labels Oct 13, 2024
Copy link

E2E Test Results - Quickstart

11 passed

Copy link

E2E Test Results - DynamoDB Local - Local Block Adapter

13 passed

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Concise fix :-)

Confused about the precedence of the 2 settings for parallelism, so asking for a change in case I am right about the required precedence.

defaultSyncParallelism = 25
defaultSyncPresign = true
defaultNoProgress = false
defaultParallelismConfig = -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use 0? If not, we need to document what "-1" means.

Comment on lines 77 to 81
parallelismToUse := s.cfg.Parallelism
if parallelismToUse <= 0 {
parallelismToUse = s.cfg.SyncFlags.Parallelism
}
for i := 0; i < parallelismToUse; i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 changes, the first is more important:

  • AFAIU SyncFlags.Parallelism is more specific than the new setting, so if both are specified we should use it and not the new more global setting.
  • It is enough to call the var "parallelism", it will obviously be used.
Suggested change
parallelismToUse := s.cfg.Parallelism
if parallelismToUse <= 0 {
parallelismToUse = s.cfg.SyncFlags.Parallelism
}
for i := 0; i < parallelismToUse; i++ {
parallelism := s.cfg.SyncFlags.Parallelism
if parallelism <= 0 {
parallelism = s.cfg.Parallelism
}
for i := 0; i < parallelism; i++ {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!
I understand, problem is, the default for s.cfg.SyncFlags.Parallelism is 25, so unless the user flagged with a 0 or less, the used value will be the default flag and not the configuration.

Copy link

github-actions bot commented Oct 14, 2024

🎊 PR Preview 665c8e1 has been successfully built and deployed to https://treeverse-lakeFS-preview-pr-8283.surge.sh

🕐 Build time: 0.009s

🤖 By surge-preview

Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the flag redundancy here.
You can have both a default param value and override it with a config value if exists it's not mutually exclusive.

Comment on lines +159 to +160
defaultParallelismConfig = 25
defaultSyncParallelism = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why both are needed.
We should:

  1. Use configuration param if provided
  2. Use modified param value if passed as part of command
  3. Fallback to default param value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was, that positive default value should come from one source only. In case both have positive default values how will we determine which one to use? I chose the config val to be positive by default, following Ariel's change request to go from the more general case to specific case.
In any case, this code run according to the priorities:

  • no flag no config val - uses 25 as config default
  • no flag with config - uses config
  • flag with no config - uses the flag,
  • flag with config - uses flag

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your priorities seem correct. We have a way to determine whether a flag in Cobra was explicitly provided by the user or not (there's at least one place in the code where we use it AFAIR).
You can use the same method to do that and determine whether we need to use the config value or the param value

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/lakectl Issues related to lakeFS' command line interface (lakectl) include-changelog PR description should be included in next release changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow for the lakectl local --parallelism flag to have persistent default values through env vars/config
3 participants