
don't create scancel commands for users without jobs #137

Open: wants to merge 10 commits into base branch master


@wdpypere (Contributor) commented Jun 30, 2022

  • make the limit on scancel commands configurable
  • set a limit on the number of user deletions
  • improve logging of the scancel and user-deletion limits
  • don't create scancel commands for users that don't exist or don't have any jobs
  • fix a silent failure
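The first three bullets describe configurable safety limits with logging. A minimal sketch of that idea follows. The option names (--max-scancel-commands, --max-user-deletions), their default values, the helper names parse_limits and within_limit, and the choice to refuse (rather than truncate) when a limit is exceeded are all hypothetical; the PR does not show how the limits are actually wired in.

```python
import argparse
import logging

logger = logging.getLogger("slurm_sync_limits_sketch")


def parse_limits(argv=None):
    """Parse the two safety limits. Option names and defaults are hypothetical placeholders."""
    parser = argparse.ArgumentParser(description="slurm sync limit sketch")
    parser.add_argument("--max-scancel-commands", type=int, default=30,
                        help="refuse to run if more scancel commands would be issued")
    parser.add_argument("--max-user-deletions", type=int, default=20,
                        help="refuse to run if more users would be removed")
    return parser.parse_args(argv)


def within_limit(commands, limit, what):
    """Return True if len(commands) <= limit, and log the outcome either way."""
    count = len(commands)
    if count > limit:
        # Assumption: exceeding the limit means nothing of this kind is executed.
        logger.warning("Not running %d %s commands: limit is %d", count, what, limit)
        return False
    logger.info("Running %d %s commands (limit %d)", count, what, limit)
    return True
```

For example, with opts = parse_limits(["--max-scancel-commands", "2"]), calling within_limit(["scancel a", "scancel b", "scancel c"], opts.max_scancel_commands, "scancel") logs a warning and returns False. Logging the count next to the limit in both branches makes it easy to see how close a sync run came to the threshold.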

fixes #127

@@ -373,7 +373,9 @@ def slurm_user_accounts(vo_members, active_accounts, slurm_user_info, clusters,
             ])
 
         for user in remove_users:
-            job_cancel_commands[user].append(create_remove_user_jobs_command(user=user, cluster=cluster))
+            active_jobs = get_slurm_sacct_active_jobs_for_user(user)
+            if active_jobs:
@wdpypere (Contributor Author), review comment on this hunk:

A lot of boilerplate, but this is the main change.
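The main change in the hunk, queueing a scancel command only when the user has active jobs, can be sketched as a standalone function. This is an illustration, not the project's code. It assumes job_cancel_commands is a defaultdict(list), as the .append on job_cancel_commands[user] suggests, and that the sacct lookup returns an empty (falsy) result both for users without jobs and for unknown users. create_remove_user_jobs_command below is a placeholder whose scancel arguments may differ from the real helper. The get_active_jobs parameter is hypothetical and stands in for get_slurm_sacct_active_jobs_for_user.

```python
from collections import defaultdict


def create_remove_user_jobs_command(user, cluster):
    """Placeholder for the project's helper of the same name; the real command may differ."""
    return ["scancel", f"--user={user}", f"--clusters={cluster}"]


def build_job_cancel_commands(remove_users, cluster, get_active_jobs):
    """Queue a scancel command only for users that currently have active jobs.

    get_active_jobs stands in for get_slurm_sacct_active_jobs_for_user and is
    passed in so this sketch runs without sacct installed.
    """
    job_cancel_commands = defaultdict(list)
    for user in remove_users:
        active_jobs = get_active_jobs(user)
        if active_jobs:  # empty result (no jobs, or unknown user): no scancel command
            job_cancel_commands[user].append(
                create_remove_user_jobs_command(user=user, cluster=cluster))
    return job_cancel_commands
```

With jobs = {"alice": ["1234|RUNNING"]}, calling build_job_cancel_commands(["alice", "bob"], "banette", lambda u: jobs.get(u, [])) yields a command for alice only. Passing the lookup in as an argument is one design option: it makes the logic testable without sacct, at the cost of changing the function's signature.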

@wdpypere (Contributor Author) commented:

So I'm running into a failing test because of this:

======================================================================
ERROR: test_slurm_user_accounts (test.slurm_sync.SlurmSyncTestGent)
Test that the commands to create, change and remove users are correctly generated.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/wdpypere/workspace/vsc-administration/test/slurm_sync.py", line 212, in test_slurm_user_accounts
    (job_cancel_commands, commands, remove_user_commands) = slurm_user_accounts(vo_members, active_accounts, slurm_user_info, ["banette"])
  File "lib/vsc/administration/slurm/sync.py", line 376, in slurm_user_accounts
    active_jobs = get_slurm_sacct_active_jobs_for_user(user)
  File "lib/vsc/administration/slurm/sacctmgr.py", line 543, in get_slurm_sacct_active_jobs_for_user
    (exitcode, contents) = asyncloop([SLURM_SACCT, "-L", "-P", "-s", "r", "-u", user])
  File "/home/wdpypere/workspace/vsc-administration/.eggs.py3/vsc_base-3.3.1-py3.6.egg/vsc/utils/run.py", line 151, in run
    return r._run()
  File "/home/wdpypere/workspace/vsc-administration/.eggs.py3/vsc_base-3.3.1-py3.6.egg/vsc/utils/run.py", line 257, in _run
    self._run_pre()
  File "/home/wdpypere/workspace/vsc-administration/.eggs.py3/vsc_base-3.3.1-py3.6.egg/vsc/utils/run.py", line 275, in _run_pre
    self._init_process()
  File "/home/wdpypere/workspace/vsc-administration/.eggs.py3/vsc_base-3.3.1-py3.6.egg/vsc/utils/run.py", line 370, in _init_process
    self._process = self._process_module.Popen(self._shellcmd, **self._popen_named_args)
  File "/usr/lib64/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib64/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/bin/sacct': '/usr/bin/sacct'

@wdpypere (Contributor Author) commented:

I would like to mock or patch get_slurm_sacct_active_jobs_for_user so that it returns a fixed output instead of calling sacct, but I don't know how to do that.
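One way to do this is unittest.mock.patch. The key detail is to patch the name where it is looked up, not where it is defined. The traceback shows sync.py calling get_slurm_sacct_active_jobs_for_user unqualified, which suggests it was imported by name from sacctmgr. In that case, patching the sacctmgr attribute would not affect the copy already bound in sync. The self-contained sketch below recreates that layout with two throwaway modules. The module names sacctmgr_demo and sync_demo, the function users_needing_scancel, and the fake job data are all hypothetical.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path
from unittest import mock

# Write two tiny throwaway modules that mimic the layout in the traceback:
# sacctmgr_demo defines the sacct wrapper, sync_demo imports it by name.
tmp = Path(tempfile.mkdtemp())
(tmp / "sacctmgr_demo.py").write_text(textwrap.dedent("""
    def get_slurm_sacct_active_jobs_for_user(user):
        # The real helper runs sacct; without the binary it fails like the test did.
        raise FileNotFoundError("No such file or directory: '/usr/bin/sacct'")
"""))
(tmp / "sync_demo.py").write_text(textwrap.dedent("""
    from sacctmgr_demo import get_slurm_sacct_active_jobs_for_user

    def users_needing_scancel(remove_users):
        return [u for u in remove_users if get_slurm_sacct_active_jobs_for_user(u)]
"""))
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()
sync_demo = importlib.import_module("sync_demo")

# Fixed, fake sacct output per user; unknown users get no jobs.
FAKE_JOBS = {"alice": ["1234|RUNNING"], "bob": []}


def users_needing_scancel_with_fake_sacct(remove_users):
    # Patch the name in sync_demo's namespace, where the call resolves it.
    target = "sync_demo.get_slurm_sacct_active_jobs_for_user"
    with mock.patch(target, side_effect=lambda user: FAKE_JOBS.get(user, [])) as fake:
        result = sync_demo.users_needing_scancel(remove_users)
    return result, fake.call_count
```

Calling users_needing_scancel_with_fake_sacct(["alice", "bob", "ghost"]) returns (["alice"], 3) without sacct being present. Once the with block exits, the original function is restored. In the real test, the target would presumably be "vsc.administration.slurm.sync.get_slurm_sacct_active_jobs_for_user". mock.patch can also be applied as a decorator on test_slurm_user_accounts, in which case the mock is passed to the test method as an extra argument.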

Review thread on test/slurm_sync.py (outdated, resolved)
Development: merging this pull request may close the linked issue "sync_slurm_acct issue with too many cancel jobs".

2 participants