When running a PyTorch container in Slurm with cpus-per-task set, nproc reports a wrong value (1). This is caused by the 50-slurm-pytorch.sh hook, which hardcodes OMP_NUM_THREADS to 1; I have opened a PR (#174) with a fix based on current PyTorch multiprocessing best practices.
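The behavior can be reproduced outside Slurm: GNU coreutils nproc honors the OMP_NUM_THREADS environment variable, so any hook that exports OMP_NUM_THREADS=1 also makes nproc report a single CPU. A minimal demonstration:

```shell
# nproc (GNU coreutils) respects OMP_NUM_THREADS, so an inherited
# value of 1 masks the real CPU count for every downstream script.
nproc                     # actual number of usable CPUs
OMP_NUM_THREADS=1 nproc   # prints 1, regardless of the allocation
```

This is why scripts that size worker pools off nproc end up with a single worker inside the container.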
I'm surprised that nproc has this behavior, to be honest.
I'll review the PR, but it's a bit of a sensitive topic: setting the wrong number of threads can quickly cause performance issues one way or the other (not enough cores in use vs. too many threads). I'll check with my colleagues what they think.
@flx42 I agree, that surprised me too. For completeness, I'm referencing the issue (a CPU oversubscription, NVIDIA/NeMo#8141) that led me into this investigation. It turned out there is an issue in numba (numba/numba#9387), which resets the torch num_threads value whenever numba's num_threads is read or set.
My proposal is to keep the behaviour consistent, especially since PyTorch itself suggests setting num_threads to nCPU/nTasks, and because nproc picks up the new value too (and some bash scripts rely on it). Do check with your colleagues, please.
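The nCPU/nTasks idea can be sketched as a small shell function. This is only an illustration of the direction, not the contents of PR #174; SLURM_CPUS_PER_TASK and SLURM_NTASKS_PER_NODE are the standard Slurm variables, and the function name is hypothetical:

```shell
#!/bin/sh
# Sketch: derive a per-task thread count instead of hardcoding 1.
compute_omp_threads() {
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        # Slurm already tells us how many CPUs each task owns.
        echo "${SLURM_CPUS_PER_TASK}"
    else
        # Fall back to total online cores divided by tasks per node.
        # getconf is used instead of nproc, since nproc itself honors
        # OMP_NUM_THREADS and could return the stale value.
        ncpus=$(getconf _NPROCESSORS_ONLN)
        ntasks="${SLURM_NTASKS_PER_NODE:-1}"
        echo $(( ncpus / ntasks ))
    fi
}
export OMP_NUM_THREADS="$(compute_omp_threads)"
```

Using getconf rather than nproc for the fallback avoids the circular dependency on OMP_NUM_THREADS that started this issue.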