Wrong nproc value when running PyTorch container with cpus-per-task set #175

Open
itzsimpl opened this issue Jan 9, 2024 · 3 comments

Comments

@itzsimpl

itzsimpl commented Jan 9, 2024

When running a PyTorch container under Slurm with cpus-per-task set, nproc reports the wrong value (1).

$ srun --ntasks-per-node=3 bash -c 'echo "`nproc`/$SLURM_CPUS_ON_NODE"'
2/6
2/6
2/6

$ srun --exclusive --ntasks-per-node=3 bash -c 'echo "`nproc`/$SLURM_CPUS_ON_NODE"'
4/4
4/4
4/4

$ srun --cpus-per-task 10 --ntasks-per-node=3 bash -c 'echo "`nproc`/$SLURM_CPUS_ON_NODE"'
10/30
10/30
10/30

$ srun --cpus-per-task=32 --overcommit --ntasks-per-node=3 bash -c 'echo "`nproc`/$SLURM_CPUS_ON_NODE"'
32/32
32/32
32/32

and

$ srun --ntasks-per-node=3 --container-image=ubuntu:22.04 bash -c 'echo "`nproc`/$SLURM_CPUS_ON_NODE"'
pyxis: imported docker image: ubuntu:22.04
2/6
2/6
2/6

$ srun --exclusive --ntasks-per-node=3 --container-image=ubuntu:22.04 bash -c 'echo "`nproc`/$SLURM_CPUS_ON_NODE"'
4/4
4/4
4/4

$ srun --cpus-per-task=10 --ntasks-per-node=3 --container-image=ubuntu:22.04 bash -c 'echo "`nproc`/$SLURM_CPUS_ON_NODE"'
pyxis: imported docker image: ubuntu:22.04
10/30
10/30
10/30

$ srun --cpus-per-task=32 --overcommit --ntasks-per-node=3 --container-image=ubuntu:22.04 bash -c 'echo "`nproc`/$SLURM_CPUS_ON_NODE"'
pyxis: imported docker image: ubuntu:22.04
32/32
32/32
32/32

but

$ srun --mem=48G --ntasks-per-node=3 --container-image=nvcr.io/nvidia/pytorch:23.12-py3 bash -c 'echo "`nproc`/$SLURM_CPUS_ON_NODE"'
pyxis: imported docker image: nvcr.io/nvidia/pytorch:23.12-py3
1/6
1/6
1/6

$ srun --mem=48G --exclusive --ntasks-per-node=3 --container-image=nvcr.io/nvidia/pytorch:23.12-py3 bash -c 'echo "`nproc`/$SLURM_CPUS_ON_NODE"'
pyxis: imported docker image: nvcr.io/nvidia/pytorch:23.12-py3
1/4
1/4
1/4

$ srun --mem=48G --cpus-per-task=10 --ntasks-per-node=3 --container-image=nvcr.io/nvidia/pytorch:23.12-py3 bash -c 'echo "`nproc`/$SLURM_CPUS_ON_NODE"'
pyxis: imported docker image: nvcr.io/nvidia/pytorch:23.12-py3
1/30
1/30
1/30

$ srun --mem=48G --cpus-per-task=32 --overcommit --ntasks-per-node=3 --container-image=nvcr.io/nvidia/pytorch:23.12-py3 bash -c 'echo "`nproc`/$SLURM_CPUS_ON_NODE"'
pyxis: imported docker image: nvcr.io/nvidia/pytorch:23.12-py3
1/32
1/32
1/32

This is caused by the 50-slurm-pytorch.sh hook, which hardcodes OMP_NUM_THREADS to 1; I have opened a PR (#174) with a fix based on current PyTorch multiprocessing best practices.
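For context, here is a minimal bash sketch of the kind of logic such a fix could use (this is not the actual hook or the content of PR #174; the choice of Slurm variables and the export mechanism are assumptions for illustration):

# Hypothetical replacement logic: derive OMP_NUM_THREADS from the Slurm
# allocation instead of hardcoding it to 1.
if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
  # --cpus-per-task was given: one OpenMP thread per allocated CPU
  export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK}"
elif [ -n "${SLURM_CPUS_ON_NODE:-}" ] && [ -n "${SLURM_NTASKS_PER_NODE:-}" ]; then
  # otherwise split the node's CPUs across the tasks on that node
  # (SLURM_NTASKS_PER_NODE is only set when --ntasks-per-node is used)
  export OMP_NUM_THREADS=$(( SLURM_CPUS_ON_NODE / SLURM_NTASKS_PER_NODE ))
fi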

@flx42
Member

flx42 commented Jan 9, 2024

I'm surprised that nproc has this behavior, to be honest.

I'll review the PR, but it's a bit of a sensitive topic: setting the wrong number of threads can quickly cause performance issues one way or another (not enough cores in use vs. too many threads). I'll check with my colleagues to see what they think.

@itzsimpl
Author

@flx42 I agree, that surprised me too. For completeness, I'm referencing the issue (CPU oversubscription, NVIDIA/NeMo#8141) that led me to this investigation. It turned out there is an issue in numba (numba/numba#9387) that resets the value of torch's num_threads whenever numba's num_threads is read or set.
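For reference, the interaction can be probed with a quick check along these lines (purely illustrative; whether the second value actually drops depends on the torch and numba versions installed):

$ python -c 'import torch, numba; print(torch.get_num_threads()); numba.get_num_threads(); print(torch.get_num_threads())'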

My proposal is to keep the behaviour consistent, especially since torch recommends setting num_threads to nCPU/nTasks, and because nproc is updated too (and some bash scripts rely on that value). Do check with your colleagues, please.
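As a concrete check against the outputs above: with --cpus-per-task=10 and --ntasks-per-node=3, SLURM_CPUS_ON_NODE is 30, so nCPU/nTasks gives 30/3 = 10 threads per task, which matches the 10 that nproc reports in both the bare and ubuntu:22.04 runs.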

@itzsimpl
Author

FWIW, nproc does take into account both OMP_NUM_THREADS and OMP_THREAD_LIMIT:
https://www.gnu.org/software/coreutils/manual/html_node/nproc-invocation.html
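For illustration, this is the behaviour that manual page describes (values shown are the documented ones, assuming no other OMP_* variables are set in the environment):

$ OMP_NUM_THREADS=2 nproc
2
$ OMP_NUM_THREADS=8 OMP_THREAD_LIMIT=4 nproc
4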
