I have been running NCCL_TESTS on a multi-node, multi-GPU environment with NCCL 2.19.3-1 and OpenMPI 4.1.6. Each node has 4 NVIDIA V100 GPUs interconnected with NVLink and PCIe.
How is the NCCL_ALGO chosen by default, and what is the decision logic for choosing the algorithms for inter-node and intra-node communications?
If I specify NCCL_ALGO=Ring and at the same time set OMPI_MCA_coll_tuned_use_dynamic_rules=1 and set an algorithm for coll_tuned_allreduce_algorithm, how will the final algorithm be chosen? Does it go with the NCCL one or the MCA one? Or is one chosen for inter-node and the other for intra-node?
We have an internal model which compares the performance of the different algorithms and (hopefully) chooses the best one.
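To make the idea of a performance model concrete, here is a minimal sketch of how such a selection could work in principle. It is not NCCL's actual code: the `AlgoEstimate` type, the `choose_algo` function, and all latency and bandwidth numbers are hypothetical placeholders, and the cost formula (fixed latency plus size divided by bandwidth) is an assumed simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlgoEstimate:
    """Hypothetical per-algorithm cost parameters (placeholder values only)."""
    name: str
    latency_us: float       # fixed startup cost in microseconds
    bandwidth_gbps: float   # sustained bandwidth in GB/s


def predicted_time_us(est: AlgoEstimate, nbytes: int) -> float:
    # Assumed cost model: time = latency + size / bandwidth.
    # 1 GB/s equals 1e3 bytes per microsecond.
    return est.latency_us + nbytes / (est.bandwidth_gbps * 1e3)


def choose_algo(estimates, nbytes: int) -> str:
    # Pick the candidate with the lowest predicted time for this message size.
    return min(estimates, key=lambda e: predicted_time_us(e, nbytes)).name


# Made-up numbers chosen only to illustrate a latency/bandwidth trade-off.
CANDIDATES = [
    AlgoEstimate("Ring", latency_us=20.0, bandwidth_gbps=10.0),
    AlgoEstimate("Tree", latency_us=5.0, bandwidth_gbps=8.0),
]

if __name__ == "__main__":
    for size in (1024, 1 << 30):
        print(size, choose_algo(CANDIDATES, size))
```

With these placeholder numbers the lower-latency candidate wins for small messages and the higher-bandwidth candidate wins for large ones, which is the kind of trade-off a model like this is meant to capture.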
You're mixing up NCCL and MPI. The OMPI_ setting controls MPI and NCCL does not use MPI (even for inter-node communication). MPI is only used by the NCCL tests to spawn tasks and help with the CPU-CPU synchronization, but it's not required by NCCL, at all.
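The separation can be illustrated with a small sketch that groups the variables from the question by the library that reads them. The helper name `split_env_by_reader` is hypothetical, and the value `"3"` for the allreduce algorithm is a placeholder, since the question does not say which value was used. The only assumption is the naming convention: NCCL reads variables prefixed with `NCCL_`, and Open MPI reads MCA parameters prefixed with `OMPI_MCA_`.

```python
def split_env_by_reader(env):
    """Group environment variables by the library that consumes them.

    Assumes the prefix convention only: NCCL_* is read by NCCL,
    OMPI_MCA_* is read by Open MPI. Everything else is ignored here.
    """
    nccl = {k: v for k, v in env.items() if k.startswith("NCCL_")}
    ompi = {k: v for k, v in env.items() if k.startswith("OMPI_MCA_")}
    return nccl, ompi


# Settings from the question; the algorithm value "3" is a placeholder.
example_env = {
    "NCCL_ALGO": "Ring",
    "OMPI_MCA_coll_tuned_use_dynamic_rules": "1",
    "OMPI_MCA_coll_tuned_allreduce_algorithm": "3",
}

if __name__ == "__main__":
    nccl_vars, ompi_vars = split_env_by_reader(example_env)
    print("Read by NCCL:    ", nccl_vars)
    print("Read by Open MPI:", ompi_vars)
```

Only the first group affects NCCL collectives, both intra-node and inter-node. The second group affects MPI collectives only, such as those the NCCL tests might issue on the CPU side for synchronization.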