The upcoming UCX 1.16 release will default to using UCP protov2 (UCX_PROTO_ENABLE=y). We have been testing Dask-CUDA and UCX-Py with protov2 for a while and are not aware of any issues. In fact, some known issues related to non-fully-NVLink connected systems (e.g., DGX-1) are now fixed. However, large changes like this have historically been associated with an increased risk of new problems. To mitigate potential risks, we strongly encourage duplicating all UCX-related testing with UCX 1.15.0 (released September 28, 2023, already available in conda-forge), with the duplicate runs setting UCX_PROTO_ENABLE=y. If possible, testing with the latest UCX master changes is preferred, although that requires building UCX from source.
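As a rough illustration of duplicating a test run, the sketch below executes the same command once per protocol setting by overriding UCX_PROTO_ENABLE in the child process environment. The helper names `run_with_proto` and `run_duplicated` are hypothetical and not part of UCX-Py or Dask-CUDA, and the command to run is whatever your test suite normally uses.

```python
import os
import subprocess
import sys


def run_with_proto(cmd, proto_enable, base_env=None):
    """Run cmd with UCX_PROTO_ENABLE forced to 'y' or 'n'; return its exit code.

    Hypothetical helper: only the environment variable name comes from UCX.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["UCX_PROTO_ENABLE"] = "y" if proto_enable else "n"
    return subprocess.run(cmd, env=env).returncode


def run_duplicated(cmd):
    """Run cmd twice, first with protov1 ('n') and then with protov2 ('y')."""
    return {flag: run_with_proto(cmd, flag == "y") for flag in ("n", "y")}


if __name__ == "__main__":
    # Example: python this_script.py pytest path/to/tests
    results = run_duplicated(sys.argv[1:] or [sys.executable, "-c", "pass"])
    for flag, code in results.items():
        print(f"UCX_PROTO_ENABLE={flag}: exit code {code}")
```

Forcing the value in the child environment, rather than in the current process, keeps the two runs isolated from each other and from whatever the calling shell already exports.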
Mitigation strategy: if we encounter any blocking issues, it is still possible to fall back to protov1 by explicitly setting UCX_PROTO_ENABLE=n, which can be set up as the default in UCX-Py. This is a last resort: protov2 should become the new norm, and we need to adapt to it and resolve any issues we encounter.
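One way such a fallback default could look is sketched below. This is an assumption for illustration, not how UCX-Py implements it: `apply_proto_fallback` is a hypothetical helper that sets UCX_PROTO_ENABLE=n only when the user has not already chosen a value, and it assumes it is called before UCX is initialized in the process so that the setting is picked up.

```python
import os


def apply_proto_fallback(environ=None):
    """Default UCX_PROTO_ENABLE to 'n' (protov1) unless it is already set.

    Hypothetical helper; returns the value that ends up in effect.
    """
    env = os.environ if environ is None else environ
    # setdefault leaves an explicit user choice (e.g. 'y') untouched.
    env.setdefault("UCX_PROTO_ENABLE", "n")
    return env["UCX_PROTO_ENABLE"]


if __name__ == "__main__":
    # Assumption: this runs before any UCX context is created.
    print("UCX_PROTO_ENABLE =", apply_proto_fallback())
```

Using `setdefault` rather than an unconditional assignment means anyone who wants to keep testing protov2 can still opt in with UCX_PROTO_ENABLE=y.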