Releases: ROCm/rccl
RCCL 2.20.5 for ROCm 6.2.2
RCCL code for ROCm 6.2.2 did not change. The library was rebuilt for the updated ROCm 6.2.2 stack.
RCCL 2.20.5 for ROCm 6.2.1
RCCL code for ROCm 6.2.1 did not change. The library was rebuilt for the updated ROCm 6.2.1 stack.
RCCL 2.20.5 for ROCm 6.2.0
Changed
- Compatibility with NCCL 2.20.5
- Compatibility with NCCL 2.19.4
- Performance tuning for some collective operations on MI300
- Enabled NVTX code in RCCL
- Replaced rccl_bfloat16 with hip_bfloat16
- NPKit updates:
  - Warm-up iterations are no longer removed by default; removal is now opt-in
  - Doubled the size of buffers to accommodate more channels
- Modified rings to be rail-optimized topology friendly
- Replaced ROCmSoftwarePlatform links with ROCm links
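Both rccl_bfloat16 and hip_bfloat16 represent bfloat16, which keeps the upper 16 bits of an IEEE-754 float32 (1 sign bit, 8 exponent bits, 7 mantissa bits). The C sketch below shows one common float-to-bfloat16 conversion using round-to-nearest-even. The function names are hypothetical, and this is not claimed to be hip_bfloat16's own implementation.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: float32 -> bfloat16 bit pattern, round-to-nearest-even. */
uint16_t float_to_bf16_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    if (isnan(f)) {
        /* Keep NaN a NaN: truncate and force a quiet-NaN mantissa bit. */
        return (uint16_t)((u >> 16) | 0x0040u);
    }
    /* Add 0x7FFF plus the lowest kept bit, so exact ties round to even. */
    uint32_t lsb = (u >> 16) & 1u;
    u += 0x7FFFu + lsb;
    return (uint16_t)(u >> 16);
}

/* Hypothetical helper: bfloat16 bit pattern -> float32 (exact, zero-fills low bits). */
float bf16_bits_to_float(uint16_t h) {
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Widening back to float32 is exact, so all precision loss happens in the rounding step.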
Added
- Support for fp8 and rccl_bfloat8
- Support for using HIP contiguous memory
- Implemented ROC-TX for host-side profiling
- Enabled static build
- New Rome model
- New fp16 and fp8 cases in unit tests
- New unit test for main kernel stack size
- New -n option for topo_expl to override the number of nodes
- Improved debug messages for memory allocations
- Channel shuffling for IB systems
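The fp8 and rccl_bfloat8 types are 8-bit floating-point formats. As an illustration only, the sketch below decodes the OCP E5M2 layout (1 sign bit, 5 exponent bits, 2 mantissa bits, exponent bias 15), the format usually called "bf8". It is an assumption that this matches rccl_bfloat8 bit-for-bit: MI300 hardware uses the "FNUZ" fp8 variants, whose bias and special values differ. The function name is hypothetical.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode one OCP E5M2 byte to float32. */
float e5m2_to_float(uint8_t b) {
    uint32_t sign = (uint32_t)(b >> 7) << 31;
    uint32_t exp = (b >> 2) & 0x1Fu;
    uint32_t man = b & 0x3u;
    uint32_t bits;
    if (exp == 0x1Fu) {
        /* All-ones exponent: infinity when man == 0, otherwise NaN. */
        bits = sign | 0x7F800000u | (man << 21);
    } else if (exp == 0) {
        /* Subnormal: 2^(1-15) * (man / 4) = man * 2^-16, exact in float32. */
        float v = (float)man / 65536.0f;
        return sign ? -v : v;
    } else {
        /* Normal: rebias exponent from 15 to 127, move mantissa to the top bits. */
        bits = sign | ((exp + 112u) << 23) | (man << 21);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```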
Fixed
- Bug when configuring RCCL to use only the LL128 protocol
- Scratch memory allocation after API change for MSCCL
- Incorrect minNchannels in multi-node configurations
RCCL 2.18.6 for ROCm 6.1.2
Changed
- Reduced NCCL_TOPO_MAX_NODES to limit stack usage and avoid stack overflow
RCCL 2.18.6 for ROCm 6.1.1
RCCL code for ROCm 6.1.1 did not change. The library was rebuilt for the updated ROCm 6.1.1 stack.
RCCL 2.18.6 for ROCm 6.1.0
Changed
- Compatibility with NCCL 2.18.6
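Because each RCCL release tracks a specific NCCL version, a program may want to check which version it is running against. NCCL-style version codes are major*10000 + minor*100 + patch from 2.9 onward; older 2.x codes used major*1000 + minor*100 + patch. A minimal C sketch of decoding such a code, assuming the integer comes from ncclGetVersion(); the helper name is hypothetical:

```c
/* Hypothetical helper: split an NCCL-style version code into its parts. */
void decode_nccl_version(int code, int *major, int *minor, int *patch) {
    if (code >= 10000) {
        /* 2.9 and later: major*10000 + minor*100 + patch */
        *major = code / 10000;
        *minor = (code % 10000) / 100;
    } else {
        /* 2.8 and earlier: major*1000 + minor*100 + patch */
        *major = code / 1000;
        *minor = (code % 1000) / 100;
    }
    *patch = code % 100;
}
```

For example, a code of 21806 decodes to 2.18.6.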
RCCL 2.18.3 for ROCm 6.0.2
RCCL code for ROCm 6.0.2 did not change. The library was rebuilt for the updated ROCm 6.0.2 stack.
RCCL 2.18.3 for ROCm 6.0.0
Changed
- Compatibility with NCCL 2.18.3
RCCL 2.17.1 for ROCm 5.7.1
RCCL code for ROCm 5.7.1 did not change. The library was rebuilt for the updated ROCm 5.7.1 stack.
RCCL 2.17.1 for ROCm 5.7.0
Changed
- Compatibility with NCCL 2.17.1-1
- Performance tuning for some collective operations
Added
- Minor improvements to MSCCL code path
- NCCL_NCHANNELS_PER_PEER support
- Improved compilation performance
- Support for gfx94x
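NCCL_NCHANNELS_PER_PEER is an environment variable, so it is normally set in the launching shell (for example, export NCCL_NCHANNELS_PER_PEER=4). A process can also set it itself; the sketch below assumes this happens before any communicator is created, on the assumption that the library reads the variable at initialization. It uses the POSIX setenv() call, the value 4 is purely illustrative rather than a recommended setting, and the helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Hypothetical helper: set NCCL_NCHANNELS_PER_PEER for this process.
   Assumed to be called before any RCCL communicator is created.
   Returns 0 on success, -1 on failure (as setenv does). */
int set_nchannels_per_peer(const char *value) {
    return setenv("NCCL_NCHANNELS_PER_PEER", value, 1 /* overwrite */);
}
```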
Fixed
- Potential race condition during ncclSocketClose()