Add dst_weight for hierarchical neighbor allreduce #81

Open
hanbinhu opened this issue Mar 26, 2021 · 0 comments

Collaborator

hanbinhu commented Mar 26, 2021

No description provided.

hanbinhu pushed a commit that referenced this issue Apr 11, 2021
* Fixed the self_weight under empty receiving case

* Enable empty send neighbors and fix HalfTensor for recv_size==0

* Rename neighbor_weights to src_weights, and send_neighbors to dst_weights for neighbor_allreduce

* A script to test existing examples

* Accept dst_weights as Dict, and reorganize DoNeighborAllreduce

* Reorganize CheckNeighborSendRecvPattern

* Fix timeline_ptr for NCCL

* Put dst_weights information into TensorTableEntry

* First version of neighbor_allreduce dst_weight; known issues: fusion not implemented, CUDA data_weight problem

* Add some delay after data_weight as a temporary solution

* CPU Fusion for dst_weighted added

* Add ReadyEvent for dst_weight for single entry neighbor_allreduce

* Remove const qualifier for tensor dtype as it is meaningless

* Add cuda source for scalebuffer

* Scale buffer to modify itself

* Add .o file to .gitignore

* dst_weight using CUDA for fused entry & compile flow in Python setup.py

* make clean *.o files generated by nvcc

* Add fix for NCCL single entry

* Make setup.py more robust

* Add timeout and cuda check

* Move test example

* Fix NCCL side dst_weight fusion bug

* Add Agg backend to make matplotlib more stable

* Address comments for setup.py

* Simpler logic for dst_weighting_enabled and weighted_average_computation

* Better consideration for weight buffer size

* Make src_weights a std::map, and simplify logic for PerformNeighborAllreduceCallback

* Add TODO #80 and #81, and simplify the logic for dst_weight

* Wrap CheckNeighborSendRecvPattern again

* Add two more TODOs

* Address review comments

* Add condition variable to control the loop (#88)

* Add condition variable to control the loop

* Minor update on topology_setting in global_state

* Add missing <condition_variable> header

* Change cv.wait to cv.wait_for with a 10-second timeout

* Address comment and remove adjusting resetVersionWinMem in ibfrun

Co-authored-by: ybc <bichengying@gmail.com>
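The commit log above renames `neighbor_weights` to `src_weights` and `send_neighbors` to `dst_weights`, and accepts `dst_weights` as a dict. The following is a minimal single-process sketch of what one dst-weighted neighbor averaging round computes, assuming that each sender scales its tensor by `dst_weights[receiver]` before sending and that each receiver combines `self_weight * x` with `src_weights[source] * received`. The function `simulate_neighbor_allreduce` is hypothetical and uses plain floats instead of tensors; it is not the library's API or its implementation, and it does not model the hierarchical variant this issue tracks.

```python
from typing import Dict, List


def simulate_neighbor_allreduce(
    values: List[float],
    self_weights: List[float],
    src_weights: List[Dict[int, float]],
    dst_weights: List[Dict[int, float]],
) -> List[float]:
    """Hypothetical model of one weighted neighbor_allreduce round.

    values[r]       -- the scalar "tensor" held by rank r
    self_weights[r] -- weight rank r applies to its own value
    src_weights[r]  -- {source_rank: weight} applied by rank r to what it receives
    dst_weights[r]  -- {dest_rank: weight} applied by rank r before sending
    """
    n = len(values)
    # inbox[r][s] holds the already-scaled value that rank s sent to rank r.
    inbox: List[Dict[int, float]] = [{} for _ in range(n)]
    for sender in range(n):
        for receiver, w_dst in dst_weights[sender].items():
            # Sender-side scaling: this is the dst_weight step.
            inbox[receiver][sender] = w_dst * values[sender]

    result: List[float] = []
    for rank in range(n):
        # Simple consistency check (hypothetical): the sources a rank expects
        # must be exactly the ranks that actually sent to it.
        if set(src_weights[rank]) != set(inbox[rank]):
            raise ValueError(f"rank {rank}: send/recv pattern mismatch")
        acc = self_weights[rank] * values[rank]
        for src, w_src in src_weights[rank].items():
            # Receiver-side scaling: this is the src_weight step.
            acc += w_src * inbox[rank][src]
        result.append(acc)
    return result


if __name__ == "__main__":
    # 4-rank ring: each rank sends to its right neighbor with dst_weight 0.5
    # and receives from its left neighbor with src_weight 1.0.
    n = 4
    out = simulate_neighbor_allreduce(
        [0.0, 4.0, 8.0, 12.0],
        [0.5] * n,
        [{(r - 1) % n: 1.0} for r in range(n)],
        [{(r + 1) % n: 0.5} for r in range(n)],
    )
    print(out)  # [6.0, 2.0, 6.0, 10.0]
```

In this ring example each rank ends up with `0.5 * x[r] + 0.5 * x[r - 1]`, and the total (24.0) is preserved, because every rank's value is split between itself (0.5) and its one out-neighbor (0.5).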
BichengYing added a commit that referenced this issue May 6, 2021
* Fixed the self_weight under empty receiving case

* Enable empty send neighbors and fix HalfTensor for recv_size==0

* Rename neighbor_weights to src_weights, and send_neighbors to dst_weights for neighbor_allreduce

* A script to test existing examples

* Accept dst_weights as Dict, and reorganize DoNeighborAllreduce

* Reorganize CheckNeighborSendRecvPattern

* Fix timeline_ptr for NCCL

* Put dst_weights information into TensorTableEntry

* First version of neighbor_allreduce dst_weight; known issues: fusion not implemented, CUDA data_weight problem

* Add some delay after data_weight as a temporary solution

* CPU Fusion for dst_weighted added

* Add ReadyEvent for dst_weight for single entry neighbor_allreduce

* Remove const qualifier for tensor dtype as it is meaningless

* Add cuda source for scalebuffer

* Scale buffer to modify itself

* Add .o file to .gitignore

* dst_weight using CUDA for fused entry & compile flow in Python setup.py

* make clean *.o files generated by nvcc

* Add fix for NCCL single entry

* Make setup.py more robust

* Add timeout and cuda check

* Move test example

* Fix NCCL side dst_weight fusion bug

* Add Agg backend to make matplotlib more stable

* Address comments for setup.py

* Simpler logic for dst_weighting_enabled and weighted_average_computation

* Better consideration for weight buffer size

* Make src_weights a std::map, and simplify logic for PerformNeighborAllreduceCallback

* Add TODO #80 and #81, and simplify the logic for dst_weight

* Wrap CheckNeighborSendRecvPattern again

* Add two more TODOs

* Address review comments

* Add condition variable to control the loop

* Minor update on topology_setting in global_state

* Add missing <condition_variable> header

* Change cv.wait to cv.wait_for with a 10-second timeout

* Address comment and remove adjusting resetVersionWinMem in ibfrun

* Add lock to protect loop_cv notify_one

Co-authored-by: Hanbin Hu <hanbinhu2016@gmail.com>
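This commit switches the background loop from `cv.wait` to `cv.wait_for` with a 10-second timeout and adds a lock around `loop_cv` `notify_one`. The original code is C++. What follows is only a Python analogue of that pattern, assuming a simple work-queue loop. The class `BackgroundLoop` and its methods are hypothetical. Python's `threading.Condition.notify()` raises `RuntimeError` unless the lock is held. C++ `std::condition_variable::notify_one` does not require the mutex. Even so, changing the predicate state under the mutex is what prevents a lost wake-up. The timeout bounds how long any missed notification can stall the loop.

```python
import threading
from collections import deque
from typing import Any, Deque, List


class BackgroundLoop:
    """Hypothetical worker loop woken by a condition variable (Python analogue)."""

    def __init__(self, timeout_s: float = 10.0) -> None:
        self._cv = threading.Condition()
        self._queue: Deque[Any] = deque()
        self._shutdown = False
        self._timeout_s = timeout_s
        self.processed: List[Any] = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, item: Any) -> None:
        # Mutate shared state and notify while holding the lock.
        with self._cv:
            self._queue.append(item)
            self._cv.notify()  # counterpart of C++ notify_one

    def shutdown(self) -> None:
        with self._cv:
            self._shutdown = True
            self._cv.notify()
        self._thread.join()

    def _run(self) -> None:
        while True:
            with self._cv:
                # Like C++ cv.wait_for(lock, timeout, pred): sleep until there is
                # work or a shutdown request, but wake up after the timeout anyway.
                self._cv.wait_for(lambda: self._queue or self._shutdown,
                                  timeout=self._timeout_s)
                if self._shutdown and not self._queue:
                    return
                batch = list(self._queue)
                self._queue.clear()
            # Do the actual work outside the lock.
            self.processed.extend(batch)


if __name__ == "__main__":
    loop = BackgroundLoop(timeout_s=0.1)
    for i in range(5):
        loop.enqueue(i)
    loop.shutdown()
    print(loop.processed)  # [0, 1, 2, 3, 4]
```

The loop drains any queued items before it honors the shutdown flag, so no enqueued work is dropped.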