Commit

update
Binyang2014 committed Oct 11, 2024
1 parent 43cee6d commit 585aac5
Showing 1,016 changed files with 65,033 additions and 13 deletions.
3 changes: 2 additions & 1 deletion README.html
Original file line number Diff line number Diff line change
@@ -50,6 +50,7 @@
<p class="caption" role="heading"><span class="caption-text">Design</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="design/design.html">MSCCL++ Design Document</a></li>
<li class="toctree-l1"><a class="reference internal" href="design/nccl-over-mscclpp.html">NCCL Over MSCCL++</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Performance</span></p>
<ul>
@@ -88,7 +89,7 @@
<h1>How to build docs<a class="headerlink" href="#how-to-build-docs" title="Permalink to this heading"></a></h1>
<ol class="arabic">
<li><p>Install <code class="docutils literal notranslate"><span class="pre">doxygen</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>apt-get<span class="w"> </span>install<span class="w"> </span>doxygen
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>apt-get<span class="w"> </span>install<span class="w"> </span>doxygen<span class="w"> </span>graphviz
</pre></div>
</div>
</li>
Binary file added _images/size_boundary_diagram.png
2 changes: 1 addition & 1 deletion _sources/README.md.txt
@@ -3,7 +3,7 @@
1. Install `doxygen`.

```bash
$ sudo apt-get install doxygen
$ sudo apt-get install doxygen graphviz
```

2. Install the Python packages below. If you install them in the user's local directory, add `~/.local/bin` to `$PATH` (needed to use `sphinx-build`).
71 changes: 71 additions & 0 deletions _sources/design/nccl-over-mscclpp.md.txt
@@ -0,0 +1,71 @@
# NCCL Over MSCCL++

(limitations)=
## Limitations

The current NCCL over MSCCL++ implementation has a few limitations.

* We do not cover all APIs yet. See the [API Support Table](#api-support-table) for details.
* Multi-node communication is not supported yet.
* Currently, collective communication functions may not work correctly if the buffer address differs from that of previous function calls while sharing the same base address (as returned by [cuMemGetAddressRange](https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g64fee5711274a2a0573a789c94d8299b)) as the previous address. This is because the current implementation performs zero-copy communication over user buffers, and it is difficult to efficiently inform all ranks when a buffer address changes dynamically.
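
To make the last limitation concrete, here is a minimal Python sketch of the condition it describes. It is only a model, not part of MSCCL++: `is_risky_call` and the `seen` dictionary are hypothetical names, addresses are plain integers, and nothing here calls CUDA or NCCL.

```python
def is_risky_call(seen, base_addr, buf_addr):
    """Return True if buf_addr differs from the address last used with base_addr.

    `seen` maps a base address (what cuMemGetAddressRange would report for the
    allocation) to the buffer address passed in the previous collective call.
    This is a hypothetical model of the limitation, not MSCCL++ code.
    """
    previous = seen.get(base_addr)
    seen[base_addr] = buf_addr  # remember the most recent address for this allocation
    return previous is not None and previous != buf_addr


seen = {}
base = 0x1000                                   # start of one allocation
first = is_risky_call(seen, base, base)         # first call on this allocation
again = is_risky_call(seen, base, base)         # same buffer address again
shifted = is_risky_call(seen, base, base + 64)  # same allocation, offset address
```

In this model only the third call is flagged. Under this reading of the limitation, passing the same buffer address on every call for a given allocation avoids the issue.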

(api-support-table)=
## API Support Table

The table below lists all NCCL APIs (v2.21); `O` marks a supported API and `X` an unsupported one. We may cover more APIs in the future.

| API Name | Supported |
| :----------------------- | :-------: |
| ncclGetLastError | X |
| ncclGetErrorString | O |
| ncclGetVersion | O |
| ncclGetUniqueId | O |
| ncclCommInitRank | O |
| ncclCommInitAll | X |
| ncclCommInitRankConfig | X |
| ncclCommSplit | X |
| ncclCommFinalize | O |
| ncclCommDestroy | O |
| ncclCommAbort | X |
| ncclCommGetAsyncError | O |
| ncclCommCount | O |
| ncclCommCuDevice | O |
| ncclCommUserRank | O |
| ncclCommRegister | X |
| ncclCommDeregister | X |
| ncclMemAlloc | X |
| ncclMemFree | X |
| ncclAllReduce | O |
| ncclBroadcast | X |
| ncclReduce | X |
| ncclAllGather | O |
| ncclReduceScatter | X |
| ncclGroupStart | O |
| ncclGroupEnd | O |
| ncclSend | X |
| ncclRecv | X |
| ncclRedOpCreatePreMulSum | X |
| ncclRedOpDestroy | X |

## Executor Support

The executor lets users specify how MSCCL++ executes an algorithm. Currently, only the allReduce operation supports algorithm customization, which is controlled by the following environment variables:

- ALLREDUCEPKT_IP_JSON_FILE: Specifies the path to the JSON file that defines the algorithm for small-sized, in-place operations.
- ALLREDUCEPKT_OP_JSON_FILE: Specifies the path to the JSON file that defines the algorithm for small-sized, out-of-place operations.
- ALLREDUCE_IP_JSON_FILE: Specifies the path to the JSON file that defines the algorithm for larger-sized, in-place operations.
- ALLREDUCE_OP_JSON_FILE: Specifies the path to the JSON file that defines the algorithm for larger-sized, out-of-place operations.
- ALLREDUCE_SMALL_MSG_BOUNDARY: Defines the size threshold at which the algorithm will switch between fallback code and the customized algorithm for small messages.
- ALLREDUCE_LARGE_MSG_BOUNDARY: Defines the size threshold at which the algorithm will switch between the customized algorithm for small messages and that for larger messages.

```{figure} ../figs/size_boundary_diagram.png
:name: MMSCCL++ Abstractions
:alt: Decision flowchart for message size-based algorithm execution
:align: center

Decision Flowchart for Message Size-Based Algorithm Execution
```
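
The size-based selection described above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the MSCCL++ implementation: `parse_size` and `select_allreduce_path` are hypothetical helpers, the `K`/`M` suffixes are assumed to be 1024-based, the boundary comparisons are assumed to be strict (`<`), and a missing JSON file variable is assumed to mean the fallback code is used.

```python
import os

# Assumption: size suffixes are binary multiples (16K == 16 * 1024 bytes).
_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a string such as '16K' or '1M' into a byte count."""
    text = text.strip().upper()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


def select_allreduce_path(nbytes, in_place, env=None):
    """Return the JSON plan path chosen for a message, or 'fallback'."""
    env = os.environ if env is None else env
    small = parse_size(env["ALLREDUCE_SMALL_MSG_BOUNDARY"])
    large = parse_size(env["ALLREDUCE_LARGE_MSG_BOUNDARY"])
    kind = "IP" if in_place else "OP"  # in-place vs. out-of-place
    if nbytes < small:
        return "fallback"  # below the small boundary: built-in code
    if nbytes < large:
        var = f"ALLREDUCEPKT_{kind}_JSON_FILE"  # small-message algorithm
    else:
        var = f"ALLREDUCE_{kind}_JSON_FILE"  # larger-message algorithm
    # Assumption: with no plan file configured, the fallback code runs.
    return env.get(var, "fallback")


example_env = {
    "ALLREDUCE_SMALL_MSG_BOUNDARY": "16K",
    "ALLREDUCE_LARGE_MSG_BOUNDARY": "1M",
    "ALLREDUCEPKT_IP_JSON_FILE": "allreducepacket.json",
    "ALLREDUCE_IP_JSON_FILE": "allreducesm.json",
}
chosen = select_allreduce_path(64 * 1024, in_place=True, env=example_env)
```

With these example settings, a 64 KiB in-place allReduce falls between the two boundaries, so the sketch picks the file named by `ALLREDUCEPKT_IP_JSON_FILE`.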

The following example runs the NCCL API test with the executor:

```bash
mpirun -np 8 \
  -x ALLREDUCEPKT_IP_JSON_FILE=/root/azure-mscclpp/nccl/test/execution-files/allreducepacket.json \
  -x ALLREDUCE_IP_JSON_FILE=/root/azure-mscclpp/nccl/test/execution-files/allreducesm.json \
  -x ALLREDUCE_SMALL_MSG_BOUNDARY=16K \
  -x ALLREDUCE_LARGE_MSG_BOUNDARY=1M \
  ./apps/nccl/test/nccl_api_test
```
4 changes: 3 additions & 1 deletion _sources/getting-started/quickstart.md.txt
@@ -30,6 +30,7 @@ $ docker run -it --privileged --net=host --ipc=host --gpus all ghcr.io/microsoft

See all available images [here](https://github.com/microsoft/mscclpp/pkgs/container/mscclpp%2Fmscclpp).

(build-from-source)=
## Build from Source

CMake 3.25 or later is required.
@@ -64,6 +65,7 @@ $ make -j mscclpp mscclpp_static
$ sudo make install/fast
```

(install-from-source-python-module)=
## Install from Source (Python Module)

Python 3.8 or later is required.
@@ -173,4 +175,4 @@ mpirun -np 8 --bind-to numa --allow-run-as-root -x LD_PRELOAD=$MSCCLPP_BUILD/app

If MSCCL++ is built on AMD platforms, `libmscclpp_nccl.so` would replace the [RCCL](https://github.com/ROCm/rccl) library (i.e., `librccl.so`).

See limitations of the current NCCL over MSCCL++ from [here](../apps/nccl/README.md#limitations).
See limitations of the current NCCL over MSCCL++ from [here](../design/nccl-over-mscclpp.md#limitations).
2 changes: 2 additions & 0 deletions _sources/index.rst.txt
@@ -24,13 +24,15 @@ Getting Started
Design
-------
- :doc:`Design <design/design>` doc for those who want to understand the internals of MSCCL++.
- :doc:`NCCL over MSCCL++ <design/nccl-over-mscclpp>` doc for those who want to understand how to use NCCL over MSCCL++.

.. toctree::
:maxdepth: 1
:caption: Design
:hidden:

design/design
design/nccl-over-mscclpp

Performance
---------------
1 change: 1 addition & 0 deletions api/index.html
@@ -51,6 +51,7 @@
<p class="caption" role="heading"><span class="caption-text">Design</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../design/design.html">MSCCL++ Design Document</a></li>
<li class="toctree-l1"><a class="reference internal" href="../design/nccl-over-mscclpp.html">NCCL Over MSCCL++</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Performance</span></p>
<ul>
5 changes: 3 additions & 2 deletions design/design.html
@@ -21,7 +21,7 @@
<script src="../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="NDmv4 Performance" href="../performance/performance-ndmv4.html" />
<link rel="next" title="NCCL Over MSCCL++" href="nccl-over-mscclpp.html" />
<link rel="prev" title="Working with Python API" href="../getting-started/tutorials/python-api.html" />
</head>

@@ -79,6 +79,7 @@
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="nccl-over-mscclpp.html">NCCL Over MSCCL++</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Performance</span></p>
<ul>
@@ -279,7 +280,7 @@ <h3>Implementing customized collective communication algorithms<a class="headerl
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="../getting-started/tutorials/python-api.html" class="btn btn-neutral float-left" title="Working with Python API" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="../performance/performance-ndmv4.html" class="btn btn-neutral float-right" title="NDmv4 Performance" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
<a href="nccl-over-mscclpp.html" class="btn btn-neutral float-right" title="NCCL Over MSCCL++" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>

<hr/>