BlueFog Development Guide
Please follow the steps below to set up your development environment.
Because the project is still at a preliminary stage and our time is limited, we have not tested it extensively across environments. To avoid potential pitfalls, please make sure that your development environment has python>=3.7 and gcc>=4.0 with support for std=C++11. We recommend using conda to manage the Python environment, and we recommend a Mac OS or Linux-based system. Lastly, the project relies on an MPI implementation; please install open-mpi>=4.0 [Download and Instruction]. (MPICH may work but has not been fully tested yet.)
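The basic prerequisites above can be checked with a short script. This is a minimal sketch and not part of BlueFog: the helper name check_environment is hypothetical, and it only verifies that the interpreter is recent enough and that gcc and mpirun are on the PATH. It does not check the gcc or Open MPI versions.

```python
import shutil
import sys


# Hypothetical helper (not part of BlueFog): reports which of the basic
# prerequisites listed in this guide appear to be present.
def check_environment(min_python=(3, 7), tools=("gcc", "mpirun")):
    report = {"python": sys.version_info[:2] >= tuple(min_python)}
    for tool in tools:
        # shutil.which returns None when the executable is not on the PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print("{}: {}".format(name, "found" if ok else "MISSING"))
```

A MISSING entry only means the tool was not found on the PATH of the current shell; a conda environment that has not been activated is a common cause.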
When pulling the third_party submodules for the first time, please run:
git submodule update --init --recursive
To get full GPU support, you have to install CUDA (>=10.0) and the GPU-enabled version of pytorch and/or tensorflow. To maximize the efficiency of GPU and MPI, our implementation assumes that the installed MPI is GPU-aware whenever a GPU is available. This avoids the extra cost of copying data between GPU and host memory, i.e. the GPU address can be used directly. However, if the MPI build is not GPU-aware, there will be a segmentation fault. To build a GPU-aware OpenMPI, run this command after downloading OpenMPI:
./configure --prefix={YOUR_OWN_PREFIX} --with-cuda
WARNING: Make sure that only one MPI implementation is installed on your machine.
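One way to check whether an existing Open MPI build is GPU-aware before running BlueFog is to query ompi_info, which reports an mpi_built_with_cuda_support parameter. The following is a minimal sketch rather than anything BlueFog provides: the function names are hypothetical, and it assumes that ompi_info is on the PATH and prints the parameter in its usual parsable form.

```python
import subprocess


def parse_cuda_support(ompi_info_output):
    """Return True/False from `ompi_info --parsable --all` output, or None if absent."""
    for line in ompi_info_output.splitlines():
        # The line of interest usually looks like:
        # mca:mpi:base:param:mpi_built_with_cuda_support:value:true
        if "mpi_built_with_cuda_support:value:" in line:
            return line.rsplit(":", 1)[-1].strip() == "true"
    return None


def openmpi_is_cuda_aware():
    """Run ompi_info and parse its output; None means 'could not determine'."""
    try:
        result = subprocess.run(
            ["ompi_info", "--parsable", "--all"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # ompi_info is missing or failed to run.
        return None
    return parse_cuda_support(result.stdout)


if __name__ == "__main__":
    print("CUDA-aware Open MPI:", openmpi_is_cuda_aware())
```

A result of None means the check itself could not be performed, which is different from a build that reports false.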
It is highly recommended to use NCCL instead of OpenMPI as the GPU communication implementation. We require NCCL>=2.7 because our implementation relies heavily on the ncclSend and ncclRecv APIs introduced in version 2.7.
Under the root folder, i.e. the python folder, run this command first:
pip install -e . --verbose
The -e means “editable”, so changes you make to files in the bluefog directory will take effect without reinstalling the package. In contrast, if you do python setup.py install, files will be copied from the bluefog directory to a directory of Python packages (often something like /home/{USER}/anaconda3/lib/python3.7/site-packages/bluefog). This means that changes you make to files in the bluefog directory will not have any effect.
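To confirm which kind of install is active, you can check where Python resolves the package from. With an editable install the path should point into your source checkout, whereas after a regular install it should point into site-packages. A minimal sketch using only the standard library follows; package_location is a hypothetical helper, not a BlueFog function.

```python
import importlib.util
import os


def package_location(name):
    """Return the directory a top-level package would be imported from, or None."""
    # find_spec locates the package without importing it.
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(os.path.abspath(spec.origin))


if __name__ == "__main__":
    # "bluefog" is the package name used throughout this guide.
    print(package_location("bluefog"))
```

If this prints None, the package is not installed in the active environment at all.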
If NCCL >= 2.7 is installed on your system, you should install BlueFog with NCCL using the following command:
BLUEFOG_WITH_NCCL=1 pip install -e . --verbose
Note: By default, the pip install command above already includes the step that builds the C extension. A more detailed explanation of that step follows.
We rely heavily on the C extension in this project. Unlike Python files, whenever you modify the C files, you have to recompile them and regenerate the shared library:
python setup.py build_ext -i
where -i means an in-place build. If your environment is set up correctly, this should generate a file like mpi_lib.cpython-37m-darwin.so under the /bluefog/torch folder. (The "middle" part of the name may differ depending on your system and environment.)
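The "middle" part of the file name comes from the interpreter's extension-module suffix, so you can ask Python which name to expect on your machine. A minimal sketch follows; the helper name is hypothetical, and mpi_lib is simply the module name taken from the example above.

```python
import sysconfig


def expected_extension_filename(module_name="mpi_lib"):
    """Return the file name an in-place build should produce for module_name."""
    # EXT_SUFFIX is the suffix the running interpreter uses for compiled
    # extension modules; it encodes the Python version and the platform.
    suffix = sysconfig.get_config_var("EXT_SUFFIX") or ""
    return module_name + suffix


if __name__ == "__main__":
    print(expected_extension_filename())
```

If the file produced by the build has a different suffix from the one printed here, the extension was probably built with a different interpreter than the one you are running.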
To check whether the setup and build are correct, run
make test
to see whether all tests pass. This runs every test case. If you want to run only particular cases, you can use a command like
BLUEFOG_LOG_LEVEL=debug bfrun -np 4 pytest -s test/torch_win_ops_test.py::WinOpsTests::test_func
where the environment variable BLUEFOG_LOG_LEVEL=debug changes the C++ log output from the default error level to the debug level, -s is the pytest flag that disables log capturing, and the pattern for the test path is file_path::ClassName::FuncName.
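The test path pattern can also be assembled programmatically, which helps when scripting runs of several individual cases. This is a minimal sketch: bfrun_pytest_command is a hypothetical helper, and the file, class, and function names are the ones from the example above.

```python
import shlex


def bfrun_pytest_command(file_path, class_name, func_name, num_procs=4):
    """Build the bfrun/pytest argument list for a single test function."""
    # pytest selects one test with file_path::ClassName::FuncName.
    node_id = "::".join([file_path, class_name, func_name])
    return ["bfrun", "-np", str(num_procs), "pytest", "-s", node_id]


if __name__ == "__main__":
    cmd = bfrun_pytest_command(
        "test/torch_win_ops_test.py", "WinOpsTests", "test_func"
    )
    # Print the command in a form that can be pasted into a shell.
    print(" ".join(shlex.quote(part) for part in cmd))
    # To run it with debug logging, one option is:
    # subprocess.run(cmd, env=dict(os.environ, BLUEFOG_LOG_LEVEL="debug"))
```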
It is important to keep the code style and lint consistent throughout the whole project.
For Python, we use standard pylint, configured in the .pylintrc file. Python docstrings follow the Google style. We recommend using an editor that makes it easy to run pylint (but do not turn on format-on-save). Otherwise, remember to run pylint {FILENAME} on your changes.
For C++, the tooling is less settled (possibly clang-tidy); I am not very familiar with C++ formatting. Right now, we have a simple .clang-format file, and I just use the vscode clang-format plugin.
- If my default python version is 2, how do I set the default python version to 3?
Answer: alias python=python3
- If my pytorch is not installed properly, how do I reinstall pytorch?
Answer: Uninstall torch: pip uninstall torch
Install torch: pip3 install torch torchvision
- I got the following error when executing "BLUEFOG_LOG_LEVEL=debug mpirun -n 2 python bluefog/torch/c_hello_world.py". How do I address this issue?
Error:
File "bluefog/torch/c_hello_world.py", line 36
print(f"Rank: {rank}, local rank: {local_rank} Size: {size}, local size: {local_size}")
^
SyntaxError: invalid syntax
Answer: Specify python3 explicitly, using the following command: BLUEFOG_LOG_LEVEL=debug mpirun -n 2 python3 bluefog/torch/c_hello_world.py
- If I get the following error when executing the "make test" command, how do I address this issue? Error: Test error: There are not enough slots available in the system to satisfy the 4
Answer: Reason: your machine does not have enough physical CPU cores (the test requires 4). To address this issue, first run "sysctl hw.physicalcpu hw.logicalcpu" (on Mac OS) to find the number of physical CPU cores. If, for example, your machine has 2 physical CPU cores, change the 4 in python/Makefile to 2. This resolves the issue.