BlueFog Development Guide

YBC edited this page Feb 15, 2021 · 6 revisions

Please follow the steps below to set up your development environment.

0. Check your development environment.

Because of limited time and the preliminary phase of this project, we have not tested it extensively across environments. To avoid potential pitfalls, please make sure your development environment has python>=3.7 and gcc>=4.0 with support for std=C++11. We recommend conda for managing Python environments, and we recommend macOS or a Linux-based system. Lastly, BlueFog relies on an MPI implementation; please install open-mpi>=4.0 [Download and Instruction]. (MPICH may work but has not been fully tested yet.)
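
As a quick sanity check of the requirements above, a minimal Python sketch might look like the following. The tool names (gcc, mpirun, git) are assumptions about what is on your PATH, and the script only checks presence; it does not verify the gcc or Open MPI versions.

```python
import shutil
import sys

MIN_PYTHON = (3, 7)  # minimum Python version stated in this guide


def check_environment(tools=("gcc", "mpirun", "git")):
    """Report whether the Python version and the required tools look usable.

    The tool names are assumptions: gcc for the compiler, mpirun for
    Open MPI, and git for the submodule step. Versions of gcc and
    Open MPI are not checked here.
    """
    report = {"python_ok": sys.version_info[:2] >= MIN_PYTHON}
    for tool in tools:
        # shutil.which returns the executable path, or None if not on PATH.
        report[tool] = shutil.which(tool)
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print("{}: {}".format(name, value))
```

Any entry printed as None means that tool was not found on your PATH.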

The first time you pull the third_party submodules, run:

git submodule update --init --recursive

0.1. If your development environment has GPUs (Optional)

For full GPU support, you have to install CUDA (>=10.0) and a GPU-enabled build of pytorch and/or tensorflow. To maximize the efficiency of GPU and MPI, our implementation assumes that the installed MPI is GPU-aware whenever a GPU is available. This avoids the extra cost of copying data between GPU and host memory, i.e. the GPU memory address can be used directly. However, if the MPI build is not GPU-aware, there will be a segmentation fault. To build a GPU-aware OpenMPI, run this command after downloading OpenMPI:

 ./configure --prefix={YOUR_OWN_PREFIX} --with-cuda

WARNING: Make sure that only one MPI implementation is installed on your machine.
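
One way to check whether an installed OpenMPI was built with CUDA support is to inspect the output of ompi_info. The sketch below assumes ompi_info is on your PATH and that it reports a line containing mpi_built_with_cuda_support:value:true (or false); the helper names are hypothetical.

```python
import shutil
import subprocess

# Parameter name that Open MPI's ompi_info reports for CUDA-aware builds.
CUDA_KEY = "mpi_built_with_cuda_support:value:"


def parse_cuda_support(ompi_info_output):
    """Return True/False if the CUDA-support line is found, else None."""
    for line in ompi_info_output.splitlines():
        if CUDA_KEY in line:
            return line.strip().endswith("true")
    return None


def mpi_is_cuda_aware():
    """Run `ompi_info --parsable --all` and parse it; None if unavailable."""
    if shutil.which("ompi_info") is None:
        return None
    result = subprocess.run(
        ["ompi_info", "--parsable", "--all"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
        check=False,
    )
    return parse_cuda_support(result.stdout)


if __name__ == "__main__":
    print("CUDA-aware MPI:", mpi_is_cuda_aware())
```

A result of None means the check could not be performed, for example because ompi_info was not found.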

It is highly recommended to use NCCL instead of OpenMPI as the GPU communication implementation. We require NCCL>=2.7 because our implementation relies heavily on the ncclSend and ncclRecv APIs introduced in version 2.7.
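
If PyTorch is already installed, a rough way to see which NCCL version it was built against is sketched below. This is only an approximation: it reports the NCCL version bundled with PyTorch, which may differ from the system-wide NCCL that BlueFog compiles against, and the helper name nccl_is_supported is hypothetical.

```python
MIN_NCCL = (2, 7)  # minimum NCCL version stated in this guide


def nccl_is_supported(version):
    """Compare a (major, minor, ...) tuple against the required minimum."""
    return tuple(version[:2]) >= MIN_NCCL


if __name__ == "__main__":
    try:
        from torch.cuda import nccl
        version = nccl.version()
    except Exception as exc:  # torch missing, or built without NCCL
        print("Could not query NCCL through PyTorch: {}".format(exc))
    else:
        if isinstance(version, tuple):
            print(version, "supported:", nccl_is_supported(version))
        else:
            # Some older PyTorch releases return an integer code instead.
            print("NCCL version code:", version)
```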

1. Install the package locally

Under the root folder, i.e. the python folder, run this command first.

pip install -e . --verbose

The -e means “editable”, so changes you make to files in the bluefog directory will take effect without reinstalling the package. In contrast, if you do python setup.py install, files will be copied from the bluefog directory to a directory of Python packages (often something like /home/{USER}/anaconda3/lib/python3.7/site-packages/bluefog). This means that changes you make to files in the bluefog directory will not have any effect.
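
To confirm which copy of the package Python actually imports, a small sketch like this can help. The helper name package_location is hypothetical; with an editable install, the printed path is expected to point into your checkout rather than into site-packages.

```python
import importlib.util


def package_location(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Prints None if bluefog is not installed in this environment.
    print(package_location("bluefog"))
```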

If NCCL >= 2.7 is installed on your system, you should install BlueFog with NCCL support using the following command:

BLUEFOG_WITH_NCCL=1 pip install -e . --verbose

2. Build the custom C extension (Optional)

Note: By default, the pip install command above already includes the step of building the C extension. The following is a more detailed explanation of that step.

We rely heavily on the C extension in this project. Unlike Python files, whenever you modify the C files, you have to recompile them and regenerate the shared library:

python setup.py build_ext -i

where -i means an in-place build. If your environment is set up correctly, this should generate a file like mpi_lib.cpython-37m-darwin.so under the /bluefog/torch folder. (The "middle" part of the name may differ depending on your system and environment.)
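
The "middle" part of the file name comes from the interpreter's extension suffix, which you can print yourself. The sketch below shows one way to look for the built library; the folder path and the helper names are assumptions for illustration.

```python
import glob
import os
import sysconfig


def extension_suffix():
    """Return this interpreter's C-extension suffix, e.g. '.cpython-37m-darwin.so'."""
    return sysconfig.get_config_var("EXT_SUFFIX")


def find_built_extensions(folder, stem="mpi_lib"):
    """List files in `folder` named `<stem>*` that end with the current suffix."""
    pattern = os.path.join(folder, stem + "*" + extension_suffix())
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    print("Expected suffix:", extension_suffix())
    # Assumed location relative to the folder where you ran the build.
    print(find_built_extensions(os.path.join("bluefog", "torch")))
```

An empty list suggests the build did not produce a library for the interpreter you are currently running.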

3. Run Test

To check whether the setup and build are correct, run

make test

to see whether all tests pass. This runs every test case. To run only particular cases, use a command like

BLUEFOG_LOG_LEVEL=debug bfrun -np 4 pytest -s test/torch_win_ops_test.py::WinOpsTests::test_func

where the environment variable BLUEFOG_LOG_LEVEL=debug changes the C++ logging from the default error level to the debug level, -s is the pytest flag that disables output capturing, and the pattern for the test path is file_path::ClassName::FuncName.
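
To make the test path pattern concrete, the sketch below assembles a pytest node ID and the corresponding command string from its parts. The helper names are hypothetical; the file, class, and function names are the ones used in the example command.

```python
import shlex


def pytest_node_id(file_path, class_name=None, func_name=None):
    """Join the parts with '::' as in file_path::ClassName::FuncName."""
    parts = [file_path]
    if class_name:
        parts.append(class_name)
    if func_name:
        parts.append(func_name)
    return "::".join(parts)


def bfrun_test_command(node_id, num_procs=4, log_level="debug"):
    """Build the shell command string for running one test under bfrun."""
    return "BLUEFOG_LOG_LEVEL={} bfrun -np {} pytest -s {}".format(
        log_level, num_procs, shlex.quote(node_id)
    )


if __name__ == "__main__":
    node = pytest_node_id("test/torch_win_ops_test.py", "WinOpsTests", "test_func")
    print(bfrun_test_command(node))
```

Omitting the function name selects every test in the class, and omitting both selects the whole file.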

4. Code Style And Lint

It is important to keep the code style and lint consistent throughout the whole project.

For Python, we use standard pylint, configured by the .pylintrc file. Python docstrings follow the Google style. We recommend an editor that can run pylint easily (but do not turn on format-on-save). Otherwise, remember to run pylint {FILENAME} on your changes.
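
For reference, a Google-style docstring looks roughly like the following. The function itself is a hypothetical example and is not part of BlueFog.

```python
def average(values):
    """Compute the arithmetic mean of a sequence of numbers.

    Args:
        values: A non-empty sequence of ints or floats.

    Returns:
        The mean of `values` as a float.

    Raises:
        ValueError: If `values` is empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)
```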

For C++, we have not settled on a linter yet (possibly clang-tidy). Right now, we have a simple .clang-format file, which can be applied with the VS Code clang-format plugin.

5. FAQ for Mac users:

  1. If my default python version is 2, how do I set the default python version to 3?

Answer: alias python=python3

  2. If my pytorch is not installed properly, how do I reinstall pytorch?

Answer: Uninstall torch: pip uninstall torch

Install torch: pip3 install torch torchvision

  3. I got the following error when executing "BLUEFOG_LOG_LEVEL=debug mpirun -n 2 python bluefog/torch/c_hello_world.py". How do I address this issue?

Error:

  File "bluefog/torch/c_hello_world.py", line 36
    print(f"Rank: {rank}, local rank: {local_rank} Size: {size}, local size: {local_size}")
    ^
SyntaxError: invalid syntax

Answer: The f-string syntax is not valid in Python 2, so you should specify python3 explicitly using the following command: BLUEFOG_LOG_LEVEL=debug mpirun -n 2 python3 bluefog/torch/c_hello_world.py

  4. I get the following error when executing the "make test" command. How do I address this issue? Error: Test error: There are not enough slots available in the system to satisfy the 4

Answer: Your machine does not have enough physical CPU cores (the test requires 4). First, use the "sysctl hw.physicalcpu hw.logicalcpu" command to find the number of physical CPU cores. If, for example, your machine has 2 physical CPU cores, change the 4 in python/Makefile to 2. This resolves the issue.
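
To pick a process count that fits your machine, a rough sketch such as the following may help. It queries sysctl on macOS and otherwise falls back to os.cpu_count(), which counts logical rather than physical cores, so treat the fallback as an upper bound. The helper names are hypothetical.

```python
import os
import platform
import subprocess


def physical_cpu_count():
    """Best-effort physical core count; falls back to logical cores."""
    if platform.system() == "Darwin":
        try:
            out = subprocess.check_output(["sysctl", "-n", "hw.physicalcpu"])
            return int(out.decode().strip())
        except (OSError, subprocess.CalledProcessError, ValueError):
            pass
    return os.cpu_count() or 1


def usable_process_count(requested=4, available=None):
    """Clamp the requested number of processes to the available cores."""
    if available is None:
        available = physical_cpu_count()
    return max(1, min(requested, available))


if __name__ == "__main__":
    print("Suggested -np value:", usable_process_count())
```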