
CPU support? #4

Open
pablogranolabar opened this issue Jan 7, 2022 · 1 comment

Comments

@pablogranolabar

Hi, very neat project.

Question: is it possible to use FTPipe with massively parallel CPU clusters, say, for example, 256 VMs?

@saareliad
Owner

Hi @pablogranolabar, tweaks will be needed, but it can be made possible.

You should consider the following parts:

  • Distributed execution should work out of the box (I did a small PoC of distributed execution with 2 machines via OpenMPI).
  • All partitioned configurations can be returned on CPU using the DEBUG option, e.g.:
    def create_pipeline_configuration(DEBUG=False, batch_size=4):
  • The pipeline runtime can work with CPU: add a line with "cpu": true to the JSON config, which leads to
    device = torch.device('cpu' if args.cpu else f"cuda:{local_device_id}")
    I kept a file with all options here, e.g.,
  • Partitioning analysis can run on CPUs (see here).
  • Profiling is currently hardcoded for GPUs, but it should be very easy to change: several functions here would need changing so profiling is done on CPU.
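Taken together, the CPU-facing pieces above might look like the following minimal sketch. Only the `torch.device` line is quoted from the thread; the `create_pipeline_configuration` body and the `--cpu` flag wiring are assumptions about how FTPipe's actual config plumbing could work, not its real API:

```python
import argparse
import torch

# Hypothetical partitioned-config entry point, mirroring the DEBUG
# option mentioned above: DEBUG=True keeps everything on CPU.
def create_pipeline_configuration(DEBUG=False, batch_size=4):
    device = torch.device("cpu" if DEBUG else "cuda:0")
    return {"device": str(device), "batch_size": batch_size}

# Device selection as in the runtime line above: an args.cpu flag
# (populated, e.g., from a '"cpu": true' entry in the JSON config)
# switches the whole pipeline stage to CPU.
parser = argparse.ArgumentParser()
parser.add_argument("--cpu", action="store_true")
args = parser.parse_args([])  # e.g. parse_args(["--cpu"]) for a CPU run
local_device_id = 0
device = torch.device("cpu" if args.cpu else f"cuda:{local_device_id}")
```

Constructing a `torch.device("cuda:0")` object does not require CUDA to be present, so the same selection code runs unchanged on GPU-less VMs as long as `args.cpu` is set.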

Finally, there are some partitioning heuristics which would need to be changed according to your system; e.g., the memory threshold in the master branch is hardcoded to 11GB for the RTX 2080 Ti.
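One way to adapt that heuristic for a CPU cluster is to make the budget configurable instead of hardcoded. The sketch below is an assumption about how this could be wired up; the `FTPIPE_MEM_GIB` environment variable name and the `memory_threshold_bytes` helper are hypothetical, not part of FTPipe:

```python
import os

GIB = 1024 ** 3

def memory_threshold_bytes(default_gib: float = 11.0) -> int:
    """Return the partitioning memory budget in bytes.

    Defaults to the 11GB RTX 2080 Ti value mentioned above, but allows
    overriding it via an environment variable (name is hypothetical)
    so CPU-only workers with different RAM budgets can set their own.
    """
    override = os.environ.get("FTPIPE_MEM_GIB")
    gib = float(override) if override else default_gib
    return int(gib * GIB)
```

On a 256-VM cluster, each worker could export its own budget (e.g. `FTPIPE_MEM_GIB=32` for 32GB RAM machines) before running the partitioner.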
