```bash
mamba create -n <your-env-name> python=3.10  # or -p <your-env-path> python=3.10
mamba activate <your-env-path>
cd <base-path-to-repo>
bash shell/install-packages.sh --env_path <your-env-path>
```
Edit the `submit-batch.sh` file to set your credit account.

From:

```bash
#SBATCH -A xxyyyyyy
```

To:

```bash
#SBATCH -A <your-credit-account>
```

Then submit the job:

```bash
sbatch -N <node-count> submit-batch.sh
```
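As a minimal usage sketch (the node count of four is arbitrary here), you might submit and then check the queue with standard SLURM tooling:

```bash
sbatch -N 4 submit-batch.sh
squeue -u $USER  # verify the job is pending or running
```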
To set the default options, create a `pre-submit.sh` file in the base directory of the repository. The following options are required and must be set before submitting the job:

```bash
export PROJ_PATH=<base-path-to-repo>
export SHARED_PATH=<your-shared-path>
export CACHE_PATH=<your-cache-path>
export ENV_PATH=<your-env-path>
# and export more options below...
```
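A sketch of a fuller `pre-submit.sh`, with hypothetical paths and a few of the optional variables from the table below:

```bash
# pre-submit.sh — example values only; replace the paths with your own
export PROJ_PATH=$HOME/llm-finetune          # hypothetical base path of this repository
export SHARED_PATH=/shared/projects/llm      # hypothetical shared path
export CACHE_PATH=/shared/cache/huggingface  # hypothetical cache path
export ENV_PATH=$HOME/envs/finetune          # hypothetical environment path

# Optional overrides (see the options table below)
export BATCH_SIZE=8
export MODEL_SIZE=7b
export DEEPSPEED_STAGE=2
export TASK=scaling
```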
The following options can be set either as script arguments or in the `pre-submit.sh` file. (N/A means the option is not available as an argument or in the pre-submit file.)
| Arguments | Pre-Submit File | Default Value | Description |
|---|---|---|---|
| `--nthreads` | `NTHREADS` | 8 | Specifies the number of CPU helper threads used per network connection for socket transport. |
| `--pthreads` | `PTHREADS` | 2 | Specifies the number of sockets opened by each helper thread of the socket transport. |
| `--batch_size` | `BATCH_SIZE` | 4 | Specifies the batch size. |
| `--deepspeed_stage` | `DEEPSPEED_STAGE` | 2 | Specifies the DeepSpeed stage. |
| `--model_size` | `MODEL_SIZE` | 7b | Specifies the model size. |
| `--task` | `TASK` | default | Specifies the task (see the task options below). |
| `--run_with` | `RUN_WITH` | conda | Specifies the environment manager used to run the script. |
| N/A | `PROJ_PATH` | | Specifies the path to the repository. |
| N/A | `SHARED_PATH` | | Specifies the shared path. |
| N/A | `CACHE_PATH` | | Specifies the cache path. |
| `--env_path` | `ENV_PATH` | | Specifies the path to the environment. |
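Assuming `submit-batch.sh` reads the flags that follow it on the command line (sbatch passes anything after the script name through to the script), an override via arguments might look like this hypothetical sketch:

```bash
sbatch -N 2 submit-batch.sh --batch_size 8 --model_size 7b --deepspeed_stage 3 --task scaling
```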
The options for `--task` / `TASK` are:

- `nccl`: names the log folder using the NCCL testing structure, `<NTHREADS>nth-<PTHREADS>pth-<SLURM_JOB_ID>`
- `scaling`: names the log folder for scaling runs, `<STAGE>-<MODEL_SIZE>/<COUNT_NODE>n-<BATCH_SIZE>b-<SLURM_JOB_ID>`
- `default`: names the log folder using the default structure, `<COUNT_NODE>n-<BATCH_SIZE>b-<SLURM_JOB_ID>`
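For instance, a hypothetical `scaling` run with stage 2, a 7b model, 4 nodes, batch size 4, and SLURM job ID 123456 would write its logs under `2-7b/4n-4b-123456` (values illustrative only).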
This repository was created as a hard fork of SuperAI_LLM_FineTune.