
various fixes and GPU challenge update #880

Merged (27 commits) on Jul 27, 2023

Commits (all authored Jul 27, 2023):
* c006be4: Update run-nvidia-gpu-gpt-j-6b-ref-pytorch.md (arjunsuresh)
* c930a2c: Update run-nvidia-gpu-gpt-j-6b-ref-pytorch.md (arjunsuresh)
* 9140284: added cm rerun experiment (gfursin)
* f53cda8: Merge branch 'master' of https://github.com/ctuning/mlcommons-ck (gfursin)
* bf9887c: clean up docs (gfursin)
* ef3b91c: clean up (gfursin)
* b8ce2e3: Update README_reference.md (arjunsuresh)
* 8845c65: Update run-nvidia-gpu-gpt-j-6b-ref-pytorch.md (arjunsuresh)
* cffc581: Dont need cudnn for pytorch (arjunsuresh)
* 4dc3a68: Added more server scenario tuning options for nvidia-harness (arjunsuresh)
* f285985: clean up (gfursin)
* c6f0b38: Merge branch 'master' of https://github.com/ctuning/mlcommons-ck (gfursin)
* c98bcc6: Added FixMe option for system desc fields (arjunsuresh)
* 70f82e7: Update run-nvidia-gpu-gpt-j-6b-ref-pytorch.md (arjunsuresh)
* 2e0284f: extra fixes (gfursin)
* a40419a: Merge branch 'master' of https://github.com/ctuning/mlcommons-ck (gfursin)
* 2653ca4: extra clean up (gfursin)
* a72ff67: clean up (gfursin)
* 5e51369: Merge branch 'mlcommons:master' into master (gfursin)
* 4c983f0: minor fixes (gfursin)
* 685624c: Fix for some SUT desc fields missing (arjunsuresh)
* 8951a23: Fix for missing system fields (arjunsuresh)
* 5419d81: Fixes for pacman (arjunsuresh)
* 36a385c: Fixes for pacman (arjunsuresh)
* c318520: Fixes for pacman (arjunsuresh)
* acdc2c1: Fixes for pacman (arjunsuresh)
* f56051a: Fixes for pacman (arjunsuresh)
54 changes: 51 additions & 3 deletions cm-mlops/automation/experiment/module.py
@@ -83,6 +83,8 @@ def run(self, i):
(script_tags) (str): find and run CM script by tags
(stags)

(rerun) (bool): if True, rerun experiment in a given entry/directory instead of creating a new one...

(explore) (dict): exploration dictionary

...
@@ -119,6 +121,44 @@ def run(self, i):
# Get directory with datetime
datetime = i.get('dir','')

        if datetime == '' and i.get('rerun', False):
            # Check if some experiment directories already exist in this entry
            directories = os.listdir(experiment_path)

            datetimes = sorted([f for f in directories if os.path.isfile(os.path.join(experiment_path, f, self.CM_RESULT_FILE))], reverse=True)

            if len(datetimes)==1:
                datetime = datetimes[0]
            elif len(datetimes)>1:
                print ('')
                print ('Select experiment:')

                datetimes = sorted(datetimes)

                num = 0
                print ('')
                for d in datetimes:
                    print ('{}) {}'.format(num, d.replace('.',' ')))
                    num += 1

                if not console:
                    return {'return':1, 'error':'more than 1 experiment found.\nPlease use "cm rerun experiment --dir={date and time}"'}

                print ('')
                x=input('Make your selection or press Enter for 0: ')

                x=x.strip()
                if x=='': x='0'

                selection = int(x)

                if selection < 0 or selection >= num:
                    selection = 0

                datetime = datetimes[selection]


        if datetime!='':
            experiment_path2 = os.path.join(experiment_path, datetime)
        else:
@@ -373,9 +413,17 @@ def run(self, i):



    ############################################################
    def rerun(self, i):
        """
        Rerun experiment

        cm run experiment --rerun=True ...
        """

        i['rerun']=True

        return self.run(i)



@@ -479,7 +527,7 @@ def replay(self, i):
                    num += 1

                if not console:
-                   return {'return':1, 'error':'more than 1 experiment found.\nPlease use "cm run experiment --datetime={date and time}"'}
+                   return {'return':1, 'error':'more than 1 experiment found.\nPlease use "cm run experiment --dir={date and time}"'}

                print ('')
                x=input('Make your selection or press Enter for 0: ')
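For reference, here is a minimal usage sketch of the new rerun entry point (the `--dir` value below is a hypothetical timestamp; actual directory names are generated by the experiment automation):

```bash
# Rerun the latest experiment in the current entry;
# if several experiment directories exist, CM prompts for a selection:
cm rerun experiment

# Rerun a specific experiment directory (timestamp is illustrative):
cm rerun experiment --dir=2023-07-27.12-34-56.123456
```
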
1 change: 1 addition & 0 deletions cm-mlops/challenge/run-mlperf@home-v3.1-cpu/_cm.json
@@ -17,6 +17,7 @@
"mlperf-inference-v3.1-simple-cpu"
],
"title": "MLPerf@home: help the community find the most efficient CPUs (Intel/AMD/Arm) for BERT and MobileNets/EfficientNets (latency, throughput, accuracy, number of cores, frequency, memory size, cost and other metrics)",
"skip": true,
"trophies": true,
"uid": "498f33f3dac647c1"
}
@@ -52,15 +52,31 @@ cm pull repo mlcommons@ck
We suggest that you set up a Python virtual environment via CM to avoid contaminating your existing Python installation:

```bash
-cm run script "install python-venv" --name=crowd-mlperf
-export CM_SCRIPT_EXTRA_CMD="--adr.python.name=crowd-mlperf"
+cm run script "install python-venv" --name=mlperf --version_min=3.8
+export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
```

CM will create a new Python virtual environment in the CM cache and install all Python dependencies there:
```bash
cm show cache --tags=python-venv
```

Note that CM downloads and/or installs models, datasets, packages, libraries and tools into this cache.

You can clean it at any time and start from scratch using the following command:
```bash
cm rm cache -f
```

Alternatively, you can remove specific entries using tags:
```bash
cm show cache
cm rm cache --tags=tag1,tag2,...
```
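
For example, to drop only the virtual-environment entry created above (a sketch assuming the `--tags` filter and the `-f` force flag compose as in the commands shown here):

```bash
cm rm cache --tags=python-venv -f
```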




### Do a test run to detect and record the system performance

```bash
@@ -52,14 +52,26 @@ cm pull repo mlcommons@ck
We suggest that you set up a Python virtual environment via CM to avoid contaminating your existing Python installation:

```bash
-cm run script "install python-venv" --name=crowd-mlperf
-export CM_SCRIPT_EXTRA_CMD="--adr.python.name=crowd-mlperf"
+cm run script "install python-venv" --name=mlperf --version_min=3.8
+export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
```

CM will create a new Python virtual environment in the CM cache and install all Python dependencies there:
```bash
-cm show cache
+cm show cache --tags=python-venv
```

Note that CM downloads and/or installs models, datasets, packages, libraries and tools into this cache.

You can clean it at any time and start from scratch using the following command:
```bash
cm rm cache -f
```

Alternatively, you can remove specific entries using tags:
```bash
cm show cache
cm rm cache --tags=tag1,tag2,...
```


10 changes: 3 additions & 7 deletions cm-mlops/challenge/run-mlperf@home-v3.1-gpu/README.md
@@ -13,7 +13,7 @@ in a native environment or Docker container using the portable and technology-agnostic
[MLCommons Collective Mind automation language (CM)](https://doi.org/10.5281/zenodo.8105339).

Your name and benchmark submissions will be published in the official MLCommons inference v3.1 results
-on September 1, 2023 (submission deadline: August 4, 2023),
+on September 1, 2023 (**submission deadline: August 3, 2023**),
will be published in the [official leaderboard](https://access.cknowledge.org/playground/?action=contributors),
will be included in the prize draw, and will be presented in our upcoming ACM/HiPEAC events
and joint white paper about crowd-benchmarking AI/ML systems similar to SETI@home.
@@ -44,12 +44,8 @@ Thank you in advance for helping the community find Pareto-efficient AI/ML Systems

### Instructions to run benchmarks and submit results

You can run any or all of these benchmarks, depending on the time you have available:

-* [GPT-J 6B model; Reference MLPerf implementation; native environment or Docker; PyTorch+CUDA](https://github.com/mlcommons/ck/blob/master/cm-mlops/challenge/run-mlperf%40home-v3.1-gpu/run-nvidia-gpu-gpt-j-6b-ref-pytorch.md)
-* [BERT-99 model; Nvidia MLPerf implementation; Docker; TensorRT](https://github.com/mlcommons/ck/blob/master/cm-mlops/challenge/run-mlperf%40home-v3.1-gpu/run-nvidia-gpu-bert-99-nvidia-docker-tensorrt.md)
-* [BERT-99 model; Reference MLPerf implementation; native environment; PyTorch/ONNX/TensorFlow](https://github.com/mlcommons/ck/blob/master/cm-mlops/challenge/run-mlperf%40home-v3.1-gpu/run-nvidia-gpu-bert-99-ref-native-onnx-pytorch-tf.md)
-* [BERT-99 model; Nvidia MLPerf implementation; native environment; TensorRT](https://github.com/mlcommons/ck/blob/master/cm-mlops/challenge/run-mlperf%40home-v3.1-gpu/run-nvidia-gpu-bert-99-nvidia-native-tensorrt.md)
+* [GPT-J 6B model (24GB min GPU memory); PyTorch+CUDA; native environment](https://github.com/mlcommons/ck/blob/master/cm-mlops/challenge/run-mlperf%40home-v3.1-gpu/run-nvidia-gpu-gpt-j-6b-ref-pytorch.md)
+* [BERT-99 model (8GB min GPU memory); TensorRT; Docker](https://github.com/mlcommons/ck/blob/master/cm-mlops/challenge/run-mlperf%40home-v3.1-gpu/run-nvidia-gpu-bert-99-nvidia-docker-tensorrt.md)

### Results

3 changes: 1 addition & 2 deletions cm-mlops/challenge/run-mlperf@home-v3.1-gpu/_cm.json
@@ -6,8 +6,7 @@
"date_open": "20230725",
"experiments": [],
"points": 2,
"prize": "200$ for the top 3 submitters",
"prize_short": "co-authoring white paper , $$$",
"prize_short": "helping the community , co-authoring white paper , $$$",
"sort": -30,
"tags": [
"run",
@@ -1,11 +1,14 @@
# Introduction

-This guide will help you automatically run the Nvidia implementation of the MLPerf inference benchmark v3.1
+This guide will help you run the Nvidia implementation of the MLPerf inference benchmark v3.1
with the BERT-99 model and TensorRT on any Linux-based system with an Nvidia GPU (8–16 GB of GPU memory required)
and Docker.

-This benchmark is automated by the MLCommons CM language and you should be able to submit official MLPerf v3.1 inference results
-for all scenarios in closed division and edge category.
+This benchmark is semi-automated by the [MLCommons CM language](https://doi.org/10.5281/zenodo.8105339)
+and you should be able to submit official MLPerf v3.1 inference results
+for all scenarios in closed division and edge category
+(**deadline to send us results for v3.1 submission: August 3, 2023**).


It requires ~30 GB of disk space and can take ~2 hours to run on one system.

@@ -140,17 +143,29 @@ to an Amazon S3 bucket containing all the needed files to automatically download

```
cmr "generate-run-cmds inference _find-performance _all-scenarios" \
--model=bert-99 --implementation=nvidia-original --device=cuda --backend=tensorrt \
--category=edge --division=open --quiet --test_query_count=1000
--model=bert-99 \
--implementation=nvidia-original \
--device=cuda \
--backend=tensorrt \
--category=edge \
--division=closed \
--test_query_count=1000 \
--quiet
```

### Do full accuracy and performance runs

```
cmr "generate-run-cmds inference _submission _allscenarios" --model=bert-99 \
--device=cuda --implementation=nvidia-original --backend=tensorrt \
--execution-mode=valid --results_dir=$HOME/results_dir \
--category=edge --division=open --quiet
cmr "generate-run-cmds inference _submission _allscenarios" \
--model=bert-99 \
--device=cuda \
--implementation=nvidia-original \
--backend=tensorrt \
--execution-mode=valid \
--results_dir=$HOME/results_dir \
--category=edge \
--division=closed \
--quiet
```

* `--offline_target_qps` and `--singlestream_target_latency` can be used to override the automatically determined performance targets, as shown in the sketch below.
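
For example (a sketch; the target values below are purely illustrative and should be derived from your own measured results), the overrides can be appended to the submission command above:

```bash
cmr "generate-run-cmds inference _submission _allscenarios" \
    --model=bert-99 \
    --device=cuda \
    --implementation=nvidia-original \
    --backend=tensorrt \
    --execution-mode=valid \
    --results_dir=$HOME/results_dir \
    --category=edge \
    --division=closed \
    --offline_target_qps=2000 \
    --singlestream_target_latency=5 \
    --quiet
```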
@@ -159,12 +174,23 @@ cmr "generate-run-cmds inference _submission _allscenarios" --model=bert-99 \

```
cmr "generate-run-cmds inference _populate-readme _all-scenarios" \
--model=bert-99 --device=cuda --implementation=nvidia-original --backend=tensorrt \
--execution-mode=valid --results_dir=$HOME/results_dir \
--category=edge --division=open --quiet
--model=bert-99 \
--device=cuda \
--implementation=nvidia-original \
--backend=tensorrt \
--execution-mode=valid \
--results_dir=$HOME/results_dir \
--category=edge \
--division=closed \
--quiet
```

### Generate and upload MLPerf submission

Follow [this guide](../Submission.md) to generate the submission tree and upload your results.


## Questions? Suggestions?

Don't hesitate to get in touch with the community and organizers
via the [public Discord server](https://discord.gg/JjWNWXKxwT).