Merge from CTuning #1077

Merged: 31 commits, merged Jan 29, 2024.
Commits
- 07f335d (arjunsuresh, Jan 27, 2024): Support multistream runs for MIL
- c13d81f (arjunsuresh, Jan 27, 2024): Minor fixes
- 96cb63d (arjunsuresh, Jan 27, 2024): Fix benchmark-any script
- 4e29797 (arjunsuresh, Jan 27, 2024): Fix benchmark-any script
- bcd65fb (gfursin, Jan 27, 2024): - fixed yaml generation when adding new scripts
- 38cb2df (gfursin, Jan 27, 2024): Merge branch 'master' of https://github.com/ctuning/mlcommons-ck
- fd1e215 (gfursin, Jan 27, 2024): Merge branch 'mlcommons:master' into master
- 707472b (gfursin, Jan 27, 2024): cleaning up meta
- 83b7ec7 (gfursin, Jan 27, 2024): Added CM automation recipe "fail" to check various conditions such as
- dd42772 (gfursin, Jan 27, 2024): clean up
- 86abad0 (gfursin, Jan 27, 2024): cleaned up reproduce IPOL paper script
- c2e7050 (gfursin, Jan 27, 2024): added "create-patch" automation recipe; cleaned up some reproducibili…
- 9cd7f7c (gfursin, Jan 27, 2024): clean up
- 9f22fd4 (gfursin, Jan 27, 2024): Added possibility to download CM automation recipes snapshots
- f642856 (gfursin, Jan 28, 2024): clean up
- 540ae08 (gfursin, Jan 28, 2024): added support to find scripts using simplified CLI
- c498a97 (arjunsuresh, Jan 28, 2024): Fixes for mlperf power server on windows (works now)
- b2c2cfe (arjunsuresh, Jan 28, 2024): Fix GH auth in mlperf-power-server
- f45837c (arjunsuresh, Jan 28, 2024): Support device and ports in CM docker
- e36ec38 (arjunsuresh, Jan 28, 2024): Fix typo
- 2bd9d25 (arjunsuresh, Jan 28, 2024): Support gh_token in cm docker
- 1abe0ee (arjunsuresh, Jan 28, 2024): Intel GPTJ WIP
- b03c04f (arjunsuresh, Jan 28, 2024): Cache option for cuda devices
- 6f25772 (arjunsuresh, Jan 28, 2024): Fixes for benchmark any script
- a698d58 (arjunsuresh, Jan 28, 2024): Added some default variations for sdxl
- 327cddc (arjunsuresh, Jan 28, 2024): Fix for tflite power
- 8664c77 (arjunsuresh, Jan 28, 2024): Added win32 for windows power server
- 844c159 (arjunsuresh, Jan 28, 2024): Merge branch 'mlcommons:master' into master
- 2dbf70a (arjunsuresh, Jan 28, 2024): Update mlperf-inference-power-measurement.md
- 83dd52b (arjunsuresh, Jan 28, 2024): Support default_variations in combined variations
- 31b7743 (arjunsuresh, Jan 28, 2024): Update mlperf inference version to 4.0
48 changes: 27 additions & 21 deletions README.md
@@ -23,40 +23,47 @@ from [MLCommons projects](https://mlcommons.org) and [research papers](https://c
in a unified way on any operating system with any software and hardware
either natively or inside containers.

Here are some of the most commonly used examples from the community:
Here are a few of the most commonly used examples from CM users
that should run in the same way on Linux, MacOS, Windows and other platforms
(see [Getting Started Guide](docs/getting-started.md) to understand
how they work and how to reuse them in your projects):

```bash
pip install cmind

cm pull repo mlcommons@ck

cm run script "python app image-classification onnx"
cmr "python app image-classification onnx"

cm run script "download file _wget" --url=https://cKnowledge.org/ai/data/computer_mouse.jpg --verify=no --env.CM_DOWNLOAD_CHECKSUM=45ae5c940233892c2f860efdf0b66e7e

cm run script "python app image-classification onnx" --input=computer_mouse.jpg
cmr "download file _wget" --url=https://cKnowledge.org/ai/data/computer_mouse.jpg --verify=no --env.CM_DOWNLOAD_CHECKSUM=45ae5c940233892c2f860efdf0b66e7e
cmr "python app image-classification onnx" --input=computer_mouse.jpg

cm show cache
cm rm cache -f

cmr "python app image-classification onnx _cuda" --input=computer_mouse.jpg

cmr "cm gui" --script="python app image-classification onnx"
cm find script "python app image-classification onnx"

cm docker script "python app image-classification onnx" --input=computer_mouse.jpg
cm docker script "python app image-classification onnx" --input=computer_mouse.jpg -j -docker_it
cm docker script "python app image-classification onnx" --input=computer_mouse.jpg -j -docker_it
cmr "get python" --version_min=3.8.0 --name=mlperf-experiments
cmr "install python-venv" --version_max=3.10.11 --name=mlperf

cmr "get generic-python-lib _package.onnxruntime"
cmr "get coco dataset _val _2014"
cmr "get ml-model stable-diffusion"
cmr "get ml-model huggingface zoo _model-stub.alpindale/Llama-2-13b-ONNX" --model_filename=FP32/LlamaV2_13B_float32.onnx --skip_cache
cmr "get coco dataset _val _2014"

cm show cache
cm show cache "get ml-model stable-diffusion"

cmr "get generic-python-lib _package.onnxruntime" --version_min=1.16.0
cmr "python app image-classification onnx" --input=computer_mouse.jpg

cm rm cache -f
cmr "python app image-classification onnx" --input=computer_mouse.jpg --adr.onnxruntime.version_max=1.16.0


cmr "python app image-classification onnx _cuda" --input=computer_mouse.jpg

cmr "cm gui" --script="python app image-classification onnx"

cm docker script "python app image-classification onnx" --input=computer_mouse.jpg
cm docker script "python app image-classification onnx" --input=computer_mouse.jpg -j -docker_it

cmr "run common mlperf inference" --implementation=nvidia --model=bert-99 --category=datacenter --division=closed
cm find script "run common mlperf inference"
@@ -65,8 +72,6 @@ cm pull repo ctuning@cm-reproduce-research-projects
cmr "reproduce paper micro-2023 victima _install_deps"
cmr "reproduce paper micro-2023 victima _run"

...

```

```python
@@ -78,8 +83,9 @@ if output['return']==0: print (output)
```


Collective Mind is a community project being developed by the [MLCommons Task Force on Automation and Reproducibility](https://github.com/mlcommons/ck/blob/master/docs/taskforce.md)
with great help from [MLCommons (70+ AI organizations)](https://mlcommons.org/),
Collective Mind is a community project being developed by the
[MLCommons Task Force on Automation and Reproducibility](https://github.com/mlcommons/ck/blob/master/docs/taskforce.md)
with great help from [MLCommons (70+ AI organizations)](https://mlcommons.org/),
[research community]( https://www.youtube.com/watch?v=7zpeIVwICa4 )
and [individual contributors](https://github.com/mlcommons/ck/blob/master/CONTRIBUTING.md) -
we want to have a simple, non-intrusive, technology-agnostic, portable and easily-extensible interface
@@ -89,7 +95,7 @@ running experiments, processing logs, and reproducing results
without thinking where and how they run.

That is why we implemented CM as a [small Python library](https://github.com/mlcommons/ck/tree/master/cm)
with minimal dependencies (Python 3.7+, git, wget), simple Python API, StreamLit GUI and human-friendly command line.
with minimal dependencies (Python 3.7+, git, wget), simple Python API, GUI and human-friendly command line.
CM simply searches for CM scripts by tags or Unique IDs in all pulled Git repositories, automatically generates command lines
for a given script or tool on a given platform, updates all paths and environment variables,
runs a given automation either natively or inside automatically-generated containers
@@ -115,7 +121,7 @@ or have questions and suggestions.

### Documentation

* [Getting Started tutorial](docs/getting-started.md)
* [Getting Started Guide](docs/getting-started.md)
* [CM interface for MLPerf benchmarks](docs/mlperf)
* [CM interface for ML and Systems conferences](docs/tutorials/common-interface-to-reproduce-research-projects.md)
* [CM automation recipes for MLOps and DevOps](cm-mlops/script)
120 changes: 90 additions & 30 deletions cm-mlops/automation/script/module.py
@@ -1704,6 +1704,14 @@ def _update_variation_tags_from_variations(self, variation_tags, variations, var
v_static = self._get_name_for_dynamic_variation_tag(v)
tmp_variation_tags_static[v_i] = v_static

combined_variations = [ t for t in variations if ',' in t ]
# We support default_variations in the meta of combined_variations
combined_variations.sort(key=lambda x: x.count(','))
''' By sorting based on the number of variations users can safely override
env and state in a larger combined variation
'''
tmp_combined_variations = {k: False for k in combined_variations}

# Recursively add any base variations specified
if len(variation_tags) > 0:
tmp_variations = {k: False for k in variation_tags}
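The ordering introduced in this hunk (combined variations such as `a,b` sorted by how many commas they contain, so that a larger combination is handled after the smaller ones and can override their `env` and `state`) can be illustrated in isolation. The helper below is a hypothetical, single-pass sketch and not part of CM; it assumes `variations` is a script's variation dictionary and `selected_tags` the already-expanded variation tags, and returns the combined variations fully covered by the selection in the order they would be processed. The variation names and `EXAMPLE_MODE` key are made up for the example.

```python
def active_combined_variations(variations, selected_tags):
    """Hypothetical helper: return combined variations whose parts are all selected.

    Combined variations are the keys of `variations` that contain a comma,
    e.g. "cuda,fp16". They are ordered by their number of commas (Python's
    sort is stable), so a larger combination comes after any smaller one.
    """
    combined = [name for name in variations if ',' in name]
    combined.sort(key=lambda name: name.count(','))

    selected = set(selected_tags)
    return [name for name in combined if set(name.split(',')).issubset(selected)]


# Hypothetical variation meta; the larger combination is listed first on purpose.
variations = {
    'cuda': {}, 'fp16': {}, 'batch': {},
    'cuda,fp16,batch': {'env': {'EXAMPLE_MODE': 'large'}},
    'cuda,fp16': {'env': {'EXAMPLE_MODE': 'small'}},
}
print(active_combined_variations(variations, ['cuda', 'fp16', 'batch']))
# ['cuda,fp16', 'cuda,fp16,batch']
```

Sorting by comma count rather than by declaration order means the meta author does not need to list combinations in any particular order for the larger one to win.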
@@ -1748,28 +1756,34 @@ def _update_variation_tags_from_variations(self, variation_tags, variations, var
tag_to_append = None

# default_variations dictionary specifies the default_variation for each variation group. A default variation in a group is turned on if no other variation from that group is turned on and it is not excluded using the '-' prefix
if "default_variations" in variations[variation_name]:
default_base_variations = variations[variation_name]["default_variations"]
for default_base_variation in default_base_variations:
tag_to_append = None
r = self._get_variation_tags_from_default_variations(variations[variation_name], variations, variation_groups, tmp_variation_tags_static, excluded_variation_tags)
if r['return'] > 0:
return r

if default_base_variation not in variation_groups:
return {'return': 1, 'error': 'Default variation "{}" is not a valid group. Valid groups are "{}" '.format(default_base_variation, variation_groups)}
variations_to_add = r['variations_to_add']
for t in variations_to_add:
tmp_variations[t] = False
variation_tags.append(t)

unique_allowed_variations = variation_groups[default_base_variation]['variations']
# add the default only if none of the variations from the current group is selected and it is not being excluded with - prefix
if len(set(unique_allowed_variations) & set(tmp_variation_tags_static)) == 0 and default_base_variations[default_base_variation] not in excluded_variation_tags and default_base_variations[default_base_variation] not in tmp_variation_tags_static:
tag_to_append = default_base_variations[default_base_variation]
tmp_variations[variation_name] = True

if tag_to_append:
if tag_to_append not in variations:
variation_tag_static = self._get_name_for_dynamic_variation_tag(tag_to_append)
if not variation_tag_static or variation_tag_static not in variations:
return {'return': 1, 'error': 'Invalid variation "{}" specified in default variations for the variation "{}" '.format(tag_to_append, variation_name)}
variation_tags.append(tag_to_append)
tmp_variations[tag_to_append] = False
for combined_variation in combined_variations:
if tmp_combined_variations[combined_variation]:
continue
v = combined_variation.split(",")
all_present = set(v).issubset(set(variation_tags))
if all_present:
combined_variation_meta = variations[combined_variation]
tmp_combined_variations[combined_variation] = True

tmp_variations[variation_name] = True
r = self._get_variation_tags_from_default_variations(combined_variation_meta, variations, variation_groups, tmp_variation_tags_static, excluded_variation_tags)
if r['return'] > 0:
return r

variations_to_add = r['variations_to_add']
for t in variations_to_add:
tmp_variations[t] = False
variation_tags.append(t)

all_base_processed = True
for variation_name in variation_tags:
@@ -1785,6 +1799,31 @@ def _update_variation_tags_from_variations(self, variation_tags, variations, var
return {'return': 0}


def _get_variation_tags_from_default_variations(self, variation_meta, variations, variation_groups, tmp_variation_tags_static, excluded_variation_tags):
# default_variations dictionary specifies the default_variation for each variation group. A default variation in a group is turned on if no other variation from that group is turned on and it is not excluded using the '-' prefix

tmp_variation_tags = []
if "default_variations" in variation_meta:
default_base_variations = variation_meta["default_variations"]
for default_base_variation in default_base_variations:
tag_to_append = None

if default_base_variation not in variation_groups:
return {'return': 1, 'error': 'Default variation "{}" is not a valid group. Valid groups are "{}" '.format(default_base_variation, variation_groups)}

unique_allowed_variations = variation_groups[default_base_variation]['variations']
# add the default only if none of the variations from the current group is selected and it is not being excluded with - prefix
if len(set(unique_allowed_variations) & set(tmp_variation_tags_static)) == 0 and default_base_variations[default_base_variation] not in excluded_variation_tags and default_base_variations[default_base_variation] not in tmp_variation_tags_static:
tag_to_append = default_base_variations[default_base_variation]

if tag_to_append:
if tag_to_append not in variations:
variation_tag_static = self._get_name_for_dynamic_variation_tag(tag_to_append)
if not variation_tag_static or variation_tag_static not in variations:
return {'return': 1, 'error': 'Invalid variation "{}" specified in default variations for the group "{}" '.format(tag_to_append, default_base_variation)}
tmp_variation_tags.append(tag_to_append)

return {'return': 0, 'variations_to_add': tmp_variation_tags}

############################################################
def version(self, i):
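The rule implemented by `_get_variation_tags_from_default_variations` can be restated as a small standalone function. This is a simplified, hypothetical sketch: it keeps the group validation and the "add the default only if nothing from the group is selected and the default is not excluded" check, but omits the extra validation of dynamic variation names that the real method performs. The group and variation names are examples only.

```python
def default_variations_to_add(variation_meta, variation_groups,
                              selected_static, excluded):
    """Hypothetical sketch of the default-variation rule (simplified)."""
    to_add = []
    defaults = variation_meta.get('default_variations', {})
    for group, default in defaults.items():
        # Every key in default_variations must name an existing variation group.
        if group not in variation_groups:
            return {'return': 1,
                    'error': 'Default variation "{}" is not a valid group'.format(group)}
        allowed = set(variation_groups[group]['variations'])
        # Add the default only if the group has no selected member yet and the
        # default is neither excluded (via the '-' prefix) nor already selected.
        if (not (allowed & set(selected_static))
                and default not in excluded
                and default not in selected_static):
            to_add.append(default)
    return {'return': 0, 'variations_to_add': to_add}


groups = {'device': {'variations': ['cpu', 'cuda']},
          'precision': {'variations': ['fp32', 'fp16']}}
meta = {'default_variations': {'device': 'cpu', 'precision': 'fp32'}}
print(default_variations_to_add(meta, groups, ['cuda'], []))
# {'return': 0, 'variations_to_add': ['fp32']}
```

Because the same function is called with either a single variation's meta or a combined variation's meta, the refactoring in this hunk lets combined variations declare `default_variations` with no extra logic.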
@@ -1824,6 +1863,15 @@ def search(self, i):

console = i.get('out') == 'con'

# Check simplified CMD: cm run script "get compiler"
# If artifact has spaces, treat them as tags!
artifact = i.get('artifact','')
if ' ' in artifact: # or ',' in artifact:
del(i['artifact'])
if 'parsed_artifact' in i: del(i['parsed_artifact'])
# Force substitute tags
i['tags']=artifact.replace(' ',',')

############################################################################################################
# Process tags to find script(s) and separate variations
# (not needed to find scripts)
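The simplified command line added to `search()` treats an artifact that contains spaces (for example `cm find script "get compiler"`) as a list of tags. The sketch below mirrors that substitution and adds a hypothetical second step that separates variation tags from script tags, assuming, as the README examples such as `_cuda` and `_wget` suggest, that variation tags start with `_`. Both function names are illustrative and not part of CM. Unlike the original, `split_variations` drops empty tags produced by repeated spaces.

```python
def normalize_script_query(i):
    """Hypothetical sketch: an artifact with spaces is rewritten into comma-separated tags."""
    artifact = i.get('artifact', '')
    if ' ' in artifact:
        del i['artifact']
        i.pop('parsed_artifact', None)
        i['tags'] = artifact.replace(' ', ',')
    return i


def split_variations(tags):
    """Assumption: tags starting with '_' select variations; the rest identify the script."""
    tag_list = [t for t in tags.split(',') if t]
    script_tags = [t for t in tag_list if not t.startswith('_')]
    variation_tags = [t[1:] for t in tag_list if t.startswith('_')]
    return script_tags, variation_tags


query = normalize_script_query({'artifact': 'python app image-classification onnx _cuda'})
print(query['tags'])
# python,app,image-classification,onnx,_cuda
print(split_variations(query['tags']))
# (['python', 'app', 'image-classification', 'onnx'], ['cuda'])
```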
@@ -2125,6 +2173,16 @@ def add(self, i):
tags_list = utils.convert_tags_to_list(i)
if 'tags' in i: del(i['tags'])

if len(tags_list)==0:
if console:
x=input('Please specify a combination of unique tags separated by comma for this script: ')
x = x.strip()
if x!='':
tags_list = x.split(',')

if len(tags_list)==0:
return {'return':1, 'error':'you must specify a combination of unique tags separated by commas using "--new_tags"'}

# Add placeholder (use common action)
ii['out']='con'
ii['common']=True # Avoid recursion - use internal CM add function to add the script artifact
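The new check in `add()` asks for tags interactively when none were given on the command line and fails otherwise. A hypothetical sketch of that control flow follows; the `ask` parameter is an assumption that makes the prompt replaceable in tests, and, unlike the original, the sketch also strips whitespace around each tag.

```python
def resolve_new_script_tags(tags_list, console, ask=input):
    """Hypothetical sketch: prompt for tags in console mode, otherwise require them."""
    if len(tags_list) == 0 and console:
        answer = ask('Please specify a combination of unique tags '
                     'separated by comma for this script: ').strip()
        if answer != '':
            # Unlike the original code, also strip spaces around each tag.
            tags_list = [t.strip() for t in answer.split(',') if t.strip()]
    if len(tags_list) == 0:
        return {'return': 1,
                'error': 'you must specify a combination of unique tags '
                         'separated by commas using "--new_tags"'}
    return {'return': 0, 'tags': tags_list}


print(resolve_new_script_tags([], True, ask=lambda prompt: 'get, my-tool'))
# {'return': 0, 'tags': ['get', 'my-tool']}
```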
@@ -2150,18 +2208,19 @@ def add(self, i):

# Check if preloaded meta exists
meta = {
'cache':False,
'new_env_keys':[],
'new_state_keys':[],
'input_mapping':{},
'docker_input_mapping':{},
'deps':[],
'prehook_deps':[],
'posthook_deps':[],
'post_deps':[],
'versions':{},
'variations':{},
'input_description':{}
'cache':False
# 20240127: Grigori commented these out because the newly created script meta looked ugly
# 'new_env_keys':[],
# 'new_state_keys':[],
# 'input_mapping':{},
# 'docker_input_mapping':{},
# 'deps':[],
# 'prehook_deps':[],
# 'posthook_deps':[],
# 'post_deps':[],
# 'versions':{},
# 'variations':{},
# 'input_description':{}
}

fmeta = os.path.join(template_path, self.cmind.cfg['file_cmeta'])
@@ -2200,6 +2259,7 @@ def add(self, i):
ii['yaml']=True

ii['automation']='script,5b4e0237da074764'

r_obj=self.cmind.access(ii)
if r_obj['return']>0: return r_obj

21 changes: 19 additions & 2 deletions cm-mlops/automation/script/module_misc.py
@@ -1384,6 +1384,8 @@ def dockerfile(i):
fake_run_deps = i.get('fake_run_deps', docker_settings.get('fake_run_deps', False))
docker_run_final_cmds = docker_settings.get('docker_run_final_cmds', [])

gh_token = i.get('docker_gh_token')

if i.get('docker_real_run', False):
fake_run_option = " "
fake_run_deps = False
@@ -1436,6 +1438,9 @@ def dockerfile(i):
'real_run': True
}

if gh_token:
cm_docker_input['gh_token'] = gh_token

r = self_module.cmind.access(cm_docker_input)
if r['return'] > 0:
return r
@@ -1646,8 +1651,12 @@ def docker(i):

all_gpus = i.get('docker_all_gpus', docker_settings.get('all_gpus'))

device = i.get('docker_device', docker_settings.get('device'))

gh_token = i.get('docker_gh_token')

port_maps = i.get('docker_port_maps', docker_settings.get('port_maps', []))



# # Regenerate run_cmd
# if i.get('cmd'):
@@ -1678,7 +1687,6 @@ def docker(i):
print (run_cmd)
print ('')


cm_docker_input = {'action': 'run',
'automation': 'script',
'tags': 'run,docker,container',
@@ -1709,6 +1717,15 @@ def docker(i):
if all_gpus:
cm_docker_input['all_gpus'] = True

if device:
cm_docker_input['device'] = device

if gh_token:
cm_docker_input['gh_token'] = gh_token

if port_maps:
cm_docker_input['port_maps'] = port_maps

print ('')
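The `docker()` changes follow one pattern: a `docker_*` value from the command-line input takes precedence over the script's docker settings, and `device`, `gh_token` and `port_maps` are forwarded to the `run,docker,container` script only when set. The function below is a hypothetical sketch of that pattern, not CM's implementation; the example device path and port-map string are made-up values whose exact format CM may define differently.

```python
def build_docker_run_input(i, docker_settings):
    """Hypothetical sketch: CLI docker_* inputs override meta docker settings,
    and optional keys are forwarded only when they are set."""
    cm_docker_input = {'action': 'run',
                       'automation': 'script',
                       'tags': 'run,docker,container'}

    device = i.get('docker_device', docker_settings.get('device'))
    gh_token = i.get('docker_gh_token')  # taken from the input only, never from meta
    port_maps = i.get('docker_port_maps', docker_settings.get('port_maps', []))

    if device:
        cm_docker_input['device'] = device
    if gh_token:
        cm_docker_input['gh_token'] = gh_token
    if port_maps:
        cm_docker_input['port_maps'] = port_maps
    return cm_docker_input


# Example values are illustrative only.
print(build_docker_run_input({'docker_device': '/dev/ttyUSB0'},
                             {'port_maps': ['8888:8888']}))
```

Reading `gh_token` only from the input keeps a secret out of script meta that may be committed to a repository.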


@@ -4,13 +4,14 @@ See [this tutorial](https://github.com/mlcommons/ck/blob/master/docs/tutorials/m

# Collaborative testing

## Windows 11

* CUDA 11.8; cuDNN 8.7.0; ONNX GPU 1.16.1

## Windows 10

* ONNX Runtime 1.13.1 with CPU and CUDA
* CUDA 11.6
* cuDNN 8.5.0.96
* CUDA 11.6; cuDNN 8.6.0.96; ONNX GPU 1.13.1

## Ubuntu 22.04

* ONNX Runtime 1.12.0 with CPU and CUDA
* CUDA 11.3
* CUDA 11.3; ONNX 1.12.0