Resuming CM dev meetings (#1277)
gfursin authored Aug 1, 2024
2 parents 433a733 + 478f4e1 commit faf462a
Showing 4 changed files with 128 additions and 9 deletions.
17 changes: 8 additions & 9 deletions ck/ck/kernel.py
@@ -28,7 +28,7 @@


# We use 3 digits for the main (released) version and 4th digit for development revision
-__version__ = "2.6.3"
+__version__ = "2.6.4"
# Do not use characters (to detect outdated version)!

# Import packages that are global for the whole kernel
@@ -6745,9 +6745,8 @@ def short_help(i):
# .replace(' ','')+'\n'
h += '\n'+cfg['help_web'].replace('\n', '').strip()+'\n'

-h += 'CK Google group: https://bit.ly/ck-google-group\n'
-h += 'CK Slack channel: https://cKnowledge.org/join-slack\n'
-h += 'Stable CK components: https://cknow.io'
+h += 'CK white paper: https://royalsocietypublishing.org/doi/10.1098/rsta.2020.0211\n'
+h += 'CK ACM TechTalk: https://learning.acm.org/techtalks/reproducibility\n'

if o == 'con':
    out(h)
@@ -12385,11 +12384,11 @@ def access(i):
o = i.get('out', '')

# Print message that this framework was discontinued
-if o == 'con':
-    out('')
-    out('WARNING: this framework was discontinued in favor of the new CK2 framework aka CM being developed by the open taskforce on automation and reproducibility at MLCommons:')
-    out(' https://bit.ly/mlperf-edu-wg')
-    out('')
+# if o == 'con':
+# out('')
+# out('WARNING: this framework was discontinued in favor of the new CK2 framework aka CM being developed by the open taskforce on automation and reproducibility at MLCommons:')
+# out(' https://bit.ly/mlperf-edu-wg')
+# out('')

# If profile
cp = i.get('ck_profile', '')
1 change: 1 addition & 0 deletions dev/common-paper/README.md
@@ -0,0 +1 @@
TBD
118 changes: 118 additions & 0 deletions dev/meetings/20240731.md
@@ -0,0 +1,118 @@
# Topic

Syncing about the next steps for CM, CM4MLOps, CM4MLPerf, CM4ABTF, etc.

# People

* Grigori Fursin
* Arjun Suresh

# Discussion

## Need proper attribution

We need to agree on common attribution text for the CM documentation and the CM, CM4MLOps and CM4MLPerf GitHub repositories,
using these examples:

* https://github.com/spack/spack?tab=readme-ov-file#authors
* https://cython.org (see Financial contributions section)

* Author/creator
* Core developers
* Contributors (from the community, MLCommons and the Automation and Reproducibility TaskForce):
See https://github.com/mlcommons/ck/blob/master/CONTRIBUTING.md .
* Sponsorship & financial contributions

This should be added to the main GitHub repository and docs.mlcommons.org ...

## Remove/reduce dependencies on non-MLCommons GitHub repositories

At the moment, the official MLPerf workflows use various non-MLCommons GATEOverflow GitHub repositories
by default, which creates many potential legal issues
for CM and MLPerf users.

We should either move all such repositories to MLCommons
or, if that is not easily possible, create another neutral GitHub ID
such as mlcommons-aux with clear governance and an agreement
with MLCommons to keep all dependencies in the MLCommons space.
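
A first step toward this could be an automated audit of which dependencies live outside the MLCommons organization. The following is only a minimal sketch: the `ALLOWED_OWNERS` allow-list, the helper names and the second example URL are hypothetical placeholders, not taken from the actual workflows.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of GitHub owners considered to be inside the MLCommons space.
ALLOWED_OWNERS = {"mlcommons"}


def github_owner(url):
    """Return the lower-cased GitHub owner (user or organization) of a repository URL."""
    parsed = urlparse(url)
    if parsed.netloc.lower() != "github.com":
        raise ValueError(f"not a GitHub URL: {url}")
    parts = [p for p in parsed.path.split("/") if p]
    if not parts:
        raise ValueError(f"no owner in URL: {url}")
    return parts[0].lower()


def external_dependencies(urls, allowed=ALLOWED_OWNERS):
    """Return the URLs whose owner is not in the allow-list, preserving input order."""
    return [u for u in urls if github_owner(u) not in allowed]


if __name__ == "__main__":
    # Placeholder list for illustration; a real audit would collect URLs from the workflows.
    deps = [
        "https://github.com/mlcommons/ck",
        "https://github.com/GATEOverflow/example-repo",  # hypothetical repository name
    ]
    for url in external_dependencies(deps):
        print("external dependency:", url)
```

If a neutral ID such as mlcommons-aux were agreed on, it would simply be added to the allow-list.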

## Improve cm4mlops package

The current cm4mlops package hides the extra installation of various system dependencies
and CM repositories, uses non-default branches, and is difficult to debug if something goes wrong.

A more standard way is to install the cmind package and have a function to bootstrap cm4mlops
with proper control over the flow, CM repositories and branches, all of which can be changed via flags.

For example:
```bash
pip install cmind
cm bootstrap cm4mlperf
cm bootstrap cm4mlops --branch=mlperf-inference
...
```

That will perform the same functions as the cm4mlops package but will be easier to debug and will produce easy-to-trace errors
that can be used in GitHub Actions or other CI.
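
To make the idea concrete, here is a rough Python sketch of such a bootstrap function. It is an illustration under assumptions, not an existing API: `bootstrap_plan` and `run_plan` are hypothetical names, and `cm pull repo ... --branch=...` is assumed here as the step that the proposed `cm bootstrap` command would wrap.

```python
import shlex
import subprocess


def bootstrap_plan(repo="mlcommons@cm4mlops", branch=None):
    """Build the ordered list of commands for a bootstrap; nothing is executed here."""
    plan = [["pip", "install", "cmind"]]
    pull = ["cm", "pull", "repo", repo]
    if branch:
        # The branch is an explicit flag rather than a hidden default.
        pull.append(f"--branch={branch}")
    plan.append(pull)
    return plan


def run_plan(plan, dry_run=True):
    """Print every step and, unless dry_run is set, execute it and stop at the first failure."""
    for step, cmd in enumerate(plan, start=1):
        print(f"[step {step}/{len(plan)}] {shlex.join(cmd)}")
        if not dry_run:
            # check=True raises CalledProcessError, so CI logs show the failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_plan(bootstrap_plan(branch="mlperf-inference"), dry_run=True)
```

Keeping plan construction separate from execution means the exact commands can be inspected or logged in CI before anything is installed.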

To be brainstormed further ...

## Coordinate further developments

* Document the roadmap and responsibilities for Q3-Q4 2024
* Regular dev meetings (once a week or every two weeks)?
* Resume Discord channel discussions or mailing list (to be able to track discussions)?

## CM4MLPerf inference v4.1 & v5.0 automation

* Add CM for as many v4.1 submissions as possible to make it easier for everyone to reproduce results shortly after they are published.
* Sync on the plans for inf v5.0 with MLCommons

## CM4ABTF automation

* Sync on the next steps during next meetings

## Collaboration with Croissant

* Sync on the next steps during next meetings

## Testing infrastructure for CM4MLOps and CM4MLPerf

* GitHub Actions are not enough to test all dependencies and their versions across diverse hardware for CM-MLPerf workflows.
Brainstorm infrastructure for continuous testing (Grigori started prototyping some infrastructure).
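
As input for that brainstorm, the sketch below shows how quickly the number of configurations grows once dependency versions and hardware are combined. It is purely illustrative: `build_matrix`, the axis names and the values are hypothetical and do not describe the prototype mentioned above.

```python
from itertools import product


def build_matrix(axes, exclude=()):
    """Expand named axes into a list of configurations, skipping excluded combinations."""
    names = list(axes)
    configs = []
    for values in product(*(axes[n] for n in names)):
        config = dict(zip(names, values))
        # Drop a configuration if it matches every key/value pair of any exclusion rule.
        if any(all(config.get(k) == v for k, v in rule.items()) for rule in exclude):
            continue
        configs.append(config)
    return configs


if __name__ == "__main__":
    # Hypothetical axes; a real setup would list the versions and devices actually supported.
    axes = {
        "python": ["3.8", "3.11"],
        "backend": ["onnxruntime", "pytorch"],
        "device": ["cpu", "cuda"],
    }
    exclude = [{"python": "3.8", "device": "cuda"}]
    for config in build_matrix(axes, exclude):
        print(config)
```

Each resulting configuration could then be dispatched to a runner with matching hardware, which is the part a hosted CI service alone may not cover.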

## Optimize MLPerf inference reference implementations

* We need to add known optimizations to the MLPerf inference implementations

## Support MLPerf training

* We should start prototyping the unified CM interface and automation for MLPerf training and wrap existing MLCube tasks

## Prepare tutorials

* Sync on the tutorial about CM internals and scripts
* Sync on the tutorial for SCC'24

## Common paper

* Start preparing a common paper about CM on GitHub

## Collect feedback from companies

* There were various discussions with MLCommons companies about using CM for reproducibility.
We need to collect and aggregate all the feedback in one place.

## Next generation of CM

Grigori started testing some ideas and prototyping the next generation of CM, CM4MLOps and CM4MLPerf
based on 3 years of using CM to modularize and automate MLPerf and will share notes in future dev meetings.

## Sync with MLCommons

* Prepare official CM page - should we do it with the MLPerf inf v4.1 release?
* Prepare press release about CM with MLPerf inf v4.1 release?
* Where to host CM developments and discussions within MLCommons?
* Infra WG?
* Create a new *official* taskforce or WG on automation and reproducibility?

1 change: 1 addition & 0 deletions dev/reproducibility/README.md
@@ -0,0 +1 @@
TBD
