Before sending your pull request, make sure you have followed this checklist:
- Read the contributing guidelines.
- Ensure you have signed the Contributor License Agreement (CLA).
- (Note: additional technical details TBD by community.)
We'd love to accept your patches! Before we can take them, we have to jump a couple of legal hurdles.
Please fill out either the individual or corporate Contributor License Agreement (CLA).
- If you are an individual writing original source code and you're sure you own the intellectual property, then you'll need to sign an individual CLA.
- If you work for a company that wants to allow you to contribute your work, then you'll need to sign a corporate CLA.
Follow either of the two links above to access the appropriate CLA and instructions for how to sign and return it. Once we receive it, we'll be able to accept your pull requests.
NOTE: Only original source code from you and other people who have signed the CLA can be accepted into the main repository. (Note: we need to modify this to allow third-party code under an Apache 2.0 or MIT license, with additional review.)
If you have improvements to MLPerf, send us your pull requests! For those just getting started, GitHub has a howto.
(Note: Technical details TBD by community.)
- Include a license at the top of new files.
- Reference repository code must run without error on the reference hardware (1x V100) on the day of the benchmark reference freeze.
  a. The Reference Platform(s) will be reviewed and updated as part of the MLPerf benchmark roadmapping process.
- All math must be computed in full fp32 precision.
- Maximum runtime is 7 days on 1x V100 in fp32.
  a. An exception to the 7-day @ 1 GPU rule can only be granted by the Submitter's Working Group.
- The implementation should be minimalistic.
  a. Remove redundant files and features not relevant to the reference.
  b. Keep the set of dependencies minimal.
  c. Avoid non-obvious or hacky solutions (e.g. monkey patching); code should be easy to read and straightforward.
- Command-line arguments:
  a. There must be a command-line parameter for every tunable hyperparameter.
  b. Constraints on tunable hyperparameters must be reflected in the command-line parameter setup (e.g. hyperparameters that must be integers take only integer command-line arguments, not floats) to minimize the risk of accidentally running an illegal config.
  c. There may be command-line parameters for non-tunable parameters, but those parameters must be set to the correct default value when not set on the command line.
  d. Hyperparameters may also come from a JSON file, but command-line settings take precedence over the file (or a warning could be raised).
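A minimal sketch of such a command-line setup, assuming a Python reference that uses `argparse`; the argument names, defaults, and JSON-override heuristic are illustrative, not a required interface:

```python
# Sketch: tunable hyperparameters are command-line flags whose types encode
# their constraints; an optional JSON file can supply values, but the command
# line takes precedence. Argument names and defaults are illustrative.
import argparse
import json
import warnings

parser = argparse.ArgumentParser(description="reference training script")
# Tunable hyperparameters: integer hyperparameters accept only integers.
parser.add_argument("--batch-size", type=int, default=32)
parser.add_argument("--learning-rate", type=float, default=0.1)
parser.add_argument("--warmup-epochs", type=int, default=5)
# Non-tunable parameter: the correct default is used when not set explicitly.
parser.add_argument("--eval-frequency", type=int, default=1)
parser.add_argument("--config", type=str, default=None,
                    help="optional JSON file with hyperparameter values")
args = parser.parse_args()

if args.config is not None:
    with open(args.config) as f:
        from_file = json.load(f)
    for key, value in from_file.items():
        dest = key.replace("-", "_")
        if not hasattr(args, dest):
            raise ValueError(f"unknown hyperparameter in JSON config: {key}")
        if getattr(args, dest) != parser.get_default(dest):
            # Heuristic: the value differs from its default, so it was most
            # likely given on the command line; command-line settings win.
            warnings.warn(f"command-line value for --{key} overrides the JSON file")
        else:
            setattr(args, dest, value)
```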
- This document applies to new references, in v1.0 and after. Existing references from v0.7 and earlier should try to adhere as well, but are not required to.
  a. For example, Mini-Go was a v0.7 benchmark, so it does not need to adhere to the new gradient accumulation requirement.
- There must be an explicit list of hyperparameters that submitters may tune, along with tuning rules (e.g. "any positive integer", "grid of allowed values", "one of a few choices", etc.) and the allowed optimizers (if more than one). This list must appear in the README and the MLPerf rules doc.
- The target accuracy threshold needs to be explicit in the README and the MLPerf rules doc.
- The code must be in a Docker container, based on the official upstream Docker container.
  a. Use the latest public upstream container when preparing a new reference model.
- All dependencies must be frozen, with versions pinned in requirements.txt or in the Dockerfile.
- Proposal: the reference Docker image could be uploaded to Docker Hub (under the mlperf account) to improve reproducibility.
- RNG seeding must be MLPerf-compliant and adhere to the RNG rules.
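For illustration only (the RNG rules remain authoritative), a PyTorch-based reference might derive every framework RNG from a single command-line seed like this; the helper name `set_seeds` is an assumption:

```python
# A minimal sketch: derive every framework RNG from one --seed argument so a
# run is reproducible and the seed can be reported in the MLPerf log.
import random

import numpy as np
import torch


def set_seeds(seed: int) -> None:
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG (shuffling, augmentation)
    torch.manual_seed(seed)           # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)  # PyTorch RNGs on all visible GPUs
```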
- Gradient accumulation (to emulate large-batch training on a few GPUs) must be supported.
  a. Basic experiments must be performed to verify that gradient accumulation closely emulates large-batch training.
  b. Benchmarks that were established before v1.0, such as Mini-Go, are exempt from this.
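A minimal PyTorch-style sketch of gradient accumulation, assuming a standard training loop; the function and parameter names are illustrative:

```python
# A minimal sketch of gradient accumulation in PyTorch. The loss is scaled by
# 1/accumulation_steps so the accumulated gradient matches the average over
# the emulated large batch.
def train_with_accumulation(model, optimizer, loss_fn, loader, accumulation_steps=4):
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        loss = loss_fn(model(inputs), targets)
        (loss / accumulation_steps).backward()
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()   # one optimizer step per emulated large batch
            optimizer.zero_grad()
```

With `accumulation_steps = 4` and a per-iteration batch of 64, the optimizer sees gradients equivalent to an average over a global batch of 256.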
- Support for single-node multi-GPU training is optional, but encouraged.
  a. Each GPU should get its own process (to reduce overheads), and data parallelism is preferred over model parallelism (or other techniques).
  b. The reference may support multi-GPU validation, but this step must be implemented carefully (e.g. batch-norm statistics should be all-reduced across workers to make sure that all replicas are evaluating the same model).
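A hedged sketch of the one-process-per-GPU pattern using `torch.distributed` and `DistributedDataParallel`, including an all-reduce of batch-norm running statistics before validation; the launcher command, tiny model, and helper names are assumptions:

```python
# A sketch of one process per GPU with torch.distributed, launched e.g. with
# `torchrun --nproc_per_node=8 train.py`. The tiny model is only illustrative.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def allreduce_batchnorm_stats(model, world_size):
    # Average BatchNorm running statistics across workers before validation so
    # that every replica evaluates the same model.
    for module in model.modules():
        if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
            dist.all_reduce(module.running_mean, op=dist.ReduceOp.SUM)
            dist.all_reduce(module.running_var, op=dist.ReduceOp.SUM)
            module.running_mean /= world_size
            module.running_var /= world_size


def main():
    local_rank = int(os.environ["LOCAL_RANK"])  # set by the launcher
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    model = torch.nn.Sequential(              # placeholder network
        torch.nn.Conv2d(3, 8, kernel_size=3),
        torch.nn.BatchNorm2d(8),
    ).cuda()
    model = DDP(model, device_ids=[local_rank])  # data parallel across processes

    # ... training loop (each process works on its own shard of the data) ...

    allreduce_batchnorm_stats(model.module, dist.get_world_size())
    # ... validation, now with identical batch-norm statistics on every rank ...


if __name__ == "__main__":
    main()
```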
- Support for MLPerf logging is required.
  a. At least initial support must be ready by reference freeze time. The final list of logged hyperparameters depends on what submitters will be allowed to modify.
  b. When the final list of tunable hyperparameters is ready, the final implementation of reference MLPerf logging must be made available. This will likely also require changes to the compliance checker to enforce legal hyperparameter values.
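A sketch of emitting MLPerf log markers with the `mlperf_logging` package (https://github.com/mlcommons/logging); the keys, values, and output path shown are illustrative, and the final set of logged hyperparameters is benchmark-specific:

```python
# A sketch using the mlperf_logging package; keys and values are illustrative.
from mlperf_logging import mllog
from mlperf_logging.mllog import constants

mllog.config(filename="reference.log")  # assumed output location
mllogger = mllog.get_mllogger()

mllogger.event(key=constants.SUBMISSION_BENCHMARK, value="my_benchmark")
mllogger.event(key=constants.GLOBAL_BATCH_SIZE, value=256)

mllogger.start(key=constants.RUN_START)
# ... training ...
mllogger.end(key=constants.RUN_STOP, metadata={"status": "success"})
```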
- Execution should be deterministic if possible, following the rules established in the convergence document.
- Support for multi-node training is optional, but encouraged. This support does not have to be documented in the public README.
- Support for mixed-precision training with AMP is optional, but encouraged.
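A minimal sketch of optional AMP support with `torch.cuda.amp`, assuming the fp32 path stays the default and AMP is enabled behind a flag; the function and parameter names are illustrative:

```python
# A minimal sketch of AMP: autocast runs the forward pass in mixed precision,
# and GradScaler scales the loss to avoid fp16 gradient underflow.
import torch


def train_step_amp(model, optimizer, loss_fn, inputs, targets, scaler):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()


scaler = torch.cuda.amp.GradScaler()  # created once, reused across steps
```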
- Justification for the chosen target accuracy must be provided.
  a. Training to the target must be reasonably stable: many random seeds should reach the target in a similar number of steps/epochs.
  b. The target should be as close to state of the art as possible.
- Given a proposed target accuracy, all of a set of around 10-100 random seeds must reach the target accuracy. The variance in steps to convergence should be as low as possible.
- Convergence curves, as specified by the Bounded Convergence Document, must be reviewed by the Submitter's Working Group.
- Any datasets or checkpoints needed to run the benchmark must be provided so that others can run the reference for the life of the benchmark (until it is retired), plus one year after its last use.
The following scripts must be provided:
- `run_and_time.sh` script - to execute the benchmark
- `download_dataset.sh` script - to download the dataset and do the initial data preprocessing
- `verify_dataset.sh` script - to verify correctness of the preprocessed data, usually by checking md5 sums (see the sketch after this list)
- if training starts from a pretrained checkpoint (or backbone):
  a. a script to download the pretrained checkpoint (or backbone)
  b. scripts to convert the pretrained checkpoint (or backbone) to other popular frameworks must be available
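For illustration, the kind of check `verify_dataset.sh` typically performs, sketched here in Python; `md5sums.txt` is a hypothetical manifest of `<md5 hash>  <relative path>` lines produced during preprocessing:

```python
# For illustration: verify preprocessed files against a hypothetical
# md5sums.txt manifest with "<md5 hash>  <relative path>" per line.
import hashlib
import sys


def md5_of(path, chunk_size=1 << 20):
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest="md5sums.txt"):
    ok = True
    with open(manifest) as f:
        for line in f:
            expected, path = line.split(maxsplit=1)
            path = path.strip()
            if md5_of(path) != expected:
                print(f"checksum mismatch: {path}")
                ok = False
    return ok


if __name__ == "__main__":
    sys.exit(0 if verify() else 1)
```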
- The README must include a brief description of the problem, requirements, environment, preprocessing steps, training data, model, optimizer, and target metric (description of the metric, target value, evaluation frequency, and size of the eval dataset). See this section from the rules.
- Three summaries are expected in the README.
  a. Section 1, Summary, should be a very high-level description of the task, for a reader with zero machine-learning background.
  b. The high-level description in Section 1 should be followed by a description aimed at the technical press, who have some machine-learning context and so will be interested in more detail.
  c. Section 4, Model, should describe the problem to a machine-learning practitioner and include a link to the paper describing the network.