update lczero.org benchmarks #75

Draft: wants to merge 6 commits into `master`.
`content/dev/wiki/Benchmarks.md`: 45 changes (26 additions, 19 deletions)
wikiname: "Benchmarks"
# Warning: File is automatically generated from GitHub wiki, do not edit by hand.
---
Run `go infinite` from the start position, abort after depth 26, and report the NPS output.
Run `lc0.exe benchmark --nncache=2000000` and report the NPS output and the binary version. Please use the latest release or current master.
Google Sheets with bench results (possibly easier to maintain and prettier): https://docs.google.com/spreadsheets/d/1i4ymeCO7SH1vQ5gS7ZcBChjQaiv1dNItwXwLqvNC-r4/preview
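
For the `go infinite` procedure, a UCI engine reports search progress in `info` lines that carry `depth` and `nps` fields. The following is a minimal Python sketch of that procedure, assuming an engine command such as `["lc0"]` that speaks UCI on stdin/stdout; the helper names `parse_info`, `nps_at_depth` and `run_go_infinite` are hypothetical and not part of lc0. It sends `position startpos` and `go infinite`, reads `info` lines until one reaches the target depth, records its `nps`, then sends `stop`.

```python
import subprocess
from collections.abc import Iterable
from typing import Optional

# Integer fields read from UCI "info" lines.
FIELDS = ("depth", "nodes", "nps")


def parse_info(line: str) -> dict[str, int]:
    """Return integer fields (depth, nodes, nps) from a UCI 'info' line."""
    tokens = line.split()
    # Skip non-info output and free-form 'info string ...' messages.
    if len(tokens) < 2 or tokens[0] != "info" or tokens[1] == "string":
        return {}
    fields: dict[str, int] = {}
    for key in FIELDS:
        if key in tokens:
            idx = tokens.index(key)
            if idx + 1 < len(tokens) and tokens[idx + 1].isdigit():
                fields[key] = int(tokens[idx + 1])
    return fields


def nps_at_depth(lines: Iterable[str], target_depth: int = 26) -> Optional[int]:
    """Return the nps from the first info line at or beyond target_depth."""
    for line in lines:
        info = parse_info(line)
        if info.get("depth", 0) >= target_depth and "nps" in info:
            return info["nps"]
    return None


def run_go_infinite(engine_cmd: list[str], target_depth: int = 26) -> Optional[int]:
    """Search the start position with 'go infinite' and stop at target_depth."""
    proc = subprocess.Popen(engine_cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True, bufsize=1)
    assert proc.stdin is not None and proc.stdout is not None

    def send(cmd: str) -> None:
        proc.stdin.write(cmd + "\n")
        proc.stdin.flush()

    def wait_for(prefix: str) -> None:
        # Read engine output until a line starts with the given prefix.
        for line in proc.stdout:
            if line.startswith(prefix):
                return

    send("uci")
    wait_for("uciok")
    send("isready")
    wait_for("readyok")
    send("position startpos")
    send("go infinite")
    nps = nps_at_depth(proc.stdout, target_depth)  # consumes lines up to the hit
    send("stop")
    wait_for("bestmove")  # the engine answers 'stop' with a bestmove line
    send("quit")
    proc.wait(timeout=30)
    return nps
```

Usage would be something like `print(run_go_infinite(["lc0"]))`, with whatever command line (weights, backend) you normally use. The sketch keeps the first qualifying NPS value rather than averaging, and it does no error handling if the engine exits early.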

_I added some sample results from memory. Please add your own benchmark scores here, sorted by NPS if you can. If you don't know which engine type you are running: the GPU build is OpenCL and the CPU build is OpenBLAS._
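
Keeping rows in NPS order by hand is error-prone. Here is a minimal Python sketch, assuming the table layout used on this page (header row, separator row, then data rows whose last cell reads like `32289 nps`). The helper names `row_nps` and `sort_table` are hypothetical. Placeholder rows such as `xxx nps` sort to the bottom.

```python
import re


def row_nps(row: str) -> int:
    """Return the NPS in a row's last cell, or -1 for placeholders like 'xxx nps'."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    match = re.fullmatch(r"(\d+)\s*nps", cells[-1], re.IGNORECASE)
    return int(match.group(1)) if match else -1


def sort_table(lines: list[str]) -> list[str]:
    """Sort a Markdown table's data rows by NPS, highest first.

    lines[0] is the header and lines[1] the separator; both stay in place.
    Rows with equal NPS keep their original relative order.
    """
    header, separator, rows = lines[0], lines[1], lines[2:]
    return [header, separator] + sorted(rows, key=row_nps, reverse=True)
```

Because Python's sort is stable even with `reverse=True`, rows with the same value (including all placeholders) keep the order they were entered in.
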
# Ampere Cards
| GPU model | Engine version | Neural Net size | Backend | Speed |
| ------------- | ---- | ------------- | ------------- | ------------- |
|A100 40GB | v0.28.2 | 30x384 | cuda-fp16 | 71560 nps|
|RTX 3090 | v0.28| 30x384 | cuda-fp16 | xxx nps|
|RTX 3080 | v0.29.0-dev 3/3 | 40x512 | cuda-fp16 | 15159 nps|
|RTX 3080 | v0.28.2 | 30x384 | cuda-fp16 | 32289 nps|
|RTX 3070 | v0.28.2 | 30x384 | cuda-fp16 | xxx nps|
|RTX 3060 | v0.29.0-dev 3/3 | 40x512 | cuda-fp16 | 6659 nps|
|RTX 3060 | v0.29.0-dev 3/3 | 30x384 | cuda-fp16 | 14639 nps|
# Volta and Turing Cards
| GPU model | Engine version | Neural Net size | Backend | Speed |
| ------------- | ---- | ------------- | ------------- | ------------- |
|Tesla V100 | v0.28.2 | 30x384 | cuda-fp16 | xxx nps|
|RTX 2080 | v0.28.2 | 30x384 | cuda-fp16 | xxx nps|
|RTX 2070 | v0.28.0 | 30x384 | cuda-fp16 | xxx nps|
|RTX 2060 | v0.28.2 | 30x384 | cuda-fp16 | xxx nps|
# CPUs
| CPU model + # of threads | Engine version | Neural Net size | Backend | Speed |
| ------------- | ---- | ------------- | ------------- | ------------- |
|3990X, 128 threads | v0.28.2 | 10x128 | DNNL BLAS | xxx nps|
|5950X, 32 threads | v0.28.2 | 15x192 | OpenBLAS | xxx nps|
|11900K, 16 threads | v0.28.2 | 15x192 | oneDNN | xxx nps|
|5600X, 12 threads | v0.28.2 | 10x128 | oneDNN | xxx nps|

Google Sheets with bench results (possibly easier to maintain and prettier): https://docs.google.com/spreadsheets/d/1lGFf6PLGmBUSMan-YP7Vul4DpRNfn6K8oeCjBILe6uA/edit#gid=0

# GPU
| GPU @ stock or OC frequency| Engine version/type | Neural Net size | Username | Speed |
| ------------- | ---- | ------------- | ------------- | ------------- |
|GTX 1060 @ stock -t 3 | v7 Linux OpenCL | 10x128 | | 2650 nps|
|GTX 1080 Ti @ 2 GHz -t 3 | v7 Windows OpenCL | 10x128 | | 2500 nps|
|GTX 1050 Ti @ stock | v7 Windows OpenCL | 20x256 | go infinite | 2300 nps|
|GTX 1050 Ti @ stock | v7 Windows OpenCL | 20x256 | benchmark | 1690 nps|
|GTX 470 @ stock -t 2 | v7 Windows OpenCL | 10x128 | | 600 nps|
# CPU
| CPU @ stock or OC frequency| Engine version/type | Neural Net size | Username | Speed |
| ------------- | ---- | ------------- | ------------- | ------------- |
|i7-6800K @ 3.6 GHz -t 12 | v7 Linux OpenBLAS | 10x128 | | 1010 nps|
|i7-8700 @ stock -t 12 | v7 Windows OpenBLAS | 10x128 | | 818 nps|
|i6700 @ stock -t 4 | v7 Windows intel_mkl | 10x128 | | 500 nps|
|i6700 @ stock -t 4 | v7 Windows OpenBLAS | 10x128 | | 320 nps|
|Ryzen 3 1200 @ stock -t 4 | v7 Windows OpenBLAS | 10x128 | | 300 nps|