Speed up scoring #67

Open
ChandlerSwift opened this issue Dec 6, 2019 · 6 comments
@ChandlerSwift (Member)

With ~35 players on 9 teams, grading time climbed to roughly 10 minutes with a backlog of upwards of 200 tasks, and that was without anyone spamming the attacks list. We need to speed up scoring. (And how would we handle someone maliciously spamming attacks?)

@ChandlerSwift (Member Author)

From the other issue:

Scoring threads are disk bound. Keep the DB on the disk but run workers on a ramdisk.
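
One hedged sketch of that idea with Docker Compose: keep the Postgres volume on disk, but mount a RAM-backed (tmpfs) scratch directory into each scoring worker. The `worker` service name matches the `--scale worker=` usage later in this thread; the `/scratch` path and the 512 MiB size cap are hypothetical placeholders, and the workers would need to be pointed at that path for their temp files.

```yaml
# Hypothetical docker-compose fragment (not the project's actual config):
# give each scoring worker a tmpfs scratch dir while the DB stays on disk.
services:
  worker:
    volumes:
      - type: tmpfs
        target: /scratch        # placeholder path; workers' temp dir would go here
        tmpfs:
          size: 536870912       # 512 MiB cap per container (placeholder value)
```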

@ChandlerSwift (Member Author)

Running this on a machine with an SSD would likely fix many of these problems.

@ChandlerSwift (Member Author)

https://github.com/UMDLARS/dtanm/wiki/Hardware-Notes has some notes on performance; they should be expanded with better quantitative detail.

@ChandlerSwift (Member Author)

The most recent competition, at a larger size (>20 teams), went much better. SSDs somewhat improved run time and let us run several scorings in parallel for a >5x speedup. It looks like simply using a faster machine can (mostly) solve this!

@ChandlerSwift (Member Author)

ChandlerSwift commented May 5, 2020

Yeah, we should have more benchmarks.

Results:

chandler@xenon:~$ cat results.txt | grep -E "(Average|Elapsed|RESULTS)"
RESULTS FOR 1 WORKERS
Elapsed: 4373 seconds
  "Average score time (seconds)": 1.792,
RESULTS FOR 2 WORKERS
Elapsed: 2508 seconds
  "Average score time (seconds)": 1.989,
RESULTS FOR 3 WORKERS
Elapsed: 1776 seconds
  "Average score time (seconds)": 2.137,
RESULTS FOR 4 WORKERS
Elapsed: 1424 seconds
  "Average score time (seconds)": 2.3,
RESULTS FOR 6 WORKERS
Elapsed: 1081 seconds
  "Average score time (seconds)": 2.663,
RESULTS FOR 8 WORKERS
Elapsed: 908 seconds
  "Average score time (seconds)": 3.025,
RESULTS FOR 12 WORKERS
Elapsed: 765 seconds
  "Average score time (seconds)": 4.033,
RESULTS FOR 16 WORKERS
Elapsed: 721 seconds
  "Average score time (seconds)": 5.491,
RESULTS FOR 24 WORKERS
Elapsed: 684 seconds
  "Average score time (seconds)": 8.555,
RESULTS FOR 32 WORKERS
Elapsed: 603 seconds
  "Average score time (seconds)": 10.394,
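
The scaling is easier to read as speedup and parallel efficiency relative to the single-worker run. A quick sketch, with the (workers, elapsed seconds) pairs copied from the results above:

```python
# Speedup and parallel efficiency from the benchmark results above.
results = [
    (1, 4373), (2, 2508), (3, 1776), (4, 1424), (6, 1081),
    (8, 908), (12, 765), (16, 721), (24, 684), (32, 603),
]

baseline = results[0][1]  # single-worker elapsed time
for workers, elapsed in results:
    speedup = baseline / elapsed
    efficiency = speedup / workers
    print(f"{workers:2d} workers: {speedup:4.2f}x speedup, {efficiency:5.1%} efficiency")
```

Even at 32 workers the total speedup stays around 7x, which matches the growing per-task score time in the stats.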

results.txt

Generated using:

#!/bin/sh

CURL() {
        curl -H 'Cookie: session=<session_cookie>' "$@"
}

echo === INITIAL STATS ===
CURL localhost:5000/stats.json
echo =====================

for num_workers in 1 2 3 4 6 8 12 16 24 32; do
        # clear previous results
        docker exec -it dtanm_db_1 psql --dbname postgres --username=postgres -c "DELETE FROM result;"

        # scale workers
        docker-compose up -d --scale worker=$num_workers

        # re-run all tests
        CURL localhost:5000/admin/rescore_all -s >/dev/null
        start_date_human=$(date)
        start_date_unix=$(date +%s)

        # wait for scoring to finish
        queue_depth=$(CURL localhost:5000/stats.json -s | jq '.["Tasks in scoring queue"]')
        while [ "$queue_depth" -ne 0 ] && sleep 1; do
                /bin/echo -ne "\e[0K\r$queue_depth remaining"
                queue_depth=$(CURL localhost:5000/stats.json -s | jq '.["Tasks in scoring queue"]')
        done

        cat <<EOF
========================
RESULTS FOR $num_workers WORKERS
Started at  $start_date_human
Finished at $(date)
Elapsed: $(($(date +%s) - start_date_unix)) seconds
========================
$(CURL localhost:5000/stats.json -s | jq .)
========================

EOF
        docker image prune -f
        docker container prune -f
done

@ChandlerSwift ChandlerSwift reopened this May 5, 2020
@ChandlerSwift (Member Author)

@pahp @ATR2600 @jnowaczek Benchmarks finished! https://github.com/UMDLARS/dtanm/wiki/Hardware-Notes

Looks like last year's hardware was indeed pretty bad at parallelization: going from 1 to 2 cores was a clear improvement, but gains were relatively slight after that, with the increased time per run offsetting the benefits of parallel execution.
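
As a rough sanity check on that scaling behavior, one can fit Amdahl's law to the two extreme points of the benchmark above (1 and 32 workers). This is a back-of-the-envelope sketch, not a measurement of where the serial bottleneck actually lives:

```python
# Rough Amdahl's-law estimate of the serial fraction, using the
# 1-worker and 32-worker elapsed times from the benchmark above.
t1, n, tn = 4373, 32, 603

# Amdahl: T(n) = T(1) * (s + (1 - s) / n); solve for the serial fraction s.
ratio = tn / t1
s = (ratio - 1 / n) / (1 - 1 / n)
print(f"estimated serial fraction: {s:.1%}")  # prints: estimated serial fraction: 11.0%
```

A serial fraction around 10% would cap the achievable speedup near 10x no matter how many workers are added, which is consistent with the diminishing returns in the table.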

Is there any more data you want? Otherwise, I'm ready to close this.
