Speed up scoring #67

Open
ChandlerSwift opened this issue Dec 6, 2019 · 6 comments
@ChandlerSwift (Member)

With ~35 players on 9 teams, grading time climbed to roughly 10 minutes with a backlog of upwards of 200 tasks, and that was without anyone spamming the attacks list. We need to speed up scoring. (And how would we handle someone maliciously spamming attacks?)

@ChandlerSwift (Member Author)

From the other issue:

Scoring threads are disk bound. Keep the DB on the disk but run workers on a ramdisk.
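
One hedged sketch of that idea with Docker Compose: keep the Postgres volume on disk, but mount a RAM-backed (tmpfs) scratch directory into each scoring worker. The `worker` service name matches the `--scale worker=` usage later in this thread; the `/scratch` path and the 512 MiB size cap are hypothetical placeholders, and the workers would need to be pointed at that path for their temp files.

```yaml
# Hypothetical docker-compose fragment (not the project's actual config):
# give each scoring worker a tmpfs scratch dir while the DB stays on disk.
services:
  worker:
    volumes:
      - type: tmpfs
        target: /scratch        # placeholder path; workers' temp dir would go here
        tmpfs:
          size: 536870912       # 512 MiB cap per container (placeholder value)
```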

@ChandlerSwift (Member Author)

Running this on a machine with an SSD would likely fix many of these problems.

@ChandlerSwift (Member Author)

https://github.com/UMDLARS/dtanm/wiki/Hardware-Notes has some notes on performance; they should be expanded with better quantitative detail.

@ChandlerSwift (Member Author)

The most recent competition, at a larger size (>20 teams), went much better. SSDs somewhat improved run time and let us run several scorings in parallel for a >5x speedup. It looks like simply using a faster machine can (mostly) solve this!

@ChandlerSwift (Member Author)

ChandlerSwift commented May 5, 2020

Yeah, we should have more benchmarks.

Results:

chandler@xenon:~$ cat results.txt | grep -E "(Average|Elapsed|RESULTS)"
RESULTS FOR 1 WORKERS
Elapsed: 4373 seconds
  "Average score time (seconds)": 1.792,
RESULTS FOR 2 WORKERS
Elapsed: 2508 seconds
  "Average score time (seconds)": 1.989,
RESULTS FOR 3 WORKERS
Elapsed: 1776 seconds
  "Average score time (seconds)": 2.137,
RESULTS FOR 4 WORKERS
Elapsed: 1424 seconds
  "Average score time (seconds)": 2.3,
RESULTS FOR 6 WORKERS
Elapsed: 1081 seconds
  "Average score time (seconds)": 2.663,
RESULTS FOR 8 WORKERS
Elapsed: 908 seconds
  "Average score time (seconds)": 3.025,
RESULTS FOR 12 WORKERS
Elapsed: 765 seconds
  "Average score time (seconds)": 4.033,
RESULTS FOR 16 WORKERS
Elapsed: 721 seconds
  "Average score time (seconds)": 5.491,
RESULTS FOR 24 WORKERS
Elapsed: 684 seconds
  "Average score time (seconds)": 8.555,
RESULTS FOR 32 WORKERS
Elapsed: 603 seconds
  "Average score time (seconds)": 10.394,
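
The scaling is easier to read as speedup and parallel efficiency relative to the single-worker run. A quick sketch, with the (workers, elapsed seconds) pairs copied from the results above:

```python
# Speedup and parallel efficiency from the benchmark results above.
results = [
    (1, 4373), (2, 2508), (3, 1776), (4, 1424), (6, 1081),
    (8, 908), (12, 765), (16, 721), (24, 684), (32, 603),
]

baseline = results[0][1]  # single-worker elapsed time
for workers, elapsed in results:
    speedup = baseline / elapsed
    efficiency = speedup / workers
    print(f"{workers:2d} workers: {speedup:4.2f}x speedup, {efficiency:5.1%} efficiency")
```

Even at 32 workers the total speedup stays around 7x, which matches the growing per-task score time in the stats.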

results.txt

Generated using:

#!/bin/sh

CURL() {
        curl -H 'Cookie: session=<session_cookie>' "$@"
}

echo === INITIAL STATS ===
CURL localhost:5000/stats.json
echo =====================

for num_workers in 1 2 3 4 6 8 12 16 24 32; do
        # clear previous results
        docker exec -it dtanm_db_1 psql --dbname postgres --username=postgres -c "DELETE FROM result;"

        # scale workers
        docker-compose up -d --scale worker=$num_workers

        # re-run all tests
        CURL localhost:5000/admin/rescore_all -s >/dev/null
        start_date_human=$(date)
        start_date_unix=$(date +%s)

        # wait for scoring to finish
        queue_depth=$(CURL localhost:5000/stats.json -s | jq '.["Tasks in scoring queue"]')
        while [ "$queue_depth" -ne 0 ] && sleep 1; do
                /bin/echo -ne "\e[0K\r$queue_depth remaining"
                queue_depth=$(CURL localhost:5000/stats.json -s | jq '.["Tasks in scoring queue"]')
        done

        cat <<EOF
========================
RESULTS FOR $num_workers WORKERS
Started at  $start_date_human
Finished at $(date)
Elapsed: $(($(date +%s) - start_date_unix)) seconds
========================
$(CURL localhost:5000/stats.json -s | jq .)
========================

EOF
        docker image prune -f
        docker container prune -f
done

@ChandlerSwift ChandlerSwift reopened this May 5, 2020
@ChandlerSwift (Member Author)

@pahp @ATR2600 @jnowaczek Benchmarks finished! https://github.com/UMDLARS/dtanm/wiki/Hardware-Notes

Looks like last year's hardware was indeed pretty bad at parallelization: going from 1 to 2 cores was a clear improvement, but gains were relatively slight after that, with the increased time per run offsetting the benefits of parallel execution.
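
As a rough sanity check on that scaling behavior, one can fit Amdahl's law to the two extreme points of the benchmark above (1 and 32 workers). This is a back-of-the-envelope sketch, not a measurement of where the serial bottleneck actually lives:

```python
# Rough Amdahl's-law estimate of the serial fraction, using the
# 1-worker and 32-worker elapsed times from the benchmark above.
t1, n, tn = 4373, 32, 603

# Amdahl: T(n) = T(1) * (s + (1 - s) / n); solve for the serial fraction s.
ratio = tn / t1
s = (ratio - 1 / n) / (1 - 1 / n)
print(f"estimated serial fraction: {s:.1%}")  # prints: estimated serial fraction: 11.0%
```

A serial fraction around 10% would cap the achievable speedup near 10x no matter how many workers are added, which is consistent with the diminishing returns in the table.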

Is there any more data you want? Otherwise, I'm ready to close this.
