I am running large-scale benchmarks in AWS mode and finding that files are being saved in results/backup/ that take up significant space (over 1 TB in total, which causes the host machine to run out of disk space during the benchmark run).
Where in the code are these files created, and how can I disable them? Are they needed for anything? I would assume not.
The problem is that each file in backup/ contains all of the benchmark results accumulated so far, concatenated into a single CSV file, so total disk usage grows as N^2, where N is the number of instances being spun up (in my case, N > 20,000).
As an example:
-rw-rw-r-- 1 ubuntu ubuntu 108684498 Aug 24 17:34 results.20230824T173419.csv
-rw-rw-r-- 1 ubuntu ubuntu 108687827 Aug 24 17:34 results.20230824T173421.csv
-rw-rw-r-- 1 ubuntu ubuntu 108690007 Aug 24 17:34 results.20230824T173440.csv
-rw-rw-r-- 1 ubuntu ubuntu 108694343 Aug 24 17:34 results.20230824T173442.csv
-rw-rw-r-- 1 ubuntu ubuntu 108696534 Aug 24 17:34 results.20230824T173445.csv
-rw-rw-r-- 1 ubuntu ubuntu 108700835 Aug 24 17:34 results.20230824T173447.csv
-rw-rw-r-- 1 ubuntu ubuntu 108702942 Aug 24 17:34 results.20230824T173451.csv
-rw-rw-r-- 1 ubuntu ubuntu 108705127 Aug 24 17:35 results.20230824T173500.csv
-rw-rw-r-- 1 ubuntu ubuntu 108709478 Aug 24 17:35 results.20230824T173506.csv
-rw-rw-r-- 1 ubuntu ubuntu 108711667 Aug 24 17:35 results.20230824T173509.csv
-rw-rw-r-- 1 ubuntu ubuntu 108715990 Aug 24 17:35 results.20230824T173512.csv
-rw-rw-r-- 1 ubuntu ubuntu 108718171 Aug 24 17:35 results.20230824T173516.csv
-rw-rw-r-- 1 ubuntu ubuntu 108720361 Aug 24 17:35 results.20230824T173521.csv
-rw-rw-r-- 1 ubuntu ubuntu 108722544 Aug 24 17:35 results.20230824T173524.csv
-rw-rw-r-- 1 ubuntu ubuntu 108724739 Aug 24 17:35 results.20230824T173526.csv
-rw-rw-r-- 1 ubuntu ubuntu 108726929 Aug 24 17:35 results.20230824T173528.csv
-rw-rw-r-- 1 ubuntu ubuntu 108729124 Aug 24 17:35 results.20230824T173531.csv
-rw-rw-r-- 1 ubuntu ubuntu 108731314 Aug 24 17:35 results.20230824T173546.csv
-rw-rw-r-- 1 ubuntu ubuntu 108733514 Aug 24 17:35 results.20230824T173549.csv
-rw-rw-r-- 1 ubuntu ubuntu 108735701 Aug 24 17:36 results.20230824T173558.csv
-rw-rw-r-- 1 ubuntu ubuntu 108737904 Aug 24 17:36 results.20230824T173610.csv
-rw-rw-r-- 1 ubuntu ubuntu 108740102 Aug 24 17:36 results.20230824T173627.csv
-rw-rw-r-- 1 ubuntu ubuntu 108743879 Aug 24 17:36 results.20230824T173633.csv
-rw-rw-r-- 1 ubuntu ubuntu 108746069 Aug 24 17:36 results.20230824T173635.csv
-rw-rw-r-- 1 ubuntu ubuntu 108748269 Aug 24 17:36 results.20230824T173638.csv
-rw-rw-r-- 1 ubuntu ubuntu 108752563 Aug 24 17:36 results.20230824T173643.csv
Around 10 of these files are written per minute, each one slightly larger than the last (currently about 108 MB per file), so roughly 1 GB of disk space is consumed every minute.
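The N^2 figure follows from each backup being a full copy of everything saved so far: if save k writes k rows, then N saves leave 1 + 2 + ... + N = N(N+1)/2 rows in backup/. A rough Python sketch of that arithmetic is below. The row size is an assumed, illustrative value taken from the typical ~2.2 KB step between consecutive files in the listing; the real number of rows per instance may differ, so only the quadratic shape matters here.

```python
def total_backup_bytes(row_bytes: int, n_saves: int) -> int:
    """Total size of backup/ after n_saves, if save k copies a file holding k rows.

    Assumes one new row per save and a constant row size (both simplifications).
    """
    # row_bytes * (1 + 2 + ... + n_saves) = row_bytes * n(n+1)/2, i.e. O(n^2).
    return row_bytes * n_saves * (n_saves + 1) // 2


def bytes_per_minute(files_per_min: float, file_bytes: int) -> float:
    """Disk consumed per minute when files_per_min full copies of file_bytes are written."""
    return files_per_min * file_bytes


if __name__ == "__main__":
    row_bytes = 2_200  # illustrative: most consecutive files in the listing differ by ~2.2 KB
    for n in (5_000, 10_000, 20_000, 40_000):
        # Each doubling of n roughly quadruples the total.
        print(f"n={n:>6}: ~{total_backup_bytes(row_bytes, n) / 1e9:,.1f} GB")
    # ~10 files per minute at ~108.7 MB each, as in the listing.
    print(f"~{bytes_per_minute(10, 108_700_000) / 1e9:.2f} GB per minute")
```

The per-minute figure is simply 10 × 108.7 MB ≈ 1.09 GB, matching the estimate above.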
If I am not mistaken, the backup is made in amlb/results.py#L112, which is called from the Benchmark. Adding an option to call save with append=True should be all it takes.
In the meantime, you could disable results.global_save. Then no results/results.csv will be written at all, which should also mean that no backup is made.
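The two suggestions above can be illustrated with a standard-library-only sketch. Note that save_results and run_demo are hypothetical stand-ins, not the actual code behind amlb/results.py#L112; the backup-on-overwrite behaviour, the append flag and the global_save gate are modelled only on the descriptions in this thread. In rewrite mode every save copies the existing CSV into backup/ before overwriting it; in append mode only the new rows are added and no copy is made; with global_save=False nothing is written at all.

```python
import csv
import os
import shutil
import tempfile
from datetime import datetime


def save_results(rows, path, header, append=False, global_save=True):
    """Hypothetical saver contrasting rewrite-with-backup and append-only writes.

    rows: the full table when append=False, only the new rows when append=True.
    path: the global results CSV, e.g. results/results.csv.
    header: column names, written only when the file is (re)created.
    global_save: stand-in for results.global_save; False skips writing entirely.
    """
    if not global_save:
        # No results.csv is written, so there is never anything to back up.
        return
    exists = os.path.isfile(path)
    if exists and not append:
        # Rewrite mode: copy the whole current file to backup/ before overwriting it.
        # Repeated on every save, this is what makes backup/ grow quadratically.
        backup_dir = os.path.join(os.path.dirname(path), "backup")
        os.makedirs(backup_dir, exist_ok=True)
        stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
        target = os.path.join(backup_dir, f"results.{stamp}.csv")
        i = 1
        while os.path.exists(target):  # don't clobber backups made in the same second
            target = os.path.join(backup_dir, f"results.{stamp}.{i}.csv")
            i += 1
        shutil.copyfile(path, target)
    mode = "a" if (append and exists) else "w"
    with open(path, mode, newline="") as f:
        writer = csv.writer(f)
        if mode == "w":
            writer.writerow(header)
        writer.writerows(rows)


def run_demo(tmp, append, global_save=True, n=5):
    """Save n results one at a time in tmp and return the number of backup files created."""
    path = os.path.join(tmp, "results.csv")
    all_rows = []
    for i in range(n):
        row = [f"task{i}", i]
        all_rows.append(row)
        # Append mode only needs the new row; rewrite mode passes the full table.
        save_results([row] if append else all_rows, path, ["task", "result"],
                     append=append, global_save=global_save)
    backup_dir = os.path.join(tmp, "backup")
    return len(os.listdir(backup_dir)) if os.path.isdir(backup_dir) else 0


if __name__ == "__main__":
    for append in (False, True):
        with tempfile.TemporaryDirectory() as tmp:
            print(f"append={append}: {run_demo(tmp, append)} backup file(s)")
```

With five saves, rewrite mode leaves 4 backup files (one per overwrite, each larger than the last), while append mode leaves none and still produces the same results.csv.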