Right now the memory and cores in the database are a bit arbitrary. Luckily, we have all the data we need to reason about better `mem` values (and `cores` should probably typically be `mem/4`, as discussed in #60, except in cases where tools use little memory but see significant speedup with more cores).
If we all pushed our memory usage and input sizes to a centralized database we could both visualize it (similar to how I have done it one-off in this gist) and hopefully automatically make some decisions about memory values in the shared DB.
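For illustration, here is a minimal sketch of that kind of one-off visualization, assuming job records have been exported to a hypothetical `job_metrics.csv` with `tool_id`, `input_size_bytes`, and `memory_usage_bytes` columns (the column names and the example tool ID are placeholders, not an existing schema):

```python
# One-off visualization sketch: peak memory vs. total input size for one tool.
# All column names and the tool ID below are placeholders/assumptions.
import pandas as pd
import matplotlib.pyplot as plt

jobs = pd.read_csv("job_metrics.csv")  # tool_id, input_size_bytes, memory_usage_bytes

tool_id = "example_tool_id"  # placeholder; in practice a full toolshed tool ID
subset = jobs[jobs["tool_id"] == tool_id]

plt.scatter(subset["input_size_bytes"] / 1e9,
            subset["memory_usage_bytes"] / 1e9,
            s=5, alpha=0.5)
plt.xlabel("Total input size (GB)")
plt.ylabel("Peak memory usage (GB)")
plt.title(tool_id)
plt.savefig("memory_vs_input_size.png")
```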
However, there are some things for consideration by people who are good at statistics:
- Cutoffs for high and low memory usage (or just use the 95th percentile?), since there are outliers (see the sketch after this list)
- Cutoffs for high and low input sizes, since there is usually a lower bound on memory that does not correlate with inputs at all
- Input compression: the mixture of compressed and uncompressed data makes input size as a ratio of memory usage kind of a lie
- The current memory limit can arbitrarily cut off what would otherwise be valid, successful jobs and thus skew the data; this varies by server, although we do know what the limit was for each job
- How input size affects memory usage; of course it is rarely just input size, but also the actual data itself
- How recent jobs should be to be considered, since newer data is typically more useful than older data
- Which tool versions to consider, since versions can drastically affect memory usage, but we also don't necessarily want to treat each `+galaxyN` version separately
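To make some of those points concrete, here is a rough sketch of how the job data could be turned into suggested `mem`/`cores` values. Everything in it is an assumption for illustration: the column names (`tool_id`, `tool_version`, `memory_usage_bytes`, `memory_limit_bytes`, `create_time`), the one-year window, the 95th percentile, and the `mem/4` core heuristic are all up for discussion, not an existing schema or an agreed-on method:

```python
# Rough sketch of turning raw job metrics into suggested mem/cores values.
# Column names, thresholds, and the percentile choice are assumptions.
import pandas as pd

jobs = pd.read_csv("job_metrics.csv", parse_dates=["create_time"])

# Only consider recent jobs; older data is less representative.
jobs = jobs[jobs["create_time"] > pd.Timestamp.now() - pd.Timedelta(days=365)]

# Drop jobs that ran at or suspiciously close to the per-server memory limit,
# since their usage was truncated by the limit rather than by the tool.
jobs = jobs[jobs["memory_usage_bytes"] < 0.95 * jobs["memory_limit_bytes"]]

# Collapse +galaxyN suffixes so packaging-only rebuilds share one version.
jobs["version"] = jobs["tool_version"].str.replace(r"\+galaxy\d+$", "", regex=True)

# Suggested allocation: 95th percentile of observed usage per tool/version.
suggested = (
    jobs.groupby(["tool_id", "version"])["memory_usage_bytes"]
        .quantile(0.95)
        .rename("suggested_mem_bytes")
        .reset_index()
)

# Simple core heuristic from the discussion above: cores ≈ mem (GB) / 4.
suggested["suggested_mem_gb"] = suggested["suggested_mem_bytes"] / 1e9
suggested["suggested_cores"] = (
    (suggested["suggested_mem_gb"] / 4).clip(lower=1).round().astype(int)
)
print(suggested.head())
```

Dropping jobs that ran close to their memory limit is a crude way to avoid the truncation bias mentioned above; a proper treatment would probably model those jobs as censored data rather than discarding them.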
For example, we could use the data from the GRT once that project is resurrected. Last week I tried to put together some thoughts on it (please feel free to share your feedback/suggestions) so we could get a master's student working on the project.
@natefoo Have you seen this PR? #64
It was a preliminary attempt at this. The biggest problem so far has been the inconsistency of the data in the federation (e.g. invalid cgroup metrics), but I believe some of these issues have since been fixed, so we may be able to make a fresh pass soon.