osprey plot on the cli dumps entire database, ignores project_name variable #238

nhstanley · 2017-09-18T04:31:50Z

Hard to say whether this counts as a bug or a nice-to-have, but when you run osprey plot config.yaml from the command line, it dumps the entire database and ignores any project_name you may have set in your config, such as:

trials:
  uri: sqlite:///osprey-trials.db
  project_name: experiment_group1

I'm feeding osprey a lot of complex featurizations that are not really amenable to pipelining, so that means each one has to have its own config.yaml (unless there's a better way?). I suppose I could send each featurization to its own database instead but then that defeats the purpose of having a project_name option.

Alternatively, the plot command could just take the database as the input file and then split things out on a per-project basis. Not sure which is better.
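For anyone who needs the per-project split before the plot command supports it, here is a minimal sketch that reads the SQLite file directly and groups rows by project. It assumes the database has a table named trials with a project_name column; both names are assumptions about Osprey's schema and should be checked against your own database. The function trials_by_project is a hypothetical helper, not part of Osprey.

```python
import sqlite3
from collections import defaultdict


def trials_by_project(conn, table="trials"):
    """Group every row of the trials table by its project_name value.

    ASSUMPTION: the table is called "trials" and has a "project_name"
    column. Verify this against your own database before relying on it.
    """
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row  # lets us access columns by name
    grouped = defaultdict(list)
    # A table name cannot be bound as a SQL parameter, so it is
    # interpolated here; only pass trusted names.
    for row in cur.execute(f"SELECT * FROM {table}"):
        grouped[row["project_name"]].append(dict(row))
    return dict(grouped)
```

With the config above, the URI sqlite:///osprey-trials.db points at a file named osprey-trials.db in the working directory, so something like trials_by_project(sqlite3.connect("osprey-trials.db"))["experiment_group1"] should return only that project's rows, provided the schema assumptions hold.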

If I have time I can look into making this happen, though I'm swamped with my projects right now and I don't know the osprey code very well. Just wanted to put this out there.


jeiros · 2018-02-01T17:33:37Z

I've run into the same problem when trying out different clustering algorithms in Pipelines.
Is there a better solution than having each one on its own config file?
The dump command also does this, dumping the whole database and not just the pertinent project_name as specified in the config file.

I'm not sure how to separate the results from each run, since everything is mixed together in the JSON file.

Edit: Thinking about this, my contribution from a while ago, where the hyperparameters went into separate columns instead of a dedicated parameters column, might complicate things on this end.
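One possible way to untangle a mixed dump after the fact is to filter the records by project. The sketch below assumes the JSON dump is a list of objects and that each one carries a project_name key; neither is confirmed in this thread, so inspect your dump first. The name filter_by_project is a hypothetical helper.

```python
import json


def filter_by_project(records, project_name):
    """Keep only the records whose project_name matches.

    ASSUMPTION: each record is a dict with a "project_name" key.
    Records without that key are skipped rather than raising an error.
    """
    return [r for r in records if r.get("project_name") == project_name]


def load_dump(path):
    """Load a JSON dump that was redirected to a file."""
    with open(path) as f:
        return json.load(f)
```

Usage would look like filter_by_project(load_dump("dump.json"), "experiment_group1"), where dump.json is whatever file you redirected the dump output into.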

brookehus · 2018-02-02T03:06:58Z

You can also use Osprey to dump the results to a csv file (osprey dump -o csv > filename.csv), which is jankier, but you should be able to look through hyperparams and stuff by concatenating csv files (add a \n between them) and inserting a column for the run or clusterer at the beginning of each line. You could also contribute a ClusterSelector to MSMBuilder in the spirit of the FeatureSelector, which was designed for the same purpose. See the FeatureSelector code here and an example using a pipeline here.
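The concatenation step could be scripted roughly like this. It is only a sketch: concat_with_label and the "run" column name are made up for illustration, and it assumes each csv file has a header row. Because different clusterers may produce different hyperparameter columns, it takes the union of all headers and leaves missing cells empty.

```python
import csv


def concat_with_label(sources, out, label_column="run"):
    """Concatenate several CSV inputs, prefixing each row with a label.

    sources: dict mapping a label (e.g. the clusterer name) to an open
             file or any iterable of CSV lines that starts with a header.
    out:     a writable text file object for the combined CSV.
    """
    rows = []
    fieldnames = [label_column]
    for label, lines in sources.items():
        reader = csv.DictReader(lines)
        for row in reader:
            rows.append({label_column: label, **row})
        # Build the union of headers, keeping first-seen order.
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
    # restval fills in cells for columns a given file did not have.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
```

For example, open each dumped file with open(name, newline=""), pass them in as {"kmeans": f1, "minibatch": f2}, and write to a third file opened with newline="" as well. The labels here are placeholders for whatever distinguishes your runs.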

jeiros · 2018-02-02T14:51:02Z

Thanks for the help @brookehus, the feature selector looks very useful!

cxhernandez added the bug label Jan 16, 2018
