osprey plot on the cli dumps entire database, ignores project_name variable #238

Open
nhstanley opened this issue Sep 18, 2017 · 3 comments
@nhstanley

Hard to say whether this falls under "bug" or "would be nice to have", but when one runs the osprey plot config.yaml command from the command line, it just dumps the entire database, and ignores that you may have set a project_name in your config such as:

trials:
  uri: sqlite:///osprey-trials.db
  project_name: experiment_group1

I'm feeding osprey a lot of complex featurizations that are not really amenable to pipelining, so that means each one has to have its own config.yaml (unless there's a better way?). I suppose I could send each featurization to its own database instead but then that defeats the purpose of having a project_name option.

Alternatively, the plot command could just take the database as the input file and then split things out on a per-project basis. Not sure which is better.
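Until plot honours project_name, one workaround is to query the SQLite file directly and pull out only the rows for one project. The following is a minimal sketch, not Osprey's own code: the table name `trials` and the presence of a `project_name` column are assumptions about the database layout, and `trials_for_project` is a hypothetical helper name.

```python
import sqlite3


def trials_for_project(db_path, project_name, table="trials"):
    """Return the rows of `table` whose project_name matches, as dicts.

    ASSUMPTION: the database has a table called `trials` with a
    `project_name` column. Check your own file's schema first.
    """
    # Table names cannot be bound as SQL parameters, so validate the name.
    if not table.isidentifier():
        raise ValueError("unexpected table name: %r" % (table,))
    conn = sqlite3.connect(db_path)
    try:
        conn.row_factory = sqlite3.Row  # lets each row convert to a dict
        cursor = conn.execute(
            "SELECT * FROM {} WHERE project_name = ?".format(table),
            (project_name,),
        )
        return [dict(row) for row in cursor.fetchall()]
    finally:
        conn.close()
```

With the config above, a call would look like `trials_for_project("osprey-trials.db", "experiment_group1")`, and the returned dicts can be handed to whatever plotting code you prefer.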

If I have time I can look into making this happen, though I'm swamped with my projects right now and I don't know the osprey code very well. Just wanted to put this out there.

@jeiros
Contributor

jeiros commented Feb 1, 2018

I've run into the same problem when trying out different clustering algorithms in Pipelines.
Is there a better solution than having each one on its own config file?
The dump command also does this, dumping the whole database and not just the pertinent project_name as specified in the config file.

I'm not sure how to separate the results from each run, since everything is mixed together in the JSON file.
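If the dumped records carry their project name, the mixed JSON can be split after the fact. This is a rough sketch under two assumptions that should be checked against a real dump: that the file is a JSON array of objects, and that each object has a `project_name` key. `split_by_project` is a hypothetical helper, not part of Osprey.

```python
import json
from collections import defaultdict


def split_by_project(dump_text, key="project_name"):
    """Group dumped trial records by their project name.

    ASSUMPTION: `dump_text` is a JSON array of objects and each object
    has a `project_name` field. Records without it are grouped under None.
    """
    groups = defaultdict(list)
    for record in json.loads(dump_text):
        groups[record.get(key)].append(record)
    return dict(groups)
```

Each value in the returned dict is the list of records for one project, so the runs can be written back out to separate files or analysed one at a time.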

Edit: Thinking about this, my contribution from a while ago, where the hyperparameters went into separate columns instead of a single dedicated parameters column, might complicate things on this end.

@brookehus
Member

brookehus commented Feb 2, 2018

You can also use Osprey to dump the results to a CSV file (osprey dump -o csv > filename.csv). It's jankier, but you should be able to look through hyperparams and stuff by concatenating the CSV files (add a \n between them) and inserting a column for the run or clusterer at the beginning of each line. You could also contribute a ClusterSelector to MSMBuilder in the spirit of the FeatureSelector, which was designed for the same purpose. See the FeatureSelector code here and an example using a pipeline here.
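The CSV concatenation could be scripted roughly like this. It is only a sketch: `concat_labeled_csv` and the `run` column name are made up for illustration, and it assumes each dumped file starts with a header row. Because different clusterers may produce different hyperparameter columns, it takes the union of all headers rather than pasting the files together verbatim.

```python
import csv
import io


def concat_labeled_csv(sources, label_column="run"):
    """Merge several CSV texts into one, tagging each row with its source.

    `sources` maps a label (e.g. the clusterer name) to the CSV text of
    one dump. ASSUMPTION: every CSV text begins with a header row.
    """
    fieldnames = [label_column]
    rows = []
    for label, text in sources.items():
        reader = csv.DictReader(io.StringIO(text))
        # Collect the union of column names, keeping first-seen order.
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
        for row in reader:
            rows.append({**row, label_column: label})
    out = io.StringIO()
    # restval fills in columns that a given dump does not have.
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Reading each filename.csv into a string and passing `{"kmeans": kmeans_text, "dbscan": dbscan_text}` would give a single table whose first column says which run a row came from.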

@jeiros
Contributor

jeiros commented Feb 2, 2018

Thanks for the help, @brookehus, the feature selector looks very useful!
