Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: multi-output class weights #173

Open
adriangb opened this issue Jan 20, 2021 · 5 comments
Open

ENH: multi-output class weights #173

adriangb opened this issue Jan 20, 2021 · 5 comments

Comments

@adriangb
Copy link
Owner

Currently, we handle single-output class weights. Keras itself should have support for multi-output class weights, but that feature is broken (keras-team/keras#4735) and there doesn't seem to be any plan to fix it.

Since we have access to all of sklearn's tools, we can relatively easily implement class_weights for multiple outputs, specifically, we can implement class weights via sample weights (Keras also supports a sample weight vector per output). In fact, sklearn implements a utility to convert class weights to sample weights, and it even supports multiple outputs, but it assumes n_outputs = y.shape[1], which isn't generally true with Keras data. This can be remedied by taking sklearn's compute_sample_weights function and modifying it slightly; it shouldn't be too hard.

Do you think this is worth implementing in SciKeras @stsievert ?

@stsievert
Copy link
Collaborator

I'm not seeing the need to write any code. This sounds like an upstream implementation issue, but there is some (small) value in providing an implementation to convert between class weights to sample weights. If I were developing this library, I'd provide that implementation in the docs.

Why are multi-output models worth focusing on? I've only seen single output models, and to me, the value of SciKeras is in providing a Scikit-learn API to simple Keras models and handling serialization.

@adriangb
Copy link
Owner Author

Thank you for the quick reply.

This sounds like an upstream implementation issue

I do really agree on this point: this should be fixed in Keras itself. I also don't like the idea of using SciKeras as a crutch for Keras features.

I'd provide that implementation in the docs

That's certainly an option, it does reduce the amount of helper code in the library.

I've only seen single output models

Do you mean that you've only seen SciKeras users use single output models? I've seen SciKeras users use multi-output models (eg: dask/dask-ml#764).

@stsievert
Copy link
Collaborator

I forgot about that use case. I meant that in my day job, I've only seen single-output models. Even with that, I suspect a large majority of users only care about single-output models, and I suspect the small minority who care about multiple-output models won't care about class weights not working.

Does that sound right to you?

That's certainly an option, it does reduce the amount of helper code in the library.

It also reduces the number of tests. That can help speed up development (which can save hours of time) because it's one less feature to maintain.

@adriangb
Copy link
Owner Author

That does make sense to me. You've convinced me that it's worth putting into the docs, but not the library. Thank you for the feedback!

@adriangb
Copy link
Owner Author

The implementation looks pretty easy. The main problem is going to be that computing sample weights requires having the target y already split up into a list of inputs. Currently, we do this before any user-defined transformers, so this would not be possible. I think this could be resolved by #167 since the sample weights could then be yielded as a Dataset (like here).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants