Add loss="auto" as the default loss #210
base: master
Conversation
scikeras/wrappers.py (Outdated)

    try:
        default_val = loss_name(default_val)
    except ValueError:
        pass
TODO: figure out what to do here, or even refactor this check like in #208.
I think loss_name returning None says "the provided loss has no name / is not recognized."
I'll try to put that change in. I still think this check needs to be refactored.
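A minimal sketch of what that change might look like, assuming loss_name wraps keras.losses.get; the helper body and call site here are an illustration, not the PR's actual code:

    from tensorflow import keras

    def loss_name(loss):
        """Return the loss's canonical name, or None if it is not recognized."""
        try:
            fn = keras.losses.get(loss)  # raises ValueError for unknown losses
        except ValueError:
            return None
        return getattr(fn, "__name__", type(fn).__name__)

    # The call site can then branch on None instead of catching ValueError:
    name = loss_name(default_val)
    if name is not None:
        default_val = name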
This worked out. Only small doc and test changes needed.
📝 Docs preview for commit dde0112 at: https://www.adriangb.com/scikeras/refs/pull/210/merge/
I incorporated most of the feedback. I think the main outstanding issue is #210 (comment)
Codecov Report

    @@            Coverage Diff             @@
    ##           master     #210      +/-   ##
    ==========================================
    + Coverage   99.71%   99.86%   +0.15%
    ==========================================
      Files           6        6
      Lines         693      732      +39
    ==========================================
    + Hits          691      731      +40
    + Misses          2        1       -1

Continue to review full report at Codecov.
tests/test_simple_usage.py (Outdated)

    ("binary_classification", True),
    ("binary_classification_w_one_class", True),
    ("classification_w_1d_targets", True),
    ("classification_w_onehot_targets", False),
Isn't classification with one-hot targets a really important use case that should be supported?
It will be supported, but only if the user explicitly passes the loss function. That is tested elsewhere.
This is not really a change, since this was not supported by loss=None either.
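For illustration, explicit support per the comments above might look like this (shallow_net, X, and y_onehot are assumed from the test file and not defined here):

    # One-hot targets work when the loss is passed explicitly:
    est = KerasClassifier(shallow_net, loss="categorical_crossentropy")
    est.fit(X, y_onehot)  # y_onehot: shape (n_samples, n_classes), rows sum to 1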
tests/test_simple_usage.py
Outdated
y = np.random.randint(0, N_CLASSES, size=(n_eg,)) | ||
est = KerasClassifier( | ||
shallow_net, | ||
model__compile=True, |
@pytest.mark.parametrize("compile", [True, False])?
It might be worth collapsing these tests. They're all very similar, and I'm having a hard time telling the difference.
I attempted to collapse these tests. Let me know if it is clearer now.
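A rough sketch of the parametrized shape suggested above; shallow_net, N_CLASSES, and the data setup are assumptions based on the diff excerpts:

    import numpy as np
    import pytest
    from scikeras.wrappers import KerasClassifier

    N_CLASSES = 3  # assumed constant from the test module

    @pytest.mark.parametrize("compile", [True, False])
    def test_classification_with_auto_loss(compile):
        n_eg = 100
        X = np.random.uniform(size=(n_eg, 4)).astype("float32")
        y = np.random.randint(0, N_CLASSES, size=(n_eg,))
        # shallow_net: assumed model-building function from the module
        est = KerasClassifier(shallow_net, model__compile=compile)
        est.fit(X, y)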
@stsievert do you think we should move forward with this PR?
Codecov Report

    @@            Coverage Diff             @@
    ##           master     #210      +/-   ##
    ==========================================
    + Coverage   99.71%   99.86%   +0.15%
    ==========================================
      Files           6        6
      Lines         693      740      +47
    ==========================================
    + Hits          691      739      +48
    + Misses          2        1       -1

Continue to review full report at Codecov.
    if compile_kwargs["loss"] == "auto":
        if len(self.model_.outputs) > 1:
            raise ValueError(
                'Only single-output models are supported with `loss="auto"`'
            )
Does this agree with the documentation?
Regression always defaults to mean squared error. For multi-output models, Keras will use the sum of each output's loss.
—https://github.com/adriangb/scikeras/pull/210/files#diff-a330a0112e60c2872ba1c9bd84f85a963f9edc44a273d883fed5b59c5e8b4a98R167
Good catch. I think the documentation is wrong. I don't think we should support multi-output at all; there's nothing to say that the sum is the right way to aggregate them (although that is what Keras does by default...).
I think SciKeras should mirror Keras as closely as possible.
(that is, I think the total loss should be the sum of outputs; that's what Keras does).
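For reference, a minimal illustration of that Keras behavior (the model shape is arbitrary): with multiple outputs, the total loss Keras optimizes is the weighted sum of the per-output losses, with weights defaulting to 1.

    from tensorflow import keras

    inp = keras.Input(shape=(4,))
    out_a = keras.layers.Dense(1, name="a")(inp)
    out_b = keras.layers.Dense(1, activation="sigmoid", name="b")(inp)
    model = keras.Model(inputs=inp, outputs=[out_a, out_b])
    # total loss = 1.0 * mse(a) + 1.0 * bce(b) unless loss_weights is set
    model.compile(optimizer="adam", loss={"a": "mse", "b": "binary_crossentropy"})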
    ..............

    +-----------+-----------+----------+---------------------------------+
    | # outputs | # classes | encoding | loss                            |
I'm confused by this table. Let's say I have two classes, one "output," and I don't know my "encoding" (I'm not sure a naive user would know what that means). What loss is chosen?

Maybe it'd be simpler to say: "KerasClassifier has loss="sparse_categorical_crossentropy" by default. It works for one-dimensional labels like est.fit(X, [2, 3, 4, 5]). If you have binary labels like y=[-1, 1, -1, -1], specify binary_crossentropy. If you have one-hot encoded labels, use LOSS."
How about:

KerasClassifier will automatically determine an appropriate loss function for binary ([0, 1, 0] / ["car", "bike", "car"]) or multiclass ([1, 2, 3, 4] / ["person", "car", "pear", "tree"]) one-dimensional targets. For other types of target, you must explicitly specify the loss. If your target is one-hot encoded, you probably want to use "categorical_crossentropy".
I'm only for using loss="auto" if there are simple and easy-to-follow rules. Setting a fixed value of loss="sparse_categorical_crossentropy" is a really simple rule.

I almost prefer this documentation:

KerasClassifier has loss="sparse_categorical_crossentropy" by default. This assumes that the model has C output neurons to classify C classes. It's intended to be used like this:

    def build_model():
        ...
        model.add(output_layer_C_neurons)
        return model

    est = KerasClassifier(model=build_model)
    est.fit(X, [0, 1, 2, 0, 1, 2])

If you have one-hot encoded targets, manually specify the loss:

    from sklearn.preprocessing import OneHotEncoder  # note: preprocessing, not datasets
    import numpy as np

    est = KerasClassifier(model=build_model, loss="categorical_crossentropy")
    y = OneHotEncoder().fit_transform(np.array([0, 1, 2, 0, 1, 2]).reshape(-1, 1)).toarray()
    # equivalently:
    y = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
    est.fit(X, y)
I'm only for using loss="auto" if there are simple and easy-to-follow rules

I totally agree. Reading over this PR again a couple weeks after writing it, even I get confused.

Setting a fixed value of loss="sparse_categorical_crossentropy" is a really simple rule

I think we tried this before. I don't remember the conclusion of those discussions (although I can dig it up), but off the top of my head I think the biggest issue is that new users will copy an example model from a tutorial, many of which do binary classification using a single neuron, or other incompatible architectures. Another common use case is one-hot encoded targets, which loss="sparse_categorical_crossentropy" would not support.

Do you think we can just introspect the model and check if the number of neurons matches the number of classes (and that it is a single-output problem) and raise an error (or maybe a warning) to rescue users from facing whatever cryptic error TF would throw? In other words, with a good enough error message, can we support only the small subset of model architectures that work with loss="sparse_categorical_crossentropy"?
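A hedged sketch of that introspection check; the function name and error wording are assumptions, not the PR's code (model is a built Keras model, n_classes comes from the target):

    def check_output_matches_classes(model, n_classes):
        """Fail fast when a model can't work with sparse_categorical_crossentropy."""
        if len(model.outputs) != 1:
            raise ValueError("Expected a single-output model.")
        n_neurons = model.outputs[0].shape[-1]
        if n_neurons != n_classes:
            raise ValueError(
                f"Model has {n_neurons} output neuron(s) but y has {n_classes} "
                "classes; add one output neuron per class or pass an appropriate "
                "loss explicitly."
            )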
I don't remember the conclusion of those discussions (although I can dig it up)

I recall introspecting the model to see what loss value should be used, but trying to abstract too much away from the user (plus it got too complicated).

I think the new loss for KerasClassifier is better: it's very simple and recommends changes if common mistakes are made (eventually; see below).

introspect the model and ... raise an error (or maybe a warning)

Yeah, I had the same idea. If I were developing this library, I think I'd have loss="sparse_categorical_crossentropy" with clear documentation ("have the model return one neuron for each output, likely with softmax activation"). I would catch these use cases:

- 1 output neuron and loss != "binary_crossentropy".
- target is one-hot encoded (and tell the user to set loss="categorical_crossentropy"; see the sketch below).

I think both of these should be exceptions. If so, I'd make it clear how to adapt to BaseWrapper.

copy an example model from a tutorial, many of which do binary classification using a single neuron

I think a clear documentation note would resolve this, especially with good error catching.

keras.io examples:

- One neuron: 3D CT scans, text classification
- Dynamic choice: image classification
- ≥ 2 neurons: simple mnist, EfficientNet, ViT, Perceiver, text w/ switch, text w/ transformer
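A minimal sketch of the one-hot check from the second bullet above (pure NumPy; the function name and exact heuristic are assumptions):

    import numpy as np

    def looks_one_hot(y) -> bool:
        """True if y is a 2-D 0/1 array whose rows each sum to exactly 1."""
        y = np.asarray(y)
        return (
            y.ndim == 2
            and y.shape[1] > 1
            and set(np.unique(y)) <= {0, 1}
            and bool((y.sum(axis=1) == 1).all())
        )

    # e.g. looks_one_hot([[1, 0], [0, 1]]) -> True; looks_one_hot([0, 1, 1]) -> False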
I think #210 (comment) is at least worth exploring (again).

I'll open a new PR to test out #210 (comment), to avoid losing the changes here in the git history, and also because the changes are going to be pretty unrelated.

Thank you for following up on this PR 😄
    Default losses are selected as follows:

    Classification
    ..............
I think this section could use some usage examples, and clarification of what "output" and "encoding" mean.
In this context, outputs refers to the number of things you are predicting (for example, you could predict just color, in which case you have 1 output, or you might predict color and is_tshirt, in which case you have 2 outputs). Encoding refers to the representation of the target data. Generally, you will see data encoded as labels ([1, 2, 3] or ["red", "green", "blue"]) or one-hot encoded. See one-hot on Wikipedia for more details.
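An illustration of the distinction (hypothetical data):

    import numpy as np

    # 1 output, label-encoded: a single target, classes as labels
    y_color = np.array(["red", "green", "blue"])
    # 1 output, one-hot encoded: same target, one column per class
    y_color_onehot = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
    # 2 outputs: predicting color and is_tshirt together
    y_two_outputs = [np.array(["red", "green", "blue"]), np.array([1, 0, 1])]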
@stsievert a parallel proposal to #208.

This implements a default loss "auto" for KerasClassifier and KerasRegressor. An appropriate loss function is only selected in the following cases:

- For KerasRegressor, it always defaults to "mse" and supports any number of outputs.
- For KerasClassifier, it defaults to "binary_crossentropy" for binary targets and "sparse_categorical_crossentropy" for multiclass targets. Only single outputs are supported. An error is raised if there is more than one output or if a different task type not listed above is passed (e.g. multilabel-indicator). An error is raised if a multiclass problem is paired with a single output neuron.
TODO:
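For illustration, a hedged usage sketch of the proposed default; the model and data are invented for the example, and the compiled loss reflects this PR's intended behavior:

    import numpy as np
    from tensorflow import keras
    from scikeras.wrappers import KerasClassifier

    N_FEATURES, N_CLASSES = 4, 3

    def build_model():
        return keras.Sequential([
            keras.layers.Dense(8, activation="relu", input_shape=(N_FEATURES,)),
            keras.layers.Dense(N_CLASSES, activation="softmax"),  # one neuron per class
        ])

    X = np.random.uniform(size=(100, N_FEATURES)).astype("float32")
    y = np.random.randint(0, N_CLASSES, size=(100,))

    # With this PR, loss defaults to "auto"; for 1-D multiclass targets it
    # should compile the model with "sparse_categorical_crossentropy".
    est = KerasClassifier(model=build_model)
    est.fit(X, y)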