
Add loss="auto" as the default loss #210

Open · wants to merge 19 commits into base: master

Conversation

adriangb (Owner) commented Mar 4, 2021

@stsievert a parallel proposal to #208

This implements a default loss "auto" for KerasClassifier and KerasRegressor. An appropriate loss function is only selected if:

  1. The user did not provide a loss function (i.e. the default "auto" was passed).
  2. The user did not compile the model.

For KerasRegressor, it always defaults to "mse" and supports any number of outputs.

For KerasClassifier, it defaults to "binary_crossentropy" for binary targets and to "sparse_categorical_crossentropy" for multiclass targets. Only single outputs are supported. An error is raised if there is more than one output or if a task type not listed above is passed (e.g. multilabel-indicator). An error is also raised if a multiclass problem is paired with a single output neuron.
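
A minimal sketch of the selection rule described above (using scikit-learn's type_of_target; the actual implementation in this PR may differ in its details):

from sklearn.utils.multiclass import type_of_target

def resolve_default_classifier_loss(y):
    """Pick a default loss for KerasClassifier when loss="auto"."""
    target_type = type_of_target(y)
    if target_type == "binary":
        return "binary_crossentropy"
    if target_type == "multiclass":
        return "sparse_categorical_crossentropy"
    raise ValueError(
        f'loss="auto" does not support target type "{target_type}"; '
        "please pass an explicit loss (e.g. for multilabel-indicator targets)."
    )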

TODO:

  • docs

Comment on lines 568 to 571

try:
    default_val = loss_name(default_val)
except ValueError:
    pass

Owner Author

Todo: figure out what to do here, or even refactor this check like in #208

Collaborator

I think loss_name returning None says "the provided loss has no name/is not recognized."
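
A sketch of how the quoted check could then read, assuming loss_name returned None for unrecognized losses instead of raising (hypothetical; the PR may resolve this differently):

name = loss_name(default_val)
if name is not None:
    default_val = name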

Owner Author

I'll try to put that change in. I still think this check needs to be refactored.

Owner Author

This worked out. Only small doc and test changes needed.

github-actions bot commented Mar 4, 2021

📝 Docs preview for commit dde0112 at: https://www.adriangb.com/scikeras/refs/pull/210/merge/

@adriangb (Owner Author) left a comment

I incorporated most of the feedback. I think the main outstanding issue is #210 (comment)

codecov-io commented Mar 5, 2021

Codecov Report

Merging #210 (ca868f5) into master (d941d96) will increase coverage by 0.15%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #210      +/-   ##
==========================================
+ Coverage   99.71%   99.86%   +0.15%     
==========================================
  Files           6        6              
  Lines         693      732      +39     
==========================================
+ Hits          691      731      +40     
+ Misses          2        1       -1     
Impacted Files                    Coverage             Δ
scikeras/utils/__init__.py        100.00% <100.00%>    (ø)
scikeras/utils/transformers.py    100.00% <100.00%>    (ø)
scikeras/wrappers.py               99.75% <100.00%>    (+0.29%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d941d96...ca868f5.

("binary_classification", True),
("binary_classification_w_one_class", True),
("classification_w_1d_targets", True),
("classification_w_onehot_targets", False),

Collaborator

Isn't classification with one-hot targets a really important use case that should be supported?

Owner Author

It will be supported, but only if the user explicitly passes the loss function. That is tested elsewhere.

Owner Author

This is not really a change, since this was not supported by loss=None either.

y = np.random.randint(0, N_CLASSES, size=(n_eg,))
est = KerasClassifier(
    shallow_net,
    model__compile=True,

Collaborator

@pytest.mark.parametrize("compile", [True, False])

?

Collaborator

It might be worth collapsing these tests. They're all very similar, and I'm having a hard time telling the difference.

Owner Author

I attempted to collapse these tests. Let me know if it is clearer now.
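
A hedged sketch of what a collapsed, parametrized test along these lines could look like; shallow_net and the constants below are hypothetical stand-ins for the real test helpers, not the PR's actual test code:

import numpy as np
import pytest
from tensorflow import keras

from scikeras.wrappers import KerasClassifier

N_CLASSES, N_FEATURES, N_EXAMPLES = 3, 4, 32

def shallow_net(compile=False):
    # Hypothetical model builder: one softmax layer with N_CLASSES output neurons.
    model = keras.Sequential(
        [keras.layers.Dense(N_CLASSES, input_shape=(N_FEATURES,), activation="softmax")]
    )
    if compile:
        model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    return model

@pytest.mark.parametrize("compile", [True, False])
def test_default_loss_with_and_without_precompiled_model(compile):
    X = np.random.uniform(size=(N_EXAMPLES, N_FEATURES)).astype(np.float32)
    y = np.random.randint(0, N_CLASSES, size=(N_EXAMPLES,))
    est = KerasClassifier(shallow_net, model__compile=compile)
    # With loss="auto" (this PR), a loss is only chosen when shallow_net
    # did not already compile the model.
    est.fit(X, y)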

Co-authored-by: Scott Sievert <stsievert@users.noreply.github.com>
@adriangb adriangb mentioned this pull request Mar 5, 2021
@adriangb adriangb changed the title from "attempt at default losses" to "Add loss=auto as the default loss" Mar 31, 2021
@adriangb adriangb changed the title from "Add loss=auto as the default loss" to "Add loss="auto" as the default loss" Mar 31, 2021
adriangb (Owner Author) commented May 4, 2021

@stsievert do you think we should move forward with this PR?

codecov-commenter commented May 4, 2021

Codecov Report

Merging #210 (dde0112) into master (1fa9341) will increase coverage by 0.15%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #210      +/-   ##
==========================================
+ Coverage   99.71%   99.86%   +0.15%     
==========================================
  Files           6        6              
  Lines         693      740      +47     
==========================================
+ Hits          691      739      +48     
+ Misses          2        1       -1     
Impacted Files                    Coverage             Δ
scikeras/utils/__init__.py        100.00% <100.00%>    (ø)
scikeras/utils/transformers.py    100.00% <100.00%>    (ø)
scikeras/wrappers.py               99.75% <100.00%>    (+0.29%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1fa9341...dde0112.

if compile_kwargs["loss"] == "auto":
    if len(self.model_.outputs) > 1:
        raise ValueError(
            'Only single-output models are supported with `loss="auto"`'

Collaborator

Does this agree with the documentation?

Regression always defaults to mean squared error. For multi-output models, Keras will use the sum of each output's loss.
https://github.com/adriangb/scikeras/pull/210/files#diff-a330a0112e60c2872ba1c9bd84f85a963f9edc44a273d883fed5b59c5e8b4a98R167

Owner Author

Good catch. I think the documentation is wrong. I think we shouldn't support multi-output at all; there's nothing to say that the sum is the right way to aggregate them (although that is what Keras does by default...).

Collaborator

I think SciKeras should mirror Keras as closely as possible.

Collaborator

(that is, I think the total loss should be the sum of outputs; that's what Keras does).
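
For reference, a minimal illustration of the Keras behavior being cited (standard Keras, not code from this PR): when one loss is given for a multi-output model, it is applied to each output and the total loss is the (optionally weighted) sum of the per-output losses.

from tensorflow import keras

inp = keras.Input(shape=(4,))
out_a = keras.layers.Dense(1, name="out_a")(inp)
out_b = keras.layers.Dense(1, name="out_b")(inp)
model = keras.Model(inp, [out_a, out_b])
# "mse" is applied to each output; total loss = loss(out_a) + loss(out_b),
# optionally weighted via loss_weights.
model.compile(optimizer="adam", loss="mse")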

..............

+-----------+-----------+----------+---------------------------------+
| # outputs | # classes | encoding | loss |

Collaborator

I'm confused by this table. Let's say I have two classes, one "output," and I don't know my "encoding" (I'm not sure a naive user would know what that means). What loss is chosen?

Maybe it'd be simpler to say "KerasClassifier has loss="sparse_categorical_crossentropy" by default. It works for one-dimensional labels like est.fit(X, [2, 3, 4, 5]). If you have binary labels like y=[-1, 1, -1, -1], specify binary_crossentropy. If you have one-hot encoded labels, use LOSS."

Owner Author

How about:

KerasClassifier will automatically determine an appropriate loss function for binary ([0, 1, 0]/["car", "bike", "car"]) or multiclass ([1, 2, 3, 4]/["person", "car", "pear", "tree"]) one-dimensional targets. For other types of target, you must explicitly specify the loss. If your target is one-hot encoded, you probably want to use "categorical_crossentropy".

Collaborator

I'm only for using loss="auto" if there are simple and easy-to-follow rules. Setting a fixed value of loss="sparse_categorical_crossentropy" is a really simple rule.

I almost prefer this documentation:

KerasClassifier has loss="sparse_categorical_crossentropy" by default. This assumes that the model has C output neurons to classify C classes. It's intended to be used like this:

def build_model():
    ...
    model.add(output_layer_C_neurons)
    return model

est = KerasClassifier(model=build_model)
est.fit(X, [0, 1, 2, 0, 1, 2])

If you have one-hot encoded targets, manually specify the loss:

from sklearn.preprocessing import OneHotEncoder

est = KerasClassifier(model=build_model, loss="categorical_crossentropy")
y = OneHotEncoder(sparse=False).fit_transform([[0], [1], [2], [0], [1], [2]])
y = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]  # or write it out directly
est.fit(X, y)

Owner Author (@adriangb, May 6, 2021)

I'm only for using loss="auto" if there are simple and easy-to-follow rules

I totally agree. Reading over this PR again a couple weeks after writing it, even I get confused.

Setting a fixed value of loss="sparse_categorical_crossentropy" is a really simple rule

I think we tried this before. I don't remember the conclusion of those discussions (although I can dig it up), but off the top of my head I think the biggest issue is that new users will copy an example model from a tutorial, many of which do binary classification using a single neuron, or other incompatible architectures. Another common use case is one-hot encoded targets, which loss="sparse_categorical_crossentropy" would not support.

Do you think we can just introspect the model and check if the number of neurons matches the number of classes (and that it is a single-output problem) and raise an error (or maybe a warning) to rescue users from facing whatever cryptic error TF would throw? In other words, with a good enough error message, can we support only the small subset of model architectures that work with loss="sparse_categorical_crossentropy"?
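
A rough sketch of the kind of introspection being floated here (a hypothetical helper, not code from this PR or SciKeras): confirm a single output whose width matches the number of classes before defaulting to "sparse_categorical_crossentropy".

import numpy as np

def check_model_matches_target(model, y):
    # Single-output models only.
    if len(model.outputs) != 1:
        raise ValueError('loss="auto" only supports single-output models.')
    n_units = model.outputs[0].shape[-1]
    n_classes = np.unique(y).size
    if n_units == 1 and n_classes == 2:
        return "binary_crossentropy"
    if n_units != n_classes:
        raise ValueError(
            f"The model's output has {n_units} neurons but y contains "
            f"{n_classes} classes; adjust the model or pass an explicit loss."
        )
    return "sparse_categorical_crossentropy"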

Collaborator

I don't remember the conclusion of those discussions (although I can dig it up)

I recall introspecting the model to see what loss value should be used, but it tried to abstract too much away from the user (and it got too complicated).

I think the new loss for KerasClassifier is better: it's very simple and recommends changes if common mistakes are made (eventually; see below).

introspect the model and ... raise an error (or maybe a warning)

Yeah, I had the same idea. If I were developing this library, I think I'd have loss="sparse_categorical_crossentropy" with clear documentation ("have model return one neuron for each output, likely with softmax activation"). I would catch these use cases:

  • 1 output neuron and loss != "binary_crossentropy".
  • target one-hot encoded (and tell the user to set loss="categorical_crossentropy").

I think both of these should be exceptions. If so, I'd make it clear how to adapt to BaseWrapper.

copy an example model from a tutorial, many of which do binary classification using a single neuron

I think a clear documentation note would resolve this, especially with good error catching.

keras.io examples

Owner Author

I think #210 (comment) is at least worth exploring (again).

I'll open a new PR to test out #210 (comment), to avoid losing the changes here in the git history, and also because the changes are going to be pretty unrelated.

Thank you for following up on this PR 😄

Default losses are selected as follows:

Classification
..............

Collaborator

I think this section could use some usage examples, and clarification of what "output" and "encoding" mean.

Owner Author

In this context, outputs refers to the number of things you are predicting (for example, you could predict just color, in which case you have 1 output, or you might predict color and is_tshirt, in which case you have 2 outputs). Encoding refers to the representation of the target data. Generally, you will see data encoded as labels ([1, 2, 3] or ["red", "green", "blue"]) or one-hot encoded. See one-hot on Wikipedia for more details.
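
A tiny illustration of the two encodings described above (made-up values):

# Label-encoded (1D) targets: one entry per sample.
y_labels = ["red", "green", "blue", "red"]  # or equivalently [0, 1, 2, 0]

# One-hot encoded targets: one column per class, a single 1 per row.
y_one_hot = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]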
