
Add loss="auto" as the default loss #210

Open · wants to merge 19 commits into base: master

Conversation

adriangb (Owner) commented Mar 4, 2021

@stsievert a parallel proposal to #208

This implements a default loss "auto" for KerasClassifier and KerasRegressor. An appropriate loss function is only selected if:

  1. The user did not provide a loss function (i.e. the default "auto" was passed).
  2. The user did not compile the model.

For KerasRegressor, it always defaults to "mse" and supports any number of outputs.

For KerasClassifier, it defaults to "binary_crossentropy" for binary targets and to "sparse_categorical_crossentropy" for multiclass targets. Only single outputs are supported. An error is raised if there is more than one output or if a task type not listed above is passed (e.g. multilabel-indicator). An error is also raised if a multiclass problem is paired with a single output neuron.
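
A minimal sketch of the selection rule described above (using scikit-learn's type_of_target; the actual implementation in this PR may differ in its details):

from sklearn.utils.multiclass import type_of_target

def resolve_default_classifier_loss(y):
    """Pick a default loss for KerasClassifier when loss="auto"."""
    target_type = type_of_target(y)
    if target_type == "binary":
        return "binary_crossentropy"
    if target_type == "multiclass":
        return "sparse_categorical_crossentropy"
    raise ValueError(
        f'loss="auto" does not support target type "{target_type}"; '
        "please pass an explicit loss (e.g. for multilabel-indicator targets)."
    )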

TODO:

  • docs

Comment on lines 568 to 571

try:
    default_val = loss_name(default_val)
except ValueError:
    pass

Owner Author

Todo: figure out what to do here, or even refactor this check like in #208

Collaborator

I think loss_name returning None says "the provided loss has no name/is not recognized."
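
A sketch of how the quoted check could then read, assuming loss_name returned None for unrecognized losses instead of raising (hypothetical; the PR may resolve this differently):

name = loss_name(default_val)
if name is not None:
    default_val = name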

Owner Author

I'll try to put that change in. I still think this check needs to be refactored.

Owner Author

This worked out. Only small doc and test changes needed.

github-actions bot commented Mar 4, 2021

📝 Docs preview for commit dde0112 at: https://www.adriangb.com/scikeras/refs/pull/210/merge/

@adriangb (Owner Author) left a comment

I incorporated most of the feedback. I think the main outstanding issue is #210 (comment)

codecov-io commented Mar 5, 2021

Codecov Report

Merging #210 (ca868f5) into master (d941d96) will increase coverage by 0.15%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #210      +/-   ##
==========================================
+ Coverage   99.71%   99.86%   +0.15%     
==========================================
  Files           6        6              
  Lines         693      732      +39     
==========================================
+ Hits          691      731      +40     
+ Misses          2        1       -1     
Impacted Files                    Coverage             Δ
scikeras/utils/__init__.py        100.00% <100.00%>    (ø)
scikeras/utils/transformers.py    100.00% <100.00%>    (ø)
scikeras/wrappers.py               99.75% <100.00%>    (+0.29%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d941d96...ca868f5.

("binary_classification", True),
("binary_classification_w_one_class", True),
("classification_w_1d_targets", True),
("classification_w_onehot_targets", False),

Collaborator

Isn't classification with one-hot targets a really important use case that should be supported?

Owner Author

It will be supported, but only if the user explicitly passes the loss function. That is tested elsewhere.

Owner Author

This is not really a change, since this was not supported by loss=None either.

y = np.random.randint(0, N_CLASSES, size=(n_eg,))
est = KerasClassifier(
    shallow_net,
    model__compile=True,

Collaborator

@pytest.mark.parametrize("compile", [True, False])

?

Collaborator

It might be worth collapsing these tests. They're all very similar, and I'm having a hard time telling the difference.

Owner Author

I attempted to collapse these tests. Let me know if it is clearer now.
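
A hedged sketch of what a collapsed, parametrized test along these lines could look like; shallow_net and the constants below are hypothetical stand-ins for the real test helpers, not the PR's actual test code:

import numpy as np
import pytest
from tensorflow import keras

from scikeras.wrappers import KerasClassifier

N_CLASSES, N_FEATURES, N_EXAMPLES = 3, 4, 32

def shallow_net(compile=False):
    # Hypothetical model builder: one softmax layer with N_CLASSES output neurons.
    model = keras.Sequential(
        [keras.layers.Dense(N_CLASSES, input_shape=(N_FEATURES,), activation="softmax")]
    )
    if compile:
        model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    return model

@pytest.mark.parametrize("compile", [True, False])
def test_default_loss_with_and_without_precompiled_model(compile):
    X = np.random.uniform(size=(N_EXAMPLES, N_FEATURES)).astype(np.float32)
    y = np.random.randint(0, N_CLASSES, size=(N_EXAMPLES,))
    est = KerasClassifier(shallow_net, model__compile=compile)
    # With loss="auto" (this PR), a loss is only chosen when shallow_net
    # did not already compile the model.
    est.fit(X, y)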

Co-authored-by: Scott Sievert <stsievert@users.noreply.github.com>
@adriangb adriangb mentioned this pull request Mar 5, 2021
@adriangb adriangb changed the title from "attempt at default losses" to "Add loss=auto as the default loss" Mar 31, 2021
@adriangb adriangb changed the title from "Add loss=auto as the default loss" to "Add loss="auto" as the default loss" Mar 31, 2021
adriangb (Owner Author) commented May 4, 2021

@stsievert do you think we should move forward with this PR?

codecov-commenter commented May 4, 2021

Codecov Report

Merging #210 (dde0112) into master (1fa9341) will increase coverage by 0.15%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #210      +/-   ##
==========================================
+ Coverage   99.71%   99.86%   +0.15%     
==========================================
  Files           6        6              
  Lines         693      740      +47     
==========================================
+ Hits          691      739      +48     
+ Misses          2        1       -1     
Impacted Files                    Coverage             Δ
scikeras/utils/__init__.py        100.00% <100.00%>    (ø)
scikeras/utils/transformers.py    100.00% <100.00%>    (ø)
scikeras/wrappers.py               99.75% <100.00%>    (+0.29%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1fa9341...dde0112.

if compile_kwargs["loss"] == "auto":
    if len(self.model_.outputs) > 1:
        raise ValueError(
            'Only single-output models are supported with `loss="auto"`'

Collaborator

Does this agree with the documentation?

Regression always defaults to mean squared error. For multi-output models, Keras will use the sum of each output's loss.
https://github.com/adriangb/scikeras/pull/210/files#diff-a330a0112e60c2872ba1c9bd84f85a963f9edc44a273d883fed5b59c5e8b4a98R167

Owner Author

Good catch. I think the documentation is wrong. I think we shouldn't support multi-output at all; there's nothing to say that the sum is the right way to aggregate them (although that is what Keras does by default...).

Collaborator

I think SciKeras should mirror Keras as closely as possible.

Collaborator

(that is, I think the total loss should be the sum of outputs; that's what Keras does).
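
For reference, a minimal illustration of the Keras behavior being cited (standard Keras, not code from this PR): when one loss is given for a multi-output model, it is applied to each output and the total loss is the (optionally weighted) sum of the per-output losses.

from tensorflow import keras

inp = keras.Input(shape=(4,))
out_a = keras.layers.Dense(1, name="out_a")(inp)
out_b = keras.layers.Dense(1, name="out_b")(inp)
model = keras.Model(inp, [out_a, out_b])
# "mse" is applied to each output; total loss = loss(out_a) + loss(out_b),
# optionally weighted via loss_weights.
model.compile(optimizer="adam", loss="mse")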

..............

+-----------+-----------+----------+---------------------------------+
| # outputs | # classes | encoding | loss |

Collaborator

I'm confused by this table. Let's say I have two classes, one "output," and I don't know my "encoding" (I'm not sure a naive user would know what that means). What loss is chosen?

Maybe it'd be simpler to say "KerasClassifier has loss="sparse_categorical_crossentropy" by default. It works for one-dimensional labels like est.fit(X, [2, 3, 4, 5]). If you have binary labels like y=[-1, 1, -1, -1], specify binary_crossentropy. If you have one-hot encoded labels, use LOSS."

Owner Author

How about:

KerasClassifier will automatically determine an appropriate loss function for binary ([0, 1, 0]/["car", "bike", "car"]) or multiclass ([1, 2, 3, 4]/["person", "car", "pear", "tree"]) one-dimensional targets. For other types of target, you must explicitly specify the loss. If your target is one-hot encoded, you probably want to use "categorical_crossentropy".

Collaborator

I'm only for using loss="auto" if there are simple and easy-to-follow rules. Setting a fixed value of loss="sparse_categorical_crossentropy" is a really simple rule.

I almost prefer this documentation:

KerasClassifier has loss="sparse_categorical_crossentropy" by default. This assumes that the model has C output neurons to classify C classes. It's intended to be used like this:

def build_model():
    ...
    model.add(output_layer_C_neurons)
    return model

est = KerasClassifier(model=build_model)
est.fit(X, [0, 1, 2, 0, 1, 2])

If you have one-hot encoded targets, manually specify the loss:

from sklearn.preprocessing import OneHotEncoder

est = KerasClassifier(model=build_model, loss="categorical_crossentropy")
y = OneHotEncoder(sparse=False).fit_transform([[0], [1], [2], [0], [1], [2]])
y = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]  # or write it out directly
est.fit(X, y)

Owner Author (@adriangb, May 6, 2021)

I'm only for using loss="auto" if there are simple and easy-to-follow rules

I totally agree. Reading over this PR again a couple weeks after writing it, even I get confused.

Setting a fixed value of loss="sparse_categorical_crossentropy" is a really simple rule

I think we tried this before. I don't remember the conclusion of those discussions (although I can dig it up), but off the top of my head I think the biggest issue is that new users will copy an example model from a tutorial, many of which do binary classification using a single neuron, or other incompatible architectures. Another common use case is one-hot encoded targets, which loss="sparse_categorical_crossentropy" would not support.

Do you think we can just introspect the model and check if the number of neurons matches the number of classes (and that it is a single-output problem) and raise an error (or maybe a warning) to rescue users from facing whatever cryptic error TF would throw? In other words, with a good enough error message, can we support only the small subset of model architectures that work with loss="sparse_categorical_crossentropy"?
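
A rough sketch of the kind of introspection being floated here (a hypothetical helper, not code from this PR or SciKeras): confirm a single output whose width matches the number of classes before defaulting to "sparse_categorical_crossentropy".

import numpy as np

def check_model_matches_target(model, y):
    # Single-output models only.
    if len(model.outputs) != 1:
        raise ValueError('loss="auto" only supports single-output models.')
    n_units = model.outputs[0].shape[-1]
    n_classes = np.unique(y).size
    if n_units == 1 and n_classes == 2:
        return "binary_crossentropy"
    if n_units != n_classes:
        raise ValueError(
            f"The model's output has {n_units} neurons but y contains "
            f"{n_classes} classes; adjust the model or pass an explicit loss."
        )
    return "sparse_categorical_crossentropy"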

Collaborator

I don't remember the conclusion of those discussions (although I can dig it up)

I recall introspecting the model to see what loss value should be used, but it tried to abstract too much away from the user (and it got too complicated).

I think the new loss for KerasClassifier is better: it's very simple and recommends changes if common mistakes are made (eventually; see below).

introspect the model and ... raise an error (or maybe a warning)

Yeah, I had the same idea. If I were developing this library, I think I'd have loss="sparse_categorical_crossentropy" with clear documentation ("have model return one neuron for each output, likely with softmax activation"). I would catch these use cases:

  • 1 output neuron and loss != "binary_crossentropy".
  • target one-hot encoded (and tell the user to set loss="categorical_crossentropy").

I think both of these should be exceptions. If so, I'd make it clear how to adapt to BaseWrapper.

copy an example model from a tutorial, many of which do binary classification using a single neuron

I think a clear documentation note would resolve this, especially with good error catching.

keras.io examples

Owner Author

I think #210 (comment) is at least worth exploring (again).

I'll open a new PR to test out #210 (comment), to avoid losing the changes here in the git history, and also because the changes are going to be pretty unrelated.

Thank you for following up on this PR 😄

Default losses are selected as follows:

Classification
..............

Collaborator

I think this section could use some usage examples, and clarification of what "output" and "encoding" mean.

Owner Author

In this context, outputs refers to the number of things you are predicting (for example, you could predict just color, in which case you have 1 output, or you might predict color and is_tshirt, in which case you have 2 outputs). Encoding refers to the representation of the target data. Generally, you will see data encoded as labels ([1, 2, 3] or ["red", "green", "blue"]) or one-hot encoded. See one-hot on Wikipedia for more details.
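
A tiny illustration of the two encodings described above (made-up values):

# Label-encoded (1D) targets: one entry per sample.
y_labels = ["red", "green", "blue", "red"]  # or equivalently [0, 1, 2, 0]

# One-hot encoded targets: one column per class, a single 1 per row.
y_one_hot = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]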
