Add loss="auto" as the default loss #210

Open · wants to merge 19 commits into base: master
47 changes: 44 additions & 3 deletions docs/source/advanced.rst
@@ -1,6 +1,6 @@
-===================================
-Advanced Usage of SciKeras Wrappers
-===================================
+==============
+Advanced Usage
+==============

Wrapper Classes
---------------
@@ -128,6 +128,43 @@ offer an easy way to compile and tune compilation parameters. Examples:
In all cases, returning an un-compiled model is equivalent to
calling ``model.compile(**compile_kwargs)`` within ``model_build_fn``.
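
For illustration, here is a minimal sketch of the two equivalent forms
(assuming the special ``meta`` and ``compile_kwargs`` arguments described
in the section on arguments to ``model_build_fn`` below):

.. code-block:: python

    from tensorflow import keras

    def model_build_fn(meta, compile_kwargs):
        model = keras.Sequential()
        model.add(keras.layers.Dense(1, input_shape=(meta["n_features_in_"],)))
        # Either compile here with the wrapper-provided kwargs...
        model.compile(**compile_kwargs)
        # ...or skip this call and return the un-compiled model;
        # SciKeras then calls model.compile(**compile_kwargs) itself.
        return model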

.. _loss-selection:

Loss selection
++++++++++++++

If you do not explicitly define a loss, SciKeras attempts to find a loss
that matches the type of target (see :py:func:`sklearn.utils.multiclass.type_of_target`).
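
For example, ``type_of_target`` classifies targets like so (illustrative
values only):

.. code-block:: python

    from sklearn.utils.multiclass import type_of_target

    type_of_target([0, 1, 0, 1])              # 'binary'
    type_of_target(["red", "green", "blue"])  # 'multiclass'
    type_of_target([[1, 0], [0, 1]])          # 'multilabel-indicator'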

For guidance on selecting losses in Keras, please see Jason Brownlee's
excellent article `How to Choose Loss Functions When Training Deep Learning Neural Networks`_
as well as the `Keras Losses docs`_.

Default losses are selected as follows:

Classification
..............
Collaborator:

I think this section could use some examples, and clarification of what "output" and "encoding" mean.

Owner Author:

In this context, outputs refers to the number of things you are predicting (for example, you could predict just color, in which case you have 1 output, or you might predict color and is_tshirt, in which case you have 2 outputs). Encoding refers to the representation of the target data. Generally, you will see data encoded as labels ([1, 2, 3] or ["red", "green", "blue"]) or one-hot encoded. See one-hot on Wikipedia for more details.
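
For example, the same three-class target in the two encodings (illustrative only):

    y_labels = ["red", "green", "blue", "red"]  # label encoding: 1 output, 3 classes
    y_onehot = [
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [1, 0, 0],
    ]  # one-hot encoding of the same target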


+-----------+-----------+----------+---------------------------------+
| # outputs | # classes | encoding | loss                            |
Collaborator:

I'm confused by this table. Let's say I have two classes, one "output," and I don't know my "encoding" (I'm not sure a naive user would know what that means). What loss is chosen?

Maybe it'd be simpler to say "KerasClassifier has loss="sparse_categorical_crossentropy" by default. It works for one-dimensional labels like est.fit(X, [2, 3, 4, 5]). If you have binary labels like y=[-1, 1, -1, -1], specify binary_crossentropy. If you have one-hot encoded labels, use LOSS."

Owner Author:

How about:

KerasClassifier will automatically determine an appropriate loss function for binary ([0, 1, 0]/["car", "bike", "car"]) or multiclass ([1, 2, 3, 4]/["person", "car", "pear", "tree"]) one-dimensional targets. For other types of target, you must explicitly specify the loss. If your target is one-hot encoded, you probably want to use "categorical_crossentropy".

Collaborator:

I'm only for using loss="auto" if there are simple and easy-to-follow rules. Setting a fixed value of loss="sparse_categorical_crossentropy" is a really simple rule.

I almost prefer this documentation:

KerasClassifier has loss="sparse_categorical_crossentropy" by default. This assumes that the model has C output neurons to classify C classes. It's intended to be used like this:

def build_model():
    ...
    model.add(output_layer_C_neurons)
    return model

est = KerasClassifier(model=build_model)
est.fit(X, [0, 1, 2, 0, 1, 2])

If you have one-hot encoded targets, manually specify the loss:

from sklearn.preprocessing import OneHotEncoder

est = KerasClassifier(model=build_model, loss="categorical_crossentropy")
y = OneHotEncoder().fit_transform([[0], [1], [2], [0], [1], [2]]).toarray()
y = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]  # or this
est.fit(X, y)

Owner Author (@adriangb, May 6, 2021):

> I'm only for using loss="auto" if there are simple and easy-to-follow rules

I totally agree. Reading over this PR again a couple weeks after writing it, even I get confused.

> Setting a fixed value of loss="sparse_categorical_crossentropy" is a really simple rule

I think we tried this before. I don't remember the conclusion of those discussions (although I can dig it up), but off the top of my head I think the biggest issue is that new users will copy an example model from a tutorial, many of which do binary classification using a single neuron, or other incompatible architectures. Another common use case is one-hot encoded targets, which loss="sparse_categorical_crossentropy" would not support.

Do you think we can just introspect the model and check if the number of neurons matches the number of classes (and that it is a single-output problem) and raise an error (or maybe a warning) to rescue users from facing whatever cryptic error TF would throw? In other words, with a good enough error message, can we support only the small subset of model architectures that work with loss="sparse_categorical_crossentropy"?
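
A hypothetical sketch of such a check (not part of this PR; `check_output_neurons` is a made-up name for illustration):

    import numpy as np

    def check_output_neurons(model, y):
        # Hypothetical: verify a single output with one neuron per class,
        # as loss="sparse_categorical_crossentropy" expects.
        n_classes = np.unique(y).size
        n_neurons = int(model.outputs[0].shape[-1])
        if len(model.outputs) != 1 or n_neurons != n_classes:
            raise ValueError(
                f"Expected a single output with {n_classes} neurons "
                f"(one per class), got {len(model.outputs)} output(s) "
                f"with {n_neurons} neuron(s)."
            )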

Collaborator:

> I don't remember the conclusion of those discussions (although I can dig it up)

I recall introspecting the model to see what loss value should be used, but that tried to abstract too much away from the user (and it got too complicated).

I think the new loss for KerasClassifier is better: it's very simple and recommends changes when common mistakes are made (eventually; see below).

> introspect the model and ... raise an error (or maybe a warning)

Yeah, I had the same idea. If I were developing this library, I think I'd have loss="sparse_categorical_crossentropy" with clear documentation ("have the model return one neuron for each class, likely with softmax activation"). I would catch these use cases:

  • 1 output neuron and loss != "binary_crossentropy".
  • target is one-hot encoded (and tell the user to set loss="categorical_crossentropy").

I think both of these should be exceptions. If so, I'd make it clear how to adapt to BaseWrapper.

> copy an example model from a tutorial, many of which do binary classification using a single neuron

I think a clear documentation note would resolve this, especially with good error catching.

keras.io examples

Owner Author:

I think #210 (comment) is at least worth exploring (again).

I'll open a new PR to test out #210 (comment) and to avoid losing the changes here in the git history, and also because the changes are going to be pretty unrelated.

Thank you for following up on this PR 😄

+===========+===========+==========+=================================+
| 1         | <= 2      | any      | binary crossentropy             |
+-----------+-----------+----------+---------------------------------+
| 1         | >= 2      | labels   | sparse categorical crossentropy |
+-----------+-----------+----------+---------------------------------+
| 1         | >= 2      | one-hot  | unsupported                     |
+-----------+-----------+----------+---------------------------------+
| > 1       | --        | --       | unsupported                     |
+-----------+-----------+----------+---------------------------------+

Note that SciKeras will not automatically infer the loss for one-hot encoded
targets; you would need to explicitly specify ``loss="categorical_crossentropy"``.
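
For example, for one-hot encoded targets you might write (``get_model``,
``X`` and ``y_one_hot`` are assumed to be defined as in the quickstart,
with the model's output layer having one neuron per class):

.. code-block:: python

    from scikeras.wrappers import KerasClassifier

    # One-hot targets: the loss must be given explicitly.
    clf = KerasClassifier(get_model, loss="categorical_crossentropy")
    clf.fit(X, y_one_hot)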

Regression
..........

Regression always defaults to mean squared error.
For multi-output models, Keras will use the sum of each output's loss.
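
For example (``get_model`` is assumed to build and return a regression model):

.. code-block:: python

    from scikeras.wrappers import KerasRegressor

    # No loss specified: defaults to mean squared error.
    reg = KerasRegressor(get_model)
    reg.fit(X, y)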

Arguments to ``model_build_fn``
-------------------------------
@@ -287,3 +324,7 @@ and :class:`scikeras.wrappers.KerasRegressor` respectively. To override these sc
.. _Keras Callbacks docs: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks

.. _Keras Metrics docs: https://www.tensorflow.org/api_docs/python/tf/keras/metrics

.. _Keras Losses docs: https://www.tensorflow.org/api_docs/python/tf/keras/losses

.. _How to Choose Loss Functions When Training Deep Learning Neural Networks: https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/
19 changes: 14 additions & 5 deletions docs/source/quickstart.rst
@@ -38,16 +38,25 @@ it on a toy classification dataset using SciKeras
model.add(keras.layers.Activation("softmax"))
return model

-clf = KerasClassifier(
-    get_model,
-    loss="sparse_categorical_crossentropy",
-    hidden_layer_dim=100,
-)
+clf = KerasClassifier(get_model, hidden_layer_dim=100)

clf.fit(X, y)
y_proba = clf.predict_proba(X)


Note that SciKeras even chooses a loss function and compiles your model.
To override the default loss, simply specify a loss function:

.. code-block:: diff

-KerasClassifier(get_model, hidden_layer_dim=100)
+KerasClassifier(get_model, loss="categorical_crossentropy")

In this case, you would need to specify the loss since SciKeras
will not default to categorical crossentropy, even for one-hot
encoded targets.
See :ref:`loss-selection` for more details.
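
For example, a sketch with one-hot encoded targets (reusing ``get_model``,
``X`` and ``y`` from above; ``y`` is assumed to be a 1-D NumPy array):

.. code-block:: python

    from sklearn.preprocessing import OneHotEncoder

    y_onehot = OneHotEncoder().fit_transform(y.reshape(-1, 1)).toarray()
    clf = KerasClassifier(
        get_model, loss="categorical_crossentropy", hidden_layer_dim=100
    )
    clf.fit(X, y_onehot)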

In an sklearn Pipeline
----------------------

28 changes: 21 additions & 7 deletions scikeras/utils/__init__.py
@@ -14,7 +14,7 @@ def _camel2snake(s: str) -> str:
return "".join(["_" + c.lower() if c.isupper() else c for c in s]).lstrip("_")


-def loss_name(loss: Union[str, Loss, Callable]) -> str:
+def loss_name(loss: Union[str, Loss, Callable]) -> Union[str, None]:
"""Retrieves a loss's full name (eg: "mean_squared_error").

Parameters
@@ -25,8 +25,9 @@ def loss_name(loss: Union[str, Loss, Callable]) -> str:

Returns
-------
-    str
-        String name of the loss.
+    Union[str, None]
+        String name of the loss. String inputs that do not map to a known
+        Keras loss function return `None`.

Notes
-----
@@ -43,6 +44,8 @@ def loss_name(loss: Union[str, Loss, Callable]) -> str:
'binary_crossentropy'
>>> loss_name(losses.binary_crossentropy)
'binary_crossentropy'
>>> loss_name("abcdefg")
None

Raises
------
@@ -56,13 +59,17 @@ def loss_name(loss: Union[str, Loss, Callable]) -> str:
"``loss`` must be a string, a function, an instance of ``tf.keras.losses.Loss``"
" or a type inheriting from ``tf.keras.losses.Loss``"
)
-    fn_or_cls = keras_loss_get(loss)
+    try:
+        fn_or_cls = keras_loss_get(loss)
+    except ValueError:
+        # unknown loss
+        return None
if isinstance(fn_or_cls, Loss):
return _camel2snake(fn_or_cls.__class__.__name__)
return fn_or_cls.__name__


-def metric_name(metric: Union[str, Metric, Callable]) -> str:
+def metric_name(metric: Union[str, Metric, Callable]) -> Union[str, None]:
"""Retrieves a metric's full name (eg: "mean_squared_error").

Parameters
@@ -73,8 +80,9 @@ def metric_name(metric: Union[str, Metric, Callable]) -> str:

Returns
-------
-    str
+    Union[str, None]
         Full name for Keras metric. Ex: "mean_squared_error".
+        String inputs that do not map to a known Keras metric return `None`.

Notes
-----
@@ -91,6 +99,8 @@ def metric_name(metric: Union[str, Metric, Callable]) -> str:
'BinaryCrossentropy'
>>> metric_name(metrics.binary_crossentropy)
'binary_crossentropy'
>>> metric_name("abcdefg")
None

Raises
------
@@ -106,7 +116,11 @@ def metric_name(metric: Union[str, Metric, Callable]) -> str:
" ``tf.keras.metrics.Metric`` or a type inheriting from"
" ``tf.keras.metrics.Metric``"
)
-    fn_or_cls = keras_metric_get(metric)
+    try:
+        fn_or_cls = keras_metric_get(metric)
+    except ValueError:
+        # unknown metric
+        return None
if isinstance(fn_or_cls, Metric):
return _camel2snake(fn_or_cls.__class__.__name__)
return fn_or_cls.__name__
2 changes: 1 addition & 1 deletion scikeras/utils/transformers.py
@@ -154,7 +154,7 @@ def fit(self, y: np.ndarray) -> "ClassifierLabelEncoder":
"multiclass-multioutput": FunctionTransformer(),
"multilabel-indicator": FunctionTransformer(),
}
-        if is_categorical_crossentropy(self.loss):
+        if target_type == "multiclass" and is_categorical_crossentropy(self.loss):
encoders["multiclass"] = make_pipeline(
TargetReshaper(),
OneHotEncoder(