ENH: Check input dimensions against the initialized model_ #143

adriangb · 2020-11-29T19:25:56Z

Closes #106

codecov-io · 2020-11-29T23:54:55Z

Codecov Report

Merging #143 (56918eb) into master (e419d8e) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #143      +/-   ##
==========================================
+ Coverage   99.52%   99.53%   +0.01%     
==========================================
  Files           5        5              
  Lines         627      646      +19     
==========================================
+ Hits          624      643      +19     
  Misses          3        3

Impacted Files	Coverage Δ
scikeras/_utils.py	`100.00% <100.00%> (ø)`
scikeras/wrappers.py	`99.22% <100.00%> (+0.03%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

adriangb · 2020-11-30T02:41:17Z

@stsievert tagging you to take a look whenever you have a chance. This is addressing an issue you had opened (#106)

stsievert · 2020-11-30T03:20:49Z

scikeras/wrappers.py

+        # generic transformers just to make them SciKeras compatible
+        n_out_expect = getattr(self, "n_outputs_expected_", None)
+        if n_out_expect:
+            if n_out_expect != len(self.model_.outputs):


Style nit: I'd collapse this if-statement:

    n_out_expect = getattr(self, "n_outputs_expected_", None)
    if n_out_expect and n_out_expect != len(self.model_.outputs):
        ...
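A minimal, self-contained sketch of the collapsed check suggested above. `FakeWrapper`, `check_outputs`, and the `ValueError` message are hypothetical stand-ins (the thread elides the body with `...`), and a `SimpleNamespace` substitutes for a real Keras model. Only the `getattr` and `len(self.model_.outputs)` logic comes from the diff.

```python
from types import SimpleNamespace


class FakeWrapper:
    """Hypothetical stand-in for the wrapper class discussed in the thread."""

    def __init__(self, model, n_outputs_expected=None):
        self.model_ = model
        if n_outputs_expected is not None:
            self.n_outputs_expected_ = n_outputs_expected

    def check_outputs(self):
        # The attribute may be missing entirely, so fall back to None.
        n_out_expect = getattr(self, "n_outputs_expected_", None)
        # A falsy value (None or 0) skips the check.
        if n_out_expect and n_out_expect != len(self.model_.outputs):
            # Assumption: the real error type and message are not shown in the thread.
            raise ValueError(
                f"Expected {n_out_expect} outputs, "
                f"but model_ has {len(self.model_.outputs)}."
            )


two_output_model = SimpleNamespace(outputs=["out_a", "out_b"])

FakeWrapper(two_output_model, n_outputs_expected=2).check_outputs()  # counts match
FakeWrapper(two_output_model).check_outputs()  # attribute missing, check skipped

try:
    FakeWrapper(two_output_model, n_outputs_expected=1).check_outputs()
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

The single `and` condition behaves the same as the nested form: the comparison only runs when the attribute is present and truthy.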

adriangb · 2020-11-30

Yep, I'll implement that.

Do you think we should redefine n_outputs_expected_: int -> expected_output_shape_: Union[List[int], Dict[str, int]] so that each element/key can map to the expected output shape for each output?

stsievert · 2020-11-30

> Do you think we should redefine n_outputs_expected_: int -> expected_output_shape_: Union[List[int], Dict[str, int]] so that each element/key can map to the expected output shape for each output?

I don't think so. An integer suffices for basic use cases (regressors and classifiers). I can only see output shape being relevant for autoencoders.

adriangb · 2020-11-30

I'm thinking of multi-input or multi-output models (as to why it would be a list/dict).

As for why we'd need the expected output shape specifically: for classifiers, the output shape depends on the classes and the type of problem (e.g. Dense(1, activation="sigmoid") and Dense(2, activation="softmax") are generally equivalent), which is why we can't just check the shape of X like we do with the input layer.
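To illustrate the equivalence mentioned above without depending on Keras, here is a small pure-Python sketch. It assumes a binary problem and a two-unit softmax whose first logit is fixed at 0; the function names are illustrative only. Under that assumption, one sigmoid unit and two softmax units give the same positive-class probability even though the number of output units differs.

```python
import math


def sigmoid(z):
    """Probability of the positive class from a single output unit."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Class probabilities from one logit per class."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


z = 0.7
single_unit_output = [sigmoid(z)]      # shape (1,), like Dense(1, "sigmoid")
two_unit_output = softmax([0.0, z])    # shape (2,), like Dense(2, "softmax")

# Same positive-class probability, different number of output units.
same_probability = math.isclose(single_unit_output[0], two_unit_output[1])
```

Because both layouts are valid for the same target, the expected number of output units cannot be inferred from the data alone.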

stsievert · 2020-12-06

> I'm thinking of multi-input or multi-output models (as to why it would be a list/dict).

Maybe that's not a bad idea. I think the type of n_outputs_expected should be Union[int, List[int], Dict[str, int]]. That supports more complex models and simple user-defined transformers.

In this function that will mean something like this:

    n_out = getattr(self, "n_outputs_expected", None)
    if isinstance(n_out, int):
        n_out = [n_out]
    if n_out and n_out != len(self.model_.outputs):
        ...
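One possible reading of the proposal above, sketched under stated assumptions: each list element or dict value describes one model output, so the number of entries is compared with the number of model outputs. As written, the snippet above compares the list itself to an integer. `normalize_expected` and `output_count_matches` are hypothetical helpers, not functions from the library, and a `SimpleNamespace` stands in for a Keras model.

```python
from types import SimpleNamespace
from typing import Dict, List, Optional, Union

ExpectedOutputs = Union[int, List[int], Dict[str, int]]


def normalize_expected(n_out: Optional[ExpectedOutputs]) -> Optional[List[int]]:
    """Hypothetical helper: coerce the three accepted forms to a list."""
    if n_out is None:
        return None
    if isinstance(n_out, int):
        return [n_out]  # a bare int describes a single output
    if isinstance(n_out, dict):
        return list(n_out.values())  # one entry per named output
    return list(n_out)


def output_count_matches(n_out: Optional[ExpectedOutputs], model) -> bool:
    """Return True when no expectation is set or the output counts agree."""
    expected = normalize_expected(n_out)
    if not expected:
        return True
    # Assumption: one entry per model output, so compare counts.
    return len(expected) == len(model.outputs)


model = SimpleNamespace(outputs=["out_a", "out_b"])
```

For example, `output_count_matches({"out_a": 1, "out_b": 10}, model)` holds, while `output_count_matches(3, model)` does not, because a bare integer is treated as a single output.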

adriangb · 2020-12-08

Another alternative is of course to revert to c7e6723 and try to continue this conversation in its own issue/PR.

stsievert · 2020-12-08

I think I like checking the model better. But why is n_outputs_expected_ necessary?

> Another alternative is of course to revert to c7e6723

Do you mean delete the release from PyPI/GitHub? If I were handling releases, I think I'd rather delay any checks on n_outputs_expected_ to a later release.

adriangb · 2020-12-08

It's not required. There's other (perhaps better) ways to achieve the same thing.

That commit is in this branch. I was suggesting we revert to that commit in this branch and table this discussion for another issue/PR, since the rest of the changes here stand on their own.

stsievert · 2020-12-08

> That commit is in this branch. I was suggesting we revert to that commit in this branch and table this discussion for another issue/PR since the rest of the changes here stand on their own.

Oh, I see. I think reverting and a separate issue is a good idea.

> It's not required. There's other (perhaps better) ways to achieve the same thing.

I think those (perhaps better) ideas are worth exploring.

adriangb · 2020-12-08

Sounds good. I reverted to c7e6723 and implemented the collapsed logic in #143 (comment). I'll resolve this thread and we can continue the discussion on the rest of the PR. I also opened #148 to track the discussion surrounding output validation. Thank you for the feedback thus far!

adriangb added 3 commits November 29, 2020 13:25

Initial stab at validating input dimensions against the model.

1346421

Make a single parameterized test

85e519d

remove accidental imports

6b60481

adriangb added 5 commits November 29, 2020 19:45

begin to reconcile input and target checks

ac661bf

Remove duplicate code

f44e94d

Clarify docstring, remove duplicate function

4b3de8e

remove accidental import

5cb9f3a

allow _windows_upcast_ints to operate on dicts

a788190

adriangb marked this pull request as ready for review November 30, 2020 02:40

stsievert reviewed Nov 30, 2020


adriangb added 8 commits November 30, 2020 23:42

Merge branch 'master' into check-input-dimensions

431878e

Merge branch 'master' into check-input-dimensions

d5a7874

Merge branch 'master' into check-input-dimensions

c7e6723

Allow n_outputs_expected_ to be a dict/list

80d762e

fix imports

ae82e77

Draft error

c5d3ff8

Revert to c7e6723

89d673c

Collapse if statements

8e92ea9

adriangb mentioned this pull request Dec 8, 2020

ENH: Validate output units and outputs in multi-output models #148

Open

adriangb added 5 commits December 8, 2020 11:51

use Mapping in type hint

a8528ec

Merge branch 'master' into check-input-dimensions

4acb0fb

Merge branch 'master' into check-input-dimensions

8529f38

Merge branch 'master' into check-input-dimensions

c29d677

Merge branch 'master' into check-input-dimensions

56918eb

adriangb mentioned this pull request Dec 29, 2020

TST: compare to sklearn.neural_network #155

Merged

adriangb mentioned this pull request Feb 26, 2021

Sequential model doesn't have outputs #207

Open

adriangb mentioned this pull request Jun 21, 2021

RFC: Composable input/output pipeline #234

Open
