
Add basic smoke tests for topology branch #897

Merged

Conversation

SylviaWhittle (Collaborator)

This PR adds basic smoke tests for disordered tracing, node stats, ordered tracing and splining.

@SylviaWhittle (Collaborator, Author)

Pre-commit problems list (shown below)

I believe that the files and code listed are not code that I have touched. Correct me if I've not noticed something 😄

[screenshot: pre-commit problems list]

@ns-rse (Collaborator)

ns-rse commented Sep 11, 2024

> I believe that the files and code listed are not code that I have touched.

Nope, none of those are in the files you've touched.

We should all be using the pre-commit configuration that is part of the distribution so that the CI checks pass. (Yes, the target branch here isn't main, but it will eventually be merged into main and will have to pass all these checks, so it's better practice to use them and get things right in the first instance.)

Within the TopoStats directory run the following to install it...

```shell
pre-commit install
```

It will then highlight all these problems before changes can be pushed.
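For anyone unfamiliar with pre-commit, the configuration file looks roughly like the following; the hooks shown here are illustrative, not necessarily the ones TopoStats actually uses:

```yaml
# .pre-commit-config.yaml (illustrative sketch, not the TopoStats config)
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.3
    hooks:
      - id: ruff
```

After `pre-commit install`, the hooks run on every `git commit`; `pre-commit run --all-files` checks the whole tree retrospectively.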

@ns-rse (Collaborator) left a comment

Bunch of comments in-line, great work writing all these tests @SylviaWhittle 👍

The tests/resources/ directory is starting to look somewhat cluttered. I think many of the .csv files could be removed if we used pytest-regtest (some comments in-line on this).

That still leaves a lot of files though; many of the filenames carry common suffixes and it's clear that many of the objects (.pkl and .npy) are related. I can think of two options to improve organisation...

  1. Bundle similar objects into a dictionary with the keys formed from the component that distinguishes them and save as a single pickle.
example_rep_int_all_images.pkl
example_rep_int_all_images_nodestats.pkl
example_rep_int_disordered_crop_data.pkl
example_rep_int_disordered_tracing_stats.csv
example_rep_int_grainstats_additions_df.csv
example_rep_int_grainstats_additions_nodestats.csv
example_rep_int_labelled_grain_mask_thresholded.npy
example_rep_int_nodestats_branch_images.pkl
example_rep_int_nodestats_data.pkl
example_rep_int_ordered_tracing_data.pkl
example_rep_int_ordered_tracing_full_images.pkl
example_rep_int_ordered_tracing_grainstats_additions.csv
example_rep_int_splining_data.pkl
example_rep_int_splining_grainstats_additions.csv
example_rep_int_splining_molstats.csv

Would go into a dictionary with keys of...

{
    "all_images": <object>,
    "all_images_nodestats": <object>,
    "disordered_crop_data": <object>,
    "disordered_tracing_stats": <object>,
    "grainstats_additions_df": <object>,
    "grainstats_additions_nodestats": <object>,
    "labelled_grain_mask_thresholded": <object>,
    "nodestats_branch_images": <object>,
    "nodestats_data": <object>,
    "ordered_tracing_data": <object>,
    "ordered_tracing_full_images": <object>,
    "ordered_tracing_grainstats_additions": <object>,
    "splining_data": <object>,
    "splining_grainstats_additions": <object>,
    "splining_molstats": <object>,
}

...and that could be saved as tests/resources/example_rep_int.pkl.

  2. Alternatively, create a nested directory structure under tests/resources reflecting the common prefixes...
tests/resources/node/
tests/resources/catenanes/
tests/resources/example_rep_int/

...and drop the prefixes from the filenames.

tests/tracing/conftest.py (resolved)
tests/tracing/test_nodestats.py (resolved)
np.testing.assert_equal(node_dict_result, nodestats_catenane_node_dict)
np.testing.assert_equal(image_dict_result, nodestats_catenane_image_dict)
np.testing.assert_array_equal(nodestats_catenane.all_connected_nodes, nodestats_catenane_all_connected_nodes)
# Debugging
Collaborator

This is presumably to update the test files when the underlying code changes?

The syrupy package, which is compatible with pytest, might be an alternative to this. I've not used it yet (I only became aware of it at RSECon2024), but it's similar to pytest-regtest I think.

tests/tracing/test_nodestats.py (resolved)
# )

# Load the nodestats catenane node dict from pickle
with Path(RESOURCES / "nodestats_analyse_nodes_catenane_node_dict.pkl").open("rb") as f:
Collaborator

If the arrays and dictionaries aren't too large I'd be inclined to use the pytest-regtest approach to comparing these.

topostats/io.py (resolved)
topostats/tracing/splining.py (resolved)
tests/tracing/test_splining.py (outdated, resolved)
Collaborator

I think perhaps we should use pytest-regtest to compare the statistics that are produced, rather than having code to update the CSV files when methods change and then read them into pd.DataFrame() and pd.testing.assert_frame_equal().

We use this approach for other CSVs that the pipeline produces so should be ok here unless there is a specific reason for this approach?

Collaborator Author

Very happy to have pushback on this, but my thinking was to keep all the tests in the same style, with assertions and variables loaded explicitly, since when a test fails I find it easier to debug using a debugger and IDE tools. I tried debugging tests that use pytest-regtest and it's rather difficult with large objects. Perhaps I went about it wrong?

I think we are anticipating quite a bit of iteration, so making it as smooth as possible to see exactly what changed, check whether it's valid, and then easily update the values is useful.

IIRC pytest-regtest has the excellent override mechanism of pytest --regtest-reset, which makes updating tests a dream, but I always have to add code and dig around to see whether a change is legitimate. Do you have a good way of inspecting changes in regtests?

Collaborator

When I've had to update tests from pytest-regtest it is in essence a diff, which is what many of the asserts show in one form or another (whether that is default Python / pytest / np.testing.assert* / pd.testing.assert*).

Diffs can be tricky to read at times, particularly when it's a few numbers in the middle of a large CSV, and perhaps NumPy/pandas help with this, but there are some alternatives and features that make it easier.

I find the within-line changes that Git can be configured to show really useful. One tool for this is delta, but personally I use difftastic, as it also understands the structure of different programming languages and Git can easily be configured to work with it.

The --regtest-reset approach is, in my view, a lot quicker than having to uncomment a bunch of lines to write a CSV or pkl out.

Perhaps we should look at how syrupy compares to pytest-regtests and the manual approach?

Broadly, though, I think it is useful to be consistent across a project: pick one approach and stick with it. It reduces cognitive overhead and makes it easier for others to get to grips with how the repository works (this is true of more than just tests, e.g. always using pathlib rather than mixing it with the os module).
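For reference, pointing Git at difftastic is roughly the following (check the difftastic docs for the current recommended setup; the config below is a sketch):

```ini
# ~/.gitconfig
[diff]
    external = difft
```

For a one-off comparison without changing config, `GIT_EXTERNAL_DIFF=difft git diff` works too.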

@SylviaWhittle (Collaborator, Author)

> > I believe that the files and code listed are not code that I have touched.
>
> Nope, none of those are in the files you've touched.
>
> We should all be using the pre-commit configuration that is part of the distribution so that the CI checks pass. (Yes, the target branch here isn't main, but it will eventually be merged into main and will have to pass all these checks, so it's better practice to use them and get things right in the first instance.)
>
> Within the TopoStats directory run the following to install it...
>
> pre-commit install
>
> It will then highlight all these problems before changes can be pushed.

I do have pre-commit; I was going to tidy this up in a follow-up PR focused on just that, keeping this PR self-contained, but I'm happy to fix it in this PR 👍

@SylviaWhittle (Collaborator, Author)

DeepDiff doesn't seem to be a viable alternative currently: it cannot handle np.nan values properly.

[screenshots: DeepDiff failing to compare np.nan values]
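The underlying problem is the IEEE 754 rule that NaN compares unequal to everything, including itself, which trips up any comparison built on plain equality; np.testing.assert_equal special-cases it (a small illustration, not the actual DeepDiff failure):

```python
import math
import numpy as np

a = {"stats": np.array([1.0, np.nan])}
b = {"stats": np.array([1.0, np.nan])}

# Naive equality fails because nan != nan by definition...
assert math.nan != math.nan
assert not np.array_equal(a["stats"], b["stats"])

# ...whereas np.testing.assert_equal recurses through dicts and
# treats NaNs in matching positions as equal, so this passes.
np.testing.assert_equal(a, b)
```

This is why the existing np.testing-based assertions cope with the tracing outputs while tools built on generic equality do not.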

Collaborator

I'm not too bothered about having fixtures that are only used once, since they may be of use in the future.

@ns-rse (Collaborator) left a comment

Couple of in-line comments.

I might take a look at what can be shifted into pytest-regtest (or syrupy) as I feel we should make updating tests as simple as possible.

@ns-rse (Collaborator)

ns-rse commented Sep 23, 2024

> Would you like me to modify it in the other run_... functions?

Yes please, if you don't mind.

I think names should cascade through and be consistent across functions; it makes it easier to follow the flow of data through processing, as it doesn't incur the mental overhead of translating one_thing that function a returns being assigned to another_thing in function b, which calls a. It's one of the reasons I think that variable names have scope.

@MaxGamill-Sheffield (Collaborator)

> > Would you like me to modify it in the other run_... functions?
>
> Yes please, if you don't mind.
>
> I think names should cascade through and be consistent across functions; it makes it easier to follow the flow of data through processing, as it doesn't incur the mental overhead of translating one_thing that function a returns being assigned to another_thing in function b, which calls a. It's one of the reasons I think that variable names have scope.

Done! There should now be a new commit on maxgamill-sheffield/topology which can be pulled / rebased :)

@SylviaWhittle (Collaborator, Author)

> > Would you like me to modify it in the other run_... functions?
> >
> > Yes please, if you don't mind.
>
> Done! There should now be a new commit on maxgamill-sheffield/topology which can be pulled / rebased :)

Rebased successfully 👍

@SylviaWhittle (Collaborator, Author)

SylviaWhittle commented Sep 30, 2024

@ns-rse given the complexity of finding the best alternative to assert dict_almost_equal, could this be kicked down the road a little to a separate PR, where I'll update all the uses of dict_almost_equal in the codebase once a good alternative is found?

@SylviaWhittle (Collaborator, Author)

I think I've addressed all your points except the ones relating to how we assert objects are equal to each other. Apologies if I've skipped anything else; it's unintentional.

@ns-rse (Collaborator)

ns-rse commented Sep 30, 2024

> @ns-rse given the complexity of finding the best alternative to assert dict_almost_equal, could this be kicked down the road a little to a separate PR, where I'll update all the uses of dict_almost_equal in the codebase once a good alternative is found?

Yep, fine with that, just write up an issue if there isn't one already so we have it tracked in the backlog.
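(For anyone picking up that issue: the kind of helper under discussion is roughly of the shape below. This is an illustrative sketch, not TopoStats's actual dict_almost_equal; names and tolerances are assumptions.)

```python
import numpy as np


def dict_almost_equal(left, right, abs_tol: float = 1e-9) -> bool:
    """Recursively compare nested dicts of arrays/scalars within a tolerance, treating NaNs as equal."""
    if isinstance(left, dict) and isinstance(right, dict):
        return left.keys() == right.keys() and all(
            dict_almost_equal(left[key], right[key], abs_tol) for key in left
        )
    if isinstance(left, (int, float, np.ndarray)) and isinstance(right, (int, float, np.ndarray)):
        left_arr, right_arr = np.asarray(left), np.asarray(right)
        # Require identical shapes so broadcasting can't mask a mismatch.
        return left_arr.shape == right_arr.shape and bool(
            np.allclose(left_arr, right_arr, atol=abs_tol, equal_nan=True)
        )
    # Fall back to plain equality for strings and other types.
    return left == right
```

The equal_nan=True flag is what sidesteps the np.nan problem noted above; any replacement would need equivalent behaviour.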

SylviaWhittle and others added 3 commits September 30, 2024 15:00
Co-authored-by: Neil Shephard <n.shephard@sheffield.ac.uk>
Co-authored-by: Neil Shephard <n.shephard@sheffield.ac.uk>
@ns-rse (Collaborator) left a comment

Looks good, happy to deal with checking dictionaries in a separate issue/branch and sorry for forgetting about #913 which already documents the issue.

I noticed when checking that we currently exclude ^nodestats and tracingfuncs from being checked with numpydoc-validation, so we should address that in separate issues (see #919 and #920).

@SylviaWhittle SylviaWhittle merged commit 263f07c into maxgamill-sheffield/topology Sep 30, 2024
2 checks passed
@SylviaWhittle SylviaWhittle deleted the SylviaWhittle/topology_tests branch September 30, 2024 14:18