
Adding the Better Tracing Suite into Main #932

Open

wants to merge 743 commits into main from maxgamill-sheffield/800-better-tracing

Conversation

MaxGamill-Sheffield
Collaborator

Closes #800

This aims to split the DNATracing pipeline into smaller, more modular parts (disordered tracing, ordered tracing and splining), each with its own analyses. It also adds topological functions and analyses to ordered tracing, facilitating the processing of catenated molecules as separate objects, and includes a new module to handle and analyse crossings of DNA segments.

MaxGamill-Sheffield and others added 30 commits August 15, 2024 15:59
@SylviaWhittle
Collaborator

Okay, I give up.

If I use a different backend to appease Windows, it fails on the other OSs.

@SylviaWhittle
Collaborator

Reverted commit.

The tests pass on everything except Windows Python 3.9, so can we just ditch that support please?

@@ -888,6 +889,8 @@ def ordered_tracing_image(
Results containing the ordered_trace_data (coordinates), any grain-level metrics to be added to the grains
dataframe, a dataframe of molecule statistics and a dictionary of diagnostic images.
"""
+ topoly_version = importlib.metadata.version("topoly")
+ print(f"Topoly version: {topoly_version}")
Collaborator

Will this be permanent?

If so LOGGER.debug() might be appropriate and we could set up the tests to use a debug level in logging.

I will also address #896 now as it's a small change and means you can SSH into GitHub runners when these fail and find out such information directly.
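If it were kept, a minimal sketch of the LOGGER.debug() variant (the module-level LOGGER below is a stand-in for whatever logger the module already uses) might be:

import importlib.metadata
import logging

LOGGER = logging.getLogger(__name__)  # stand-in for the module's existing LOGGER

# Log at debug level rather than printing, so the version string only
# appears when the tests are run with debug-level logging enabled.
LOGGER.debug("Topoly version: %s", importlib.metadata.version("topoly"))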

Collaborator

Nah, not permanent, just desperate debugging to see what version was being used.

@SylviaWhittle
Collaborator

@ns-rse Do you have any thoughts on this? Windows 3.9 fails due to tk issues; I tried using a different matplotlib backend but that caused Ubuntu to fail, and I'm scared of adding platform-specific code (backends for different platforms).

3.10 fails due to data-type issues, but I need to look at that again as I've forgotten what I checked on Friday.
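For reference, one common way to dodge tk/tcl errors on headless CI runners without platform-specific branches is to force matplotlib's non-interactive Agg backend for the whole test run; a sketch of that general technique (not necessarily what this PR ends up doing):

# conftest.py (sketch): select the non-interactive Agg backend before
# pyplot is first imported, so no tk/tcl display machinery is touched
# on any OS; setting the MPLBACKEND=Agg environment variable in the CI
# workflow achieves the same thing.
import matplotlib

matplotlib.use("Agg")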

@ns-rse
Collaborator

ns-rse commented Oct 14, 2024

> The tests pass on everything except Windows Python 3.9, so can we just ditch that support please?

They also failed on Windows 3.10 with the following...

FAILED tests/tracing/test_disordered_tracing.py::test_trace_image_disordered[catenane] - AssertionError: Attributes of DataFrame.iloc[:, 3] (column name="branch_type") are different

Attribute "dtype" are different
[left]:  int32
[right]: int64
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!
====== 1 failed, 560 passed, 4 skipped, 2 warnings in 215.91s (0:03:35) =======
Error: Process completed with exit code 1.

Could add check_dtype=False to pd.testing.assert_frame_equal(), but this could be a slippery slope. Perhaps a better solution is to downcast the dtype of the columns (I can't see we'll ever need int64 for the branch_type variable).

Thus, after the data frames are concatenated on line 344 of disordered_tracing.py, we could...

disordered_tracing_stats = pd.concat((disordered_tracing_stats, skan_df))
disordered_tracing_stats = disordered_tracing_stats.astype({"branch_type": "int32"})

However, this will also require that the same step is done when reading the CSV using pd.read_csv(), and so the test itself will need a dictionary of the column names and the dtypes they should be read as.
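For instance, a minimal sketch of that read step (the CSV path below is a made-up placeholder):

import pandas as pd

# Read the expected results with explicit dtypes so Windows (int32
# default) and Linux/macOS (int64 default) compare like-for-like.
expected = pd.read_csv(
    "tests/resources/disordered_tracing_stats.csv",  # hypothetical path
    dtype={"branch_type": "int32"},
)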

This creates a bit more work in my view and seems like the sort of situation that pytest-regtest was designed to handle. ⚖️

@SylviaWhittle
Collaborator

> They also failed on Windows 3.10 with the following... [...] Could add check_dtype=False to pd.testing.assert_frame_equal(), but this could be a slippery slope. Perhaps a better solution is to downcast the dtype of the columns (I can't see we'll ever need int64 for the branch_type variable). [...] This creates a bit more work in my view and seems like the sort of situation that pytest-regtest was designed to handle. ⚖️

Yeah, I will fix the type issue by forcing the dtype as the data goes into the dictionary.
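Presumably something along these lines (the dictionary below is a hypothetical illustration, not the actual disordered_tracing.py code):

import numpy as np

branch_type, branch_distance = 2, 15.4  # hypothetical values

# Cast platform-dependent integers to a fixed width as they enter the
# results dictionary, so the resulting dataframe column is int32 on
# every OS rather than int32 on Windows and int64 elsewhere.
stats = {"branch_type": np.int32(branch_type), "branch_distance": float(branch_distance)}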

Do you have any thoughts / ideas about the matplotlib tk / tcl errors that seem to be failing the rest of the tests the majority of the time? I've tried a different backend but this breaks Ubuntu :(

@ns-rse
Collaborator

ns-rse commented Oct 14, 2024

> Do you have any thoughts / ideas about the matplotlib tk / tcl errors that seem to be failing the rest of the tests the majority of the time? I've tried a different backend but this breaks Ubuntu :(

I think I've seen these quite a few times and, from memory, it's when setting up/installing packages that things fail. I typically just have to manually re-run the tests.

Python 3.9 is the minimum version of Python, and for the sake of not having to explain to people how to install and upgrade Python, I think we should support the minimum version.

@MaxGamill-Sheffield
Collaborator Author

> I think I've seen these quite a few times and, from memory, it's when setting up/installing packages that things fail. I typically just have to manually re-run the tests.

@ns-rse, @SylviaWhittle says this happens on every re-run, hence she can't just push through. Is there a way to go into the test machines to manually ensure the package installs work? If not, and as this is the highest priority, is dropping 3.9 support alright?

@ns-rse
Collaborator

ns-rse commented Oct 14, 2024

@MaxGamill-Sheffield The tests are failing occasionally because of tcl/tk issues (something I've observed many times over the last year or two). Of the two currently showing as failed in this Pull Request, both failed on Windows (Python 3.9 and 3.10) because of a dtype issue on reading the CSV file into a Pandas DataFrame and testing equality.

Dropping Python 3.9 support won't address this particular issue as it is a Windows issue.

A quick search led me to this thread, which suggests that, unsurprisingly, the underlying problem encountered here is with NumPy (Pandas DataFrames are essentially NumPy arrays).
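The platform difference is easy to demonstrate: on NumPy < 2.0 the default integer type is the C long, which is 32-bit on Windows and 64-bit on Linux/macOS, so the same code yields different dtypes per platform:

import numpy as np

# No dtype given, so NumPy picks its platform default integer:
arr = np.array([0, 1, 2])
print(arr.dtype)  # int32 on Windows, int64 on Linux/macOS (NumPy < 2.0)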

@ns-rse
Collaborator

ns-rse commented Oct 14, 2024

@MaxGamill-Sheffield P.S. - The way to go into the GitHub runners is via the tmate action I created a Pull Request for and have just merged. But you can always look at the log file and see what has been installed under the Python setup section (needs expanding).

@ns-rse
Collaborator

ns-rse commented Oct 14, 2024

@MaxGamill-Sheffield Commit e59d1bd35059298c8f0794ab4f36d63fc3d9bc0c may have had some unintended consequences as other tests are now failing.

Somehow I see the same errors on PR #949 even though e59d1bd35 was made after I branched; it doesn't show in the log for that branch, as I branched from 0ad6ed3b8 (origin/maxgamill-sheffield/800-better-tracing, Revert "Try to appease the worse operating system... (pyqt5 matplotlib backend)"), where the test suite passed on both GNU/Linux and OSX.

The failures on Windows do seem to be due to dtype errors, at least those that I've had eyes on here in GitHub's Continuous Integration. If there are others that have cropped up, I'd suggest making the changes on a branch and merging them into maxgamill-sheffield/800-better-tracing: this branch is itself closing in on being ready to merge to main, and committing directly to it is probably a bad idea.

SylviaWhittle and others added 2 commits October 14, 2024 21:33
@SylviaWhittle
Collaborator

Weirdly, the macOS runner seems to time out on tmate initialisation...

@ns-rse
Collaborator

ns-rse commented Oct 14, 2024

But tmate only runs if there has been an earlier failure, which there has, as we now see the tests fail with...

Error: test_interpolate_height_profile_images[Skeleton loop 1]

AssertionError: 
Arrays are not almost equal to 12 decimals

Mismatched elements: 104 / 104 (100%)
Max absolute difference: 10.53686996
Max relative difference: 9.22337208e+18
 x: array([ 4.307918269844,  4.863716838544,  5.285009481818,  5.534734846213,
        5.558195073897,  5.392339586946,  5.071938213424,  4.621086001739,
        4.09936482494 ,  3.548273814461,  2.995079890958,  2.504882276258,...
 y: array([4.67065435e-19, 5.27325236e-19, 5.73001876e-19, 6.00077154e-19,
       6.02620717e-19, 5.84638629e-19, 5.49900643e-19, 5.01019148e-19,
       4.44454025e-19, 3.84704618e-19, 3.24727212e-19, 2.71579881e-19,...
tests/measure/test_height_profiles.py::test_interpolate_height_profile_images[Skeleton loop 1] FAILED [ 17%]

...which looks like the order of magnitude change in e59d1bd.

I'll employ git bisect tomorrow to work out what has caused this change and then sort out my #949 in a similar manner.
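For the record, the bisect run might look something like the following, using the known-good commit mentioned above and the test file that now fails (a sketch, not the exact invocation):

git bisect start
git bisect bad HEAD
git bisect good 0ad6ed3b8
git bisect run python -m pytest tests/measure/test_height_profiles.py -x
git bisect reset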

@@ -885,7 +885,7 @@ def _generate_random_skeleton(**extra_kwargs):
"allow_overlap": True,
}
# kwargs.update
heights = {"scale": 100, "sigma": 5.0, "cval": 20.0}
heights = {"scale": 1e2, "sigma": 5.0, "cval": 20.0}
Collaborator

Are these not the same value?

Collaborator Author

Sorry for the confusion I caused, I should have posted here after pushing. But aye, what Sylvia said; it was super weird, as although the values 100 and 1e2 are the same, they have a very different effect when used in scale:
[screenshot comparing the two outputs omitted]

And 1e2 is far from the very small values which seem to give floating-point errors.
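One concrete difference between the two literals that may be relevant: they compare equal but have different Python types, and anything that seeds an array dtype from the value will notice:

import numpy as np

# Numerically equal, but int vs float:
assert 100 == 1e2
assert isinstance(100, int) and isinstance(1e2, float)

print(np.array([100]).dtype)  # platform-dependent integer (int32/int64)
print(np.array([1e2]).dtype)  # float64 everywhere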

@@ -1242,7 +1241,7 @@ def test_prune_by_length(
pytest.param(
"pruning_skeleton",
None,
- 8.0e-19,
+ 7.3,
Collaborator

Why has the value as well as the magnitude changed for this set of parameters?

Collaborator Author

As seen above, some of the values changed slightly, so I modified the test values in order to keep the expected output the same. When doing this I was logging the height values produced by method_values to find one that fits, but for the value of ~8e-19 the largest branch height was ~7.9, so it couldn't be used.

# marks=pytest.mark.skip(),
),
pytest.param(
"pruning_skeleton",
None,
- 7.7e-19,
+ 7.1,
Collaborator

Ditto, value and magnitude have changed?

Collaborator Author

Similar story for this other one that changed.

@@ -1374,7 +1373,7 @@ def test_prune_by_length(
pytest.param(
"pruning_skeleton",
None,
- 1.0e-19,
+ 10000, # can assume any not-None value as the threshold is found via the IQR
Collaborator

Useful comments like this could go in the id="" where they will be clearly seen if the test fails.

Collaborator Author

I thought we'd want to keep the ids short and readable, but if you think it's useful I'll make this change tomorrow.
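For example, the comment on the 10000 parameter above could be folded into the id roughly like this (the id text is made up; the other arguments are copied from the diff):

import pytest

pytest.param(
    "pruning_skeleton",
    None,
    10000,
    id="pruning-skeleton-any-non-None-threshold-as-it-is-found-via-IQR",
)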

base_dir: ./ # Directory in which to search for data files
output_dir: ./output # Directory to output results to
log_level: info # Verbosity of output. Options: warning, error, info, debug
- cores: 2 # Number of CPU cores to utilise for processing multiple files simultaneously.
+ cores: 1 # Number of CPU cores to utilise for processing multiple files simultaneously.
Collaborator

Is there a reason for decreasing the number of cores? It roughly doubles processing time, and most modern systems have at least 4 cores (although 2 is the default so that a) people's computers don't get clobbered and b) GitHub Runners have only 2 cores).

Collaborator Author

Might have been an accident, or intentional so users can see which file throws an error. Who's on the blame? If it's me, this was probs an accident; I think I'd prefer the default to be speedier.

Collaborator

@ns-rse ns-rse left a comment

Why was the value of scale considered to be the cause of the Windows int32 v int64 error with dtype?

The affected variable was branch_type in the data frames that were being compared, so it has nothing to do with height profiles.

These changes have affected the tests/measure/test_height_profiles.py::test_interpolate_height_profile_images because the same skeletons and height profiles are used in those tests.

It is worth running pytest before making pull requests, and the introduction of pytest --testmon should have helped with that, but it may not be set up locally. Also, please branch and PR rather than committing directly to what is a development branch.

@SylviaWhittle
Collaborator

> Why was the value of scale considered to be the cause of the Windows int32 v int64 error with dtype? [...] These changes have affected the tests/measure/test_height_profiles.py::test_interpolate_height_profile_images because the same skeletons and height profiles are used in those tests.

That commit was made to fix a pruning test, flagged on Friday, that fails exclusively on Windows. It appeared to be floating-point arithmetic on values of order ~1e-19 (posting from Windows so I don't have the screenshot to hand). Max found that it could likely be fixed by just increasing the values used in the pruning heights so they are less prone to falling in the floating-point error range, i.e. by setting them to be of order ~1e-1.
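As a small illustration of why magnitudes of ~1e-19 are treacherous in tests (one of several ways they bite): an "almost equal to 12 decimals" check compares absolute differences of ~1e-12, which is enormous relative to such values, so it cannot distinguish them at all:

import numpy as np

# Both arrays are "equal to 12 decimals" despite differing by ~4%,
# because at ~1e-19 an absolute tolerance of ~1e-12 is meaningless.
a = np.array([8.0e-19])
b = np.array([7.7e-19])
np.testing.assert_array_almost_equal(a, b, decimal=12)  # passes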
