Adding the Better Tracing Suite into Main #932
base: main
Conversation
….com/AFM-SPM/TopoStats into maxgamill-sheffield/800-btr-splining
…/TopoStats into SylviaWhittle/800-splining-tests
…nicircle.spm is still processed
….com/AFM-SPM/TopoStats into maxgamill-sheffield/800-btr-splining
…her grains set to 0
…b backend)" This reverts commit 07a4d96.
Reverted commit. The tests pass on everything except Windows Python 3.9, so shall we just ditch that support, please?
@@ -888,6 +889,8 @@ def ordered_tracing_image(
         Results containing the ordered_trace_data (coordinates), any grain-level metrics to be added to the grains
         dataframe, a dataframe of molecule statistics and a dictionary of diagnostic images.
     """
+    topoly_version = importlib.metadata.version("topoly")
+    print(f"Topoly version: {topoly_version}")
Will this be permanent? If so, `LOGGER.debug()` might be appropriate, and we could set up the tests to use a debug level in logging.
I will also address #896 now as it's a small change and means you can SSH into GitHub runners when these fail and find out such information directly.
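The `LOGGER.debug()` suggestion could look something like the sketch below. The helper name and logger name are assumptions for illustration, not TopoStats API; the point is simply to log the version at debug level instead of `print()`, so it only surfaces when the test suite runs with logging set to DEBUG.

```python
import logging

LOGGER = logging.getLogger("topostats")


def report_version(package: str, version: str) -> str:
    """Hypothetical helper: log a package version at debug level.

    Returns the message so callers (and tests) can inspect it.
    """
    message = f"{package} version: {version}"
    LOGGER.debug(message)
    return message
```

In the tests, pytest's `caplog.set_level(logging.DEBUG)` fixture would then surface the message when a run fails.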
Nah, not permanent, just desperate debugging to see what version was being used.
@ns-rse Do you have any thoughts on this? Windows 3.9 fails due to tk issues; I tried using a different matplotlib backend but that caused Ubuntu to fail, and I'm wary of adding platform-specific code (different backends for different platforms). 3.10 fails due to data type issues, but I need to re-look at that as I've forgotten what I checked on Friday.
They also failed on Windows 3.10 with the following...
Could add in `disordered_tracing_stats = disordered_tracing_stats.astype({"branch_type": "int32"})` after the data frames are concatenated on line 344 (`disordered_tracing_stats = pd.concat((disordered_tracing_stats, skan_df))`). However, this will also require that the same step is done when reading the CSV using … This creates a bit more work in my view and seems like the sort of situation that …
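The suggestion above can be sketched as follows. The two data frames here are hypothetical stand-ins for the TopoStats frames (only the `branch_type` column and the variable names come from the discussion); the point is that both the post-`concat` cast and the `read_csv` `dtype=` argument are needed to keep Windows (`int32` default) and Linux (`int64` default) in agreement.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the frames discussed above.
disordered_tracing_stats = pd.DataFrame({"branch_type": [0, 1]})
skan_df = pd.DataFrame({"branch_type": [2]})

# After concatenation, force the dtype so all platforms agree.
disordered_tracing_stats = pd.concat((disordered_tracing_stats, skan_df))
disordered_tracing_stats = disordered_tracing_stats.astype({"branch_type": "int32"})

# The same cast is needed when reading the frame back from CSV,
# otherwise pandas infers the platform-dependent default again.
csv = io.StringIO(disordered_tracing_stats.to_csv(index=False))
reloaded = pd.read_csv(csv, dtype={"branch_type": "int32"})
```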
Yeah, I will fix the type issue by forcing the dtype as the data goes into the dictionary. Do you have any thoughts or ideas about the matplotlib tk/tcl errors that seem to be failing the rest of the tests the majority of the time? I've tried a different backend but this breaks Ubuntu :(
I think I've seen these quite a few times, and from memory it's when setting up/installing packages that things fail. I typically just have to manually re-run the tests. Python 3.9 is the minimum version of Python, and I think, for the sake of not having to explain to people how to install and upgrade Python, we should support the minimum version.
@ns-rse, @SylviaWhittle says this happens on every re-run, hence why she can't just push through. Is there a way to go into the test machines to manually ensure the package installs work? If not, and as this is the highest priority, is dropping 3.9 support alright?
@MaxGamill-Sheffield The tests are failing occasionally because of … Dropping Python 3.9 support won't address this particular issue as it is a Windows issue. A quick search led me to this thread, which suggests that, unsurprisingly, the underlying problem encountered here is with NumPy (Pandas DataFrames are all essentially NumPy arrays).
@MaxGamill-Sheffield P.S. - The way to go into the GitHub runners is via the …
…point errors on windows (we think)
@MaxGamill-Sheffield Commit e59d1bd35059298c8f0794ab4f36d63fc3d9bc0c may have had some unintended consequences, as other tests are now failing. Somehow I see the same errors on PR #949 even though … The failures on Windows do seem to be due to …
Co-authored-by: Neil Shephard <n.shephard@sheffield.ac.uk>
But … which looks like the order of magnitude change in e59d1bd. I'll employ …
@@ -885,7 +885,7 @@ def _generate_random_skeleton(**extra_kwargs):
         "allow_overlap": True,
     }
     # kwargs.update
-    heights = {"scale": 100, "sigma": 5.0, "cval": 20.0}
+    heights = {"scale": 1e2, "sigma": 5.0, "cval": 20.0}
Are these not the same value?
Sorry for the confusion I caused; I should have posted here after pushing. But aye, what Sylvia said. It was super weird, as although the values `100` and `1e2` are the same, they have a very different effect when used in `scale`. And `1e2` is far from the very small values which seem to give floating point errors.
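The floating point sensitivity being discussed is easy to reproduce. This is a sketch of the general mechanism, not the actual TopoStats test: with NumPy's default tolerances, any two values of order 1e-19 compare as "close" (the absolute tolerance swamps them), so assertions on heights in that range are fragile, while values of order 1e1-1e2 compare meaningfully.

```python
import numpy as np

# np.isclose uses atol=1e-8 by default, which dwarfs values of order 1e-19:
# two clearly different heights in that range still compare as "equal".
assert np.isclose(8.0e-19, 1.0e-19)

# Values of order 1e2 sit well above atol, so comparisons are informative.
assert not np.isclose(100.0, 99.0)

# 100 and 1e2 are literally the same float, so swapping the literal changes
# nothing by itself; the fix was moving test heights out of the sub-atol range.
assert 100 == 1e2
```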
@@ -1242,7 +1241,7 @@ def test_prune_by_length(
         pytest.param(
             "pruning_skeleton",
             None,
-            8.0e-19,
+            7.3,
Why has the value as well as the magnitude changed for this set of parameters?
As seen above, some of the values changed slightly, so I modified the test values in order to keep the expected output the same. When doing this I was logging the height values produced by the method (`method_values`) to find one that fits, but for the value of ~8e-19 the largest branch height was ~7.9, so it couldn't be used.
             # marks=pytest.mark.skip(),
         ),
         pytest.param(
             "pruning_skeleton",
             None,
-            7.7e-19,
+            7.1,
Ditto, value and magnitude have changed?
Similar story for this other one that changed.
@@ -1374,7 +1373,7 @@ def test_prune_by_length(
         pytest.param(
             "pruning_skeleton",
             None,
-            1.0e-19,
+            10000,  # can assume any not-None value as the threshold is found via the IQR
Useful comments like this could go in the `id=""`, where they will be clearly seen if the test fails.
I thought we'd want to keep the `id`s short and readable, but if you think it's useful I'll make this change tomorrow.
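The `id=""` suggestion can be sketched like this. The parameter values are taken from the diff above, but the test body, parameter names, and the `id` string itself are hypothetical stand-ins, not the actual TopoStats test:

```python
import pytest


@pytest.mark.parametrize(
    ("method", "height_threshold"),
    [
        pytest.param(
            "pruning_skeleton",
            10000,
            # The explanatory comment now lives in the id, so it appears in
            # pytest's failure output instead of being buried in the source.
            id="pruning-skeleton-any-non-None-threshold-found-via-IQR",
        ),
    ],
)
def test_prune_by_length(method: str, height_threshold: float) -> None:
    # Placeholder body; the real test exercises the pruning code.
    assert method == "pruning_skeleton"
    assert height_threshold is not None
```

When this parametrisation fails, the run is reported as `test_prune_by_length[pruning-skeleton-any-non-None-threshold-found-via-IQR]`, making the intent visible without opening the file.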
 base_dir: ./ # Directory in which to search for data files
 output_dir: ./output # Directory to output results to
 log_level: info # Verbosity of output. Options: warning, error, info, debug
-cores: 2 # Number of CPU cores to utilise for processing multiple files simultaneously.
+cores: 1 # Number of CPU cores to utilise for processing multiple files simultaneously.
Is there a reason for decreasing the number of cores? Roughly doubles processing time and most modern systems have at least 4 cores (although 2 is the default so that a) people's computers don't get clobbered; b) GitHub Runners have only 2 cores).
Might have been an accident, or intentional so users can see which file throws an error. Who's on the blame? If it's me, this was probably an accident; I think I'd prefer the default to be speedier.
Why was the value of `scale` considered to be the cause of the Windows `int32` v `int64` error with `dtype`? The affected variable was `branch_type` in the data frames that were being compared, so nothing to do with height profiles.

These changes have affected `tests/measure/test_height_profiles.py::test_interpolate_height_profile_images` because the same skeletons and height profiles are used in those tests.

It is worth running `pytest` before making pull requests, and the introduction of `pytest --testmon` should have helped with that but may not be set up locally. Also, branch and PR rather than committing directly to what is a development branch.
That commit was made to fix a pruning test that fails exclusively on Windows, which I flagged on Friday. It appeared to be floating point arithmetic on values of order ~1e-19 (posting from Windows so I don't have the screenshot to hand). Max found that it was likely fixable by increasing the values used in the pruning heights to be less prone to falling within the floating point error range, i.e. by setting them to be of order ~1e-1.
Closes #800
This splits the DNATracing pipeline into smaller, more modular stages (disordered tracing, ordered tracing and splining), each with its own analyses. It also adds topological functions and analyses to ordered tracing, facilitating the processing of catenated molecules as separate objects, and includes a new module to handle and analyse crossings of DNA segments.