[feature] : Better Tracing & Skeletonisation Merger #800

MaxGamill-Sheffield · 2024-02-12T11:36:13Z

Is your feature request related to a problem?

Users currently require the better skeletonisation, pruning and tracing found within the maxgamill-sheffield/cats branch along with the height-traces Sylvia has made together into one branch.

Describe the solution you would like.

The coding group has decided the best approach is to cherry pick the relevant code (a.k.a no topology stuff yet, or NodeStats), and push these into main, then when we have the time we can write tests. This will hold off any releases to PyPi in the meantime as the Software will be untested.

The tasks are as below:

Current ToDo list:

Outputs

Get advice from experimentalists - How do you want the data saving in the csv if any? Due to multiple molecules per grain.
Get Sylvia to check and comment on the hdf5 structure.
Return crops and overlay traces.

Skeletons

Look into skeleton bias not seeming to work.

Tracing

Look into why fitted traces don't lie on the backbones (ordered traces do).
Get the contingency using ordinary tracing working - need to handle extra dict fields that aren't generated this way. #902

Config

Possibly move the save_fig config dpi param to the plotting dict as a high setting takes a while but are needed for skeletal plots. Or change so "null" takes the plotting config defaults, not "Figure".
Work out something for the mask_cmap to produce > 1 colour for crossing plots.

Code Tidy

Add docstrings.
Maybe tidy up the behemoth analyse_nodes function.

`pre-commit` hooks

The following checks have been disabled whilst @ns-rse started working on adding/developing tests (under #818) and will need re-enabling and the problems/errors they highlight addressing (doing so off the bat would be a massive amount of work and @ns-rse is focusing on writing unit-tests for the changes on this branch).

Specific hooks can be run on specific files with the following.

pre-commit run <hook-name> --file <path-to-file>

Once they pass the relevant checks the task can be checked off.

`numpydoc-validation`

Disabled for the following files by adding patterns to pyproject.toml configuration file. Lines that exclude are noted to assist with their removal. Once removed run the pre-commit run numpydoc-validation --file <path-to-file> locally to check everything.

topostats/tracing/dnatracing.py (line 275)
topostats/tracing/nodestats.py (line 276)
topostats/tracing/tracinfuncs.py (line 277) - may be redundant as code may have moved to other modules.
topostats/theme.py (line 279)

`codespell`

The following files have been excluded from this check for now, the relevant line to remove them from in pyproject.toml is 258...

[tool.codespell]
skip = '*.spm*,*.mplstyle,*.svg,nodestats.py,dnatracing.py'
count = ''
quiet-level = 3

topostats/tracing/dnatracing.py
topostats/tracing/nodestats.py

`pylint`

Currently the following files are explicitly excluded from being checked by pylint this will need reinstating and issues flagged addressing.

topostats/tracing/nodestats.py
topostats/tracing/dnatracing.py

The text was updated successfully, but these errors were encountered:

ns-rse · 2024-02-12T16:25:45Z

This will hold off any releases to PyPi in the meantime as the Software will be untested.

It will however mean people can work off of a single main branch installed from Git which will hold all the desired features.

Height-tracing and tests were merged with #790 this morning, closing the original issue #488.

MaxGamill-Sheffield · 2024-02-23T12:27:46Z

MaxGamill-Sheffield · 2024-02-23T13:09:56Z

Get advice from experimentalists

Happy with the state of what goes into the grainstats file and the .topostats files and it makes sense.
Distinction between grain and molecule seems clear.
Only thing they wanted to add was to understand how to load data from the .topostats file to plot.

llwiggins · 2024-03-01T13:20:45Z

Should U-net masking be included as part of this merger, or perhaps as a separate issue/PR?

MaxGamill-Sheffield · 2024-03-01T14:47:25Z

Should U-net masking be included as part of this merger, or perhaps as a separate issue/PR?

Great point! There are a few edits within the cats branch to improve the U-Net code (1 object from the segmentation and square crop edge-cases) but I think as it's quite separated from the rest of the code to be added, it might be more beneficial to start that as a separate branch and then merge it.

What do you think @SylviaWhittle?

ns-rse · 2024-05-21T12:52:16Z

Hi @MaxGamill-Sheffield,

I'm back on writing more tests for pruning and looking at the topostats.tracing.pruning.topostatsPrune() class. There is one method prune_all_skeletons() that I hadn't previously written tests for but now I look at it I've got some questions about it.

Question 1 - Going loopy

This method uses skimage.morphology.label() to label grains and then loops over them. This shouldn't be needed though since in #600 I refactored the tracing class to work with single grains and the control of tracing multiple grains from a single image is now done in the processing.run_dnatracing() function.

Is this looping a throwback to the older code from before the refactoring or is there a specific reason to loop over multiple grains? Currently I don't think it would do more than one loop since skimage.morphology.label() would label entangled grains as a single item.

Question 2 - Length v Height v both?

The logic in the prune_all_skeletons() isn't clear to me.

Currently it has...

    def prune_all_skeletons(self) -> npt.NDArray:
        """
        Prune all skeletons.
        Returns
        -------
        npt.NDArray
            A single mask with all pruned skeletons.
        """
        pruned_skeleton_mask = np.zeros_like(self.skeleton)
        labeled_skel = morphology.label(self.skeleton)
        for i in range(1, labeled_skel.max() + 1):
            single_skeleton = np.where(labeled_skel == i, 1, 0)
            if self.max_length is not None:
                single_skeleton = self._prune_by_length(single_skeleton, max_length=self.max_length)
            if self.height_threshold is not None:
                single_skeleton = heightPruning(
                    self.img,
                    single_skeleton,
                    height_threshold=self.height_threshold,
                    method_values=self.method_values,
                    method_outlier=self.method_outlier,
                ).remove_bridges()
            # skeletonise to remove nibs
            pruned_skeleton_mask += getSkeleton(self.img, single_skeleton, method="zhang").get_skeleton()
        return pruned_skeleton_mask

Within the for loop there are two if statements.

Prune single_skeleton if the self.max_length is not None. The value for this parameter set in the topostats/default_config.yaml of your branch is -1 (which triggers length based pruning based on 15% of grain length).
Prune single_skeleton if the self.height_threshold is not None. The value for this parameter set in the topostats/default_config.yaml of your branch is None.

This sounds ok, but what if a configuration such as...

dnatracing:
  run: true # Options : true, false
  ...
  pruning_params:
    pruning_method: topostats
    max_length: -1
    height_threshold: 789
    method_values: mid
    method_outlier: mean_abs

...were used by a user? Its not clear to the user, based on the docstrings as they stand what would happen.

I can say looking at the code that only the height-based pruning would be undertaken since the results of pruning based on length are not passed into the heightPruning() class, but is this what is expected?

Are there cases where both length and height based pruning should be applied?

Background

I realised this when starting to remove the for: loop and refactor and write out paramterised tests to check the effects of the different combinations of options for max_length / height_threshold / method_values / method_outlier as ideally tests should test the effects of different options but then I realised the lack of clarity about what behaviour is intended or required.

MaxGamill-Sheffield · 2024-05-21T17:25:43Z

Hey @ns-rse ,

A1: Yep just old code that I didn't have time to refactor, and I think it was designed similarly to the skeletonisation code which did the same thing as that's what the skimage methods did.

A2: This should absolutely take both the parameters from the config and performing both types of pruning on the skeleton, but concurrently (obvs there may be some issue with which comes first but this is sensible to me as they address different problems). However, maybe I'm misunderstanding but both pruning types update single_skeleton and use single_skeleton so I thought this would be working as intended and both will run under your config?

Also unless there are catastrophic things wrong with the code (e.g. if I've misunderstood here and both pruning stages can't happen together) I'm not sure it's worth your time refactoring the code so we can make the most of your time with us.

ns-rse · 2024-05-22T06:30:20Z

D'oh! 🤦 I clearly hadn't engaged my brain when I sat down and looked at the code yesterday as it will indeed apply both pruning types. I'm an idiot sometimes, thanks for pointing out my error so politely.

As I'm unlikely to look at this code again any time soon I will refactor after writing tests to remove the unnecessary loop, otherwise it will persist in the code base and need removing somewhen which may never happen. By writing a test first and then refactoring it will ensure I don't break anything.

Cheers,

@ns-rse

ns-rse · 2024-07-02T07:05:17Z

@MaxGamill-Sheffield and I met to discuss moving this forward.

Currently Nodestats is called from within DNATracing and it returns one or more disentangled loops if there are junctions (if there are no junctions then a single loop is returned).

Tracing is then undertaken.

Currently the term "DNA Tracing" is a bit of a catch all for undertaking a lot of steps,

Skeletonisation (tracing.skeletonise.getSkeleton())
Pruning (tracing.skeletonise.pruneSkeleton())
Unordered traces
Nodestats (tracing.dnatracing.dnaTrace())
Ordering traces
Splinning
Curvature calculations (tracing.dnacurvature.Curvature())

The number of methods and lines of code is growing and growing making the dnaTrace class increasingly complex, although as noted above some of the functionality has been broken out into its own modules/classes and been refactored (thanks @MaxGamill-Sheffield and @llwiggins).

Currently we process scans via the processing.process_scan() function (called from run_topostats.runtopostats()) and it makes calls to...

processing.run_filters()
processing.run_grains()
processing.run_grainstats()
processing.run_dnatracing()

The processing.run_dnatracing() step undertakes a lot and we agreed that we could introduce a number of additional steps to the above...

processing.run_filters()
processing.run_grains()
processing.run_grainstats()
processing.run_nodestats()
processing.run_dnatracing()
processing.run_splinning()
processing.run_curvature()

Workflow

The following workflow shows how these sections hang together. This may be modified in future and we would benefit from expanding the level of detail of the steps taken by nodeStats in a similar vein to some of the existing documentation.

%%{init: {'theme': 'base', 'gitGraph': {'rotateCommitLabel': true}
         }
}%%
graph TD;

  A1([Filter]) --> A2
  A2([Grains]) --> A3
  A3([GrainStats]) --> A4
  A4([NodeStats]) --> A5
  A5([DNA Tracing]) --> A6
  A6([Splining])

  
  style A1 fill:#648FFF,stroke:#000000
  style A2 fill:#648FFF,stroke:#000000
  style A3 fill:#648FFF,stroke:#000000
  style A4 fill:#00FF90,stroke:#000000
  style A5 fill:#FE6100,stroke:#000000
  style A6 fill:#FEE100,stroke:#000000

  B1([NodeStats]) --> B2
  B2([Smooth Grains]) --> B3
  B3([Get Disordered Trace]) --> B4
  B4([NodeStats - Get Stats]) --> B5([OUT: Node Dict\nand compile csv stats:\nnum crossings\navg/min cross conf])

  style B1 fill:#00FF90,stroke:#000000
  style B2 fill:#00FF90,stroke:#000000
  style B3 fill:#00FF90,stroke:#000000
  style B4 fill:#00FF90,stroke:#000000
  style B5 fill:#00FF90,stroke:#000000


  C1([DNA Tracing]) --> C2
  C2([Nodestats - Compile Traces]) --> D1
  C1 --> C3([Traditional Tracing:])
  C3 --> C4([Linear or Circular])
  C4 --> C5([Get Ordered Traces])
  C5 --> D1

  style C1 fill:#FE6100,stroke:#000000
  style C2 fill:#FE6100,stroke:#000000
  style C3 fill:#FE6100,stroke:#000000
  style C4 fill:#FE6100,stroke:#000000
  style C5 fill:#FE6100,stroke:#000000


  D1([Linear or Circular]) --> D2
  D2([Get Ordered Trace Heights]) --> D3
  D3([Get Ordered Trace Distances])

  style D1 fill:#FE6100,stroke:#000000
  style D2 fill:#FE6100,stroke:#000000
  style D3 fill:#FE6100,stroke:#000000
  

  E1([Splining]) --> E2
  E2([xX Fitted Traces Xx]) --> E21
  E21([Get Splined Traces]) --> E3
  E3([Add to Image]) --> E4
  E4([Contour Lengths]) --> E5
  E5([End 2 End Distance])
  
  style E1 fill:#FEE100,stroke:#000000
  style E2 fill:#FEE100,stroke:#000000
  style E21 fill:#FEE100,stroke:#000000
  style E3 fill:#FEE100,stroke:#000000
  style E4 fill:#FEE100,stroke:#000000
  style E5 fill:#FEE100,stroke:#000000

MaxGamill-Sheffield added the enhancement New feature or request label Feb 12, 2024

MaxGamill-Sheffield assigned SylviaWhittle, MaxGamill-Sheffield and llwiggins Feb 12, 2024

ns-rse mentioned this issue May 14, 2024

[tests] Consolidate fixtures from pruning and height profiles #843

Closed

ns-rse mentioned this issue Jun 4, 2024

Fix broken tests in Better Tracing & Skeletonisation Merger branch #850

Open

6 tasks

ns-rse mentioned this issue Sep 23, 2024

Refactoring and tests of dnatracing.py #183

Closed

8 tasks

ns-rse mentioned this issue Oct 7, 2024

Performance regression of Topoly with Python 3.12 #927

Open

MaxGamill-Sheffield linked a pull request Oct 7, 2024 that will close this issue

Adding the Better Tracing Suite into Main #932

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[feature] : Better Tracing & Skeletonisation Merger #800

[feature] : Better Tracing & Skeletonisation Merger #800

MaxGamill-Sheffield commented Feb 12, 2024 •

edited by ns-rse

Loading

ns-rse commented Feb 12, 2024

MaxGamill-Sheffield commented Feb 23, 2024 •

edited

Loading

MaxGamill-Sheffield commented Feb 23, 2024 •

edited

Loading

llwiggins commented Mar 1, 2024

MaxGamill-Sheffield commented Mar 1, 2024

ns-rse commented May 21, 2024

MaxGamill-Sheffield commented May 21, 2024

ns-rse commented May 22, 2024

ns-rse commented Jul 2, 2024

[feature] : Better Tracing & Skeletonisation Merger #800

[feature] : Better Tracing & Skeletonisation Merger #800

Comments

MaxGamill-Sheffield commented Feb 12, 2024 • edited by ns-rse Loading

Is your feature request related to a problem?

Describe the solution you would like.

Outputs

Skeletons

Tracing

Config

Code Tidy

pre-commit hooks

numpydoc-validation

codespell

pylint

ns-rse commented Feb 12, 2024

MaxGamill-Sheffield commented Feb 23, 2024 • edited Loading

MaxGamill-Sheffield commented Feb 23, 2024 • edited Loading

llwiggins commented Mar 1, 2024

MaxGamill-Sheffield commented Mar 1, 2024

ns-rse commented May 21, 2024

Question 1 - Going loopy

Question 2 - Length v Height v both?

Background

MaxGamill-Sheffield commented May 21, 2024

ns-rse commented May 22, 2024

ns-rse commented Jul 2, 2024

Workflow

MaxGamill-Sheffield commented Feb 12, 2024 •

edited by ns-rse

Loading

`pre-commit` hooks

`numpydoc-validation`

`codespell`

`pylint`

MaxGamill-Sheffield commented Feb 23, 2024 •

edited

Loading

MaxGamill-Sheffield commented Feb 23, 2024 •

edited

Loading