Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature] : Better Tracing & Skeletonisation Merger #800

Open
21 of 29 tasks
MaxGamill-Sheffield opened this issue Feb 12, 2024 · 9 comments · May be fixed by #932
Open
21 of 29 tasks

[feature] : Better Tracing & Skeletonisation Merger #800

MaxGamill-Sheffield opened this issue Feb 12, 2024 · 9 comments · May be fixed by #932
Assignees
Labels
enhancement New feature or request

Comments

@MaxGamill-Sheffield
Copy link
Collaborator

MaxGamill-Sheffield commented Feb 12, 2024

Is your feature request related to a problem?

Users currently require the better skeletonisation, pruning and tracing found within the maxgamill-sheffield/cats branch along with the height-traces Sylvia has made together into one branch.

Describe the solution you would like.

The coding group has decided the best approach is to cherry pick the relevant code (a.k.a no topology stuff yet, or NodeStats), and push these into main, then when we have the time we can write tests. This will hold off any releases to PyPi in the meantime as the Software will be untested.

The tasks are as below:

Current ToDo list:

Outputs

  • Get advice from experimentalists - How do you want the data saving in the csv if any? Due to multiple molecules per grain.
  • Get Sylvia to check and comment on the hdf5 structure.
  • Return crops and overlay traces.

Skeletons

  • Look into skeleton bias not seeming to work.

Tracing

Config

  • Possibly move the save_fig config dpi param to the plotting dict as a high setting takes a while but are needed for skeletal plots. Or change so "null" takes the plotting config defaults, not "Figure".
  • Work out something for the mask_cmap to produce > 1 colour for crossing plots.

Code Tidy

  • Add docstrings.
  • Maybe tidy up the behemoth analyse_nodes function.

pre-commit hooks

The following checks have been disabled whilst @ns-rse started working on adding/developing tests (under #818) and will need re-enabling and the problems/errors they highlight addressing (doing so off the bat would be a massive amount of work and @ns-rse is focusing on writing unit-tests for the changes on this branch).

Specific hooks can be run on specific files with the following.

pre-commit run <hook-name> --file <path-to-file>

Once they pass the relevant checks the task can be checked off.

numpydoc-validation

Disabled for the following files by adding patterns to pyproject.toml configuration file. Lines that exclude are noted to assist with their removal. Once removed run the pre-commit run numpydoc-validation --file <path-to-file> locally to check everything.

  • topostats/tracing/dnatracing.py (line 275)
  • topostats/tracing/nodestats.py (line 276)
  • topostats/tracing/tracinfuncs.py (line 277) - may be redundant as code may have moved to other modules.
  • topostats/theme.py (line 279)

codespell

The following files have been excluded from this check for now, the relevant line to remove them from in pyproject.toml is 258...

[tool.codespell]
skip = '*.spm*,*.mplstyle,*.svg,nodestats.py,dnatracing.py'
count = ''
quiet-level = 3
  • topostats/tracing/dnatracing.py
  • topostats/tracing/nodestats.py

pylint

Currently the following files are explicitly excluded from being checked by pylint this will need reinstating and issues flagged addressing.

  • topostats/tracing/nodestats.py
  • topostats/tracing/dnatracing.py
@ns-rse
Copy link
Collaborator

ns-rse commented Feb 12, 2024

This will hold off any releases to PyPi in the meantime as the Software will be untested.

It will however mean people can work off of a single main branch installed from Git which will hold all the desired features.

Height-tracing and tests were merged with #790 this morning, closing the original issue #488.

@MaxGamill-Sheffield
Copy link
Collaborator Author

MaxGamill-Sheffield commented Feb 23, 2024

Skeletonisation and pruning added together with the nodestats tracing due to their intertwining and is on branch maxgamill-sheffield/800-better-tracing (linked in the development section).

It currently runs to completion when testing on the nice cat image and minicircles.spm, but a few things have not translated over well...

Current ToDo list:
Outputs

  • Get advice from experimentalists - How do you want the data saving in the csv if any? Due to multiple molecules per grain.
  • Get Sylvia to check and comment on the hdf5 structure.
  • Return crops and overlay traces.

Skeletons

  • Look into skeleton bias not seeming to work.

Tracing

  • Look into why fitted traces don't lie on the backbones (ordered traces do).
  • Get the contingency using ordinary tracing working - need to handle extra dict fields that aren't generated this way.

Config

  • Possibly move the save_fig config dpi param to the plotting dict as a high setting takes a while but are needed for skeletal plots. Or change so "null" takes the plotting config defaults, not "Figure".
  • Work out something for the mask_cmap to produce > 1 colour for crossing plots.

Code Tidy

  • Add docstrings.
  • Maybe tidy up the behemoth analyse_nodes function.

@MaxGamill-Sheffield
Copy link
Collaborator Author

MaxGamill-Sheffield commented Feb 23, 2024

Get advice from experimentalists

Happy with the state of what goes into the grainstats file and the .topostats files and it makes sense.
Distinction between grain and molecule seems clear.
Only thing they wanted to add was to understand how to load data from the .topostats file to plot.

@llwiggins
Copy link
Collaborator

Should U-net masking be included as part of this merger, or perhaps as a separate issue/PR?

@MaxGamill-Sheffield
Copy link
Collaborator Author

Should U-net masking be included as part of this merger, or perhaps as a separate issue/PR?

Great point! There are a few edits within the cats branch to improve the U-Net code (1 object from the segmentation and square crop edge-cases) but I think as it's quite separated from the rest of the code to be added, it might be more beneficial to start that as a separate branch and then merge it.

What do you think @SylviaWhittle?

@ns-rse
Copy link
Collaborator

ns-rse commented May 21, 2024

Hi @MaxGamill-Sheffield,

I'm back on writing more tests for pruning and looking at the topostats.tracing.pruning.topostatsPrune() class. There is one method prune_all_skeletons() that I hadn't previously written tests for but now I look at it I've got some questions about it.

Question 1 - Going loopy

This method uses skimage.morphology.label() to label grains and then loops over them. This shouldn't be needed though since in #600 I refactored the tracing class to work with single grains and the control of tracing multiple grains from a single image is now done in the processing.run_dnatracing() function.

Is this looping a throwback to the older code from before the refactoring or is there a specific reason to loop over multiple grains? Currently I don't think it would do more than one loop since skimage.morphology.label() would label entangled grains as a single item.

Question 2 - Length v Height v both?

The logic in the prune_all_skeletons() isn't clear to me.

Currently it has...

    def prune_all_skeletons(self) -> npt.NDArray:
        """
        Prune all skeletons.
        Returns
        -------
        npt.NDArray
            A single mask with all pruned skeletons.
        """
        pruned_skeleton_mask = np.zeros_like(self.skeleton)
        labeled_skel = morphology.label(self.skeleton)
        for i in range(1, labeled_skel.max() + 1):
            single_skeleton = np.where(labeled_skel == i, 1, 0)
            if self.max_length is not None:
                single_skeleton = self._prune_by_length(single_skeleton, max_length=self.max_length)
            if self.height_threshold is not None:
                single_skeleton = heightPruning(
                    self.img,
                    single_skeleton,
                    height_threshold=self.height_threshold,
                    method_values=self.method_values,
                    method_outlier=self.method_outlier,
                ).remove_bridges()
            # skeletonise to remove nibs
            pruned_skeleton_mask += getSkeleton(self.img, single_skeleton, method="zhang").get_skeleton()
        return pruned_skeleton_mask

Within the for loop there are two if statements.

  1. Prune single_skeleton if the self.max_length is not None. The value for this parameter set in the topostats/default_config.yaml of your branch is -1 (which triggers length based pruning based on 15% of grain length).
  2. Prune single_skeleton if the self.height_threshold is not None. The value for this parameter set in the topostats/default_config.yaml of your branch is None.

This sounds ok, but what if a configuration such as...

dnatracing:
  run: true # Options : true, false
  ...
  pruning_params:
    pruning_method: topostats
    max_length: -1
    height_threshold: 789
    method_values: mid
    method_outlier: mean_abs

...were used by a user? Its not clear to the user, based on the docstrings as they stand what would happen.

I can say looking at the code that only the height-based pruning would be undertaken since the results of pruning based on length are not passed into the heightPruning() class, but is this what is expected?

Are there cases where both length and height based pruning should be applied?

Background

I realised this when starting to remove the for: loop and refactor and write out paramterised tests to check the effects of the different combinations of options for max_length / height_threshold / method_values / method_outlier as ideally tests should test the effects of different options but then I realised the lack of clarity about what behaviour is intended or required.

@MaxGamill-Sheffield
Copy link
Collaborator Author

Hey @ns-rse ,

A1: Yep just old code that I didn't have time to refactor, and I think it was designed similarly to the skeletonisation code which did the same thing as that's what the skimage methods did.

A2: This should absolutely take both the parameters from the config and performing both types of pruning on the skeleton, but concurrently (obvs there may be some issue with which comes first but this is sensible to me as they address different problems). However, maybe I'm misunderstanding but both pruning types update single_skeleton and use single_skeleton so I thought this would be working as intended and both will run under your config?

Also unless there are catastrophic things wrong with the code (e.g. if I've misunderstood here and both pruning stages can't happen together) I'm not sure it's worth your time refactoring the code so we can make the most of your time with us.

@ns-rse
Copy link
Collaborator

ns-rse commented May 22, 2024

D'oh! 🤦 I clearly hadn't engaged my brain when I sat down and looked at the code yesterday as it will indeed apply both pruning types. I'm an idiot sometimes, thanks for pointing out my error so politely.

As I'm unlikely to look at this code again any time soon I will refactor after writing tests to remove the unnecessary loop, otherwise it will persist in the code base and need removing somewhen which may never happen. By writing a test first and then refactoring it will ensure I don't break anything.

Cheers,

@ns-rse

@ns-rse
Copy link
Collaborator

ns-rse commented Jul 2, 2024

@MaxGamill-Sheffield and I met to discuss moving this forward.

Currently Nodestats is called from within DNATracing and it returns one or more disentangled loops if there are junctions (if there are no junctions then a single loop is returned).

Tracing is then undertaken.

Currently the term "DNA Tracing" is a bit of a catch all for undertaking a lot of steps,

  • Skeletonisation (tracing.skeletonise.getSkeleton())
  • Pruning (tracing.skeletonise.pruneSkeleton())
  • Unordered traces
  • Nodestats (tracing.dnatracing.dnaTrace())
  • Ordering traces
  • Splinning
  • Curvature calculations (tracing.dnacurvature.Curvature())

The number of methods and lines of code is growing and growing making the dnaTrace class increasingly complex, although as noted above some of the functionality has been broken out into its own modules/classes and been refactored (thanks @MaxGamill-Sheffield and @llwiggins).

Currently we process scans via the processing.process_scan() function (called from run_topostats.runtopostats()) and it makes calls to...

  • processing.run_filters()
  • processing.run_grains()
  • processing.run_grainstats()
  • processing.run_dnatracing()

The processing.run_dnatracing() step undertakes a lot and we agreed that we could introduce a number of additional steps to the above...

  • processing.run_filters()
  • processing.run_grains()
  • processing.run_grainstats()
  • processing.run_nodestats()
  • processing.run_dnatracing()
  • processing.run_splinning()
  • processing.run_curvature()

Workflow

The following workflow shows how these sections hang together. This may be modified in future and we would benefit from expanding the level of detail of the steps taken by nodeStats in a similar vein to some of the existing documentation.

%%{init: {'theme': 'base', 'gitGraph': {'rotateCommitLabel': true}
         }
}%%
graph TD;

  A1([Filter]) --> A2
  A2([Grains]) --> A3
  A3([GrainStats]) --> A4
  A4([NodeStats]) --> A5
  A5([DNA Tracing]) --> A6
  A6([Splining])

  
  style A1 fill:#648FFF,stroke:#000000
  style A2 fill:#648FFF,stroke:#000000
  style A3 fill:#648FFF,stroke:#000000
  style A4 fill:#00FF90,stroke:#000000
  style A5 fill:#FE6100,stroke:#000000
  style A6 fill:#FEE100,stroke:#000000

  B1([NodeStats]) --> B2
  B2([Smooth Grains]) --> B3
  B3([Get Disordered Trace]) --> B4
  B4([NodeStats - Get Stats]) --> B5([OUT: Node Dict\nand compile csv stats:\nnum crossings\navg/min cross conf])

  style B1 fill:#00FF90,stroke:#000000
  style B2 fill:#00FF90,stroke:#000000
  style B3 fill:#00FF90,stroke:#000000
  style B4 fill:#00FF90,stroke:#000000
  style B5 fill:#00FF90,stroke:#000000


  C1([DNA Tracing]) --> C2
  C2([Nodestats - Compile Traces]) --> D1
  C1 --> C3([Traditional Tracing:])
  C3 --> C4([Linear or Circular])
  C4 --> C5([Get Ordered Traces])
  C5 --> D1

  style C1 fill:#FE6100,stroke:#000000
  style C2 fill:#FE6100,stroke:#000000
  style C3 fill:#FE6100,stroke:#000000
  style C4 fill:#FE6100,stroke:#000000
  style C5 fill:#FE6100,stroke:#000000


  D1([Linear or Circular]) --> D2
  D2([Get Ordered Trace Heights]) --> D3
  D3([Get Ordered Trace Distances])

  style D1 fill:#FE6100,stroke:#000000
  style D2 fill:#FE6100,stroke:#000000
  style D3 fill:#FE6100,stroke:#000000
  

  E1([Splining]) --> E2
  E2([xX Fitted Traces Xx]) --> E21
  E21([Get Splined Traces]) --> E3
  E3([Add to Image]) --> E4
  E4([Contour Lengths]) --> E5
  E5([End 2 End Distance])
  
  style E1 fill:#FEE100,stroke:#000000
  style E2 fill:#FEE100,stroke:#000000
  style E21 fill:#FEE100,stroke:#000000
  style E3 fill:#FEE100,stroke:#000000
  style E4 fill:#FEE100,stroke:#000000
  style E5 fill:#FEE100,stroke:#000000
Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants