2c) Running the Clustering Pipeline to Cluster a Single Subject from the Atlas

Fan Zhang edited this page Apr 21, 2019 · 15 revisions

The following steps assume you have a cluster atlas, i.e. a data-driven parcellation providing a model of common white matter structures in a population, created using whitematteranalysis. This atlas can be created using your own data for a study-specific parcellation, or you can use a pre-provided atlas.

The commands for running the clustering pipeline are listed below in bold text with a brief explanation of their use. To get more information on any command, run it with the --help flag as follows:

wm_NAME_OF_SCRIPT.py --help

The output file(s) will be created in the output directory you specify, which will be created for you if it does not exist.

Initial data quality control

This step checks that all subjects' tractography data was created the same way (same data stored), that the gradient directions in the input DWI files that produced the tractography were correct (verified by visual inspection of tract anatomy), and that the tractography dataset is ready for the next step. The command is also useful for visually and quantitatively inspecting any directory of tractography, such as the output from creating a cluster atlas.

wm_quality_control_tractography.py

  • First step in the pipeline to check tractography files for errors.
  • Outputs rendered images of each subject (each tractography file in input directory) and information about fiber length distributions and data fields stored along the tracts.
  • Input fiber tracts must be in vtkPolyData format, in either .vtk or .vtp files.
  • Example command to perform quality control on all files in the input_tractography directory:
wm_quality_control_tractography.py input_tractography/ qc_output/
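
Before running quality control, it can help to confirm that the input directory actually contains tractography in the accepted formats. The following is a minimal Python sketch and is not part of whitematteranalysis: the helper names `find_tractography_files` and `quality_control_command` are hypothetical, and the command line simply mirrors the example above.

```python
from pathlib import Path

# File extensions accepted for vtkPolyData tractography, per the list above.
TRACT_EXTENSIONS = {".vtk", ".vtp"}


def find_tractography_files(input_dir):
    """Return the .vtk/.vtp files directly inside input_dir, sorted by path."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in TRACT_EXTENSIONS
    )


def quality_control_command(input_dir, output_dir):
    """Build the argument list for the quality control example above.

    Raises ValueError when the input directory holds no tractography files.
    """
    if not find_tractography_files(input_dir):
        raise ValueError(f"no .vtk or .vtp files found in {input_dir}")
    return ["wm_quality_control_tractography.py", str(input_dir), str(output_dir)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once whitematteranalysis is installed.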

Commands to apply the atlas to a new subject

In this step, a subject is first registered to the atlas. Note that if the subject was already registered as part of a group (if all subjects were used in a study-specific atlas) then it does not need to be registered again. After ensuring the subject data is in the atlas coordinate system, then all input fibers from the subject are clustered according to the atlas.

wm_register_to_atlas_new.py

  • This code registers a single subject to the registration_atlas.vtk that was created by multisubject registration.
  • This should be run before clustering from the atlas (unless you want the output in subject space; in that case, see the -reg parameter below).
  • This command can run tractography registration with options of different modes, including: 'rigid_affine_fast', 'affine' and 'nonrigid' (b-spline).
  • For tractography data that has a relatively similar shape to the atlas, e.g., tractography computed from a healthy adult dataset, a two-step registration of 'affine+nonrigid' is recommended: first run affine, then run nonrigid with the affine output as input.
  • For tractography data whose shape differs substantially from the atlas, e.g., data from a brain tumor patient or a very young child, a single-step registration of 'rigid_affine_fast' is recommended because this mode is more robust to such shape differences. For example, tumor and edema can change the shape of local fiber tracts in a brain tumor patient dataset.
wm_register_to_atlas_new.py -l 40 -mode affine path_to_input_vtk groupwise_registration_directory/registration_atlas.vtk registered_subject_output/
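
The choice between the single-step and two-step registration described above can be sketched in Python as follows. This is an illustration under stated assumptions, not part of whitematteranalysis: the function names, the `affine` and `nonrigid` subdirectory names, and the reuse of `-l 40` for every mode are hypothetical choices. Because this page does not state the file name of the registered tractography written by the affine step, the caller must supply it as `affine_result_vtk`.

```python
from pathlib import Path

SCRIPT = "wm_register_to_atlas_new.py"


def register_command(input_vtk, atlas_vtk, output_dir, mode, min_length_mm=40):
    """Argument list for one registration run, mirroring the example above."""
    return [SCRIPT, "-l", str(min_length_mm), "-mode", mode,
            str(input_vtk), str(atlas_vtk), str(output_dir)]


def registration_plan(input_vtk, atlas_vtk, output_root, similar_to_atlas,
                      affine_result_vtk=None):
    """Return the registration commands to run, in order.

    similar_to_atlas=True  -> 'affine' followed by 'nonrigid' (two-step).
    similar_to_atlas=False -> a single 'rigid_affine_fast' run.
    """
    if not similar_to_atlas:
        return [register_command(input_vtk, atlas_vtk, output_root,
                                 "rigid_affine_fast")]
    if affine_result_vtk is None:
        raise ValueError("two-step registration needs the affine output path")
    root = Path(output_root)
    return [
        # Step 1: affine registration of the original tractography.
        register_command(input_vtk, atlas_vtk, root / "affine", "affine"),
        # Step 2: nonrigid registration, using the affine output as input.
        register_command(affine_result_vtk, atlas_vtk, root / "nonrigid",
                         "nonrigid"),
    ]
```

Each command in the returned list can be run in order, for example with `subprocess.run(cmd, check=True)`.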

wm_cluster_from_atlas.py

  • Clusters tractography for single cases according to the cluster atlas.
  • This code can (optionally) register the atlas to the subject space (-reg). [currently not supported]
  • This step should be run for each subject in the full dataset to be analyzed.
  • By default this command will cluster ALL input fibers. Analyzing ALL fibers is necessary to do a study. For a quick test, it's possible to use the -f (number of fibers) parameter to cluster a subset of the data. Otherwise avoid the -f parameter.
  • The default minimum fiber length (-l) to be clustered is 40mm. Values in the range of 40-60mm are reasonable for adult brain data. Lower values will include more fibers in shorter tracts like the uncinate and can be more appropriate for brains smaller than adult human size.
  • The input 'atlas_folder' is the folder where the atlas.vtp and atlas.p files are.
wm_cluster_from_atlas.py -l 40 registered_subject_output/subject.vtk atlas_folder/ subject_cluster_output/
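
Because this step is run once per subject, a small helper can generate the commands for a whole dataset. The sketch below is a hedged illustration, not part of whitematteranalysis: the function names and the per-subject output layout (`output_root/<file name without extension>`) are assumptions. Only the `-l` and `-f` options described above are used.

```python
from pathlib import Path

# Files the atlas folder is expected to contain, per the list above.
REQUIRED_ATLAS_FILES = ("atlas.vtp", "atlas.p")


def missing_atlas_files(atlas_folder):
    """Return the required atlas file names that are absent from atlas_folder."""
    folder = Path(atlas_folder)
    return [name for name in REQUIRED_ATLAS_FILES if not (folder / name).is_file()]


def cluster_commands(subject_vtks, atlas_folder, output_root,
                     min_length_mm=40, quick_test_fibers=None):
    """Build one wm_cluster_from_atlas.py command per registered subject file.

    Leave quick_test_fibers as None for a real study so that all fibers are
    clustered; set it only for a quick test on a subset (-f).
    """
    commands = []
    for vtk in subject_vtks:
        cmd = ["wm_cluster_from_atlas.py", "-l", str(min_length_mm)]
        if quick_test_fibers is not None:
            cmd += ["-f", str(quick_test_fibers)]
        # Hypothetical layout: one output directory per subject.
        subject_out = Path(output_root) / Path(vtk).stem
        cmd += [str(vtk), str(atlas_folder), str(subject_out)]
        commands.append(cmd)
    return commands
```

Checking `missing_atlas_files` first avoids starting a long run against an incomplete atlas folder.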

wm_cluster_remove_outliers.py

  • Removes outliers in a subject dataset that was clustered from a cluster atlas.
  • This script uses the atlas to identify and remove outliers in each cluster of the subject. The atlas must be the same one used to cluster the subject dataset.
  • This is different from the outlier removal process during the atlas generation that is done automatically by wm_cluster_atlas.py.
  • The default setting for outlier removal is 2 standard deviations from the atlas cluster.
wm_cluster_remove_outliers.py subject_cluster_output/ atlas_folder/ subject_cluster_outlier_removed_output/

wm_separate_clusters_by_hemisphere.py (in atlas space)

  • The above steps perform bilateral clustering, which segments fibers in both hemispheres simultaneously. If separate visualization or measurement of each hemisphere is needed, run this script to separate each cluster into left/right/commissural tracts according to the percentage of each fiber that lies in each hemisphere.
  • The output is three directories of fiber bundles according to left hemisphere, right hemisphere, and commissural tracts. Also copies any Slicer MRML scene file that is given as input into the three separate directories.
  • This command should be run on clusters that are in the atlas coordinate system (not subject space).
  • The '-pthresh' parameter (default 0.6) decides whether a cluster is a commissural tract or a hemispheric tract. It can be skipped when using a pre-provided atlas, because such an atlas includes a cluster location file that defines the commissural and hemispheric clusters. In this situation, use '-clusterLocationFile' to specify the path of the location file.
  • The '-labelInputClusterOnly' parameter can be specified to skip outputting the separated fiber clusters in the atlas space. The fiber location information is still saved in the input fiber cluster file and can be used later for hemisphere separation. Usually, that hemisphere separation is done after transforming the fiber clusters in the atlas space back to the original DWI space (see the next step).
wm_separate_clusters_by_hemisphere.py -pthresh 0.6 -atlasMRML atlas_folder/clustered_tracts_display_100_percent.mrml subject_cluster_outlier_removed_output/ subject_cluster_separated_output/

or 

wm_separate_clusters_by_hemisphere.py -clusterLocationFile atlas_folder/cluster_hemisphere_location.txt -atlasMRML atlas_folder/clustered_tracts_display_100_percent.mrml subject_cluster_outlier_removed_output/ subject_cluster_separated_output/
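
The two alternative invocations above differ only in how the hemisphere decision is supplied. A minimal Python sketch of that choice follows; `hemisphere_command` is a hypothetical helper, not part of whitematteranalysis, and it uses only the options shown in the two examples.

```python
def hemisphere_command(input_dir, output_dir, atlas_mrml,
                       cluster_location_file=None, pthresh=0.6):
    """Build the hemisphere separation command.

    If a cluster location file is available (pre-provided atlas), it is used
    and pthresh is skipped; otherwise the -pthresh value is passed.
    """
    cmd = ["wm_separate_clusters_by_hemisphere.py"]
    if cluster_location_file is not None:
        cmd += ["-clusterLocationFile", str(cluster_location_file)]
    else:
        cmd += ["-pthresh", str(pthresh)]
    cmd += ["-atlasMRML", str(atlas_mrml), str(input_dir), str(output_dir)]
    return cmd
```

With a pre-provided atlas, pass the path of its location file as `cluster_location_file`; for a study-specific atlas, leave it as `None` so that the threshold is used.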

wm_harden_transform.py

  • This code transforms the fiber clusters in the atlas space back to the DWI space.
  • This code applies the inverse transformation matrix (a .tfm file) computed in the above tractography registration step to the fiber cluster files.
  • 3D Slicer is needed to do the transformation.
  • If a two-step registration of 'affine+nonrigid' was used, a two-step transformation is needed: first an inverse-nonrigid transform (using the tfm file in the nonrigid registration result folder), then an inverse-affine transform (using the tfm file in the affine registration result folder) with the inverse-nonrigid output as input.
  • Make sure '-i' (inverse transform) is specified in the command.
  • This code requires an X display because it starts 3D Slicer to perform the transformation. When running in an environment without X display support (e.g., a high-performance computing cluster), add 'xvfb-run' before the command below to run it in a virtual X server environment.
wm_harden_transform.py -i -t registered_subject_output/output_tractography/transformation_file.tfm subject_cluster_outlier_removed_output transformed_cluster_output path_to_Slicer
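
When 'affine+nonrigid' registration was used, the inverse transforms are applied in the reverse order of registration. The Python sketch below illustrates that ordering and the optional 'xvfb-run' prefix. It is not part of whitematteranalysis: the function names and the intermediate directory are hypothetical, and the .tfm paths must be taken from your own registration output folders.

```python
def harden_command(tfm_file, input_dir, output_dir, slicer_path, headless=False):
    """Build one inverse-transform command, mirroring the example above."""
    cmd = ["wm_harden_transform.py", "-i", "-t", str(tfm_file),
           str(input_dir), str(output_dir), str(slicer_path)]
    # Without an X display, run inside a virtual X server.
    return ["xvfb-run"] + cmd if headless else cmd


def inverse_two_step_commands(cluster_dir, nonrigid_tfm, affine_tfm,
                              intermediate_dir, output_dir, slicer_path,
                              headless=False):
    """Return the two commands that undo an 'affine+nonrigid' registration."""
    return [
        # Step 1: undo the nonrigid registration.
        harden_command(nonrigid_tfm, cluster_dir, intermediate_dir,
                       slicer_path, headless),
        # Step 2: undo the affine registration, starting from the step 1 output.
        harden_command(affine_tfm, intermediate_dir, output_dir,
                       slicer_path, headless),
    ]
```

For a single-step registration, `harden_command` alone with the one .tfm file is sufficient.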

wm_separate_clusters_by_hemisphere.py (in subject DWI space)

  • This step runs the same script as the hemisphere separation in the atlas space, but now it separates the fiber clusters that have been transformed back to the subject DWI space.
  • Make sure that the hemisphere separation in the atlas space has been performed. (Fiber hemisphere location needs to be decided in the atlas space.)
wm_separate_clusters_by_hemisphere.py -atlasMRML atlas_folder/clustered_tracts_display_100_percent.mrml transformed_cluster_output/ subject_cluster_separated_output/

wm_append_clusters.py

  • This code appends multiple fiber clusters into a larger fiber tract.
  • This is normally performed when certain fiber clusters are known to belong to the same white matter structure.
  • If using a pre-provided anatomically curated atlas, the '-tractMRML' parameter can be used to specify the path to an MRML file. The code will then look up all clusters defined in the MRML file and append these clusters together.
  • To manually specify which clusters to append, use the '-clusterList' parameter.
wm_append_clusters.py input_fiber_cluster_folder output_folder -appendedTractName T_AF -tractMRML atlas_folder/T_AF.mrml

or 

wm_append_clusters.py input_fiber_cluster_folder output_folder -appendedTractName T_appended -clusterList 1,4,6,345
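
The two invocations above can be generated from one helper. The following Python sketch is illustrative only: `append_command` is a hypothetical name, and treating '-tractMRML' and '-clusterList' as mutually exclusive is an assumption based on how they are presented here as alternatives.

```python
def append_command(input_dir, output_dir, tract_name,
                   tract_mrml=None, cluster_ids=None):
    """Build a wm_append_clusters.py command from an MRML file or a cluster list."""
    if (tract_mrml is None) == (cluster_ids is None):
        raise ValueError("give exactly one of tract_mrml or cluster_ids")
    cmd = ["wm_append_clusters.py", str(input_dir), str(output_dir),
           "-appendedTractName", tract_name]
    if tract_mrml is not None:
        cmd += ["-tractMRML", str(tract_mrml)]
    else:
        ids = [str(int(c)) for c in cluster_ids]
        if not ids:
            raise ValueError("cluster_ids must not be empty")
        # Comma-separated without spaces, as in the example above.
        cmd += ["-clusterList", ",".join(ids)]
    return cmd
```

For example, `append_command("input_fiber_cluster_folder", "output_folder", "T_appended", cluster_ids=[1, 4, 6, 345])` reproduces the second command above as an argument list.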

Commands to do fiber clustering result quality control

wm_quality_control_tractography.py

  • This is the same quality control script as in the initial data quality control.
  • Randomly choose several (two or more or all) subjects' fiber clustering results and run this script to check if the obtained subject-specific clusters are visually similar to the atlas clusters. This is a sanity check to ensure your study is run correctly.

wm_quality_control_after_clustering.py

  • This step does quality control of fiber clustering results across multiple subjects.
  • This checks that, given the obtained atlas, the total numbers of fiber clusters across subjects are the same.
  • Make sure that the fiber clustering results from all subjects are stored in one folder (directory), with each subdirectory corresponding to one subject.
wm_quality_control_after_clustering.py directory_of_fiber_clusters_from_all_subjects/ qc_output/
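
A quick way to see the kind of consistency this step is concerned with is to count the cluster files in each subject's subdirectory. The Python sketch below is a rough illustration, not the script's implementation: the function names are hypothetical, and it assumes each cluster is stored as one .vtp or .vtk file directly inside the subject's subdirectory.

```python
from collections import Counter
from pathlib import Path


def cluster_counts(all_subjects_dir, extensions=(".vtp", ".vtk")):
    """Map each subject subdirectory name to its number of cluster files."""
    counts = {}
    for subject_dir in sorted(Path(all_subjects_dir).iterdir()):
        if subject_dir.is_dir():
            counts[subject_dir.name] = sum(
                1 for p in subject_dir.iterdir()
                if p.is_file() and p.suffix.lower() in extensions
            )
    return counts


def subjects_with_unexpected_count(counts):
    """Return the subjects whose count differs from the most common count."""
    if not counts:
        return []
    expected, _ = Counter(counts.values()).most_common(1)[0]
    return sorted(name for name, n in counts.items() if n != expected)
```

An empty list from `subjects_with_unexpected_count` suggests that every subject has the same number of cluster files; any names returned are worth inspecting before further analysis.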

Please see instructions for data analysis and visualization here:

https://github.com/SlicerDMRI/whitematteranalysis/wiki/3)-Visualization-of-Clustered-Tracts

https://github.com/SlicerDMRI/whitematteranalysis/wiki/4)-Measurement-from-Clustered-Tracts

Help

  • Test input data to try running the commands can be found within the test directory:

whitematteranalysis/test/test_data

  • The source code of the clustering commands can be found in the bin directory of whitematteranalysis:

whitematteranalysis/bin/

Thank you to Julie Marie Stamm for her help writing these instructions.