Commit
Updating hdbscan-clustering plugin (#498)
* fix merge conflicts
* fix apply manifest
* fix apply manifest
* remove file
* updated hdbscan-clustering-plugin
* fix bug in tests
* fixed random generation of floats
* fixed docker file and shell script for running docker
* fixed docker files
* renamed plugin and fixed merged conflicts
* fixed docker files
1 parent 014f4dd, commit a6bfd1d
Showing 17 changed files with 804 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
[bumpversion] | ||
current_version = 0.4.8-dev0 | ||
commit = True | ||
tag = False | ||
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\-(?P<release>[a-z]+)(?P<dev>\d+))? | ||
serialize = | ||
{major}.{minor}.{patch}-{release}{dev} | ||
{major}.{minor}.{patch} | ||
|
||
[bumpversion:part:release] | ||
optional_value = _ | ||
first_value = dev | ||
values = | ||
dev | ||
_ | ||
|
||
[bumpversion:part:dev] | ||
|
||
[bumpversion:file:pyproject.toml] | ||
search = version = "{current_version}" | ||
replace = version = "{new_version}" | ||
|
||
[bumpversion:file:plugin.json] | ||
|
||
[bumpversion:file:VERSION] | ||
|
||
[bumpversion:file:src/polus/images/clustering/hdbscan_clustering/__init__.py] |
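The `parse` expression in the configuration above splits a version string into named parts. A minimal sketch of that decomposition follows, assuming the pattern is applied to the whole string; `split_version` is a hypothetical helper for illustration and is not part of bump2version or this commit.

```python
import re

# The `parse` pattern copied from the [bumpversion] section above.
PARSE = re.compile(
    r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(\-(?P<release>[a-z]+)(?P<dev>\d+))?"
)


def split_version(version: str) -> dict:
    """Return the named parts of a version string (hypothetical helper)."""
    match = PARSE.fullmatch(version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return match.groupdict()


# A development version carries the optional release and dev parts.
dev_parts = split_version("0.4.8-dev0")
# A final version leaves both optional parts unset (None).
final_parts = split_version("0.4.8")
print(dev_parts)
print(final_parts)
```

The two `serialize` templates correspond to these two cases: the longer one is used when the release and dev parts are present, the shorter one when they are not.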
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
# Jupyter Notebook | ||
.ipynb_checkpoints | ||
poetry.lock | ||
../../poetry.lock | ||
# Environments | ||
.env | ||
.myenv | ||
.venv | ||
env/ | ||
venv/ | ||
# test data directory | ||
data | ||
# yaml file | ||
.pre-commit-config.yaml | ||
# hidden files | ||
.DS_Store | ||
.ds_store | ||
# flake8 | ||
.flake8 | ||
../../.flake8 | ||
__pycache__ | ||
.mypy_cache | ||
requirements.txt |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
FROM polusai/bfio:2.3.6 | ||
|
||
# environment variables defined in polusai/bfio | ||
ENV EXEC_DIR="/opt/executables" | ||
ENV POLUS_LOG="INFO" | ||
ENV POLUS_IMG_EXT=".ome.tif" | ||
ENV POLUS_TAB_EXT=".csv" | ||
|
||
# Work directory defined in the base container | ||
WORKDIR ${EXEC_DIR} | ||
|
||
COPY pyproject.toml ${EXEC_DIR} | ||
COPY VERSION ${EXEC_DIR} | ||
COPY README.md ${EXEC_DIR} | ||
COPY src ${EXEC_DIR}/src | ||
|
||
RUN pip3 install ${EXEC_DIR} --no-cache-dir | ||
|
||
|
||
ENTRYPOINT ["python3", "-m", "polus.images.clustering.hdbscan_clustering"] | ||
CMD ["--help"] |
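The Dockerfile above pairs an exec-form `ENTRYPOINT` with a default `CMD`. As a rough model of how Docker combines the two, the sketch below shows that arguments passed to `docker run` replace `CMD` while the entry point stays fixed; `effective_command` is a hypothetical name used only for illustration.

```python
# Values copied from the Dockerfile above.
ENTRYPOINT = ["python3", "-m", "polus.images.clustering.hdbscan_clustering"]
CMD = ["--help"]


def effective_command(run_args: list[str]) -> list[str]:
    """Model how Docker joins an exec-form ENTRYPOINT with CMD or run arguments."""
    # Arguments supplied after the image name replace CMD entirely;
    # with no arguments, the default CMD is appended instead.
    return ENTRYPOINT + (run_args if run_args else CMD)


# No arguments: the container prints the plugin's help text.
print(effective_command([]))
# With arguments: they are handed to the plugin's command line.
print(effective_command(["--inpDir", "/data/input"]))
```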
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
# Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) Clustering (v0.4.8-dev0) | ||
|
||
The HDBSCAN Clustering plugin clusters data using the [HDBSCAN clustering](https://pypi.org/project/hdbscan/) library. The input and output of this plugin are CSV files. Each observation (row) in the input CSV file is assigned to one of the clusters. The output CSV file contains a `cluster` column that identifies the cluster to which each observation belongs. A user can supply a regular expression with capture groups to cluster each group independently, or to average the numerical features across each group and treat each group as a single observation. | ||
|
||
## Inputs: | ||
|
||
### Input directory: | ||
This plugin supports all [vaex](https://vaex.readthedocs.io/en/latest/guides/io.html)-supported file formats. | ||
|
||
### Filename pattern: | ||
This plugin uses the [filepattern](https://filepattern2.readthedocs.io/en/latest/Home.html) Python library to parse the file names of the tabular files to be processed. | ||
|
||
### Grouping pattern: | ||
The input for this parameter is a regular expression with a capture group. It splits the data into groups based on the matched pattern. A new column, `group`, is created in the output file and holds the group derived from the pattern. Unless `averageGroups` is set to `true`, providing a grouping pattern clusters each group independently. | ||
|
||
### Average groups: | ||
When set to `true`, this option uses `groupingPattern` to average the numerical features and produce a single row per group, which is then clustered. The resulting cluster is assigned to all observations belonging to that group. | ||
|
||
### Label column: | ||
This is the name of the column containing the labels to be used with `groupingPattern`. | ||
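
As an illustration of how a grouping pattern, a label column, and group averaging fit together, here is a minimal sketch using only the standard library. It is not the plugin's implementation: the sample rows, the pattern `-(\w+)$`, and the helpers `group_key` and `average_groups` are all assumptions made for this example.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical rows; "species" plays the role of the label column.
rows = [
    {"species": "plate1-setosa", "length": 5.0, "width": 3.0},
    {"species": "plate2-setosa", "length": 5.5, "width": 3.5},
    {"species": "plate1-virginica", "length": 6.0, "width": 3.0},
]


def group_key(label: str, pattern: str) -> str:
    """Return the first capture group of `pattern` found in `label`."""
    match = re.search(pattern, label)
    if match is None:
        raise ValueError(f"label {label!r} does not match {pattern!r}")
    return match.group(1)


def average_groups(rows, label_col, pattern, feature_cols):
    """Average the numerical features of all rows that share a group key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[group_key(row[label_col], pattern)].append(row)
    return {
        key: {col: mean(r[col] for r in members) for col in feature_cols}
        for key, members in grouped.items()
    }


averaged = average_groups(rows, "species", r"-(\w+)$", ["length", "width"])
print(averaged)
```

Each averaged row would then be clustered as a single observation, and the resulting cluster ID copied back to every row of its group.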
|
||
### Minimum cluster size: | ||
This parameter defines the smallest number of points that should be considered a cluster. This is a required parameter. The value must be an integer greater than 1. | ||
|
||
### Increment outlier ID: | ||
When enabled, this parameter sets the ID of the outlier cluster to `1`; otherwise the outlier ID is `0`. This is useful for visualization when the resulting cluster IDs are turned into image annotations. | ||
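
The hdbscan library labels noise points as `-1` and clusters as `0, 1, 2, ...`. One way to obtain the outlier IDs described above is to shift every label by a constant offset. The sketch below assumes that approach; `shift_labels` is a hypothetical helper and may differ from what the plugin actually does.

```python
def shift_labels(raw_labels: list[int], increment_outlier_id: bool) -> list[int]:
    """Shift hdbscan-style labels so that outliers (-1) become 0 or 1.

    Assumption: all labels move by the same offset, so cluster IDs stay
    distinct from the outlier ID.
    """
    offset = 2 if increment_outlier_id else 1
    return [label + offset for label in raw_labels]


# -1 marks an outlier in hdbscan's output; 0 and 1 are two clusters.
raw = [-1, 0, 0, 1, -1]
default_ids = shift_labels(raw, increment_outlier_id=False)
incremented_ids = shift_labels(raw, increment_outlier_id=True)
print(default_ids)
print(incremented_ids)
```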
|
||
## Output: | ||
The output is a tabular file containing the clustered data. | ||
|
||
## Building | ||
To build the Docker image for this plugin, run | ||
`./build-docker.sh`. | ||
|
||
## Install WIPP Plugin | ||
If WIPP is running, navigate to the plugins page and add a new plugin. Paste the contents of `plugin.json` into the pop-up window and submit. | ||
For more information on WIPP, visit the [official WIPP page](https://isg.nist.gov/deepzoomweb/software/wipp). | ||
|
||
## Options | ||
|
||
This plugin takes the following arguments: | ||
|
||
| Name | Description | I/O | Type | | ||
| ---------------------- | ---------------------------------------------------------------------------------------------- | ------ | ------------- | | ||
| `--inpDir` | Input tabular data files. | Input | genericData | | ||
| `--filePattern` | Filename pattern used to separate data. | Input | string | | ||
| `--groupingPattern` | Regular expression to group rows. Clustering will be applied across capture groups by default. | Input | string | | ||
| `--averageGroups` | Average data across groups. Requires capture groups | Input | boolean | | ||
| `--labelCol` | Name of the column containing labels for grouping pattern. | Input | string | | ||
| `--minClusterSize` | Minimum cluster size. | Input | number | | ||
| `--incrementOutlierId` | Increments outlier ID to 1. | Input | boolean | | ||
| `--outDir` | Output collection | Output | genericData | | ||
| `--preview` | Generate a JSON file with outputs | Output | JSON | |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
0.4.8-dev0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
#!/bin/bash | ||
|
||
version=$(<VERSION) | ||
docker build . -t polusai/hdbscan-clustering-tool:${version} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
# This script is designed to help package a new version of a plugin | ||
|
||
# Get the new version | ||
version=$(<VERSION) | ||
|
||
# Bump the version | ||
bump2version --config-file bumpversion.cfg --new-version ${version} --allow-dirty part | ||
|
||
# Build the container | ||
./build-docker.sh | ||
|
||
# Push to dockerhub | ||
docker push polusai/hdbscan-clustering-tool:${version} | ||
|
||
# Run pytests | ||
python -m pytest -s tests |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,123 @@ | ||
{ | ||
"name": "Hdbscan Clustering", | ||
"version": "0.4.8-dev0", | ||
"title": "Hdbscan Clustering", | ||
"description": "Cluster the data using HDBSCAN.", | ||
"author": "Jayapriya Nagarajan (github.com/Priyaaxle), Hythem Sidky (hythem.sidky@nih.gov) and Hamdah Shafqat Abbasi (hamdahshafqat.abbasi@nih.gov)", | ||
"institution": "National Center for Advancing Translational Sciences, National Institutes of Health", | ||
"repository": "https://github.com/PolusAI/image-tools", | ||
"website": "https://ncats.nih.gov/preclinical/core/informatics", | ||
"citation": "", | ||
"containerId": "polusai/hdbscan-clustering-tool:0.4.8-dev0", | ||
"baseCommand": [ | ||
"python3", | ||
"-m", | ||
"polus.images.clustering.hdbscan_clustering" | ||
], | ||
"inputs": { | ||
"inpDir": { | ||
"type": "genericData", | ||
"title": "Input tabular data", | ||
"description": "Input tabular data.", | ||
"required": "True" | ||
}, | ||
"filePattern": { | ||
"type": "string", | ||
"title": "Filename pattern", | ||
"description": "Filename pattern used to separate data.", | ||
"required": "False" | ||
}, | ||
"groupingPattern": { | ||
"type": "string", | ||
"title": "Grouping pattern", | ||
"description": "Regular expression for optional row grouping.", | ||
"required": "False" | ||
}, | ||
"averageGroups": { | ||
"type": "boolean", | ||
"title": "Average groups", | ||
"description": "Whether to average data across groups. Requires grouping pattern to be defined.", | ||
"required": "False" | ||
}, | ||
"labelCol": { | ||
"type": "string", | ||
"title": "Label Column", | ||
"description": "Name of column containing labels. Required for grouping pattern.", | ||
"required": "False" | ||
}, | ||
"minClusterSize": { | ||
"type": "number", | ||
"title": "Minimum cluster size", | ||
"description": "Minimum cluster size.", | ||
"required": "True" | ||
}, | ||
"incrementOutlierId": { | ||
"type": "number", | ||
"title": "Increment Outlier ID", | ||
"description": "Increments outlier ID to 1.", | ||
"required": "True" | ||
}, | ||
"preview": { | ||
"type": "boolean", | ||
"title": "Preview", | ||
"description": "Generate an output preview.", | ||
"required": "False" | ||
} | ||
}, | ||
"outputs": { | ||
"outDir": { | ||
"type": "genericData", | ||
"description": "Output collection." | ||
} | ||
}, | ||
"ui": { | ||
"inpDir": { | ||
"type": "genericData", | ||
"title": "Input tabular data", | ||
"description": "Input tabular data to be processed by this plugin.", | ||
"required": "True" | ||
}, | ||
"filePattern": { | ||
"type": "string", | ||
"title": "Filename pattern", | ||
"description": "Filename pattern used to separate data.", | ||
"required": "False" | ||
}, | ||
"groupingPattern": { | ||
"type": "string", | ||
"title": "Grouping pattern", | ||
"description": "Regular expression for optional row grouping.", | ||
"required": "False" | ||
}, | ||
"averageGroups": { | ||
"type": "boolean", | ||
"title": "Average groups", | ||
"description": "Whether to average data across groups. Requires grouping pattern to be defined.", | ||
"required": "False" | ||
}, | ||
"labelCol": { | ||
"type": "string", | ||
"title": "Label Column", | ||
"description": "Name of column containing labels. Required for grouping pattern.", | ||
"required": "False" | ||
}, | ||
"minClusterSize": { | ||
"type": "number", | ||
"title": "Minimum cluster size", | ||
"description": "Minimum cluster size.", | ||
"required": "True" | ||
}, | ||
"incrementOutlierId": { | ||
"type": "number", | ||
"title": "Increment Outlier ID", | ||
"description": "Increments outlier ID to 1.", | ||
"required": "True" | ||
}, | ||
"preview": { | ||
"type": "boolean", | ||
"title": "Preview", | ||
"description": "Generate an output preview.", | ||
"required": "False" | ||
} | ||
} | ||
} |
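The manifest above repeats the version in both `version` and the `containerId` tag, and both have to agree with the `VERSION` file. A small consistency check could look like the sketch below; the embedded JSON is a trimmed copy of the manifest, and `version_mismatches` is a hypothetical helper, not something this commit provides.

```python
import json

# Trimmed copy of the manifest above, keeping only the fields being checked.
MANIFEST = json.loads(
    """
    {
      "version": "0.4.8-dev0",
      "containerId": "polusai/hdbscan-clustering-tool:0.4.8-dev0"
    }
    """
)


def version_mismatches(manifest: dict, expected: str) -> list[str]:
    """List the manifest fields whose version differs from `expected`."""
    problems = []
    if manifest["version"] != expected:
        problems.append(f"version is {manifest['version']!r}")
    # The image tag is whatever follows the last colon of containerId.
    tag = manifest["containerId"].rpartition(":")[2]
    if tag != expected:
        problems.append(f"containerId tag is {tag!r}")
    return problems


# `expected` would normally be read from the VERSION file.
print(version_mismatches(MANIFEST, "0.4.8-dev0"))
print(version_mismatches(MANIFEST, "0.4.9"))
```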
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
[tool.poetry] | ||
name = "polus-images-clustering-hdbscan-clustering" | ||
version = "0.4.8-dev0" | ||
description = "Cluster the data using HDBSCAN." | ||
authors = [ | ||
"Jayapriya Nagarajan <jayapriya.nagarajan@axleinfo.com>", | ||
"Hythem Sidky <hythem.sidky@nih.gov>", | ||
"Hamdah Shafqat abbasi <hamdahshafqat.abbasi@nih.gov>" | ||
] | ||
readme = "README.md" | ||
packages = [{include = "polus", from = "src"}] | ||
|
||
[tool.poetry.dependencies] | ||
python = ">=3.9,<3.12" | ||
filepattern = "^2.0.4" | ||
typer = "^0.7.0" | ||
tqdm = "^4.64.1" | ||
preadator = "0.4.0.dev2" | ||
vaex = "^4.17.0" | ||
hdbscan = "^0.8.34rc1" | ||
|
||
|
||
[tool.poetry.group.dev.dependencies] | ||
pre-commit = "^3.3.3" | ||
bump2version = "^1.0.1" | ||
pytest = "^7.3.2" | ||
pytest-xdist = "^3.3.1" | ||
pytest-sugar = "^0.9.7" | ||
|
||
[build-system] | ||
requires = ["poetry-core"] | ||
build-backend = "poetry.core.masonry.api" |
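Most dependencies above use Poetry's caret constraint, which allows updates that leave the left-most non-zero version component unchanged. The sketch below computes the resulting range for plain three-part versions only; `caret_bounds` is a hypothetical helper, and pre-release strings such as `0.8.34rc1` are deliberately out of scope.

```python
def caret_bounds(version: str) -> tuple[str, str]:
    """Return (inclusive lower, exclusive upper) for a caret constraint.

    Handles only `major.minor.patch` with numeric parts.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return version, ".".join(str(number) for number in upper)


# filepattern = "^2.0.4" allows any 2.x release from 2.0.4 onward.
print(caret_bounds("2.0.4"))
# typer = "^0.7.0" is narrower because the major version is zero.
print(caret_bounds("0.7.0"))
```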
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
#!/bin/bash | ||
|
||
version=$(<VERSION) | ||
datapath=$(readlink --canonicalize data) | ||
echo ${datapath} | ||
|
||
# Inputs | ||
inpDir=${datapath}/input | ||
filePattern=".*.csv" | ||
groupingPattern="\w+$" | ||
labelCol="species" | ||
minClusterSize=3 | ||
outDir=${datapath}/output | ||
|
||
docker run -v "${datapath}:${datapath}" \ | ||
polusai/hdbscan-clustering-tool:${version} \ | ||
--inpDir "${inpDir}" \ | ||
--filePattern "${filePattern}" \ | ||
--groupingPattern "${groupingPattern}" \ | ||
--labelCol "${labelCol}" \ | ||
--minClusterSize "${minClusterSize}" \ | ||
--incrementOutlierId \ | ||
--outDir "${outDir}" |
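For readers who prefer to launch the container from Python, the same invocation can be assembled as an argument list, which avoids shell quoting problems with patterns such as `\w+$`. This is a sketch under assumptions: `docker_run_argv` is a hypothetical helper, the image name follows `build-docker.sh`, and the data path is a placeholder.

```python
import shlex


def docker_run_argv(datapath: str, version: str) -> list[str]:
    """Build the docker argument list used by the run script above."""
    return [
        "docker", "run", "-v", f"{datapath}:{datapath}",
        f"polusai/hdbscan-clustering-tool:{version}",
        "--inpDir", f"{datapath}/input",
        "--filePattern", ".*.csv",
        "--groupingPattern", r"\w+$",
        "--labelCol", "species",
        "--minClusterSize", "3",
        "--incrementOutlierId",
        "--outDir", f"{datapath}/output",
    ]


argv = docker_run_argv("/tmp/data", "0.4.8-dev0")
# shlex.join produces a safely quoted command line for logging.
command_line = shlex.join(argv)
print(command_line)
```

Passing `argv` to `subprocess.run(argv, check=True)` would start the container without going through a shell.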
4 changes: 4 additions & 0 deletions
...tering/hdbscan-clustering-tool/src/polus/images/clustering/hdbscan_clustering/__init__.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
"""Hdbscan Clustering Plugin.""" | ||
|
||
__version__ = "0.4.8-dev0" | ||
from . import hdbscan_clustering |