tcrdist draft version implemented #502

Merged on Apr 21, 2024
33 commits
44f8731  tcrdist draft version implemented (felixpetschko, Apr 3, 2024)
d6807a3  Merge branch 'main' into tcrdist (grst, Apr 3, 2024)
3081139  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 3, 2024)
3b96e8a  tcrdist tests added (felixpetschko, Apr 8, 2024)
d2c606d  fixed ir_dist _get_distance_calculator parameter handling (felixpetschko, Apr 8, 2024)
1d223ec  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 8, 2024)
4ba2c8d  handling of empty input sequences fixed (felixpetschko, Apr 10, 2024)
cf90904  additional tests for tcrdist added (felixpetschko, Apr 10, 2024)
02a667a  tcrdist test with comparison against reference implementation added (felixpetschko, Apr 10, 2024)
b63fd1d  Merge branch 'tcrdist' of https://github.com/felixpetschko/scirpy int… (felixpetschko, Apr 10, 2024)
ab6eb46  formatting of tcrdist tests improved (felixpetschko, Apr 10, 2024)
a9e46b4  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 10, 2024)
ba72d41  comments for TCRdist added (felixpetschko, Apr 16, 2024)
e93834d  Merge branch 'tcrdist' of https://github.com/felixpetschko/scirpy int… (felixpetschko, Apr 16, 2024)
8e8dfe8  code formatting (felixpetschko, Apr 16, 2024)
21702cb  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 16, 2024)
9e9950c  Merge branch 'main' into tcrdist (grst, Apr 17, 2024)
51112d8  handling of default values for cutoff and n_jobs in _get_distance_cal… (felixpetschko, Apr 18, 2024)
101b411  auto formatting disabled for tcr_dict_distance_matrix (felixpetschko, Apr 18, 2024)
7ac26eb  added data type hints to functions and adapted function comments (felixpetschko, Apr 18, 2024)
cc17364  changed testdata import for test cases (felixpetschko, Apr 18, 2024)
e7fe4eb  changed __init__ and _nb_tcrdist_mat in TCRdistDistanceCalculator to … (felixpetschko, Apr 18, 2024)
ac803e7  keywords only for _nb_tcrdist_mat removed, because it doesn't work wi… (felixpetschko, Apr 18, 2024)
4998090  Merge branch 'tcrdist' of https://github.com/felixpetschko/scirpy int… (felixpetschko, Apr 18, 2024)
3a3d53c  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 18, 2024)
05817d3  unused control variable changed to _ (felixpetschko, Apr 18, 2024)
f5b9857  creation of numba lookup matrix changed (felixpetschko, Apr 19, 2024)
801fa83  Merge branch 'tcrdist' of https://github.com/felixpetschko/scirpy int… (felixpetschko, Apr 19, 2024)
5d12de5  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 19, 2024)
fcce8e7  Update CHANGELOG (grst, Apr 21, 2024)
d6e7191  Update docstring (grst, Apr 21, 2024)
1d83f8b  Update ir_dist docstring (grst, Apr 21, 2024)
74bfce2  Update description in tutorial (grst, Apr 21, 2024)
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,10 @@ and this project adheres to [Semantic Versioning][].
[keep a changelog]: https://keepachangelog.com/en/1.0.0/
[semantic versioning]: https://semver.org/spec/v2.0.0.html

+## Unreleased
+
+- Add "TCRdist" as new metric ([#502](https://github.com/scverse/scirpy/pull/502))
+
## v0.16.1

### Fixes
1 change: 1 addition & 0 deletions docs/api.rst
@@ -302,3 +302,4 @@ distance metrics
ir_dist.metrics.HammingDistanceCalculator
ir_dist.metrics.AlignmentDistanceCalculator
ir_dist.metrics.FastAlignmentDistanceCalculator
+ir_dist.metrics.TCRdistDistanceCalculator
20 changes: 13 additions & 7 deletions docs/tutorials/tutorial_3k_tcr.ipynb
@@ -1049,13 +1049,19 @@
"For instance, a distance of `10` is equivalent to 2 Rs mutating into N.\n",
"This appoach was initially proposed as *TCRdist* by Dash et al. {cite}`TCRdist`.\n",
"\n",
-":::{tip}\n",
-"You can use `metric=\"fastalignment\"` for a faster calculation at the cost of a few false-negatives (i.e. sequence pairs\n",
-"that are actually below the distance cutoff, but are removed during a pre-filtering step). With default parameters, \n",
-"the false-negative rate (of all sequence pairs actually below the cutoff) was ~2% on the {func}`scirpy.datasets.wu2020`\n",
-"dataset. \n",
-"\n",
-"See also {class}`scirpy.ir_dist.metrics.FastAlignmentDistanceCalculator`. \n",
+":::{admonition} Speeding up TCR distance calculation\n",
+":class: tip\n",
+"\n",
+"Scirpy provides alternative distance metrics that are similar to `\"alignment\"`, but a lot faster: \n",
+"\n",
+"* `metric=\"tcrdist\"` is an implementation of [tcrdist3](https://github.com/kmayerb/tcrdist3) within scirpy. The scores\n",
+" are calculated differently, but it gives very similar results compared to `metric=\"alignment\"`.\n",
+" See also {class}`scirpy.ir_dist.metrics.TCRdistDistanceCalculator`.\n",
+"* `metric=\"fastalignment\"` uses a heuristic to speed up the `\"alignment\"` metric at the cost of a few false-negatives (i.e. sequence pairs\n",
+" that are actually below the distance cutoff, but are removed during a pre-filtering step). With default parameters, \n",
+" the false-negative rate (of all sequence pairs actually below the cutoff) was ~2% on the {func}`scirpy.datasets.wu2020`\n",
+" dataset. See also {class}`scirpy.ir_dist.metrics.FastAlignmentDistanceCalculator`.\n",
+" \n",
":::\n",
"\n",
"All cells with a distance between their CDR3 sequences lower than `cutoff` will be connected in the network.\n"
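The tutorial tip describes `metric="tcrdist"` as a tcrdist3-style scoring of CDR3 sequences. To make that scoring concrete, here is a minimal pure-Python sketch of a tcrdist3-like CDR3 distance. It is an illustration under stated assumptions, not scirpy's numba implementation in `metrics.py` (whose diff is not rendered on this page). The function name `tcrdist_cdr3` is hypothetical; the parameters `dist_weight`, `gap_penalty`, `ntrim`, `ctrim` and `fixed_gappos` and their defaults are assumptions modeled on tcrdist3's CDR3 distance. The substitution cost is a placeholder (0 for identical residues, 4 otherwise) instead of the BLOSUM62-derived amino-acid distance table.

```python
"""Sketch of a tcrdist3-style CDR3 distance (illustrative, not scirpy's code)."""


def substitution_cost(a: str, b: str) -> int:
    # ASSUMPTION: placeholder cost. tcrdist3 uses a BLOSUM62-derived table
    # (0 for identical residues, at most 4 otherwise); here every mismatch costs 4.
    return 0 if a == b else 4


def tcrdist_cdr3(
    seq1: str,
    seq2: str,
    dist_weight: int = 3,
    gap_penalty: int = 4,
    ntrim: int = 3,
    ctrim: int = 2,
    fixed_gappos: bool = True,
) -> int:
    len1, len2 = len(seq1), len(seq2)

    # Equal lengths: compare position by position, skipping ntrim residues
    # at the N-terminus and ctrim residues at the C-terminus.
    if len1 == len2:
        dist = sum(substitution_cost(seq1[i], seq2[i]) for i in range(ntrim, len1 - ctrim))
        return dist * dist_weight

    short_len = min(len1, len2)
    len_diff = abs(len1 - len2)

    # Candidate positions for the single gap, counted from the N-terminus.
    if fixed_gappos:
        min_gappos = min(6, 3 + (short_len - 5) // 2)
        max_gappos = min_gappos
    else:
        min_gappos = 5
        max_gappos = short_len - 1 - 4
        while min_gappos > max_gappos:
            min_gappos -= 1
            max_gappos += 1

    best = None
    for gappos in range(min_gappos, max_gappos + 1):
        dist = 0
        # Residues before the gap are aligned from the N-terminus ...
        for i in range(ntrim, gappos):
            dist += substitution_cost(seq1[i], seq2[i])
        # ... and residues after the gap are aligned from the C-terminus.
        remainder = short_len - gappos
        for c in range(ctrim, remainder):
            dist += substitution_cost(seq1[len1 - 1 - c], seq2[len2 - 1 - c])
        if best is None or dist < best:
            best = dist

    # The gap penalty is charged once per extra residue and is not weighted.
    return best * dist_weight + len_diff * gap_penalty
```

For example, `tcrdist_cdr3("CASSLGQAYEQYF", "CASSPGQAYEQYF")` scores one weighted mismatch, while dropping the `Q` from the first sequence costs only the gap penalty because all aligned residues still match.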
7 changes: 6 additions & 1 deletion src/scirpy/ir_dist/__init__.py
@@ -99,6 +99,8 @@ def _get_distance_calculator(metric: MetricType, cutoff: Union[int, None], *, n_
dist_calc = metrics.LevenshteinDistanceCalculator(n_jobs=n_jobs, **kwargs)
elif metric == "hamming":
dist_calc = metrics.HammingDistanceCalculator(n_jobs=n_jobs, **kwargs)
+elif metric == "tcrdist":
+dist_calc = metrics.TCRdistDistanceCalculator(n_jobs=n_jobs, **kwargs)
else:
raise ValueError("Invalid distance metric.")

@@ -122,6 +124,7 @@ def _ir_dist(
airr_mod_ref: str = "airr",
airr_key_ref: str = "airr",
chain_idx_key_ref: str = "chain_indices",
+**kwargs,
) -> Union[dict, None]:
"""\
Computes a sequence-distance metric between all unique :term:`VJ <Chain locus>`
@@ -171,6 +174,8 @@ def _ir_dist(
Like `airr_key`, but for `reference`.
chain_idx_key_ref
Like `chain_idx_key`, but for `reference`.
+**kwargs
+Arguments are passed to the respective :class:`~scirpy.ir_dist.metrics.DistanceCalculator` class.

Returns
-------
@@ -227,7 +232,7 @@ def _get_unique_seqs(tmp_adata, chain_type):
result[chain_type][tmp_key] = unique_seqs

# compute distance matrices
-dist_calc = _get_distance_calculator(metric, cutoff, n_jobs=n_jobs)
+dist_calc = _get_distance_calculator(metric, cutoff, n_jobs=n_jobs, **kwargs)
for chain_type in ["VJ", "VDJ"]:
logging.info(f"Computing sequence x sequence distance matrix for {chain_type} sequences.") # type: ignore
result[chain_type]["distances"] = dist_calc.calc_dist_mat(
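The `__init__.py` changes let extra keyword arguments travel from `_ir_dist` through `_get_distance_calculator` into the selected calculator class, so metric-specific options can reach `TCRdistDistanceCalculator`. The sketch below illustrates that forwarding pattern only. All classes and functions in it are hypothetical stand-ins, not scirpy's; `gap_penalty` is just an example option name; and, as a simplification, it replaces the PR's `if`/`elif` chain with a dictionary lookup and passes `cutoff` explicitly.

```python
"""Hypothetical sketch of metric dispatch with **kwargs forwarding."""
from typing import Any, Dict, Optional, Type


class DistanceCalculator:
    """Stand-in base class that records the options it was built with."""

    def __init__(self, cutoff: Optional[int] = None, *, n_jobs: int = -1, **options: Any) -> None:
        self.cutoff = cutoff
        self.n_jobs = n_jobs
        self.options = options  # metric-specific settings end up here


class HammingDistanceCalculator(DistanceCalculator):
    pass


class TCRdistDistanceCalculator(DistanceCalculator):
    pass


# Dictionary dispatch (the PR uses an if/elif chain instead).
_CALCULATORS: Dict[str, Type[DistanceCalculator]] = {
    "hamming": HammingDistanceCalculator,
    "tcrdist": TCRdistDistanceCalculator,
}


def get_distance_calculator(
    metric: str, cutoff: Optional[int], *, n_jobs: int = -1, **kwargs: Any
) -> DistanceCalculator:
    try:
        cls = _CALCULATORS[metric]
    except KeyError:
        raise ValueError("Invalid distance metric.") from None
    # Forward any extra options unchanged to the calculator's constructor.
    return cls(cutoff, n_jobs=n_jobs, **kwargs)


def ir_dist(
    metric: str, cutoff: Optional[int] = None, *, n_jobs: int = -1, **kwargs: Any
) -> DistanceCalculator:
    # The top-level entry point accepts **kwargs and passes them down;
    # without this step, options would never reach the calculator.
    return get_distance_calculator(metric, cutoff, n_jobs=n_jobs, **kwargs)
```

Calling `ir_dist("tcrdist", 15, n_jobs=2, gap_penalty=4)` returns a `TCRdistDistanceCalculator` whose `options` contain `{"gap_penalty": 4}`, while an unknown metric name still raises `ValueError`.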
297 changes: 297 additions & 0 deletions src/scirpy/ir_dist/metrics.py

Large diffs are not rendered by default.

Binary file not shown.
Binary file not shown.