Adding cellrangermulti subworkflow #276
Changes from 71 commits
Should this go to the `test-datasets` repo eventually (together with the sample sheet)?
I think it would make sense to, yes.
How does this correspond to the samplesheet above? Don't the `sample_id`s have to match?
Hi @grst,
I can't give much information here. I took this from the `cellranger/multi` module tests, and I think it was already different there, so I did not change it.
Hi @klkeys, could you shed some light on this? Should we indeed make the `sample_id` the same as the one in the samplesheet (which, I agree with @grst, is the logical choice), or is this something different?
Actually, I think I answered it below.
I don't think that physical sample IDs need to match the CMO IDs; see the CMO test data for the `cellranger multi` module. Matching physical and CMO IDs does make sense if you have exactly one CMO sample per physical sample, as in @grst's explanation in #276 (comment).
Note that the CMO Feature Reference CSV only needs controls, and it requires all controls from the same CMO sample as one line of the `cellranger multi` config (doc reference).
Also relevant to #276 (comment).
I haven't run CMO myself, but I don't think that you tag multiple samples with the same CMO.
Your CMO files should look like this:
proposed CMO samplesheet
sample 1 CMO config
sample 2 CMO config
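To make the one-CMO-per-sample point concrete, here is a minimal Python sketch that renders the `[samples]` section of a `cellranger multi` config from a sample-to-CMO mapping and rejects any CMO tag assigned to more than one sample. The helper name `samples_section` and the example IDs are hypothetical; the `sample_id,cmo_ids` header and the `|`-separated CMO list follow the 10x multi config format as I understand it, so please double-check against the Cell Ranger docs for your version.

```python
from __future__ import annotations


def samples_section(cmo_assignments: dict[str, list[str]]) -> str:
    """Render a [samples] block; each CMO tag may belong to only one sample."""
    owner: dict[str, str] = {}
    for sample, cmos in cmo_assignments.items():
        if not cmos:
            raise ValueError(f"sample {sample!r} has no CMO tags")
        for cmo in cmos:
            # The same CMO cannot tag two different samples
            if cmo in owner:
                raise ValueError(
                    f"{cmo} is assigned to both {owner[cmo]!r} and {sample!r}"
                )
            owner[cmo] = sample

    lines = ["[samples]", "sample_id,cmo_ids"]
    for sample, cmos in cmo_assignments.items():
        # Multiple CMO tags for one sample are joined with '|'
        lines.append(f"{sample},{'|'.join(cmos)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(samples_section({"sample1": ["CMO301"], "sample2": ["CMO302"]}), end="")
```

Each physical sample can still carry several CMO tags; what the check forbids is the reverse, one tag shared by several samples.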
Thanks, from reading the code the logic seems to make sense to me.
I only have a multiplexed FFPE dataset I could test on, I'll try to do so this week.
I am fixing some stuff (I will post an update when done); you can test it afterwards.
Hi @grst,
I have added the proposed changes, plus some other fixes I found along the way:
Code as a module
First of all, as suggested here, I have moved the parsing code from Groovy into a module, so we face no problems with AWS or with the caching feature. The module is a Python script that checks the consistency of the additional samplesheet and, based on the `cmo` / `frna` related columns, splits it into one samplesheet per sample.
Adding the module's split samplesheets to the workflow
Overall, the code above is simple. The tricky part was wiring it back into the workflow. Here, I took advantage of the "FIFO" ordering of channels and used the "GEX" files channel as a base to reconnect and order the split `cmo` / `frna` samplesheets, so that they are used in the same order (matched to the correct samples) in the `CELLRANGER_MULTI` module. This is done in this chunk:
https://github.com/nf-core/scrnaseq/blob/247-support-for-10x-ffpe-scrna/subworkflows/local/align_cellrangermulti.nf#L64-L105
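A minimal Python sketch of the split-then-reconnect idea, assuming hypothetical column names (`sample`, `multiplexed_sample_id`, `cmo_ids`, `probe_barcode_ids`) rather than the exact ones the module uses. Unlike the FIFO approach above, it pairs each GEX sample with its split rows through an explicit lookup on the sample ID, so a mismatch fails loudly instead of silently pairing the wrong files. The "exactly one of CMO or probe barcodes per row" rule is likewise an assumed consistency check, not necessarily the module's.

```python
from __future__ import annotations

import csv
import io


def split_samplesheet(text: str) -> dict[str, list[dict[str, str]]]:
    """Group rows of the additional samplesheet by their (GEX) sample."""
    reader = csv.DictReader(io.StringIO(text))
    if not reader.fieldnames or "sample" not in reader.fieldnames:
        raise ValueError("samplesheet needs a 'sample' column")
    groups: dict[str, list[dict[str, str]]] = {}  # dicts keep insertion order
    for row in reader:
        has_cmo = bool((row.get("cmo_ids") or "").strip())
        has_frna = bool((row.get("probe_barcode_ids") or "").strip())
        # Assumed rule: each row is either CMO- or probe-barcode-multiplexed
        if has_cmo == has_frna:
            raise ValueError(
                f"row for {row['sample']!r} must set exactly one of "
                "cmo_ids / probe_barcode_ids"
            )
        groups.setdefault(row["sample"], []).append(row)
    return groups


def align_to_gex(groups, gex_order: list[str]):
    """Pair each GEX sample with its split rows by ID, not by channel position."""
    missing = [s for s in gex_order if s not in groups]
    if missing:
        raise KeyError(f"no split samplesheet for: {', '.join(missing)}")
    return [(s, groups[s]) for s in gex_order]
```

Usage: `align_to_gex(split_samplesheet(text), ["FFPE", "PBMC"])` returns the split rows in the GEX channel's order, whatever order they appeared in the samplesheet.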
Parsing the generated results for `MTX_CONVERSION`
Then, I had to add parsing for the generated results in order to convert them to `.h5ad` / `.Rds`. `cellranger multi` outputs the filtered results in a special folder called `per_sample_outs`, so when multiple samples are demultiplexed by the given barcodes, each one ends up in its own subdirectory there.
As such, for `cellranger/multi` we convert the `raw_matrices` (not per sample) and the `filtered_matrices` (per sample). The related code is here:
https://github.com/nf-core/scrnaseq/blob/247-support-for-10x-ffpe-scrna/subworkflows/local/align_cellrangermulti.nf#L181-L228
The final part of it is just the "standard" splitting raw / filtered as we do for normal cellranger as well.
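For reference, a small Python sketch of collecting the matrices described above: one raw matrix for the whole run and one filtered matrix per demultiplexed sample. The directory names (`multi/count/raw_feature_bc_matrix` and `per_sample_outs/<sample>/count/sample_filtered_feature_bc_matrix`) reflect the `cellranger multi` output layout as I understand it for recent versions and should be verified; `collect_matrices` is a hypothetical helper, not code from this PR.

```python
from __future__ import annotations

from pathlib import Path


def collect_matrices(outs: Path) -> tuple[Path | None, dict[str, Path]]:
    """Return (raw matrix for the whole run, {sample: filtered matrix})."""
    # Raw counts are not demultiplexed: a single matrix for the whole run
    raw = outs / "multi" / "count" / "raw_feature_bc_matrix"

    # Filtered counts live in one subdirectory per demultiplexed sample
    filtered: dict[str, Path] = {}
    per_sample = outs / "per_sample_outs"
    if per_sample.is_dir():
        for sample_dir in sorted(p for p in per_sample.iterdir() if p.is_dir()):
            mtx = sample_dir / "count" / "sample_filtered_feature_bc_matrix"
            if mtx.is_dir():
                filtered[sample_dir.name] = mtx

    return (raw if raw.is_dir() else None), filtered
```

Returning the raw matrix separately from the per-sample dictionary mirrors the "not per sample" vs "per sample" distinction, so the downstream conversion can treat them as two different inputs.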
Custom Emptydrops
Currently, EmptyDrops will not be run for `cellranger/multi`, as I am not sure whether it is relevant:
https://github.com/nf-core/scrnaseq/blob/247-support-for-10x-ffpe-scrna/workflows/scrnaseq.nf#L278
Samplesheet parsing
Because the `cellranger multi` subworkflow can receive data from multiple feature types for the same sample, we must preserve the feature-type-per-sample information in separate channels, so that it can be parsed correctly here to create the channels the modules expect.
So, I had to adapt the samplesheet parsing here: https://github.com/nf-core/scrnaseq/blob/247-support-for-10x-ffpe-scrna/subworkflows/local/utils_nfcore_scrnaseq_pipeline/main.nf#L83-L124
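A sketch of the per-feature-type split, written in Python for illustration rather than the pipeline's Nextflow. Rows are grouped by a hypothetical `feature_type` column, and every feature-type "channel" lists the samples in the same sorted order, with an empty list where a sample lacks that library type, so the channels can be recombined positionally. The column names (`sample`, `feature_type`, `fastq_1`) and the set of feature types are assumptions.

```python
from __future__ import annotations

from collections import defaultdict

# Assumed set of feature types; the real samplesheet may allow others
FEATURE_TYPES = ("gex", "vdj", "ab", "crispr", "cmo")


def group_by_feature(
    rows: list[dict[str, str]],
) -> dict[str, list[tuple[str, list[str]]]]:
    """Split samplesheet rows into one ordered 'channel' per feature type."""
    per_type: dict[str, dict[str, list[str]]] = {
        ft: defaultdict(list) for ft in FEATURE_TYPES
    }
    for row in rows:
        ft = row["feature_type"]
        if ft not in per_type:
            raise ValueError(
                f"unknown feature_type {ft!r} for sample {row['sample']!r}"
            )
        per_type[ft][row["sample"]].append(row["fastq_1"])

    # Same sample order in every channel, so they can be recombined by position
    samples = sorted({row["sample"] for row in rows})
    return {
        ft: [(s, list(per_type[ft].get(s, []))) for s in samples]
        for ft in FEATURE_TYPES
    }
```

Padding missing feature types with empty lists keeps every channel the same length, which is what makes position-based recombination safe.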
nf-tests and documentation
nf-tests and documentation still need work, but I will only start on them once the code is settled.
😄
Now it can be tested with your data, so you can also see what the outputs look like.
Fantastic! I'll try it on Monday, have a nice weekend!