diff --git a/.editorconfig b/.editorconfig index 72dda289..a61f2e8b 100644 --- a/.editorconfig +++ b/.editorconfig @@ -31,3 +31,9 @@ indent_size = unset # ignore python and markdown [*.{py,md}] indent_style = unset + +[/docs/*.xml] +indent_style = unset + +[/docs/images/metro/*.xml] +indent_style = unset diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index d17dbeaa..03f6bcdc 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -1,5 +1,6 @@ name: nf-core CI # This workflow runs the pipeline with the minimal test dataset to check that it completes without any syntax errors + on: push: branches: @@ -26,6 +27,11 @@ jobs: NXF_VER: - "23.04.0" - "latest-everything" + TEST_PROFILE: + - "test" + - "test_sim" + - "test_quilt" + - "test_stitch" steps: - name: Check out pipeline code uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 @@ -43,4 +49,4 @@ jobs: # For example: adding multiple test runs with different parameters # Remember that you can parallelise this by using strategy.matrix run: | - nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results + nextflow run ${GITHUB_WORKSPACE} -profile "${{ matrix.TEST_PROFILE }}",docker --outdir ./results diff --git a/.gitignore b/.gitignore index 5124c9ac..57ffc906 100644 --- a/.gitignore +++ b/.gitignore @@ -6,3 +6,5 @@ results/ testing/ testing* *.pyc +*.code-workspace +.nf-test* diff --git a/CHANGELOG.md b/CHANGELOG.md index 6ef061e9..e5b7e078 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,8 +9,23 @@ Initial release of nf-core/phaseimpute, created with the [nf-core](https://nf-co ### `Added` +### `Changed` + +- [#18](https://github.com/nf-core/phaseimpute/pull/18) + - Maps and regions by chromosome + - Update test config files + - Correct meta map propagation + - Test impute and test sim work +- [#19](https://github.com/nf-core/phaseimpute/pull/19) - Changed reference panel to accept a CSV; updated modules and subworkflows (glimpse1/2 and shapeit5) +- 
[#20](https://github.com/nf-core/phaseimpute/pull/20) - Added automatic detection of VCF contigs for the reference panel, with automatic renaming available +- [#22](https://github.com/nf-core/phaseimpute/pull/22) - Added a validation step for concordance analysis. Input channels changed to match input steps. Outdir folder organised by step. Modules config organised by subworkflow. +- [#26](https://github.com/nf-core/phaseimpute/pull/26) - Added QUILT method + ### `Fixed` +- [#15](https://github.com/nf-core/phaseimpute/pull/15) - Changed test CSV files to point to the nf-core repository +- [#16](https://github.com/nf-core/phaseimpute/pull/16) - Removed outdir from test config files + ### `Dependencies` ### `Deprecated` diff --git a/CITATIONS.md b/CITATIONS.md index 31f66a91..10c0d290 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -10,9 +10,21 @@ ## Pipeline tools -- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) +- [QUILT](https://pubmed.ncbi.nlm.nih.gov/34083788/) - > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. + > Davies, R. W., Kucka, M., Su, D., Shi, S., Flanagan, M., Cunniff, C. M., ... & Myers, S. (2021). Rapid genotype imputation from sequence with reference panels. Nature Genetics, 53(7), 1104-1111. + +- [GLIMPSE](https://www.nature.com/articles/s41588-020-00756-0) + + > Rubinacci, S., Ribeiro, D. M., Hofmeister, R. J., & Delaneau, O. (2021). Efficient phasing and imputation of low-coverage sequencing data using large reference panels. Nature Genetics, 53(1), 120-126. + +- [Shapeit](https://odelaneau.github.io/shapeit5/) + + > Hofmeister, R. J., Ribeiro, D. M., Rubinacci, S., & Delaneau, O. (2023). Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank. Nature Genetics. doi: https://doi.org/10.1038/s41588-023-01415-w + +- [bcftools](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3198575/) + + > Li, H. (2011). 
A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27(21), 2987-2993. - [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) diff --git a/README.md b/README.md index 2235851b..82279608 100644 --- a/README.md +++ b/README.md @@ -19,50 +19,43 @@ ## Introduction -**nf-core/phaseimpute** is a bioinformatics pipeline that ... +**nf-core/phaseimpute** is a bioinformatics pipeline to phase and impute genetic data. Different steps are available, each corresponding to a dedicated mode. - +### Main steps of the pipeline - - +The **phaseimpute** pipeline consists of five main steps: -1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)) -2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/)) +| Metro map | Modes | +| --------- | ----- | +| metromap | - **Pre-processing**: Phasing, QC, variant filtering, variant annotation of the reference panel
- **Phase**: Phasing of the target dataset on the reference panel
- **Simulate**: Simulation of the target dataset from high-quality target data
- **Concordance**: Concordance between the target dataset and a truth dataset
- **Post-processing**: Variant filtering based on imputation quality | ## Usage > [!NOTE] > If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data. - +Each row represents a BAM file with its index file. Now, you can run the pipeline using: - - ```bash nextflow run nf-core/phaseimpute \ -profile \ --input samplesheet.csv \ + --genome "GRCh38" \ + --panel \ + --steps "impute" \ + --tools "glimpse1" \ --outdir ``` @@ -72,6 +65,19 @@ nextflow run nf-core/phaseimpute \ For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/phaseimpute/usage) and the [parameter documentation](https://nf-co.re/phaseimpute/parameters). +## Description of the different modes of the pipeline + +Here is a short description of the different modes of the pipeline. +For more information, please refer to the [documentation](https://nf-core.github.io/phaseimpute/usage/).
+ +| Mode | Flow chart | Description | +| ------------------ | ---------- | ----------- | +| **Preprocessing** | phase_metro | The preprocessing mode is responsible for preparing the multiple input files that will be used by the phasing process.
The main processes are:
- **Haplotypes phasing** of the reference panel using [**Shapeit5**](https://odelaneau.github.io/shapeit5/).
- **Filter** the reference panel to select only the necessary variants.
- **Chunking the reference panel** into a subset of regions for all chromosomes.
- **Extract** the positions at which to perform the imputation. | +| **Phasing** | phase_metro | The phasing mode is the core mode of this pipeline.
It consists of three main steps:
- **Phasing**: Phasing of the target dataset on the reference panel using one of:
  - [**Glimpse1**](https://odelaneau.github.io/GLIMPSE/glimpse1/index.html)
  This requires computing the genotype likelihoods of the target dataset,
  which is done with [BCFTOOLS_mpileup](https://samtools.github.io/bcftools/bcftools.html#mpileup).
  - [**Glimpse2**](https://odelaneau.github.io/GLIMPSE/glimpse2/index.html): for this tool, the reference panel is first converted into binary chunks.
  - [**Stitch**](https://github.com/rwdavies/stitch)
  - [**Quilt**](https://github.com/rwdavies/QUILT)
- **Ligation**: all the different chunks are merged together.
- **Sampling** (optional) | +| **Simulate** | simulate_metro | The simulation mode is used to create artificial low-information genetic data from high-density data. This allows the imputed results to be compared to a _truth_ set, and therefore the quality of the imputation to be evaluated.
For the moment, it is possible to simulate:
- Low-pass data by **downsampling** BAM or CRAM files to different depths using [SAMTOOLS_view -s]()
- Genotype data by **selecting the SNP positions** used by a designated SNP chip.
The simulation mode also computes the **genotype likelihoods** of the high-density data. | +| **Concordance** | concordance_metro | This mode compares two VCF files to compute a summary of the differences between them.
To do so, it uses one of:
- [**Glimpse1**](https://odelaneau.github.io/GLIMPSE/glimpse1/index.html) concordance process.
- [**Glimpse2**](https://odelaneau.github.io/GLIMPSE/glimpse2/index.html) concordance process.
- Conversion of the two VCF files to `.zarr` using [**scikit-allel**](https://scikit-allel.readthedocs.io/en/stable/) and [**anndata**](https://anndata.readthedocs.io/en/latest/) before comparing the SNPs. | +| **Postprocessing** | postprocessing_metro | This final mode makes it possible to loop the whole pipeline to improve imputation performance. To do so, it keeps the best-imputed positions and reruns the analysis using these positions. | + ## Pipeline output To see the results of an example test run with a full size dataset refer to the [results](https://nf-co.re/phaseimpute/results) tab on the nf-core website pipeline page. @@ -80,16 +86,19 @@ For more details about the output files and reports, please refer to the ## Credits -nf-core/phaseimpute was originally written by LouisLeNezet. +nf-core/phaseimpute was originally written by Louis Le Nézet. We thank the following people for their extensive assistance in the development of this pipeline: - +- Anabella Trigila +- Saul Pierotti +- Eugenia Fontecha +- Matias Romero Victorica ## Contributions and Support If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md). For further information or help, don't hesitate to get in touch on the [Slack `#phaseimpute` channel](https://nfcore.slack.com/channels/phaseimpute) (you can join with [this invite](https://nf-co.re/join/slack)). ## Citations @@ -99,6 +108,14 @@ For further information or help, don't hesitate to get in touch on the [Slack `# +You can cite one of the main imputation methods ([`QUILT`](https://github.com/rwdavies/QUILT)) as follows: + +> **Rapid genotype imputation from sequence with reference panels.** +> +> Davies, R. W., Kucka, M., Su, D., Shi, S., Flanagan, M., Cunniff, C.
M., Chan, Y. F., & Myers, S. +> +> _Nature Genetics_ 2021 June 03. doi: [10.1038/s41588-021-00877-0](https://doi.org/10.1038/s41588-021-00877-0) + An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file. You can cite the `nf-core` publication as follows: diff --git a/assets/chr_rename_add.txt b/assets/chr_rename_add.txt new file mode 100644 index 00000000..c48a2640 --- /dev/null +++ b/assets/chr_rename_add.txt @@ -0,0 +1,39 @@ +1 chr1 +2 chr2 +3 chr3 +4 chr4 +5 chr5 +6 chr6 +7 chr7 +8 chr8 +9 chr9 +10 chr10 +11 chr11 +12 chr12 +13 chr13 +14 chr14 +15 chr15 +16 chr16 +17 chr17 +18 chr18 +19 chr19 +20 chr20 +21 chr21 +22 chr22 +23 chr23 +24 chr24 +25 chr25 +26 chr26 +27 chr27 +28 chr28 +29 chr29 +30 chr30 +31 chr31 +32 chr32 +33 chr33 +34 chr34 +35 chr35 +36 chr36 +37 chr37 +38 chr38 +X chrX diff --git a/assets/chr_rename_del.txt b/assets/chr_rename_del.txt new file mode 100644 index 00000000..a85016b6 --- /dev/null +++ b/assets/chr_rename_del.txt @@ -0,0 +1,39 @@ +chr1 1 +chr2 2 +chr3 3 +chr4 4 +chr5 5 +chr6 6 +chr7 7 +chr8 8 +chr9 9 +chr10 10 +chr11 11 +chr12 12 +chr13 13 +chr14 14 +chr15 15 +chr16 16 +chr17 17 +chr18 18 +chr19 19 +chr20 20 +chr21 21 +chr22 22 +chr23 23 +chr24 24 +chr25 25 +chr26 26 +chr27 27 +chr28 28 +chr29 29 +chr30 30 +chr31 31 +chr32 32 +chr33 33 +chr34 34 +chr35 35 +chr36 36 +chr37 37 +chr38 38 +chrX X diff --git a/assets/panel.csv b/assets/panel.csv new file mode 100644 index 00000000..7286169e --- /dev/null +++ b/assets/panel.csv @@ -0,0 +1,3 @@ +panel,chr,vcf,index +1000GP,chr21,1000GP_21.vcf,1000GP_21.vcf.csi +1000GP,chr22,1000GP_22.vcf,1000GP_22.vcf.csi diff --git a/assets/regionsheet.csv b/assets/regionsheet.csv new file mode 100644 index 00000000..030c9ba1 --- /dev/null +++ b/assets/regionsheet.csv @@ -0,0 +1,2 @@ +chr,start,end +20,20000000,2200000 diff --git a/assets/samplesheet.csv b/assets/samplesheet.csv index 5f653ab7..217ef7c3 100644 --- a/assets/samplesheet.csv 
+++ b/assets/samplesheet.csv @@ -1,3 +1,3 @@ -sample,fastq_1,fastq_2 -SAMPLE_PAIRED_END,/path/to/fastq/files/AEG588A1_S1_L002_R1_001.fastq.gz,/path/to/fastq/files/AEG588A1_S1_L002_R2_001.fastq.gz -SAMPLE_SINGLE_END,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz, +sample,file,index +1_BAM_1X,/path/to/.bam,/path/to/.bai +1_BAM_SNP,/path/to/.bam,/path/to/.bai diff --git a/assets/schema_input.json b/assets/schema_input.json index 6a7e788c..971c3fb3 100644 --- a/assets/schema_input.json +++ b/assets/schema_input.json @@ -1,7 +1,7 @@ { "$schema": "http://json-schema.org/draft-07/schema", "$id": "https://raw.githubusercontent.com/nf-core/phaseimpute/master/assets/schema_input.json", - "title": "nf-core/phaseimpute pipeline - params.input schema", + "title": "nf-core/phaseimpute pipeline - params.input", "description": "Schema for the file provided with params.input", "type": "array", "items": { @@ -13,21 +13,17 @@ "errorMessage": "Sample name must be provided and cannot contain spaces", "meta": ["id"] }, - "fastq_1": { + "file": { "type": "string", - "format": "file-path", - "exists": true, - "pattern": "^\\S+\\.f(ast)?q\\.gz$", - "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'" + "pattern": "^\\S+\\.(bam|(vcf|bcf)(\\.gz)?)$", + "errorMessage": "BAM, VCF or BCF file must be provided, cannot contain spaces and must have extension '.bam', '.vcf' or '.bcf' (VCF and BCF with optional '.gz' extension)" }, - "fastq_2": { + "index": { + "errorMessage": "Input file index must be provided, cannot contain spaces and must have extension '.bai', '.tbi' or '.csi'", "type": "string", - "format": "file-path", - "exists": true, - "pattern": "^\\S+\\.f(ast)?q\\.gz$", - "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'" + "pattern": "^\\S+\\.(bai|tbi|csi)$" } }, - "required": ["sample", "fastq_1"] + "required": ["sample", "file", "index"] } } diff --git
a/assets/schema_input_panel.json b/assets/schema_input_panel.json new file mode 100644 index 00000000..242a4136 --- /dev/null +++ b/assets/schema_input_panel.json @@ -0,0 +1,35 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema", + "$id": "https://raw.githubusercontent.com/nf-core/phaseimpute/master/assets/schema_input_panel.json", + "title": "nf-core/phaseimpute pipeline - params.panel schema", + "description": "Schema for the file provided with params.panel", + "type": "array", + "items": { + "type": "object", + "properties": { + "panel": { + "type": "string", + "pattern": "^\\S+$", + "errorMessage": "Panel name must be provided as a string and cannot contain spaces", + "meta": ["id"] + }, + "chr": { + "type": "string", + "pattern": "^\\S+$", + "errorMessage": "Chromosome must be provided as a string and cannot contain spaces", + "meta": ["chr"] + }, + "vcf": { + "type": "string", + "pattern": "^\\S+\\.(vcf|bcf)(\\.gz)?$", + "errorMessage": "Panel file must be provided, cannot contain spaces and must have extension '.vcf' or '.bcf' with optional '.gz' extension" + }, + "index": { + "type": "string", + "pattern": "^\\S+\\.(vcf|bcf)(\\.gz)?\\.(tbi|csi)$", + "errorMessage": "Panel index file must be provided, cannot contain spaces and must have extension '.vcf' or '.bcf', optionally followed by '.gz', and end with '.csi' or '.tbi'" + } + }, + "required": ["panel", "chr", "vcf", "index"] + } +} diff --git a/assets/schema_input_region.json b/assets/schema_input_region.json new file mode 100644 index 00000000..5592aea9 --- /dev/null +++ b/assets/schema_input_region.json @@ -0,0 +1,28 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema", + "$id": "https://raw.githubusercontent.com/nf-core/phaseimpute/master/assets/schema_input_region.json", + "title": "nf-core/phaseimpute pipeline - params.input_region schema", + "description": "Schema for the file provided with params.input_region", + "type": "array", + "items": { + "type": "object", + "properties": {
+ "chr": { + "type": "string", + "pattern": "^\\S+$", + "errorMessage": "Chromosome name must be provided as a string and cannot contain spaces" + }, + "start": { + "type": "integer", + "pattern": "^\\d+$", + "errorMessage": "Region start must be provided, cannot contain spaces and must be numeric" + }, + "end": { + "type": "integer", + "pattern": "^\\d+$", + "errorMessage": "Region end must be provided, cannot contain spaces and must be numeric" + } + }, + "required": ["chr", "start", "end"] + } +} diff --git a/assets/schema_map.json b/assets/schema_map.json new file mode 100644 index 00000000..4b981006 --- /dev/null +++ b/assets/schema_map.json @@ -0,0 +1,24 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema", + "$id": "https://raw.githubusercontent.com/nf-core/phaseimpute/master/assets/schema_map.json", + "title": "nf-core/phaseimpute pipeline - params.map schema", + "description": "Schema for the file provided with params.map", + "type": "array", + "items": { + "type": "object", + "properties": { + "chr": { + "type": "string", + "pattern": "^(chr)?[0-9]+$", + "errorMessage": "Chromosome must be provided and must be a string containing only numbers, with or without the prefix 'chr'", + "meta": ["chr"] + }, + "map": { + "type": "string", + "pattern": "^\\S+\\.(g)?map(\\.gz)?$", + "errorMessage": "Map file must be provided, cannot contain spaces and must have extension '.map' or '.gmap' with optional '.gz' extension" + } + }, + "required": ["chr", "map"] + } +} diff --git a/assets/schema_posfile.json b/assets/schema_posfile.json new file mode 100644 index 00000000..5a3c9d5a --- /dev/null +++ b/assets/schema_posfile.json @@ -0,0 +1,24 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema", + "$id": "https://raw.githubusercontent.com/nf-core/phaseimpute/master/assets/schema_posfile.json", + "title": "nf-core/phaseimpute pipeline - params.posfile schema", + "description": "Schema for the file provided with params.posfile", + "type": "array",
"items": { + "type": "object", + "properties": { + "chr": { + "type": "string", + "pattern": "^\\S+$", + "errorMessage": "Chromosome name must be provided as a string and cannot contain spaces", + "meta": ["chr"] + }, + "file": { + "type": "string", + "pattern": "^\\S+\\.txt$", + "errorMessage": "Posfile per chromosome must be provided. Must have .txt extension" + } + }, + "required": ["chr", "file"] + } +} diff --git a/conf/modules.config b/conf/modules.config index d203d2b6..f212b77c 100644 --- a/conf/modules.config +++ b/conf/modules.config @@ -18,10 +18,6 @@ process { saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] - withName: FASTQC { - ext.args = '--quiet' - } - withName: 'MULTIQC' { ext.args = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' } publishDir = [ @@ -30,5 +26,4 @@ process { saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] } - } diff --git a/conf/steps/imputation.config b/conf/steps/imputation.config new file mode 100644 index 00000000..7e859eca --- /dev/null +++ b/conf/steps/imputation.config @@ -0,0 +1,33 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Config file for defining DSL2 per module options and publishing paths +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Available keys to override module options: + ext.args = Additional arguments appended to command in module. + ext.args2 = Second set of arguments appended to command in module (multi-tool modules). + ext.args3 = Third set of arguments appended to command in module (multi-tool modules). + ext.prefix = File name prefix for output files. 
+---------------------------------------------------------------------------------------- +*/ + +process { + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_IMPUT:.*' { + publishDir = [ + path: { "${params.outdir}/imputation/concat" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + ext.prefix = { "${meta.id}_impute_concat" } + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_IMPUT:BCFTOOLS_CONCAT' { + ext.args = {[ + "--ligate", + "--output-type z", + ].join(" ").trim()} + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_IMPUT:BCFTOOLS_INDEX' { + ext.args = "--tbi" + } +} diff --git a/conf/steps/imputation_glimpse1.config b/conf/steps/imputation_glimpse1.config new file mode 100644 index 00000000..521c8beb --- /dev/null +++ b/conf/steps/imputation_glimpse1.config @@ -0,0 +1,106 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Config file for defining DSL2 per module options and publishing paths +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Available keys to override module options: + ext.args = Additional arguments appended to command in module. + ext.args2 = Second set of arguments appended to command in module (multi-tool modules). + ext.args3 = Third set of arguments appended to command in module (multi-tool modules). + ext.prefix = File name prefix for output files. +---------------------------------------------------------------------------------------- +*/ + +process { + // Configuration for the glimpse1 imputation subworkflow + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:GL_INPUT:.*' { + publishDir = [ + path: { "${params.outdir}/imputation/glimpse1/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename }, + enabled: false + ] + } + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:GL_INPUT:BCFTOOLS_MPILEUP' { + ext.args = [ + "-I", + "-E", + "-a 'FORMAT/DP'" + ].join(' ') + ext.args2 = [ + "-Aim", + "-C alleles" + ].join(' ') + ext.prefix = { "${meta.id}_R${meta.region.replace(':','_')}.call" } + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:GL_INPUT:BCFTOOLS_ANNOTATE' { + ext.args = "--set-id '%CHROM:%POS:%REF:%ALT' -Oz" + ext.prefix = { "${meta.id}_R${meta.region.replace(':','_')}.annotate" } + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:GL_INPUT:BCFTOOLS_INDEX' { + ext.args = "--tbi" + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_IMPUTE_GLIMPSE1:.*' { + publishDir = [ + path: { "${params.outdir}/imputation/glimpse1/" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_IMPUTE_GLIMPSE1:GLIMPSE_CHUNK' { + ext.args = [ + "--window-size 200000", + "--buffer-size 20000" + ].join(' ') + ext.prefix = { "${meta.id}_R${meta.region.replace(':','_')}.chunk" } + publishDir = [ enabled: false ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_IMPUTE_GLIMPSE1:GLIMPSE_PHASE' { + ext.args = [ + "--impute-reference-only-variants" + ].join(' ') + ext.prefix = { "${meta.id}_R${meta.region.replace(':','_')}.phase" } + ext.suffix = "bcf" + publishDir = [ enabled: false ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_IMPUTE_GLIMPSE1:INDEX_PHASE' { + publishDir = [ enabled: false ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_IMPUTE_GLIMPSE1:GLIMPSE_LIGATE' { + ext.prefix = { "${meta.id}_R${meta.region.replace(':','_')}.ligate" } + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_IMPUTE_GLIMPSE1:INDEX_LIGATE' { + publishDir = [ + path: { "${params.outdir}/imputation/glimpse1" } + ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_GLIMPSE1:.*' { + publishDir = [ + [ + path: { 
"${params.outdir}/imputation/glimpse1/concat" }, + mode: params.publish_dir_mode, + ], + ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_GLIMPSE1:BCFTOOLS_CONCAT' { + ext.args = {[ + "--ligate", + "--output-type z", + ].join(" ").trim()} + ext.prefix = { "${meta.id}_glimpse1" } + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_GLIMPSE1:BCFTOOLS_INDEX' { + ext.args = "--tbi" + ext.prefix = { "${meta.id}_glimpse1" } + } +} diff --git a/conf/steps/imputation_quilt.config b/conf/steps/imputation_quilt.config new file mode 100644 index 00000000..fbb8dcc4 --- /dev/null +++ b/conf/steps/imputation_quilt.config @@ -0,0 +1,92 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Config file for defining DSL2 per module options and publishing paths +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Available keys to override module options: + ext.args = Additional arguments appended to command in module. + ext.args2 = Second set of arguments appended to command in module (multi-tool modules). + ext.args3 = Third set of arguments appended to command in module (multi-tool modules). + ext.prefix = File name prefix for output files. 
+---------------------------------------------------------------------------------------- +*/ + +process { + + withName: CUSTOM_DUMPSOFTWAREVERSIONS { + publishDir = [ + path: { "${params.outdir}/pipeline_info" }, + mode: params.publish_dir_mode, + pattern: '*_versions.yml' + ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:MAKE_CHUNKS:.*' { + + ext.prefix = { "${meta.id}_${meta.chr}" } + + publishDir = [ + [ + path: { "${params.outdir}/imputation/quilt/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}_chunk" }, + mode: params.publish_dir_mode, + enabled: false + ], + + + ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:MAKE_CHUNKS:GLIMPSE_CHUNK' { + ext.prefix = { "${meta.id}_${meta.chr}" } + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:IMPUTE_QUILT:.*' { + publishDir = [ + [ + path: { "${params.outdir}/imputation/quilt/" }, + mode: params.publish_dir_mode, + ], + ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:IMPUTE_QUILT:QUILT_QUILT' { + ext.prefix = { "${meta.id}_R${meta.region.replace(':','_')}.impute" } + publishDir = [enabled: false] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:IMPUTE_QUILT:BCFTOOLS_INDEX_1' { + ext.args = "--tbi" + publishDir = [enabled: false] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:IMPUTE_QUILT:BCFTOOLS_ANNOTATE' { + ext.args = "--set-id '%CHROM:%POS:%REF:%ALT' -Oz" + ext.prefix = { "${meta.id}_R${meta.region.replace(':','_')}.impute.annotate" } + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:IMPUTE_QUILT:BCFTOOLS_INDEX_2' { + ext.args = "--tbi" + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_QUILT:.*' { + publishDir = [ + [ + path: { "${params.outdir}/imputation/quilt/concat" }, + mode: params.publish_dir_mode, + ], + ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_QUILT:BCFTOOLS_CONCAT' { + ext.args = {[ + "--ligate", + "--output-type z", + ].join(" ").trim()} + ext.prefix = { "${meta.id}_quilt" } + } + + withName: 
'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_QUILT:BCFTOOLS_INDEX' { + ext.args = "--tbi" + ext.prefix = { "${meta.id}_quilt" } + } + +} diff --git a/conf/steps/imputation_stitch.config b/conf/steps/imputation_stitch.config new file mode 100644 index 00000000..920255e0 --- /dev/null +++ b/conf/steps/imputation_stitch.config @@ -0,0 +1,111 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Config file for defining DSL2 per module options and publishing paths +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Available keys to override module options: + ext.args = Additional arguments appended to command in module. + ext.args2 = Second set of arguments appended to command in module (multi-tool modules). + ext.args3 = Third set of arguments appended to command in module (multi-tool modules). + ext.prefix = File name prefix for output files. +---------------------------------------------------------------------------------------- +*/ + +process { + + withName: CUSTOM_DUMPSOFTWAREVERSIONS { + publishDir = [ + path: { "${params.outdir}/pipeline_info" }, + mode: params.publish_dir_mode, + pattern: '*_versions.yml' + ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:PREPARE_POSFILE_TSV:.*' { + publishDir = [ + path: { "${params.outdir}/prep_panel/posfile/" }, + mode: params.publish_dir_mode, + enabled: true + ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:PREPARE_POSFILE_TSV:GAWK' { + ext.args = "'{ key = \$1 FS \$2 } !seen[key]++'" + ext.prefix = { "${meta.id}_${meta.chr}_posfile_stitch" } + ext.suffix = ".txt" + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:PREPARE_INPUT_STITCH:BCFTOOLS_NORM' { + ext.args = '-m +any --output-type z' + ext.prefix = { "${meta.id}_${meta.chr}_multiallelic" } + maxRetries = 2 + publishDir = [enabled: false] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:PREPARE_INPUT_STITCH:BCFTOOLS_VIEW' { + ext.args = '-v snps -Oz' + ext.prefix = { 
"${meta.id}_${meta.chr}_biallelic" } + maxRetries = 2 + publishDir = [enabled: false] + + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:PREPARE_INPUT_STITCH:BCFTOOLS_INDEX' { + maxRetries = 2 + publishDir = [enabled: false] + + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:PREPARE_INPUT_STITCH:BCFTOOLS_INDEX_2' { + ext.args = '--tbi' + maxRetries = 2 + publishDir = [enabled: false] + + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:PREPARE_POSFILE_TSV:BCFTOOLS_QUERY' { + ext.args = [ + "-f'%CHROM\t%POS\t%REF\t%ALT\\n'", + ].join(' ') + ext.prefix = { "${meta.id}_${meta.chr}_posfile_stitch" } + publishDir = [enabled: false] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_STITCH:.*' { + publishDir = [ + path: { "${params.outdir}/imputation/stitch/" }, + mode: params.publish_dir_mode, + enabled: true + ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_STITCH:BCFTOOLS_INDEX' { + ext.args = '--tbi' + maxRetries = 2 + publishDir = [enabled: false] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_STITCH:.*' { + publishDir = [ + [ + path: { "${params.outdir}/imputation/stitch/concat" }, + mode: params.publish_dir_mode, + ], + ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_STITCH:BCFTOOLS_CONCAT' { + ext.args = {[ + "--ligate", + "--output-type z", + ].join(" ").trim()} + ext.prefix = { "${meta.id}_stitch" } + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_STITCH:BCFTOOLS_INDEX' { + ext.args = "--tbi" + ext.prefix = { "${meta.id}_stitch" } + } + + + +} diff --git a/conf/steps/initialisation.config b/conf/steps/initialisation.config new file mode 100644 index 00000000..7cc8daf5 --- /dev/null +++ b/conf/steps/initialisation.config @@ -0,0 +1,21 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Config file for defining DSL2 per module options and publishing paths +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Available keys to 
override module options: + ext.args = Additional arguments appended to command in module. + ext.args2 = Second set of arguments appended to command in module (multi-tool modules). + ext.args3 = Third set of arguments appended to command in module (multi-tool modules). + ext.prefix = File name prefix for output files. +---------------------------------------------------------------------------------------- +*/ + +process { + withName: 'PIPELINE_INITIALISATION:.*' { + publishDir = [ + path: { "${params.outdir}/initialisation/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" }, + mode: params.publish_dir_mode, + enabled: false + ] + } +} diff --git a/conf/steps/panel_prep.config b/conf/steps/panel_prep.config new file mode 100644 index 00000000..7497e3f0 --- /dev/null +++ b/conf/steps/panel_prep.config @@ -0,0 +1,152 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Config file for defining DSL2 per module options and publishing paths +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Available keys to override module options: + ext.args = Additional arguments appended to command in module. + ext.args2 = Second set of arguments appended to command in module (multi-tool modules). + ext.args3 = Third set of arguments appended to command in module (multi-tool modules). + ext.prefix = File name prefix for output files. 
+---------------------------------------------------------------------------------------- +*/ + +process { + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_CHR_CHECK:.*' { + publishDir = [ + path: { "${params.outdir}/prep_panel/" }, + mode: params.publish_dir_mode, + enabled: false + ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_CHR_CHECK:VCF_CHR_RENAME:BCFTOOLS_ANNOTATE' { + ext.args = [ + "-Oz", + "--no-version" + ].join(' ') + ext.prefix = { "${meta.id}_chrrename" } + publishDir = [ enabled: false ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_NORMALIZE_BCFTOOLS:BCFTOOLS_NORM' { + ext.args = '-m +any --no-version --output-type z' + ext.prefix = { "${meta.id}_${meta.chr}_multiallelic" } + maxRetries = 2 + publishDir = [ enabled: false ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_NORMALIZE_BCFTOOLS:BCFTOOLS_INDEX' { + ext.args = "--tbi" + publishDir = [enabled: false] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_NORMALIZE_BCFTOOLS:BCFTOOLS_VIEW' { + ext.args = '-v snps -Oz' + ext.prefix = { "${meta.id}_${meta.chr}_biallelic" } + maxRetries = 2 + publishDir = [ enabled: false ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_NORMALIZE_BCFTOOLS:BCFTOOLS_INDEX_2' { + ext.args = '--tbi' + maxRetries = 2 + publishDir = [ enabled: false ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_SITES_EXTRACT_BCFTOOLS:VIEW_VCF_SNPS' { + ext.args = [ + "-m 2", + "-M 2", + "-v snps", + "--output-type z", + "--no-version" + ].join(' ') + ext.prefix = { "${meta.id}_SNPS" } + publishDir = [ enabled: false ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_SITES_EXTRACT_BCFTOOLS:VIEW_VCF_SITES' { + ext.args = [ + "-G", + "-m 2", + "-M 2", + "-v snps", + "--output-type z", + "--no-version" + ].join(' ') + ext.prefix = { "${meta.id}_C${meta.chr}_SITES" } + publishDir = [ + path: { "${params.outdir}/prep_panel/sites/vcf/" }, + mode: params.publish_dir_mode, + enabled: true + ] + } + + withName: 
'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_SITES_EXTRACT_BCFTOOLS:BCFTOOLS_QUERY' { + ext.args = [ + "-f'%CHROM\t%POS\t%REF,%ALT\\n'", + ].join(' ') + ext.prefix = { "${meta.id}_glimpse_SITES_TSV" } + publishDir = [ + path: { "${params.outdir}/prep_panel/sites/tsv/" }, + mode: params.publish_dir_mode, + enabled: true + ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_SITES_EXTRACT_BCFTOOLS:TABIX_TABIX' { + ext.args = [ + "-s1", + "-b2", + "-e2" + ].join(' ') + ext.prefix = { "${meta.id}_glimpse_SITES_TSV" } + publishDir = [ + path: { "${params.outdir}/prep_panel/sites/tsv/" }, + mode: params.publish_dir_mode, + enabled: true + ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_NORMALIZE_BCFTOOLS:BCFTOOLS_CONVERT' { + ext.args = {"--haplegendsample ${meta.id}_${meta.chr}"} + maxRetries = 2 + publishDir = [ + path: { "${params.outdir}/prep_panel/haplegend/" }, + mode: params.publish_dir_mode, + enabled: true + ] + } + + // Phasing + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_PHASE_PANEL:VCF_PHASE_SHAPEIT5:BEDTOOLS_MAKEWINDOWS' { + ext.args = [ + '-w 60000', + '-s 40000' + ].join(' ') + ext.prefix = { "${meta.id}_chunks" } + publishDir = [ enabled: false ] + } + + // TSV + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_SITES_EXTRACT_BCFTOOLS:BCFTOOLS_QUERY_STITCH' { + ext.args = [ + "-f'%CHROM\t%POS\t%REF\t%ALT\\n'", + ].join(' ') + ext.prefix = { "${meta.id}_${meta.chr}_posfile_stitch" } + publishDir = [ enabled: false ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_SITES_EXTRACT_BCFTOOLS:GAWK_STITCH' { + ext.args = "'{ key = \$1 FS \$2 } !seen[key]++'" + ext.prefix = { "${meta.id}_${meta.chr}_posfile_stitch" } + ext.suffix = "txt" + publishDir = [ + path: { "${params.outdir}/prep_panel/sites/tsv/" }, + mode: params.publish_dir_mode, + enabled: true + ] + } + + +} diff --git a/conf/steps/simulation.config b/conf/steps/simulation.config new file mode 100644 index 00000000..412c82a4 --- /dev/null +++ b/conf/steps/simulation.config @@ -0,0 +1,44 @@ +/* 
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Config file for defining DSL2 per module options and publishing paths +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Available keys to override module options: + ext.args = Additional arguments appended to command in module. + ext.args2 = Second set of arguments appended to command in module (multi-tool modules). + ext.args3 = Third set of arguments appended to command in module (multi-tool modules). + ext.prefix = File name prefix for output files. +---------------------------------------------------------------------------------------- +*/ + +process { + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_REGION:.*' { + publishDir = [ + path: { "${params.outdir}/simulation/" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, + enabled: false + ] + } + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_REGION:SAMTOOLS_VIEW' { + ext.args = [ + ].join(' ') + ext.prefix = { "${meta.id}_R${meta.region.replace(':','_')}" } + } + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_DOWNSAMPLE:.*' { + publishDir = [ + path: { "${params.outdir}/simulation/" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename }, + ] + } + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_DOWNSAMPLE:SAMTOOLS_COVERAGE' { + ext.args = [ + ].join(' ') + ext.prefix = { "${meta.id}_R${meta.region.replace(':','_')}.stats" } + } + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_DOWNSAMPLE:SAMTOOLS_VIEW' { + ext.args = [ + ].join(' ') + ext.prefix = { "${meta.id}_D${meta.depth}_R${meta.region.replace(':','_')}" } + } +} diff --git a/conf/steps/validation.config b/conf/steps/validation.config new file mode 100644 index 00000000..d84d2c63 --- /dev/null +++ b/conf/steps/validation.config @@ -0,0 +1,91 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Config file for defining DSL2 per module options and publishing paths +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Available keys to override module options: + ext.args = Additional arguments appended to command in module. + ext.args2 = Second set of arguments appended to command in module (multi-tool modules). + ext.args3 = Third set of arguments appended to command in module (multi-tool modules). + ext.prefix = File name prefix for output files. +---------------------------------------------------------------------------------------- +*/ + +process { + // Configuration for the validation step + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:GL_TRUTH:.*' { + publishDir = [ + path: { "${params.outdir}/validation/truth" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename }, + enabled: false + ] + } + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:GL_TRUTH:BCFTOOLS_MPILEUP' { + ext.args = [ + "-I", + "-E", + "-a 'FORMAT/DP'" + ].join(' ') + ext.args2 = [ + "-Aim", + "-C alleles" + ].join(' ') + ext.prefix = { "${meta.id}_R${meta.region.replace(':','_')}_truth.call" } + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:GL_TRUTH:BCFTOOLS_ANNOTATE' { + ext.args = "--set-id '%CHROM:%POS:%REF:%ALT' -Oz" + ext.prefix = { "${meta.id}_R${meta.region.replace(':','_')}.annotate" } + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:GL_TRUTH:BCFTOOLS_INDEX' { + ext.args = "--tbi" + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_TRUTH:.*' { + publishDir = [ + path: { "${params.outdir}/validation/concat" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + ext.prefix = { "${meta.id}_truth_concat" } + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_TRUTH:BCFTOOLS_CONCAT' { + ext.args = {[ + "--ligate", + "--output-type z", + ].join(" ").trim()} + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_TRUTH:BCFTOOLS_INDEX' { + ext.args = "--tbi" + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_CONCORDANCE_GLIMPSE2:.*' { + publishDir = [ + path: { "${params.outdir}/validation/" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename } + ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_CONCORDANCE_GLIMPSE2:GLIMPSE2_CONCORDANCE' { + ext.prefix = { "${meta.id}.concordance" } + ext.args = "--out-r2-per-site" + publishDir = [ enabled: false ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_CONCORDANCE_GLIMPSE2:CONCATENATE' { + ext.suffix = { "txt" } + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_CONCORDANCE_GLIMPSE2:GUNZIP' { + publishDir = [ enabled: false ] + } + + withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:VCF_CONCORDANCE_GLIMPSE2:ADD_COLUMNS' { + ext.prefix = { "${meta.id}_D${meta.depth}_P${meta.panel}_SNP" } + publishDir = [ enabled: false ] + } +} diff --git a/conf/test.config b/conf/test.config index 7bd1bf14..8a7e5331 100644 --- a/conf/test.config +++ b/conf/test.config @@ -16,14 +16,19 @@ params { // Limit resources so that this can run on GitHub Actions max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' + max_memory = '2.GB' + max_time = '1.h' // Input data - // TODO nf-core: Specify the paths to your test data on nf-core/test-datasets - // TODO nf-core: Give any required params for the test so that command line flags are not needed - input = params.pipelines_testdata_base_path + 'viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv' + input = "${projectDir}/tests/csv/sample_bam.csv" + input_region = "${projectDir}/tests/csv/region.csv" // Genome references - genome = 'R64-1-1' + fasta = params.pipelines_testdata_base_path + "reference_genome/21_22/hs38DH.chr21_22.fa" + panel = "${projectDir}/tests/csv/panel.csv" + phased = true + + // Impute parameters + step = "panelprep,impute" + tools = "glimpse1" } diff --git a/conf/test_all.config b/conf/test_all.config new file mode 100644 index 00000000..3e95cb32 --- /dev/null +++ b/conf/test_all.config @@ -0,0 +1,35 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for running minimal tests 
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Defines input files and everything required to run a fast and simple pipeline test. + + Use as follows: + nextflow run nf-core/phaseimpute -profile test_all, --outdir + +---------------------------------------------------------------------------------------- +*/ + +params { + config_profile_name = 'Test simulation / imputation / validation mode' + config_profile_description = 'Minimal test dataset to check pipeline function' + + // Limit resources so that this can run on GitHub Actions + max_cpus = 2 + max_memory = '6.GB' + max_time = '6.h' + + // Input data + input = "${projectDir}/tests/csv/sample_sim.csv" + input_region = "${projectDir}/tests/csv/region.csv" + depth = 1 + + // Genome references + fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/reference_genome/21_22/hs38DH.chr21_22.fa" + panel = "${projectDir}/tests/csv/panel.csv" + phased = true + map = "${projectDir}/tests/csv/map.csv" + + step = "all" + tools = "glimpse1" +} diff --git a/conf/test_full.config b/conf/test_full.config index dde97d61..6e492802 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -14,11 +14,18 @@ params { config_profile_name = 'Full test profile' config_profile_description = 'Full test dataset to check pipeline function' - // Input data for full size test - // TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. 
SRA) - // TODO nf-core: Give any required params for the test so that command line flags are not needed - input = params.pipelines_testdata_base_path + 'viralrecon/samplesheet/samplesheet_full_illumina_amplicon.csv' - // Genome references - genome = 'R64-1-1' + map = "https://bochet.gcc.biostat.washington.edu/beagle/genetic_maps/plink.GRCh38.map.zip" + genome = "GRCh38" + + // Resources increase incompatible with Github Action + max_cpus = 12 + max_memory = '50.GB' + max_time = '6.h' + + // Input data + input = "${projectDir}/tests/csv/sample_sim_full.csv" + panel = "${projectDir}/tests/csv/panel_full.csv" + input_region_string = "all" + step = "simulate" } diff --git a/conf/test_quilt.config b/conf/test_quilt.config new file mode 100644 index 00000000..27d31445 --- /dev/null +++ b/conf/test_quilt.config @@ -0,0 +1,34 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for running minimal tests +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Defines input files and everything required to run a fast and simple pipeline test. 
+ + Use as follows: + nextflow run nf-core/phaseimpute -profile test_quilt, --outdir + +---------------------------------------------------------------------------------------- +*/ + +params { + config_profile_name = 'Minimal Quilt Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function using the tool QUILT' + + // Limit resources so that this can run on GitHub Actions + max_cpus = 2 + max_memory = '2.GB' + max_time = '1.h' + + // Input data + input = "${projectDir}/tests/csv/sample_bam.csv" + input_region = "${projectDir}/tests/csv/region.csv" + + // Genome references + fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/reference_genome/21_22/hs38DH.chr21_22.fa" + panel = "${projectDir}/tests/csv/panel.csv" + phased = true + + // Impute parameters + step = "panelprep,impute" + tools = "quilt" +} diff --git a/conf/test_sim.config b/conf/test_sim.config new file mode 100644 index 00000000..6f18229a --- /dev/null +++ b/conf/test_sim.config @@ -0,0 +1,30 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for running minimal tests +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Defines input files and everything required to run a fast and simple pipeline test. 
+ + Use as follows: + nextflow run nf-core/phaseimpute -profile test_sim, --outdir + +---------------------------------------------------------------------------------------- +*/ + +params { + config_profile_name = 'Test simulation / imputation / validation mode' + config_profile_description = 'Minimal test dataset to check pipeline function' + + // Limit resources so that this can run on GitHub Actions + max_cpus = 2 + max_memory = '6.GB' + max_time = '6.h' + + // Input data + input = "${projectDir}/tests/csv/sample_sim.csv" + input_region = "${projectDir}/tests/csv/region.csv" + depth = 1 + + // Genome references + fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/reference_genome/21_22/hs38DH.chr21_22.fa" + step = "simulate" +} diff --git a/conf/test_stitch.config b/conf/test_stitch.config new file mode 100644 index 00000000..11508421 --- /dev/null +++ b/conf/test_stitch.config @@ -0,0 +1,33 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for running minimal tests +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Defines input files and everything required to run a fast and simple pipeline test. 
+ + Use as follows: + nextflow run nf-core/phaseimpute -profile test_stitch, --outdir + +---------------------------------------------------------------------------------------- +*/ + +params { + config_profile_name = 'Minimal Stitch Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function using the tool STITCH' + + // Limit resources so that this can run on GitHub Actions + max_cpus = 2 + max_memory = '2.GB' + max_time = '1.h' + + // Input data + input = "${projectDir}/tests/csv/sample_bam.csv" + input_region = "${projectDir}/tests/csv/region.csv" + + // Genome references + fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/reference_genome/21_22/hs38DH.chr21_22.fa" + posfile = "${projectDir}/tests/csv/posfile.csv" + + // Impute parameters + step = "impute" + tools = "stitch" +} diff --git a/conf/test_validate.config b/conf/test_validate.config new file mode 100644 index 00000000..c92c39bb --- /dev/null +++ b/conf/test_validate.config @@ -0,0 +1,33 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for running minimal tests +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Defines input files and everything required to run a fast and simple pipeline test. 
+ + Use as follows: + nextflow run nf-core/phaseimpute -profile test_validate, --outdir + +---------------------------------------------------------------------------------------- +*/ + +params { + config_profile_name = 'Test validation mode' + config_profile_description = 'Minimal test dataset to check pipeline function' + + // Limit resources so that this can run on GitHub Actions + max_cpus = 2 + max_memory = '6.GB' + max_time = '6.h' + + // Input data + input = "${projectDir}/tests/csv/sample_validate_imputed.csv" + input_truth = "${projectDir}/tests/csv/sample_validate_truth.csv" + input_region = "${projectDir}/tests/csv/region.csv" + + // Genome references + fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/reference_genome/21_22/hs38DH.chr21_22.fa" + panel = "${projectDir}/tests/csv/panel.csv" + phased = true + map = "${projectDir}/tests/csv/map.csv" + step = "validate" +} diff --git a/docs/NfCore_library.xml b/docs/NfCore_library.xml new file mode 100644 index 00000000..c7d26934 --- /dev/null +++ b/docs/NfCore_library.xml @@ -0,0 +1,163 @@ +[ + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"group\" vertex=\"1\" connectable=\"0\" parent=\"1\"><mxGeometry y=\"15\" width=\"20\" height=\"45\" as=\"geometry\"/></mxCell><mxCell id=\"3\" value=\"\" style=\"ellipse;whiteSpace=wrap;html=1;aspect=fixed;rounded=1;rotation=0;strokeWidth=2;\" vertex=\"1\" parent=\"2\"><mxGeometry width=\"20\" height=\"20\" relative=\"1\" as=\"geometry\"><mxPoint y=\"-15\" as=\"offset\"/></mxGeometry></mxCell><mxCell id=\"4\" value=\"\" style=\"ellipse;whiteSpace=wrap;html=1;aspect=fixed;rounded=1;rotation=0;strokeWidth=2;\" vertex=\"1\" parent=\"2\"><mxGeometry y=\"5\" width=\"20\" height=\"20\" as=\"geometry\"/></mxCell><mxCell id=\"5\" 
style=\"edgeStyle=orthogonalEdgeStyle;rounded=1;orthogonalLoop=1;jettySize=auto;html=1;exitX=0;exitY=1;exitDx=0;exitDy=0;entryX=0;entryY=0;entryDx=0;entryDy=0;strokeWidth=2;endArrow=none;endFill=0;\" edge=\"1\" parent=\"2\" source=\"3\" target=\"4\"><mxGeometry relative=\"1\" as=\"geometry\"/></mxCell><mxCell id=\"6\" style=\"edgeStyle=orthogonalEdgeStyle;rounded=1;orthogonalLoop=1;jettySize=auto;html=1;exitX=1;exitY=1;exitDx=0;exitDy=0;entryX=1;entryY=0;entryDx=0;entryDy=0;endArrow=none;endFill=0;strokeWidth=2;\" edge=\"1\" parent=\"2\" source=\"3\" target=\"4\"><mxGeometry relative=\"1\" as=\"geometry\"/></mxCell><mxCell id=\"7\" value=\"\" style=\"ellipse;whiteSpace=wrap;html=1;aspect=fixed;rounded=1;rotation=0;strokeWidth=2;\" vertex=\"1\" parent=\"2\"><mxGeometry y=\"25\" width=\"20\" height=\"20\" as=\"geometry\"/></mxCell><mxCell id=\"8\" style=\"edgeStyle=orthogonalEdgeStyle;rounded=1;orthogonalLoop=1;jettySize=auto;html=1;exitX=0;exitY=1;exitDx=0;exitDy=0;entryX=0;entryY=0;entryDx=0;entryDy=0;strokeWidth=2;endArrow=none;endFill=0;\" edge=\"1\" parent=\"2\" source=\"4\" target=\"7\"><mxGeometry relative=\"1\" as=\"geometry\"/></mxCell><mxCell id=\"9\" style=\"edgeStyle=orthogonalEdgeStyle;rounded=1;orthogonalLoop=1;jettySize=auto;html=1;exitX=1;exitY=1;exitDx=0;exitDy=0;entryX=1;entryY=0;entryDx=0;entryDy=0;strokeWidth=2;endArrow=none;endFill=0;\" edge=\"1\" parent=\"2\" source=\"4\" target=\"7\"><mxGeometry relative=\"1\" as=\"geometry\"/></mxCell><mxCell id=\"10\" value=\"\" style=\"rounded=0;whiteSpace=wrap;html=1;strokeColor=#FFFFFF;strokeWidth=2;\" vertex=\"1\" parent=\"2\"><mxGeometry x=\"5\" width=\"10\" height=\"30\" as=\"geometry\"/></mxCell></root></mxGraphModel>", + "w": 20, + "h": 60, + "aspect": "fixed", + "title": "triple_circle" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"group\" vertex=\"1\" connectable=\"0\" parent=\"1\"><mxGeometry width=\"20\" height=\"40\" 
as=\"geometry\"/></mxCell><mxCell id=\"3\" value=\"\" style=\"ellipse;whiteSpace=wrap;html=1;aspect=fixed;rounded=1;rotation=0;strokeWidth=2;\" vertex=\"1\" parent=\"2\"><mxGeometry width=\"20\" height=\"20\" as=\"geometry\"/></mxCell><mxCell id=\"4\" value=\"\" style=\"ellipse;whiteSpace=wrap;html=1;aspect=fixed;rounded=1;rotation=0;strokeWidth=2;\" vertex=\"1\" parent=\"2\"><mxGeometry y=\"20\" width=\"20\" height=\"20\" as=\"geometry\"/></mxCell><mxCell id=\"5\" style=\"edgeStyle=orthogonalEdgeStyle;rounded=1;orthogonalLoop=1;jettySize=auto;html=1;exitX=0;exitY=1;exitDx=0;exitDy=0;entryX=0;entryY=0;entryDx=0;entryDy=0;strokeWidth=2;endArrow=none;endFill=0;\" edge=\"1\" parent=\"2\" source=\"3\" target=\"4\"><mxGeometry relative=\"1\" as=\"geometry\"/></mxCell><mxCell id=\"6\" style=\"edgeStyle=orthogonalEdgeStyle;rounded=1;orthogonalLoop=1;jettySize=auto;html=1;exitX=1;exitY=1;exitDx=0;exitDy=0;entryX=1;entryY=0;entryDx=0;entryDy=0;endArrow=none;endFill=0;strokeWidth=2;\" edge=\"1\" parent=\"2\" source=\"3\" target=\"4\"><mxGeometry relative=\"1\" as=\"geometry\"/></mxCell><mxCell id=\"7\" value=\"\" style=\"rounded=0;whiteSpace=wrap;html=1;strokeColor=#FFFFFF;strokeWidth=2;\" vertex=\"1\" parent=\"2\"><mxGeometry x=\"5\" y=\"15\" width=\"10\" height=\"10\" as=\"geometry\"/></mxCell></root></mxGraphModel>", + "w": 20, + "h": 40, + "aspect": "fixed", + "title": "double_circle" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"ellipse;whiteSpace=wrap;html=1;aspect=fixed;rounded=1;rotation=0;strokeWidth=2;\" vertex=\"1\" parent=\"1\"><mxGeometry width=\"20\" height=\"20\" as=\"geometry\"/></mxCell></root></mxGraphModel>", + "w": 20, + "h": 20, + "aspect": "fixed", + "title": "single_circle" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"rounded=1;whiteSpace=wrap;html=1;strokeWidth=2;\" vertex=\"1\" 
parent=\"1\"><mxGeometry width=\"20\" height=\"60\" as=\"geometry\"/></mxCell></root></mxGraphModel>", + "w": 20, + "h": 60, + "aspect": "fixed", + "title": "triple_square" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"rounded=1;whiteSpace=wrap;html=1;strokeWidth=2;\" vertex=\"1\" parent=\"1\"><mxGeometry width=\"20\" height=\"40\" as=\"geometry\"/></mxCell></root></mxGraphModel>", + "w": 20, + "h": 40, + "aspect": "fixed", + "title": "double_square" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"rounded=1;whiteSpace=wrap;html=1;strokeWidth=2;\" vertex=\"1\" parent=\"1\"><mxGeometry width=\"20\" height=\"20\" as=\"geometry\"/></mxCell></root></mxGraphModel>", + "w": 20, + "h": 20, + "aspect": "fixed", + "title": "single_square" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"rounded=1;whiteSpace=wrap;html=1;strokeWidth=2;arcSize=50;\" vertex=\"1\" parent=\"1\"><mxGeometry width=\"20\" height=\"60\" as=\"geometry\"/></mxCell></root></mxGraphModel>", + "w": 20, + "h": 60, + "aspect": "fixed", + "title": "triple_round" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"rounded=1;whiteSpace=wrap;html=1;strokeWidth=2;arcSize=50;\" vertex=\"1\" parent=\"1\"><mxGeometry width=\"20\" height=\"40\" as=\"geometry\"/></mxCell></root></mxGraphModel>", + "w": 20, + "h": 40, + "aspect": "fixed", + "title": "double_round" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"group\" vertex=\"1\" connectable=\"0\" parent=\"1\"><mxGeometry width=\"90\" height=\"100\" as=\"geometry\"/></mxCell><mxCell id=\"3\" value=\"\" 
style=\"shape=note;whiteSpace=wrap;html=1;backgroundOutline=1;darkOpacity=0.05;rounded=1;size=20;strokeWidth=2;\" vertex=\"1\" parent=\"2\"><mxGeometry x=\"5\" width=\"80\" height=\"100\" as=\"geometry\"/></mxCell><mxCell id=\"4\" value=\"&lt;font data-font-src=&quot;https://fonts.googleapis.com/css?family=Maven+Pro&quot; face=&quot;Maven Pro&quot; color=&quot;#ffffff&quot;&gt;&lt;b&gt;&lt;font style=&quot;font-size: 20px;&quot;&gt;Fastq&lt;/font&gt;&lt;/b&gt;&lt;/font&gt;\" style=\"rounded=1;whiteSpace=wrap;html=1;strokeWidth=2;fillColor=#000000;\" vertex=\"1\" parent=\"2\"><mxGeometry y=\"40\" width=\"90\" height=\"40\" as=\"geometry\"/></mxCell></root></mxGraphModel>", + "w": 90, + "h": 100, + "aspect": "fixed", + "title": "single_file" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"group\" vertex=\"1\" connectable=\"0\" parent=\"1\"><mxGeometry width=\"95\" height=\"105\" as=\"geometry\"/></mxCell><mxCell id=\"3\" value=\"\" style=\"shape=note;whiteSpace=wrap;html=1;backgroundOutline=1;darkOpacity=0.05;rounded=1;size=20;strokeWidth=2;\" vertex=\"1\" parent=\"2\"><mxGeometry x=\"5\" width=\"80\" height=\"100\" as=\"geometry\"/></mxCell><mxCell id=\"4\" value=\"&lt;font data-font-src=&quot;https://fonts.googleapis.com/css?family=Maven+Pro&quot; face=&quot;Maven Pro&quot; color=&quot;#ffffff&quot;&gt;&lt;b&gt;&lt;font style=&quot;font-size: 20px;&quot;&gt;Fastq&lt;/font&gt;&lt;/b&gt;&lt;/font&gt;\" style=\"rounded=1;whiteSpace=wrap;html=1;strokeWidth=2;fillColor=#000000;\" vertex=\"1\" parent=\"2\"><mxGeometry y=\"40\" width=\"90\" height=\"40\" as=\"geometry\"/></mxCell><mxCell id=\"5\" value=\"\" style=\"shape=note;whiteSpace=wrap;html=1;backgroundOutline=1;darkOpacity=0.05;rounded=1;size=20;strokeWidth=2;\" vertex=\"1\" parent=\"2\"><mxGeometry x=\"10\" y=\"5\" width=\"80\" height=\"100\" as=\"geometry\"/></mxCell><mxCell id=\"6\" value=\"&lt;font 
data-font-src=&quot;https://fonts.googleapis.com/css?family=Maven+Pro&quot; face=&quot;Maven Pro&quot; color=&quot;#ffffff&quot;&gt;&lt;b&gt;&lt;font style=&quot;font-size: 20px;&quot;&gt;Fastq&lt;/font&gt;&lt;/b&gt;&lt;/font&gt;\" style=\"rounded=1;whiteSpace=wrap;html=1;strokeWidth=2;fillColor=#000000;\" vertex=\"1\" parent=\"2\"><mxGeometry x=\"5\" y=\"45\" width=\"90\" height=\"40\" as=\"geometry\"/></mxCell></root></mxGraphModel>", + "w": 95, + "h": 105, + "aspect": "fixed", + "title": "double_file" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"group\" vertex=\"1\" connectable=\"0\" parent=\"1\"><mxGeometry width=\"100\" height=\"110\" as=\"geometry\"/></mxCell><mxCell id=\"3\" value=\"\" style=\"shape=note;whiteSpace=wrap;html=1;backgroundOutline=1;darkOpacity=0.05;rounded=1;size=20;strokeWidth=2;\" vertex=\"1\" parent=\"2\"><mxGeometry x=\"5\" width=\"80\" height=\"100\" as=\"geometry\"/></mxCell><mxCell id=\"4\" value=\"&lt;font data-font-src=&quot;https://fonts.googleapis.com/css?family=Maven+Pro&quot; face=&quot;Maven Pro&quot; color=&quot;#ffffff&quot;&gt;&lt;b&gt;&lt;font style=&quot;font-size: 20px;&quot;&gt;Fastq&lt;/font&gt;&lt;/b&gt;&lt;/font&gt;\" style=\"rounded=1;whiteSpace=wrap;html=1;strokeWidth=2;fillColor=#000000;\" vertex=\"1\" parent=\"2\"><mxGeometry y=\"40\" width=\"90\" height=\"40\" as=\"geometry\"/></mxCell><mxCell id=\"5\" value=\"\" style=\"shape=note;whiteSpace=wrap;html=1;backgroundOutline=1;darkOpacity=0.05;rounded=1;size=20;strokeWidth=2;\" vertex=\"1\" parent=\"2\"><mxGeometry x=\"10\" y=\"5\" width=\"80\" height=\"100\" as=\"geometry\"/></mxCell><mxCell id=\"6\" value=\"&lt;font data-font-src=&quot;https://fonts.googleapis.com/css?family=Maven+Pro&quot; face=&quot;Maven Pro&quot; color=&quot;#ffffff&quot;&gt;&lt;b&gt;&lt;font style=&quot;font-size: 20px;&quot;&gt;Fastq&lt;/font&gt;&lt;/b&gt;&lt;/font&gt;\" 
style=\"rounded=1;whiteSpace=wrap;html=1;strokeWidth=2;fillColor=#000000;\" vertex=\"1\" parent=\"2\"><mxGeometry x=\"5\" y=\"45\" width=\"90\" height=\"40\" as=\"geometry\"/></mxCell><mxCell id=\"7\" value=\"\" style=\"shape=note;whiteSpace=wrap;html=1;backgroundOutline=1;darkOpacity=0.05;rounded=1;size=20;strokeWidth=2;\" vertex=\"1\" parent=\"2\"><mxGeometry x=\"15\" y=\"10\" width=\"80\" height=\"100\" as=\"geometry\"/></mxCell><mxCell id=\"8\" value=\"&lt;font data-font-src=&quot;https://fonts.googleapis.com/css?family=Maven+Pro&quot; face=&quot;Maven Pro&quot; color=&quot;#ffffff&quot;&gt;&lt;b&gt;&lt;font style=&quot;font-size: 20px;&quot;&gt;Fastq&lt;/font&gt;&lt;/b&gt;&lt;/font&gt;\" style=\"rounded=1;whiteSpace=wrap;html=1;strokeWidth=2;fillColor=#000000;\" vertex=\"1\" parent=\"2\"><mxGeometry x=\"10\" y=\"50\" width=\"90\" height=\"40\" as=\"geometry\"/></mxCell></root></mxGraphModel>", + "w": 100, + "h": 110, + "aspect": "fixed", + "title": "triple_file" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"endArrow=none;html=1;rounded=1;strokeWidth=2;\" edge=\"1\" parent=\"1\"><mxGeometry width=\"50\" height=\"50\" relative=\"1\" as=\"geometry\"><mxPoint as=\"sourcePoint\"/><mxPoint x=\"40\" as=\"targetPoint\"/></mxGeometry></mxCell></root></mxGraphModel>", + "w": 40, + "h": 0.5714285714285714, + "aspect": "fixed", + "title": "line_0_bk" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"endArrow=none;html=1;rounded=1;strokeWidth=2;strokeColor=#7EB2DD;\" edge=\"1\" parent=\"1\"><mxGeometry width=\"50\" height=\"50\" relative=\"1\" as=\"geometry\"><mxPoint as=\"sourcePoint\"/><mxPoint x=\"40\" as=\"targetPoint\"/></mxGeometry></mxCell></root></mxGraphModel>", + "w": 40, + "h": 0.5714285714285714, + "aspect": "fixed", + "title": "line_0_bl" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" 
parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"endArrow=none;html=1;rounded=1;strokeWidth=2;strokeColor=#24B064;\" edge=\"1\" parent=\"1\"><mxGeometry width=\"50\" height=\"50\" relative=\"1\" as=\"geometry\"><mxPoint as=\"sourcePoint\"/><mxPoint x=\"40\" as=\"targetPoint\"/></mxGeometry></mxCell></root></mxGraphModel>", + "w": 40, + "h": 0.5714285714285714, + "aspect": "fixed", + "title": "line_0_gr" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"endArrow=none;html=1;rounded=1;strokeWidth=2;strokeColor=#FF9914;\" edge=\"1\" parent=\"1\"><mxGeometry width=\"50\" height=\"50\" relative=\"1\" as=\"geometry\"><mxPoint as=\"sourcePoint\"/><mxPoint x=\"40\" as=\"targetPoint\"/></mxGeometry></mxCell></root></mxGraphModel>", + "w": 40, + "h": 0.5714285714285714, + "aspect": "fixed", + "title": "line_0_or" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"endArrow=none;html=1;rounded=1;strokeWidth=2;\" edge=\"1\" parent=\"1\"><mxGeometry width=\"50\" height=\"50\" relative=\"1\" as=\"geometry\"><mxPoint as=\"sourcePoint\"/><mxPoint x=\"40\" y=\"40\" as=\"targetPoint\"/><Array as=\"points\"><mxPoint x=\"20\"/><mxPoint x=\"40\" y=\"20\"/></Array></mxGeometry></mxCell></root></mxGraphModel>", + "w": 40, + "h": 40, + "aspect": "fixed", + "title": "line_90_WS" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"endArrow=none;html=1;rounded=1;strokeWidth=2;\" edge=\"1\" parent=\"1\"><mxGeometry width=\"50\" height=\"50\" relative=\"1\" as=\"geometry\"><mxPoint y=\"40\" as=\"sourcePoint\"/><mxPoint x=\"40\" as=\"targetPoint\"/><Array as=\"points\"><mxPoint x=\"20\" y=\"40\"/><mxPoint x=\"40\" y=\"20\"/></Array></mxGeometry></mxCell></root></mxGraphModel>", + "w": 40, + "h": 40, + "aspect": "fixed", + "title": "line_90_WN" + }, + { + "xml": 
"<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"endArrow=none;html=1;rounded=1;strokeWidth=2;\" edge=\"1\" parent=\"1\"><mxGeometry width=\"50\" height=\"50\" relative=\"1\" as=\"geometry\"><mxPoint x=\"40\" y=\"40\" as=\"sourcePoint\"/><mxPoint as=\"targetPoint\"/><Array as=\"points\"><mxPoint x=\"20\" y=\"40\"/><mxPoint y=\"20\"/></Array></mxGeometry></mxCell></root></mxGraphModel>", + "w": 40, + "h": 40, + "aspect": "fixed", + "title": "line_90_EN" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"endArrow=none;html=1;rounded=1;strokeWidth=2;\" edge=\"1\" parent=\"1\"><mxGeometry width=\"50\" height=\"50\" relative=\"1\" as=\"geometry\"><mxPoint x=\"40\" as=\"sourcePoint\"/><mxPoint y=\"40\" as=\"targetPoint\"/><Array as=\"points\"><mxPoint x=\"20\"/><mxPoint y=\"20\"/></Array></mxGeometry></mxCell></root></mxGraphModel>", + "w": 40, + "h": 40, + "aspect": "fixed", + "title": "line_90_ES" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"endArrow=none;html=1;rounded=1;strokeWidth=2;\" edge=\"1\" parent=\"1\"><mxGeometry width=\"50\" height=\"50\" relative=\"1\" as=\"geometry\"><mxPoint as=\"sourcePoint\"/><mxPoint x=\"40\" y=\"20\" as=\"targetPoint\"/><Array as=\"points\"><mxPoint x=\"20\"/></Array></mxGeometry></mxCell></root></mxGraphModel>", + "w": 40, + "h": 20, + "aspect": "fixed", + "title": "line_45_WS" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"endArrow=none;html=1;rounded=1;strokeWidth=2;\" edge=\"1\" parent=\"1\"><mxGeometry width=\"50\" height=\"50\" relative=\"1\" as=\"geometry\"><mxPoint y=\"20\" as=\"sourcePoint\"/><mxPoint x=\"40\" as=\"targetPoint\"/><Array as=\"points\"><mxPoint x=\"20\" y=\"20\"/></Array></mxGeometry></mxCell></root></mxGraphModel>", + "w": 40, + 
"h": 20, + "aspect": "fixed", + "title": "line_45_WN" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"endArrow=none;html=1;rounded=1;strokeWidth=2;\" edge=\"1\" parent=\"1\"><mxGeometry width=\"50\" height=\"50\" relative=\"1\" as=\"geometry\"><mxPoint x=\"40\" y=\"20\" as=\"sourcePoint\"/><mxPoint as=\"targetPoint\"/><Array as=\"points\"><mxPoint x=\"20\" y=\"20\"/></Array></mxGeometry></mxCell></root></mxGraphModel>", + "w": 40, + "h": 20, + "aspect": "fixed", + "title": "line_45_EN" + }, + { + "xml": "<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/><mxCell id=\"2\" value=\"\" style=\"endArrow=none;html=1;rounded=1;strokeWidth=2;\" edge=\"1\" parent=\"1\"><mxGeometry width=\"50\" height=\"50\" relative=\"1\" as=\"geometry\"><mxPoint x=\"40\" as=\"sourcePoint\"/><mxPoint y=\"20\" as=\"targetPoint\"/><Array as=\"points\"><mxPoint x=\"20\"/></Array></mxGeometry></mxCell></root></mxGraphModel>", + "w": 40, + "h": 20, + "aspect": "fixed", + "title": "line_45_ES" + } +] diff --git a/docs/development.md b/docs/development.md new file mode 100644 index 00000000..c2bd3d19 --- /dev/null +++ b/docs/development.md @@ -0,0 +1,83 @@ +# Development + +## Features and tasks + +- [x] Add automatic detection of chromosome names to create a renaming file for the VCF files +- [ ] Add automatic detection of chromosome names to create a renaming file for the BAM files +- [ ] Make the different test workflows work + - [x] Simulation + - [x] Validation + - [ ] Preprocessing + - [x] Imputation + - [ ] Validation + - [ ] Postprocessing +- [ ] Add support for `anyOf()` or `oneOf()` in the nf-core schema for the map, panel and region files +- [ ] Add nf-test for all modules and subworkflows +- [ ] Remove all TODOs +- [ ] Check whether the panel is necessary depending on the selected tool +- [x] Set module configuration using the full path workflow:subworkflow:module +- [ ] Where should the map file go (separate csv or in 
panel csv)? +- [ ] Add support for imputation by individuals or by groups of individuals + +## Run tests + +```bash +nextflow run main.nf -profile singularity,test --outdir results -resume +nextflow run main.nf -profile singularity,test_sim --outdir results -resume +nextflow run main.nf -profile singularity,test_validate --outdir results -resume +nextflow run main.nf -profile singularity,test_all --outdir results -resume +nextflow run main.nf -profile singularity,test_quilt --outdir results -resume +``` + +## Design issues + +### Channel management and combination + +If only one species is processed at a time, then there is only one fasta file and only one map file (normally?). +Do we want to be able to process multiple panels at the same time? +If so, we need to correctly combine the different channels based on their meta map. + +All channels need to be identified by a meta map as follows: + +- I: individual id +- P: panel id +- R: region used +- M: map used +- T: tool used +- G: reference genome used (is it needed?) +- S: simulation (depth or genotype array) + +## Open questions + +How to use different schemas? + +- Use nf-validation + For the moment, use a different input per step. + In the future, if/else logic will be added to the yml nf-core schema. + +What's the use of dumpcustomsoftware? +It will be deleted. + +How to add results to MultiQC? +Follow the example of Sarek. +All report files are collected in a dedicated channel. + +How to add nf-test? +Add tests in the `tests` folder and run them by tag. +Add a tags.yml file. + +How to run stub tests? +Use nf-test. + +How to run the tests? +Use the nf-test tag option. + +What's the use of the TEMPLATE branch? +The TEMPLATE branch holds the skeleton for all common parts of the pipeline. +Merging it into dev will be requested from time to time. + +When is it necessary to merge into master / main? +At the first release: create a pseudo-PR against the first commit, to be reviewed by the whole community and approved by 2 reviewers. + +What should the GitHub Actions be? +All GitHub Actions come from the TEMPLATE branch.
diff --git a/docs/images/Logo.svg b/docs/images/Logo.svg new file mode 100644 index 00000000..9330090c --- /dev/null +++ b/docs/images/Logo.svg @@ -0,0 +1,208 @@ + + + + diff --git a/docs/images/metro/Concordance.png b/docs/images/metro/Concordance.png new file mode 100644 index 00000000..87f4e68c Binary files /dev/null and b/docs/images/metro/Concordance.png differ diff --git a/docs/images/metro/MetroMap.png b/docs/images/metro/MetroMap.png new file mode 100644 index 00000000..92412868 Binary files /dev/null and b/docs/images/metro/MetroMap.png differ diff --git a/docs/images/metro/MetroMap.xml b/docs/images/metro/MetroMap.xml new file mode 100644 index 00000000..7f105c75 --- /dev/null +++ b/docs/images/metro/MetroMap.xml @@ -0,0 +1 @@ +7X1Zc6LO9/er+VY9z8WkaHYu3XDDXUG8+RW7KPuqvvo/JDGTAEnUCJpMUqmZCNi253z67H36P6Rh7tqe4KwHtqwY/8GQvPsPaf4HwwCmoOS/9Mr++QoF8KcrmqfLz9f+XpjpB+X54vMbtVCXFf/Ng4FtG4HuvL0o2ZalSMGba4Ln2fHbx1TbePupjqApuQszSTDyVzldDtbPV3EI+nujo+ja+vjR8PGOKRyffr7grwXZjl9dQlr/IQ3PtoOnv8xdQzFS8h0J8/Q++umuLW7S7wdDhiAmRH58oG7vEto8Dn58j2qgK08TI3/wvyF94DvRxDL+YNjLx718iCN4ihUUfFDyx/NnFc+q+BOeuRoJRvhMuecvHeyPpPTs0JKVdBTwH1KP13qgzBxBSu/GCXqSa+vANJ5vq7phNGzD9h7fi8iCQqpSct0PPHurvLqDS6QiqsmdIz/+fq0PaJHMVPECZfeKyc/ftq3YphJ4++SR413yyNIjiNHn1/FfRKDHZ9avwUA+XxSeUai9DP6G3s80PofeRLn0FgWJlJEiesMIimJyqfQm8Lf0/gOTeYK/LKvXBEfg0ghOlgxwTCFltIjgJCwiOF4qwSH0LcEL8I2hReTGSyM3VS65FZAQnCgiN4UTiFAquZGT4E0W0PtFK16d3scplUVvVVVwqVB+ywQlJrKzRHrDGXQDuADeWAG5sWtIk8PaY9qbFg48CT/8r9enZX71B3xObcWSa6n9kryybEt5S923rHii6dE8gatAuyK/MZvytH9N2yLSPl/zFEMI9OitsVVE7udPGNu6FbxaSlh2KUEZlvl26EnK89v+ci0/EpUdKcv8QPA0JciNlDBJ2L96zEkf8D+YMvHOlP/i6WnIv+h6IevlgIN/AXcVwKFklnvIpYDLjZQV7lcCHAq9M+VSAYf8Au7OAPeizcoGHHYTwKG/gLsK4LCrqVQMVKRSMaRklQryrITefvVTbGyAnOBDXhOjF/mU4K7QCKCMXQYAdRkaAcA+GekdNF6Al3fYf4JP+yX2Jzz09svkBXR8wacvHrDjy+bu9c3m/vnVNUTbMTb7SMOPqHD80vcCMDIrpTDo4RijPhtjZFZ4YqdJvAswNneJWCKF1mjCsQbKJ+/rEX9OQJiWoMh5l385r/olsi6IxxGgYn4dvW0oY7EA6jlW+YqhJJxnKJkh+iXOdiFRjgz+iCqvVtlLmD5dGrLgrx9XXPpCMHTNSv6WEqopyZKop9TSJcGoPd8w
dVlOR0zHcNKRzZ2W5kUenjIR8EM4Dz3reYHXH3Mi+AOSSujHBzqKkH4UjDxg6SKTdwmZm9QDgj+u/SCBuZ1+yp80rFy0Bj+ExMnhEgx9wCj41S/2hp3oA4kTFIxiMIlAgCTzzEWQBxQi/v6CAl6DB4DAEIGhBMApEsBlsf6EQMt3Yn0pfL8561SNaW3xP9YokPruH9VUh0LvlBDZkdS6+ZjEe+EJk2bIxravP1NNtIPANpMHHlNndUHaao+q9KjZZEUVQiMo4Gpgp2FNwXeeUouqvksRUX/8wNrxKnS8kg4lBMJ/SO3pJUw7lvYf3NDZ+mgaQ/22ZteSn+FssW4ttOQvlk/+acwbtUHyf1Pvkq1G8kd9xdYHbGuZ0u/xl0zftfXg0QHxGB/p8kOPjibQsJn4PPWxzhnOKhJhEhvNTSwUt8PmBLL7DT1CBg0LZhJK0/uVX3PZuZUs5bqqpoYIXYsHDdlaAZ6jN1GyiOsULFuihAxj0WOTlztfpLcxbx0So5Uet3f+arpg24YjcPSaX3qwBamdHdWZQsp0u+rUsQEhkaPJgG04Gt+uGQHHpw96/b1/GByGQZzOVQHisBZYIxwVpWTU/n6bjn0gVZNdpTOziDD9Rl0z+b5wZxeryf9+c7Luh0szuYH6+85BYKY4FcGEb0Voosfq8I6Ia7M6k34Fxd8JWzFEe0JnZ3O1vdMXQL87nMwWusNC0nq7W84W8m5lLrDunnaWdKgFeBep1VvJPDqNYWfSHJhqUwIaoHvKFiWEeetg0py27zhiYPZrcWdGQd0OhGLRI2vqo+UY8JMa3+Bra6gv7JDtdHXgeRXiJ/xmkMyLhB2Oxwy4c8DFDUIO0q8N9zjzYCNdjJRtDdZph5+totWaNJd+MGqAqUhP2p11qz9Yx53utLdihv48ocSchYzWhJ2i1giWk8/mxmKwXnZ6s8VMVnVdNNrcfOjOOpIJOwOBswbTjm+yvfZelWFhx2zhZYA6mANgjEkBoIzrkEYmMOwu7HiRLFg6JeGqN9aoaQclx8241uBnAHOEHSdSsIKIh+S+g0syhvjbhqsuPeTgIKKTooruUaZAkZC/GTodCiVChBEwJAogZKMcBg1PHU9dkUAIggghQiaiQ+SY9ELthAupk4xqI0GktsGKp0YHBWMxhOx2POD12O04uauPxSmhysgIDXVEmAXWOJ6qxGappm9F/H2zP6Jn84RhG24rd3zRWYz2Ybry6CWI1fq2FcorDN+7E4RtptntTSPi1N1yie1NQ5ybLgcp64MqKErQdV1fVI3mphO0xZ6kuqG56MMQFQTj0ZJfNeW52uqm0A3xJZ3wgJ6um8vZ3ncZxu1OcKwFQkYW9ga9Hnkbr9EetSaYrkVeayiya6mPY2xtyO933hRRIaWBjRYGpHltsOCkMTRpwT4sTbcYfCCSpbwV20pfx8ma1+c3dgMaAQ3uCAYXBgOwx3jqCX/RUB0bYzZoAHY8xdryXOrATN/d4WpTE22bo5Govx5K8jiAnEafS7OY9PM7LUtNV5/LsY68ASurJi8O+NPtHufO52pHE0Zghmi+MwT6fokI4wmRruLjLx1L0aDjDgYM3cMdQm16fXSa3IkX7cVQSKDJSt2p75vKGE0/ybXHMZ1AiD74fZcB7ZClXLTd1xDWQ+YojWxEgegsBCZ91p54tKcxjXR9B05rQY4PS5F3MdLv0+IyxO3eDAddQLMi6LQIGI+aEElrUtORRuyQ75ErHqLaqs0KYZsxFDHYBatgPR0KrOxyyfA+oRAhYSKLfvIBfTNkNyNgRazgpsuCUfUJqDltkSLJGYEvm7yrdEfJg25vjxosKwXDxN3oJupc2I87AlFrLAh+GWzdSWflonViY7kaJFkTURr0wYwmQDRjton4olWfHTr9rWsuYZf1hh26E2pjjTajaCsvpnXQ6y/kA7fYCTbGi0hv79ksLgwRx1h2xXajU/PpsHfo8geTDBJZRnYsvb6YNbA9vg7GkdPSqDFE
4IeDiEehMdSWCHEYp4xq9gWfGm8i0EVkV4pDhNLorblMtTLDN2kmdmKatlYMGIWOTJu0PxyuViqKjgehY9e6NCrB6p51uC0VwF2f3QjGeB7RMqS4u77ljwcx7vjSwpkMeWNK72mOMLfoshHYiYNDbxLpB9cIedKhx414zgMWVhYkMxDwHc4J85FYN+dretk/MFFnGCeTNeq2iNaRRNW0gtW87fHJIAKn1TnQUoa1fYTw80Tt23V3sDG5nezORxN5Yor2cONvUmaGjHlYhdGsPkOJedupt5sswhB8QG+8ZKSuTB/AeGsQTaOZSlDGX/GYAsO72ELlFVBrgzHfMociRCd61NyLSzOut/CQY1icUqLWQaRa0sRNp8k5ArBrHB0sJzKwe3ZjBBJpa2gHqjYI+cmo69eZcOGxM6EuuR1x02vvOga/5AbojuEXja0z81q1ZQNqh7Eo0Ks6xIsr3giHzpjwnO4cSbjLsNP0o/x0xYgpfFKVA2aS7daIuisk3Ihxyk+J1nJpnsN7m9Ccex1ViKhotu56NWEzl9vOcCZQuldvchoberOgKYXroL7nDolEGnqOKMiBNiADZk+konGXCs2NJBPzuKt06wE7hQNLOwz7rhImhrLCgXQB85PxlOkNNdzjKB7ecGM7ZqkaNHOwtdeOVjGLjxMmeWi86jHQKOBMTbUXi86huSan0gDGd8mc60PVSRe61ndJ0DKg0Nozyn4162PefO/uvR68YVQ+8mbE2GkHxmCYvgmdzqRNMxBboS9Kh4GCS0QvwA/KaL7eYWGMkH4ngppzRZppnEt3Zi1twAJr0pqCtafU0R4JDwlc0sN9lICzXmMSE60OxuNBYkJP5/BaHrFLqNXQSJQbkHEEuNjkKYB350wToTzc96NEG9GRC43t8W63r8XKIJH3WgtX/SYy0g3abNGJpqEkHzPsZOh5ojVFYO1J2XX8sb4EvlfbAoZeefMnORubIERED8bCzhoXWuogFZ+Ysugq+4a4EsLapufaAOmO/HVs23LT7YE4WVxg3sccR1txE28Sk7DI1VJsq9O9b3OBB7tLnJqGy77P961ktOauxy+pLlNbN8RIT3zhupdoA34hsVS4atWiJVmvtZrapMkeZHXLaBjHzHfcZLyIpxxDz8mFzNSYrTtPjO+6qRMN3ZLr0zgiJ+nSNJA6NR9vtxPcimqGbLOJEu9Sq0NjFAmKvo5muy4xbXmLVFPDvD1c9GuGEs0UvUux8AwdEXuiFaZG3ypewBEuwzteZJ2ltzMWAUsSTX1lq0QQGsqWY905SqZmT7cG3L7cbi0nw5Gu6G18vmyPgqZgyAOOVp4ou4KNVNclxm66coJgP9gykD9P5aHLR8se0274MuXPkrsNwXFq7t5Y9Zp7v6n3GYqJ1vguNgm2R6B2zwUa3ydUqzOhODjRwjqpH6bDCFrUyCbWpqlpqhtWDVZM1uxoLonN5kyU3REzG7qRswimC1gfDlGW4EmPmTrWBHBEC+y3SJc51BRk3YFM0LUZv9bthKvhZo9Qpme2+pA/wPiY3fSY3Yi1V2I9+RR6krx73zRW8dRlib20mYNkyuOtrA0NXayt6127Bll1qFEPF5PNepYgoJZMt49PhIm9q/PjGX4wNnVus4+afbQxMneJ1U23RmwftLhWK7Fs1p6kgbbe2PLGSBjS8+ZmlQK5rjcO0YKZ0oNgMsOa/Hq/m/G9wWaZwB72FvOtBTl7YcjJM23tOhMMRhE8Ahg3adYOOJIuXG6x2iUuRLNhyVKH6hMrvrFsL2sHLVmuLuH5Q3U2HHRsbBByKUAH3Xg4iBFYF1KPJYSUmhwGIKxJa6pbE1uONgXaLAFacw7jrO7LNMRrLK1i7XBCYIbWRpTZHmnNNFwQ51MGOOZi7Qz5OgPMYb1LHpYx6NR1ab6nGGQM6txgu4ZIgnfr5IQ77DfMYNrfOM408QItysW7yigidG3b2ofpQktWzzbqx8spsR73Sd7VPb7mDZKFsdVm9Mju
aL1FPyDRgYvXuMSyoNGZLHnSlFcEo9WjBURYjXjNb7GDPixSuEENyJEwGtUor6YMzBnerTMHptaPe5O46/W3g053MehtPJN0EN3bdNarMaMppO4L1NBs+q2euuElVlytTTFkPb23GPESNOmjpCA0WXI/S32UQEtmOq15HZrfYk2DHyzXy4Nf8+Ldkg82XW9IS+5oMbdFnj+s2RUbNBeyqKC1/j6Vj1MUp9taLID2+jAYCq2BKUg9x0J0dzhwEoOU6zhQC2bAskuzIyoRx/JgNa6torWxDWBkfQA932n1DS4S+Mh2l/ZqNOQoEbZnLj4yRiNk1IDlQ6eLMOiik3gqy8kmpjUqMeKm5M7qAlnjW7gXm95OFBRiKRwOc3G8ApP9gUMpg4zYkIRSnUwfUND2vPWaXXc4RWxPu1rbhbo4C01naip9EhFEN21GZk1irvdEVp57urQkNvIcjCfdiGqbUMpdWJp5VIr5NlRzZRj4jj4X8MSOY1SkrhncbgAcwer0SQzS/WRW+4E3A2smFTObxNUcNnsQdYC4FT/TQy1Sxt35OobbqcENCc35jE3tLnq58FMDPuoEMCEl9p9OiE5XULiDv9CjBqm3OamJwsZKDwVlLrm6adYMzOwKLtlrhmGvLU1QYT/vJ9gP4L7TRlwqWHWkNSNDcM9kWRAnCwTayiuH5XBcC1zO1+qrtcULiQjabuy2OevtAxMO2f1IHnQBNRSnra3f7zcMDQOcI0NGL2RDbNI3qRQDOHAtPDHzIq+OdhYzwMGdjrUMaVfxQrjRljRrbstOq3VwXHbejzsdmopjyBdQyNwj83k0OXScpuwZTk1vLNMFHvFNdx4EpsXWhNB41Em7mdSsu6mAnvWp2s6x+QOxp7BWE2liDuP1upzpdmxvEMOBLrgbiKyxmoNMRinCE3dFY0dLyhCSF/uVD9tiu8mtZoMpZ7VNk1616ozsYa639cT20JjU+/s1tGhg1Jwl6puEEYa4aPHysr8HYnO94czW3EwdT87rzUNzKi/gYMngzRm87Q58w3B6tWZ9SG9XC7rHmHXXFScLxZkH8a7GDx2yzvp+s9FUW8ls6G0rjb/UmoP5ONGWrWEax+CmITaVVqjF7rVJbwxZ05k1QuUemZCrwy30mUsl9qGrMk3M7ax4cTGTxtPuFAjWbNeX13GTE5pjxREXU2RiyJbpCPEhhgaJTU12HXo5XIRLcQD6+B64U1eJ16mW27Mx1McPLD7t2Qo1kMfWeNho+ybpdYWNu6N7oz5uAMOeGptDNBwza490u8teS3DZtjvTur2ZPpS80YzgeO+w2NXiFu3Qo32jvljVe9zA4sXefMQpzUACitFm9hAKiUMExbdQ1+NY6LDeyD18UNfWdY45TPe77rCebhWgI16cDMIxpa2XHC63u4IH7aU616FFjtxx485yI7UlBG4gGK62h5tkuWxbI29/WK8SUTugx8s6N08NSVjrtBuhnXp3fR+q4Wksoz3dUv4GmLzNMnRLFmk7WOjKeOzhCtjs4NDZELiwULtDa92QuyMAb2pdYslY7aVgd2mTCecNeE0su8tBf6V3hL0/6Kp9f7Kcqdv9qEFz/b0BU91thzV2QQjRO87H3EDZ9WJs1mWxbo/mxsaht1hPN12eq4HJsIeNujDVj+YAMhE+qC1E5jCM3boXY1FjSy2M9XrRk5tAWvZtyIbmFu5L/Z6sd/XDpG6M+gS+sZ29voHMsNMNJ/RhJTAyNV/73X7PPRwcr8/uZGm96U3G/cO0BskS35XnhlK3lDUkQKAhpQ5FB2cWdVrQVoeovxgmDFklduYSQSY9JhwzE27UHrfYNLghraj5QarhyZsOGIMm3rZYp/vdQGG5sRMh7T2yx1il39+hwCMn/cMEhBgLd6EhyhnEYdYe0Aty0IycyZrhWYsedWYE6+AWv2U5bTqdxzDPmNRkIZix30ZXib0bJX69sOtPkNQ3ndQEvCP3e7Y7nPcMldurjXF9p8Y62TZ2M5mk
6vyEixbmfo3G+2h8IPVWvJur6yHD2TMlZjSBHbM1yDT9pagvhRFHelPdQ/BBs7fuLWn50FakoLMFm7qwaRD+BNoOke1U2rjLLdKH1bG77DbjLq6NMDsOd3gwnfqNzi4NFG6nrD3E2io1bnhNTqfJKArbG8ZUetEUJA/o7s5pTKaaulttw0ldNSlfPKh6Z2eY3rYbxqiN1objAR9EgTBxd7HIa3wq+OSD1ZER85BqKm3E+orq0K061OrHZqux4nbAQjm+xjFDqbVMPI0OOPRq3biJJ7ZpG92E0oj3/GYqrrtMt6Xt+SZKbtdcaHQTTd5o0uJhJDWH8ITat5twjd6wez8WdsbcpfSNKg2ZVaK6F0P3wEYWJ/alBt02Q7iztFxYsRbQ2PCFWcy7kR/MPN6mlzsRDLYuje7gxnJEA40I4tDTF5N1Q+P51UxKLe+ZpDmJYcP4lkaxGjTrt0x9wgFn3QmsbVynR8RoCIULZZb6B+zC0mDNS6Um0W3F2mJv1lh1NOuuBmqjHY3s1kRtT5vr5drc9qa0hm2nojYK5nxP6mn0MrYbsYklbmFc8wK/i9RMy950E7j1eLAZBOxQI3tWKnHEmPeFvrqdIqa+4axE7sq+se2ZacDCIRChB1INtamPp5IDrSwCRVfuoY636bG9N9oszcGyWCc7c2i2tfpArVnMZkAecJxxd1hnM2A9fN4jlFbQR4j9Pg50frRf4FCygsil4EcraTCnpg3D4M02MqMmskRPzbWCkC6645AJFuw0PZF+8QRd+4mvsxU64VBK7G92BYJpb2UJ9JRHuacYHjfrUkvYALjg9kIirjGJ6ZT4ebFvrrTaqLvlW/RBsWgtGniEIqBt3Z/ZiCAnwkZFvbZg2FtpZM7XSmMpt2vhPmzS08kShpL37ZvtVvIBPaY/A9PeprVl0zhEGqQP0xAB6WyDqdJsjyy8i6rQmI/5tS/1jKC7iJobpUGngibxmuylHUlC2w28xmKOgP5cSfRlYpxO+k1hGakzQpdRq9ZwcXnTbBtBy+P8qM5bHLE84F2uD0Uqk1oH9KShrYwaAzlibUbzJOQlwy/cRp+bJLY7uxiNXVrZbDuLeV0wzNTCQu2GsBcYRty4dEdL42b9qcnWosbKFxfQpD3F5drGtVmPbwZg72OdejBnoIZgp15tO1gtlvAm6kyFdorGuoZuLTGaGGM5dS/rccsP1hSHeGHqnaRJiJEhWO3d1iJ8ljHnk9Gkq0Izf44uMG2HM2oXU2bTPdtlsL7lRrbiLkV+N2HihiOJ2H5sKHFj7o3gEVYfcQPW5sbBeN1C1VQGG2pikW64YVNY2PKuv06cftWlG40Ik6A5Cda+kayHNJACtyiaCPtqf6ds5yBsLXx655p8uJw11qq+4mhfqeF1VNEWbZGZ6LyCqC0r7Cc2D2pMB+36JvQXBD2n29zMr2vqdtYdpbZK98BAtXRB1JbmjN6Ma1SvPhq1vaWiEY2e2FnjoNsarhKAqNGa1df6Tl4ewhrPDSWbayBMP9zTw4QKI8IdotASkVmty3YOCph5Ax+Z8cikTW6V9SyNPOOH9VqRFusWjVNDLYCmzsKcjNsdWRmOY0gKJm5q5ztKZ2tO1iORSRUT4UA0ZhoBOIi1VjRM2FeH2qQfi7UlMu4PtcR6HtZNyaWUFDOk1X80F9OnFLa9WrhtET7UGGIKFhK6UjabHbTcB5CCwRQmgpGIxUE0klZm5xAoHEYr/UPiujAHaroasmEyGVFZgAafuknqbshCjFuHALId6gSbJj2Znb8AmOVOsT3eSi6u0kgHUDjBnBsbYh6F3EZP15Wa4IccuAbT20ox3KGWB4fUkdmgPVwhkjJc0Osg3HljtefTrB/6ShT59VYMA0Ht4YTtdqWla9EHqCfQArlaepja2K7Gi9jCp9t4zI8RcsnKBNrwrdGaiZw6P/YdF/dGq25qi+/Y5nSgzSNL8+qUMYyIxqYbpdEGsPDN0B+RodIXNOTwFFiBE+Umi4ul
PCGjHW+Nx0tTIAN2tU1GsmjvkSQ4lpgFjKV6EQEO8vgQueoClbhUvlKWGOD4erqeKWkaxQvBXlWRjrfHaEniROwpsSDIk82EsDlgC7vDWFIINjHg1DZQ96G+HHnQYQwd+qy4QB5jL5unNx08dxu342iRZktr9d50gbW8bU/TtDRbnP5+nkl/rwrm/WKJTLHLsd7nVU4dINTDcTvT60T6NfZO0UNiYYI/q52/X+qJiSMzg8YfUHaZobLTg5cys+TvV1Vmyau/RWbpi3dqzC7aD3RfpYnZQlmEvLAyMVsvjaCnFSaeXSYLF0/45Hnhb57/clVtMXhLL5L8BW8BeHNb3S4Gb1aslQReLFuB/cm8sCuXhBeCFz6h0vAq4AWXg1eEFETBi8ALKSREkt8AvCCzgRbNYu5U8GY312DZvTXX2oL1zoRPndfxdbngrXhT6g8AIp7ZWYwRdw7EnFQkzgPiUW2UC0T4F4jnAvG4t+SrQMSJzEAlARHHiid88rwqAWLZm1ivoc4vadRyX+AFOHjDW/zSDYc4+Xag0sD7zoRPnlcl6rziDbE/wCnKmmm5bkL35hRl1TNyplOEoFUA8diwpXSP/kVy8q9k6r/i0aMI+UC9+iHLgTKcbWl2L1CuxL//DU7dIjgFMOoNtAnivrGcjVWBT9z97LxAFfbB0Tb6xXKlWIaR741lGD0PyzBeBZbLDl1dwVG7qIPpnWGZKAfLBFoNlokzsZybVyVYrjj6dQIuL5Hk945lEn4bULrYQK4MvO84p6fOqxIDGakqYvYFo+IHCOLSwIveK3irCFQgZUfMXqyIBwiB3wAYxYgPIZy+GCuennzFtPvID4U1TkEPOAq9/GSipoB8AMgrc+NKkCfISowN7Lks6+R5UW+eLwnyVcXmvmA4/wAnEM80sXtpNnw2eLOFBGXJ66z8pT6R19l5gSrk9QlnTvxmON4aDplmdxeXfVUGxGwBwmeGQ2ZelZR9ISccxnEPjR1/AICzkhQD1AOJXIbhrBWdWwzXa9ZYCJujjP2VXxdX/uUO2DmV93BF4VOQbYZJfiy/cvOqQn6hVYVPE4n12N7waAn+gR4gCP2q5/MDcJ3ouQcCf8N5JHF3SDLnDH0Z6ETGb7oS0OGMu4OAjzcX5OZFVbC5AIUrU9R/dTP/RjUXK+rfQrAi6Y5khzgZ9EQ2PpYZqGzNXvHpFz9AAiKZCtZEfjxg8GXsR7LsLxirbASUHku8kktASkoxckQSQ7HvgJxsVjKtFrkUOVmXAJxYuHc12JTtSf680nc0Gze91mYgtKTTSr68GaiKXAha8VElPwCIuf3A1IXWS47hyGnWy1c3Ax0nfPK8wJvnSwJi2SWXPy+NRmWD95cGeakMw8sKklBnZoez86qktOHInN9UWbngBZ8EZU8GbyZSXFaGgsqYn59lKLLzqiRDgX2HAsnvf1pZFrwXGgAAgrKathzoZoPT580KrkD9Hx25uy4o+/7ABVDWiQbEVUp7c6ApqdoGQNk9F5+U2+QmVkW5DVZxXPAnIvPi40hzHD81uPNlLGaSeCfMrIoNQFhV5Y6/ovUtGrIZuIsBDCOZkUoDMPFxMi8/M6yCbB5WdvHiPwBGBMvo44vBiFSk2I8zPn1iyHU1O/SHncatQ9zeeCSh0poP+OiU0wp/o1SfmJz3HqbKq+kz41TgypH74gOWq0XiD0ggZiMyObvwdJmIFNtx10ciKJ5yqcgq2xU/H1l/68ogQL2xMlGI+tDOLCorexenT/z/aM1B+F0hOpnP2/YfgLpYzcOZkUpz4NHiKZcK6arrdH6AsHynt8X5yAJZGYaWJSzfke+lIqts5/pLwhIm37rk2MfNkq4tLMm7QjQAaCZqf3HcPoFaBtLlBO4ByNYpXTkWX8y4+6tqLA4x4ZWiGb0vNGdtUASHL0NzLoEKZQYqy5g9zvjUiSHkm+dLAn/VAv1S8GMfB1ivjX7svtEPkxfK8hzITiwh/zL6jzM+eWJQFaK/4lDr
NxH99+b1ZcGfldiXgh9GMwOVBn7oPNEPw1WI/vvbZ34P4CfuG/wAvhL4URJ+wF/9ZCohSgvpwectBYBWsRSqOlrq3260l60ju1oFL1JSjOWTCt6SwHh/peFX7ZT3DX3PbK720uBgVbjNJ4srSaTAFQP3BxQuINk2ixeXgSHZdHFJ+7wAghdPuVxs/ZYYno0tNCO24Ky0uT9sodmCqypKC6o+UuQnYAvJYuvSjjUAzmKrrPJV5J3eOOVi67eY72xsYbnQ4aXYQrNd6svCFpothq0EWxUHcH4CtnJy62J7Kye3SrO3Su6dVYytijs2/ARswVlb/mJ7qzJswVlbvhJ7q+ImDL/Yuom9dRtsVR08+wHYysUgLra3cn5iabZ8FltV2FulHy31A7FVYgyiNFv+FjGI0o96+onYymbNL7blc35iafGtd3b9lost+BdbZ2MrG4O4WG5Vh61smUwlcus3Ln9P2CpNJ94EW79x+bOxBa61+xTAuRq2krAFshU/n2w/zc2skv2npR8p9APBCGfF0+VBi6rACH8s6E6YWRUlQuWfzPLb9yx9Q26LHXWdk2HfRU3pwrUah+M3wHtPsrK05MFNjMKyA7y/oq9Q9H2hwCOLx7KclJysqyIoXPVpVz8BWyXKutKSWbeQdZUdYPUr694w9+L67hweczscypJ1yCenLeRnVgmA4V/heD/CsTQwft1prqR1yW+o+mww5koxL06xZY8RKc0rwXK7havA1m+o+nxs5XZj3z+2sunbSpTob+T563LrYu81h62yPIyc3KrEe/0tIb8nuVUatm4ityouxfwBJ8Wimf5nf8BxM/i52ELIzEglnY+MQu/M+OSJXfmA5EIoln7IVQaKCkgEHVEERQonEOE7iLk8FC9tH5jjOMDL6TmVx+Lb9oEnzAytoOnUdzi06gfIUiRjT/25VE3nUFJS92sEpwonfPK8qgiilH5w1c+TpEimD+ufi4MoBZK0Iih+dtRPfmZVeC+lH/Xz88CYrRi9HIxZf/VPWY2uszXzWTCWg61je9lfbJ2MrayX+efi4tPKsJX1wP9kik9LwlbFmYgfgK2cNXex3MqWsudHegdbV+P+r9b6sta6XLLgn41UNvcrjrj+AO5/zrNTuY9ntnbmfMJrqRXyY7Xy+cRAJTZOVWXkXzgq8wfgl8yF0q6F3nLiaDicBeN504IrCKLhpXfstZIJvmA3ffEKvOnLv+h9fPVz4Zsz6xOD6QG98OAtPCsZq7a+jl/mnmXeT8iSEpks6R8EeoBe/cDwpXtniEy7ElCODgdE1uxAPqlqy07sD6hAhxO/jSPOxmZWCF2ewSdyqrKsFD7xcVeSz6dWycZEAv5F49mS8h3O3rFkfKepz8kTqwaLZUfZrmQi/gQMX0+iklmoIB9vuS3bXCw7WHctEH3/00EA+U7DwPNBRL2zIaEy1JRdsvsOaj7ADIo8wAmBEYJAMBKlIPx4N3t20o2PqnnizNOluUvEEim0RhOONVA+mWyP+APuC7Vw9nCQi1ELQ+9UaVaG2t/Q9PkyK5fzvHjvIJlLepZUaATIXDqlitIhouKW0j8SXl8wrHJ5z7J2yeTgVUm5+bHA+bfc/Ax4ZYN01zO5yoMXBW4Br6Ms/oXX6fCi3qvcumt4vVchVy68Kg7j/gR45aTX5bZXDl6l2V456VWJ7fV8lPxdZ71+ACRh6GrltVVC8jYS77cI8myJl0uqXg6vCt2BnL1WCbx+qyzvCl7lBTNuA6/fpgxfhxegLqwiKsi7Y5mhrpbrzBV1PE/6jLkRb95REiDLDt5eIS/wA2QkcT2XlszVTVZcCkf+BmTPlmF4tijscv5n6z7Kqx16t+jyjLlVsZ+TLL2o97b16E/Q+IAAAOD3BfeccYVlqjiRuwd/Th1j52KfqAL6pbe1/i1Wej/rcWGTnXzkBKE+XB7lpUA+brqTDyYhFXTdOZL53jH9E+ySHKaxzBiXYxoFt8E09vG2ozymq2jeQ5VeVPrb
krvQ8Lja9pFcSQVymuFxNQR9k4rSH5COySegUegBoJcBJ5/bqbg8D5R+1uKVkKOqCi4Vln/KBCVC0EfIeV3+Wbx8qLuCWHZzJEJBl8ELzzZrzw5UOroqTpicgJJL8Xg5uo7b7u8VXtjFnQ/AJwOVDq/fTsPnMj8bubi0h3V2I2tJTS+ymZK3EcNPZ1VFGQwo/eDE81H4EpN8gEDG6Ic/NvvTF9ndG/eOaRJOt3O9suPfYOBCnzUH8HL6YhCZMPrH/ml2UlV0xXhpk3ZH+L53RAKCAm8Y9YW0NIlkhkoulQLFxD9+Z9Il4+u328D5+Mr1Rr0cX7lOQkRZ+Mr10q4GX/Dd4euVfobe6mf85fUV9PPnucFjxOtecE1mmwTnanDuEdfZZszHSZeM69+zx87HF5FVpsg30MvYO5O+EF+2uFGk9AMMQVSMpzHHni0pvv/0OR/HmvFXU30G6muOZ12u5087J5p9gkP1FrvxWg+UmSM8CrvYE5y3QM9DW/CkmX5In8agx5UQJLCyreR1mmEsAvIHtEhmqniBsvsQ4MeIL/VWVsDHkNyrBQAXLIBsJcNrrL+Jq5ydOjilPsYwdMdXPie04DuPvG6q+i5lTkbEvCIzVMiW0sie7ckMsNPIno1oXY/sp9RmfHuy49lINnljsr/EYv8tur8cXXc7up/i5H17umfFzB3Q/RTn58fR/eXw4NvR/QTj/LvaMMi92TAAOiXP+O1RDrLNfJBbWzEAOmXL0s8jPEzcnPCnpD5/HuHBiaKmRMKfku37gYS/vSXzTziqAGQ91dsTvlRXtUraIvC9SZMjc78/bVHq7mhbqst5S9ze3vwA/4RbmTvWC9zcrwSnJH1+HuHhm2tB8E+4mLmzC++A8Ce4mFpCP+erRIEhybashC+CeBwX+lgu55pP4KcRK1sseUVifUO38C+HP4fp7VD4Dd2+cwi7f0vA29GZzJNV1pTZ80vbC9a2ZluC0fp7NUO+v88wtu08X9woQbB/DpcKYWC/ZUjRHv5P6nDObpZezMp8gUZyhdZTqr1T2/Mxq09vB/D5vgVwanHQyVUZX8QGdUtsgPOxAc7BxmdI+LoY+MnYOEqk0/It0Kn5lldFTvTjT1Xy+GiPvbEvju9/JZ1BgXTOHtFzRSqf4FLfyBbLbcG9uS12pM23MhleOHzHthj8DX3gcwh7L7YYjP7aYpfYYmeUX8On6tsj5u9G32K3xMb3tcX+CWycEIS4uS12hjy+U1vsG0YkLsh95A9DunnyA/43MtH5c4JuXm503NLzwymfa5d1c8IjJ7jX3yOXmj8L9Oa0/SdLo0vcCvDB/qb/WYKpPA14ZErEM6sOglu72UqK5/q+QxGJkoFffZmvbnIq/oS8Ezvr1Mat7jwxLHAjGasueslfWvqXsxZ8JQeKhPLBR4bSs+n6ehvf8yXB0LUUA1LyhdJtpPWUj7okGLXnG6Yuy8Z7cHtru+UR8wFFz9oqdTx97aXqAeQQgxcgJtt77ZKVWvwV8ingNtMdjGctkGeZtA6t7b/GMiyTty+y1aplWT55/Mwy+JdlhSxDCxL+1bIs7z2+zzLfMfTgf56iKglhpH9OROLYvTEv75TWG/R8NGJmBevNttKv+q8xLbviMOTWei3vz04WXWae59jxiqxHL6aJp6Q0er0Gn55JpvLqsX+NyXjGzUALzN1qmVzkOue4OWm8x77co//Psj0z4cXhyaGAT3+np5h2lOAEMkMj0AXDUAxd8h9HgJRA+v/v4+7fxhSS6TPykrt8HYMs8qGIskBVkHVv7QJPkALd0h6ve/pjBjehh+3rKVD8f41rWXEPFySmK5UEhUn83CLNqu1PV7Xp6IaSZv5/F2/xjqqMrXZrhXCMMp1kZ/+TAYgcy4oEbrU8KwgazbvzRuefZw24uVTNB4eejeh/nDNIQR1LtZx5NwZUELb7NwUdQO5MNxVVEOQsjuMl1X5sl/aXX7gb2scbf/zHYpJa8gCAnN3T257vHwd6
hkPySMIUIXjtwj6N/Q0NmUympzzsINC9SeKiwogqwOMLpmP8guesXcPZNgM3B09RbccHzlFapywJwS93TxENRY2CquVuPgjWtGPruHDvklulsQfNHodVtGW/UvYUVJnMhuOUKYqhSE9hxn+dR7fO8BRWq/xGjyoOIv65eSgCyYePZpK+1YM8Cx8j/P+ceAUFx7rfmGX56JFgWbIQCP8cb8hcn4Nb8yYfPnpSfY7iqWmi7R+sbgDU0RrPHqt5My6dE0pKPQPbk/9JzuXX181Zd05Z0S/rXpucBdXn1bIuH9ChdSMl4d+0cvKnKPgJeWDo8YVuOuGzl/tvy9BsjxoYS8+6zPKzuGzg/B1iRaXXCa+UzMESxWwmX32TUmqu8RO2TPlrwVEemRy8x7pXWBEFaas9snIUBoae4uLxuix421HyLj143FL4AGH/ZUvxn0KLTbi4Dj+RQIGQjOedB5DjaSZnFCqRb/e1/SmqTyAL0PHSSv/q6x1/PwZ3crQWzUVrIVWQ3r5nIERKKh7Gnp0P674ohQs/7mkUtkEntwMvTCj515EVr/shr/XXFYYpmmAumJ1ZNV9sZZ45aUgBMqYQOVGc3KFwAhHwgvWRzu552zBAy10vWcumwHWgCtbLNRpNFH4H8vIGaefR5OymHDCapVWBbCmiVYmy5YQ2ft9bBZCXNpm5nbwvAvD3lPf12iC53Umo9rhaEh4ImnKWYP2e3/vHqaATDru7kgo6Y71mztq+mcI5ob/DnSicF6F1M4VDwp8T65srnPP3At9Y4ZBFDaTeCN5TZeg1ROaXRnkKEktr3XkTDPlh0jiBPCkjRdIYRlAUk68ojU8H871I4xOCGreRxtnanhMdpfLW/Skn2X9zWUy8w7y7lcWFp79fJItzYvXpwliwFCMf4n+IJPXDaoMbizxVVXBJKhJ5MkGJEHRFkXc6Zu5F5J3QgOkuRN7t4x3oCbb6Nxd5Z4fwbi7y3i9Qvo7IayuWbSoFMk8V/LTY436lXoWG3umwuRepd0Lzs7uQeujNnW70hA7B31zqUe8w736lXlHk/Xs63Uf5Cj2YwpuS1Z/hbEMKouBFMhhSSCiRm9eTwaeD+E5kMHWCB3kXMhg7kVLlrXfiBHX1vWXwCxi+jQwmi+D7PWXwO2795bm2ar/5s0FufDTfrxFI0RRL/i+VGnejWCoz7s9YmfeiWC4/WaVkxUIeN80dKXXzmAYG/3jNcmlz+ptpFuznpNTG6+dCYierYn6EeV9dYPkMFN+LFD5BstyHFL59ZBk7oezwm0th+B3u3a8U/qRvyhUE2LeT5pmaiDMcgw82Lf+DhHwI/OgWyvCk7d9XLv4TFFItVJG4RCqiekUVebqIuRcVeUJo/T5U5O1r/9CfHwJD3uHe/arI90Ng/6pkb6TnHvhXU5Df7es/BLvgrC9frntGSkqxeyaSGIpd0z07fe3ei+45YfPOfeieotOHK3bPTqiR+ua6B32He/ere97vwf/dhObTRqeUHDD0uMn1h0XJqtv/cwaM70UM3+2GU4Bluh2Agk62FW84PSGx883l8PfbcFoU5v2eGy+fGgw8djtJcxa/G01/Fc0V1um9KJq73WgKqOxJhzdXNMSPr/S/4NDZGysa4pNW5Dc346t3P8Z/z4H6LLZUJJkL4k3fjQLlJFI+p+FPSZl8ux261N3u0IWzh7XcQcME4vvs3XBsPW3910okROC/QjUtmLqRjtlRjEhJewJm4P4VJfjtdvgSP2fjx/h1I8mrSPAHR1avrwyq9mkuXghoXpNcUCV8/aVzL5rjbjc65zXH7SvSjltgvoHmuBTB326vM1EU/rzmXueG7XmKIWTb3jxLV+mtqX1zQXm7HRF3tN25qEnwXA9O6xKMv5p0KU2C33fZc2DxHcE6SWUT76jsP39MW043kr6C39OgT/cfK9D/lx6xfpJLdx+9rF8jlDgdoWcHd7LN3or6C2FYAWRLa1T+ft/i08zOT2Hi62ZovD2g8BcN
z0zFs/ZIQUS2UPmVBof3s/1XgsNT2udM9+G9UX8xVXCWbUbEFOjEl43WlUAKFJm4V8WUp6ipnfuLqbIwRWbPSy0A1UsHlWpA9b6NfiVQJSPrsvArqkqEFXGS/is65uMCXCUvPdsOXt1rJ19+PUi4nT7xfw==7V1Zc9s4Ev41qdp9sIr38eg48SRbOZx1ZjYzL1OURNtMKFGhaMeeX7+kRFIkGhRBCABBiqmtHeuCoO4P3Y0+X+lXq+ffYm/z8DFa+uErTVk+v9LfvNI0Vbes9D/ZMy/7Z2w3f+I+Dpb5mw5P3Ab/+PmTSv7sY7D0t7U3JlEUJsGm/uQiWq/9RVJ7zovj6Ff9bXdRWP/WjXfvgyduF14In/1fsEwe8mctRTm88M4P7h+Kr9aKV1Ze8e78ie2Dt4x+VZ7S377Sr+IoSvZ/rZ6v/DCjXkGY/eeu969G8+/Z79OU0JunNN694SbaJps4WvjbbbC+339L8eEf/9Xu/vz98uPL27++2e8uvr27/N/iQi2/tvyyjRf76wTzhekf+Xfid4f9Am2/iScvfMwJmP/25KWgaBw9rpd+toj6Sn/96yFI/NuNt8he/ZViKH3uIVmF+ct3QRheRWEU7z6rLz3fuVukz2+TOPrhV16xFo4/v0tfKdhy+FXNlEg36seJ/1xhdf5bf/OjlZ/EL+lb8ldLNr4Uj638iV8HXBjFmx6qkHDyJ70ci/fl4jVq5xTuQG29ndr3Kbk3J9KkPF3evFhWOUorE6WVrtqAVo4GSeVYnChltFOqgrrypCrpg6W3fdjBNXvghcH9Ov17kVLST6H3OiNWkIqLy/yFVbBcZitma2yylVfP95lknO2FkTZ7/PoYry+zB9naGYGsmW5kS2fPvfO97Ks0fWZa2evPKZXfuDM9exBHiZcEUfYtFxmmiLGuN/C1mX/GzHS1yv/MGjuNmWPZrmaYmqMrquNA3ur6zFDsw/9UDKvVmaprim0atmq5jqpx4rw5Ls5zYbucnLPaOeevlzlB36yjtV/XHXVFs9cYhQ5PqfA6pV788i2n6e7Bn9mDmVk8fPNcffHNS/6It07ylzWT5Ci7TIy+KZ6L/TBFzVPdkMFxKf+GmyhYJwchoCuoELcQPbaNHuOFn3/swGy4kt22UuLF934CVtrBpvzh9Eiy+0HSERwZ+ixVdopu27rpGK5iFa/e+HGQ/txMwuw/8hwk3/Ivzv6uQDR9dFg5eyAKoHtutctcSYBsWKyAbKo9A9kBQL4OwgwsKfOibbDTEZoy97YpWDVl9yBYbR5z7ZG+yY/vonjlrVNzGz0BqZJI6rCv4yU/FlVw5U+Rq0actX84XB10W3fzXUdZZ2fHCDVcVJwBbx+B3Ekazm2XS73Y71BgK/Cu4+KuOgqvq05x8o7RqjC81lHSBLYKuufe4sf9DnyfH5MwyJC8e37pxT8+p58Kkp08nSkm1AB7N4WmYLVBSuzES9eLu0HabeBes5UOWOKIZYmKYYkVZkJkGTzVWGP9fMy8HK/vUtJc7Kl3mb5BNTbPOwIVr6d/3ef/3a0zZ7JK9v7M97NHwuF9H70nP5OKN3FU/fAJ33iTC+Ft8dUpYfffXt9R+vQc89yObMOmwCzZPpV7j5lsnoiGpziZ4BHubkGhhz7bce7NVA0eQqC4i5MJZoObEMB5/LpA127g/sXFKlr6O+vvLhPO3TDVtCoReOSwhKoIsnlaRgai7LGOTQUDKp0bqGR1bEpoFxF4NodtF6nd3Zd9G0Y4n2NNEdMoRWLtzdKA+OPqurhDZlfKo9YLTkkP7feyN13aKcXXdDF9Z2ngTBdHm2fRWWamS4dzKovtMhQPczsPuynCVgdeEa6TxIOnAg+e6c6K+GFXJ54G/NqYxTj78VQSj3QYBpstgcL2tpt95sNd8JyBEQNEbrajA2gJzzTOpaZ1P9O4BIiraL2I4mXmz8wJWEsqYZTpcP/u0zJ8en9tXN+vjZcP37/+/fMpFXcEkcVTch1SO8xZ
6rhTr+mGYS6xbO3OQeRoqY6D4aCBs/4peEhKWt6CGSG1r6Yi1saR2rVs3cMLWMmidejVREHNW+JoHeIpV100f6VBOKbs8F4qb9tkb9ge2bLZsOUDdvZL0oreBnDxjgaOH1yqi559enCh1h0ncJVb5gwuGKHjCq5TlIQs4HJZgcsQBS5D7QdcBPHCCVx1TgHJRRv7FwcuILksEeAqfs0ELmrJRW9zQXBxsrlQySXI5jJwoWKmnpY8TarIhaJIkxoBIM0m0SExIHuSdppYaTeCS4AG7DTqfFBhlwBgp4kBF0H4UjC4aNJIGwG55/MxChTuVEmgqzr8oMvLOdITdAnCySyhSxXakF4uanTgAiVmioKsxAhcaMpHuWXSnaWwr32AFxoJXOyTlj4aKqMXdSBOBm43jNDYKJ1Jd1YKYc5o5B4uPv0ScwqCCdS6XGFhVEdqqio71oGIL7ZMvjNXCNZ7KpkSkhohl8QGXiRVR9YgvrSjATUV3Q0vL1KxZVJ3QrkzzijmHY1hhOIR2B0QxYrLCsUqshI3FOdbJkexWvsALxTzDvuwqDEdgRyGQo3WSYDmYoCV2KWX4SFTkER2wTcCnzsaYcyEwgztY0JcYAw1uGjkcI/U9K8yq+mvDVSQ/aKjpBgzKKUT1K+iMaZxxtj4lJPmqjWm0TsZdR1dydCI2N/dg92wZb7Wkik4+DIGcDkuK3Bp6EqEifWdwaU0bJkzuOQLj5ROw5mi1A14q3yMN+GzB2i7mUaotjsHC8xIAmrdqiMEBlPkA7XRsGXOoBYcZRmBxNRtVLdptOASpo7Nhi1zBhdB0KShlJtfOxsVU7aNq1JiUXl4/ILBpwCsfiCR7oEk1WEtvOzUUZBR+RcpYUlcvgMm7EudgP3R2YVkTWXzbf4wipOH6D5ae+Hbw7MI+Q7v+RBFm/zJ736SvOTVx95jEtUZgouQthg5nVvz4VkJFWD6zHWQUa3BcDrO6nZ7qhARBN4Kh1A5Emu907BRZBz0gw21OzbULthoQ8LpYmDU2CDwL9b74BB1QKgYkde7f6Lkcb3lSGFbYRoLYqQz6ndjSGVNai3YuUoZFiC6fStAi8TPNGgSq72TmMTbMiASYyLbvZMY3v1vF8GPIKk2wtk3qvHC0A+H0za2eycEcEvE9KexMNxh0QatgTvw8uyt10sv8cbLBhie6p8N8LJ+++nm1TDbKXeXWpqDMETvnSHwkv/bh/cfb27fqlBuLQ49YsbLJNjuoH8mwbSlnEnaxKRcrWP0v1gmFTqvT/cs2kNKwfSQ4tZUs4EsBLfUwXTVbOG8jG01G7aMu9UOs8/k1FfzpGWIKNXoVZK2r+bJJzUXoQIbazZsmb6BMzO1AnIpFUxfa9F6ZTjdmnfxZD9+m4qGZFtB8LW3CsJszXd++ORn1hQC7ZOUksy9nhu23NrsmeHUBY1qXgZmTkX14npUMmMmPGyWd8MX59TwNqAuIMhiF3AgpJH9EmR8wEY4Ihv1N9CFIOVDEtlPjVbauFmP4huXL0IlvoEk3j9xFcX70Go2gA11e8wW9Wk5vYvF0yUbtZXbOQemf0mHK8xjOvsmXTlYesk0/UZcq2tc/I2f/63YgUT+Nxl0pTN+/1vJ+eHoyiLtfTT+tyR+TEl5RNtOTq+uRZeszIEOx0MWc8DhXXnWc51+e1pi4aCXpEoDbSx1Qn8g2JmNrGa2c38gtN1Pa38gsDMh/YEc+QrhGnErCxphowBaNAKQEHYJ6NyZDTR1UYWAi3dBGoNGa3wFqUOatS2oQBg0vqFtjSIMulpTNxfO0JXA+Yk2JJWg3M0ZYrmbM4ByN2eI5W4dCCtLuVvRIWEqd+vo5SAuaXKIS5qIlaOgkqZC3E7lbnRiYNTYGEK5Wwd5LLjcDTfz8zZYPYa7mIhWIX/4+Jf782W5nYefLn5Xr778bT17xdyZOsOox4Biv0JTCTTwKWNAqRqdHaNGFz6jF0DVwIVMcG5oJsWNDRTn3Q+TRZPC4xyQ5V5X
VPEX7LUMZA3ie52KruQgK7G61+kNWybfmcW29TAepIUpKvXogoGCtIDAySA1XT6dhSFI9eOdhSFIGXcWbgAp92afp0csxit9wagOm3bwKUb68nIIu/gtE+/MZjxvsAHY2iR9WUUtUJlJPU8GgJTXPBkUpGbbPBmgFxjPk2kAqXxj4gYKUpDCRA1SHZXJFqf4hI7OMDIZz3drwJzguO14MWegDLTQ8Il8mEOPSbFlzpgbwhS3QWAOWIz0yrh1JXZN4BtQwXua2tmgAmg/ekmEuvvAStxRMeaJTYNAE9BrpunOiokNneeZoFNsxYsZEmdxfwFyVo56IM8tTP8XXiFzPOWLDZwb5U1M+xLBlIfejzfRr/XWW20G1MCNGYPQYccmpg8Ar647DQyCN/99Q7GtH6Yw35UKnjuXLMwxEswl3F15X3axDJ6KkovXV9dfP3/+cFupyKi8ivnAahOEfpYAeKTss7bCmSEBXIdNTPaXYCQQ3GAbsjo7UqVzsifatdUUWbzXQC2Cm91givdaACFj8V7DlnEpuMMs3nt9+TF9+V1Ktt15SZng3fudquSG+btHV0/IpYnWySdWfD1hw5YJfCGSaB1LZGutBmoRXPSHrnWcBv7Jq3WK0yGkO1ar3Dxplf0NbfEQbDJmRdsAbekyCpHMpePLyXCWRSSXElQ+kQxcrRYhtXiefuiIGptM7n6/7V8m4/JOWLa82tvn2S41Zd9oY5vn7C9lbnYl0BrtgBppRF+zN4xRs6ttWdgxoi5V/Vdi8HOLGfK6xWDgsX+/mDF+v5gxPL+YMR6/2NTUSpqmVicfD2nUPu+EnPNNrKFO0zJArJQwq6ZrwqihNWyZb8KowbvAk0H/Hw44rdbBH7+5SYJoE0W0japp+RCN5kqyrkf6dPPy4yb6+PKfm/d/Pv71RdEu//pxMSVAc+rod0ICNCraTMJU11M7+jEv9MAibkquZoQ4HVwkqVufgZVMTojTG7bMFXEkHb4mxAlFHBgKyQtxei+Im24mrBDHrnwS5Npy0qowSVBI+aRO0jBh+DntEBACc9rxPaoA3W8evK0PSVBDAdnaWjtPT2ngRNUp9igRyJOqiyY8OR8dTJpOeUJrMQNUADBjpOCa96XnO3cLHPGthePP7zoTXxKRrSKcVVVaZ1L55eVK3FqbN2yZfGegXxuHFr68C+QZtMS5u/OtBRbUS9udK136c8oFalSp69Sg1hHo6LxAraJf1AZqsDMRoBbVUF30ZIoBgtpBlyDGtIEqc04d9hCxW2yYeF/1hnx8AM3bv9XXhIABAFpHRJ6r2zO38s9hBG90IVbwttDtO5325aq19/OBt2Bn2rlAVVWoO+yhKODUYA9FZ7lj0o2pCuOAFpbN3Lt0TOKXHNNoE1NaTKuKiazEDdRuSztUsDObbTtULJ8Ft5U+G4CWM5ZPNglMMSZBuWNifKqmAKOgiAtNvjPGBqz8rjPU+ujoOXMZB0fwTCaJjTCE552z8PFesLljGuZgvWCohwiYmsTwtBtMQ+bwNPE75gs3TTq4lVbrTFHdmuVqKO5R2zV7cOPHQUqUzoOk81/dNkTHkGukoK4gPZcVldZKsOorAQ8EK5Q7DTvmC3PBAbOxSlUVdfvQSlUdFXacMlX1BjXAF22C20N3E6qaU3cHmC3zzvgL1QKckoAc7d/q0o7zQTPHXE6xBt3Gb5gvyAWnYxOEa/H+LqtveBcVi5LAGxi0qqNR2gxoFLacGMLZNC63TL4zpfYBTkdCcL449ZEwj/uARZwJW/IzoVDKfIA8FdUe3M6E0hJhBjvTRKgJwUG4AasJua6W0O+BCnfqI6FYgtREsWXynRki1ITglPsBHwlX6iPhGqxOhK5oM6vyzxZyPlyj2/FwLRGnQ1TZ8ilZdiON6YDUUYb50JwcPG350HxAKnouLQHgaJI9WItryS66ICBNHQMShmYQEhfhr+Q+wPZMcjZMtJyEOlHOQgGncKoAtFDxiWTKcQKcNgGOBeAs
kBJGW3IqDnAgKU1EVoXoIbFjBZwJVCptrxoTAI5TLrAJVKqI1F7RE2LHCjgbejlpJZwrCHA2yNwUAjjB4bixAg5IOGobDko4TioVSDgxNpzgYNdYAWeASwOtDScOcODSIMSGm8q5pAccLxuuH8AJjtOMFXDALUJvw4kCHHCLiLHhptopJoDj6RbhBbhe3CJEw6UnwLXfUkEeAbXjF9xSOdlw4JYq5NJA1PltAly7hAOZLtQSThTgLJBPJETCaRPgZAccL5XaD+CmSAMTwBmsyosNmBTIB28Gmi3VUl6MbkxIebE+xSXYwBNIMVp5KA6fLfKQYGciUqv0qcteb6hGqyVVRXVZdCVrRBJvGSzoVjOFQqQXqrwA14+RyTsUMslIchlJ7c2GIOV0EwJSUYg3W5/CJ9JLRV6A60cqDqFyZKQghVKRNtcegBTUoHCTik7H+5AqAtXFr5/EqLRilBtCT7+xiygfKSfjTAhlm+tKHUW0hUURYZG4CMBpE+BYAM6BhfmyA84BYWshOngK6vCRcNR3Zwg4Xtn8sDOMCMBNYRrpJRwnwPUk4abyESaAA/0mC6dE596BbstCDXBjBgiSmEZ/MzrZTHbENNnPYdDXhE6DxLM/OrIrbt9kJ/FVj4/sat9kJ/HYjo3srt4z1QtVN26qA/3ZN9VJvGTUVBdHWFgg27cUMUncQQOgLKwi6J2yJH6PAVAWYrZvg8PEXfCtMP3a18vgKf3zPvvz9dX118+fP9wWL6VfVXkV84HVJgj9x03xyjxG34uugDAzpWdS51r9fpbf+6qXufwpLwzuMym/SFmXtYN8nXEnWHjhZf7CKlguwyaYHJRG97ZkHQa5ozpBgSiwMCjQuaEA3rp/+/D+483tWw2ycPPgpcfsvDl28N/0xjJ4L779+v7r1bsz50w5CaQ3xsCb85ff33/4euZ8OQSne2MMvFvnQk6dhNyO8mhb4d7VEu5aDmyN4qm7aOe3PLDL+vkYFS9cbIN/0mcv0zeoyuZ5/7H89WKhHA3pW1KeeIlfsVX2aw/QhEHuvdwM26aByn1Bp/gy4dDZeqtNOEGnw20TdUv1Dh2cg+TIlUhZROuFl0y8bRcLqorxJIhlLoGP5j6l1eZEkuxQsfYXiTcvllWOHwOlTikNcwxcDKVA5gI7UhE4XbYP3sbfQTQh8LzMvcWP+x0QPz8mYZChevf80ot/fE4/FSS7bOWZYr4CgdydDH6j4Z23Ka0TL10v7obnEgzkgAY8ccTypNldk+uwO29RV2EfvSd/nb5wE0dVvUWg7IwGZVdayyetcpNZ2Msdm9Z+WLXAGWwtJe4c81xdFyNYrgOuBcoshp2gqM12d7vHuWpwQXEuXDA3MZxoQVPn2KGYIMlDCiGs9C+ECdIfBi6EzQbuySuEcbkRFQORgfwanDCPtsHOnuwif1tjCWdIyFmyfepDFxLdXdhqSIJZMqw0JLmIkUVDEuQByaAhXa13BUmQujNwBek0ME9aBVmAZFKQ5SpXD4/rH1tm6nFoP3+WPCedfjzfu1n7nHpWmof87EqieWyc91NCzWNjAvBiNY9N4Eoctuaxuzt8+9Y8OJ/lMEXm68uPr7IjcZ3+/9V/Lz+WyVejcZER1N4wEsMdgCyLGCYovOtFDMNKNkyWgGBBTOBOHLggNhrYJ68gxvktOwbboQQjlt4sBfEfV5kIDlabxySLV3S24If2eyc9Q69nyM+pLHqm2ZdNhiK7gf0XF6to6ZcHpyOomlbFgwpBjxwpFlUI2eQQ6p5oiUy1cTAXI9PAYIoixyKaf89KJzQl9OZ+uP/YTexv4mjhb7fB+n6/YoG48PEv9+fLcjsPP138rl59+dt69nJtXYdolSpokXn+nXhM47+BIGbWScJ0D+1CZh+hRSduG/VrsIpxwJrYunIWJST4HyGqCWiXifLcWSZJnX/J6xwPukNZ528qyEIGshCrEWoafsPE+7Jq7z+5CwWeybxbhk6QJoc0kFzUkEZ7rnCC
tFnXYa37Mhl38sEzeQBNSeeKr/sWDtKK7yiOM1hIFw06S6ufuuWjg0AHNeJYdXxs2DDpvgzG/R6xTC6+U1SvoNHC00KMStOWHJ5Agtrd4FmoGL7wFNyOdLzwLFpOnQpPy0YW4gRPy8RvmHhfQuCpDcAg8BbOUsdBOr2fGOZysJBWkUYQFvUkYsT7ww3SDRsm3pcQg0Bwe9TRXsFQ80+X/QqGKni94xVMZzxRDA9P3s1US69CKWX/rMjf8/YqGLqDmyDGGuAar0kmpwJchI/BETUyb3KbtbvNVLM+Mq9w08uKcNSLpra4HNB9qUIsDN4T9iaEkyNc04eNcK1lMC/YF+MG3HiED2GkX3vl0EARbvNBuG2IQbjdEeFgX0IQLnoeYDtaaaT+MBHuaHVXF7XhLQzSDVdh0n2JMbxFBfdOMEtGK7S5QdqQFdIinCVFFiR/O2Sm6FoN1oZpHwV29uDGj4P0Jx4SRs8E7JarzCxDKf8hXl7Vmal6xWBhdBBsZN47J3Ml3X63fbm193M6CLzDiAwM8tFeOS0XEX0urWxHEyp4yXZUVrstsh3dlypCtmucIX0u8HRURIBRmx6i4IkmYrSZHsi+hKTKubzDiCnG4pfSnM4eVGRu9vAgdHePzl3qmqo7K2K7XZGNWufgiLCbX4YHE++g37mAAs2hpB7srgly96oOfsPE+xIi64SF7GamaVUszAtlpijGqfes0aI91ZSzoo6pMOXSy5XjgKvXyfC3kVsaI/hryOVKV48XeoB9uQIKPVzu8bxS1R+0+5815Y5X9VPyHLkm0NEliI+CjXrukIV42wa8Y23nIi11JEM4lTWzYhpb54G3KCgwa/HGBffyM0YXkPY+WsPEExqHzXJpaPGEXkBUwhRIZmDiHRs6lzIEA/X+siriMgivpMKLuETEecoWNBM+WZeCu5RmEcCBTmYWnVrFVWyYeF9q7f288Mk9/iKrN3DP5qO0kewE2CigKCW0pbYsxFlnq4o2ga6RNpbUoKOOwaCgE+2pVhWCuMeJnbm8eJH3RDKRMVRZwgfPvjlu3QrXCAcvozYes645qkISGKCeaV3nEzLui2TgNSO6w6oEMrpz61ZU6qxx091CVSFmko1gupO4WkdHdxMz8E4w3UmcjIOnO2ga1D/dSZx4o6N72VWwP7oT+LuGasXo0lkxxTE7L5T3r01V6JW4fXd58/b9V7PaanWIw8xZWfxIHEcrmsALGD3bwDN4qW8eQL/IBsacG8/K7iXFbcGG50wwz+CdOOeZNvEMyzND4IjnBp7Bm3Uzz7abMEj+jv07P6XM4uykpGVKxz14Pz/MWgcnLlpnv/XcuIaeOVPvXbfB2/2X399/+ApZVpsEl9snsZ8RqXoKieZRjpzLaFtGA2N1CuZyy+TbPZ++XB0fXld56l/rKF6lzPhnb9hr5J+M/VX0lM0XWD2GSeCFoR8Gi+1uBcVPFv9uBt55g6rs642U6FdBpeLuMmjTRYaogp6St89J7C2S3USB9Pk42I0nSgmSzxnenhvbUImvYaY8CJYFBH6WhglV3Why8hR3kTNq8bQqdjCKCVUtcJBxRFUDV3CZHVTDAsH4pXwqurf2Q6iIZk+Lu6Pqqe/xqO1B/e7zkk5GTWEEkR1mFgOTGlADvThyyj2cihAt9wgyCgYu90o4DEju4TIPWMq93/x1uleM4Lvztokns+gjqJlhJfo6AEca0UeQOiGF6DNEziRtoBVBusPQRV/3W0zvoq/ZgTG0+dCFkFVmqxQpY5vZSZClz0wQk8NYGkFMkFsihSA2CUnF88gT+CmGLoidBvbJK4gLv9QIBHHDLZ9+UrTYX56b5uGx/Z5GIP/eXy9fZYJDGu0i0MwnP5uyaBcd5xeTQrugtbRG/y4OncAdNHD1UuJhQOoF53gaqHrJkveWO0YhemYUhr5AZ3MHHEsjigmKhOQQxRJ4m3UC99TQRbHRwD+JRTHOEVZJ52AgxQYn0vMsBoorwpG8mDMk5CzZ
PvWhEYkSidjqSYKO4sz0JLmQkUZPEhS5yaEn1f5DEzqB93DoetJu4J/EehLnpzxv8X6V1VNsmWnJof38WfKcSHRRa2/2xkwBkZ9eWRRQsTLm9AK+bdObPBEw7AZgXFysomUW+qsgYb9onk+feQr+zrLq8fBBcCJH6mwVLDY5WCgKA5G0elx+qIlrGEORSxvNv2cVrZoSenM/3H/sKlovonjpldVGBbJe1t8+3ei3P3++/Pr1n+8X92+/fbq6qP6WHIlVmijIl+dfiIcu9gsI/JedBAltF1qMqVX9oa3EwTAa6X+nmRgfRYmGmppnkeyO3bHgCeW+ujR9G0d917J1zyqpjzRToqe9oIavINeRthu8DhLrOY0o19FOCRbjgbVYLgmePnA+eEMvAvR449T9E+KN8TwBLJd4jxOg1S5Dw5vLCm+GKLwZah94492//0zwZgD5hl6V5MMbkG+M2xVjuSS4Y/9Y8QblG639BvHGyX6D8k2E/cZ9EsDpg33HilHQ2oyhTOSF0V5kouABA2O9Y2hAJtLiTdwdA8hEEXgj6jk3Aa4dcA4/wPFyovQDOMESLoWbszRwgHO0uW4NF3DFRIoD9zRawKErachKzADXsGWugCPpFzEBjkTCoUmK1IAD6Y4WJ8ApDVvmCzjeQ066A+4wODQbdVm9nVjlY/K5oSfBNyXDDibHCOj2iXPdQkHjSI9ztA9VsWW+ONekw/kgBatuo2pRoQWcME1uNmyZL+BITEex/bHZB9phdAzTt5dXf2x8mJ3EgBod2U3CdvD8yA7Df0c6XKN5MNKlJHFjHHDTCey6i2ccjKMdaXM9Ma5kHObEiWVct8R34nz2k0kFfEsi68PwpBp62jsNnvf4aOSSwKx3/Paak97zbG2a9Gn6/mUnZGv/cXWd/drV5jHJynVpmz8M5veOrhKZ4K5GluDO8pyKz2/HpxES5K32omZAkNbENEkXq2cMgpzL0emZPUCk1TMGLi+RZcvLqyjeu1QOVb2VvpeLepmqbJKPWTsclsiRRfI1l6+T6c/WIp505WDpJX5Hhdq07uBLfvoqBMF5Srjd24yh3NtwgzQF69NzvLcZct/bjLHd25L4MaUktxZ4o7ssEeREcTQZ5L4sFcplyk87LciogvgGWpZJGmTU0LSxMlLCOMioojmcxZb5FsHwzt4YWfQaAMtGI2LUwLJRlcsLWMWW+QJLGwuw6homJ+KFAccm8sxGs506D6mz0YTBTjMbtoyFHfj4BZI9WSTSFl8f3d1t/QRZkA1yR19oj0U0xvk5iARyYQoauAKOJpBDQBcTjQoJYtrCEC24tHqsqWsY0FJK4ZHnBFuCa6tHCzhYW31ugEsfxlHmyTi8PfY2Dx+jZdYU6e3/AQ==7Vxdc9o6EP01PNLxt+GRAEm4U0Im5KbpU0bYwqgxlq8tEri/vpItgz9EMIEU45LpTGAlJHnP2T3SorShdufLmwD4syG2odtQJHvZUHsNRZFlQ6a/mGUVW5qqqcUWJ0A277UxjNH/kBslbl0gG4aZjgRjlyA/a7Sw50GLZGwgCPB7ttsUu9lZfeDAgmFsAbdo/YFsMoutLcXc2G8hcmZk/cTtuGUOks78ScIZsPF7yqT2G4rRUNQlaKhXDWZL/qndAGPyQYds5/myC13m98Sj8YzXnx+gOxp/B/c9LD++/vzn16L7MHH8JnfFG3AX3DljNF+4gMDEFwH0yFeuQCmsgLuWrBK8ArzwbMgGkemM7zNE4NgHFmt9pxSlthmZu7x5ily3i10cRJ9VoWzr0KT2kAT4FaZa2oapAiN6hPyTfugqGBC4TBGpvDtuIJ5DEqzo+2SANicSDyVZTZj1viGmKXHbLEVKTedGwIPBWQ++F0i0L8fpyMQyd+PqUGD9A72/zhJgkgy7IxB3o2LmQNGlIihtASayVHFMWrsxoRnNZy89TFPAzmCbAOvVicJztCAu8iC32yB4HdFPIcJ8KH2TdGrMhnEYywJbPo/NJBdT
8K4oqATQ8VisSnvEqLmFJofQQS+A3zpH8NsC8A2XsJRJnZ1hgfHfAicNzRioDu0ga/4yWkrSTm3TmBob2xC8QY823Ac43dFw+O9oxsknp4tHueoMafMt9X0U/hRsJvh8aOq/SX46aoufMWs+t+dejxIcZZhSnjpEiothnRNnKs0tWxOJc0uZqIYhSARsdXxHKWunSgxcGLRysqBVOzEkO446KLVh1EOpleK+vGZKvWbdRakLrhEdiTKKVVZ8jqE1B40yvrtncT9DPuMEDhFB2KudjNHYatmqSMYUVdN0+3gydtyoqZeMqaKoocPLcVba+fJMJE+WZD0neiUBrHze02oveuoWolxEL3mMw0WvoF/pQ2PIq5t2WoSqpid/7lh0XELWS0+MrYQstycyt+xbms05tmGKi2Li5RhG4SEiGiXk8DDLbhkmcRNwkePRtxalAuPLFQMbWcDt8IY5sm13WzbdMDtPM3OP0/cXpL1SpXNhKlQrTrwzrpzLqppHpS6bk9rXzpVL7Xyra+pTO3/qXtNmEiwoHh9seC6F6n2/RT7WjuxSqN7mGrVEURR6dofdENlsf1Iw55JoAXS4ROSZw8de/4zyr87f9Zappt4qgfmLrxuEeBHwLPHxwQnamUsvn6JLih+6gB+JLYB024resrdpRKQpt4j8gu4xinLquuAh5QoeZl4uYifxz+3B1R0zt7V8pcXITUxA4EBy6MSUsWCVGtVn44VHc2C7tcV/n1st/Vy84L3ywZplp84iSZJLZREai2AefrOhz0QxLyQzPJ8swt0icsKz0HqTnSCsGcWNl6wIQjrZjVU05WvFlM/BciDdZa98WA+8jORq5Png1Rx0Bs/e6LH970B/6pChb/0aCa4TXnWGLw/9m8HoroBUNUoaJ6SBkj8s66aABqLbf9UpYQhpUKKQX9UKRv5GwRnWL4SQiMrr1SxfFEnxIcf+6mqF0DPbC9e8WGEDAprxITywMifzGSHsDn6HPUZ8wKZKi7HjQuCj8JuF59RshbTL9RTMkcsg5kWLq/zdg9KXFKz1IWljVKfRT4nvdfYtxFBqiasND9BB2Au3FhiqUouQop8TBcq5FhSEnilRaD8QOhBYvAakS1EaJCC6AaP22qfeexQuuQswFe1AK45piTI9HQX5YQmJA6Ef//nRFC0ZB3IylkKzrJL9MXh1JQev1i6KmwhepdrwFgvx487wcTT6Pn7pjp76D52b/uWcUSCDlCPDmh0pMhgCMlT8mJEcmv/2YDf0HL41CXa5WE54GvR/vPT694+3lzAv0KBQTqhJmCuXMI9KE/ktW13CvHhzd3DX6z9fIrzAgHyir2CER712/Kl78p3Q5v8XUPu/AQ== diff --git a/docs/images/metro/Phase.png b/docs/images/metro/Phase.png new file mode 100644 index 00000000..3946e578 Binary files /dev/null and b/docs/images/metro/Phase.png differ diff --git a/docs/images/metro/PostProcessing.png b/docs/images/metro/PostProcessing.png new file mode 100644 index 00000000..2780766b Binary files /dev/null and b/docs/images/metro/PostProcessing.png differ diff --git a/docs/images/metro/PreProcessing.png b/docs/images/metro/PreProcessing.png new file mode 100644 index 00000000..d1ad65d6 Binary files /dev/null and b/docs/images/metro/PreProcessing.png differ diff --git 
a/docs/images/metro/Simulate.png b/docs/images/metro/Simulate.png new file mode 100644 index 00000000..a6d2f0eb Binary files /dev/null and b/docs/images/metro/Simulate.png differ diff --git a/docs/images/metro/txt2image.md b/docs/images/metro/txt2image.md new file mode 100644 index 00000000..33aaba3d --- /dev/null +++ b/docs/images/metro/txt2image.md @@ -0,0 +1,22 @@ +# Install desktop app + +Go to `https://github.com/jgraph/drawio-desktop/releases/` and download the latest version for your OS. + +To install it on WSL: + +```bash +sudo apt install /mnt/c/Users/llenezet/Downloads/drawio-amd64-21.6.8.deb +``` + +To use drawio: + +```bash +drawio --version +drawio docs/images/metro/MetroMap.xml --export --format png --page-index 0 --output docs/images/metro/MetroMap.png --scale 3 +drawio docs/images/metro/MetroMap.xml --export --format png --layers 0 --page-index 1 --output docs/images/metro/PostProcessing.png --scale 3 +drawio docs/images/metro/MetroMap.xml --export --format png --layers 1 --page-index 1 --output docs/images/metro/Concordance2.png --scale 3 +drawio docs/images/metro/MetroMap.xml --export --format png --layers 2 --page-index 1 --output docs/images/metro/Simulate.png --scale 3 +drawio docs/images/metro/MetroMap.xml --export --format png --layers 3 --page-index 1 --output docs/images/metro/Phase.png --scale 3 +drawio docs/images/metro/MetroMap.xml --export --format png --layers 4 --page-index 1 --output docs/images/metro/PreProcessing.png --scale 3 +drawio docs/images/metro/MetroMap.xml --export --format png --layers 5 --page-index 1 --output docs/images/metro/Concordance.png --scale 3 +``` diff --git a/docs/output.md b/docs/output.md index d7417cae..97d7d4d7 100644 --- a/docs/output.md +++ b/docs/output.md @@ -10,34 +10,68 @@ The directories listed below will be created in the results directory after the ## Pipeline overview +## QUILT imputation mode + The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the
following steps: -- [FastQC](#fastqc) - Raw read QC -- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline -- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution +- [Glimpse Chunk](#glimpse-chunk) - Create chunks of the reference panel +- [Remove Multiallelics](#multiallelics) - Remove multiallelic sites from the reference panel +- [Convert](#convert) - Convert the reference panel to .hap and .legend files +- [QUILT](#quilt) - Perform imputation +- [Concatenate](#concat) - Concatenate all imputed chunks into a single VCF. -### FastQC +### Glimpse Chunk -
-Output files +- `imputation/glimpse_chunk/` + - `*.txt`: TXT file containing the chunks obtained from running Glimpse chunk. -- `fastqc/` - - `*_fastqc.html`: FastQC report containing quality metrics. - - `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images. +[Glimpse chunk](https://odelaneau.github.io/GLIMPSE/) defines the genomic chunks in which imputation is run. For further reading and documentation, see the [Glimpse documentation](https://odelaneau.github.io/GLIMPSE/glimpse1/commands.html). Once you have generated the chunks for your reference panel, you can skip the reference preparation step and directly submit this file for imputation. -
+### Convert -[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/). +- `imputation/bcftools/convert/` + - `*.hap`: a .hap file for the reference panel. + - `*.legend*`: a .legend file for the reference panel. -![MultiQC - FastQC sequence counts plot](images/mqc_fastqc_counts.png) +[bcftools](https://samtools.github.io/bcftools/bcftools.html) converts the reference panel VCF files to .hap and .legend files. A .samples file is also generated. Once you have generated the .hap and .legend files for your reference panel, you can skip the reference preparation step and directly submit these files for imputation (to be developed). -![MultiQC - FastQC mean quality scores plot](images/mqc_fastqc_quality.png) +### QUILT -![MultiQC - FastQC adapter content plot](images/mqc_fastqc_adapter.png) +- `imputation/quilt/` + - `quilt.*.vcf.gz`: Imputed VCF for a specific chunk. + - `quilt.*.vcf.gz.tbi`: TBI index for the imputed VCF of a specific chunk. -:::note -The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality. -::: +[QUILT](https://github.com/rwdavies/QUILT) performs the imputation. This step outputs one imputed VCF per chunk. + +### Concat + +- `imputation/bcftools/concat` + - `.*.vcf.gz`: Imputed and ligated VCF for all the input samples. + +[bcftools concat](https://samtools.github.io/bcftools/bcftools.html) produces a single VCF from the list of per-chunk imputed VCFs.
+ +## STITCH imputation mode + +The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps: + +- [Remove Multiallelics](#multiallelics) - Remove multiallelic sites +- [STITCH](#quilt) - Perform imputation +- [Concatenate](#concatenate) - Concatenate all imputed chunks into a single VCF + +### Concat + +- `imputation/bcftools/concat` + - `.*.vcf.gz`: Imputed and concatenated VCF for all the input samples. + +[bcftools concat](https://samtools.github.io/bcftools/bcftools.html) will produce a single VCF from a list of imputed VCFs. + +## Reports + +Reports contain useful metrics and pipeline information for the different modes. + +- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution +- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline +- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution ### MultiQC @@ -51,6 +85,7 @@ The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They m +[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory. [MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory. Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see .
diff --git a/docs/usage.md b/docs/usage.md index 47978da7..d7670e1c 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -16,41 +16,67 @@ You will need to create a samplesheet with information about the samples you wou --input '[path to samplesheet file]' ``` -### Multiple runs of the same sample +### Structure -The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes: +The samplesheet can have as many columns as you like; however, it must contain at least the 3 columns defined in the table below. -```csv title="samplesheet.csv" -sample,fastq_1,fastq_2 -CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz -CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz -CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz +A final samplesheet file may look something like the one below. This is for 6 samples. + +```console +sample,bam,bai +SAMPLE1,AEG588A1.bam,AEG588A1.bai +SAMPLE2,AEG588A2.bam,AEG588A2.bai +SAMPLE3,AEG588A3.bam,AEG588A3.bai +SAMPLE4,AEG588A4.bam,AEG588A4.bai +SAMPLE5,AEG588A5.bam,AEG588A5.bai +SAMPLE6,AEG588A6.bam,AEG588A6.bai ``` -### Full samplesheet +| Column | Description | +| -------- | -------------------------------------------------------------------------------------------- | +| `sample` | Custom sample name. Spaces in sample names are automatically converted to underscores (`_`). | +| `bam` | Full path to a BAM file. File has to have the extension ".bam". | +| `bai` | Full path to the BAI index of the BAM file. File has to have the extension ".bai". | + +An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.
-The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 3 columns to match those defined in the table below. +## Samplesheet reference panel -A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 6 samples, where `TREATMENT_REP3` has been sequenced twice. +You will need to create a samplesheet with information about the reference panel you would like to use. Use the `--panel` parameter to specify its location. It has to be a comma-separated file with 2 columns and a header row, as shown in the examples below. -```csv title="samplesheet.csv" -sample,fastq_1,fastq_2 -CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz -CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz -CONTROL_REP3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz -TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz, -TREATMENT_REP2,AEG588A5_S5_L003_R1_001.fastq.gz, -TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz, -TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz, +```bash +--panel '[path to samplesheet file]' ``` -| Column | Description | -| --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). | -| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | -| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | +### Structure -An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline. +A final samplesheet file for the reference panel may look something like the one below. This is for 3 chromosomes. + +```console +chr,vcf +1,ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz +2,ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz +3,ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz +``` + +| Column | Description | +| ------ | --------------------------------------------------------------------------------------------------------- | +| `chr` | Name of the chromosome. Use the prefix 'chr' if the panel uses the prefix. | +| `vcf` | Full path to a VCF file for that chromosome. File has to be gzipped and have the extension ".vcf.gz".gz". | + +An [example samplesheet](../assets/samplesheet_reference.csv) has been provided with the pipeline. + +Remember to use the same reference genome for all the files. You can specify the [reference genome](https://nf-co.re/docs/usage/reference_genomes) using: + +```bash +--genome GRCh37 +``` + +or you can specify a custom genome using: + +```bash +--fasta Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz +``` ## Running the pipeline @@ -68,6 +94,9 @@ Note that the pipeline will create the following files in your working directory work # Directory containing the nextflow working files # Finished results in specified location (defined with --outdir) .nextflow_log # Log file from Nextflow +work # Directory containing the nextflow working files + # Finished results in specified location (defined with --outdir) +.nextflow_log # Log file from Nextflow # Other nextflow hidden files, eg. history of pipeline runs and old logs. ``` @@ -96,6 +125,58 @@ genome: 'GRCh37' You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch). 
+### Imputation tools `--step impute --tools [glimpse1, quilt, stitch]` + +You can choose different software to perform the imputation. In the following sections, the typical commands for running the pipeline with each tool are shown. + +#### QUILT + +```bash +nextflow run nf-core/phaseimpute --input samplesheet.csv --panel samplesheet_reference.csv --step impute --tool quilt --outdir results --genome GRCh37 -profile docker +``` + +#### STITCH + +[STITCH](https://github.com/rwdavies/STITCH) is an R program for genotype imputation from low-coverage sequencing data without using a reference panel. The required inputs for this program are the BAM files provided in the input samplesheet (`--input`) and a list of positions to genotype (`--posfile`). + +If you do not have a list of positions to genotype, you can provide a reference panel and run `--step panelprep`, which produces a TSV with this list. + +```bash +nextflow run nf-core/phaseimpute --input samplesheet.csv --step panelprep --panel samplesheet_reference.csv --outdir results --genome GRCh37 -profile docker +``` + +Otherwise, you can provide your own position file with `--step impute` and STITCH using the `--posfile` parameter. + +```bash +nextflow run nf-core/phaseimpute --input samplesheet.csv --step impute --posfile samplesheet_posfile.csv --tool stitch --outdir results --genome GRCh37 -profile docker +``` + +The CSV provided in `--posfile` must contain two columns, `chr` and `file`. The `chr` column is the chromosome, and the `file` column is the path to a TSV listing the positions for that chromosome. + +```console +chr,file +chr1,posfile_chr1.txt +chr2,posfile_chr2.txt +chr3,posfile_chr3.txt +``` + +Each file in the `file` column should be a TSV with the following structure, as described in the STITCH documentation: "File is tab separated with no header, one row per SNP, with col 1 = chromosome, col 2 = physical position (sorted from smallest to largest), col 3 = reference base, col 4 = alternate base.
Bases are capitalized. STITCH only handles bi-allelic SNPs" [STITCH](https://github.com/rwdavies/STITCH/blob/master/Options.md). + +As an example, chr22 tsv file: + +```console +chr22 16570065 A G +chr22 16570067 A C +chr22 16570176 C A +chr22 16570211 T C +``` + +#### GLIMPSE1 + +```bash +nextflow run nf-core/phaseimpute --input samplesheet.csv --panel samplesheet_reference.csv --step impute --tool glimpse1 --outdir results --genome GRCh37 -profile docker +``` + ### Updating the pipeline When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: @@ -106,8 +187,10 @@ nextflow pull nf-core/phaseimpute ### Reproducibility +It is a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. It is a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. +First, go to the [nf-core/phaseimpute releases page](https://github.com/nf-core/phaseimpute/releases) and find the latest pipeline version - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`. Of course, you can switch to another version by changing the number after the `-r` flag. 
First, go to the [nf-core/phaseimpute releases page](https://github.com/nf-core/phaseimpute/releases) and find the latest pipeline version - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`. Of course, you can switch to another version by changing the number after the `-r` flag. This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. For example, at the bottom of the MultiQC reports. @@ -128,6 +211,7 @@ These options are part of Nextflow and use a _single_ hyphen (pipeline parameter Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. +Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Apptainer, Conda) - see below. Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Apptainer, Conda) - see below. :::info @@ -163,6 +247,7 @@ If `-profile` is not specified, the pipeline will run locally and expect all sof ### `-resume` +Specify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. For input to be considered the same, not only the names must be identical but the files' contents as well. For more info about this parameter, see [this blog post](https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html). Specify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. 
For input to be considered the same, not only the names must be identical but the files' contents as well. For more info about this parameter, see [this blog post](https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html). You can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names. @@ -218,9 +303,66 @@ Some HPC setups also allow you to run nextflow within a cluster job submitted yo ## Nextflow memory requirements +In some cases, the Nextflow Java virtual machines can start to request a large amount of memory. +We recommend adding the following line to your environment to limit this (typically in `~/.bashrc` or `~/.bash_profile`): +Specify the path to a specific config file (this is a core Nextflow command). See the [nf-core website documentation](https://nf-co.re/usage/configuration) for more information. + +## Custom configuration + +### Resource requests + +Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped. + +To change the resource requests, please see the [max resources](https://nf-co.re/docs/usage/configuration#max-resources) and [tuning workflow resources](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources) sections of the nf-core website.
+ +### Custom Containers + +In some cases you may wish to change which container or conda environment a step of the pipeline uses for a particular tool. By default, nf-core pipelines use containers and software from the [biocontainers](https://biocontainers.pro/) or [bioconda](https://bioconda.github.io/) projects. However, in some cases the version specified by the pipeline may be out of date. + +To use a different container from the default container or conda environment specified in a pipeline, please see the [updating tool versions](https://nf-co.re/docs/usage/configuration#updating-tool-versions) section of the nf-core website. + +### Custom Tool Arguments + +A pipeline might not always support every possible argument or option of a particular tool used in the pipeline. Fortunately, nf-core pipelines provide some freedom to users to insert additional parameters that the pipeline does not include by default. + +To learn how to provide additional arguments to a particular tool of the pipeline, please see the [customising tool arguments](https://nf-co.re/docs/usage/configuration#customising-tool-arguments) section of the nf-core website. + +### nf-core/configs + +In most cases, you will only need to create a custom config as a one-off, but if you and others within your organisation are likely to be running nf-core pipelines regularly and need the same settings each time, it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this, please test that the config file works with your pipeline of choice using the `-c` parameter. You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile.
+ +See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more information about creating your own configuration files. + +If you have any questions or issues, please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs). + +## Azure Resource Requests + +These settings apply when using the `azurebatch` profile, specified with `-profile azurebatch`. +We recommend providing a compute `params.vm_type` of `Standard_D16_v3` VMs by default, but these options can be changed if required. + +Note that the choice of VM size depends on your quota and the overall workload during the analysis. +For a thorough list, please refer to the [Azure Sizes for virtual machines in Azure](https://docs.microsoft.com/en-us/azure/virtual-machines/sizes). + +## Running in the background + +Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished. + +The Nextflow `-bg` flag launches Nextflow in the background, detached from your terminal so that the workflow does not stop if you log out of your session. The logs are saved to a file. + +Alternatively, you can use `screen` / `tmux` or a similar tool to create a detached session which you can log back into at a later time. +Some HPC setups also allow you to run nextflow within a cluster job submitted to your job scheduler (from where it submits more jobs). + +## Nextflow memory requirements + In some cases, the Nextflow Java virtual machines can start to request a large amount of memory.
We recommend adding the following line to your environment to limit this (typically in `~/.bashrc` or `~./bash_profile`): ```bash NXF_OPTS='-Xms1g -Xmx4g' ``` + +NXF_OPTS='-Xms1g -Xmx4g' + +``` + +``` diff --git a/main.nf b/main.nf index fdb2b251..1e1c3dd7 100644 --- a/main.nf +++ b/main.nf @@ -6,7 +6,6 @@ Github : https://github.com/nf-core/phaseimpute Website: https://nf-co.re/phaseimpute Slack : https://nfcore.slack.com/channels/phaseimpute ----------------------------------------------------------------------------------------- */ nextflow.enable.dsl = 2 @@ -17,23 +16,11 @@ nextflow.enable.dsl = 2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -include { PHASEIMPUTE } from './workflows/phaseimpute' +include { PHASEIMPUTE } from './workflows/phaseimpute' include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_nfcore_phaseimpute_pipeline' include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_phaseimpute_pipeline' - include { getGenomeAttribute } from './subworkflows/local/utils_nfcore_phaseimpute_pipeline' -/* -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - GENOME PARAMETER VALUES -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -*/ - -// TODO nf-core: Remove this line if you don't need a FASTA file -// This is an example of how to use getGenomeAttribute() to fetch parameters -// from igenomes.config using `--genome` -params.fasta = getGenomeAttribute('fasta') - /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ NAMED WORKFLOWS FOR PIPELINE @@ -46,21 +33,70 @@ params.fasta = getGenomeAttribute('fasta') workflow NFCORE_PHASEIMPUTE { take: - samplesheet // channel: samplesheet read in from --input + ch_input // channel: samplesheet read in from --input + ch_input_truth // channel: samplesheet read in from --input-truth + ch_fasta // channel: reference genome FASTA 
file with index + ch_panel // channel: reference panel variants file + ch_regions // channel: regions to use [[chr, region], region] + ch_depth // channel: depth of coverage file [[depth], depth] + ch_map // channel: map file for imputation + ch_posfile // channel: samplesheet read in from --posfile + ch_versions // channel: versions of software used main: + // + // Initialise input channels + // + + input_impute = Channel.empty() + input_simulate = Channel.empty() + input_validate = Channel.empty() + + if (params.step.split(',').contains("impute")) { + input_impute = ch_input + .combine(ch_regions) + .map { metaI, file, index, metaCR, region -> + [ metaI+metaCR, file, index ] + } + } else if (params.step.split(',').contains("simulate") || params.step.split(',').contains("all")) { + input_simulate = ch_input + } else if (params.step.split(',').contains("validate")) { + input_validate = ch_input + .combine(ch_regions) + .map { metaI, file, index, metaCR, region -> + [ metaI+metaCR, file, index ] + } + ch_input_truth = ch_input_truth + .combine(ch_regions) + .map { metaI, file, index, metaCR, region -> + [ metaI+metaCR, file, index ] + } + } + // // WORKFLOW: Run pipeline // PHASEIMPUTE ( - samplesheet + input_impute, + input_simulate, + input_validate, + ch_input_truth, + ch_fasta, + ch_panel, + ch_regions, + ch_depth, + ch_map, + ch_posfile, + ch_versions ) + emit: multiqc_report = PHASEIMPUTE.out.multiqc_report // channel: /path/to/multiqc_report.html } + /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ RUN MAIN WORKFLOW @@ -70,7 +106,6 @@ workflow NFCORE_PHASEIMPUTE { workflow { main: - // // SUBWORKFLOW: Run initialisation tasks // @@ -88,7 +123,15 @@ workflow { // WORKFLOW: Run main workflow // NFCORE_PHASEIMPUTE ( - PIPELINE_INITIALISATION.out.samplesheet + PIPELINE_INITIALISATION.out.input, + PIPELINE_INITIALISATION.out.input_truth, + PIPELINE_INITIALISATION.out.fasta, + PIPELINE_INITIALISATION.out.panel, + 
PIPELINE_INITIALISATION.out.regions, + PIPELINE_INITIALISATION.out.depth, + PIPELINE_INITIALISATION.out.map, + PIPELINE_INITIALISATION.out.posfile, + PIPELINE_INITIALISATION.out.versions ) // diff --git a/modules.json b/modules.json index b6b8764b..d3c2bc4c 100644 --- a/modules.json +++ b/modules.json @@ -5,20 +5,187 @@ "https://github.com/nf-core/modules.git": { "modules": { "nf-core": { + "bcftools/annotate": { + "branch": "master", + "git_sha": "44096c08ffdbc694f5f92ae174ea0f7ba0f37e09", + "installed_by": ["modules"], + "patch": "modules/nf-core/bcftools/annotate/bcftools-annotate.diff" + }, + "bcftools/concat": { + "branch": "master", + "git_sha": "44096c08ffdbc694f5f92ae174ea0f7ba0f37e09", + "installed_by": ["modules"], + "patch": "modules/nf-core/bcftools/concat/bcftools-concat.diff" + }, + "bcftools/convert": { + "branch": "master", + "git_sha": "44096c08ffdbc694f5f92ae174ea0f7ba0f37e09", + "installed_by": ["modules"] + }, + "bcftools/index": { + "branch": "master", + "git_sha": "44096c08ffdbc694f5f92ae174ea0f7ba0f37e09", + "installed_by": [ + "modules", + "multiple_impute_glimpse2", + "vcf_impute_glimpse", + "vcf_phase_shapeit5" + ] + }, + "bcftools/mpileup": { + "branch": "master", + "git_sha": "44096c08ffdbc694f5f92ae174ea0f7ba0f37e09", + "installed_by": ["modules"], + "patch": "modules/nf-core/bcftools/mpileup/bcftools-mpileup.diff" + }, + "bcftools/norm": { + "branch": "master", + "git_sha": "44096c08ffdbc694f5f92ae174ea0f7ba0f37e09", + "installed_by": ["modules"] + }, + "bcftools/query": { + "branch": "master", + "git_sha": "44096c08ffdbc694f5f92ae174ea0f7ba0f37e09", + "installed_by": ["modules"] + }, + "bcftools/view": { + "branch": "master", + "git_sha": "1013101da4252623fd7acf19cc581bae91d4f839", + "installed_by": ["modules"], + "patch": "modules/nf-core/bcftools/view/bcftools-view.diff" + }, + "bedtools/makewindows": { + "branch": "master", + "git_sha": "3b248b84694d1939ac4bb33df84bf6233a34d668", + "installed_by": ["vcf_phase_shapeit5"] + }, + 
"custom/dumpsoftwareversions": { + "branch": "master", + "git_sha": "de45447d060b8c8b98575bc637a4a575fd0638e1", + "installed_by": ["modules"] + }, "fastqc": { "branch": "master", "git_sha": "285a50500f9e02578d90b3ce6382ea3c30216acd", "installed_by": ["modules"] }, + "gawk": { + "branch": "master", + "git_sha": "da4d05d04e65227d4307e87940842f1a14de62c7", + "installed_by": ["modules"] + }, + "glimpse/chunk": { + "branch": "master", + "git_sha": "7e56daae390ff896b292ddc70823447683a79936", + "installed_by": ["vcf_impute_glimpse"] + }, + "glimpse/ligate": { + "branch": "master", + "git_sha": "7e56daae390ff896b292ddc70823447683a79936", + "installed_by": ["vcf_impute_glimpse"] + }, + "glimpse/phase": { + "branch": "master", + "git_sha": "7e56daae390ff896b292ddc70823447683a79936", + "installed_by": ["vcf_impute_glimpse"] + }, + "glimpse2/chunk": { + "branch": "master", + "git_sha": "14ba46490cae3c78ed8e8f48d2c0f8f3be1e7c03", + "installed_by": ["multiple_impute_glimpse2"] + }, + "glimpse2/concordance": { + "branch": "master", + "git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5", + "installed_by": ["modules"] + }, + "glimpse2/ligate": { + "branch": "master", + "git_sha": "ee7fee68281944b002bd27a8ff3f19200b4d3fad", + "installed_by": ["multiple_impute_glimpse2"] + }, + "glimpse2/phase": { + "branch": "master", + "git_sha": "9c71d32e372650e8bb3e1fb15339017aad5e3f7f", + "installed_by": ["multiple_impute_glimpse2"] + }, + "glimpse2/splitreference": { + "branch": "master", + "git_sha": "fa12139827a18b324bd63fce654818586a8e9cc7", + "installed_by": ["multiple_impute_glimpse2"] + }, + "gunzip": { + "branch": "master", + "git_sha": "3a5fef109d113b4997c9822198664ca5f2716208", + "installed_by": ["modules"] + }, "multiqc": { "branch": "master", "git_sha": "b7ebe95761cd389603f9cc0e0dc384c0f663815a", "installed_by": ["modules"] + }, + "quilt/quilt": { + "branch": "master", + "git_sha": "46265545d61e7f482adf40de941cc9a94e479bbe", + "installed_by": ["modules"] + }, + "samtools/coverage": 
{ + "branch": "master", + "git_sha": "38afbe42f7db7f19c7a89607c0a71c68f3be3131", + "installed_by": ["modules"], + "patch": "modules/nf-core/samtools/coverage/samtools-coverage.diff" + }, + "samtools/faidx": { + "branch": "master", + "git_sha": "f153f1f10e1083c49935565844cccb7453021682", + "installed_by": ["modules"] + }, + "samtools/index": { + "branch": "master", + "git_sha": "f4596fe0bdc096cf53ec4497e83defdb3a94ff62", + "installed_by": ["modules"] + }, + "samtools/view": { + "branch": "master", + "git_sha": "0bd7d2333a88483aa0476acea172e9f5f6dd83bb", + "installed_by": ["modules"], + "patch": "modules/nf-core/samtools/view/samtools-view.diff" + }, + "shapeit5/ligate": { + "branch": "master", + "git_sha": "dcf17cc0ed8fd5ea57e61a13e0147cddb5c1ee30", + "installed_by": ["vcf_phase_shapeit5"] + }, + "shapeit5/phasecommon": { + "branch": "master", + "git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5", + "installed_by": ["vcf_phase_shapeit5"] + }, + "stitch": { + "branch": "master", + "git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5", + "installed_by": ["modules"], + "patch": "modules/nf-core/stitch/stitch.diff" + }, + "tabix/bgzip": { + "branch": "master", + "git_sha": "09d3c8c29b31a2dfd610305b10550f0e1dbcd4a9", + "installed_by": ["modules"] + }, + "tabix/tabix": { + "branch": "master", + "git_sha": "9502adb23c0b97ed8e616bbbdfa73b4585aec9a1", + "installed_by": ["modules"] } } }, "subworkflows": { "nf-core": { + "multiple_impute_glimpse2": { + "branch": "master", + "git_sha": "cfd937a668919d948f6fcbf4218e79de50c2f36f", + "installed_by": ["subworkflows"] + }, "utils_nextflow_pipeline": { "branch": "master", "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", @@ -33,6 +200,11 @@ "branch": "master", "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", "installed_by": ["subworkflows"] + }, + "vcf_impute_glimpse": { + "branch": "master", + "git_sha": "7e56daae390ff896b292ddc70823447683a79936", + "installed_by": ["subworkflows"] } } } diff --git 
a/modules/nf-core/fastqc/environment.yml b/modules/local/addcolumns/environment.yml similarity index 61% rename from modules/nf-core/fastqc/environment.yml rename to modules/local/addcolumns/environment.yml index 1787b38a..34513c7f 100644 --- a/modules/nf-core/fastqc/environment.yml +++ b/modules/local/addcolumns/environment.yml @@ -1,7 +1,7 @@ -name: fastqc +name: gawk channels: - conda-forge - bioconda - defaults dependencies: - - bioconda::fastqc=0.12.1 + - anaconda::gawk=5.1.0 diff --git a/modules/local/addcolumns/main.nf b/modules/local/addcolumns/main.nf new file mode 100644 index 00000000..7789f76d --- /dev/null +++ b/modules/local/addcolumns/main.nf @@ -0,0 +1,42 @@ +process ADD_COLUMNS { + label 'process_single' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/gawk:5.1.0' : + 'biocontainers/gawk:5.1.0' }" + + input: + tuple val(meta), path(input) + + output: + tuple val(meta), path('*.txt'), emit: txt + path "versions.yml", emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def prefix = task.ext.prefix ?: "${meta.id}" + """ + # Find the header line + HEADER_STR="#Genotype concordance by allele frequency bin (Variants: SNPs + indels)" + HEADER_LINE=\$(grep -n -m 1 "^\${HEADER_STR}" $input | cut -d: -f1 ) + HEADER_START=\$((HEADER_LINE + 1)) + + tail -n +\$HEADER_START $input | \\ + awk 'NR==1{\$(NF+1)="ID"} NR>1{\$(NF+1)="${meta.id}"}1' | \\ + awk 'NR==1{\$(NF+1)="Region"} NR>1{\$(NF+1)="${meta.region}"}1' | \\ + awk 'NR==1{\$(NF+1)="Depth"} NR>1{\$(NF+1)="${meta.depth}"}1' | \\ + awk 'NR==1{\$(NF+1)="GPArray"} NR>1{\$(NF+1)="${meta.gparray}"}1' | \\ + awk 'NR==1{\$(NF+1)="Tools"} NR>1{\$(NF+1)="${meta.tools}"}1' | \\ + awk 'NR==1{\$(NF+1)="Panel"} NR>1{\$(NF+1)="${meta.panel}"}1' > \\ + ${prefix}.txt + + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gawk: \$(awk 
-Wversion | sed '1!d; s/.*Awk //; s/,.*//') + END_VERSIONS + """ +} diff --git a/modules/local/vcfchrextract/environment.yml b/modules/local/vcfchrextract/environment.yml new file mode 100644 index 00000000..3280dfaf --- /dev/null +++ b/modules/local/vcfchrextract/environment.yml @@ -0,0 +1,7 @@ +name: vcfchrextract +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::bcftools=1.18 diff --git a/modules/local/vcfchrextract/main.nf b/modules/local/vcfchrextract/main.nf new file mode 100644 index 00000000..6283e4eb --- /dev/null +++ b/modules/local/vcfchrextract/main.nf @@ -0,0 +1,49 @@ +process VCFCHREXTRACT { + tag "$meta.id" + label 'process_single' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/bcftools:1.18--h8b25389_0': + 'biocontainers/bcftools:1.18--h8b25389_0' }" + + input: + tuple val(meta), path(input) + + output: + tuple val(meta), path("*.txt"), emit: chr + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def prefix = task.ext.prefix ?: "${meta.id}" + """ + bcftools \\ + head \\ + $input \\ + \| grep -o -E '^##contig=]*)' | cut -d'=' -f3 \\ + > ${prefix}.txt + + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$( bcftools --version |& sed '1!d; s/^.*bcftools //' ) + grep: \$( grep --version |& grep -o -E '[0-9]+\\.[0-9]+' ) + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + touch ${prefix}.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$( bcftools --version |& sed '1!d; s/^.*bcftools //' ) + grep: \$( grep --help |& grep -o -E '[0-9]+\\.[0-9]+\\.[0-9]+' ) + END_VERSIONS + """ +} diff --git a/modules/local/vcfchrextract/meta.yml b/modules/local/vcfchrextract/meta.yml new file mode 100644 index 00000000..19d523d4 
--- /dev/null +++ b/modules/local/vcfchrextract/meta.yml @@ -0,0 +1,41 @@ +name: vcfchrextract +description: Extract the names of all contigs into a txt file +keywords: + - bcftools + - vcf + - head + - contig +tools: + - head: + description: Extract header from variant calling file. + homepage: http://samtools.github.io/bcftools/bcftools.html + documentation: https://samtools.github.io/bcftools/bcftools.html#head + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - input: + type: file + description: Query VCF or BCF file, can be either uncompressed or compressed +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - chr: + type: file + description: List of contigs in the VCF file + pattern: "*{txt}" +authors: + - "@louislenezet" +maintainers: + - "@louislenezet" diff --git a/modules/local/vcfchrextract/tests/main.nf.test b/modules/local/vcfchrextract/tests/main.nf.test new file mode 100644 index 00000000..a004135b --- /dev/null +++ b/modules/local/vcfchrextract/tests/main.nf.test @@ -0,0 +1,32 @@ +nextflow_process { + + name "Test Process VCFCHREXTRACT" + script "../main.nf" + process "VCFCHREXTRACT" + + tag "modules" + tag "modules_local" + tag "vcfchrextract" + + test("Extract chr from vcf") { + + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + file(params.test_data['sarscov2']['illumina']['test_vcf_gz'], checkIfExists: true) + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } +} diff --git a/modules/local/vcfchrextract/tests/main.nf.test.snap b/modules/local/vcfchrextract/tests/main.nf.test.snap new file mode 100644 index 00000000..3431bbe9 --- /dev/null +++
b/modules/local/vcfchrextract/tests/main.nf.test.snap @@ -0,0 +1,35 @@ +{ + "Extract chr from vcf": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.txt:md5,3a9ea6d336e113a74d7fdca5e7b623fc" + ] + ], + "1": [ + "versions.yml:md5,7e6d75a47df5ce3a975172dcd47fd247" + ], + "chr": [ + [ + { + "id": "test" + }, + "test.txt:md5,3a9ea6d336e113a74d7fdca5e7b623fc" + ] + ], + "versions": [ + "versions.yml:md5,7e6d75a47df5ce3a975172dcd47fd247" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-22T15:09:21.585363234" + } +} \ No newline at end of file diff --git a/modules/local/vcfchrextract/tests/tags.yml b/modules/local/vcfchrextract/tests/tags.yml new file mode 100644 index 00000000..429a601f --- /dev/null +++ b/modules/local/vcfchrextract/tests/tags.yml @@ -0,0 +1,2 @@ +vcfchrextract: + - "modules/local/vcfchrextract/**" diff --git a/modules/nf-core/bcftools/annotate/bcftools-annotate.diff b/modules/nf-core/bcftools/annotate/bcftools-annotate.diff new file mode 100644 index 00000000..79f915db --- /dev/null +++ b/modules/nf-core/bcftools/annotate/bcftools-annotate.diff @@ -0,0 +1,37 @@ +Changes in module 'nf-core/bcftools/annotate' +--- modules/nf-core/bcftools/annotate/main.nf ++++ modules/nf-core/bcftools/annotate/main.nf +@@ -8,7 +8,7 @@ + 'biocontainers/bcftools:1.18--h8b25389_0' }" + + input: +- tuple val(meta), path(input), path(index), path(annotations), path(annotations_index), path(header_lines) ++ tuple val(meta), path(input), path(index), path(annotations), path(annotations_index), path(header_lines), path(rename_chr) + + output: + tuple val(meta), path("*.{vcf,vcf.gz,bcf,bcf.gz}"), emit: vcf +@@ -18,10 +18,11 @@ + task.ext.when == null || task.ext.when + + script: +- def args = task.ext.args ?: '' ++ def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" +- def header_file = header_lines ? "--header-lines ${header_lines}" : '' +- def annotations_file = annotations ? 
"--annotations ${annotations}" : '' ++ def header_file = header_lines ? "--header-lines ${header_lines}" : '' ++ def annotations_file = annotations ? "--annotations ${annotations}" : '' ++ def rename_chr_cmd = rename_chr ? "--rename-chrs ${rename_chr}" : '' + def extension = args.contains("--output-type b") || args.contains("-Ob") ? "bcf.gz" : + args.contains("--output-type u") || args.contains("-Ou") ? "bcf" : + args.contains("--output-type z") || args.contains("-Oz") ? "vcf.gz" : +@@ -34,6 +35,7 @@ + $args \\ + $annotations_file \\ + $header_file \\ ++ $rename_chr_cmd \\ + --output ${prefix}.${extension} \\ + --threads $task.cpus \\ + $input + +************************************************************ diff --git a/modules/nf-core/bcftools/annotate/environment.yml b/modules/nf-core/bcftools/annotate/environment.yml new file mode 100644 index 00000000..e0abc8d2 --- /dev/null +++ b/modules/nf-core/bcftools/annotate/environment.yml @@ -0,0 +1,7 @@ +name: bcftools_annotate +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::bcftools=1.18 diff --git a/modules/nf-core/bcftools/annotate/main.nf b/modules/nf-core/bcftools/annotate/main.nf new file mode 100644 index 00000000..a65855ab --- /dev/null +++ b/modules/nf-core/bcftools/annotate/main.nf @@ -0,0 +1,65 @@ +process BCFTOOLS_ANNOTATE { + tag "$meta.id" + label 'process_low' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/bcftools:1.18--h8b25389_0': + 'biocontainers/bcftools:1.18--h8b25389_0' }" + + input: + tuple val(meta), path(input), path(index), path(annotations), path(annotations_index), path(header_lines), path(rename_chr) + + output: + tuple val(meta), path("*.{vcf,vcf.gz,bcf,bcf.gz}"), emit: vcf + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def header_file = header_lines ? "--header-lines ${header_lines}" : '' + def annotations_file = annotations ? "--annotations ${annotations}" : '' + def rename_chr_cmd = rename_chr ? "--rename-chrs ${rename_chr}" : '' + def extension = args.contains("--output-type b") || args.contains("-Ob") ? "bcf.gz" : + args.contains("--output-type u") || args.contains("-Ou") ? "bcf" : + args.contains("--output-type z") || args.contains("-Oz") ? "vcf.gz" : + args.contains("--output-type v") || args.contains("-Ov") ? "vcf" : + "vcf" + if ("$input" == "${prefix}.${extension}") error "Input and output names are the same, set prefix in module configuration to disambiguate!" + """ + bcftools \\ + annotate \\ + $args \\ + $annotations_file \\ + $header_file \\ + $rename_chr_cmd \\ + --output ${prefix}.${extension} \\ + --threads $task.cpus \\ + $input + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$( bcftools --version |& sed '1!d; s/^.*bcftools //' ) + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def extension = args.contains("--output-type b") || args.contains("-Ob") ? "bcf.gz" : + args.contains("--output-type u") || args.contains("-Ou") ? "bcf" : + args.contains("--output-type z") || args.contains("-Oz") ? "vcf.gz" : + args.contains("--output-type v") || args.contains("-Ov") ? 
"vcf" : + "vcf" + """ + touch ${prefix}.${extension} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$( bcftools --version |& sed '1!d; s/^.*bcftools //' ) + END_VERSIONS + """ +} diff --git a/modules/nf-core/bcftools/annotate/meta.yml b/modules/nf-core/bcftools/annotate/meta.yml new file mode 100644 index 00000000..f3aa463b --- /dev/null +++ b/modules/nf-core/bcftools/annotate/meta.yml @@ -0,0 +1,56 @@ +name: bcftools_annotate +description: Add or remove annotations. +keywords: + - bcftools + - annotate + - vcf + - remove + - add +tools: + - annotate: + description: Add or remove annotations. + homepage: http://samtools.github.io/bcftools/bcftools.html + documentation: https://samtools.github.io/bcftools/bcftools.html#annotate + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - input: + type: file + description: Query VCF or BCF file, can be either uncompressed or compressed + - index: + type: file + description: Index of the query VCF or BCF file + - annotations: + type: file + description: Bgzip-compressed file with annotations + - annotations_index: + type: file + description: Index of the annotations file + - header_lines: + type: file + description: Contains lines to append to the output VCF header +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - vcf: + type: file + description: Compressed annotated VCF file + pattern: "*{vcf,vcf.gz,bcf,bcf.gz}" +authors: + - "@projectoriented" + - "@ramprasadn" +maintainers: + - "@projectoriented" + - "@ramprasadn" diff --git a/modules/nf-core/bcftools/concat/bcftools-concat.diff b/modules/nf-core/bcftools/concat/bcftools-concat.diff new file mode 100644 index 00000000..256660aa --- /dev/null +++ b/modules/nf-core/bcftools/concat/bcftools-concat.diff @@ -0,0 +1,21 @@ +Changes in module 'nf-core/bcftools/concat' +--- modules/nf-core/bcftools/concat/main.nf ++++ modules/nf-core/bcftools/concat/main.nf +@@ -21,11 +21,14 @@ + def args = task.ext.args ?: '' + prefix = task.ext.prefix ?: "${meta.id}" + """ ++ ++ ls -1v ${vcfs} > order_files.txt ++ + bcftools concat \\ + --output ${prefix}.vcf.gz \\ + $args \\ + --threads $task.cpus \\ +- ${vcfs} ++ -f order_files.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + +************************************************************ diff --git a/modules/nf-core/bcftools/concat/environment.yml b/modules/nf-core/bcftools/concat/environment.yml new file mode 100644 index 00000000..ff0200df --- /dev/null +++ b/modules/nf-core/bcftools/concat/environment.yml @@ -0,0 +1,7 @@ +name: bcftools_concat +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::bcftools=1.18 diff --git a/modules/nf-core/bcftools/concat/main.nf b/modules/nf-core/bcftools/concat/main.nf new file mode 100644 index 00000000..e3281f46 --- /dev/null +++ b/modules/nf-core/bcftools/concat/main.nf @@ -0,0 +1,49 @@ +process BCFTOOLS_CONCAT { + tag "$meta.id" + label 'process_medium' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/bcftools:1.18--h8b25389_0': + 'biocontainers/bcftools:1.18--h8b25389_0' }" + + input: + tuple val(meta), path(vcfs), path(tbi) + + output: + tuple val(meta), path("*.gz"), emit: vcf + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + prefix = task.ext.prefix ?: "${meta.id}" + """ + + ls -1v ${vcfs} > order_files.txt + + bcftools concat \\ + --output ${prefix}.vcf.gz \\ + $args \\ + --threads $task.cpus \\ + -f order_files.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*\$//') + END_VERSIONS + """ + + stub: + prefix = task.ext.prefix ?: "${meta.id}" + """ + touch ${prefix}.vcf.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/bcftools/concat/meta.yml b/modules/nf-core/bcftools/concat/meta.yml new file mode 100644 index 00000000..91cb54d5 --- /dev/null +++ b/modules/nf-core/bcftools/concat/meta.yml @@ -0,0 +1,51 @@ +name: bcftools_concat +description: Concatenate VCF files +keywords: + - variant calling + - concat + - bcftools + - VCF +tools: + - concat: + description: | + Concatenate VCF files. + homepage: http://samtools.github.io/bcftools/bcftools.html + documentation: http://www.htslib.org/doc/bcftools.html + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - vcfs: + type: list + description: | + List containing 2 or more vcf files + e.g. [ 'file1.vcf', 'file2.vcf' ] + - tbi: + type: list + description: | + List containing 2 or more index files (optional) + e.g. 
[ 'file1.tbi', 'file2.tbi' ] +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - vcf: + type: file + description: VCF concatenated output file + pattern: "*.{vcf.gz}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@abhi18av" + - "@nvnieuwk" +maintainers: + - "@abhi18av" + - "@nvnieuwk" diff --git a/modules/nf-core/bcftools/concat/tests/main.nf.test b/modules/nf-core/bcftools/concat/tests/main.nf.test new file mode 100644 index 00000000..bf1a5f3f --- /dev/null +++ b/modules/nf-core/bcftools/concat/tests/main.nf.test @@ -0,0 +1,108 @@ +nextflow_process { + + name "Test Process BCFTOOLS_CONCAT" + script "../main.nf" + process "BCFTOOLS_CONCAT" + + tag "modules" + tag "modules_nfcore" + tag "bcftools" + tag "bcftools/concat" + + config "./nextflow.config" + + test("sarscov2 - [[vcf1, vcf2], [tbi1, tbi2]]") { + + when { + process { + """ + input[0] = [ + [ id:'test3' ], // meta map + [ + file(params.test_data['homo_sapiens']['illumina']['test_haplotc_cnn_vcf_gz'], checkIfExists: true), + file(params.test_data['homo_sapiens']['illumina']['test_genome_vcf_gz'], checkIfExists: true) + ], + [ + file(params.test_data['homo_sapiens']['illumina']['test_genome_vcf_gz_tbi'], checkIfExists: true), + file(params.test_data['homo_sapiens']['illumina']['test_haplotc_cnn_vcf_gz_tbi'], checkIfExists: true) + ] + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + process.out.vcf, + process.out.versions + ).match() } + ) + } + + } + + test("sarscov2 - [[vcf1, vcf2], []]") { + + when { + process { + """ + input[0] = [ + [ id:'test3' ], // meta map + [ + file(params.test_data['homo_sapiens']['illumina']['test_haplotc_cnn_vcf_gz'], checkIfExists: true), + file(params.test_data['homo_sapiens']['illumina']['test_genome_vcf_gz'], checkIfExists: true) + ], + [] + ] + """ + } + } + + then { + assertAll( + 
{ assert process.success }, + { assert snapshot( + process.out.vcf, + process.out.versions + ).match() } + ) + } + + } + + test("sarscov2 - [[vcf1, vcf2], [tbi1, tbi2]] - stub") { + + options "-stub" + when { + process { + """ + input[0] = [ + [ id:'test3' ], // meta map + [ + file(params.test_data['homo_sapiens']['illumina']['test_haplotc_cnn_vcf_gz'], checkIfExists: true), + file(params.test_data['homo_sapiens']['illumina']['test_genome_vcf_gz'], checkIfExists: true) + ], + [ + file(params.test_data['homo_sapiens']['illumina']['test_genome_vcf_gz_tbi'], checkIfExists: true), + file(params.test_data['homo_sapiens']['illumina']['test_haplotc_cnn_vcf_gz_tbi'], checkIfExists: true) + ] + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + file(process.out.vcf[0][1]).name, + process.out.versions + ).match() } + ) + } + + } + +} diff --git a/modules/nf-core/bcftools/concat/tests/main.nf.test.snap b/modules/nf-core/bcftools/concat/tests/main.nf.test.snap new file mode 100644 index 00000000..7344e6e3 --- /dev/null +++ b/modules/nf-core/bcftools/concat/tests/main.nf.test.snap @@ -0,0 +1,43 @@ +{ + "sarscov2 - [[vcf1, vcf2], []]": { + "content": [ + [ + [ + { + "id": "test3" + }, + "test3.vcf.gz:md5,4bcd0afd89f56c5d433f6b6abc44d0a6" + ] + ], + [ + "versions.yml:md5,24ae05eb858733b40fbd3f89743a6d09" + ] + ], + "timestamp": "2023-11-29T13:52:27.03724666" + }, + "sarscov2 - [[vcf1, vcf2], [tbi1, tbi2]]": { + "content": [ + [ + [ + { + "id": "test3" + }, + "test3.vcf.gz:md5,4bcd0afd89f56c5d433f6b6abc44d0a6" + ] + ], + [ + "versions.yml:md5,24ae05eb858733b40fbd3f89743a6d09" + ] + ], + "timestamp": "2023-11-29T13:52:21.468988293" + }, + "sarscov2 - [[vcf1, vcf2], [tbi1, tbi2]] - stub": { + "content": [ + "test3.vcf.gz", + [ + "versions.yml:md5,24ae05eb858733b40fbd3f89743a6d09" + ] + ], + "timestamp": "2023-11-29T13:41:04.716017811" + } +} \ No newline at end of file diff --git a/modules/nf-core/bcftools/concat/tests/nextflow.config 
b/modules/nf-core/bcftools/concat/tests/nextflow.config new file mode 100644 index 00000000..f3e1e98c --- /dev/null +++ b/modules/nf-core/bcftools/concat/tests/nextflow.config @@ -0,0 +1,3 @@ +process { + ext.args = "--no-version" +} \ No newline at end of file diff --git a/modules/nf-core/bcftools/concat/tests/tags.yml b/modules/nf-core/bcftools/concat/tests/tags.yml new file mode 100644 index 00000000..21710d4e --- /dev/null +++ b/modules/nf-core/bcftools/concat/tests/tags.yml @@ -0,0 +1,2 @@ +bcftools/concat: + - "modules/nf-core/bcftools/concat/**" diff --git a/modules/nf-core/bcftools/convert/environment.yml b/modules/nf-core/bcftools/convert/environment.yml new file mode 100644 index 00000000..53e12e07 --- /dev/null +++ b/modules/nf-core/bcftools/convert/environment.yml @@ -0,0 +1,7 @@ +name: bcftools_convert +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::bcftools=1.18 diff --git a/modules/nf-core/bcftools/convert/main.nf b/modules/nf-core/bcftools/convert/main.nf new file mode 100644 index 00000000..c01c2b21 --- /dev/null +++ b/modules/nf-core/bcftools/convert/main.nf @@ -0,0 +1,73 @@ +process BCFTOOLS_CONVERT { + tag "$meta.id" + label 'process_medium' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/bcftools:1.18--h8b25389_0': + 'biocontainers/bcftools:1.18--h8b25389_0' }" + + input: + tuple val(meta), path(input), path(input_index) + tuple val(meta2), path(fasta) + path(bed) + + output: + tuple val(meta), path("*.vcf.gz"), optional:true , emit: vcf_gz + tuple val(meta), path("*.vcf") , optional:true , emit: vcf + tuple val(meta), path("*.bcf.gz"), optional:true , emit: bcf_gz + tuple val(meta), path("*.bcf") , optional:true , emit: bcf + tuple val(meta), path("*.hap.gz"), optional:true , emit: hap + tuple val(meta), path("*.legend.gz"), optional:true , emit: legend + tuple val(meta), path("*.samples"), optional:true , emit: samples + path "versions.yml", emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + + def regions = bed ? "--regions-file $bed" : "" + def reference = fasta ? "--fasta-ref $fasta" : "" + def extension = args.contains("--output-type b") || args.contains("-Ob") ? "--output ${prefix}.bcf.gz" : + args.contains("--output-type u") || args.contains("-Ou") ? "--output ${prefix}.bcf" : + args.contains("--output-type z") || args.contains("-Oz") ? "--output ${prefix}.vcf.gz" : + args.contains("--output-type v") || args.contains("-Ov") ? "--output ${prefix}.vcf" : + args.contains("--haplegendsample") || args.contains("-h") ? "" : + "--output ${prefix}.vcf.gz" + + """ + bcftools convert \\ + $args \\ + $regions \\ + $extension \\ + --threads $task.cpus \\ + $reference \\ + $input + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*\$//') + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + + def extension = args.contains("--output-type b") || args.contains("-Ob") ? "bcf.gz" : + args.contains("--output-type u") || args.contains("-Ou") ? 
"bcf" : + args.contains("--output-type z") || args.contains("-Oz") ? "vcf.gz" : + args.contains("--output-type v") || args.contains("-Ov") ? "vcf" : + "vcf.gz" + """ + touch ${prefix}.${extension} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/bcftools/convert/meta.yml b/modules/nf-core/bcftools/convert/meta.yml new file mode 100644 index 00000000..2c89112f --- /dev/null +++ b/modules/nf-core/bcftools/convert/meta.yml @@ -0,0 +1,94 @@ +name: "bcftools_convert" +description: Converts certain output formats to VCF +keywords: + - bcftools + - convert + - vcf + - gvcf +tools: + - "bcftools": + description: "BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations." + homepage: "https://samtools.github.io/bcftools/bcftools.html" + documentation: "https://samtools.github.io/bcftools/bcftools.html#convert" + tool_dev_url: "https://github.com/samtools/bcftools" + doi: "10.1093/gigascience/giab008" + licence: ["GPL"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - input: + type: file + description: | + The input format. 
Each format needs a separate parameter to be specified in the `args`: + - GEN/SAMPLE file: `--gensample2vcf` + - gVCF file: `--gvcf2vcf` + - HAP/SAMPLE file: `--hapsample2vcf` + - HAP/LEGEND/SAMPLE file: `--haplegendsample2vcf` + - TSV file: `--tsv2vcf` + pattern: "*.{gen,sample,g.vcf,hap,legend}{.gz,}" + - input_index: + type: file + description: (Optional) The index for the input files, if needed + pattern: "*.bed" + - meta2: + type: map + description: | + Groovy Map containing reference information + e.g. [ id:'genome' ] + - fasta: + type: file + description: (Optional) The reference fasta, only needed for gVCF conversion + pattern: "*.{fa,fasta}" + - bed: + type: file + description: (Optional) The BED file containing the regions for the VCF file + pattern: "*.bed" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - vcf_gz: + type: file + description: VCF merged output file (bgzipped) => when `--output-type z` is used + pattern: "*.vcf.gz" + - vcf: + type: file + description: VCF merged output file => when `--output-type v` is used + pattern: "*.vcf" + - bcf_gz: + type: file + description: BCF merged output file (bgzipped) => when `--output-type b` is used + pattern: "*.bcf.gz" + - bcf: + type: file + description: BCF merged output file => when `--output-type u` is used + pattern: "*.bcf" + - hap: + type: file + description: hap format used by IMPUTE2 and SHAPEIT + pattern: "*.hap.gz" + - legend: + type: file + description: legend format used by IMPUTE2 and SHAPEIT + pattern: "*.legend.gz" + - sample: + type: file + description: sample format used by IMPUTE2 and SHAPEIT + pattern: "*.samples" +authors: + - "@nvnieuwk" + - "@ramprasadn" + - "@atrigila" +maintainers: + - "@nvnieuwk" + - "@ramprasadn" + - "@atrigila" diff --git a/modules/nf-core/bcftools/index/environment.yml
b/modules/nf-core/bcftools/index/environment.yml new file mode 100644 index 00000000..bbee37ad --- /dev/null +++ b/modules/nf-core/bcftools/index/environment.yml @@ -0,0 +1,7 @@ +name: bcftools_index +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::bcftools=1.18 diff --git a/modules/nf-core/bcftools/index/main.nf b/modules/nf-core/bcftools/index/main.nf new file mode 100644 index 00000000..4cd0dcbb --- /dev/null +++ b/modules/nf-core/bcftools/index/main.nf @@ -0,0 +1,51 @@ +process BCFTOOLS_INDEX { + tag "$meta.id" + label 'process_low' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/bcftools:1.18--h8b25389_0': + 'biocontainers/bcftools:1.18--h8b25389_0' }" + + input: + tuple val(meta), path(vcf) + + output: + tuple val(meta), path("*.csi"), optional:true, emit: csi + tuple val(meta), path("*.tbi"), optional:true, emit: tbi + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + + """ + bcftools \\ + index \\ + $args \\ + --threads $task.cpus \\ + $vcf + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*\$//') + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def extension = args.contains("--tbi") || args.contains("-t") ?
"tbi" : + "csi" + """ + touch ${vcf}.${extension} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/bcftools/index/meta.yml b/modules/nf-core/bcftools/index/meta.yml new file mode 100644 index 00000000..fc340cbc --- /dev/null +++ b/modules/nf-core/bcftools/index/meta.yml @@ -0,0 +1,48 @@ +name: bcftools_index +description: Index VCF files +keywords: + - vcf + - index + - bcftools + - csi + - tbi +tools: + - bcftools: + description: BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations. + homepage: https://samtools.github.io/bcftools/ + documentation: https://samtools.github.io/bcftools/howtos/index.html + tool_dev_url: https://github.com/samtools/bcftools + doi: "10.1093/gigascience/giab008" + licence: ["MIT", "GPL-3.0-or-later"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - vcf: + type: file + description: VCF or BCF file to index (must be BGZF-compressed) + pattern: "*.{vcf.gz,bcf}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g.
[ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - csi: + type: file + description: Default CSI index file + pattern: "*.csi" + - tbi: + type: file + description: Alternative TBI (tabix) index file (activated with the -t/--tbi parameter) + pattern: "*.tbi" +authors: + - "@jfy133" +maintainers: + - "@jfy133" diff --git a/modules/nf-core/bcftools/mpileup/bcftools-mpileup.diff b/modules/nf-core/bcftools/mpileup/bcftools-mpileup.diff new file mode 100644 index 00000000..a85d9cfd --- /dev/null +++ b/modules/nf-core/bcftools/mpileup/bcftools-mpileup.diff @@ -0,0 +1,38 @@ +Changes in module 'nf-core/bcftools/mpileup' +--- modules/nf-core/bcftools/mpileup/main.nf ++++ modules/nf-core/bcftools/mpileup/main.nf +@@ -8,8 +8,8 @@ + 'biocontainers/bcftools:1.18--h8b25389_0' }" + + input: +- tuple val(meta), path(bam), path(intervals) +- tuple val(meta2), path(fasta) ++ tuple val(meta), path(bam), path(target_m), path(target_c) ++ tuple val(meta2), path(fasta), path(fai) + val save_mpileup + + output: +@@ -29,7 +29,8 @@ + def prefix = task.ext.prefix ?: "${meta.id}" + def mpileup = save_mpileup ? "| tee ${prefix}.mpileup" : "" + def bgzip_mpileup = save_mpileup ? "bgzip ${prefix}.mpileup" : "" +- def intervals = intervals ? "-T ${intervals}" : "" ++ def target_m = target_m ? "-T ${target_m}" : "" ++ def target_c = target_c ?
"-T ${target_c}" : "" + """ + echo "${meta.id}" > sample_name.list + +@@ -38,9 +39,9 @@ + --fasta-ref $fasta \\ + $args \\ + $bam \\ +- $intervals \\ ++ $target_m \\ + $mpileup \\ +- | bcftools call --output-type v $args2 \\ ++ | bcftools call --output-type v $args2 $target_c \\ + | bcftools reheader --samples sample_name.list \\ + | bcftools view --output-file ${prefix}.vcf.gz --output-type z $args3 + + +************************************************************ diff --git a/modules/nf-core/bcftools/mpileup/environment.yml b/modules/nf-core/bcftools/mpileup/environment.yml new file mode 100644 index 00000000..114390be --- /dev/null +++ b/modules/nf-core/bcftools/mpileup/environment.yml @@ -0,0 +1,7 @@ +name: bcftools_mpileup +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::bcftools=1.18 diff --git a/modules/nf-core/bcftools/mpileup/main.nf b/modules/nf-core/bcftools/mpileup/main.nf new file mode 100644 index 00000000..48a567e5 --- /dev/null +++ b/modules/nf-core/bcftools/mpileup/main.nf @@ -0,0 +1,59 @@ +process BCFTOOLS_MPILEUP { + tag "$meta.id" + label 'process_medium' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/bcftools:1.18--h8b25389_0': + 'biocontainers/bcftools:1.18--h8b25389_0' }" + + input: + tuple val(meta), path(bam), path(target_m), path(target_c) + tuple val(meta2), path(fasta), path(fai) + val save_mpileup + + output: + tuple val(meta), path("*vcf.gz") , emit: vcf + tuple val(meta), path("*vcf.gz.tbi") , emit: tbi + tuple val(meta), path("*stats.txt") , emit: stats + tuple val(meta), path("*.mpileup.gz"), emit: mpileup, optional: true + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def args2 = task.ext.args2 ?: '' + def args3 = task.ext.args3 ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def mpileup = save_mpileup ? "| tee ${prefix}.mpileup" : "" + def bgzip_mpileup = save_mpileup ? "bgzip ${prefix}.mpileup" : "" + def target_m = target_m ? "-T ${target_m}" : "" + def target_c = target_c ? "-T ${target_c}" : "" + """ + echo "${meta.id}" > sample_name.list + + bcftools \\ + mpileup \\ + --fasta-ref $fasta \\ + $args \\ + $bam \\ + $target_m \\ + $mpileup \\ + | bcftools call --output-type v $args2 $target_c \\ + | bcftools reheader --samples sample_name.list \\ + | bcftools view --output-file ${prefix}.vcf.gz --output-type z $args3 + + $bgzip_mpileup + + tabix -p vcf -f ${prefix}.vcf.gz + + bcftools stats ${prefix}.vcf.gz > ${prefix}.bcftools_stats.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/bcftools/mpileup/meta.yml b/modules/nf-core/bcftools/mpileup/meta.yml new file mode 100644 index 00000000..65410ddd --- /dev/null +++ b/modules/nf-core/bcftools/mpileup/meta.yml @@ -0,0 +1,70 @@ +name: bcftools_mpileup +description: Generates genotype likelihoods and calls variants from BAM files +keywords: + - variant calling + - mpileup + - VCF +tools: + - mpileup: + description: | + Generates genotype likelihoods at
each genomic position with coverage. + homepage: http://samtools.github.io/bcftools/bcftools.html + documentation: http://www.htslib.org/doc/bcftools.html + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - bam: + type: file + description: Input BAM file + pattern: "*.{bam}" + - intervals: + type: file + description: Input intervals file. A file (commonly '.bed') containing regions to subset + - meta: + type: map + description: | + Groovy Map containing information about the genome fasta, e.g. [ id: 'sarscov2' ] + - fasta: + type: file + description: FASTA reference file + pattern: "*.{fasta,fa}" + - save_mpileup: + type: boolean + description: Save mpileup file generated by bcftools mpileup +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - vcf: + type: file + description: VCF gzipped output file + pattern: "*.{vcf.gz}" + - tbi: + type: file + description: tabix index file + pattern: "*.{vcf.gz.tbi}" + - stats: + type: file + description: Text output file containing stats + pattern: "*{stats.txt}" + - mpileup: + type: file + description: mpileup gzipped output for all positions + pattern: "{*.mpileup.gz}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@joseespinosa" + - "@drpatelh" +maintainers: + - "@joseespinosa" + - "@drpatelh" diff --git a/modules/nf-core/bcftools/mpileup/tests/main.nf.test b/modules/nf-core/bcftools/mpileup/tests/main.nf.test new file mode 100644 index 00000000..6478bbc2 --- /dev/null +++ b/modules/nf-core/bcftools/mpileup/tests/main.nf.test @@ -0,0 +1,116 @@ +nextflow_process { + + name "Test Process BCFTOOLS_MPILEUP" + script "../main.nf" + process "BCFTOOLS_MPILEUP" + + tag "modules" + tag "modules_nfcore" + tag "bcftools" + tag "bcftools/mpileup" + + 
config "./nextflow.config" + + test("sarscov2 - [bam, []], fasta, false") { + + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + file(params.test_data['sarscov2']['illumina']['test_paired_end_sorted_bam'], checkIfExists: true), + [] + ] + input[1] = [ + [ id:'sarscov2' ], // meta map + file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true) + ] + input[2] = false + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + process.out.vcf, + process.out.tbi, + process.out.stats, + process.out.mpileup, + process.out.versions + ).match() } + ) + } + + } + + test("sarscov2 - [bam, []], fasta, true") { + + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + file(params.test_data['sarscov2']['illumina']['test_paired_end_sorted_bam'], checkIfExists: true), + [] + ] + input[1] = [ + [ id:'sarscov2' ], // meta map + file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true) + ] + input[2] = true + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + process.out.vcf, + process.out.tbi, + process.out.stats, + process.out.mpileup, + process.out.versions + ).match() } + ) + } + + } + + test("sarscov2 - [bam, bed], fasta, false") { + + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + file(params.test_data['sarscov2']['illumina']['test_paired_end_sorted_bam'], checkIfExists: true), + file(params.test_data['sarscov2']['genome']['test_bed'], checkIfExists: true) + ] + input[1] = [ + [ id:'sarscov2' ], // meta map + file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true) + ] + input[2] = false + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + process.out.vcf, + process.out.tbi, + process.out.stats, + process.out.mpileup, + process.out.versions + ).match() } + ) + } + + } + +} diff --git 
a/modules/nf-core/bcftools/mpileup/tests/main.nf.test.snap b/modules/nf-core/bcftools/mpileup/tests/main.nf.test.snap new file mode 100644 index 00000000..ef80ab1b --- /dev/null +++ b/modules/nf-core/bcftools/mpileup/tests/main.nf.test.snap @@ -0,0 +1,112 @@ +{ + "sarscov2 - [bam, []], fasta, true": { + "content": [ + [ + [ + { + "id": "test" + }, + "test.vcf.gz:md5,0f2f2c8488e97e7f13979380d5d3b6b5" + ] + ], + [ + [ + { + "id": "test" + }, + "test.vcf.gz.tbi:md5,34cb2eeb73f4d2b98218acecebd92704" + ] + ], + [ + [ + { + "id": "test" + }, + "test.bcftools_stats.txt:md5,a988fbcd2ea5d1ce30970dcb60a77ed7" + ] + ], + [ + [ + { + "id": "test" + }, + "test.mpileup.gz:md5,73b4a00398bddab2cd065b40d17ca4dc" + ] + ], + [ + "versions.yml:md5,e09c59d941374bb293aadc36e2f29dbf" + ] + ], + "timestamp": "2023-11-29T14:11:54.549517279" + }, + "sarscov2 - [bam, bed], fasta, false": { + "content": [ + [ + [ + { + "id": "test" + }, + "test.vcf.gz:md5,687244dbf71d05b3b973ab08ecf05310" + ] + ], + [ + [ + { + "id": "test" + }, + "test.vcf.gz.tbi:md5,3785df15f3d7faf35f3ad70d167a50f7" + ] + ], + [ + [ + { + "id": "test" + }, + "test.bcftools_stats.txt:md5,f8c5ab149c4bf0e5f51c518346cb87b5" + ] + ], + [ + + ], + [ + "versions.yml:md5,e09c59d941374bb293aadc36e2f29dbf" + ] + ], + "timestamp": "2023-11-29T14:12:00.865439661" + }, + "sarscov2 - [bam, []], fasta, false": { + "content": [ + [ + [ + { + "id": "test" + }, + "test.vcf.gz:md5,0f2f2c8488e97e7f13979380d5d3b6b5" + ] + ], + [ + [ + { + "id": "test" + }, + "test.vcf.gz.tbi:md5,34cb2eeb73f4d2b98218acecebd92704" + ] + ], + [ + [ + { + "id": "test" + }, + "test.bcftools_stats.txt:md5,a988fbcd2ea5d1ce30970dcb60a77ed7" + ] + ], + [ + + ], + [ + "versions.yml:md5,e09c59d941374bb293aadc36e2f29dbf" + ] + ], + "timestamp": "2023-11-29T14:11:47.814900494" + } +} \ No newline at end of file diff --git a/modules/nf-core/bcftools/mpileup/tests/nextflow.config b/modules/nf-core/bcftools/mpileup/tests/nextflow.config new file mode 100644 index 
00000000..a7ba19fe --- /dev/null +++ b/modules/nf-core/bcftools/mpileup/tests/nextflow.config @@ -0,0 +1,4 @@ +process { + ext.args2 = '--no-version --ploidy 1 --multiallelic-caller' + ext.args3 = '--no-version' +} \ No newline at end of file diff --git a/modules/nf-core/bcftools/mpileup/tests/tags.yml b/modules/nf-core/bcftools/mpileup/tests/tags.yml new file mode 100644 index 00000000..07b91f98 --- /dev/null +++ b/modules/nf-core/bcftools/mpileup/tests/tags.yml @@ -0,0 +1,2 @@ +bcftools/mpileup: + - "modules/nf-core/bcftools/mpileup/**" diff --git a/modules/nf-core/bcftools/norm/environment.yml b/modules/nf-core/bcftools/norm/environment.yml new file mode 100644 index 00000000..fe80e4e7 --- /dev/null +++ b/modules/nf-core/bcftools/norm/environment.yml @@ -0,0 +1,7 @@ +name: bcftools_norm +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::bcftools=1.18 diff --git a/modules/nf-core/bcftools/norm/main.nf b/modules/nf-core/bcftools/norm/main.nf new file mode 100644 index 00000000..47d3dab1 --- /dev/null +++ b/modules/nf-core/bcftools/norm/main.nf @@ -0,0 +1,60 @@ +process BCFTOOLS_NORM { + tag "$meta.id" + label 'process_medium' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/bcftools:1.18--h8b25389_0': + 'biocontainers/bcftools:1.18--h8b25389_0' }" + + input: + tuple val(meta), path(vcf), path(tbi) + tuple val(meta2), path(fasta) + + output: + tuple val(meta), path("*.{vcf,vcf.gz,bcf,bcf.gz}") , emit: vcf + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '--output-type z' + def prefix = task.ext.prefix ?: "${meta.id}" + def extension = args.contains("--output-type b") || args.contains("-Ob") ? "bcf.gz" : + args.contains("--output-type u") || args.contains("-Ou") ? 
"bcf" : + args.contains("--output-type z") || args.contains("-Oz") ? "vcf.gz" : + args.contains("--output-type v") || args.contains("-Ov") ? "vcf" : + "vcf.gz" + + """ + bcftools norm \\ + --fasta-ref ${fasta} \\ + --output ${prefix}.${extension}\\ + $args \\ + --threads $task.cpus \\ + ${vcf} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*\$//') + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '--output-type z' + def prefix = task.ext.prefix ?: "${meta.id}" + def extension = args.contains("--output-type b") || args.contains("-Ob") ? "bcf.gz" : + args.contains("--output-type u") || args.contains("-Ou") ? "bcf" : + args.contains("--output-type z") || args.contains("-Oz") ? "vcf.gz" : + args.contains("--output-type v") || args.contains("-Ov") ? "vcf" : + "vcf.gz" + """ + touch ${prefix}.${extension} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/bcftools/norm/meta.yml b/modules/nf-core/bcftools/norm/meta.yml new file mode 100644 index 00000000..1f3e1b62 --- /dev/null +++ b/modules/nf-core/bcftools/norm/meta.yml @@ -0,0 +1,61 @@ +name: bcftools_norm +description: Normalize VCF file +keywords: + - normalize + - norm + - variant calling + - VCF +tools: + - norm: + description: | + Normalize VCF files. + homepage: http://samtools.github.io/bcftools/bcftools.html + documentation: http://www.htslib.org/doc/bcftools.html + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - vcf: + type: file + description: | + The vcf file to be normalized + e.g. 
'file1.vcf' + pattern: "*.{vcf,vcf.gz}" + - tbi: + type: file + description: | + An optional index of the VCF file (for when the VCF is compressed) + pattern: "*.vcf.gz.tbi" + - meta2: + type: map + description: | + Groovy Map containing reference information + e.g. [ id:'genome' ] + - fasta: + type: file + description: FASTA reference file + pattern: "*.{fasta,fa}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - vcf: + type: file + description: One of uncompressed VCF (.vcf), compressed VCF (.vcf.gz), compressed BCF (.bcf.gz) or uncompressed BCF (.bcf) normalized output file + pattern: "*.{vcf,vcf.gz,bcf,bcf.gz}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@abhi18av" + - "@ramprasadn" +maintainers: + - "@abhi18av" + - "@ramprasadn" diff --git a/modules/nf-core/bcftools/query/environment.yml b/modules/nf-core/bcftools/query/environment.yml new file mode 100644 index 00000000..4f9661ca --- /dev/null +++ b/modules/nf-core/bcftools/query/environment.yml @@ -0,0 +1,7 @@ +name: bcftools_query +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::bcftools=1.18 diff --git a/modules/nf-core/bcftools/query/main.nf b/modules/nf-core/bcftools/query/main.nf new file mode 100644 index 00000000..e9e73a6a --- /dev/null +++ b/modules/nf-core/bcftools/query/main.nf @@ -0,0 +1,56 @@ +process BCFTOOLS_QUERY { + tag "$meta.id" + label 'process_single' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/bcftools:1.18--h8b25389_0': + 'biocontainers/bcftools:1.18--h8b25389_0' }" + + input: + tuple val(meta), path(vcf), path(tbi) + path regions + path targets + path samples + + output: + tuple val(meta), path("*.${suffix}"), emit: output + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + suffix = task.ext.suffix ?: "txt" + def regions_file = regions ? "--regions-file ${regions}" : "" + def targets_file = targets ? "--targets-file ${targets}" : "" + def samples_file = samples ? "--samples-file ${samples}" : "" + """ + bcftools query \\ + $regions_file \\ + $targets_file \\ + $samples_file \\ + $args \\ + $vcf \\ + > ${prefix}.${suffix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*\$//') + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + suffix = task.ext.suffix ?: "txt" + """ + touch ${prefix}.${suffix} \\ + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/bcftools/query/meta.yml b/modules/nf-core/bcftools/query/meta.yml new file mode 100644 index 00000000..303ef610 --- /dev/null +++ b/modules/nf-core/bcftools/query/meta.yml @@ -0,0 +1,63 @@ +name: bcftools_query +description: Extracts fields from VCF or BCF files and outputs them in user-defined format. +keywords: + - query + - variant calling + - bcftools + - VCF +tools: + - query: + description: | + Extracts fields from VCF or BCF files and outputs them in user-defined format. 
+ homepage: http://samtools.github.io/bcftools/bcftools.html + documentation: http://www.htslib.org/doc/bcftools.html + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - vcf: + type: file + description: | + The vcf file to be queried. + pattern: "*.{vcf.gz,vcf}" + - tbi: + type: file + description: | + The tabix index for the VCF file to be queried. + pattern: "*.tbi" + - regions: + type: file + description: | + Optionally, restrict the operation to regions listed in this file. + - targets: + type: file + description: | + Optionally, restrict the operation to regions listed in this file (doesn't rely upon index files) + - samples: + type: file + description: | + Optionally, a file of sample names to be included or excluded. + e.g. 'file.tsv' +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - output: + type: file + description: BCFtools query output file + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@abhi18av" + - "@drpatelh" +maintainers: + - "@abhi18av" + - "@drpatelh" diff --git a/modules/nf-core/bcftools/query/tests/main.nf.test b/modules/nf-core/bcftools/query/tests/main.nf.test new file mode 100644 index 00000000..e9ea5a9d --- /dev/null +++ b/modules/nf-core/bcftools/query/tests/main.nf.test @@ -0,0 +1,101 @@ +nextflow_process { + + name "Test Process BCFTOOLS_QUERY" + script "../main.nf" + process "BCFTOOLS_QUERY" + + tag "modules" + tag "modules_nfcore" + tag "bcftools" + tag "bcftools/query" + + config "./nextflow.config" + + test("sarscov2 - [vcf, tbi], [], [], []") { + + when { + process { + """ + input[0] = [ + [ id:'out' ], // meta map + file(params.test_data['sarscov2']['illumina']['test_vcf_gz'], checkIfExists: true), +
file(params.test_data['sarscov2']['illumina']['test_vcf_gz_tbi'], checkIfExists: true) + ] + input[1] = [] + input[2] = [] + input[3] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + process.out.output, + process.out.versions + ).match() } + ) + } + + } + + test("sarscov2 - [vcf, tbi], vcf, tsv, []") { + + when { + process { + """ + input[0] = [ + [ id:'out' ], // meta map + file(params.test_data['sarscov2']['illumina']['test_vcf_gz'], checkIfExists: true), + file(params.test_data['sarscov2']['illumina']['test_vcf_gz_tbi'], checkIfExists: true) + ] + input[1] = file(params.test_data['sarscov2']['illumina']['test3_vcf_gz'], checkIfExists: true) + input[2] = file(params.test_data['sarscov2']['illumina']['test2_vcf_targets_tsv_gz'], checkIfExists: true) + input[3] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + process.out.output, + process.out.versions + ).match() } + ) + } + + } + + test("sarscov2 - [vcf, tbi], [], [], [] - stub") { + + when { + process { + """ + input[0] = [ + [ id:'out' ], // meta map + file(params.test_data['sarscov2']['illumina']['test_vcf_gz'], checkIfExists: true), + file(params.test_data['sarscov2']['illumina']['test_vcf_gz_tbi'], checkIfExists: true) + ] + input[1] = [] + input[2] = [] + input[3] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + file(process.out.output[0][1]).name, + process.out.versions + ).match() } + ) + } + + } + +} diff --git a/modules/nf-core/bcftools/query/tests/main.nf.test.snap b/modules/nf-core/bcftools/query/tests/main.nf.test.snap new file mode 100644 index 00000000..a19f2053 --- /dev/null +++ b/modules/nf-core/bcftools/query/tests/main.nf.test.snap @@ -0,0 +1,43 @@ +{ + "sarscov2 - [vcf, tbi], vcf, tsv, []": { + "content": [ + [ + [ + { + "id": "out" + }, + "out.txt:md5,75a6bd0084e2e1838cf7baba11b99d19" + ] + ], + [ + "versions.yml:md5,b40206d5437ce4b044d15c47ddd93d8e" + ] 
+ ], + "timestamp": "2023-11-29T14:21:05.191946862" + }, + "sarscov2 - [vcf, tbi], [], [], [] - stub": { + "content": [ + "out.txt", + [ + "versions.yml:md5,b40206d5437ce4b044d15c47ddd93d8e" + ] + ], + "timestamp": "2023-11-29T14:21:11.169603542" + }, + "sarscov2 - [vcf, tbi], [], [], []": { + "content": [ + [ + [ + { + "id": "out" + }, + "out.txt:md5,87a2ab194e1ee3219b44e58429ec3307" + ] + ], + [ + "versions.yml:md5,b40206d5437ce4b044d15c47ddd93d8e" + ] + ], + "timestamp": "2023-11-29T14:20:59.335041418" + } +} \ No newline at end of file diff --git a/modules/nf-core/bcftools/query/tests/nextflow.config b/modules/nf-core/bcftools/query/tests/nextflow.config new file mode 100644 index 00000000..da81c2a0 --- /dev/null +++ b/modules/nf-core/bcftools/query/tests/nextflow.config @@ -0,0 +1,3 @@ +process { + ext.args = "-f '%CHROM %POS %REF %ALT[%SAMPLE=%GT]'" +} \ No newline at end of file diff --git a/modules/nf-core/bcftools/query/tests/tags.yml b/modules/nf-core/bcftools/query/tests/tags.yml new file mode 100644 index 00000000..fb9455cb --- /dev/null +++ b/modules/nf-core/bcftools/query/tests/tags.yml @@ -0,0 +1,2 @@ +bcftools/query: + - "modules/nf-core/bcftools/query/**" diff --git a/modules/nf-core/bcftools/view/environment.yml b/modules/nf-core/bcftools/view/environment.yml new file mode 100644 index 00000000..8937c6da --- /dev/null +++ b/modules/nf-core/bcftools/view/environment.yml @@ -0,0 +1,7 @@ +name: bcftools_view +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::bcftools=1.18 diff --git a/modules/nf-core/bcftools/view/main.nf b/modules/nf-core/bcftools/view/main.nf new file mode 100644 index 00000000..5237adc8 --- /dev/null +++ b/modules/nf-core/bcftools/view/main.nf @@ -0,0 +1,66 @@ +process BCFTOOLS_VIEW { + tag "$meta.id" + label 'process_medium' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/bcftools:1.18--h8b25389_0': + 'biocontainers/bcftools:1.18--h8b25389_0' }" + + input: + tuple val(meta), path(vcf), path(index) + path(regions) + path(targets) + path(samples) + + output: + tuple val(meta), path("*.{vcf,vcf.gz,bcf,bcf.gz}"), emit: vcf + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def regions_file = regions ? "--regions-file ${regions}" : "" + def targets_file = targets ? "--targets-file ${targets}" : "" + def samples_file = samples ? "--samples-file ${samples}" : "" + def extension = args.contains("--output-type b") || args.contains("-Ob") ? "bcf.gz" : + args.contains("--output-type u") || args.contains("-Ou") ? "bcf" : + args.contains("--output-type z") || args.contains("-Oz") ? "vcf.gz" : + args.contains("--output-type v") || args.contains("-Ov") ? "vcf" : + "vcf" + """ + bcftools view \\ + --output ${prefix}.${extension} \\ + ${regions_file} \\ + ${targets_file} \\ + ${samples_file} \\ + $args \\ + --threads $task.cpus \\ + ${vcf} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*\$//') + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def extension = args.contains("--output-type b") || args.contains("-Ob") ? "bcf.gz" : + args.contains("--output-type u") || args.contains("-Ou") ? "bcf" : + args.contains("--output-type z") || args.contains("-Oz") ? "vcf.gz" : + args.contains("--output-type v") || args.contains("-Ov") ? 
"vcf" : + "vcf" + """ + touch ${prefix}.${extension} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/bcftools/view/meta.yml b/modules/nf-core/bcftools/view/meta.yml new file mode 100644 index 00000000..6baa34a6 --- /dev/null +++ b/modules/nf-core/bcftools/view/meta.yml @@ -0,0 +1,64 @@ +name: bcftools_view +description: View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF +keywords: + - variant calling + - view + - bcftools + - VCF +tools: + - view: + description: | + View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF + homepage: http://samtools.github.io/bcftools/bcftools.html + documentation: http://www.htslib.org/doc/bcftools.html + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - vcf: + type: file + description: | + The vcf file to be inspected. + e.g. 'file.vcf' + - index: + type: file + description: | + The tab index for the VCF file to be inspected. + e.g. 'file.tbi' + - regions: + type: file + description: | + Optionally, restrict the operation to regions listed in this file. + e.g. 'file.vcf' + - targets: + type: file + description: | + Optionally, restrict the operation to regions listed in this file (doesn't rely upon index files) + e.g. 'file.vcf' + - samples: + type: file + description: | + Optional, file of sample names to be included or excluded. + e.g. 'file.tsv' +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - vcf: + type: file + description: VCF or BCF output file (subset, filtered or converted) + pattern: "*.{vcf,vcf.gz,bcf,bcf.gz}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@abhi18av" +maintainers: + - "@abhi18av" diff --git a/modules/nf-core/bcftools/view/tests/main.nf.test b/modules/nf-core/bcftools/view/tests/main.nf.test new file mode 100644 index 00000000..c285674c --- /dev/null +++ b/modules/nf-core/bcftools/view/tests/main.nf.test @@ -0,0 +1,103 @@ +nextflow_process { + + name "Test Process BCFTOOLS_VIEW" + script "../main.nf" + process "BCFTOOLS_VIEW" + + tag "modules" + tag "modules_nfcore" + tag "bcftools" + tag "bcftools/view" + + config "./nextflow.config" + + test("sarscov2 - [vcf, tbi], [], [], []") { + + when { + process { + """ + input[0] = [ + [ id:'out', single_end:false ], // meta map + file(params.test_data['sarscov2']['illumina']['test_vcf_gz'], checkIfExists: true), + file(params.test_data['sarscov2']['illumina']['test_vcf_gz_tbi'], checkIfExists: true) + ] + input[1] = [] + input[2] = [] + input[3] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + process.out.vcf, + process.out.versions + ).match() } + ) + } + + } + + test("sarscov2 - [vcf, tbi], vcf, tsv, []") { + + when { + process { + """ + input[0] = [ + [ id:'out', single_end:false ], // meta map + file(params.test_data['sarscov2']['illumina']['test_vcf_gz'], checkIfExists: true), + file(params.test_data['sarscov2']['illumina']['test_vcf_gz_tbi'], checkIfExists: true) + ] + input[1] = file(params.test_data['sarscov2']['illumina']['test3_vcf_gz'], checkIfExists: true) + input[2] = file(params.test_data['sarscov2']['illumina']['test2_vcf_targets_tsv_gz'], checkIfExists: true) + input[3] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + process.out.vcf, + process.out.versions + ).match() } + ) + } + + } + + test("sarscov2
- [vcf, tbi], [], [], [] - stub") { + + options "-stub" + + when { + process { + """ + input[0] = [ + [ id:'out', single_end:false ], // meta map + file(params.test_data['sarscov2']['illumina']['test_vcf_gz'], checkIfExists: true), + file(params.test_data['sarscov2']['illumina']['test_vcf_gz_tbi'], checkIfExists: true) + ] + input[1] = [] + input[2] = [] + input[3] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + file(process.out.vcf[0][1]).name, + process.out.versions + ).match() } + ) + } + + } + +} diff --git a/modules/nf-core/bcftools/view/tests/main.nf.test.snap b/modules/nf-core/bcftools/view/tests/main.nf.test.snap new file mode 100644 index 00000000..b59be932 --- /dev/null +++ b/modules/nf-core/bcftools/view/tests/main.nf.test.snap @@ -0,0 +1,45 @@ +{ + "sarscov2 - [vcf, tbi], vcf, tsv, []": { + "content": [ + [ + [ + { + "id": "out", + "single_end": false + }, + "out.vcf:md5,1bcbd0eff25d316ba915d06463aab17b" + ] + ], + [ + "versions.yml:md5,106d119dde844ec7fee1cdd30828bcdc" + ] + ], + "timestamp": "2024-02-05T17:12:20.799849895" + }, + "sarscov2 - [vcf, tbi], [], [], [] - stub": { + "content": [ + "out.vcf", + [ + "versions.yml:md5,106d119dde844ec7fee1cdd30828bcdc" + ] + ], + "timestamp": "2024-02-05T16:53:34.652746985" + }, + "sarscov2 - [vcf, tbi], [], [], []": { + "content": [ + [ + [ + { + "id": "out", + "single_end": false + }, + "out.vcf:md5,8e722884ffb75155212a3fc053918766" + ] + ], + [ + "versions.yml:md5,106d119dde844ec7fee1cdd30828bcdc" + ] + ], + "timestamp": "2024-02-05T17:12:14.247465409" + } +} \ No newline at end of file diff --git a/modules/nf-core/bcftools/view/tests/nextflow.config b/modules/nf-core/bcftools/view/tests/nextflow.config new file mode 100644 index 00000000..932e3ba6 --- /dev/null +++ b/modules/nf-core/bcftools/view/tests/nextflow.config @@ -0,0 +1,3 @@ +process { + ext.args = '--no-version --output-type v' +} diff --git a/modules/nf-core/bcftools/view/tests/tags.yml 
b/modules/nf-core/bcftools/view/tests/tags.yml new file mode 100644 index 00000000..43b1f0aa --- /dev/null +++ b/modules/nf-core/bcftools/view/tests/tags.yml @@ -0,0 +1,2 @@ +bcftools/view: + - "modules/nf-core/bcftools/view/**" diff --git a/modules/nf-core/bedtools/makewindows/environment.yml b/modules/nf-core/bedtools/makewindows/environment.yml new file mode 100644 index 00000000..0de3c15d --- /dev/null +++ b/modules/nf-core/bedtools/makewindows/environment.yml @@ -0,0 +1,7 @@ +name: bedtools_makewindows +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::bedtools=2.31.1 diff --git a/modules/nf-core/bedtools/makewindows/main.nf b/modules/nf-core/bedtools/makewindows/main.nf new file mode 100644 index 00000000..36d6cac2 --- /dev/null +++ b/modules/nf-core/bedtools/makewindows/main.nf @@ -0,0 +1,49 @@ +process BEDTOOLS_MAKEWINDOWS { + tag "$meta.id" + label 'process_single' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/bedtools:2.31.1--hf5e1c6e_0' : + 'biocontainers/bedtools:2.31.1--hf5e1c6e_0' }" + + input: + tuple val(meta), path(regions) + + output: + tuple val(meta), path("*.bed"), emit: bed + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def arg_input = regions.extension in ["bed", "tab"] ? "-b ${regions}" : "-g ${regions}" + if ("${regions}" == "${prefix}.bed") error "Input and output names are the same, set prefix in module configuration to disambiguate!" 
+ """ + bedtools \\ + makewindows \\ + ${arg_input} \\ + ${args} \\ + > ${prefix}.bed + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bedtools: \$(bedtools --version | sed -e "s/bedtools v//g") + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + if ("${regions}" == "${prefix}.bed") error "Input and output names are the same, set prefix in module configuration to disambiguate!" + """ + touch ${prefix}.bed + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bedtools: \$(bedtools --version | sed -e "s/bedtools v//g") + END_VERSIONS + """ +} diff --git a/modules/nf-core/bedtools/makewindows/meta.yml b/modules/nf-core/bedtools/makewindows/meta.yml new file mode 100644 index 00000000..f89d7175 --- /dev/null +++ b/modules/nf-core/bedtools/makewindows/meta.yml @@ -0,0 +1,44 @@ +name: bedtools_makewindows +description: Makes adjacent or sliding windows across a genome or BED file. +keywords: + - bed + - windows + - fai + - chunking +tools: + - bedtools: + description: A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types. + homepage: https://bedtools.readthedocs.io + documentation: https://bedtools.readthedocs.io/en/latest/content/tools/makewindows.html + doi: "10.1093/bioinformatics/btq033" + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - regions: + type: file + description: BED file OR Genome details file () + pattern: "*.{bed,tab,fai}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - bed: + type: file + description: BED file containing the windows + pattern: "*.bed" +authors: + - "@kevbrick" + - "@nvnieuwk" +maintainers: + - "@kevbrick" + - "@nvnieuwk" diff --git a/modules/nf-core/custom/dumpsoftwareversions/environment.yml b/modules/nf-core/custom/dumpsoftwareversions/environment.yml new file mode 100644 index 00000000..b48ced26 --- /dev/null +++ b/modules/nf-core/custom/dumpsoftwareversions/environment.yml @@ -0,0 +1,7 @@ +name: custom_dumpsoftwareversions +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::multiqc=1.20 diff --git a/modules/nf-core/custom/dumpsoftwareversions/main.nf b/modules/nf-core/custom/dumpsoftwareversions/main.nf new file mode 100644 index 00000000..105f9265 --- /dev/null +++ b/modules/nf-core/custom/dumpsoftwareversions/main.nf @@ -0,0 +1,24 @@ +process CUSTOM_DUMPSOFTWAREVERSIONS { + label 'process_single' + + // Requires `pyyaml` which does not have a dedicated container but is in the MultiQC container + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/multiqc:1.20--pyhdfd78af_0' : + 'biocontainers/multiqc:1.20--pyhdfd78af_0' }" + + input: + path versions + + output: + path "software_versions.yml" , emit: yml + path "software_versions_mqc.yml", emit: mqc_yml + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + template 'dumpsoftwareversions.py' +} diff --git a/modules/nf-core/custom/dumpsoftwareversions/meta.yml b/modules/nf-core/custom/dumpsoftwareversions/meta.yml new file mode 100644 index 00000000..5f15a5fd --- /dev/null +++ b/modules/nf-core/custom/dumpsoftwareversions/meta.yml @@ -0,0 +1,37 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json +name: custom_dumpsoftwareversions +description: Custom module used to dump software versions within the nf-core pipeline template +keywords: + - custom + - dump + - version +tools: + - custom: + description: Custom module used to dump software versions within the nf-core pipeline template + homepage: https://github.com/nf-core/tools + documentation: https://github.com/nf-core/tools + licence: ["MIT"] +input: + - versions: + type: file + description: YML file containing software versions + pattern: "*.yml" +output: + - yml: + type: file + description: Standard YML file containing software versions + pattern: "software_versions.yml" + - mqc_yml: + type: file + description: MultiQC custom content YML file containing software versions + pattern: "software_versions_mqc.yml" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@drpatelh" + - "@grst" +maintainers: + - "@drpatelh" + - "@grst" diff --git a/modules/nf-core/custom/dumpsoftwareversions/templates/dumpsoftwareversions.py b/modules/nf-core/custom/dumpsoftwareversions/templates/dumpsoftwareversions.py new file mode 100644 index 00000000..da033408 --- /dev/null 
+++ b/modules/nf-core/custom/dumpsoftwareversions/templates/dumpsoftwareversions.py @@ -0,0 +1,101 @@ +#!/usr/bin/env python + + +"""Provide functions to merge multiple versions.yml files.""" + + +import yaml +import platform +from textwrap import dedent + + +def _make_versions_html(versions): + """Generate a tabular HTML output of all versions for MultiQC.""" + html = [ + dedent( + """\\ + + + + + + + + + + """ + ) + ] + for process, tmp_versions in sorted(versions.items()): + html.append("") + for i, (tool, version) in enumerate(sorted(tmp_versions.items())): + html.append( + dedent( + f"""\\ + + + + + + """ + ) + ) + html.append("") + html.append("
Process Name Software Version
{process if (i == 0) else ''}{tool}{version}
") + return "\\n".join(html) + + +def main(): + """Load all version files and generate merged output.""" + versions_this_module = {} + versions_this_module["${task.process}"] = { + "python": platform.python_version(), + "yaml": yaml.__version__, + } + + with open("$versions") as f: + versions_by_process = yaml.load(f, Loader=yaml.BaseLoader) | versions_this_module + + # aggregate versions by the module name (derived from fully-qualified process name) + versions_by_module = {} + for process, process_versions in versions_by_process.items(): + module = process.split(":")[-1] + try: + if versions_by_module[module] != process_versions: + raise AssertionError( + "We assume that software versions are the same between all modules. " + "If you see this error-message it means you discovered an edge-case " + "and should open an issue in nf-core/tools. " + ) + except KeyError: + versions_by_module[module] = process_versions + + versions_by_module["Workflow"] = { + "Nextflow": "$workflow.nextflow.version", + "$workflow.manifest.name": "$workflow.manifest.version", + } + + versions_mqc = { + "id": "software_versions", + "section_name": "${workflow.manifest.name} Software Versions", + "section_href": "https://github.com/${workflow.manifest.name}", + "plot_type": "html", + "description": "are collected at run time from the software output.", + "data": _make_versions_html(versions_by_module), + } + + with open("software_versions.yml", "w") as f: + yaml.dump(versions_by_module, f, default_flow_style=False) + with open("software_versions_mqc.yml", "w") as f: + yaml.dump(versions_mqc, f, default_flow_style=False) + + with open("versions.yml", "w") as f: + yaml.dump(versions_this_module, f, default_flow_style=False) + + +if __name__ == "__main__": + main() diff --git a/modules/nf-core/custom/dumpsoftwareversions/tests/main.nf.test b/modules/nf-core/custom/dumpsoftwareversions/tests/main.nf.test new file mode 100644 index 00000000..b1e1630b --- /dev/null +++ 
b/modules/nf-core/custom/dumpsoftwareversions/tests/main.nf.test @@ -0,0 +1,43 @@ +nextflow_process { + + name "Test Process CUSTOM_DUMPSOFTWAREVERSIONS" + script "../main.nf" + process "CUSTOM_DUMPSOFTWAREVERSIONS" + tag "modules" + tag "modules_nfcore" + tag "custom" + tag "dumpsoftwareversions" + tag "custom/dumpsoftwareversions" + + test("Should run without failures") { + when { + process { + """ + def tool1_version = ''' + TOOL1: + tool1: 0.11.9 + '''.stripIndent() + + def tool2_version = ''' + TOOL2: + tool2: 1.9 + '''.stripIndent() + + input[0] = Channel.of(tool1_version, tool2_version).collectFile() + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + process.out.versions, + file(process.out.mqc_yml[0]).readLines()[0..10], + file(process.out.yml[0]).readLines()[0..7] + ).match() + } + ) + } + } +} diff --git a/modules/nf-core/custom/dumpsoftwareversions/tests/main.nf.test.snap b/modules/nf-core/custom/dumpsoftwareversions/tests/main.nf.test.snap new file mode 100644 index 00000000..5f59a936 --- /dev/null +++ b/modules/nf-core/custom/dumpsoftwareversions/tests/main.nf.test.snap @@ -0,0 +1,33 @@ +{ + "Should run without failures": { + "content": [ + [ + "versions.yml:md5,76d454d92244589d32455833f7c1ba6d" + ], + [ + "data: \"\\n\\n \\n \\n \\n \\n \\n \\n \\n\\", + " \\n\\n\\n \\n \\n\\", + " \\ \\n\\n\\n\\n \\n \\", + " \\ \\n \\n\\n\\n\\n\\", + " \\n\\n \\n \\n\\", + " \\ \\n\\n\\n\\n\\n\\n \\n\\", + " \\ \\n \\n\\n\\n\\n\\", + " \\n\\n \\n \\n\\" + ], + [ + "CUSTOM_DUMPSOFTWAREVERSIONS:", + " python: 3.11.7", + " yaml: 5.4.1", + "TOOL1:", + " tool1: 0.11.9", + "TOOL2:", + " tool2: '1.9'", + "Workflow:" + ] + ], + "timestamp": "2024-01-09T23:01:18.710682" + } +} \ No newline at end of file diff --git a/modules/nf-core/custom/dumpsoftwareversions/tests/tags.yml b/modules/nf-core/custom/dumpsoftwareversions/tests/tags.yml new file mode 100644 index 00000000..405aa24a --- /dev/null +++ 
b/modules/nf-core/custom/dumpsoftwareversions/tests/tags.yml @@ -0,0 +1,2 @@ +custom/dumpsoftwareversions: + - modules/nf-core/custom/dumpsoftwareversions/** diff --git a/modules/nf-core/fastqc/meta.yml b/modules/nf-core/fastqc/meta.yml deleted file mode 100644 index ee5507e0..00000000 --- a/modules/nf-core/fastqc/meta.yml +++ /dev/null @@ -1,57 +0,0 @@ -name: fastqc -description: Run FastQC on sequenced reads -keywords: - - quality control - - qc - - adapters - - fastq -tools: - - fastqc: - description: | - FastQC gives general quality metrics about your reads. - It provides information about the quality score distribution - across your reads, the per base sequence content (%A/C/G/T). - You get information about adapter contamination and other - overrepresented sequences. - homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ - documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/ - licence: ["GPL-2.0-only"] -input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - reads: - type: file - description: | - List of input FastQ files of size 1 and 2 for single-end and paired-end data, - respectively. -output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. 
[ id:'test', single_end:false ] - - html: - type: file - description: FastQC report - pattern: "*_{fastqc.html}" - - zip: - type: file - description: FastQC report archive - pattern: "*_{fastqc.zip}" - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" -authors: - - "@drpatelh" - - "@grst" - - "@ewels" - - "@FelixKrueger" -maintainers: - - "@drpatelh" - - "@grst" - - "@ewels" - - "@FelixKrueger" diff --git a/modules/nf-core/fastqc/tests/main.nf.test b/modules/nf-core/fastqc/tests/main.nf.test deleted file mode 100644 index 70edae4d..00000000 --- a/modules/nf-core/fastqc/tests/main.nf.test +++ /dev/null @@ -1,212 +0,0 @@ -nextflow_process { - - name "Test Process FASTQC" - script "../main.nf" - process "FASTQC" - - tag "modules" - tag "modules_nfcore" - tag "fastqc" - - test("sarscov2 single-end [fastq]") { - - when { - process { - """ - input[0] = Channel.of([ - [ id: 'test', single_end:true ], - [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ] - ]) - """ - } - } - - then { - assertAll ( - { assert process.success }, - - // NOTE The report contains the date inside it, which means that the md5sum is stable per day, but not longer than that. So you can't md5sum it. - // looks like this:
Mon 2 Oct 2023
test.gz
- // https://github.com/nf-core/modules/pull/3903#issuecomment-1743620039 - - { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_single") } - ) - } - } - - test("sarscov2 paired-end [fastq]") { - - when { - process { - """ - input[0] = Channel.of([ - [id: 'test', single_end: false], // meta map - [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), - file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ] - ]) - """ - } - } - - then { - assertAll ( - { assert process.success }, - - { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, - { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, - { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, - { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, - { assert path(process.out.html[0][1][0]).text.contains("") }, - { assert path(process.out.html[0][1][1]).text.contains("") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_paired") } - ) - } - } - - test("sarscov2 interleaved [fastq]") { - - when { - process { - """ - input[0] = Channel.of([ - [id: 'test', single_end: false], // meta map - file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_interleaved.fastq.gz', checkIfExists: true) - ]) - """ - } - } - - then { - assertAll ( - { assert process.success }, - - { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_interleaved") } - ) - } - } - - test("sarscov2 paired-end [bam]") { - - when { - process { - """ - 
input[0] = Channel.of([ - [id: 'test', single_end: false], // meta map - file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam', checkIfExists: true) - ]) - """ - } - } - - then { - assertAll ( - { assert process.success }, - - { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_bam") } - ) - } - } - - test("sarscov2 multiple [fastq]") { - - when { - process { - """ - input[0] = Channel.of([ - [id: 'test', single_end: false], // meta map - [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), - file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true), - file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test2_1.fastq.gz', checkIfExists: true), - file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test2_2.fastq.gz', checkIfExists: true) ] - ]) - """ - } - } - - then { - assertAll ( - { assert process.success }, - - { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, - { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, - { assert process.out.html[0][1][2] ==~ ".*/test_3_fastqc.html" }, - { assert process.out.html[0][1][3] ==~ ".*/test_4_fastqc.html" }, - { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, - { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, - { assert process.out.zip[0][1][2] ==~ ".*/test_3_fastqc.zip" }, - { assert process.out.zip[0][1][3] ==~ ".*/test_4_fastqc.zip" }, - { assert path(process.out.html[0][1][0]).text.contains("") }, - { assert path(process.out.html[0][1][1]).text.contains("") }, - { assert path(process.out.html[0][1][2]).text.contains("") }, - { assert 
path(process.out.html[0][1][3]).text.contains("") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_multiple") } - ) - } - } - - test("sarscov2 custom_prefix") { - - when { - process { - """ - input[0] = Channel.of([ - [ id:'mysample', single_end:true ], // meta map - file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) - ]) - """ - } - } - - then { - assertAll ( - { assert process.success }, - - { assert process.out.html[0][1] ==~ ".*/mysample_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/mysample_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_custom_prefix") } - ) - } - } - - test("sarscov2 single-end [fastq] - stub") { - - options "-stub" - - when { - process { - """ - input[0] = Channel.of([ - [ id: 'test', single_end:true ], - [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ] - ]) - """ - } - } - - then { - assertAll ( - { assert process.success }, - { assert snapshot(process.out.html.collect { file(it[1]).getName() } + - process.out.zip.collect { file(it[1]).getName() } + - process.out.versions ).match("fastqc_stub") } - ) - } - } - -} diff --git a/modules/nf-core/fastqc/tests/main.nf.test.snap b/modules/nf-core/fastqc/tests/main.nf.test.snap deleted file mode 100644 index 86f7c311..00000000 --- a/modules/nf-core/fastqc/tests/main.nf.test.snap +++ /dev/null @@ -1,88 +0,0 @@ -{ - "fastqc_versions_interleaved": { - "content": [ - [ - "versions.yml:md5,e1cc25ca8af856014824abd842e93978" - ] - ], - "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" - }, - "timestamp": "2024-01-31T17:40:07.293713" - }, - "fastqc_stub": { - "content": [ - [ - "test.html", - "test.zip", - "versions.yml:md5,e1cc25ca8af856014824abd842e93978" - ] - ], - "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" - }, - "timestamp": 
"2024-01-31T17:31:01.425198" - }, - "fastqc_versions_multiple": { - "content": [ - [ - "versions.yml:md5,e1cc25ca8af856014824abd842e93978" - ] - ], - "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" - }, - "timestamp": "2024-01-31T17:40:55.797907" - }, - "fastqc_versions_bam": { - "content": [ - [ - "versions.yml:md5,e1cc25ca8af856014824abd842e93978" - ] - ], - "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" - }, - "timestamp": "2024-01-31T17:40:26.795862" - }, - "fastqc_versions_single": { - "content": [ - [ - "versions.yml:md5,e1cc25ca8af856014824abd842e93978" - ] - ], - "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" - }, - "timestamp": "2024-01-31T17:39:27.043675" - }, - "fastqc_versions_paired": { - "content": [ - [ - "versions.yml:md5,e1cc25ca8af856014824abd842e93978" - ] - ], - "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" - }, - "timestamp": "2024-01-31T17:39:47.584191" - }, - "fastqc_versions_custom_prefix": { - "content": [ - [ - "versions.yml:md5,e1cc25ca8af856014824abd842e93978" - ] - ], - "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" - }, - "timestamp": "2024-01-31T17:41:14.576531" - } -} \ No newline at end of file diff --git a/modules/nf-core/fastqc/tests/tags.yml b/modules/nf-core/fastqc/tests/tags.yml deleted file mode 100644 index 7834294b..00000000 --- a/modules/nf-core/fastqc/tests/tags.yml +++ /dev/null @@ -1,2 +0,0 @@ -fastqc: - - modules/nf-core/fastqc/** diff --git a/modules/nf-core/gawk/environment.yml b/modules/nf-core/gawk/environment.yml new file mode 100644 index 00000000..34513c7f --- /dev/null +++ b/modules/nf-core/gawk/environment.yml @@ -0,0 +1,7 @@ +name: gawk +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - anaconda::gawk=5.1.0 diff --git a/modules/nf-core/gawk/main.nf b/modules/nf-core/gawk/main.nf new file mode 100644 index 00000000..f856a1f8 --- /dev/null +++ b/modules/nf-core/gawk/main.nf @@ -0,0 +1,54 @@ +process GAWK { + tag "$meta.id" + label 'process_single' 
+ + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/gawk:5.1.0' : + 'biocontainers/gawk:5.1.0' }" + + input: + tuple val(meta), path(input) + path(program_file) + + output: + tuple val(meta), path("${prefix}.${suffix}"), emit: output + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' // args is used for the main arguments of the tool + def args2 = task.ext.args2 ?: '' // args2 is used to specify a program when no program file has been given + prefix = task.ext.prefix ?: "${meta.id}" + suffix = task.ext.suffix ?: "${input.getExtension()}" + + program = program_file ? "-f ${program_file}" : "${args2}" + + """ + awk \\ + ${args} \\ + ${program} \\ + ${input} \\ + > ${prefix}.${suffix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gawk: \$(awk -Wversion | sed '1!d; s/.*Awk //; s/,.*//') + END_VERSIONS + """ + + stub: + prefix = task.ext.prefix ?: "${meta.id}" + suffix = task.ext.suffix ?: "${input.getExtension()}" + + """ + touch ${prefix}.${suffix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gawk: \$(awk -Wversion | sed '1!d; s/.*Awk //; s/,.*//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/gawk/meta.yml b/modules/nf-core/gawk/meta.yml new file mode 100644 index 00000000..2b6033b0 --- /dev/null +++ b/modules/nf-core/gawk/meta.yml @@ -0,0 +1,50 @@ +name: "gawk" +description: | + If you are like many computer users, you would frequently like to make changes in various text files + wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest. + The job is easy with awk, especially the GNU implementation gawk.
+keywords: + - gawk + - awk + - txt + - text + - file parsing +tools: + - "gawk": + description: "GNU awk" + homepage: "https://www.gnu.org/software/gawk/" + documentation: "https://www.gnu.org/software/gawk/manual/" + tool_dev_url: "https://www.gnu.org/prep/ftp.html" + licence: ["GPL v3"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - input: + type: file + description: The input file - Specify the logic that needs to be executed on this file on the `ext.args2` or in the program file + pattern: "*" + - program_file: + type: file + description: Optional file containing logic for awk to execute. If you don't wish to use a file, you can use `ext.args2` to specify the logic. + pattern: "*" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - output: + type: file + description: The output file - specify the name of this file using `ext.prefix` and the extension using `ext.suffix` + pattern: "*" +authors: + - "@nvnieuwk" +maintainers: + - "@nvnieuwk" diff --git a/modules/nf-core/gawk/tests/main.nf.test b/modules/nf-core/gawk/tests/main.nf.test new file mode 100644 index 00000000..fce82ca9 --- /dev/null +++ b/modules/nf-core/gawk/tests/main.nf.test @@ -0,0 +1,56 @@ +nextflow_process { + + name "Test Process GAWK" + script "../main.nf" + process "GAWK" + + tag "modules" + tag "modules_nfcore" + tag "gawk" + + test("convert fasta to bed") { + config "./nextflow.config" + + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genome.fasta.fai', checkIfExists: true) + ] + input[1] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("convert 
fasta to bed with program file") { + config "./nextflow_with_program_file.config" + + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genome.fasta.fai', checkIfExists: true) + ] + input[1] = Channel.of('BEGIN {FS="\t"}; {print \$1 FS "0" FS \$2}').collectFile(name:"program.txt") + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } +} \ No newline at end of file diff --git a/modules/nf-core/gawk/tests/main.nf.test.snap b/modules/nf-core/gawk/tests/main.nf.test.snap new file mode 100644 index 00000000..ce207478 --- /dev/null +++ b/modules/nf-core/gawk/tests/main.nf.test.snap @@ -0,0 +1,68 @@ +{ + "convert fasta to bed with program file": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.bed:md5,87a15eb9c2ff20ccd5cd8735a28708f7" + ] + ], + "1": [ + "versions.yml:md5,4c320d8c98ca80690afd7651da1ba520" + ], + "output": [ + [ + { + "id": "test" + }, + "test.bed:md5,87a15eb9c2ff20ccd5cd8735a28708f7" + ] + ], + "versions": [ + "versions.yml:md5,4c320d8c98ca80690afd7651da1ba520" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.02.0" + }, + "timestamp": "2024-04-05T11:00:28.097563" + }, + "convert fasta to bed": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.bed:md5,87a15eb9c2ff20ccd5cd8735a28708f7" + ] + ], + "1": [ + "versions.yml:md5,4c320d8c98ca80690afd7651da1ba520" + ], + "output": [ + [ + { + "id": "test" + }, + "test.bed:md5,87a15eb9c2ff20ccd5cd8735a28708f7" + ] + ], + "versions": [ + "versions.yml:md5,4c320d8c98ca80690afd7651da1ba520" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.02.0" + }, + "timestamp": "2024-04-05T10:28:15.625869" + } +} \ No newline at end of file diff --git a/modules/nf-core/gawk/tests/nextflow.config b/modules/nf-core/gawk/tests/nextflow.config new file mode 100644 index 00000000..6e5d43a3 --- /dev/null +++ 
b/modules/nf-core/gawk/tests/nextflow.config @@ -0,0 +1,6 @@ +process { + withName: GAWK { + ext.suffix = "bed" + ext.args2 = '\'BEGIN {FS="\t"}; {print \$1 FS "0" FS \$2}\'' + } +} diff --git a/modules/nf-core/gawk/tests/nextflow_with_program_file.config b/modules/nf-core/gawk/tests/nextflow_with_program_file.config new file mode 100644 index 00000000..693ad419 --- /dev/null +++ b/modules/nf-core/gawk/tests/nextflow_with_program_file.config @@ -0,0 +1,5 @@ +process { + withName: GAWK { + ext.suffix = "bed" + } +} diff --git a/modules/nf-core/gawk/tests/tags.yml b/modules/nf-core/gawk/tests/tags.yml new file mode 100644 index 00000000..72e4531d --- /dev/null +++ b/modules/nf-core/gawk/tests/tags.yml @@ -0,0 +1,2 @@ +gawk: + - "modules/nf-core/gawk/**" diff --git a/modules/nf-core/glimpse/chunk/environment.yml b/modules/nf-core/glimpse/chunk/environment.yml new file mode 100644 index 00000000..8d71aa91 --- /dev/null +++ b/modules/nf-core/glimpse/chunk/environment.yml @@ -0,0 +1,7 @@ +name: glimpse_chunk +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::glimpse-bio=1.1.1 diff --git a/modules/nf-core/glimpse/chunk/main.nf b/modules/nf-core/glimpse/chunk/main.nf new file mode 100644 index 00000000..94779846 --- /dev/null +++ b/modules/nf-core/glimpse/chunk/main.nf @@ -0,0 +1,49 @@ +process GLIMPSE_CHUNK { + tag "$meta.id" + label 'process_medium' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/glimpse-bio:1.1.1--h2ce4488_2': + 'biocontainers/glimpse-bio:1.1.1--hce55b13_1' }" + + input: + tuple val(meta), path(input), path(input_index), val(region) + + output: + tuple val(meta), path("*.txt"), emit: chunk_chr + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def prefix = task.ext.prefix ?: "${meta.id}" + def args = task.ext.args ?: "" + + """ + GLIMPSE_chunk \\ + $args \\ + --input $input \\ + --region $region \\ + --thread $task.cpus \\ + --output ${prefix}.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + glimpse: "\$(GLIMPSE_chunk --help | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]')" + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + def args = task.ext.args ?: "" + """ + touch ${prefix}.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + glimpse: "\$(GLIMPSE_chunk --help | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]')" + END_VERSIONS + """ +} diff --git a/modules/nf-core/glimpse/chunk/meta.yml b/modules/nf-core/glimpse/chunk/meta.yml new file mode 100644 index 00000000..e500d9e9 --- /dev/null +++ b/modules/nf-core/glimpse/chunk/meta.yml @@ -0,0 +1,49 @@ +name: "glimpse_chunk" +description: Defines chunks where to run imputation +keywords: + - chunk + - imputation + - low coverage +tools: + - "glimpse": + description: "GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies." + homepage: "https://odelaneau.github.io/GLIMPSE" + documentation: "https://odelaneau.github.io/GLIMPSE/commands.html" + tool_dev_url: "https://github.com/odelaneau/GLIMPSE" + doi: "10.1038/s41588-020-00756-0" + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - input: + type: file + description: | + Target dataset in VCF/BCF format defined at all variable positions. 
+ The file could possibly be without GT field (for efficiency reasons a file containing only the positions is recommended). + pattern: "*.{vcf,bcf,vcf.gz,bcf.gz}" + - region: + type: string + description: | + Target region, usually a full chromosome (e.g. chr20:1000000-2000000 or chr20). + For chrX, please treat PAR and non-PAR regions as different chromosomes in order to avoid mixing ploidy. +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - chunk_chr: + type: file + description: Tab delimited output txt file containing buffer and imputation regions. + pattern: "*.{txt}" +authors: + - "@louislenezet" +maintainers: + - "@louislenezet" diff --git a/modules/nf-core/glimpse/chunk/tests/main.nf.test b/modules/nf-core/glimpse/chunk/tests/main.nf.test new file mode 100644 index 00000000..4c278af1 --- /dev/null +++ b/modules/nf-core/glimpse/chunk/tests/main.nf.test @@ -0,0 +1,36 @@ +nextflow_process { + + name "Test Process GLIMPSE_CHUNK" + script "../main.nf" + process "GLIMPSE_CHUNK" + tag "glimpse" + tag "glimpse/chunk" + tag "modules_nfcore" + tag "modules" + + test("Should run without failures") { + config "modules/nf-core/glimpse/chunk/tests/nextflow.config" + + when { + process { + """ + input[0] = [ + [ id:'input' ], // meta map + file(params.test_data['homo_sapiens']['genome']['mills_and_1000g_indels_21_vcf_gz'], checkIfExists: true), + file(params.test_data['homo_sapiens']['genome']['mills_and_1000g_indels_21_vcf_gz_tbi'], checkIfExists: true), + "chr21" + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + +} diff --git a/modules/nf-core/glimpse/chunk/tests/main.nf.test.snap b/modules/nf-core/glimpse/chunk/tests/main.nf.test.snap new file mode 100644 index 00000000..0490a8e4 --- /dev/null +++
b/modules/nf-core/glimpse/chunk/tests/main.nf.test.snap @@ -0,0 +1,31 @@ +{ + "Should run without failures": { + "content": [ + { + "0": [ + [ + { + "id": "input" + }, + "input.txt:md5,9e5562b3f94857b8189b59849ce65cfb" + ] + ], + "1": [ + "versions.yml:md5,a523ef8d6391ddeff47bfd30b606d628" + ], + "chunk_chr": [ + [ + { + "id": "input" + }, + "input.txt:md5,9e5562b3f94857b8189b59849ce65cfb" + ] + ], + "versions": [ + "versions.yml:md5,a523ef8d6391ddeff47bfd30b606d628" + ] + } + ], + "timestamp": "2023-10-16T15:55:52.457257547" + } +} \ No newline at end of file diff --git a/modules/nf-core/glimpse/chunk/tests/nextflow.config b/modules/nf-core/glimpse/chunk/tests/nextflow.config new file mode 100644 index 00000000..c945152e --- /dev/null +++ b/modules/nf-core/glimpse/chunk/tests/nextflow.config @@ -0,0 +1,9 @@ +process { + withName: GLIMPSE_CHUNK { + ext.args = [ + "--window-size 2000000", + "--buffer-size 200000" + ].join(' ') + ext.prefix = { "${meta.id}" } + } +} \ No newline at end of file diff --git a/modules/nf-core/glimpse/chunk/tests/tags.yml b/modules/nf-core/glimpse/chunk/tests/tags.yml new file mode 100644 index 00000000..bd846dfd --- /dev/null +++ b/modules/nf-core/glimpse/chunk/tests/tags.yml @@ -0,0 +1,2 @@ +glimpse/chunk: + - modules/nf-core/glimpse/chunk/** diff --git a/modules/nf-core/glimpse/ligate/environment.yml b/modules/nf-core/glimpse/ligate/environment.yml new file mode 100644 index 00000000..0f9e9a33 --- /dev/null +++ b/modules/nf-core/glimpse/ligate/environment.yml @@ -0,0 +1,7 @@ +name: glimpse_ligate +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::glimpse-bio=1.1.1 diff --git a/modules/nf-core/glimpse/ligate/main.nf b/modules/nf-core/glimpse/ligate/main.nf new file mode 100644 index 00000000..65425fd5 --- /dev/null +++ b/modules/nf-core/glimpse/ligate/main.nf @@ -0,0 +1,51 @@ +process GLIMPSE_LIGATE { + tag "$meta.id" + label 'process_low' + + conda "${moduleDir}/environment.yml" + container "${ 
workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/glimpse-bio:1.1.1--hce55b13_1': + 'biocontainers/glimpse-bio:1.1.1--hce55b13_1' }" + + input: + tuple val(meta), path(input_list), path(input_index) + + output: + tuple val(meta), path("*.{vcf,bcf,vcf.gz,bcf.gz}"), emit: merged_variants + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def suffix = task.ext.suffix ?: "vcf.gz" + """ + printf "%s\\n" $input_list | tr -d '[],' > all_files.txt + + GLIMPSE_ligate \\ + $args \\ + --input all_files.txt \\ + --thread $task.cpus \\ + --output ${prefix}.${suffix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + glimpse: "\$(GLIMPSE_ligate --help | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]')" + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + def suffix = task.ext.suffix ?: "vcf.gz" + def args = task.ext.args ?: "" + """ + touch ${prefix}.${suffix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + glimpse: "\$(GLIMPSE_ligate --help | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]')" + END_VERSIONS + """ +} diff --git a/modules/nf-core/glimpse/ligate/meta.yml b/modules/nf-core/glimpse/ligate/meta.yml new file mode 100644 index 00000000..c3b1485c --- /dev/null +++ b/modules/nf-core/glimpse/ligate/meta.yml @@ -0,0 +1,49 @@ +name: "glimpse_ligate" +description: Concatenates imputation chunks in a single VCF/BCF file ligating phased information. +keywords: + - ligate + - low-coverage + - glimpse + - imputation +tools: + - "glimpse": + description: "GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies." 
+ homepage: "https://odelaneau.github.io/GLIMPSE" + documentation: "https://odelaneau.github.io/GLIMPSE/commands.html" + tool_dev_url: "https://github.com/odelaneau/GLIMPSE" + doi: "10.1038/s41588-020-00756-0" + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - input_list: + type: file + description: VCF/BCF file containing genotype probabilities (GP field). + pattern: "*.{vcf,bcf,vcf.gz,bcf.gz}" + - input_index: + type: file + description: Index file of the input VCF/BCF file containing genotype likelihoods. + pattern: "*.{vcf.gz.csi,bcf.gz.csi}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - merged_variants: + type: file + description: | + Output VCF/BCF file for the merged regions. + Phased information (HS field) is updated accordingly for the full region. 
+ pattern: "*.{vcf,bcf,vcf.gz,bcf.gz}" +authors: + - "@louislenezet" +maintainers: + - "@louislenezet" diff --git a/modules/nf-core/glimpse/ligate/tests/main.nf.test b/modules/nf-core/glimpse/ligate/tests/main.nf.test new file mode 100644 index 00000000..7289fc91 --- /dev/null +++ b/modules/nf-core/glimpse/ligate/tests/main.nf.test @@ -0,0 +1,76 @@ +nextflow_process { + + name "Test Process GLIMPSE_LIGATE" + script "../main.nf" + process "GLIMPSE_LIGATE" + + tag "modules_nfcore" + tag "modules" + tag "glimpse" + tag "glimpse/ligate" + tag "glimpse/phase" + tag "bcftools/index" + + test("test_glimpse_ligate") { + setup { + run("GLIMPSE_PHASE") { + script "../../phase/main.nf" + process { + """ + ch_sample = Channel.of('NA12878 2').collectFile(name: 'sampleinfos.txt') + region = Channel.fromList([ + ["chr21:16600000-16750000","chr21:16650000-16700000"], + ["chr21:16650000-16800000","chr21:16700000-16750000"] + ]) + input_vcf = Channel.of([ + [ id:'input'], // meta map + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.1x.vcf.gz", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.1x.vcf.gz.csi", checkIfExists: true) + ]) + ref_panel = Channel.of([ + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf.csi", checkIfExists: true) + ]) + ch_map = Channel.of([ + file(params.modules_testdata_base_path + "delete_me/glimpse/chr21.b38.gmap.gz", checkIfExists: true), + ]) + + input[0] = input_vcf + | combine(ch_sample) + | combine(region) + | combine(ref_panel) + | combine(ch_map) + """ + } + } + run("BCFTOOLS_INDEX") { + script "../../../bcftools/index/main.nf" + process { + """ + input[0] = GLIMPSE_PHASE.out.phased_variants + """ + } + } + } + + when { + process { + """ + input[0] = GLIMPSE_PHASE.out.phased_variants + | groupTuple() + | join 
(BCFTOOLS_INDEX.out.csi.groupTuple()) + """ + } + } + + then { + def lines = path(process.out.merged_variants.get(0).get(1)).linesGzip.last() + assertAll( + { assert process.success }, + { assert snapshot(process.out.versions).match("versions") }, + { assert snapshot(lines).match("ligate") } + ) + } + + } +} diff --git a/modules/nf-core/glimpse/ligate/tests/main.nf.test.snap b/modules/nf-core/glimpse/ligate/tests/main.nf.test.snap new file mode 100644 index 00000000..8eec1328 --- /dev/null +++ b/modules/nf-core/glimpse/ligate/tests/main.nf.test.snap @@ -0,0 +1,16 @@ +{ + "versions": { + "content": [ + [ + "versions.yml:md5,0cc9dfe9c9c1087666c418aa3379cf85" + ] + ], + "timestamp": "2023-10-17T11:56:25.087453677" + }, + "ligate": { + "content": [ + "chr21\t16799989\t21:16799989:T:C\tT\tC\t.\t.\tRAF=0.000468897;AF=0;INFO=1\tGT:DS:GP:HS\t0/0:0:1,0,0:0" + ], + "timestamp": "2023-10-17T11:56:25.116120487" + } +} \ No newline at end of file diff --git a/modules/nf-core/glimpse/ligate/tests/tags.yml b/modules/nf-core/glimpse/ligate/tests/tags.yml new file mode 100644 index 00000000..f15d8121 --- /dev/null +++ b/modules/nf-core/glimpse/ligate/tests/tags.yml @@ -0,0 +1,2 @@ +glimpse/ligate: + - modules/nf-core/glimpse/ligate/** diff --git a/modules/nf-core/glimpse/phase/environment.yml b/modules/nf-core/glimpse/phase/environment.yml new file mode 100644 index 00000000..fc79765a --- /dev/null +++ b/modules/nf-core/glimpse/phase/environment.yml @@ -0,0 +1,7 @@ +name: glimpse_phase +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::glimpse-bio=1.1.1 diff --git a/modules/nf-core/glimpse/phase/main.nf b/modules/nf-core/glimpse/phase/main.nf new file mode 100644 index 00000000..41004e60 --- /dev/null +++ b/modules/nf-core/glimpse/phase/main.nf @@ -0,0 +1,58 @@ +process GLIMPSE_PHASE { + tag "$meta.id" + label 'process_medium' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && 
!task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/glimpse-bio:1.1.1--hce55b13_1': + 'biocontainers/glimpse-bio:1.1.1--hce55b13_1' }" + + input: + tuple val(meta) , path(input), path(input_index), path(samples_file), val(input_region), val(output_region), path(reference), path(reference_index), path(map) + + output: + tuple val(meta), path("*.{vcf,bcf,vcf.gz,bcf.gz}"), emit: phased_variants + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}_${input_region.replace(":","_")}" + def suffix = task.ext.suffix ?: "vcf.gz" + + def map_command = map ? "--map $map" :"" + def samples_file_command = samples_file ? "--samples-file $samples_file" :"" + + """ + GLIMPSE_phase \\ + $args \\ + --input $input \\ + --reference $reference \\ + $map_command \\ + $samples_file_command \\ + --input-region $input_region \\ + --output-region $output_region \\ + --thread $task.cpus \\ + --output ${prefix}.${suffix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + glimpse: "\$(GLIMPSE_phase --help | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]')" + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}_${input_region.replace(":","_")}" + def suffix = task.ext.suffix ?: "vcf.gz" + """ + touch ${prefix}.${suffix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + glimpse: "\$(GLIMPSE_phase --help | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]')" + END_VERSIONS + """ +} diff --git a/modules/nf-core/glimpse/phase/meta.yml b/modules/nf-core/glimpse/phase/meta.yml new file mode 100644 index 00000000..862033b7 --- /dev/null +++ b/modules/nf-core/glimpse/phase/meta.yml @@ -0,0 +1,78 @@ +name: "glimpse_phase" +description: main GLIMPSE algorithm, performs phasing and imputation refining genotype likelihoods +keywords: + - phase + - 
imputation + - low-coverage + - glimpse +tools: + - "glimpse": + description: "GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies." + homepage: "https://odelaneau.github.io/GLIMPSE" + documentation: "https://odelaneau.github.io/GLIMPSE/commands.html" + tool_dev_url: "https://github.com/odelaneau/GLIMPSE" + doi: "10.1038/s41588-020-00756-0" + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - input: + type: file + description: Input VCF/BCF file containing genotype likelihoods. + pattern: "*.{vcf.gz,bcf.gz}" + - input_index: + type: file + description: Index file of the input VCF/BCF file containing genotype likelihoods. + pattern: "*.{vcf.gz.csi,bcf.gz.csi}" + - samples_file: + type: file + description: | + File with sample names and ploidy information. + One sample per line with a mandatory second column indicating ploidy (1 or 2). + Sample names that are not present are assumed to have ploidy 2 (diploids). + GLIMPSE does NOT handle the use of sex (M/F) instead of ploidy. + pattern: "*.{txt,tsv}" + - input_region: + type: string + description: Target region used for imputation, including left and right buffers (e.g. chr20:1000000-2000000). + pattern: "chrXX:leftBufferPosition-rightBufferPosition" + - output_region: + type: string + description: Target imputed region, excluding left and right buffers (e.g. chr20:1000000-2000000). + pattern: "chrXX:leftBufferPosition-rightBufferPosition" + - reference: + type: file + description: Reference panel of haplotypes in VCF/BCF format. + pattern: "*.{vcf.gz,bcf.gz}" + - reference_index: + type: file + description: Index file of the Reference panel file. + pattern: "*.{vcf.gz.csi,bcf.gz.csi}" + - map: + type: file + description: File containing the genetic map. + pattern: "*.gmap" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - phased_variants: + type: file + description: | + Output VCF/BCF file containing genotype probabilities (GP field), + imputed dosages (DS field), best guess genotypes (GT field), + sampled haplotypes in the last (max 16) main iterations (HS field) and info-score. + pattern: "*.{vcf,bcf,vcf.gz,bcf.gz}" +authors: + - "@louislenezet" +maintainers: + - "@louislenezet" diff --git a/modules/nf-core/glimpse/phase/tests/main.nf.test b/modules/nf-core/glimpse/phase/tests/main.nf.test new file mode 100644 index 00000000..5c92cb1f --- /dev/null +++ b/modules/nf-core/glimpse/phase/tests/main.nf.test @@ -0,0 +1,67 @@ +nextflow_process { + + name "Test Process GLIMPSE_PHASE" + script "../main.nf" + process "GLIMPSE_PHASE" + tag "glimpse" + tag "glimpse/phase" + tag "modules_nfcore" + tag "modules" + + test("test_glimpse_phase") { + + when { + process { + """ + ch_sample = Channel.of([sample:'present']) + | combine(Channel.of('NA12878 2').collectFile(name: 'sampleinfos.txt')) + | concat(Channel.of([[sample: 'absent'], []])) + region = Channel.fromList([ + ["chr21:16600000-16750000","chr21:16650000-16700000"], + ["chr21:16650000-16800000","chr21:16700000-16750000"] + ]) + input_vcf = Channel.of([ + [ id:'input'], // meta map + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.1x.vcf.gz", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.1x.vcf.gz.csi", checkIfExists: true) + ]) + ref_panel = Channel.of([ + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf.csi", checkIfExists: true) + ]) + ch_map = Channel.of([ + file(params.modules_testdata_base_path + "delete_me/glimpse/chr21.b38.gmap.gz", checkIfExists: true), + ]) + + 
input[0] = input_vcf + | combine(ch_sample) + | combine(region) + | map { meta, vcf, index, metaS, sample, regionI, regionO -> + [[id: meta.id + "_" + metaS.sample, region : regionI], vcf, index, sample, regionI, regionO] + } + | combine(ref_panel) + | combine(ch_map) + """ + } + } + + then { + String targetFileName = "input_present_chr21_16650000-16800000.vcf.gz" + File selectedFile = process.out.phased_variants.stream() + .filter(vector -> vector.size() > 1) + .map(vector -> new File(vector.get(1).toString())) + .filter(file -> file.getName().equals(targetFileName)) + .findFirst() + .orElse(null) + String selectedFilename = selectedFile != null ? selectedFile.getPath() : null + def lines = path(selectedFilename).linesGzip.last() + assertAll( + { assert process.success }, + { assert snapshot(process.out.versions).match("versions") }, + { assert process.out.phased_variants.size() == 4}, + { assert snapshot(lines).match("imputed") } + ) + } + + } +} diff --git a/modules/nf-core/glimpse/phase/tests/main.nf.test.snap b/modules/nf-core/glimpse/phase/tests/main.nf.test.snap new file mode 100644 index 00000000..d61cf86e --- /dev/null +++ b/modules/nf-core/glimpse/phase/tests/main.nf.test.snap @@ -0,0 +1,19 @@ +{ + "versions": { + "content": [ + [ + "versions.yml:md5,b24f49b2f5989a1f7da32c195334e96b", + "versions.yml:md5,b24f49b2f5989a1f7da32c195334e96b", + "versions.yml:md5,b24f49b2f5989a1f7da32c195334e96b", + "versions.yml:md5,b24f49b2f5989a1f7da32c195334e96b" + ] + ], + "timestamp": "2023-10-17T15:27:55.512415434" + }, + "imputed": { + "content": [ + "chr21\t16799989\t21:16799989:T:C\tT\tC\t.\t.\tRAF=0.000468897;AF=0;INFO=1;BUF=1\tGT:DS:GP:HS\t0/0:0:1,0,0:0" + ], + "timestamp": "2023-10-17T15:27:55.99820664" + } +} \ No newline at end of file diff --git a/modules/nf-core/glimpse/phase/tests/tags.yml b/modules/nf-core/glimpse/phase/tests/tags.yml new file mode 100644 index 00000000..61c28281 --- /dev/null +++ b/modules/nf-core/glimpse/phase/tests/tags.yml @@ -0,0 +1,2 
@@ +glimpse/phase: + - modules/nf-core/glimpse/phase/** diff --git a/modules/nf-core/glimpse2/chunk/environment.yml b/modules/nf-core/glimpse2/chunk/environment.yml new file mode 100644 index 00000000..8b893af7 --- /dev/null +++ b/modules/nf-core/glimpse2/chunk/environment.yml @@ -0,0 +1,7 @@ +name: glimpse2_chunk +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::glimpse-bio=2.0.0 diff --git a/modules/nf-core/glimpse2/chunk/main.nf b/modules/nf-core/glimpse2/chunk/main.nf new file mode 100644 index 00000000..4ff4b2a7 --- /dev/null +++ b/modules/nf-core/glimpse2/chunk/main.nf @@ -0,0 +1,63 @@ +process GLIMPSE2_CHUNK { + tag "$meta.id" + label 'process_low' + + beforeScript """ + if cat /proc/cpuinfo | grep avx2 -q + then + echo "Feature AVX2 present on host" + else + echo "Feature AVX2 not present on host" + exit 1 + fi + """ + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/glimpse-bio:2.0.0--hf340a29_0': + 'biocontainers/glimpse-bio:2.0.0--hf340a29_0' }" + + input: + tuple val(meta) , path(input), path(input_index), val(region) + tuple val(meta2), path(map) + val(model) + + output: + tuple val(meta), path("*.txt"), emit: chunk_chr + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def prefix = task.ext.prefix ?: "${meta.id}" + def args = task.ext.args ?: "" + def map_cmd = map ? 
"--map ${map}":"" + + """ + GLIMPSE2_chunk \\ + $args \\ + $map_cmd \\ + --${model} \\ + --input $input \\ + --region $region \\ + --threads $task.cpus \\ + --output ${prefix}.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + glimpse2: "\$(GLIMPSE2_chunk --help | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]' | head -1)" + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + def args = task.ext.args ?: "" + """ + touch ${prefix}.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + glimpse2: "\$(GLIMPSE2_chunk --help | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]' | head -1)" + END_VERSIONS + """ +} diff --git a/modules/nf-core/glimpse2/chunk/meta.yml b/modules/nf-core/glimpse2/chunk/meta.yml new file mode 100644 index 00000000..759ee024 --- /dev/null +++ b/modules/nf-core/glimpse2/chunk/meta.yml @@ -0,0 +1,73 @@ +name: "glimpse2_chunk" +description: Defines chunks where to run imputation +keywords: + - chunk + - low-coverage + - imputation + - glimpse +tools: + - "glimpse2": + description: "GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies." + homepage: "https://odelaneau.github.io/GLIMPSE" + documentation: "https://odelaneau.github.io/GLIMPSE/commands.html" + tool_dev_url: "https://github.com/odelaneau/GLIMPSE" + doi: "10.1038/s41588-020-00756-0" + licence: ["MIT"] +requirements: + - AVX2 +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - input: + type: file + description: | + Target dataset in VCF/BCF format defined at all variable positions. + The file could possibly be without GT field (for efficiency reasons a file containing only the positions is recommended). + pattern: "*.{vcf,bcf,vcf.gz,bcf.gz}" + - input_index: + type: file + description: Index file of the input VCF/BCF file containing genotype likelihoods. 
+ pattern: "*.{vcf.gz.csi,bcf.gz.csi}" + - region: + type: string + description: | + Target region, usually a full chromosome (e.g. chr20:1000000-2000000 or chr20). + For chrX, please treat PAR and non-PAR regions as different chromosomes in order to avoid mixing ploidy. + - meta2: + type: map + description: | + Groovy Map containing genomic map information + e.g. [ map:'GRCh38' ] + - map: + type: file + description: File containing the genetic map. + pattern: "*.gmap" + - model: + type: string + description: | + Algorithm model to use: + "recursive": Recursive algorithm + "sequential": Sequential algorithm (Recommended) + "uniform-number-variants": Experimental. Uniform the number of variants in the sequential algorithm + pattern: "{recursive,sequential,uniform-number-variants}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - chunk_chr: + type: file + description: Tab delimited output txt file containing buffer and imputation regions.
+ pattern: "*.{txt}" +authors: + - "@louislenezet" +maintainers: + - "@louislenezet" diff --git a/modules/nf-core/glimpse2/chunk/tests/main.nf.test b/modules/nf-core/glimpse2/chunk/tests/main.nf.test new file mode 100644 index 00000000..0f9e8850 --- /dev/null +++ b/modules/nf-core/glimpse2/chunk/tests/main.nf.test @@ -0,0 +1,65 @@ +nextflow_process { + + name "Test Process GLIMPSE2_CHUNK" + script "../main.nf" + process "GLIMPSE2_CHUNK" + tag "glimpse2" + tag "glimpse2/chunk" + tag "modules_nfcore" + tag "modules" + + test("Should run without map") { + config "modules/nf-core/glimpse2/chunk/tests/nextflow.config" + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:false ], // meta map + file("https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf", checkIfExists: true), + file("https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf.csi", checkIfExists: true), + "chr21" + ] + input[1]= [[ id:'map'],[]] + input[2]= "recursive" + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + + test("Should run with map") { + config "modules/nf-core/glimpse2/chunk/tests/nextflow.config" + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:false ], // meta map + file("https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf", checkIfExists: true), + file("https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf.csi", checkIfExists: true), + "chr21" + ] + input[1]= [[ id:'map'],file("https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/glimpse/chr21.b38.gmap.gz", checkIfExists: true)] + input[2]= "recursive" + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + +} diff --git 
a/modules/nf-core/glimpse2/chunk/tests/main.nf.test.snap b/modules/nf-core/glimpse2/chunk/tests/main.nf.test.snap new file mode 100644 index 00000000..f61ebdcc --- /dev/null +++ b/modules/nf-core/glimpse2/chunk/tests/main.nf.test.snap @@ -0,0 +1,72 @@ +{ + "Should run without map": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.txt:md5,aae05c3099aff601005282744baf8db8" + ] + ], + "1": [ + "versions.yml:md5,f5aa9b92845efdd03350ca7cab08ff6f" + ], + "chunk_chr": [ + [ + { + "id": "test", + "single_end": false + }, + "test.txt:md5,aae05c3099aff601005282744baf8db8" + ] + ], + "versions": [ + "versions.yml:md5,f5aa9b92845efdd03350ca7cab08ff6f" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-06T14:51:29.494098" + }, + "Should run with map": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.txt:md5,1f7a58d7891e82fa5e9669abdbba5690" + ] + ], + "1": [ + "versions.yml:md5,f5aa9b92845efdd03350ca7cab08ff6f" + ], + "chunk_chr": [ + [ + { + "id": "test", + "single_end": false + }, + "test.txt:md5,1f7a58d7891e82fa5e9669abdbba5690" + ] + ], + "versions": [ + "versions.yml:md5,f5aa9b92845efdd03350ca7cab08ff6f" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-06T14:51:38.545206" + } +} \ No newline at end of file diff --git a/modules/nf-core/glimpse2/chunk/tests/nextflow.config b/modules/nf-core/glimpse2/chunk/tests/nextflow.config new file mode 100644 index 00000000..e5721995 --- /dev/null +++ b/modules/nf-core/glimpse2/chunk/tests/nextflow.config @@ -0,0 +1,6 @@ +process { + withName: GLIMPSE2_CHUNK { + ext.prefix = { "${meta.id}" } + } + publishDir = { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" } +} \ No newline at end of file diff --git a/modules/nf-core/glimpse2/chunk/tests/tags.yml b/modules/nf-core/glimpse2/chunk/tests/tags.yml new file mode 100644 index 
00000000..69cc8b67 --- /dev/null +++ b/modules/nf-core/glimpse2/chunk/tests/tags.yml @@ -0,0 +1,2 @@ +glimpse2/chunk: + - modules/nf-core/glimpse2/chunk/** diff --git a/modules/nf-core/glimpse2/concordance/environment.yml b/modules/nf-core/glimpse2/concordance/environment.yml new file mode 100644 index 00000000..c3ad98fb --- /dev/null +++ b/modules/nf-core/glimpse2/concordance/environment.yml @@ -0,0 +1,7 @@ +name: glimpse2_concordance +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::glimpse-bio=2.0.0 diff --git a/modules/nf-core/glimpse2/concordance/main.nf b/modules/nf-core/glimpse2/concordance/main.nf new file mode 100644 index 00000000..4fcb587b --- /dev/null +++ b/modules/nf-core/glimpse2/concordance/main.nf @@ -0,0 +1,79 @@ +process GLIMPSE2_CONCORDANCE { + tag "$meta.id" + label 'process_low' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/glimpse-bio:2.0.0--hf340a29_0': + 'biocontainers/glimpse-bio:2.0.0--hf340a29_0' }" + + input: + tuple val(meta), path(estimate), path(estimate_index), path(truth), path(truth_index), path(freq), path(freq_index), path(samples), val(region) + tuple val(meta2), path(groups), val(bins), val(ac_bins), val(allele_counts) + val(min_val_gl) + val(min_val_dp) + + output: + tuple val(meta), path("*.error.cal.txt.gz") , emit: errors_cal + tuple val(meta), path("*.error.grp.txt.gz") , emit: errors_grp + tuple val(meta), path("*.error.spl.txt.gz") , emit: errors_spl + tuple val(meta), path("*.rsquare.grp.txt.gz"), emit: rsquare_grp + tuple val(meta), path("*.rsquare.spl.txt.gz"), emit: rsquare_spl + tuple val(meta), path("*_r2_sites.txt.gz") , emit: rsquare_per_site, optional: true + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: 
"${meta.id}" + def samples_cmd = samples ? "--samples ${samples}" : "" + def groups_cmd = groups ? "--groups ${groups}" : "" + def bins_cmd = bins ? "--bins ${bins}" : "" + def ac_bins_cmd = ac_bins ? "--ac-bins ${ac_bins}" : "" + def ale_ct_cmd = allele_counts ? "--allele-counts ${allele_counts}" : "" + def region_str = region instanceof List ? region.join('\\n') : region + + if (((groups ? 1:0) + (bins ? 1:0) + (ac_bins ? 1:0) + (allele_counts ? 1:0)) != 1) error "One and only one argument should be selected between groups, bins, ac_bins, allele_counts" + + """ + printf '$region_str' > regions.txt + sed 's/\$/ $freq $truth $estimate/' regions.txt > input.txt + GLIMPSE2_concordance \\ + $args \\ + $samples_cmd \\ + $groups_cmd \\ + $bins_cmd \\ + $ac_bins_cmd \\ + $ale_ct_cmd \\ + --min-val-gl $min_val_gl \\ + --min-val-dp $min_val_dp \\ + --input input.txt \\ + --thread $task.cpus \\ + --output ${prefix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + glimpse2: "\$(GLIMPSE2_concordance --help | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]' | head -1)" + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + def args = task.ext.args ?: "" + def rsquare_per_site_cmd = args.contains("--out-r2-per-site") ? 
"touch ${prefix}_r2_sites.txt.gz" : "" + """ + touch ${prefix}.error.cal.txt.gz + touch ${prefix}.error.grp.txt.gz + touch ${prefix}.error.spl.txt.gz + touch ${prefix}.rsquare.grp.txt.gz + touch ${prefix}.rsquare.spl.txt.gz + ${rsquare_per_site_cmd} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + glimpse2: "\$(GLIMPSE2_concordance --help | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]' | head -1)" + END_VERSIONS + """ +} diff --git a/modules/nf-core/glimpse2/concordance/meta.yml b/modules/nf-core/glimpse2/concordance/meta.yml new file mode 100644 index 00000000..7c82c350 --- /dev/null +++ b/modules/nf-core/glimpse2/concordance/meta.yml @@ -0,0 +1,110 @@ +name: "glimpse2_concordance" +description: Program to compute the genotyping error rate at the sample or marker level. +keywords: + - concordance + - low-coverage + - glimpse + - imputation +tools: + - "glimpse2": + description: "GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies." + homepage: "https://odelaneau.github.io/GLIMPSE" + documentation: "https://odelaneau.github.io/GLIMPSE/commands.html" + tool_dev_url: "https://github.com/odelaneau/GLIMPSE" + doi: "10.1038/s41588-020-00756-0" + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - region: + type: string + description: Target region used for imputation, including left and right buffers (e.g. chr20:1000000-2000000). Can also be a list of such regions. + pattern: "chrXX:leftBufferPosition-rightBufferPosition" + - freq: + type: file + description: File containing allele frequencies at each site. + pattern: "*.{vcf,bcf,vcf.gz,bcf.gz}" + - truth: + type: file + description: Validation dataset called at the same positions as the imputed file. + pattern: "*.{vcf,bcf,vcf.gz,bcf.gz}" + - estimate: + type: file + description: Imputed dataset file obtained after phasing.
+ pattern: "*.{vcf,bcf,vcf.gz,bcf.gz}" + - samples: + type: file + description: List of samples to process, one sample ID per line. + pattern: "*.{txt,tsv}" + - groups: + type: file + description: Alternative to frequency bins, group bins are user defined, provided in a file. + pattern: "*.{txt,tsv}" + - bins: + type: string + description: | + Allele frequency bins used for r-squared computations. + By default they should be MAF bins [0-0.5], while + they should span the full range [0-1] if --use-ref-alt is used. + pattern: "0 0.01 0.05 ... 0.5" + - ac_bins: + type: string + description: User-defined allele count bins used for r-squared computations. + pattern: "1 2 5 10 20 ... 100000" + - allele_counts: + type: string + description: | + Default allele count bins used for r-squared computations. + AN field must be defined in the frequency file. + - min_val_gl: + type: float + description: | + Minimum genotype likelihood probability P(G|R) in validation data. + Set to zero to disable this filter, or when using --gt-validation. + - min_val_dp: + type: integer + description: | + Minimum coverage in validation data. + If FORMAT/DP is missing and --min-val-dp > 0, the program exits with an error. + Set to zero to disable this filter, or when using --gt-validation. +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions. + pattern: "versions.yml" + - errors_cal: + type: file + description: Calibration correlation errors between imputed dosages (in MAF bins) and highly-confident genotype. + pattern: "*.error.cal.txt.gz" + - errors_grp: + type: file + description: Groups correlation errors between imputed dosages (in MAF bins) and highly-confident genotype. + pattern: "*.error.grp.txt.gz" + - errors_spl: + type: file + description: Samples correlation errors between imputed dosages (in MAF bins) and highly-confident genotype.
+ pattern: "*.error.spl.txt.gz" + - rsquare_grp: + type: file + description: Groups r-squared correlation between imputed dosages (in MAF bins) and highly-confident genotype. + pattern: "*.rsquare.grp.txt.gz" + - rsquare_spl: + type: file + description: Samples r-squared correlation between imputed dosages (in MAF bins) and highly-confident genotype. + pattern: "*.rsquare.spl.txt.gz" + - rsquare_per_site: + type: file + description: Variant r-squared correlation between imputed dosages (in MAF bins) and highly-confident genotype. + pattern: "*_r2_sites.txt.gz" +authors: + - "@louislenezet" +maintainers: + - "@louislenezet" diff --git a/modules/nf-core/glimpse2/ligate/environment.yml b/modules/nf-core/glimpse2/ligate/environment.yml new file mode 100644 index 00000000..67e2c3e6 --- /dev/null +++ b/modules/nf-core/glimpse2/ligate/environment.yml @@ -0,0 +1,7 @@ +name: glimpse2_ligate +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::glimpse-bio=2.0.0 diff --git a/modules/nf-core/glimpse2/ligate/main.nf b/modules/nf-core/glimpse2/ligate/main.nf new file mode 100644 index 00000000..e58b5939 --- /dev/null +++ b/modules/nf-core/glimpse2/ligate/main.nf @@ -0,0 +1,51 @@ +process GLIMPSE2_LIGATE { + tag "$meta.id" + label 'process_low' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+ 'https://depot.galaxyproject.org/singularity/glimpse-bio:2.0.0--hf340a29_0': + 'biocontainers/glimpse-bio:2.0.0--hf340a29_0' }" + + input: + tuple val(meta), path(input_list), path(input_index) + + output: + tuple val(meta), path("*.{vcf,bcf,vcf.gz,bcf.gz}"), emit: merged_variants + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def suffix = task.ext.suffix ?: "vcf.gz" + """ + printf "%s\\n" $input_list | tr -d '[],' > all_files.txt + + GLIMPSE2_ligate \\ + $args \\ + --input all_files.txt \\ + --thread $task.cpus \\ + --output ${prefix}.${suffix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + glimpse2: "\$(GLIMPSE2_ligate --help | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]' | head -1)" + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def suffix = task.ext.suffix ?: "vcf.gz" + """ + touch ${prefix}.${suffix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + glimpse2: "\$(GLIMPSE2_ligate --help | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]' | head -1)" + END_VERSIONS + """ +} diff --git a/modules/nf-core/glimpse2/ligate/meta.yml b/modules/nf-core/glimpse2/ligate/meta.yml new file mode 100644 index 00000000..7c07973f --- /dev/null +++ b/modules/nf-core/glimpse2/ligate/meta.yml @@ -0,0 +1,49 @@ +name: "glimpse2_ligate" +description: | + Ligation of multiple phased BCF/VCF files into a single whole-chromosome file. + GLIMPSE2 is run in chunks that are ligated into chromosome-wide files maintaining the phasing. +keywords: + - ligate + - low-coverage + - glimpse + - imputation +tools: + - "glimpse2": + description: "GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies."
+ homepage: "https://odelaneau.github.io/GLIMPSE" + documentation: "https://odelaneau.github.io/GLIMPSE/commands.html" + tool_dev_url: "https://github.com/odelaneau/GLIMPSE" + doi: "10.1038/s41588-020-00756-0" + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - input_list: + type: file + description: VCF/BCF files containing genotype probabilities (GP field), one per imputed chunk. + pattern: "*.{vcf,bcf,vcf.gz,bcf.gz}" + - input_index: + type: file + description: Index files of the input VCF/BCF files. + pattern: "*.{csi,tbi}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - merged_variants: + type: file + description: Output ligated (phased) file in VCF/BCF format. + pattern: "*.{vcf,bcf,vcf.gz,bcf.gz}" +authors: + - "@louislenezet" +maintainers: + - "@louislenezet" diff --git a/modules/nf-core/glimpse2/ligate/tests/main.nf.test b/modules/nf-core/glimpse2/ligate/tests/main.nf.test new file mode 100644 index 00000000..d45c448b --- /dev/null +++ b/modules/nf-core/glimpse2/ligate/tests/main.nf.test @@ -0,0 +1,76 @@ +nextflow_process { + + name "Test Process GLIMPSE2_LIGATE" + script "../main.nf" + + process "GLIMPSE2_LIGATE" + + tag "modules_nfcore" + tag "modules" + tag "glimpse2" + tag "glimpse2/ligate" + tag "bcftools/index" + tag "glimpse2/phase" + + test("Should run glimpse ligate") { + setup { + run("GLIMPSE2_PHASE") { + script "../../phase/main.nf" + process { + """ + input_vcf = Channel.of([ + [ id:'input' ], // meta map + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.1x.vcf.gz", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.1x.vcf.gz.csi", checkIfExists: true), + [],
"chr21:16600000-16800000", + "chr21:16650000-16750000" + ]) + + ref_panel = Channel.of([ + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf.csi", checkIfExists: true) + ]) + + map_file = Channel.of([ + file(params.modules_testdata_base_path + "delete_me/glimpse/chr21.b38.gmap.gz", checkIfExists: true) + ]) + + // [meta, vcf, index, sample_infos, regionin, regionout,ref, index, map] [meta, fasta, fai] + input[0] = input_vcf + .combine(ref_panel) + .combine(map_file) + input[1] = Channel.of([[],[],[]]) + """ + } + } + run("BCFTOOLS_INDEX") { + script "../../../bcftools/index/main.nf" + process { + """ + input[0] = GLIMPSE2_PHASE.out.phased_variants + """ + } + } + } + + when { + process { + """ + input[0] = GLIMPSE2_PHASE.out.phased_variants + | groupTuple() + | join (BCFTOOLS_INDEX.out.csi.groupTuple()) + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out.versions).match("versions") }, + { assert file(process.out.merged_variants[0][1]).name == "input.vcf.gz" } + ) + } + + } +} diff --git a/modules/nf-core/glimpse2/ligate/tests/main.nf.test.snap b/modules/nf-core/glimpse2/ligate/tests/main.nf.test.snap new file mode 100644 index 00000000..a1b0b8c8 --- /dev/null +++ b/modules/nf-core/glimpse2/ligate/tests/main.nf.test.snap @@ -0,0 +1,14 @@ +{ + "versions": { + "content": [ + [ + "versions.yml:md5,44addcaef4965ff6409a8293c5bcad84" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-18T15:52:19.469961519" + } +} \ No newline at end of file diff --git a/modules/nf-core/glimpse2/ligate/tests/tags.yml b/modules/nf-core/glimpse2/ligate/tests/tags.yml new file mode 100644 index 00000000..1613896f --- /dev/null +++ b/modules/nf-core/glimpse2/ligate/tests/tags.yml @@ -0,0 +1,2 @@ +glimpse2/ligate: + - modules/nf-core/glimpse2/ligate/** 
diff --git a/modules/nf-core/glimpse2/phase/environment.yml b/modules/nf-core/glimpse2/phase/environment.yml new file mode 100644 index 00000000..b56a1ee6 --- /dev/null +++ b/modules/nf-core/glimpse2/phase/environment.yml @@ -0,0 +1,7 @@ +name: glimpse2_phase +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::glimpse-bio=2.0.0 diff --git a/modules/nf-core/glimpse2/phase/main.nf b/modules/nf-core/glimpse2/phase/main.nf new file mode 100644 index 00000000..f61cf022 --- /dev/null +++ b/modules/nf-core/glimpse2/phase/main.nf @@ -0,0 +1,86 @@ +process GLIMPSE2_PHASE { + tag "$meta.id" + label 'process_medium' + + beforeScript """ + if cat /proc/cpuinfo | grep avx2 -q + then + echo "Feature AVX2 present on host" + else + echo "Feature AVX2 not present on host" + exit 1 + fi + """ + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/glimpse-bio:2.0.0--hf340a29_0': + 'biocontainers/glimpse-bio:2.0.0--hf340a29_0' }" + + input: + tuple val(meta) , path(input), path(input_index), path(samples_file), val(input_region), val(output_region), path(reference), path(reference_index), path(map) + tuple val(meta2), path(fasta_reference), path(fasta_reference_index) + + output: + tuple val(meta), path("*.{vcf,bcf,bgen}"), emit: phased_variants + tuple val(meta), path("*.txt.gz") , emit: stats_coverage, optional: true + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def region = input_region ? "${output_region.replace(":","_")}" : "${reference}" + def args = task.ext.args ?: "" + def prefix = task.ext.prefix ?: "${meta.id}_${region}" + def suffix = task.ext.suffix ?: "bcf" + + def map_command = map ? "--map $map" : "" + def samples_file_command = samples_file ? "--samples-file $samples_file" : "" + def fasta_command = fasta_reference ? 
"--fasta $fasta_reference" : "" + def input_region_cmd = input_region ? "--input-region $input_region" : "" + def output_region_cmd = output_region ? "--output-region $output_region": "" + + def input_bam = input.any { it.extension in ["cram","bam"]} + + """ + if $input_bam + then + ls -1 | grep '\\.cram\$\\|\\.bam\$' > all_bam.txt + input_command="--bam-list all_bam.txt" + else + input_command="--input-gl $input" + fi + + GLIMPSE2_phase \\ + $args \\ + \$input_command \\ + --reference $reference \\ + $map_command \\ + $fasta_command \\ + $samples_file_command \\ + $input_region_cmd \\ + $output_region_cmd \\ + --thread $task.cpus \\ + --output ${prefix}.${suffix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + glimpse2: "\$(GLIMPSE2_phase --help | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]' | head -1)" + END_VERSIONS + """ + + stub: + def region = input_region ? "${output_region.replace(":","_")}" : "${reference}" + def args = task.ext.args ?: "" + def prefix = task.ext.prefix ?: "${meta.id}_${region}" + def suffix = task.ext.suffix ?: "bcf" + """ + touch ${prefix}.${suffix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + glimpse2: "\$(GLIMPSE2_phase --help | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]' | head -1)" + END_VERSIONS + """ +} diff --git a/modules/nf-core/glimpse2/phase/meta.yml b/modules/nf-core/glimpse2/phase/meta.yml new file mode 100644 index 00000000..db2595e6 --- /dev/null +++ b/modules/nf-core/glimpse2/phase/meta.yml @@ -0,0 +1,105 @@ +name: "glimpse2_phase" +description: Tool for imputation and phasing from vcf file or directly from bam files. +keywords: + - phasing + - low-coverage + - imputation + - glimpse +tools: + - "glimpse2": + description: "GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies." 
+ homepage: "https://odelaneau.github.io/GLIMPSE" + documentation: "https://odelaneau.github.io/GLIMPSE/commands.html" + tool_dev_url: "https://github.com/odelaneau/GLIMPSE" + doi: "10.1038/s41588-020-00756-0" + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'test', single_end:false ]` + - input: + type: file + description: | + Either multiple BAM/CRAM files containing low-coverage sequencing reads or one VCF/BCF file containing the genotype likelihoods. + When using BAM/CRAM, the file name is used as the sample name. + pattern: "*.{bam,cram,vcf,vcf.gz,bcf,bcf.gz}" + - input_index: + type: file + description: Index file of the input BAM/CRAM/VCF/BCF file. + pattern: "*.{bam.bai,cram.crai,vcf.gz.csi,bcf.gz.csi}" + - samples_file: + type: file + description: | + File with sample names and ploidy information. + One sample per line with a mandatory second column indicating ploidy (1 or 2). + Sample names that are not present are assumed to have ploidy 2 (diploids). + GLIMPSE does NOT handle the use of sex (M/F) instead of ploidy. + pattern: "*.{txt,tsv}" + - input_region: + type: string + description: | + Target region used for imputation, including left and right buffers (e.g. chr20:1000000-2000000). + Optional if the reference panel is in bin format. + pattern: "chrXX:leftBufferPosition-rightBufferPosition" + - output_region: + type: string + description: | + Target imputed region, excluding left and right buffers (e.g. chr20:1000000-2000000). + Optional if the reference panel is in bin format. + pattern: "chrXX:leftBufferPosition-rightBufferPosition" + - meta2: + type: map + description: | + Groovy Map containing reference genome information + e.g. `[ id:'GRCh38' ]` + - reference: + type: file + description: Reference panel of haplotypes in VCF/BCF format. + pattern: "*.{vcf.gz,bcf.gz}" + - reference_index: + type: file + description: Index file of the reference panel file.
+ pattern: "*.{vcf.gz.csi,bcf.gz.csi}" + - map: + type: file + description: | + File containing the genetic map. + Optional if reference panel is in bin format. + pattern: "*.gmap" + - fasta_reference: + type: file + description: | + Faidx-indexed reference sequence file in the appropriate genome build. + Necessary for CRAM files. + pattern: "*.fasta" + - fasta_reference_index: + type: file + description: | + Faidx index of the reference sequence file in the appropriate genome build. + Necessary for CRAM files. + pattern: "*.fai" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'test', single_end:false ]` + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - phased_variants: + type: file + description: | + Output VCF/BCF file containing genotype probabilities (GP field), imputed dosages (DS field), best guess genotypes (GT field), sampled haplotypes in the last (max 16) main iterations (HS field) and info-score. + pattern: "*.{vcf,bcf,vcf.gz,bcf.gz}" + - stats_coverage: + type: file + description: Optional coverage statistic file created when BAM/CRAM files are used as inputs. 
+ pattern: "*.txt.gz" +authors: + - "@LouisLeNezet" +maintainers: + - "@LouisLeNezet" diff --git a/modules/nf-core/glimpse2/phase/tests/main.nf.test b/modules/nf-core/glimpse2/phase/tests/main.nf.test new file mode 100644 index 00000000..95c6d9e1 --- /dev/null +++ b/modules/nf-core/glimpse2/phase/tests/main.nf.test @@ -0,0 +1,144 @@ +nextflow_process { + + name "Test Process GLIMPSE2_PHASE" + script "../main.nf" + + process "GLIMPSE2_PHASE" + + tag "modules_nfcore" + tag "modules" + tag "glimpse2" + tag "glimpse2/phase" + + test("Should run with vcf") { + + when { + process { + """ + input_vcf = Channel.of([ + [ id:'input' ], // meta map + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.1x.vcf.gz", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.1x.vcf.gz.csi", checkIfExists: true), + [], + "chr21:16600000-16800000", + "chr21:16650000-16750000" + ]) + + ref_panel = Channel.of([ + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf.csi", checkIfExists: true) + ]) + + map_file = Channel.of([ + file(params.modules_testdata_base_path + "delete_me/glimpse/chr21.b38.gmap.gz", checkIfExists: true) + ]) + + // [meta, vcf, index, sample_infos, regionin, regionout,ref, index, map] [meta, fasta, fai] + input[0] = input_vcf + .combine(ref_panel) + .combine(map_file) + input[1] = Channel.of([[],[],[]]) + """ + } + } + + then { + assertAll( + { assert process.success }, + // File has a timestamp in it and is in binary format, so we can only check the name + { assert file(process.out.phased_variants[0][1]).name == "input_chr21_16650000-16750000.bcf" }, + { assert snapshot(process.out.versions).match("VCF")} + ) + } + + } + + test("Should run with bam") { + + when { + process { + """ + input_bam = Channel.of([ + [id:'input'], + 
file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.1x.bam", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.1x.bam.bai", checkIfExists: true), + [], + "chr21:16600000-16800000", + "chr21:16650000-16750000", + ]) + ref_panel = Channel.of([ + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf.csi", checkIfExists: true) + ]) + + map_file = Channel.of([ + file(params.modules_testdata_base_path + "delete_me/glimpse/chr21.b38.gmap.gz", checkIfExists: true) + ]) + + // [meta, vcf, index, sample_infos, regionin, regionout,ref, index, map] [meta, fasta, fai] + input[0] = input_bam + .combine(ref_panel) + .combine(map_file) + input[1] = Channel.of([[],[],[]]) + """ + } + } + + then { + assertAll( + { assert process.success }, + // File has a timestamp in it and is in binary format, so we can only check the name + { assert file(process.out.phased_variants[0][1]).name == "input_chr21_16650000-16750000.bcf" }, + { assert snapshot(process.out.stats_coverage).match("BAM_coverage")}, + { assert snapshot(process.out.versions).match("BAM")} + ) + } + + } + + test("Should run with cram and reference genome") { + + when { + process { + """ + input_cram = Channel.of([ + [id:'input'], + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.1x.cram", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.1x.cram.crai", checkIfExists: true), + [], + "chr21:16600000-16800000", + "chr21:16650000-16750000", + ]) + ref_panel = Channel.of([ + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf.csi", checkIfExists: true) + ]) + + map_file = Channel.of([ 
+ file(params.modules_testdata_base_path + "delete_me/glimpse/chr21.b38.gmap.gz", checkIfExists: true) + ]) + reference_genome = Channel.of([ + [id:'refHG38_chr21'], + file(params.modules_testdata_base_path + "delete_me/glimpse/hs38DH.chr21.fa.gz", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/hs38DH.chr21.fa.gz.fai", checkIfExists: true) + ]) + // [meta, vcf, index, sample_infos, regionin, regionout,ref, index, map] [meta, fasta, fai] + input[0] = input_cram + .combine(ref_panel) + .combine(map_file) + input[1] = reference_genome + """ + } + } + + then { + assertAll( + { assert process.success }, + // File has a timestamp in it and is in binary format, so we can only check the name + { assert file(process.out.phased_variants[0][1]).name == "input_chr21_16650000-16750000.bcf" }, + { assert snapshot(process.out.stats_coverage).match("CRAM_coverage")}, + { assert snapshot(process.out.versions).match("CRAM")} + ) + } + } +} diff --git a/modules/nf-core/glimpse2/phase/tests/main.nf.test.snap b/modules/nf-core/glimpse2/phase/tests/main.nf.test.snap new file mode 100644 index 00000000..861f9a70 --- /dev/null +++ b/modules/nf-core/glimpse2/phase/tests/main.nf.test.snap @@ -0,0 +1,72 @@ +{ + "CRAM": { + "content": [ + [ + "versions.yml:md5,c68de03046a6503cdbcf3a1495fc512f" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-13T16:58:34.365910006" + }, + "VCF": { + "content": [ + [ + "versions.yml:md5,c68de03046a6503cdbcf3a1495fc512f" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-13T16:58:05.013609832" + }, + "BAM_coverage": { + "content": [ + [ + [ + { + "id": "input" + }, + "input_chr21_16650000-16750000_stats_coverage.txt.gz:md5,9be7101ef4f599416c22fd6160c3b146" + ] + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-13T16:58:13.527360464" + }, + "CRAM_coverage": { + "content": [ + [ + [ + { + 
"id": "input" + }, + "input_chr21_16650000-16750000_stats_coverage.txt.gz:md5,a2bee17d81568dba62ce4dd430947d29" + ] + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-13T16:58:34.264826731" + }, + "BAM": { + "content": [ + [ + "versions.yml:md5,c68de03046a6503cdbcf3a1495fc512f" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-13T16:58:13.58159608" + } +} \ No newline at end of file diff --git a/modules/nf-core/glimpse2/phase/tests/tags.yml b/modules/nf-core/glimpse2/phase/tests/tags.yml new file mode 100644 index 00000000..ab05b49f --- /dev/null +++ b/modules/nf-core/glimpse2/phase/tests/tags.yml @@ -0,0 +1,2 @@ +glimpse2/phase: + - modules/nf-core/glimpse2/phase/** diff --git a/modules/nf-core/glimpse2/splitreference/environment.yml b/modules/nf-core/glimpse2/splitreference/environment.yml new file mode 100644 index 00000000..a4dd839a --- /dev/null +++ b/modules/nf-core/glimpse2/splitreference/environment.yml @@ -0,0 +1,7 @@ +name: glimpse2_splitreference +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::glimpse-bio=2.0.0 diff --git a/modules/nf-core/glimpse2/splitreference/main.nf b/modules/nf-core/glimpse2/splitreference/main.nf new file mode 100644 index 00000000..31b758d3 --- /dev/null +++ b/modules/nf-core/glimpse2/splitreference/main.nf @@ -0,0 +1,64 @@ +process GLIMPSE2_SPLITREFERENCE { + tag "$meta.id" + label 'process_low' + + beforeScript """ + if cat /proc/cpuinfo | grep avx2 -q + then + echo "Feature AVX2 present" + else + echo "Feature AVX2 not present on node" + exit 1 + fi + """ + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/glimpse-bio:2.0.0--hf340a29_0': + 'biocontainers/glimpse-bio:2.0.0--hf340a29_0' }" + + input: + tuple val(meta) , path(reference), path(reference_index), val(input_region), val(output_region) + tuple val(meta2), path(map) + + + output: + tuple val(meta), path("*.bin"), emit: bin_ref + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}_${output_region.replace(":","_")}" + def map_command = map ? "--map $map" : "" + + """ + GLIMPSE2_split_reference \\ + $args \\ + --reference $reference \\ + $map_command \\ + --input-region $input_region \\ + --output-region $output_region \\ + --thread $task.cpus \\ + --output ${prefix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + glimpse2: "\$(GLIMPSE2_split_reference --help | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]' | head -1)" + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}_${output_region.replace(":","_")}" + """ + touch ${prefix}.bin + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + glimpse2: "\$(GLIMPSE2_split_reference --help | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]' | head -1)" + END_VERSIONS + """ +} diff --git a/modules/nf-core/glimpse2/splitreference/meta.yml b/modules/nf-core/glimpse2/splitreference/meta.yml new file mode 100644 index 00000000..c70ec024 --- /dev/null +++ b/modules/nf-core/glimpse2/splitreference/meta.yml @@ -0,0 +1,66 @@ +name: "glimpse2_splitreference" +description: Tool to create a binary reference panel that is faster to read. +keywords: + - split + - reference + - phasing + - imputation +tools: + - "glimpse2": + description: "GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies."
+ homepage: "https://odelaneau.github.io/GLIMPSE" + documentation: "https://odelaneau.github.io/GLIMPSE/commands.html" + tool_dev_url: "https://github.com/odelaneau/GLIMPSE" + doi: "10.1038/s41588-020-00756-0" + licence: ["MIT"] +requirements: + - AVX2 +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reference: + type: file + description: Reference panel of haplotypes in VCF/BCF format. + pattern: "*.{vcf.gz,bcf.gz}" + - reference_index: + type: file + description: Index file of the Reference panel file. + pattern: "*.{vcf.gz.csi,bcf.gz.csi}" + - input_region: + type: string + description: Target region used for imputation, including left and right buffers (e.g. chr20:1000000-2000000). + pattern: "chrXX:leftBufferPosition-rightBufferPosition" + - output_region: + type: string + description: Target imputed region, excluding left and right buffers (e.g. chr20:1000000-2000000). + pattern: "chrXX:leftBufferPosition-rightBufferPosition" + - meta2: + type: map + description: | + Groovy Map containing genomic map information + e.g. `[ map:'GRCh38' ]` + - map: + type: file + description: File containing the genetic map. + pattern: "*.gmap" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - bin_ref: + type: file + description: binary reference panel + pattern: "*.bin" +authors: + - "@LouisLeNezet" +maintainers: + - "@LouisLeNezet" diff --git a/modules/nf-core/glimpse2/splitreference/tests/main.nf.test b/modules/nf-core/glimpse2/splitreference/tests/main.nf.test new file mode 100644 index 00000000..be55b4c7 --- /dev/null +++ b/modules/nf-core/glimpse2/splitreference/tests/main.nf.test @@ -0,0 +1,70 @@ +nextflow_process { + + name "Test Process GLIMPSE2_SPLITREFERENCE" + script "../main.nf" + config "./nextflow.config" + + process "GLIMPSE2_SPLITREFERENCE" + + tag "modules_nfcore" + tag "modules" + tag "glimpse2" + tag "glimpse2/splitreference" + + test("Should run without map") { + + when { + process { + """ + input[0] = [ + [ id:'ref1000GP', single_end:false ], // meta map + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf.csi", checkIfExists: true), + "chr21:16600000-16800000", + "chr21:16600000-16800000" + ] + input[1]= [[ id:'map'],[]] + """ + } + } + + then { + assertAll( + { assert process.success }, + // File has a timestamp in it and is in binary format, so we can only check the name + { assert file(process.out.bin_ref[0][1]).name == "ref1000GP_chr21_16600000_16800000.bin" }, + { assert snapshot(process.out.version).match()} + ) + } + + } + + test("Should run with map") { + + when { + process { + """ + input[0] = [ + [ id:'ref1000GP', single_end:false ], // meta map + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf.csi", checkIfExists: true), + "chr21:16600000-16800000", + "chr21:16600000-16800000" + ] + 
input[1]= [[ id:'map'], file(params.modules_testdata_base_path + "delete_me/glimpse/chr21.b38.gmap.gz", checkIfExists: true)] + """ + } + } + + then { + assertAll( + { assert process.success }, + // File has a timestamp in it and is in binary format, so we can only check the name + { assert file(process.out.bin_ref[0][1]).name == "ref1000GP_chr21_16600000_16800000.bin" }, + { assert snapshot(process.out.version).match()} + ) + } + + } + +} diff --git a/modules/nf-core/glimpse2/splitreference/tests/main.nf.test.snap b/modules/nf-core/glimpse2/splitreference/tests/main.nf.test.snap new file mode 100644 index 00000000..6e6d64ca --- /dev/null +++ b/modules/nf-core/glimpse2/splitreference/tests/main.nf.test.snap @@ -0,0 +1,18 @@ +{ + "Should run without map": { + "content": null, + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-13T14:52:00.115502" + }, + "Should run with map": { + "content": null, + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-13T14:52:08.29561" + } +} \ No newline at end of file diff --git a/modules/nf-core/glimpse2/splitreference/tests/nextflow.config b/modules/nf-core/glimpse2/splitreference/tests/nextflow.config new file mode 100644 index 00000000..4d6152a8 --- /dev/null +++ b/modules/nf-core/glimpse2/splitreference/tests/nextflow.config @@ -0,0 +1,9 @@ +process { + withName: GLIMPSE2_SPLITREFERENCE { + ext.args = [ + "--sparse-maf 0.01", + "--keep-monomorphic-ref-sites" + ].join(' ') + ext.prefix = { "${meta.id}" } + } +} \ No newline at end of file diff --git a/modules/nf-core/glimpse2/splitreference/tests/tags.yml b/modules/nf-core/glimpse2/splitreference/tests/tags.yml new file mode 100644 index 00000000..ce5545c5 --- /dev/null +++ b/modules/nf-core/glimpse2/splitreference/tests/tags.yml @@ -0,0 +1,2 @@ +glimpse2/splitreference: + - modules/nf-core/glimpse2/splitreference/** diff --git a/modules/nf-core/gunzip/environment.yml b/modules/nf-core/gunzip/environment.yml 
new file mode 100644 index 00000000..25910b34 --- /dev/null +++ b/modules/nf-core/gunzip/environment.yml @@ -0,0 +1,7 @@ +name: gunzip +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - conda-forge::sed=4.7 diff --git a/modules/nf-core/gunzip/main.nf b/modules/nf-core/gunzip/main.nf new file mode 100644 index 00000000..468a6f28 --- /dev/null +++ b/modules/nf-core/gunzip/main.nf @@ -0,0 +1,48 @@ +process GUNZIP { + tag "$archive" + label 'process_single' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/ubuntu:20.04' : + 'nf-core/ubuntu:20.04' }" + + input: + tuple val(meta), path(archive) + + output: + tuple val(meta), path("$gunzip"), emit: gunzip + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + gunzip = archive.toString() - '.gz' + """ + # Not calling gunzip itself because it creates files + # with the original group ownership rather than the + # default one for that user / the work directory + gzip \\ + -cd \\ + $args \\ + $archive \\ + > $gunzip + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//') + END_VERSIONS + """ + + stub: + gunzip = archive.toString() - '.gz' + """ + touch $gunzip + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/gunzip/meta.yml b/modules/nf-core/gunzip/meta.yml new file mode 100644 index 00000000..231034f2 --- /dev/null +++ b/modules/nf-core/gunzip/meta.yml @@ -0,0 +1,39 @@ +name: gunzip +description: Compresses and decompresses files. 
+keywords: + - gunzip + - compression + - decompression +tools: + - gunzip: + description: | + gzip is a file format and a software application used for file compression and decompression. + documentation: https://www.gnu.org/software/gzip/manual/gzip.html + licence: ["GPL-3.0-or-later"] +input: + - meta: + type: map + description: | + Optional groovy Map containing meta information + e.g. [ id:'test', single_end:false ] + - archive: + type: file + description: File to be compressed/uncompressed + pattern: "*.*" +output: + - gunzip: + type: file + description: Compressed/uncompressed file + pattern: "*.*" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@joseespinosa" + - "@drpatelh" + - "@jfy133" +maintainers: + - "@joseespinosa" + - "@drpatelh" + - "@jfy133" diff --git a/modules/nf-core/gunzip/tests/main.nf.test b/modules/nf-core/gunzip/tests/main.nf.test new file mode 100644 index 00000000..6406008e --- /dev/null +++ b/modules/nf-core/gunzip/tests/main.nf.test @@ -0,0 +1,36 @@ +nextflow_process { + + name "Test Process GUNZIP" + script "../main.nf" + process "GUNZIP" + tag "gunzip" + tag "modules_nfcore" + tag "modules" + + test("Should run without failures") { + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = Channel.of([ + [], + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) + ] + ) + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + +} diff --git a/modules/nf-core/gunzip/tests/main.nf.test.snap b/modules/nf-core/gunzip/tests/main.nf.test.snap new file mode 100644 index 00000000..720fd9ff --- /dev/null +++ b/modules/nf-core/gunzip/tests/main.nf.test.snap @@ -0,0 +1,31 @@ +{ + "Should run without failures": { + "content": [ + { + "0": [ + [ + [ + + ], + "test_1.fastq:md5,4161df271f9bfcd25d5845a1e220dbec" + ] + ], + "1": [ + 
"versions.yml:md5,54376d32aca20e937a4ec26dac228e84" + ], + "gunzip": [ + [ + [ + + ], + "test_1.fastq:md5,4161df271f9bfcd25d5845a1e220dbec" + ] + ], + "versions": [ + "versions.yml:md5,54376d32aca20e937a4ec26dac228e84" + ] + } + ], + "timestamp": "2023-10-17T15:35:37.690477896" + } +} \ No newline at end of file diff --git a/modules/nf-core/gunzip/tests/tags.yml b/modules/nf-core/gunzip/tests/tags.yml new file mode 100644 index 00000000..fd3f6915 --- /dev/null +++ b/modules/nf-core/gunzip/tests/tags.yml @@ -0,0 +1,2 @@ +gunzip: + - modules/nf-core/gunzip/** diff --git a/modules/nf-core/quilt/quilt/environment.yml b/modules/nf-core/quilt/quilt/environment.yml new file mode 100644 index 00000000..a2161a65 --- /dev/null +++ b/modules/nf-core/quilt/quilt/environment.yml @@ -0,0 +1,8 @@ +name: quilt_quilt +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::r-quilt=1.0.5=r43h06b5641_0 + - r-base=4.3.1 diff --git a/modules/nf-core/quilt/quilt/main.nf b/modules/nf-core/quilt/quilt/main.nf new file mode 100644 index 00000000..982479b5 --- /dev/null +++ b/modules/nf-core/quilt/quilt/main.nf @@ -0,0 +1,63 @@ +process QUILT_QUILT { + tag "$meta.id" + label 'process_single' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/r-quilt:1.0.5--r43h06b5641_0': + 'biocontainers/r-quilt:1.0.5--r43h06b5641_0' }" + + input: + tuple val(meta), path(bams), path(bais), path(reference_haplotype_file), path(reference_legend_file), val(chr), val(regions_start), val(regions_end), val(ngen), val(buffer), path(genetic_map_file) + tuple val(meta2), path(posfile), path(phasefile) + tuple val(meta3), path(fasta) + + output: + tuple val(meta), path("*.vcf.gz"), emit: vcf + tuple val(meta), path("*.vcf.gz.tbi"), emit: tbi, optional:true + tuple val(meta), path("RData", type: "dir"), emit: rdata, optional:true + tuple val(meta), path("plots", type: "dir"), emit: plots, optional:true + path "versions.yml", emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def extensions = bams.collect { it.extension } + def extension = extensions.flatten().unique() + def list_command = extension == ["bam"] ? "--bamlist=" : + extension == ["cram"] ? "--reference=${fasta} --cramlist=" : "" + def genetic_map_file_command = genetic_map_file ? "--genetic_map_file=${genetic_map_file}" : "" + def posfile_command = posfile ? "--posfile=${posfile}" : "" + def phasefile_command = phasefile ? "--phasefile=${phasefile}" : "" + if (!(args ==~ /.*--seed.*/)) {args += " --seed=1"} + + """ + printf "%s\\n" $bams | tr -d '[],' > all_files.txt + + QUILT.R \\ + ${list_command}all_files.txt \\ + $genetic_map_file_command \\ + $posfile_command \\ + $phasefile_command \\ + --chr=$chr \\ + --regionStart=$regions_start \\ + --regionEnd=$regions_end \\ + --nGen=$ngen \\ + --buffer=$buffer \\ + --nCores=$task.cpus \\ + --outputdir="." 
\\ + --reference_haplotype_file=$reference_haplotype_file \\ + --reference_legend_file=$reference_legend_file \\ + $args + + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + r-base: \$(Rscript -e "cat(strsplit(R.version[['version.string']], ' ')[[1]][3])") + r-quilt: \$(Rscript -e "cat(as.character(utils::packageVersion(\\"QUILT\\")))") + END_VERSIONS + """ +} diff --git a/modules/nf-core/quilt/quilt/meta.yml b/modules/nf-core/quilt/quilt/meta.yml new file mode 100644 index 00000000..e4653983 --- /dev/null +++ b/modules/nf-core/quilt/quilt/meta.yml @@ -0,0 +1,107 @@ +name: "quilt_quilt" +description: QUILT is an R and C++ program for rapid genotype imputation from low-coverage sequence using a large reference panel. +keywords: + - imputation + - low-coverage + - genotype + - genomics + - vcf +tools: + - "quilt": + description: "Read aware low coverage whole genome sequence imputation from a reference panel" + homepage: "https://github.com/rwdavies/quilt" + documentation: "https://github.com/rwdavies/quilt" + tool_dev_url: "https://github.com/rwdavies/quilt" + doi: "10.1038/s41588-021-00877-0" + licence: ["GPL v3"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - bams: + type: file + description: (Mandatory) BAM/CRAM files + pattern: "*.{bam,cram,sam}" + - bais: + type: file + description: (Mandatory) BAM/CRAM index files + pattern: "*.{bai}" + - reference_haplotype_file: + type: file + description: (Mandatory) Reference haplotype file in IMPUTE format (file with no header and no rownames, one row per SNP, one column per reference haplotype, space separated, values must be 0 or 1) + pattern: "*.{hap.gz}" + - reference_legend_file: + type: file + description: (Mandatory) Reference haplotype legend file in IMPUTE format (file with one row per SNP, and a header including position for the physical position in 1 based coordinates, a0 for the reference allele, and a1 for the alternate allele). + pattern: "*.{legend.gz}" + - chr: + type: string + description: (Mandatory) What chromosome to run. Should match BAM headers. + - regions_start: + type: integer + description: (Mandatory) When running imputation, where to start from. The 1-based position x is kept if regionStart <= x <= regionEnd. + - regions_end: + type: integer + description: (Mandatory) When running imputation, where to stop. + - buffer: + type: integer + description: Buffer of region to perform imputation over. So imputation is run from regionStart-buffer to regionEnd+buffer, and reported for regionStart to regionEnd, including the bases of regionStart and regionEnd. + - ngen: + type: integer + description: Number of generations since founding or mixing. Note that the algorithm is relatively robust to this. Use nGen = 4 * Ne / K if unsure. + - genetic_map_file: + type: file + description: (Optional) File with genetic map information, a file with 3 white-space delimited entries giving position (1-based), genetic rate map in cM/Mbp, and genetic map in cM. If no file included, rate is based on physical distance and expected rate (expRate).
+ pattern: "*.{txt.gz}" + - meta2: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - posfile: + type: file + description: (Optional) File with positions of where to impute, lining up one-to-one with genfile. File is tab separated with no header, one row per SNP, with col 1 = chromosome, col 2 = physical position (sorted from smallest to largest), col 3 = reference base, col 4 = alternate base. Bases are capitalized. + pattern: "*.{txt}" + - phasefile: + type: file + description: (Optional) File with truth phasing results. Supersedes genfile if both options given. File has a header row with a name for each sample, matching what is found in the bam file. Each subject is then a tab separated column, with 0 = ref and 1 = alt, separated by a vertical bar |, e.g. 0|0 or 0|1. Note therefore this file has one more row than posfile which has no header. + pattern: "*.{txt}" + - meta3: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - fasta: + type: file + description: (Optional) File with reference genome. + pattern: "*.{txt.gz}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - vcf: + type: file + description: VCF file with both SNP annotation information and per-sample genotype information. + pattern: "*.{vcf.gz}" + - tbi: + type: file + description: TBI file of the VCF. + pattern: "*.{vcf.gz.tbi}" + - rdata: + type: directory + description: Optional directory path to prepared RData file with reference objects (useful with --save_prepared_reference=TRUE). + - plots: + type: directory + description: Optional directory path to save plots.
+authors: + - "@atrigila" +maintainers: + - "@atrigila" diff --git a/modules/nf-core/quilt/quilt/tests/main.nf.test b/modules/nf-core/quilt/quilt/tests/main.nf.test new file mode 100644 index 00000000..2d80516d --- /dev/null +++ b/modules/nf-core/quilt/quilt/tests/main.nf.test @@ -0,0 +1,132 @@ +// Input data +def path = "file('https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/quilt/" +def bam = "[${path}NA12878.haplotagged.1.0.bam', checkIfExists: true), ${path}NA12878.ont.1.0.bam', checkIfExists: true), ${path}NA12878.illumina.1.0.bam', checkIfExists: true)]" +def bai = "[${path}NA12878.haplotagged.1.0.bam.bai', checkIfExists: true), ${path}NA12878.ont.1.0.bam.bai', checkIfExists: true),${path}NA12878.illumina.1.0.bam.bai', checkIfExists: true)]" + +// Input reference data +def reference_haplotype_file = "file('https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/quilt/ALL.chr20_GRCh38.genotypes.20170504.chr20.2000001.2100000.noNA12878.hap.gz', checkIfExists: true)" +def reference_legend_file = "file('https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/quilt/ALL.chr20_GRCh38.genotypes.20170504.chr20.2000001.2100000.noNA12878.legend.gz', checkIfExists: true)" +def genetic_map_file = "file('https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/quilt/CEU-chr20-final.b38.txt.gz', checkIfExists: true)" + +// Parameters +def chr = "'chr20'" +def regions_start = "2000001" +def regions_end = "2100000" +def ngen = "100" +def buffer = "10000" + + +// (optional) input truth data +def posfile = "file('https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/quilt/ALL.chr20_GRCh38.genotypes.20170504.chr20.2000001.2100000.posfile.txt', checkIfExists: true)" +def phasefile = "file('https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/quilt/ALL.chr20_GRCh38.genotypes.20170504.chr20.2000001.2100000.phasefile.txt', checkIfExists: true)" +def posfile_phasefile = "[[ id:'test', chr:'chr20' ], 
[$posfile], [$phasefile]]" +def fasta = "[[id:'test'], []]" + +// Input channel quilt +def ch_input = "[ id:'test', chr:'chr20' ], $bam, $bai, [$reference_haplotype_file], [$reference_legend_file], $chr, $regions_start, $regions_end, $ngen, $buffer" +def ch_input_gmap = "[$ch_input, [$genetic_map_file]]" +def ch_input_nogmap = "[$ch_input, []]" + +nextflow_process { + + name "Test Process QUILT" + script "../main.nf" + process "QUILT_QUILT" + + tag "modules" + tag "modules_nfcore" + tag "quilt/quilt" + tag "quilt" + + test("QUILT") { + config ("./quilt_default.config") + when { + process { + """ + input[0] = $ch_input_gmap + input[1] = $posfile_phasefile + input[2] = $fasta + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + + test("QUILT no optional files") { + config ("./quilt_default.config") + when { + process { + """ + input[0] = $ch_input_nogmap + input[1] = [[id: null], [], []] + input[2] = $fasta + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + + test("QUILT optional output") { + config ("./quilt_optional.config") + when { + process { + """ + input[0] = $ch_input_gmap + input[1] = $posfile_phasefile + input[2] = $fasta + """ + } + } + + then { + def dir = new File(process.out.plots[0][1]) + def list = [] + dir.eachFileRecurse { file -> list << file.getName() } + assertAll( + { assert process.success }, + { assert snapshot( + process.out.vcf + process.out.tbi + + list.sort() + + process.out.rdata + process.out.versions + ).match() } + ) + } + + } + + test("QUILT no seed") { + config ("./quilt_noseed.config") + when { + process { + """ + input[0] = $ch_input_gmap + input[1] = $posfile_phasefile + input[2] = $fasta + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + +} \ No newline at end of file diff --git 
a/modules/nf-core/quilt/quilt/tests/main.nf.test.snap b/modules/nf-core/quilt/quilt/tests/main.nf.test.snap new file mode 100644 index 00000000..191b519a --- /dev/null +++ b/modules/nf-core/quilt/quilt/tests/main.nf.test.snap @@ -0,0 +1,376 @@ +{ + "QUILT": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "chr": "chr20" + }, + "quilt.chr20.2000001.2100000.vcf.gz:md5,32f539c80971e2e8e0c31870be094a25" + ] + ], + "1": [ + [ + { + "id": "test", + "chr": "chr20" + }, + "quilt.chr20.2000001.2100000.vcf.gz.tbi:md5,4607cdcb20599cbebd1ccf76d4dc56ae" + ] + ], + "2": [ + [ + { + "id": "test", + "chr": "chr20" + }, + [ + + ] + ] + ], + "3": [ + + ], + "4": [ + "versions.yml:md5,6d07cd60389ff6981a44004872bd16b7" + ], + "plots": [ + + ], + "rdata": [ + [ + { + "id": "test", + "chr": "chr20" + }, + [ + + ] + ] + ], + "tbi": [ + [ + { + "id": "test", + "chr": "chr20" + }, + "quilt.chr20.2000001.2100000.vcf.gz.tbi:md5,4607cdcb20599cbebd1ccf76d4dc56ae" + ] + ], + "vcf": [ + [ + { + "id": "test", + "chr": "chr20" + }, + "quilt.chr20.2000001.2100000.vcf.gz:md5,32f539c80971e2e8e0c31870be094a25" + ] + ], + "versions": [ + "versions.yml:md5,6d07cd60389ff6981a44004872bd16b7" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-04-23T17:27:54.607934432" + }, + "QUILT no seed": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "chr": "chr20" + }, + "quilt.chr20.2000001.2100000.vcf.gz:md5,32f539c80971e2e8e0c31870be094a25" + ] + ], + "1": [ + [ + { + "id": "test", + "chr": "chr20" + }, + "quilt.chr20.2000001.2100000.vcf.gz.tbi:md5,4607cdcb20599cbebd1ccf76d4dc56ae" + ] + ], + "2": [ + [ + { + "id": "test", + "chr": "chr20" + }, + [ + + ] + ] + ], + "3": [ + + ], + "4": [ + "versions.yml:md5,6d07cd60389ff6981a44004872bd16b7" + ], + "plots": [ + + ], + "rdata": [ + [ + { + "id": "test", + "chr": "chr20" + }, + [ + + ] + ] + ], + "tbi": [ + [ + { + "id": "test", + "chr": "chr20" + }, + 
"quilt.chr20.2000001.2100000.vcf.gz.tbi:md5,4607cdcb20599cbebd1ccf76d4dc56ae" + ] + ], + "vcf": [ + [ + { + "id": "test", + "chr": "chr20" + }, + "quilt.chr20.2000001.2100000.vcf.gz:md5,32f539c80971e2e8e0c31870be094a25" + ] + ], + "versions": [ + "versions.yml:md5,6d07cd60389ff6981a44004872bd16b7" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-04-23T17:29:31.357244889" + }, + "QUILT no optional files": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "chr": "chr20" + }, + "quilt.chr20.2000001.2100000.vcf.gz:md5,3fde483728ef2287416b2340c06aaf85" + ] + ], + "1": [ + [ + { + "id": "test", + "chr": "chr20" + }, + "quilt.chr20.2000001.2100000.vcf.gz.tbi:md5,20d9e8cda03fc84482f3aa53a0c94fb6" + ] + ], + "2": [ + [ + { + "id": "test", + "chr": "chr20" + }, + [ + + ] + ] + ], + "3": [ + + ], + "4": [ + "versions.yml:md5,6d07cd60389ff6981a44004872bd16b7" + ], + "plots": [ + + ], + "rdata": [ + [ + { + "id": "test", + "chr": "chr20" + }, + [ + + ] + ] + ], + "tbi": [ + [ + { + "id": "test", + "chr": "chr20" + }, + "quilt.chr20.2000001.2100000.vcf.gz.tbi:md5,20d9e8cda03fc84482f3aa53a0c94fb6" + ] + ], + "vcf": [ + [ + { + "id": "test", + "chr": "chr20" + }, + "quilt.chr20.2000001.2100000.vcf.gz:md5,3fde483728ef2287416b2340c06aaf85" + ] + ], + "versions": [ + "versions.yml:md5,6d07cd60389ff6981a44004872bd16b7" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-04-23T17:28:16.39358682" + }, + "QUILT optional output": { + "content": [ + [ + [ + { + "id": "test", + "chr": "chr20" + }, + "quilt.chr20.2000001.2100000.vcf.gz:md5,8352fbcabdd102a8ba2c4490e0834287" + ], + [ + { + "id": "test", + "chr": "chr20" + }, + "quilt.chr20.2000001.2100000.vcf.gz.tbi:md5,88d16933f2ac53058b7a5d5c849dc19a" + ], + "haps.NA12878.chr20.2000001.2100000_igs.1.0.truth.png", + "haps.NA12878.chr20.2000001.2100000_igs.1.it1.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.1.it2.gibbs.png", + 
"haps.NA12878.chr20.2000001.2100000_igs.1.it3.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.2.0.truth.png", + "haps.NA12878.chr20.2000001.2100000_igs.2.it1.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.2.it2.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.2.it3.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.3.0.truth.png", + "haps.NA12878.chr20.2000001.2100000_igs.3.it1.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.3.it2.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.3.it3.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.4.0.truth.png", + "haps.NA12878.chr20.2000001.2100000_igs.4.it1.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.4.it2.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.4.it3.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.5.0.truth.png", + "haps.NA12878.chr20.2000001.2100000_igs.5.it1.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.5.it2.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.5.it3.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.6.0.truth.png", + "haps.NA12878.chr20.2000001.2100000_igs.6.it1.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.6.it2.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.6.it3.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.7.0.truth.png", + "haps.NA12878.chr20.2000001.2100000_igs.7.it1.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.7.it2.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.7.it3.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.8.0.truth.png", + "haps.NA12878.chr20.2000001.2100000_igs.8.it1.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.8.it2.gibbs.png", + "haps.NA12878.chr20.2000001.2100000_igs.8.it3.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.1.0.truth.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.1.it1.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.1.it2.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.1.it3.gibbs.png", + 
"haps.NA12878HT.chr20.2000001.2100000_igs.2.0.truth.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.2.it1.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.2.it2.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.2.it3.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.3.0.truth.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.3.it1.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.3.it2.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.3.it3.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.4.0.truth.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.4.it1.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.4.it2.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.4.it3.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.5.0.truth.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.5.it1.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.5.it2.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.5.it3.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.6.0.truth.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.6.it1.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.6.it2.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.6.it3.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.7.0.truth.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.7.it1.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.7.it2.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.7.it3.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.8.0.truth.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.8.it1.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.8.it2.gibbs.png", + "haps.NA12878HT.chr20.2000001.2100000_igs.8.it3.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.1.0.truth.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.1.it1.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.1.it2.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.1.it3.gibbs.png", + 
"haps.NA12878ONT.chr20.2000001.2100000_igs.2.0.truth.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.2.it1.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.2.it2.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.2.it3.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.3.0.truth.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.3.it1.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.3.it2.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.3.it3.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.4.0.truth.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.4.it1.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.4.it2.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.4.it3.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.5.0.truth.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.5.it1.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.5.it2.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.5.it3.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.6.0.truth.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.6.it1.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.6.it2.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.6.it3.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.7.0.truth.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.7.it1.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.7.it2.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.7.it3.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.8.0.truth.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.8.it1.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.8.it2.gibbs.png", + "haps.NA12878ONT.chr20.2000001.2100000_igs.8.it3.gibbs.png", + [ + { + "id": "test", + "chr": "chr20" + }, + [ + "QUILT_prepared_reference.chr20.2000001.2100000.RData:md5,c2bbcf91085f33536fbaf094b4f0ea05" + ] + ], + "versions.yml:md5,6d07cd60389ff6981a44004872bd16b7" + ] + ], + "meta": { + "nf-test": "0.8.4", + 
"nextflow": "23.10.1" + }, + "timestamp": "2024-04-23T17:28:59.999377862" + } +} \ No newline at end of file diff --git a/modules/nf-core/quilt/quilt/tests/quilt_default.config b/modules/nf-core/quilt/quilt/tests/quilt_default.config new file mode 100644 index 00000000..87f87b9a --- /dev/null +++ b/modules/nf-core/quilt/quilt/tests/quilt_default.config @@ -0,0 +1,6 @@ +process { + cpus = 1 // More than 1 cpu may lead to different md5sum + withName: QUILT_QUILT { + ext.args = "--seed=1" + } +} diff --git a/modules/nf-core/quilt/quilt/tests/quilt_noseed.config b/modules/nf-core/quilt/quilt/tests/quilt_noseed.config new file mode 100644 index 00000000..e9f81a34 --- /dev/null +++ b/modules/nf-core/quilt/quilt/tests/quilt_noseed.config @@ -0,0 +1,6 @@ +process { + cpus = 1 // More than 1 cpu may lead to different md5sum + withName: QUILT_QUILT { + ext.args = "" + } +} diff --git a/modules/nf-core/quilt/quilt/tests/quilt_optional.config b/modules/nf-core/quilt/quilt/tests/quilt_optional.config new file mode 100644 index 00000000..cfbd1353 --- /dev/null +++ b/modules/nf-core/quilt/quilt/tests/quilt_optional.config @@ -0,0 +1,6 @@ +process { + cpus = 1 // More than 1 cpu may lead to different md5sum + withName: QUILT_QUILT { + ext.args = "--save_prepared_reference=TRUE --make_plots=TRUE --seed=1" + } +} diff --git a/modules/nf-core/quilt/quilt/tests/tags.yml b/modules/nf-core/quilt/quilt/tests/tags.yml new file mode 100644 index 00000000..ac1b9092 --- /dev/null +++ b/modules/nf-core/quilt/quilt/tests/tags.yml @@ -0,0 +1,2 @@ +quilt/quilt: + - "modules/nf-core/quilt/quilt/**" diff --git a/modules/nf-core/samtools/coverage/environment.yml b/modules/nf-core/samtools/coverage/environment.yml new file mode 100644 index 00000000..b5e6b997 --- /dev/null +++ b/modules/nf-core/samtools/coverage/environment.yml @@ -0,0 +1,8 @@ +name: samtools_coverage +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::samtools=1.19.2 + - bioconda::htslib=1.19.1 diff 
--git a/modules/nf-core/samtools/coverage/main.nf b/modules/nf-core/samtools/coverage/main.nf new file mode 100644 index 00000000..52f3225c --- /dev/null +++ b/modules/nf-core/samtools/coverage/main.nf @@ -0,0 +1,51 @@ +process SAMTOOLS_COVERAGE { + tag "$meta.id" + label 'process_single' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/samtools:1.19.2--h50ea8bc_0' : + 'biocontainers/samtools:1.19.2--h50ea8bc_0' }" + + input: + tuple val(meta), path(input), path(input_index), val(region) + tuple val(meta2), path(fasta), path(fai) + + output: + tuple val(meta), path("*.txt"), emit: coverage + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def region_cmd = region ? "--region ${region}" : '' + """ + samtools \\ + coverage \\ + $args \\ + -o ${prefix}.txt \\ + $region_cmd \\ + --reference ${fasta} \\ + $input + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//' ) + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + touch ${prefix}.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//' ) + END_VERSIONS + """ +} diff --git a/modules/nf-core/samtools/coverage/meta.yml b/modules/nf-core/samtools/coverage/meta.yml new file mode 100644 index 00000000..e74082d9 --- /dev/null +++ b/modules/nf-core/samtools/coverage/meta.yml @@ -0,0 +1,61 @@ +name: "samtools_coverage" +description: produces a histogram or table of coverage per chromosome +keywords: + - depth + - samtools + - bam +tools: + - samtools: + description: | + SAMtools is a set of utilities 
for interacting with and post-processing + short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. + These files are generated as output by short read aligners like BWA. + homepage: http://www.htslib.org/ + documentation: http://www.htslib.org/doc/samtools.html + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - input: + type: file + description: BAM/CRAM/SAM file + pattern: "*.{bam,cram,sam}" + - input_index: + type: file + description: BAM/CRAM index file + pattern: "*.{bai,crai}" + - meta2: + type: map + description: | + Groovy Map containing reference information + e.g. [ id:'genome' ] + - fasta: + type: file + description: Reference genome file + pattern: "*.{fa,fasta}" + - fai: + type: file + description: Reference genome index file + pattern: "*.fai" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - coverage: + type: file + description: Tabulated text containing the coverage at each position or region or an ASCII-art histogram (with --histogram). 
+ pattern: "*.txt" +authors: + - "@LouisLeNezet" +maintainers: + - "@LouisLeNezet" diff --git a/modules/nf-core/samtools/coverage/samtools-coverage.diff b/modules/nf-core/samtools/coverage/samtools-coverage.diff new file mode 100644 index 00000000..a37b6818 --- /dev/null +++ b/modules/nf-core/samtools/coverage/samtools-coverage.diff @@ -0,0 +1,32 @@ +Changes in module 'nf-core/samtools/coverage' +--- modules/nf-core/samtools/coverage/main.nf ++++ modules/nf-core/samtools/coverage/main.nf +@@ -8,7 +8,7 @@ + 'biocontainers/samtools:1.19.2--h50ea8bc_0' }" + + input: +- tuple val(meta), path(input), path(input_index) ++ tuple val(meta), path(input), path(input_index), val(region) + tuple val(meta2), path(fasta) + tuple val(meta3), path(fai) + +@@ -20,13 +20,15 @@ + task.ext.when == null || task.ext.when + + script: +- def args = task.ext.args ?: '' +- def prefix = task.ext.prefix ?: "${meta.id}" ++ def args = task.ext.args ?: '' ++ def prefix = task.ext.prefix ?: "${meta.id}" ++ def region_cmd = region ? 
"--region ${region}" : '' + """ + samtools \\ + coverage \\ + $args \\ + -o ${prefix}.txt \\ ++ $region_cmd \\ + --reference ${fasta} \\ + $input + + +************************************************************ diff --git a/modules/nf-core/samtools/coverage/tests/main.nf.test b/modules/nf-core/samtools/coverage/tests/main.nf.test new file mode 100644 index 00000000..1e3ad5a4 --- /dev/null +++ b/modules/nf-core/samtools/coverage/tests/main.nf.test @@ -0,0 +1,105 @@ +nextflow_process { + + name "Test Process SAMTOOLS_COVERAGE" + script "../main.nf" + process "SAMTOOLS_COVERAGE" + + tag "modules" + tag "modules_nfcore" + tag "samtools" + tag "samtools/coverage" + + test("test_samtools_coverage_bam") { + + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam.bai', checkIfExists: true) + ]) + input[1] = Channel.of([ + [ id:'fasta' ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ]) + input[2] = Channel.of([ + [ id:'fai' ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta.fai', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("test_samtools_coverage_cram") { + + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/cram/test.paired_end.sorted.cram', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/cram/test.paired_end.sorted.cram.crai', checkIfExists: true) + ]) + input[1] = Channel.of([ + [ id:'fasta' ], // meta 
map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genome.fasta', checkIfExists: true) + ]) + input[2] = Channel.of([ + [ id:'fai' ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genome.fasta.fai', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("test_samtools_coverage_stub") { + + options "-stub" + + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam.bai', checkIfExists: true) + ]) + input[1] = Channel.of([ + [ id:'fasta' ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ]) + input[2] = Channel.of([ + [ id:'fai' ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta.fai', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + +} diff --git a/modules/nf-core/samtools/coverage/tests/main.nf.test.snap b/modules/nf-core/samtools/coverage/tests/main.nf.test.snap new file mode 100644 index 00000000..cc3ce01c --- /dev/null +++ b/modules/nf-core/samtools/coverage/tests/main.nf.test.snap @@ -0,0 +1,107 @@ +{ + "test_samtools_coverage_stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.txt:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + "versions.yml:md5,56e1239217405837de88af882d9d68f6" + ], + "coverage": [ + [ + { + "id": "test", + "single_end": false + }, + "test.txt:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + 
"versions.yml:md5,56e1239217405837de88af882d9d68f6" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-02-29T11:08:03.724132" + }, + "test_samtools_coverage_bam": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.txt:md5,99a521b3bf53b6acf8055a44a571ea84" + ] + ], + "1": [ + "versions.yml:md5,56e1239217405837de88af882d9d68f6" + ], + "coverage": [ + [ + { + "id": "test", + "single_end": false + }, + "test.txt:md5,99a521b3bf53b6acf8055a44a571ea84" + ] + ], + "versions": [ + "versions.yml:md5,56e1239217405837de88af882d9d68f6" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-02-29T11:36:30.272862" + }, + "test_samtools_coverage_cram": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.txt:md5,ce896534bac51cfcc97e5508ae907e99" + ] + ], + "1": [ + "versions.yml:md5,56e1239217405837de88af882d9d68f6" + ], + "coverage": [ + [ + { + "id": "test", + "single_end": false + }, + "test.txt:md5,ce896534bac51cfcc97e5508ae907e99" + ] + ], + "versions": [ + "versions.yml:md5,56e1239217405837de88af882d9d68f6" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-02-29T11:39:08.488488" + } +} \ No newline at end of file diff --git a/modules/nf-core/samtools/coverage/tests/tags.yml b/modules/nf-core/samtools/coverage/tests/tags.yml new file mode 100644 index 00000000..2b4f53c2 --- /dev/null +++ b/modules/nf-core/samtools/coverage/tests/tags.yml @@ -0,0 +1,2 @@ +samtools/coverage: + - "modules/nf-core/samtools/coverage/**" diff --git a/modules/nf-core/samtools/faidx/environment.yml b/modules/nf-core/samtools/faidx/environment.yml new file mode 100644 index 00000000..9c24eb0a --- /dev/null +++ b/modules/nf-core/samtools/faidx/environment.yml @@ -0,0 +1,10 @@ +name: samtools_faidx + +channels: + - conda-forge + - bioconda + - defaults + +dependencies: + - bioconda::htslib=1.19.1 + 
- bioconda::samtools=1.19.2 diff --git a/modules/nf-core/samtools/faidx/main.nf b/modules/nf-core/samtools/faidx/main.nf new file mode 100644 index 00000000..cfe7ad95 --- /dev/null +++ b/modules/nf-core/samtools/faidx/main.nf @@ -0,0 +1,50 @@ +process SAMTOOLS_FAIDX { + tag "$fasta" + label 'process_single' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/samtools:1.19.2--h50ea8bc_0' : + 'biocontainers/samtools:1.19.2--h50ea8bc_0' }" + + input: + tuple val(meta), path(fasta) + tuple val(meta2), path(fai) + + output: + tuple val(meta), path ("*.{fa,fasta}") , emit: fa , optional: true + tuple val(meta), path ("*.fai") , emit: fai, optional: true + tuple val(meta), path ("*.gzi") , emit: gzi, optional: true + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + """ + samtools \\ + faidx \\ + $fasta \\ + $args + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + END_VERSIONS + """ + + stub: + def match = (task.ext.args =~ /-o(?:utput)?\s(.*)\s?/).findAll() + def fastacmd = match[0] ? 
"touch ${match[0][1]}" : '' + """ + ${fastacmd} + touch ${fasta}.fai + + cat <<-END_VERSIONS > versions.yml + + "${task.process}": + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/samtools/faidx/meta.yml b/modules/nf-core/samtools/faidx/meta.yml new file mode 100644 index 00000000..f3c25de2 --- /dev/null +++ b/modules/nf-core/samtools/faidx/meta.yml @@ -0,0 +1,65 @@ +name: samtools_faidx +description: Index FASTA file +keywords: + - index + - fasta + - faidx +tools: + - samtools: + description: | + SAMtools is a set of utilities for interacting with and post-processing + short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. + These files are generated as output by short read aligners like BWA. + homepage: http://www.htslib.org/ + documentation: http://www.htslib.org/doc/samtools.html + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing reference information + e.g. [ id:'test' ] + - fasta: + type: file + description: FASTA file + pattern: "*.{fa,fasta}" + - meta2: + type: map + description: | + Groovy Map containing reference information + e.g. [ id:'test' ] + - fai: + type: file + description: FASTA index file + pattern: "*.{fai}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - fa: + type: file + description: FASTA file + pattern: "*.{fa}" + - fai: + type: file + description: FASTA index file + pattern: "*.{fai}" + - gzi: + type: file + description: Optional gzip index file for compressed inputs + pattern: "*.gzi" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@drpatelh" + - "@ewels" + - "@phue" +maintainers: + - "@drpatelh" + - "@ewels" + - "@phue" diff --git a/modules/nf-core/samtools/faidx/tests/main.nf.test b/modules/nf-core/samtools/faidx/tests/main.nf.test new file mode 100644 index 00000000..17244ef2 --- /dev/null +++ b/modules/nf-core/samtools/faidx/tests/main.nf.test @@ -0,0 +1,122 @@ +nextflow_process { + + name "Test Process SAMTOOLS_FAIDX" + script "../main.nf" + process "SAMTOOLS_FAIDX" + + tag "modules" + tag "modules_nfcore" + tag "samtools" + tag "samtools/faidx" + + test("test_samtools_faidx") { + + when { + process { + """ + input[0] = [ [ id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) ] + + input[1] = [[],[]] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("test_samtools_faidx_bgzip") { + + when { + process { + """ + input[0] = [ [ id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta.gz', checkIfExists: true)] + + input[1] = [[],[]] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("test_samtools_faidx_fasta") { + + config "./nextflow.config" + + when { + process { + """ + input[0] = [ [ id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) ] + + input[1] = [ [ id:'test', single_end:false ], 
// meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta.fai', checkIfExists: true) ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("test_samtools_faidx_stub_fasta") { + + config "./nextflow2.config" + + when { + process { + """ + input[0] = [ [ id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) ] + + input[1] = [ [ id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta.fai', checkIfExists: true) ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("test_samtools_faidx_stub_fai") { + + when { + process { + """ + input[0] = [ [ id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) ] + + input[1] = [[],[]] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } +} \ No newline at end of file diff --git a/modules/nf-core/samtools/faidx/tests/main.nf.test.snap b/modules/nf-core/samtools/faidx/tests/main.nf.test.snap new file mode 100644 index 00000000..3e651ef6 --- /dev/null +++ b/modules/nf-core/samtools/faidx/tests/main.nf.test.snap @@ -0,0 +1,249 @@ +{ + "test_samtools_faidx": { + "content": [ + { + "0": [ + + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "genome.fasta.fai:md5,9da2a56e2853dc8c0b86a9e7229c9fe5" + ] + ], + "2": [ + + ], + "3": [ + "versions.yml:md5,4870fc0a88c616aa937f8325a2db0c3c" + ], + "fa": [ + + ], + "fai": [ + [ + { + "id": "test", + "single_end": false + }, + "genome.fasta.fai:md5,9da2a56e2853dc8c0b86a9e7229c9fe5" + ] + ], + "gzi": [ + + ], + "versions": [ + 
"versions.yml:md5,4870fc0a88c616aa937f8325a2db0c3c" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-18T16:22:39.412601" + }, + "test_samtools_faidx_bgzip": { + "content": [ + { + "0": [ + + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "genome.fasta.gz.fai:md5,9da2a56e2853dc8c0b86a9e7229c9fe5" + ] + ], + "2": [ + [ + { + "id": "test", + "single_end": false + }, + "genome.fasta.gz.gzi:md5,7dea362b3fac8e00956a4952a3d4f474" + ] + ], + "3": [ + "versions.yml:md5,4870fc0a88c616aa937f8325a2db0c3c" + ], + "fa": [ + + ], + "fai": [ + [ + { + "id": "test", + "single_end": false + }, + "genome.fasta.gz.fai:md5,9da2a56e2853dc8c0b86a9e7229c9fe5" + ] + ], + "gzi": [ + [ + { + "id": "test", + "single_end": false + }, + "genome.fasta.gz.gzi:md5,7dea362b3fac8e00956a4952a3d4f474" + ] + ], + "versions": [ + "versions.yml:md5,4870fc0a88c616aa937f8325a2db0c3c" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-18T16:23:22.427966" + }, + "test_samtools_faidx_fasta": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "extract.fa:md5,6a0774a0ad937ba0bfd2ac7457d90f36" + ] + ], + "1": [ + + ], + "2": [ + + ], + "3": [ + "versions.yml:md5,4870fc0a88c616aa937f8325a2db0c3c" + ], + "fa": [ + [ + { + "id": "test", + "single_end": false + }, + "extract.fa:md5,6a0774a0ad937ba0bfd2ac7457d90f36" + ] + ], + "fai": [ + + ], + "gzi": [ + + ], + "versions": [ + "versions.yml:md5,4870fc0a88c616aa937f8325a2db0c3c" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-18T16:24:04.107537" + }, + "test_samtools_faidx_stub_fasta": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "extract.fa:md5,9da2a56e2853dc8c0b86a9e7229c9fe5" + ] + ], + "1": [ + + ], + "2": [ + + ], + "3": [ + "versions.yml:md5,4870fc0a88c616aa937f8325a2db0c3c" + ], + "fa": [ + [ + { + "id": "test", + 
"single_end": false + }, + "extract.fa:md5,9da2a56e2853dc8c0b86a9e7229c9fe5" + ] + ], + "fai": [ + + ], + "gzi": [ + + ], + "versions": [ + "versions.yml:md5,4870fc0a88c616aa937f8325a2db0c3c" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-18T16:24:45.868463" + }, + "test_samtools_faidx_stub_fai": { + "content": [ + { + "0": [ + + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "genome.fasta.fai:md5,9da2a56e2853dc8c0b86a9e7229c9fe5" + ] + ], + "2": [ + + ], + "3": [ + "versions.yml:md5,4870fc0a88c616aa937f8325a2db0c3c" + ], + "fa": [ + + ], + "fai": [ + [ + { + "id": "test", + "single_end": false + }, + "genome.fasta.fai:md5,9da2a56e2853dc8c0b86a9e7229c9fe5" + ] + ], + "gzi": [ + + ], + "versions": [ + "versions.yml:md5,4870fc0a88c616aa937f8325a2db0c3c" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-18T16:25:27.550554" + } +} \ No newline at end of file diff --git a/modules/nf-core/samtools/faidx/tests/nextflow.config b/modules/nf-core/samtools/faidx/tests/nextflow.config new file mode 100644 index 00000000..f76a3ba0 --- /dev/null +++ b/modules/nf-core/samtools/faidx/tests/nextflow.config @@ -0,0 +1,7 @@ +process { + + withName: SAMTOOLS_FAIDX { + ext.args = 'MT192765.1 -o extract.fa' + } + +} diff --git a/modules/nf-core/samtools/faidx/tests/nextflow2.config b/modules/nf-core/samtools/faidx/tests/nextflow2.config new file mode 100644 index 00000000..33ebbd5d --- /dev/null +++ b/modules/nf-core/samtools/faidx/tests/nextflow2.config @@ -0,0 +1,6 @@ +process { + + withName: SAMTOOLS_FAIDX { + ext.args = '-o extract.fa' + } +} diff --git a/modules/nf-core/samtools/faidx/tests/tags.yml b/modules/nf-core/samtools/faidx/tests/tags.yml new file mode 100644 index 00000000..e4a83948 --- /dev/null +++ b/modules/nf-core/samtools/faidx/tests/tags.yml @@ -0,0 +1,2 @@ +samtools/faidx: + - modules/nf-core/samtools/faidx/** diff --git 
a/modules/nf-core/samtools/index/environment.yml b/modules/nf-core/samtools/index/environment.yml new file mode 100644 index 00000000..a5e50649 --- /dev/null +++ b/modules/nf-core/samtools/index/environment.yml @@ -0,0 +1,8 @@ +name: samtools_index +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::samtools=1.19.2 + - bioconda::htslib=1.19.1 diff --git a/modules/nf-core/samtools/index/main.nf b/modules/nf-core/samtools/index/main.nf new file mode 100644 index 00000000..dc14f98d --- /dev/null +++ b/modules/nf-core/samtools/index/main.nf @@ -0,0 +1,48 @@ +process SAMTOOLS_INDEX { + tag "$meta.id" + label 'process_low' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/samtools:1.19.2--h50ea8bc_0' : + 'biocontainers/samtools:1.19.2--h50ea8bc_0' }" + + input: + tuple val(meta), path(input) + + output: + tuple val(meta), path("*.bai") , optional:true, emit: bai + tuple val(meta), path("*.csi") , optional:true, emit: csi + tuple val(meta), path("*.crai"), optional:true, emit: crai + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + """ + samtools \\ + index \\ + -@ ${task.cpus-1} \\ + $args \\ + $input + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + END_VERSIONS + """ + + stub: + """ + touch ${input}.bai + touch ${input}.crai + touch ${input}.csi + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/samtools/index/meta.yml b/modules/nf-core/samtools/index/meta.yml new file mode 100644 index 00000000..01a4ee03 --- /dev/null +++ b/modules/nf-core/samtools/index/meta.yml 
@@ -0,0 +1,57 @@ +name: samtools_index +description: Index SAM/BAM/CRAM file +keywords: + - index + - bam + - sam + - cram +tools: + - samtools: + description: | + SAMtools is a set of utilities for interacting with and post-processing + short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. + These files are generated as output by short read aligners like BWA. + homepage: http://www.htslib.org/ + documentation: http://www.htslib.org/doc/samtools.html + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - bam: + type: file + description: BAM/CRAM/SAM file + pattern: "*.{bam,cram,sam}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - bai: + type: file + description: BAM/CRAM/SAM index file + pattern: "*.{bai,crai,sai}" + - crai: + type: file + description: BAM/CRAM/SAM index file + pattern: "*.{bai,crai,sai}" + - csi: + type: file + description: CSI index file + pattern: "*.{csi}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@drpatelh" + - "@ewels" + - "@maxulysse" +maintainers: + - "@drpatelh" + - "@ewels" + - "@maxulysse" diff --git a/modules/nf-core/samtools/index/tests/csi.nextflow.config b/modules/nf-core/samtools/index/tests/csi.nextflow.config new file mode 100644 index 00000000..0ed260ef --- /dev/null +++ b/modules/nf-core/samtools/index/tests/csi.nextflow.config @@ -0,0 +1,7 @@ +process { + + withName: SAMTOOLS_INDEX { + ext.args = '-c' + } + +} diff --git a/modules/nf-core/samtools/index/tests/main.nf.test b/modules/nf-core/samtools/index/tests/main.nf.test new file mode 100644 index 00000000..bb7756d1 --- /dev/null +++ b/modules/nf-core/samtools/index/tests/main.nf.test @@ -0,0 +1,87 @@ +nextflow_process { + + name "Test Process 
SAMTOOLS_INDEX" + script "../main.nf" + process "SAMTOOLS_INDEX" + tag "modules" + tag "modules_nfcore" + tag "samtools" + tag "samtools/index" + + test("bai") { + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = Channel.of([ + [ id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out.bai).match("bai") }, + { assert snapshot(process.out.versions).match("bai_versions") } + ) + } + } + + test("crai") { + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = Channel.of([ + [ id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/cram/test.paired_end.recalibrated.sorted.cram', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out.crai).match("crai") }, + { assert snapshot(process.out.versions).match("crai_versions") } + ) + } + } + + test("csi") { + + config "./csi.nextflow.config" + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = Channel.of([ + [ id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert path(process.out.csi.get(0).get(1)).exists() }, + { assert snapshot(process.out.versions).match("csi_versions") } + ) + } + } +} diff --git a/modules/nf-core/samtools/index/tests/main.nf.test.snap b/modules/nf-core/samtools/index/tests/main.nf.test.snap new file mode 100644 index 00000000..3dc8e7de --- /dev/null +++ b/modules/nf-core/samtools/index/tests/main.nf.test.snap @@ -0,0 +1,74 @@ +{ + "crai_versions": { + "content": [ + [ + 
"versions.yml:md5,cc4370091670b64bba7c7206403ffb3e" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.01.0" + }, + "timestamp": "2024-02-13T16:12:00.324667957" + }, + "csi_versions": { + "content": [ + [ + "versions.yml:md5,cc4370091670b64bba7c7206403ffb3e" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.01.0" + }, + "timestamp": "2024-02-13T16:12:07.885103162" + }, + "crai": { + "content": [ + [ + [ + { + "id": "test", + "single_end": false + }, + "test.paired_end.recalibrated.sorted.cram.crai:md5,14bc3bd5c89cacc8f4541f9062429029" + ] + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T18:41:38.446424" + }, + "bai": { + "content": [ + [ + [ + { + "id": "test", + "single_end": false + }, + "test.paired_end.sorted.bam.bai:md5,704c10dd1326482448ca3073fdebc2f4" + ] + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T18:40:46.579747" + }, + "bai_versions": { + "content": [ + [ + "versions.yml:md5,cc4370091670b64bba7c7206403ffb3e" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.01.0" + }, + "timestamp": "2024-02-13T16:11:51.641425452" + } +} \ No newline at end of file diff --git a/modules/nf-core/samtools/index/tests/tags.yml b/modules/nf-core/samtools/index/tests/tags.yml new file mode 100644 index 00000000..e0f58a7a --- /dev/null +++ b/modules/nf-core/samtools/index/tests/tags.yml @@ -0,0 +1,2 @@ +samtools/index: + - modules/nf-core/samtools/index/** diff --git a/modules/nf-core/samtools/view/environment.yml b/modules/nf-core/samtools/view/environment.yml new file mode 100644 index 00000000..b5be8bbb --- /dev/null +++ b/modules/nf-core/samtools/view/environment.yml @@ -0,0 +1,10 @@ +name: samtools_view + +channels: + - conda-forge + - bioconda + - defaults + +dependencies: + - bioconda::htslib=1.19.1 + - bioconda::samtools=1.19.2 diff --git a/modules/nf-core/samtools/view/main.nf b/modules/nf-core/samtools/view/main.nf new file mode 
100644 index 00000000..76ec127f --- /dev/null +++ b/modules/nf-core/samtools/view/main.nf @@ -0,0 +1,80 @@ +process SAMTOOLS_VIEW { + tag "$meta.id" + label 'process_low' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/samtools:1.19.2--h50ea8bc_0' : + 'biocontainers/samtools:1.19.2--h50ea8bc_0' }" + + input: + tuple val(meta), path(input), path(index), val(region), val(subsample) + tuple val(meta2), path(fasta) + path qname + + output: + tuple val(meta), path("*.bam"), emit: bam, optional: true + tuple val(meta), path("*.cram"), emit: cram, optional: true + tuple val(meta), path("*.sam"), emit: sam, optional: true + tuple val(meta), path("*.bai"), emit: bai, optional: true + tuple val(meta), path("*.csi"), emit: csi, optional: true + tuple val(meta), path("*.crai"), emit: crai, optional: true + path "versions.yml", emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def args2 = task.ext.args2 ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def reference = fasta ? "--reference ${fasta}" : "" + def readnames = qname ? "--qname-file ${qname}" : "" + def region_cmd = region ? "${region}" : "" + def subsample_cmd = subsample ? "--subsample ${subsample}" : "" + def file_type = args.contains("--output-fmt sam") ? "sam" : + args.contains("--output-fmt bam") ? "bam" : + args.contains("--output-fmt cram") ? "cram" : + input.getExtension() + if ("$input" == "${prefix}.${file_type}") error "Input and output names are the same, use \"task.ext.prefix\" to disambiguate!" 
+ """ + samtools \\ + view \\ + --threads ${task.cpus-1} \\ + ${reference} \\ + ${readnames} \\ + $args \\ + ${subsample_cmd} \\ + -o ${prefix}.${file_type} \\ + $input \\ + $args2 \\ + ${region_cmd} + + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def file_type = args.contains("--output-fmt sam") ? "sam" : + args.contains("--output-fmt bam") ? "bam" : + args.contains("--output-fmt cram") ? "cram" : + input.getExtension() + if ("$input" == "${prefix}.${file_type}") error "Input and output names are the same, use \"task.ext.prefix\" to disambiguate!" + + def index = args.contains("--write-index") ? "touch ${prefix}.csi" : "" + + """ + touch ${prefix}.${file_type} + ${index} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/samtools/view/meta.yml b/modules/nf-core/samtools/view/meta.yml new file mode 100644 index 00000000..3dadafae --- /dev/null +++ b/modules/nf-core/samtools/view/meta.yml @@ -0,0 +1,89 @@ +name: samtools_view +description: filter/convert SAM/BAM/CRAM file +keywords: + - view + - bam + - sam + - cram +tools: + - samtools: + description: | + SAMtools is a set of utilities for interacting with and post-processing + short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. + These files are generated as output by short read aligners like BWA. + homepage: http://www.htslib.org/ + documentation: http://www.htslib.org/doc/samtools.html + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - input: + type: file + description: BAM/CRAM/SAM file + pattern: "*.{bam,cram,sam}" + - index: + type: file + description: BAM.BAI/BAM.CSI/CRAM.CRAI file (optional) + pattern: "*.{.bai,.csi,.crai}" + - meta2: + type: map + description: | + Groovy Map containing reference information + e.g. [ id:'test' ] + - fasta: + type: file + description: Reference file the CRAM was created with (optional) + pattern: "*.{fasta,fa}" + - qname: + type: file + description: Optional file with read names to output only select alignments + pattern: "*.{txt,list}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - bam: + type: file + description: optional filtered/converted BAM file + pattern: "*.{bam}" + - cram: + type: file + description: optional filtered/converted CRAM file + pattern: "*.{cram}" + - sam: + type: file + description: optional filtered/converted SAM file + pattern: "*.{sam}" + # bai, csi, and crai are created with `--write-index` + - bai: + type: file + description: optional BAM file index + pattern: "*.{bai}" + - csi: + type: file + description: optional tabix BAM file index + pattern: "*.{csi}" + - crai: + type: file + description: optional CRAM file index + pattern: "*.{crai}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@drpatelh" + - "@joseespinosa" + - "@FriederikeHanssen" + - "@priyanka-surana" +maintainers: + - "@drpatelh" + - "@joseespinosa" + - "@FriederikeHanssen" + - "@priyanka-surana" diff --git a/modules/nf-core/samtools/view/samtools-view.diff b/modules/nf-core/samtools/view/samtools-view.diff new file mode 100644 index 00000000..f159cbee --- /dev/null +++ b/modules/nf-core/samtools/view/samtools-view.diff @@ -0,0 +1,56 @@ +Changes in module 'nf-core/samtools/view' +--- modules/nf-core/samtools/view/main.nf ++++ modules/nf-core/samtools/view/main.nf +@@ -8,7 +8,7 @@ 
+ 'biocontainers/samtools:1.19.2--h50ea8bc_0' }" + + input: +- tuple val(meta), path(input), path(index) ++ tuple val(meta), path(input), path(index), val(region), val(subsample) + tuple val(meta2), path(fasta) + path qname + +@@ -28,8 +28,10 @@ + def args = task.ext.args ?: '' + def args2 = task.ext.args2 ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" +- def reference = fasta ? "--reference ${fasta}" : "" +- def readnames = qname ? "--qname-file ${qname}": "" ++ def reference = fasta ? "--reference ${fasta}" : "" ++ def readnames = qname ? "--qname-file ${qname}" : "" ++ def region_cmd = region ? "${region}" : "" ++ def subsample_cmd = subsample ? "--subsample ${subsample}" : "" + def file_type = args.contains("--output-fmt sam") ? "sam" : + args.contains("--output-fmt bam") ? "bam" : + args.contains("--output-fmt cram") ? "cram" : +@@ -42,9 +44,12 @@ + ${reference} \\ + ${readnames} \\ + $args \\ ++ ${subsample_cmd} \\ + -o ${prefix}.${file_type} \\ + $input \\ +- $args2 ++ $args2 \\ ++ ${region_cmd} ++ + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + +--- modules/nf-core/samtools/view/environment.yml ++++ modules/nf-core/samtools/view/environment.yml +@@ -1,8 +1,10 @@ + name: samtools_view ++ + channels: + - conda-forge + - bioconda + - defaults ++ + dependencies: ++ - bioconda::htslib=1.19.1 + - bioconda::samtools=1.19.2 +- - bioconda::htslib=1.19.1 + +************************************************************ diff --git a/modules/nf-core/samtools/view/tests/bam.config b/modules/nf-core/samtools/view/tests/bam.config new file mode 100644 index 00000000..c10d1081 --- /dev/null +++ b/modules/nf-core/samtools/view/tests/bam.config @@ -0,0 +1,3 @@ +process { + ext.args = "--output-fmt bam" +} \ No newline at end of file diff --git a/modules/nf-core/samtools/view/tests/bam_index.config b/modules/nf-core/samtools/view/tests/bam_index.config new file mode 100644 index 00000000..771ae033 --- /dev/null +++ 
b/modules/nf-core/samtools/view/tests/bam_index.config @@ -0,0 +1,3 @@ +process { + ext.args = "--output-fmt bam --write-index" +} \ No newline at end of file diff --git a/modules/nf-core/samtools/view/tests/main.nf.test b/modules/nf-core/samtools/view/tests/main.nf.test new file mode 100644 index 00000000..45a0defb --- /dev/null +++ b/modules/nf-core/samtools/view/tests/main.nf.test @@ -0,0 +1,212 @@ +nextflow_process { + + name "Test Process SAMTOOLS_VIEW" + script "../main.nf" + process "SAMTOOLS_VIEW" + + tag "modules" + tag "modules_nfcore" + tag "samtools" + tag "samtools/view" + + test("bam") { + + when { + process { + """ + input[0] = Channel.of([ + [ id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/bam/test.paired_end.bam', checkIfExists: true), + [] + ]) + input[1] = [[],[]] + input[2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(file(process.out.bam[0][1]).name).match("bam_bam") }, + { assert snapshot(process.out.bai).match("bam_bai") }, + { assert snapshot(process.out.crai).match("bam_crai") }, + { assert snapshot(process.out.cram).match("bam_cram") }, + { assert snapshot(process.out.csi).match("bam_csi") }, + { assert snapshot(process.out.sam).match("bam_sam") }, + { assert snapshot(process.out.versions).match("bam_versions") } + ) + } + } + + test("cram") { + + when { + process { + """ + input[0] = Channel.of([ + [ id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/cram/test.paired_end.sorted.cram', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/cram/test.paired_end.sorted.cram.crai', checkIfExists: true) + ]) + input[1] = Channel.of([ + [ id:'genome' ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genome.fasta', checkIfExists: true) + ]) + input[2] = [] + """ + } + } + + then { + 
assertAll( + { assert process.success }, + { assert snapshot(file(process.out.cram[0][1]).name).match("cram_cram") }, + { assert snapshot(process.out.bai).match("cram_bai") }, + { assert snapshot(process.out.bam).match("cram_bam") }, + { assert snapshot(process.out.crai).match("cram_crai") }, + { assert snapshot(process.out.csi).match("cram_csi") }, + { assert snapshot(process.out.sam).match("cram_sam") }, + { assert snapshot(process.out.versions).match("cram_versions") } + ) + } + } + + test("cram_to_bam") { + + config "./bam.config" + + when { + process { + """ + input[0] = Channel.of([ + [ id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/cram/test.paired_end.sorted.cram', checkIfExists: true), + [] + ]) + input[1] = Channel.of([ + [ id:'genome' ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genome.fasta', checkIfExists: true) + ]) + input[2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(file(process.out.bam[0][1]).name).match("cram_to_bam_bam") }, + { assert snapshot(process.out.bai).match("cram_to_bam_bai") }, + { assert snapshot(process.out.crai).match("cram_to_bam_crai") }, + { assert snapshot(process.out.cram).match("cram_to_bam_cram") }, + { assert snapshot(process.out.csi).match("cram_to_bam_csi") }, + { assert snapshot(process.out.sam).match("cram_to_bam_sam") }, + { assert snapshot(process.out.versions).match("cram_to_bam_versions") } + ) + } + } + + test("cram_to_bam_index") { + + config "./bam_index.config" + + when { + process { + """ + input[0] = Channel.of([ + [ id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/cram/test.paired_end.sorted.cram', checkIfExists: true), + [] + ]) + input[1] = Channel.of([ + [ id:'genome' ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genome.fasta', 
checkIfExists: true) + ]) + input[2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(file(process.out.bam[0][1]).name).match("cram_to_bam_index_bam") }, + { assert snapshot(file(process.out.csi[0][1]).name).match("cram_to_bam_index_csi") }, + { assert snapshot(process.out.bai).match("cram_to_bam_index_bai") }, + { assert snapshot(process.out.crai).match("cram_to_bam_index_crai") }, + { assert snapshot(process.out.cram).match("cram_to_bam_index_cram") }, + { assert snapshot(process.out.sam).match("cram_to_bam_index_sam") }, + { assert snapshot(process.out.versions).match("cram_to_bam_index_versions") } + ) + } + } + + test("cram_to_bam_index_qname") { + + config "./bam_index.config" + + when { + process { + """ + input[0] = Channel.of([ + [ id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/cram/test.paired_end.sorted.cram', checkIfExists: true), + [] + ]) + input[1] = Channel.of([ + [ id:'genome' ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/genome/genome.fasta', checkIfExists: true) + ]) + input[2] = Channel.of("testN:2817", "testN:2814").collectFile(name: "readnames.list", newLine: true) + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(file(process.out.bam[0][1]).name).match("cram_to_bam_index_qname_bam") }, + { assert snapshot(file(process.out.csi[0][1]).name).match("cram_to_bam_index_qname_csi") }, + { assert snapshot(process.out.bai).match("cram_to_bam_index_qname_bai") }, + { assert snapshot(process.out.crai).match("cram_to_bam_index_qname_crai") }, + { assert snapshot(process.out.cram).match("cram_to_bam_index_qname_cram") }, + { assert snapshot(process.out.sam).match("cram_to_bam_index_qname_sam") }, + { assert snapshot(process.out.versions).match("cram_to_bam_index_qname_versions") } + ) + } + } + + test("bam_stub") { + + options "-stub" + config "./bam_index.config" + + 
when { + process { + """ + input[0] = Channel.of([ + [ id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/bam/test.paired_end.bam', checkIfExists: true), + [] + ]) + input[1] = [[],[]] + input[2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(file(process.out.bam[0][1]).name).match("bam_stub_bam") }, + { assert snapshot(file(process.out.csi[0][1]).name).match("bam_stub_csi") }, + { assert snapshot(process.out.bai).match("bam_stub_bai") }, + { assert snapshot(process.out.crai).match("bam_stub_crai") }, + { assert snapshot(process.out.cram).match("bam_stub_cram") }, + { assert snapshot(process.out.sam).match("bam_stub_sam") }, + { assert snapshot(process.out.versions).match("bam_stub_versions") } + ) + } + } +} diff --git a/modules/nf-core/samtools/view/tests/main.nf.test.snap b/modules/nf-core/samtools/view/tests/main.nf.test.snap new file mode 100644 index 00000000..f55943a7 --- /dev/null +++ b/modules/nf-core/samtools/view/tests/main.nf.test.snap @@ -0,0 +1,488 @@ +{ + "bam_bam": { + "content": [ + "test.bam" + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:37:51.256068" + }, + "cram_to_bam_index_csi": { + "content": [ + "test.bam.csi" + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:12.958617" + }, + "bam_stub_bam": { + "content": [ + "test.bam" + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:32.065301" + }, + "bam_bai": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:37:51.258578" + }, + "bam_stub_bai": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:32.071284" + }, + "bam_stub_versions": { + "content": [ + [ + 
"versions.yml:md5,4ea32c57d546102a1b32d9693ada7cf1" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.01.0" + }, + "timestamp": "2024-02-13T16:13:09.713353823" + }, + "cram_to_bam_index_cram": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:12.972288" + }, + "cram_to_bam_sam": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:04.999247" + }, + "cram_to_bam_index_sam": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:12.976457" + }, + "cram_crai": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:37:56.497581" + }, + "cram_csi": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:37:56.50038" + }, + "cram_to_bam_cram": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:04.992239" + }, + "cram_to_bam_index_qname_csi": { + "content": [ + "test.bam.csi" + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:23.325496" + }, + "bam_stub_sam": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:32.079529" + }, + "cram_cram": { + "content": [ + "test.cram" + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:37:56.490286" + }, + "bam_csi": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:37:51.262882" + }, + "cram_to_bam_crai": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:04.989247" + }, + 
"cram_to_bam_index_crai": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:12.967681" + }, + "cram_to_bam_index_qname_versions": { + "content": [ + [ + "versions.yml:md5,4ea32c57d546102a1b32d9693ada7cf1" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.01.0" + }, + "timestamp": "2024-02-13T16:13:03.935041046" + }, + "cram_to_bam_bam": { + "content": [ + "test.bam" + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:04.982361" + }, + "cram_to_bam_index_bam": { + "content": [ + "test.bam" + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:12.95456" + }, + "cram_to_bam_index_versions": { + "content": [ + [ + "versions.yml:md5,4ea32c57d546102a1b32d9693ada7cf1" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.01.0" + }, + "timestamp": "2024-02-13T16:12:55.910685496" + }, + "cram_to_bam_bai": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:04.98601" + }, + "cram_to_bam_versions": { + "content": [ + [ + "versions.yml:md5,4ea32c57d546102a1b32d9693ada7cf1" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.01.0" + }, + "timestamp": "2024-02-13T16:12:47.715221169" + }, + "cram_bam": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:37:56.495512" + }, + "bam_stub_cram": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:32.076908" + }, + "cram_to_bam_index_qname_bai": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:23.328458" + }, + "cram_to_bam_index_qname_crai": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + 
"timestamp": "2024-02-12T19:38:23.330789" + }, + "cram_bai": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:37:56.493129" + }, + "bam_stub_crai": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:32.074313" + }, + "cram_to_bam_index_qname_bam": { + "content": [ + "test.bam" + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:23.322874" + }, + "bam_versions": { + "content": [ + [ + "versions.yml:md5,4ea32c57d546102a1b32d9693ada7cf1" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.01.0" + }, + "timestamp": "2024-02-13T16:12:31.692607421" + }, + "cram_to_bam_index_qname_cram": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:23.333248" + }, + "bam_crai": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:37:51.259774" + }, + "bam_cram": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:37:51.261287" + }, + "cram_to_bam_csi": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:04.995454" + }, + "cram_sam": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:37:56.502625" + }, + "cram_versions": { + "content": [ + [ + "versions.yml:md5,4ea32c57d546102a1b32d9693ada7cf1" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.01.0" + }, + "timestamp": "2024-02-13T16:12:39.913411036" + }, + "bam_sam": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:37:51.264651" + }, + "cram_to_bam_index_bai": { + "content": 
[ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:12.962863" + }, + "cram_to_bam_index_qname_sam": { + "content": [ + [ + + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:23.337634" + }, + "bam_stub_csi": { + "content": [ + "test.csi" + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.3" + }, + "timestamp": "2024-02-12T19:38:32.068596" + } +} \ No newline at end of file diff --git a/modules/nf-core/samtools/view/tests/tags.yml b/modules/nf-core/samtools/view/tests/tags.yml new file mode 100644 index 00000000..4fdf1dd1 --- /dev/null +++ b/modules/nf-core/samtools/view/tests/tags.yml @@ -0,0 +1,2 @@ +samtools/view: + - "modules/nf-core/samtools/view/**" diff --git a/modules/nf-core/shapeit5/ligate/environment.yml b/modules/nf-core/shapeit5/ligate/environment.yml new file mode 100644 index 00000000..d4c71302 --- /dev/null +++ b/modules/nf-core/shapeit5/ligate/environment.yml @@ -0,0 +1,7 @@ +name: shapeit5_ligate +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::shapeit5=1.0.0 diff --git a/modules/nf-core/shapeit5/ligate/main.nf b/modules/nf-core/shapeit5/ligate/main.nf new file mode 100644 index 00000000..5624d7d9 --- /dev/null +++ b/modules/nf-core/shapeit5/ligate/main.nf @@ -0,0 +1,51 @@ +process SHAPEIT5_LIGATE { + tag "$meta.id" + label 'process_low' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/shapeit5:1.0.0--h0c8ee15_0': + 'biocontainers/shapeit5:1.0.0--h0c8ee15_0'}" + + input: + tuple val(meta), path(input_list), path (input_list_index) + + output: + tuple val(meta), path("*.{vcf,bcf,vcf.gz,bcf.gz}"), emit: merged_variants + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def suffix = task.ext.suffix ?: "vcf.gz" + """ + printf "%s\\n" $input_list | tr -d '[],' > all_files.txt + + SHAPEIT5_ligate \\ + $args \\ + --input all_files.txt \\ + --thread $task.cpus \\ + --output ${prefix}.${suffix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + shapeit5: "\$(SHAPEIT5_ligate | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]' | head -n 1)" + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def suffix = task.ext.suffix ?: "vcf.gz" + """ + touch ${prefix}.${suffix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + shapeit5: "\$(SHAPEIT5_ligate | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]' | head -n 1)" + END_VERSIONS + """ +} diff --git a/modules/nf-core/shapeit5/ligate/meta.yml b/modules/nf-core/shapeit5/ligate/meta.yml new file mode 100644 index 00000000..ed1e5e9e --- /dev/null +++ b/modules/nf-core/shapeit5/ligate/meta.yml @@ -0,0 +1,52 @@ +name: "shapeit5_ligate" +description: | + Ligate multiple phased BCF/VCF files into a single whole chromosome file. + Typically run to ligate multiple chunks of phased common variants. 
+keywords: + - ligate + - haplotype + - shapeit +tools: + - "shapeit5": + description: "Fast and accurate method for estimation of haplotypes (phasing)" + homepage: "https://odelaneau.github.io/shapeit5/" + documentation: "https://odelaneau.github.io/shapeit5/docs/documentation" + tool_dev_url: "https://github.com/odelaneau/shapeit5" + doi: "10.1101/2022.10.19.512867" + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - input_list: + type: file + description: | + VCF/BCF files containing genotype probabilities (GP field). + The files should be ordered by genomic position. + pattern: "*.{vcf,bcf,vcf.gz,bcf.gz}" + - input_list_index: + type: file + description: VCF/BCF files index. + pattern: "*.csi" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - merged_variants: + type: file + description: | + Output VCF/BCF file for the merged regions. + Phased information (HS field) is updated accordingly for the full region. 
+ pattern: "*.{vcf,bcf,vcf.gz,bcf.gz}" +authors: + - "@louislenezet" +maintainers: + - "@louislenezet" diff --git a/modules/nf-core/shapeit5/phasecommon/environment.yml b/modules/nf-core/shapeit5/phasecommon/environment.yml new file mode 100644 index 00000000..8bc91822 --- /dev/null +++ b/modules/nf-core/shapeit5/phasecommon/environment.yml @@ -0,0 +1,7 @@ +name: shapeit5_phasecommon +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::shapeit5=1.0.0 diff --git a/modules/nf-core/shapeit5/phasecommon/main.nf b/modules/nf-core/shapeit5/phasecommon/main.nf new file mode 100644 index 00000000..c1fb4e79 --- /dev/null +++ b/modules/nf-core/shapeit5/phasecommon/main.nf @@ -0,0 +1,65 @@ +process SHAPEIT5_PHASECOMMON { + tag "$meta.id" + label 'process_medium' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/shapeit5:1.0.0--h0c8ee15_0': + 'biocontainers/shapeit5:1.0.0--h0c8ee15_0'}" + + input: + tuple val(meta) , path(input), path(input_index), path(pedigree), val(region) + tuple val(meta2), path(reference), path(reference_index) + tuple val(meta3), path(scaffold), path(scaffold_index) + tuple val(meta4), path(map) + + output: + tuple val(meta), path("*.{vcf,bcf,vcf.gz,bcf.gz}"), emit: phased_variant + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def suffix = task.ext.suffix ?: "vcf.gz" + + if ("$input" == "${prefix}.${suffix}") error "Input and output names are the same, set prefix in module configuration to disambiguate!" + + def map_command = map ? "--map $map" : "" + def reference_command = reference ? "--reference $reference" : "" + def scaffold_command = scaffold ? "--scaffold $scaffold" : "" + def pedigree_command = pedigree ? 
"--pedigree $pedigree" : "" + + """ + SHAPEIT5_phase_common \\ + $args \\ + --input $input \\ + $map_command \\ + $reference_command \\ + $scaffold_command \\ + $pedigree_command \\ + --region $region \\ + --thread $task.cpus \\ + --output ${prefix}.${suffix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + shapeit5: "\$(SHAPEIT5_phase_common | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]' | head -1)" + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def suffix = task.ext.suffix ?: "vcf.gz" + """ + touch ${prefix}.${suffix} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + shapeit5: "\$(SHAPEIT5_phase_common | sed -nr '/Version/p' | grep -o -E '([0-9]+.){1,2}[0-9]' | head -1)" + END_VERSIONS + """ +} diff --git a/modules/nf-core/shapeit5/phasecommon/meta.yml b/modules/nf-core/shapeit5/phasecommon/meta.yml new file mode 100644 index 00000000..5d1381fb --- /dev/null +++ b/modules/nf-core/shapeit5/phasecommon/meta.yml @@ -0,0 +1,79 @@ +name: "shapeit5_phasecommon" +description: Tool to phase common sites, typically SNP array data, or the first step of WES/WGS data. +keywords: + - phasing + - haplotype + - shapeit +tools: + - "shapeit5": + description: "Fast and accurate method for estimation of haplotypes (phasing)" + homepage: "https://odelaneau.github.io/shapeit5/" + documentation: "https://odelaneau.github.io/shapeit5/docs/documentation" + tool_dev_url: "https://github.com/odelaneau/shapeit5" + doi: "10.1101/2022.10.19.512867 " + licence: "['MIT']" +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - input: + type: file + description: | + Target dataset in VCF/BCF format defined at all variable positions. + The file could possibly be without GT field (for efficiency reasons a file containing only the positions is recommended). 
+ pattern: "*.{vcf,bcf,vcf.gz,bcf.gz}" + - input_index: + type: file + description: Index file of the input VCF/BCF file containing genotype likelihoods. + pattern: "*.{vcf.gz.csi,bcf.gz.csi}" + - pedigree: + type: file + description: | + Pedigree information in the following format: offspring father mother. + pattern: "*.{txt,tsv}" + - region: + type: string + description: | + Target region, usually a full chromosome (e.g. chr20:1000000-2000000 or chr20). + For chrX, please treat PAR and non-PAR regions as different chromosomes in order to avoid mixing ploidy. + pattern: "chrXX:leftBufferPosition-rightBufferPosition" + - reference: + type: file + description: Reference panel of haplotypes in VCF/BCF format. + pattern: "*.{vcf.gz,bcf.gz}" + - reference_index: + type: file + description: Index file of the Reference panel file. + pattern: "*.{vcf.gz.csi,bcf.gz.csi}" + - scaffold: + type: file + description: Scaffold of haplotypes in VCF/BCF format. + pattern: "*.{vcf.gz,bcf.gz}" + - scaffold_index: + type: file + description: Index file of the scaffold file. + pattern: "*.{vcf.gz.csi,bcf.gz.csi}" + - map: + type: file + description: File containing the genetic map. + pattern: "*.gmap" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - phased_variants: + type: file + description: Phased haplotypes in VCF/BCF format.
+ pattern: "*.{vcf,bcf,vcf.gz,bcf.gz}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@LouisLeNezet" +maintainers: + - "@LouisLeNezet" diff --git a/modules/nf-core/stitch/environment.yml b/modules/nf-core/stitch/environment.yml new file mode 100644 index 00000000..3facc1bc --- /dev/null +++ b/modules/nf-core/stitch/environment.yml @@ -0,0 +1,7 @@ +name: stitch +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::r-stitch=1.6.10 diff --git a/modules/nf-core/stitch/main.nf b/modules/nf-core/stitch/main.nf new file mode 100644 index 00000000..0f8d8109 --- /dev/null +++ b/modules/nf-core/stitch/main.nf @@ -0,0 +1,86 @@ +process STITCH { + tag "$meta.id" + label 'process_medium' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/r-stitch:1.6.10--r43h06b5641_0': + 'biocontainers/r-stitch:1.6.10--r43h06b5641_0' }" + + input: + tuple val(meta), path(collected_crams), path(collected_crais), path(cramlist) + tuple val(meta2), path(posfile), path(input, stageAs: "input"), path(rdata, stageAs: "RData_in"), val(chromosome_name), val(K), val(nGen) + tuple val(meta3), path(fasta), path(fasta_fai) + val seed + + output: + tuple val(meta), path("input", type: "dir") , emit: input + tuple val(meta), path("RData", type: "dir") , emit: rdata + tuple val(meta), path("plots", type: "dir") , emit: plots , optional: { generate_input_only } + tuple val(meta), path("*.vcf.gz") , emit: vcf , optional: { generate_input_only || bgen_output } + tuple val(meta), path("*.bgen") , emit: bgen , optional: { generate_input_only || !bgen_output } + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def prefix = task.ext.prefix ?: "${meta.id}" + def args = task.ext.args ?: "" + def args2 = task.ext.args2 ?: "" + 
def generate_input_only = args2.contains( "--generateInputOnly TRUE" ) + def bgen_output = args2.contains( "--output_format bgen" ) + def reads_ext = collected_crams ? collected_crams.extension.unique() : [] + def rsync_cmd = rdata ? "rsync -rL ${rdata}/ RData" : "" + def stitch_cmd = seed ? "Rscript <(cat \$(which STITCH.R) | tail -n +2 | cat <(echo 'set.seed(${seed})') -)" : "STITCH.R" + def cramlist_cmd = cramlist && reads_ext == ["cram"] ? "--cramlist ${cramlist}" : "" + def bamlist_cmd = cramlist && reads_ext == ["bam" ] ? "--bamlist ${cramlist}" : "" + def reference_cmd = fasta ? "--reference ${fasta}" : "" + def regenerate_input_cmd = input && rdata && !cramlist ? "--regenerateInput FALSE --originalRegionName ${chromosome_name}" : "" + def rsync_version_cmd = rdata ? "rsync: \$(rsync --version | head -n1 | sed 's/^rsync version //; s/ .*\$//')" : "" + """ + ${rsync_cmd} ${args} + + ${stitch_cmd} \\ + --chr ${chromosome_name} \\ + --posfile ${posfile} \\ + --outputdir . \\ + --nCores ${task.cpus} \\ + --K ${K} \\ + --nGen ${nGen} \\ + ${cramlist_cmd} \\ + ${bamlist_cmd} \\ + ${reference_cmd} \\ + ${regenerate_input_cmd} \\ + ${args2} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + ${rsync_version_cmd} + r-base: \$(Rscript -e "cat(strsplit(R.version[['version.string']], ' ')[[1]][3])") + r-stitch: \$(Rscript -e "cat(as.character(utils::packageVersion('STITCH')))") + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + def args = task.ext.args ?: "" + def args2 = task.ext.args2 ?: "" + def generate_input_only = args2.contains( "--generateInputOnly TRUE" ) + def generate_plots_cmd = !generate_input_only ? "mkdir plots" : "" + def generate_vcf_cmd = !generate_input_only ? "touch ${prefix}.vcf.gz" : "" + def rsync_version_cmd = rdata ? 
"rsync: \$(rsync --version | head -n1 | sed 's/^rsync version //; s/ .*\$//')" : "" + """ + touch input + touch RData + ${generate_plots_cmd} + ${generate_vcf_cmd} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + ${rsync_version_cmd} + r-base: \$(Rscript -e "cat(strsplit(R.version[['version.string']], ' ')[[1]][3])") + r-stitch: \$(Rscript -e "cat(as.character(utils::packageVersion('STITCH')))") + END_VERSIONS + """ +} diff --git a/modules/nf-core/stitch/meta.yml b/modules/nf-core/stitch/meta.yml new file mode 100644 index 00000000..a36d61cd --- /dev/null +++ b/modules/nf-core/stitch/meta.yml @@ -0,0 +1,120 @@ +--- +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/yaml-schema.json +name: "stitch" +description: "STITCH is an R program for reference panel free, read aware, low coverage sequencing genotype imputation. STITCH runs on a set of samples with sequencing reads in BAM format, as well as a list of positions to genotype, and outputs imputed genotypes in VCF format." +keywords: + - imputation + - genomics + - vcf + - bgen + - cram + - bam + - sam +tools: + - "stitch": + description: "STITCH - Sequencing To Imputation Through Constructing Haplotypes" + homepage: "https://github.com/rwdavies/stitch" + documentation: "https://github.com/rwdavies/stitch" + tool_dev_url: "https://github.com/rwdavies/stitch" + doi: "10.1038/ng.3594" + licence: "['GPL v3']" +input: + - meta: + type: map + description: | + Groovy Map containing information about the set of positions to run the imputation over + e.g. `[ id:'test' ]` + - posfile: + type: file + description: | + Tab-separated file describing the variable positions to be used for imputation. Refer to the documentation for the `--posfile` argument of STITCH for more information. + pattern: "*.tsv" + - input: + type: directory + description: | + Folder of pre-generated input RData objects used when STITCH is called with the `--regenerateInput FALSE` flag. 
+ It is generated by running STITCH with the `--generateInputOnly TRUE` flag. + pattern: "input" + - rdata: + type: directory + description: | + Folder of pre-generated input RData objects used when STITCH is called with the `--regenerateInput FALSE` flag. It is generated by running STITCH with the `--generateInputOnly TRUE` flag. + pattern: "RData" + - chromosome_name: + type: string + description: Name of the chromosome to impute. Should match a chromosome name in the reference genome. + - K: + type: integer + description: Number of ancestral haplotypes to use for imputation. Refer to the documentation for the `--K` argument of STITCH for more information. + - nGen: + type: integer + description: Number of generations since founding of the population to use for imputation. Refer to the documentation for the `--nGen` argument of STITCH for more information. + - meta2: + type: map + description: | + Groovy Map containing information about the set of samples + e.g. `[ id:'test' ]` + - collected_crams: + type: file + description: List of sorted BAM/CRAM/SAM files + pattern: "*.{bam,cram,sam}" + - collected_crais: + type: file + description: List of BAM/CRAM/SAM index files + pattern: "*.{bai,crai,sai}" + - cramlist: + type: file + description: | + Text file with the path to the cram files to use in imputation, one per line. Since the cram files are staged to the working directory for the process, this file should just contain the file names without any pre-pending path. + pattern: "*.txt" + - meta3: + type: map + description: | + Groovy Map containing information about the reference genome used + e.g. `[ id:'test' ]` + - fasta: + type: file + description: FASTA reference genome file + pattern: "*.{fa,fasta}" + - fasta_fai: + type: file + description: FASTA index file + pattern: "*.{fai}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g.
`[ id:'test' ]` + - input: + type: directory + description: | + Folder of pre-generated input RData objects used when STITCH is called with the `--regenerateInput FALSE` flag. It is generated by running STITCH with the `--generateInputOnly TRUE` flag. + pattern: "input" + - rdata: + type: directory + description: | + Folder of pre-generated input RData objects used when STITCH is called with the `--regenerateInput FALSE` flag. It is generated by running STITCH with the `--generateInputOnly TRUE` flag. + pattern: "RData" + - plots: + type: directory + description: | + Folder containing plots produced by STITCH during imputation. Which plots are produced depends on the command-line arguments passed to STITCH. + pattern: "plots" + - vcf: + type: file + description: | + Imputed genotype calls for the positions in `posfile`, in vcf format. This is the default output. + pattern: ".vcf.gz" + - bgen: + type: file + description: | + Imputed genotype calls for the positions in `posfile`, in bgen format. This is produced if `--output_format bgen` is specified.
+ pattern: ".bgen" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@saulpierotti" +maintainers: + - "@saulpierotti" diff --git a/modules/nf-core/stitch/stitch.diff b/modules/nf-core/stitch/stitch.diff new file mode 100644 index 00000000..0a987c1b --- /dev/null +++ b/modules/nf-core/stitch/stitch.diff @@ -0,0 +1,24 @@ +Changes in module 'nf-core/stitch' +--- modules/nf-core/stitch/main.nf ++++ modules/nf-core/stitch/main.nf +@@ -8,8 +8,8 @@ + 'biocontainers/r-stitch:1.6.10--r43h06b5641_0' }" + + input: +- tuple val(meta) , path(posfile), path(input, stageAs: "input"), path(rdata, stageAs: "RData_in"), val(chromosome_name), val(K), val(nGen) +- tuple val(meta2), path(collected_crams), path(collected_crais), path(cramlist) ++ tuple val(meta), path(collected_crams), path(collected_crais), path(cramlist) ++ tuple val(meta2), path(posfile), path(input, stageAs: "input"), path(rdata, stageAs: "RData_in"), val(chromosome_name), val(K), val(nGen) + tuple val(meta3), path(fasta), path(fasta_fai) + val seed + + +--- modules/nf-core/stitch/meta.yml ++++ modules/nf-core/stitch/meta.yml +@@ -117,4 +117,4 @@ + authors: + - "@saulpierotti" + maintainers: +- - "@saulpierotti" ++ - "@saulpierotti" +************************************************************ diff --git a/modules/nf-core/tabix/bgzip/environment.yml b/modules/nf-core/tabix/bgzip/environment.yml new file mode 100644 index 00000000..361c078b --- /dev/null +++ b/modules/nf-core/tabix/bgzip/environment.yml @@ -0,0 +1,8 @@ +name: tabix_bgzip +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::tabix=1.11 + - bioconda::htslib=1.19.1 diff --git a/modules/nf-core/tabix/bgzip/main.nf b/modules/nf-core/tabix/bgzip/main.nf new file mode 100644 index 00000000..3065dab0 --- /dev/null +++ b/modules/nf-core/tabix/bgzip/main.nf @@ -0,0 +1,55 @@ +process TABIX_BGZIP { + tag "$meta.id" + label 'process_single' + + conda 
"${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/htslib:1.19.1--h81da01d_1' : + 'biocontainers/htslib:1.19.1--h81da01d_1' }" + + input: + tuple val(meta), path(input) + + output: + tuple val(meta), path("${output}") , emit: output + tuple val(meta), path("${output}.gzi"), emit: gzi, optional: true + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + prefix = task.ext.prefix ?: "${meta.id}" + in_bgzip = ["gz", "bgz", "bgzf"].contains(input.getExtension()) + extension = in_bgzip ? input.getBaseName().tokenize(".")[-1] : input.getExtension() + output = in_bgzip ? "${prefix}.${extension}" : "${prefix}.${extension}.gz" + command = in_bgzip ? '-d' : '' + // Name the index according to $prefix, unless a name has been requested + if ((args.matches("(^| )-i\\b") || args.matches("(^| )--index(\$| )")) && !args.matches("(^| )-I\\b") && !args.matches("(^| )--index-name\\b")) { + args = args + " -I ${output}.gzi" + } + """ + bgzip $command -c $args -@${task.cpus} $input > ${output} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//') + END_VERSIONS + """ + + stub: + prefix = task.ext.prefix ?: "${meta.id}" + in_bgzip = ["gz", "bgz", "bgzf"].contains(input.getExtension()) + output = in_bgzip ? 
input.getBaseName() : "${prefix}.${input.getExtension()}.gz" + + """ + echo "" | gzip > ${output} + touch ${output}.gzi + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/tabix/bgzip/meta.yml b/modules/nf-core/tabix/bgzip/meta.yml new file mode 100644 index 00000000..621d49ea --- /dev/null +++ b/modules/nf-core/tabix/bgzip/meta.yml @@ -0,0 +1,52 @@ +name: tabix_bgzip +description: Compresses/decompresses files +keywords: + - compress + - decompress + - bgzip + - tabix +tools: + - bgzip: + description: | + Bgzip compresses or decompresses files in a similar manner to, and compatible with, gzip. + homepage: https://www.htslib.org/doc/tabix.html + documentation: http://www.htslib.org/doc/bgzip.html + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - input: + type: file + description: file to compress or to decompress +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - output: + type: file + description: Output compressed/decompressed file + pattern: "*." 
+ - gzi: + type: file + description: Optional gzip index file for compressed inputs + pattern: "*.gzi" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@joseespinosa" + - "@drpatelh" + - "@maxulysse" + - "@nvnieuwk" +maintainers: + - "@joseespinosa" + - "@drpatelh" + - "@maxulysse" + - "@nvnieuwk" diff --git a/modules/nf-core/tabix/bgzip/tests/bgzip_compress.config b/modules/nf-core/tabix/bgzip/tests/bgzip_compress.config new file mode 100644 index 00000000..6b6ff55f --- /dev/null +++ b/modules/nf-core/tabix/bgzip/tests/bgzip_compress.config @@ -0,0 +1,5 @@ +process { + withName: TABIX_BGZIP { + ext.args = ' -i' + } +} diff --git a/modules/nf-core/tabix/bgzip/tests/main.nf.test b/modules/nf-core/tabix/bgzip/tests/main.nf.test new file mode 100644 index 00000000..95fd4c50 --- /dev/null +++ b/modules/nf-core/tabix/bgzip/tests/main.nf.test @@ -0,0 +1,111 @@ +nextflow_process { + + name "Test Process TABIX_BGZIP" + script "modules/nf-core/tabix/bgzip/main.nf" + process "TABIX_BGZIP" + + tag "modules" + tag "modules_nfcore" + tag "tabix" + tag "tabix/bgzip" + + test("sarscov2_vcf_bgzip_compress") { + when { + process { + """ + input[0] = [ + [ id:'bgzip_test' ], + [ file(params.test_data['sarscov2']['illumina']['test_vcf'], checkIfExists: true) ] + ] + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() }, + { assert snapshot( + file(process.out.output[0][1]).name + ).match("bgzip_test") + } + ) + } + } + + test("homo_genome_bedgz_compress") { + when { + process { + """ + input[0] = [ + [ id:'bedgz_test' ], + [ file(params.test_data['homo_sapiens']['genome']['genome_bed_gz'], checkIfExists: true) ] + ] + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() }, + { assert snapshot( + file(process.out.output[0][1]).name + ).match("bedgz_test") + } + ) + } + } + + 
test("sarscov2_vcf_bgzip_compress_stub") { + options '-stub' + config "./bgzip_compress.config" + + when { + process { + """ + input[0] = [ + [ id:"test_stub" ], + [ file(params.test_data['sarscov2']['illumina']['test_vcf'], checkIfExists: true) ] + ] + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() }, + { assert snapshot( + file(process.out.output[0][1]).name + ).match("test_stub") + } + ) + } + } + + test("sarscov2_vcf_bgzip_compress_gzi") { + config "./bgzip_compress.config" + when { + process { + """ + input[0] = [ + [ id:"gzi_compress_test" ], + [ file(params.test_data['sarscov2']['illumina']['test_vcf'], checkIfExists: true) ] + ] + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() }, + { assert snapshot( + file(process.out.gzi[0][1]).name + ).match("gzi_compress_test") + } + ) + } + } +} diff --git a/modules/nf-core/tabix/bgzip/tests/main.nf.test.snap b/modules/nf-core/tabix/bgzip/tests/main.nf.test.snap new file mode 100644 index 00000000..53d59932 --- /dev/null +++ b/modules/nf-core/tabix/bgzip/tests/main.nf.test.snap @@ -0,0 +1,186 @@ +{ + "gzi_compress_test": { + "content": [ + "gzi_compress_test.vcf.gz.gzi" + ], + "timestamp": "2024-02-19T14:52:29.328146" + }, + "homo_genome_bedgz_compress": { + "content": [ + { + "0": [ + [ + { + "id": "bedgz_test" + }, + "bedgz_test.bed:md5,87a15eb9c2ff20ccd5cd8735a28708f7" + ] + ], + "1": [ + + ], + "2": [ + "versions.yml:md5,e023292de6ee109a44fc67475d658174" + ], + "gzi": [ + + ], + "output": [ + [ + { + "id": "bedgz_test" + }, + "bedgz_test.bed:md5,87a15eb9c2ff20ccd5cd8735a28708f7" + ] + ], + "versions": [ + "versions.yml:md5,e023292de6ee109a44fc67475d658174" + ] + } + ], + "timestamp": "2024-02-19T14:52:12.422209" + }, + "test_stub": { + "content": [ + "test_stub.vcf.gz" + ], + "timestamp": "2024-02-19T14:52:20.811489" + }, + "sarscov2_vcf_bgzip_compress": { + "content": [ + { + "0": [ + [ + { + 
"id": "bgzip_test" + }, + "bgzip_test.vcf.gz:md5,8e722884ffb75155212a3fc053918766" + ] + ], + "1": [ + + ], + "2": [ + "versions.yml:md5,e023292de6ee109a44fc67475d658174" + ], + "gzi": [ + + ], + "output": [ + [ + { + "id": "bgzip_test" + }, + "bgzip_test.vcf.gz:md5,8e722884ffb75155212a3fc053918766" + ] + ], + "versions": [ + "versions.yml:md5,e023292de6ee109a44fc67475d658174" + ] + } + ], + "timestamp": "2024-02-19T14:52:03.706028" + }, + "sarscov2_vcf_bgzip_compress_gzi": { + "content": [ + { + "0": [ + [ + { + "id": "gzi_compress_test" + }, + "gzi_compress_test.vcf.gz:md5,8e722884ffb75155212a3fc053918766" + ] + ], + "1": [ + [ + { + "id": "gzi_compress_test" + }, + "gzi_compress_test.vcf.gz.gzi:md5,26fd00d4e26141cd11561f6e7d4a2ad0" + ] + ], + "2": [ + "versions.yml:md5,e023292de6ee109a44fc67475d658174" + ], + "gzi": [ + [ + { + "id": "gzi_compress_test" + }, + "gzi_compress_test.vcf.gz.gzi:md5,26fd00d4e26141cd11561f6e7d4a2ad0" + ] + ], + "output": [ + [ + { + "id": "gzi_compress_test" + }, + "gzi_compress_test.vcf.gz:md5,8e722884ffb75155212a3fc053918766" + ] + ], + "versions": [ + "versions.yml:md5,e023292de6ee109a44fc67475d658174" + ] + } + ], + "timestamp": "2024-02-19T14:52:29.271494" + }, + "bgzip_test": { + "content": [ + "bgzip_test.vcf.gz" + ], + "timestamp": "2024-02-19T14:52:03.768295" + }, + "bedgz_test": { + "content": [ + "bedgz_test.bed" + ], + "timestamp": "2024-02-19T14:52:12.453855" + }, + "sarscov2_vcf_bgzip_compress_stub": { + "content": [ + { + "0": [ + [ + { + "id": "test_stub" + }, + "test_stub.vcf.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "1": [ + [ + { + "id": "test_stub" + }, + "test_stub.vcf.gz.gzi:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e023292de6ee109a44fc67475d658174" + ], + "gzi": [ + [ + { + "id": "test_stub" + }, + "test_stub.vcf.gz.gzi:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "output": [ + [ + { + "id": "test_stub" + }, + "test_stub.vcf.gz:md5,68b329da9893e34099c7d8ad5cb9c940" 
+ ] + ], + "versions": [ + "versions.yml:md5,e023292de6ee109a44fc67475d658174" + ] + } + ], + "timestamp": "2024-02-19T14:52:20.769619" + } +} \ No newline at end of file diff --git a/modules/nf-core/tabix/bgzip/tests/tags.yml b/modules/nf-core/tabix/bgzip/tests/tags.yml new file mode 100644 index 00000000..de0eec86 --- /dev/null +++ b/modules/nf-core/tabix/bgzip/tests/tags.yml @@ -0,0 +1,2 @@ +tabix/bgzip: + - "modules/nf-core/tabix/bgzip/**" diff --git a/modules/nf-core/tabix/bgzip/tests/vcf_none.config b/modules/nf-core/tabix/bgzip/tests/vcf_none.config new file mode 100644 index 00000000..f3a3c467 --- /dev/null +++ b/modules/nf-core/tabix/bgzip/tests/vcf_none.config @@ -0,0 +1,5 @@ +process { + withName: TABIX_BGZIP { + ext.args = '' + } +} diff --git a/modules/nf-core/tabix/tabix/environment.yml b/modules/nf-core/tabix/tabix/environment.yml new file mode 100644 index 00000000..76b45e16 --- /dev/null +++ b/modules/nf-core/tabix/tabix/environment.yml @@ -0,0 +1,8 @@ +name: tabix_tabix +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - bioconda::tabix=1.11 + - bioconda::htslib=1.19.1 diff --git a/modules/nf-core/tabix/tabix/main.nf b/modules/nf-core/tabix/tabix/main.nf new file mode 100644 index 00000000..1737141d --- /dev/null +++ b/modules/nf-core/tabix/tabix/main.nf @@ -0,0 +1,42 @@ +process TABIX_TABIX { + tag "$meta.id" + label 'process_single' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/htslib:1.19.1--h81da01d_1' : + 'biocontainers/htslib:1.19.1--h81da01d_1' }" + + input: + tuple val(meta), path(tab) + + output: + tuple val(meta), path("*.tbi"), optional:true, emit: tbi + tuple val(meta), path("*.csi"), optional:true, emit: csi + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + """ + tabix $args $tab + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//') + END_VERSIONS + """ + + stub: + """ + touch ${tab}.tbi + touch ${tab}.csi + cat <<-END_VERSIONS > versions.yml + + "${task.process}": + tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/tabix/tabix/meta.yml b/modules/nf-core/tabix/tabix/meta.yml new file mode 100644 index 00000000..ae5b4f43 --- /dev/null +++ b/modules/nf-core/tabix/tabix/meta.yml @@ -0,0 +1,49 @@ +name: tabix_tabix +description: create tabix index from a sorted bgzip tab-delimited genome file +keywords: + - index + - tabix + - vcf +tools: + - tabix: + description: Generic indexer for TAB-delimited genome position files. + homepage: https://www.htslib.org/doc/tabix.html + documentation: https://www.htslib.org/doc/tabix.1.html + doi: 10.1093/bioinformatics/btq671 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - tab: + type: file + description: TAB-delimited genome position file compressed with bgzip + pattern: "*.{bed.gz,gff.gz,sam.gz,vcf.gz}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - tbi: + type: file + description: tabix index file + pattern: "*.{tbi}" + - csi: + type: file + description: coordinate sorted index file + pattern: "*.{csi}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@joseespinosa" + - "@drpatelh" + - "@maxulysse" +maintainers: + - "@joseespinosa" + - "@drpatelh" + - "@maxulysse" diff --git a/modules/nf-core/tabix/tabix/tests/main.nf.test b/modules/nf-core/tabix/tabix/tests/main.nf.test new file mode 100644 index 00000000..3a150c70 --- /dev/null +++ b/modules/nf-core/tabix/tabix/tests/main.nf.test @@ -0,0 +1,142 @@ +nextflow_process { + + name "Test Process TABIX_TABIX" + script "modules/nf-core/tabix/tabix/main.nf" + process "TABIX_TABIX" + + tag "modules" + tag "modules_nfcore" + tag "tabix" + tag "tabix/tabix" + + test("sarscov2_bedgz_tbi") { + config "./tabix_bed.config" + when { + process { + """ + input[0] = [ + [ id:'tbi_bed' ], + [ file(params.test_data['sarscov2']['genome']['test_bed_gz'], checkIfExists: true) ] + ] + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() }, + { assert snapshot( + file(process.out.tbi[0][1]).name + ).match("tbi_bed") + } + ) + } + } + + test("sarscov2_gff_tbi") { + config "./tabix_gff.config" + when { + process { + """ + input[0] = [ + [ id:'tbi_gff' ], + [ file(params.test_data['sarscov2']['genome']['genome_gff3_gz'], checkIfExists: true) ] + ] + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() }, + { assert snapshot( + file(process.out.tbi[0][1]).name + ).match("tbi_gff") + } + ) + } + + } + + test("sarscov2_vcf_tbi") { + config "./tabix_vcf_tbi.config" + when { + process { + """ + input[0] = [ + [ id:'tbi_vcf' ], + [ file(params.test_data['sarscov2']['illumina']['test_vcf_gz'], checkIfExists: true) ] + ] + """ + } + } + + then { + assertAll ( + { assert 
process.success }, + { assert snapshot(process.out).match() }, + { assert snapshot( + file(process.out.tbi[0][1]).name + ).match("tbi_vcf") + } + ) + } + + } + + test("sarscov2_vcf_csi") { + config "./tabix_vcf_csi.config" + when { + process { + """ + input[0] = [ + [ id:'vcf_csi' ], + [ file(params.test_data['sarscov2']['illumina']['test_vcf_gz'], checkIfExists: true) ] + ] + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() }, + { assert snapshot( + file(process.out.csi[0][1]).name + ).match("vcf_csi") + } + ) + } + + } + + test("sarscov2_vcf_csi_stub") { + config "./tabix_vcf_csi.config" + options "-stub" + when { + process { + """ + input[0] = [ + [ id:'vcf_csi_stub' ], + [ file(params.test_data['sarscov2']['illumina']['test_vcf_gz'], checkIfExists: true) ] + ] + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() }, + { assert snapshot( + file(process.out.csi[0][1]).name + ).match("vcf_csi_stub") + } + ) + } + + } + +} diff --git a/modules/nf-core/tabix/tabix/tests/main.nf.test.snap b/modules/nf-core/tabix/tabix/tests/main.nf.test.snap new file mode 100644 index 00000000..034e38b6 --- /dev/null +++ b/modules/nf-core/tabix/tabix/tests/main.nf.test.snap @@ -0,0 +1,217 @@ +{ + "vcf_csi_stub": { + "content": [ + "test.vcf.gz.csi" + ], + "timestamp": "2024-03-04T14:51:59.788002" + }, + "tbi_gff": { + "content": [ + "genome.gff3.gz.tbi" + ], + "timestamp": "2024-02-19T14:53:37.420216" + }, + "sarscov2_gff_tbi": { + "content": [ + { + "0": [ + [ + { + "id": "tbi_gff" + }, + "genome.gff3.gz.tbi:md5,53fc683fd217aae47ef10d23c52a9178" + ] + ], + "1": [ + + ], + "2": [ + "versions.yml:md5,f4feeda7fdd4b567102f7f8e5d7037a3" + ], + "csi": [ + + ], + "tbi": [ + [ + { + "id": "tbi_gff" + }, + "genome.gff3.gz.tbi:md5,53fc683fd217aae47ef10d23c52a9178" + ] + ], + "versions": [ + "versions.yml:md5,f4feeda7fdd4b567102f7f8e5d7037a3" + ] + } + ], + "timestamp": 
"2024-02-19T14:53:37.388157" + }, + "sarscov2_bedgz_tbi": { + "content": [ + { + "0": [ + [ + { + "id": "tbi_bed" + }, + "test.bed.gz.tbi:md5,0f17d85e7f0a042b2aa367b70df224f8" + ] + ], + "1": [ + + ], + "2": [ + "versions.yml:md5,f4feeda7fdd4b567102f7f8e5d7037a3" + ], + "csi": [ + + ], + "tbi": [ + [ + { + "id": "tbi_bed" + }, + "test.bed.gz.tbi:md5,0f17d85e7f0a042b2aa367b70df224f8" + ] + ], + "versions": [ + "versions.yml:md5,f4feeda7fdd4b567102f7f8e5d7037a3" + ] + } + ], + "timestamp": "2024-02-19T14:53:28.879408" + }, + "tbi_vcf": { + "content": [ + "test.vcf.gz.tbi" + ], + "timestamp": "2024-02-19T14:53:46.402522" + }, + "vcf_csi": { + "content": [ + "test.vcf.gz.csi" + ], + "timestamp": "2024-02-19T14:53:54.921189" + }, + "sarscov2_vcf_tbi": { + "content": [ + { + "0": [ + [ + { + "id": "tbi_vcf" + }, + "test.vcf.gz.tbi:md5,897f3f378a811b90e6dee56ce08d2bcf" + ] + ], + "1": [ + + ], + "2": [ + "versions.yml:md5,f4feeda7fdd4b567102f7f8e5d7037a3" + ], + "csi": [ + + ], + "tbi": [ + [ + { + "id": "tbi_vcf" + }, + "test.vcf.gz.tbi:md5,897f3f378a811b90e6dee56ce08d2bcf" + ] + ], + "versions": [ + "versions.yml:md5,f4feeda7fdd4b567102f7f8e5d7037a3" + ] + } + ], + "timestamp": "2024-02-19T14:53:46.370358" + }, + "sarscov2_vcf_csi_stub": { + "content": [ + { + "0": [ + [ + { + "id": "vcf_csi_stub" + }, + "test.vcf.gz.tbi:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "vcf_csi_stub" + }, + "test.vcf.gz.csi:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,3d45df6d80883bad358631069a2940fd" + ], + "csi": [ + [ + { + "id": "vcf_csi_stub" + }, + "test.vcf.gz.csi:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "tbi": [ + [ + { + "id": "vcf_csi_stub" + }, + "test.vcf.gz.tbi:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,3d45df6d80883bad358631069a2940fd" + ] + } + ], + "timestamp": "2024-03-04T14:51:59.766184" + }, + "sarscov2_vcf_csi": { + "content": [ + { + "0": [ + + ], + "1": [ + [ + { + 
"id": "vcf_csi" + }, + "test.vcf.gz.csi:md5,0731ad6f40104d2bbb1a2cc478ef8f03" + ] + ], + "2": [ + "versions.yml:md5,f4feeda7fdd4b567102f7f8e5d7037a3" + ], + "csi": [ + [ + { + "id": "vcf_csi" + }, + "test.vcf.gz.csi:md5,0731ad6f40104d2bbb1a2cc478ef8f03" + ] + ], + "tbi": [ + + ], + "versions": [ + "versions.yml:md5,f4feeda7fdd4b567102f7f8e5d7037a3" + ] + } + ], + "timestamp": "2024-02-19T14:53:54.886876" + }, + "tbi_bed": { + "content": [ + "test.bed.gz.tbi" + ], + "timestamp": "2024-02-19T14:53:28.947628" + } +} \ No newline at end of file diff --git a/modules/nf-core/tabix/tabix/tests/tabix_bed.config b/modules/nf-core/tabix/tabix/tests/tabix_bed.config new file mode 100644 index 00000000..7ff05905 --- /dev/null +++ b/modules/nf-core/tabix/tabix/tests/tabix_bed.config @@ -0,0 +1,5 @@ +process { + withName: TABIX_TABIX { + ext.args = '-p bed' + } +} \ No newline at end of file diff --git a/modules/nf-core/tabix/tabix/tests/tabix_gff.config b/modules/nf-core/tabix/tabix/tests/tabix_gff.config new file mode 100644 index 00000000..20c0a1e3 --- /dev/null +++ b/modules/nf-core/tabix/tabix/tests/tabix_gff.config @@ -0,0 +1,5 @@ +process { + withName: TABIX_TABIX { + ext.args = '-p gff' + } +} \ No newline at end of file diff --git a/modules/nf-core/tabix/tabix/tests/tabix_vcf_csi.config b/modules/nf-core/tabix/tabix/tests/tabix_vcf_csi.config new file mode 100644 index 00000000..eb4f2d7e --- /dev/null +++ b/modules/nf-core/tabix/tabix/tests/tabix_vcf_csi.config @@ -0,0 +1,5 @@ +process { + withName: TABIX_TABIX { + ext.args = '-p vcf --csi' + } +} diff --git a/modules/nf-core/tabix/tabix/tests/tabix_vcf_tbi.config b/modules/nf-core/tabix/tabix/tests/tabix_vcf_tbi.config new file mode 100644 index 00000000..2774c8a9 --- /dev/null +++ b/modules/nf-core/tabix/tabix/tests/tabix_vcf_tbi.config @@ -0,0 +1,5 @@ +process { + withName: TABIX_TABIX { + ext.args = '-p vcf' + } +} \ No newline at end of file diff --git a/modules/nf-core/tabix/tabix/tests/tags.yml 
b/modules/nf-core/tabix/tabix/tests/tags.yml new file mode 100644 index 00000000..6eda0653 --- /dev/null +++ b/modules/nf-core/tabix/tabix/tests/tags.yml @@ -0,0 +1,2 @@ +tabix/tabix: + - "modules/nf-core/tabix/tabix/**" diff --git a/nextflow.config b/nextflow.config index d3ade32d..cd7c6e23 100644 --- a/nextflow.config +++ b/nextflow.config @@ -9,21 +9,53 @@ // Global default params, used in configs params { - // TODO nf-core: Specify your pipeline's command line flags + // step + step = null + // Input options - input = null + input = null + input_region = null + map = null + tools = null + + // Panel preparation + panel = null + phased = null + rename_chr = false + // References - genome = null - igenomes_base = 's3://ngi-igenomes/igenomes/' - igenomes_ignore = false + genome = null + igenomes_base = 's3://ngi-igenomes/igenomes/' + igenomes_ignore = false + fasta = null + fasta_fai = null // MultiQC options - multiqc_config = null - multiqc_title = null - multiqc_logo = null - max_multiqc_email_size = '25.MB' + multiqc_config = null + multiqc_title = null + multiqc_logo = null + max_multiqc_email_size = '25.MB' multiqc_methods_description = null + // Simulate + depth = 1 + genotype = null + + // Validation + input_truth = null + bins = "0 0.01 0.05 0.1 0.2 0.5" + min_val_gl = 0.9 + min_val_dp = 5 + + // QUILT + ngen = 100 + buffer = 10000 + + // STITCH + k_val = 2 + seed = 1 + posfile = null + // Boilerplate options outdir = null publish_dir_mode = 'copy' @@ -34,21 +66,22 @@ params { hook_url = null help = false version = false - pipelines_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/' + pipelines_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/' + // Config options - config_profile_name = null - config_profile_description = null - custom_config_version = 'master' - custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" - 
config_profile_contact = null - config_profile_url = null + config_profile_name = null + config_profile_description = null + custom_config_version = 'master' + custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" + config_profile_contact = null + config_profile_url = null // Max resource options // Defaults only, expecting to be overwritten - max_memory = '128.GB' - max_cpus = 16 - max_time = '240.h' + max_memory = '128.GB' + max_cpus = 16 + max_time = '240.h' // Schema validation default options validationFailUnrecognisedParams = false @@ -62,6 +95,9 @@ params { // Load base.config by default for all pipelines includeConfig 'conf/base.config' +// Load igenomes.config by default for all pipelines +includeConfig 'conf/igenomes.config' + // Load nf-core custom profiles from different Institutions try { includeConfig "${params.custom_config_base}/nfcore_custom.config" @@ -75,6 +111,7 @@ try { } catch (Exception e) { System.err.println("WARNING: Could not load nf-core/config/phaseimpute profiles: ${params.custom_config_base}/pipeline/phaseimpute.config") } + profiles { debug { dumpHashes = true @@ -174,8 +211,15 @@ profiles { executor.cpus = 4 executor.memory = 8.GB } - test { includeConfig 'conf/test.config' } - test_full { includeConfig 'conf/test_full.config' } + + test { includeConfig 'conf/test.config' } + test_full { includeConfig 'conf/test_full.config' } + test_sim { includeConfig 'conf/test_sim.config' } + test_validate { includeConfig 'conf/test_validate.config' } + test_all { includeConfig 'conf/test_all.config' } + test_quilt { includeConfig 'conf/test_quilt.config' } + test_stitch { includeConfig 'conf/test_stitch.config' } + } // Set default registry for Apptainer, Docker, Podman and Singularity independent of -profile @@ -246,6 +290,24 @@ manifest { // Load modules.config for DSL2 module specific options includeConfig 'conf/modules.config' +// initialisation step +includeConfig
'conf/steps/initialisation.config' + +// simulation step +includeConfig 'conf/steps/simulation.config' + +// panel_prep step +includeConfig 'conf/steps/panel_prep.config' + +// imputation step +includeConfig 'conf/steps/imputation.config' +includeConfig 'conf/steps/imputation_glimpse1.config' +includeConfig 'conf/steps/imputation_quilt.config' +includeConfig 'conf/steps/imputation_stitch.config' + +// validation step +includeConfig 'conf/steps/validation.config' + // Function to ensure that resource requirements don't go beyond // a maximum limit def check_max(obj, type) { diff --git a/nextflow_schema.json b/nextflow_schema.json index e772ba30..215a0b52 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -10,7 +10,7 @@ "type": "object", "fa_icon": "fas fa-terminal", "description": "Define where the pipeline should find input data and save output data.", - "required": ["input", "outdir"], + "required": ["outdir"], "properties": { "input": { "type": "string", @@ -23,12 +23,24 @@ "help_text": "You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row. See [usage docs](https://nf-co.re/phaseimpute/usage#samplesheet-input).", "fa_icon": "fas fa-file-csv" }, + "input_region": { + "type": "string", + "description": "Region of the genome to use (optional: if no file given, the whole genome will be used). The file should be a comma-separated file with 3 columns, and a header row.", + "schema": "assets/schema_input_region.json", + "format": "file-path", + "pattern": "^\\S+\\.csv$" + }, "outdir": { "type": "string", "format": "directory-path", "description": "The output directory where the results will be saved. 
You have to use absolute paths to storage on Cloud infrastructure.", "fa_icon": "fas fa-folder-open" }, + "rename_chr": { + "type": "boolean", + "description": "Rename the contigs of the panel VCF files to match the reference genome (e.g. 'chr1' -> '1').", + "pattern": "true|false" + }, "email": { "type": "string", "description": "Email address for completion summary.", @@ -40,6 +52,99 @@ "type": "string", "description": "MultiQC report title. Printed as page header, used for filename if not otherwise specified.", "fa_icon": "fas fa-file-signature" + }, + "step": { + "type": "string", + "description": "Step to run.", + "fa_icon": "fas fa-step-forward", + "pattern": "^((all|simulate|panelprep|impute|validate)?,?)*(? 0, the program exits with an error. Set to zero to have no filter or if using --gt-validation", + "default": 5, + "pattern": "^\\d+$" } } }, @@ -62,9 +167,29 @@ "mimetype": "text/plain", "pattern": "^\\S+\\.fn?a(sta)?(\\.gz)?$", "description": "Path to FASTA genome file.", - "help_text": "This parameter is *mandatory* if `--genome` is not specified. If you don't have a BWA index available this will be generated for you automatically. Combine with `--save_reference` to save BWA index for future runs.", + "help_text": "This parameter is *mandatory* if `--genome` is not specified.", "fa_icon": "far fa-file-code" }, + "fasta_fai": { + "type": "string", + "format": "file-path", + "exists": true, + "mimetype": "text/plain", + "pattern": "^\\S+\\.fn?a(sta)?(\\.gz)?\\.fai$", + "description": "Path to the FASTA genome index file.", + "help_text": "This parameter is *optional* even if `--genome` is not specified.", + "fa_icon": "far fa-file-code" + }, + "map": { + "type": "string", + "format": "file-path", + "exists": true, + "description": "Path to gmap genome file.", + "help_text": "This parameter is *optional*.
This is used to refine the imputation process to match the recombination rate of your species.", + "fa_icon": "far fa-file-code", + "mimetype": "text/csv", + "schema": "assets/schema_map.json" + }, "igenomes_ignore": { "type": "boolean", "description": "Do not load the iGenomes reference config.", @@ -270,16 +395,69 @@ "type": "string", "fa_icon": "far fa-check-circle", "description": "Base URL or local path to location of pipeline test dataset files", - "default": "https://raw.githubusercontent.com/nf-core/test-datasets/", + "default": "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/", "hidden": true } } + }, + "quilt_parameters": { + "title": "QUILT parameters", + "type": "object", + "description": "Arguments to customize the QUILT run", + "default": "", + "properties": { + "buffer": { + "type": "integer", + "default": 10000, + "description": "Buffer (in bp) around the region to perform imputation over. Imputation is run from regionStart-buffer to regionEnd+buffer, and reported for regionStart to regionEnd, including the bases of regionStart and regionEnd." + }, + "ngen": { + "type": "integer", + "default": 100, + "description": "Number of generations since founding of the population to use for imputation." + } + } + }, + "stitch_parameters": { + "title": "STITCH parameters", + "type": "object", + "description": "Arguments to customize the STITCH run", + "default": "", + "properties": { + "seed": { + "type": "integer", + "default": 1 + }, + "posfile": { + "type": "string", + "description": "Path to a comma-separated file listing the tab-separated files that describe the variable positions to be used for imputation.
Refer to the documentation for the `--posfile` argument of STITCH for more information.", + "format": "file-path", + "schema": "assets/schema_posfile.json", + "pattern": "^\\S+\\.(csv|tsv|txt)$", + "mimetype": "text/csv", + "help_text": "" + }, + "k_val": { + "type": "integer", + "default": 2, + "description": "Number of ancestral haplotypes to use for imputation. Refer to the documentation for the `--K` argument of STITCH for more information." + } + } } }, "allOf": [ { "$ref": "#/definitions/input_output_options" }, + { + "$ref": "#/definitions/simulate" + }, + { + "$ref": "#/definitions/panelprep" + }, + { + "$ref": "#/definitions/validation" + }, { "$ref": "#/definitions/reference_genome_options" }, @@ -291,6 +469,12 @@ }, { "$ref": "#/definitions/generic_options" + }, + { + "$ref": "#/definitions/quilt_parameters" + }, + { + "$ref": "#/definitions/stitch_parameters" } ] } diff --git a/nf-core-phaseimpute_logo_light.png b/nf-core-phaseimpute_logo_light.png new file mode 100644 index 00000000..767e1d57 Binary files /dev/null and b/nf-core-phaseimpute_logo_light.png differ diff --git a/nf-test.config b/nf-test.config new file mode 100644 index 00000000..b466a958 --- /dev/null +++ b/nf-test.config @@ -0,0 +1,13 @@ +config { + // location for all nf-tests + testsDir "." 
+ + // nf-test directory including temporary files for each test + workDir System.getenv("NXF_TEST_DIR") ?: ".nf-test" + + // location of an optional nextflow.config file specific for executing tests + configFile "tests/config/nf-test.config" + + // run all tests with the defined docker profile from the main nextflow.config + profile "" +} diff --git a/subworkflows/local/bam_downsample/main.nf b/subworkflows/local/bam_downsample/main.nf new file mode 100644 index 00000000..106cf2a3 --- /dev/null +++ b/subworkflows/local/bam_downsample/main.nf @@ -0,0 +1,66 @@ +include { SAMTOOLS_COVERAGE } from '../../../modules/nf-core/samtools/coverage/main.nf' +include { SAMTOOLS_INDEX } from '../../../modules/nf-core/samtools/index/main.nf' +include { SAMTOOLS_VIEW } from '../../../modules/nf-core/samtools/view/main.nf' + +workflow BAM_DOWNSAMPLE { + + take: + ch_bam // channel: [ [id, genome, chr, region], bam, bai ] + ch_depth // channel: [ [depth], depth ] + ch_fasta // channel: [ [genome], fasta, fai ] + + main: + ch_versions = Channel.empty() + + // Add region to channel + ch_coverage = ch_bam + .map{ metaICR, bam, index -> + [ metaICR, bam, index, metaICR["region"] ] + } + + // Get coverage of the region + SAMTOOLS_COVERAGE ( ch_coverage, ch_fasta ) // [ meta, bam, bai, region], [ meta, fasta, fai ] + ch_versions = ch_versions.mix(SAMTOOLS_COVERAGE.out.versions.first()) + + // Compute mean depth of the region + ch_mean_depth = SAMTOOLS_COVERAGE.out.coverage + .splitCsv(header: true, sep:'\t') + .map{ metaICR, row -> + [ metaICR, "${row.meandepth}" as Float ] + } + + // Compute downsampling factor (target depth / observed mean depth) + ch_depth_factor = ch_mean_depth + .combine(ch_depth) + .map{ metaICR, mean, metaD, depth -> + [ metaICR, metaICR + metaD, depth as Float / mean ] + } + + // Add all necessary channels for downsampling + ch_input_downsample = ch_coverage + .combine(ch_depth_factor, by : 0) + .map{ metaICR, bam, index, region, metaICRD, factor -> + [ metaICRD, bam, index, region, factor ] + } + + //
Downsample + SAMTOOLS_VIEW( + ch_input_downsample, + ch_fasta.map{ metaG, fasta, fai -> [metaG, fasta] }, + [] + ) + ch_versions = ch_versions.mix(SAMTOOLS_VIEW.out.versions.first()) + + // Index result + SAMTOOLS_INDEX(SAMTOOLS_VIEW.out.bam) + ch_versions = ch_versions.mix(SAMTOOLS_INDEX.out.versions.first()) + + // Aggregate bam and index + ch_bam_emul = SAMTOOLS_VIEW.out.bam + .combine(SAMTOOLS_INDEX.out.bai, by:0) + + emit: + bam_emul = ch_bam_emul // channel: [ [id, genome, chr, region, depth], bam, bai ] + coverage = SAMTOOLS_COVERAGE.out.coverage // channel: [ [id, genome, chr, region, depth], txt ] + versions = ch_versions // channel: [ versions.yml ] +} diff --git a/subworkflows/local/bam_impute_stitch/bam_impute_stitch.nf b/subworkflows/local/bam_impute_stitch/bam_impute_stitch.nf new file mode 100644 index 00000000..ea162fd0 --- /dev/null +++ b/subworkflows/local/bam_impute_stitch/bam_impute_stitch.nf @@ -0,0 +1,31 @@ +include { STITCH } from '../../../modules/nf-core/stitch/main' +include { BCFTOOLS_INDEX } from '../../../modules/nf-core/bcftools/index/main' + + +workflow BAM_IMPUTE_STITCH { + + take: + ch_parameters + ch_samples + ch_fasta + + main: + + ch_versions = Channel.empty() + + // Run STITCH + seed = params.seed + STITCH( ch_samples, ch_parameters, ch_fasta, seed ) + + // Index imputed annotated VCF + BCFTOOLS_INDEX(STITCH.out.vcf) + + // Join VCFs and TBIs + ch_vcf_tbi = STITCH.out.vcf.join(BCFTOOLS_INDEX.out.tbi) + + + emit: + vcf_tbi = ch_vcf_tbi // channel: [ meta, vcf, tbi ] + versions = ch_versions // channel: [ versions.yml ] + +} diff --git a/subworkflows/local/bam_region/main.nf b/subworkflows/local/bam_region/main.nf new file mode 100644 index 00000000..1968cd38 --- /dev/null +++ b/subworkflows/local/bam_region/main.nf @@ -0,0 +1,39 @@ +include { SAMTOOLS_INDEX } from '../../../modules/nf-core/samtools/index/main.nf' +include { SAMTOOLS_VIEW } from '../../../modules/nf-core/samtools/view/main.nf' + +workflow BAM_REGION { + + take: + 
ch_bam // channel: [ [id], bam, bai ] + ch_region // channel: [ [chr, region], val(chr:start-end) ] + ch_fasta // channel: [ [genome], fasta, fai ] + main: + + ch_versions = Channel.empty() + + // Add fasta and region to bam channel + ch_input_region = ch_bam + .combine(ch_region) + .map{ metaI, bam, index, metaCR, region -> + [ metaI + metaCR, bam, index, region, [] ] + } + + // Extract region of interest + SAMTOOLS_VIEW( + ch_input_region, + ch_fasta.map{ metaG, fasta, fai -> [metaG, fasta] }, + [] + ) + ch_versions = ch_versions.mix(SAMTOOLS_VIEW.out.versions.first()) + + // Index region of interest + SAMTOOLS_INDEX(SAMTOOLS_VIEW.out.bam) + ch_versions = ch_versions.mix(SAMTOOLS_INDEX.out.versions.first()) + + ch_bam_region = SAMTOOLS_VIEW.out.bam + .combine(SAMTOOLS_INDEX.out.bai, by: 0) + + emit: + bam_region = ch_bam_region // channel: [ metaIGCR, bam, index ] + versions = ch_versions // channel: [ versions.yml ] +} diff --git a/subworkflows/local/compute_gl/main.nf b/subworkflows/local/compute_gl/main.nf new file mode 100644 index 00000000..b11623d4 --- /dev/null +++ b/subworkflows/local/compute_gl/main.nf @@ -0,0 +1,52 @@ +include { BCFTOOLS_MPILEUP } from '../../../modules/nf-core/bcftools/mpileup/main.nf' +include { BCFTOOLS_INDEX } from '../../../modules/nf-core/bcftools/index/main.nf' +include { BCFTOOLS_ANNOTATE } from '../../../modules/nf-core/bcftools/annotate/main.nf' + +workflow COMPUTE_GL { + + take: + ch_input // channel: [ [id, chr, region], bam, bai ] + ch_target // channel: [ [panel, chr], sites, tsv] + ch_fasta // channel: [ [ref], fasta, fai] + + main: + + ch_versions = Channel.empty() + ch_multiqc_files = Channel.empty() + + ch_mpileup = ch_input + .map{metaICR, bam, bai -> [metaICR.subMap("chr"), metaICR, bam, bai]} + .combine(ch_target.map{metaPC, sites, tsv -> [metaPC.subMap("chr"), metaPC, sites, tsv]}, by:0) + .map{metaC, metaICR, bam, bai, metaPC, sites, tsv -> + [metaICR + metaPC, bam, sites, tsv] + } + + BCFTOOLS_MPILEUP( + 
ch_mpileup, + ch_fasta, + false + ) + ch_versions = ch_versions.mix(BCFTOOLS_MPILEUP.out.versions.first()) + + // Annotate the variants + BCFTOOLS_ANNOTATE(BCFTOOLS_MPILEUP.out.vcf + .join(BCFTOOLS_MPILEUP.out.tbi) + .combine(Channel.of([[], [], [], []])) + ) + ch_versions = ch_versions.mix(BCFTOOLS_ANNOTATE.out.versions.first()) + + // Index annotated VCF + BCFTOOLS_INDEX(BCFTOOLS_ANNOTATE.out.vcf) + ch_versions = ch_versions.mix(BCFTOOLS_INDEX.out.versions.first()) + + // Output + ch_output = BCFTOOLS_ANNOTATE.out.vcf + .join(BCFTOOLS_INDEX.out.tbi) + + ch_multiqc_files = ch_multiqc_files.mix(BCFTOOLS_MPILEUP.out.stats.map{ it[1] }) + + emit: + vcf = ch_output // channel: [ [id, panel], vcf, tbi ] + versions = ch_versions // channel: [ versions.yml ] + multiqc_files = ch_multiqc_files +} diff --git a/subworkflows/local/get_region/main.nf b/subworkflows/local/get_region/main.nf new file mode 100644 index 00000000..58f84e10 --- /dev/null +++ b/subworkflows/local/get_region/main.nf @@ -0,0 +1,33 @@ +include { SAMTOOLS_FAIDX } from '../../../modules/nf-core/samtools/faidx/main' + +workflow GET_REGION { + take: + input_region // Region string to use ["all", "chr1", "chr1:0-1000"] + ch_fasta // [[meta], fasta, fai] + + main: + ch_versions = Channel.empty() + + // Gather regions to use and create the meta map + if (input_region ==~ '^(chr)?[0-9XYM]+$' || input_region == "all") { + ch_regions = ch_fasta.map{it -> it[2]} + .splitCsv(header: ["chr", "size", "offset", "linebases", "linewidth", "qualoffset"], sep: "\t") + .map{it -> [chr:it.chr, region:"0-"+it.size]} + if (input_region != "all") { + ch_regions = ch_regions.filter{it.chr == input_region} + } + ch_regions = ch_regions + .map{ [[chr: it.chr, region: it.chr + ":" + it.region], it.chr + ":" + it.region]} + } else { + if (input_region ==~ '^chr[0-9XYM]+:[0-9]+-[0-9]+$') { + ch_regions = Channel.from([input_region]) + .map{ [[chr: it.split(":")[0], "region": it], it]} + } else { + error "Invalid input_region: 
${input_region}" + } + } + + emit: + regions = ch_regions // channel: [ meta, region ] + versions = ch_versions // channel: [ versions.yml ] +} diff --git a/subworkflows/local/get_region/tests/main.workflow.nf.test b/subworkflows/local/get_region/tests/main.workflow.nf.test new file mode 100644 index 00000000..9561b781 --- /dev/null +++ b/subworkflows/local/get_region/tests/main.workflow.nf.test @@ -0,0 +1,74 @@ +nextflow_workflow { + + name "Test Workflow GET_REGION" + script "../main.nf" + workflow "GET_REGION" + tag 'subworkflows' + tag 'get_region' + tag 'subworkflows/get_region' + + test("Should run with 'all'") { + + when { + workflow { + """ + input[0] = "all" + input[1] = Channel.of([ + [genome:"GRCh37"], + file("https://raw.githubusercontent.com/LouisLeNezet/test-datasets/phaseimpute/data/reference_genome/21_22/hs38DH.chr21_22.s.fa", checkIfExists: true), + file("https://raw.githubusercontent.com/LouisLeNezet/test-datasets/phaseimpute/data/reference_genome/21_22/hs38DH.chr21_22.s.fa.fai", checkIfExists: true) + ]) + """ + } + } + + then { + assertAll( + { assert workflow.success }, + { assert snapshot(workflow.out.regions).match() } + ) + } + } + + test("Should run with specified chr") { + + when { + workflow { + """ + input[0] = "chr22" + input[1] = Channel.of([ + [genome:"GRCh37"], + file("https://raw.githubusercontent.com/LouisLeNezet/test-datasets/phaseimpute/data/reference_genome/21_22/hs38DH.chr21_22.s.fa", checkIfExists: true), + file("https://raw.githubusercontent.com/LouisLeNezet/test-datasets/phaseimpute/data/reference_genome/21_22/hs38DH.chr21_22.s.fa.fai", checkIfExists: true) + ]) + """ + } + } + + then { + assertAll( + { assert workflow.success }, + { assert snapshot(workflow.out.regions).match() } + ) + } + } + + test("Should run with specified region without fasta") { + + when { + workflow { + """ + input[0] = "chr22:0-4000" + input[1] = Channel.of([[],[],[]]) + """ + } + } + + then { + assertAll( + { assert workflow.success }, + { assert 
snapshot(workflow.out.regions).match() } + ) + } + } +} diff --git a/subworkflows/local/get_region/tests/main.workflow.nf.test.snap b/subworkflows/local/get_region/tests/main.workflow.nf.test.snap new file mode 100644 index 00000000..563a6f5d --- /dev/null +++ b/subworkflows/local/get_region/tests/main.workflow.nf.test.snap @@ -0,0 +1,63 @@ +{ + "Should run with specified region without fasta": { + "content": [ + [ + [ + { + "chr": "chr22", + "region": "chr22:0-4000" + }, + "chr22:0-4000" + ] + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-17T15:20:16.458964814" + }, + "Should run with specified chr": { + "content": [ + [ + [ + { + "chr": "chr22", + "region": "chr22:16570000-16610000" + }, + "chr22:16570000-16610000" + ] + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-17T15:20:11.51328291" + }, + "Should run with 'all'": { + "content": [ + [ + [ + { + "chr": "chr21", + "region": "chr21:16570000-16610000" + }, + "chr21:16570000-16610000" + ], + [ + { + "chr": "chr22", + "region": "chr22:16570000-16610000" + }, + "chr22:16570000-16610000" + ] + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-17T15:20:06.490072121" + } +} diff --git a/subworkflows/local/get_region/tests/tags.yml b/subworkflows/local/get_region/tests/tags.yml new file mode 100644 index 00000000..d1ff83bf --- /dev/null +++ b/subworkflows/local/get_region/tests/tags.yml @@ -0,0 +1,2 @@ +subworkflows/get_region: + - subworkflows/local/get_region/** diff --git a/subworkflows/local/impute_quilt/impute_quilt.nf b/subworkflows/local/impute_quilt/impute_quilt.nf new file mode 100644 index 00000000..5bba1604 --- /dev/null +++ b/subworkflows/local/impute_quilt/impute_quilt.nf @@ -0,0 +1,73 @@ +include { QUILT_QUILT } from '../../../modules/nf-core/quilt/quilt' +include { BCFTOOLS_ANNOTATE } from '../../../modules/nf-core/bcftools/annotate' +include { BCFTOOLS_INDEX as 
BCFTOOLS_INDEX_1 } from '../../../modules/nf-core/bcftools/index' +include { BCFTOOLS_INDEX as BCFTOOLS_INDEX_2 } from '../../../modules/nf-core/bcftools/index' + + +workflow IMPUTE_QUILT { + + take: + ch_hap_legend // channel: [ [panel, chr], hap, legend ] + ch_input // channel: [ [id, chr], bam, bai ] + ch_chunks // channel: [ [panel, chr], start_coordinate, end_coordinate, number ] + + + main: + + ch_versions = Channel.empty() + + posfile = [] + phasefile = [] + posfile_phasefile = [[id: null], posfile, phasefile] + genetic_map_file = [] + fasta = [[id:'test'], []] + + ngen = params.ngen + buffer = params.buffer + + + if (genetic_map_file.isEmpty()) { + ch_hap_chunks = ch_hap_legend.combine(ch_chunks, by:0).map { it + ngen + buffer + [[]] } + } else { + // Add ngen and buffer + genetic map file (untested) + ch_hap_chunks = ch_hap_legend.join(ch_chunks, by:0).join(genetic_map_file) + } + + ch_quilt = ch_input + .map{ metaIC, bam, bai -> [metaIC.subMap("chr"), metaIC, bam, bai]} + .combine(ch_hap_chunks + .map{ metaIC, hap, legend, chr, start, end, ngen, buffer, gmap -> + [metaIC.subMap("chr"), metaIC, hap, legend, chr, start, end, ngen, buffer, gmap] + }, by:0 + ) + .map { + metaC, metaIC, bam, bai, metaPC, hap, legend, chr, start, end, ngen, buffer, gmap -> + [metaIC + ["panel": metaPC.id], bam, bai, hap, legend, chr, start, end, ngen, buffer, gmap] + } + + // Run QUILT + QUILT_QUILT ( ch_quilt, posfile_phasefile, fasta ) + ch_versions = ch_versions.mix(QUILT_QUILT.out.versions.first()) + + // Index imputed VCF + BCFTOOLS_INDEX_1(QUILT_QUILT.out.vcf) + ch_versions = ch_versions.mix(BCFTOOLS_INDEX_1.out.versions.first()) + + // Annotate the variants + BCFTOOLS_ANNOTATE(QUILT_QUILT.out.vcf + .join(BCFTOOLS_INDEX_1.out.tbi) + .combine(Channel.of([[], [], [], []])) + ) + ch_versions = ch_versions.mix(BCFTOOLS_ANNOTATE.out.versions.first()) + + // Index imputed annotated VCF + BCFTOOLS_INDEX_2(BCFTOOLS_ANNOTATE.out.vcf) + ch_versions = 
ch_versions.mix(BCFTOOLS_INDEX_2.out.versions.first()) + + // Join VCFs and TBIs + ch_vcf_tbi = BCFTOOLS_ANNOTATE.out.vcf.join(BCFTOOLS_INDEX_2.out.tbi) + + emit: + vcf_tbi = ch_vcf_tbi // channel: [ meta, vcf, tbi ] + versions = ch_versions // channel: [ versions.yml ] +} diff --git a/subworkflows/local/make_chunks/make_chunks.nf b/subworkflows/local/make_chunks/make_chunks.nf new file mode 100644 index 00000000..5c4319dc --- /dev/null +++ b/subworkflows/local/make_chunks/make_chunks.nf @@ -0,0 +1,29 @@ +include { GLIMPSE_CHUNK } from '../../../modules/nf-core/glimpse/chunk/main' + +workflow MAKE_CHUNKS { + + take: + ch_reference // channel: [ val(meta), vcf, csi ] + + main: + + ch_versions = Channel.empty() + + // Make chunks + ch_vcf_csi_chr = ch_reference.map{meta, vcf, csi -> [meta, vcf, csi, meta.chr]} + GLIMPSE_CHUNK(ch_vcf_csi_chr) + ch_versions = ch_versions.mix(GLIMPSE_CHUNK.out.versions) + + // Rearrange chunks into channel for QUILT + ch_chunks = GLIMPSE_CHUNK.out.chunk_chr + .splitText() + .map { metamap, line -> + def fields = line.split("\t") + def startEnd = fields[2].split(':')[1].split('-') + [metamap, metamap.chr, startEnd[0], startEnd[1]] + } + + emit: + chunks = ch_chunks // channel: [ val(meta), chr, start, end ] + versions = ch_versions // channel: [ versions.yml ] +} diff --git a/subworkflows/local/prepare_input_stitch/prepare_input_stitch.nf b/subworkflows/local/prepare_input_stitch/prepare_input_stitch.nf new file mode 100644 index 00000000..6b2e7a37 --- /dev/null +++ b/subworkflows/local/prepare_input_stitch/prepare_input_stitch.nf @@ -0,0 +1,51 @@ +workflow PREPARE_INPUT_STITCH { + + take: + ch_posfile + ch_fasta + ch_input_impute + + main: + + ch_versions = Channel.empty() + + // Value channels + def input_empty = [[]] + def rdata_empty = [[]] + k_val = params.k_val + ngen = params.ngen + + // Get chromosomes of posfile + ch_posfile = ch_posfile.map{meta, posfile -> return[['chr': meta.chr], posfile]} + + // Get chromosomes of fasta + 
ch_chromosomes = ch_fasta.map{it -> it[2]} + .splitCsv(header: ["chr", "size", "offset", "linebases", "linewidth", "qualoffset"], sep: "\t") + .map{it -> return [[chr: it.chr], it.chr]} + + // Make final channel with parameters + stitch_parameters = ch_posfile.map { it + input_empty + rdata_empty} + .join(ch_chromosomes) + .map { it + k_val + ngen} + + // Prepare sample files for STITCH + // Group input by ID + ch_bam_bai = ch_input_impute.map {meta, bam, bai -> [[meta.id], bam, bai]}.unique() + + // Make bamlist from bam input + ch_bamlist = ch_bam_bai + .map {it[1].tokenize('/').last()} + .collectFile(name: "bamlist.txt", newLine: true, sort: true) + + // Collect all files + stitch_samples = ch_bam_bai.map {meta, bam, bai -> [["id": "all_samples"], bam, bai]} + .groupTuple() + .combine(ch_bamlist) + .collect() + + emit: + stitch_parameters + stitch_samples + versions = ch_versions // channel: [ versions.yml ] + +} diff --git a/subworkflows/local/prepare_input_stitch/prepare_posfile_tsv.nf b/subworkflows/local/prepare_input_stitch/prepare_posfile_tsv.nf new file mode 100644 index 00000000..0612d9bc --- /dev/null +++ b/subworkflows/local/prepare_input_stitch/prepare_posfile_tsv.nf @@ -0,0 +1,26 @@ +include { BCFTOOLS_QUERY } from '../../../modules/nf-core/bcftools/query/main' +include { GAWK } from '../../../modules/nf-core/gawk' + + +workflow PREPARE_POSFILE_TSV { + + take: + ch_panel_sites + ch_fasta + + main: + + ch_versions = Channel.empty() + + // Convert position file to tab-separated file + BCFTOOLS_QUERY(ch_panel_sites, [], [], []) + ch_posfile = BCFTOOLS_QUERY.out.output + + // Remove multiallelic positions from tsv + GAWK(ch_posfile, []) + + emit: + posfile = GAWK.out.output // channel: [ [id, chr], txt ] + versions = ch_versions // channel: [ versions.yml ] + +} diff --git a/subworkflows/local/utils_nfcore_phaseimpute_pipeline/main.nf b/subworkflows/local/utils_nfcore_phaseimpute_pipeline/main.nf index 5e816a2b..6961c1d6 100644 --- 
a/subworkflows/local/utils_nfcore_phaseimpute_pipeline/main.nf +++ b/subworkflows/local/utils_nfcore_phaseimpute_pipeline/main.nf @@ -19,6 +19,8 @@ include { nfCoreLogo } from '../../nf-core/utils_nfcore_pipeline' include { imNotification } from '../../nf-core/utils_nfcore_pipeline' include { UTILS_NFCORE_PIPELINE } from '../../nf-core/utils_nfcore_pipeline' include { workflowCitation } from '../../nf-core/utils_nfcore_pipeline' +include { GET_REGION } from '../get_region' +include { SAMTOOLS_FAIDX } from '../../../modules/nf-core/samtools/faidx' /* ======================================================================================== @@ -39,7 +41,7 @@ workflow PIPELINE_INITIALISATION { main: - ch_versions = Channel.empty() + ch_versions = Channel.empty() // // Print version and exit if required and dump pipeline parameters to JSON file @@ -77,32 +79,158 @@ workflow PIPELINE_INITIALISATION { // validateInputParameters() + // + // Create fasta channel + // + genome = params.genome ? params.genome : file(params.fasta, checkIfExists:true).getBaseName() + if (params.genome) { + genome = params.genome + ch_fasta = Channel.of([[genome:genome], getGenomeAttribute('fasta')]) + fai = getGenomeAttribute('fai') + if (fai == null) { + SAMTOOLS_FAIDX(ch_fasta, Channel.of([[], []])) + ch_versions = ch_versions.mix(SAMTOOLS_FAIDX.out.versions.first()) + fai = SAMTOOLS_FAIDX.out.fai.map{ it[1] } + } + } else if (params.fasta) { + genome = file(params.fasta, checkIfExists:true).getBaseName() + ch_fasta = Channel.of([[genome:genome], file(params.fasta, checkIfExists:true)]) + if (params.fasta_fai) { + fai = file(params.fasta_fai, checkIfExists:true) + } else { + SAMTOOLS_FAIDX(ch_fasta, Channel.of([[], []])) + ch_versions = ch_versions.mix(SAMTOOLS_FAIDX.out.versions.first()) + fai = SAMTOOLS_FAIDX.out.fai.map{ it[1] } + } + } + ch_ref_gen = ch_fasta.combine(fai).collect() + // // Create channel from input file provided through params.input // - Channel + ch_input = Channel 
.fromSamplesheet("input") .map { - meta, fastq_1, fastq_2 -> - if (!fastq_2) { - return [ meta.id, meta + [ single_end:true ], [ fastq_1 ] ] - } else { - return [ meta.id, meta + [ single_end:false ], [ fastq_1, fastq_2 ] ] + meta, file, index -> + [ meta, file, index ] + } + + // Check that all input files share the same extension + getAllFilesExtension(ch_input) + // + // Create channel from input file provided through params.input_truth + // + if (params.input_truth) { + if (params.input_truth.endsWith("csv")) { + ch_input_truth = Channel + .fromSamplesheet("input_truth") + .map { + meta, file, index -> + [ meta, file, index ] } + // Check that all truth files share the same extension + getAllFilesExtension(ch_input_truth) + } else { + // #TODO Wait for `oneOf()` to be supported in the nextflow_schema.json + error "Truth file provided is of another format than CSV (not yet supported). Please use the samplesheet format." } - .groupTuple() - .map { - validateInputSamplesheet(it) + } else { + ch_input_truth = Channel.empty() + } + + // + // Create channel for panel + // + if (params.panel) { + if (params.panel.endsWith("csv")) { + println("Panel file provided as input is a samplesheet") + ch_panel = Channel.fromSamplesheet("panel") + } else { + // #TODO Wait for `oneOf()` to be supported in the nextflow_schema.json + error "Panel file provided is of another format than CSV (not yet supported). Please separate your panel by chromosome and use the samplesheet format." + 
} - .map { - meta, fastqs -> - return [ meta, fastqs.flatten() ] + } else { + // #TODO check if panel is required + ch_panel = Channel.of([[],[],[]]) + } + + // + // Create channel from region input + // + if (params.input_region == null){ + // #TODO Add support for string input + GET_REGION ( + "all", + ch_ref_gen + ) + ch_versions = ch_versions.mix(GET_REGION.out.versions) + ch_regions = GET_REGION.out.regions + } else if (params.input_region.endsWith(".csv")) { + println "Region file provided as input is a csv file" + ch_regions = Channel.fromSamplesheet("input_region") + .map{ chr, start, end -> [["chr": chr], chr + ":" + start + "-" + end]} + .map{ metaC, region -> [metaC + ["region": region], region]} + } else { + error "Region file provided is of another format than CSV (not yet supported). Please separate your reference genome by chromosome and use the samplesheet format." + } + + // + // Create map channel + // + if (params.map) { + if (params.map.endsWith(".csv")) { + print("Map file provided as input is a samplesheet") + ch_map = Channel.fromSamplesheet("map") + } else { + error "Map file provided is of another format than CSV (not yet supported). Please separate your reference genome by chromosome and use the samplesheet format." 
} - .set { ch_samplesheet } + } else { + ch_map = ch_regions + .map{ metaCR, regions -> [metaCR.subMap("chr"), []] } + } + + // + // Create depth channel + // + if (params.depth) { + ch_depth = Channel.of([[depth: params.depth], params.depth]) + } else { + ch_depth = Channel.of([[],[]]) + } + + // + // Create genotype array channel + // + if (params.genotype) { + ch_genotype = Channel.of([[gparray: params.genotype], params.genotype]) + } else { + ch_genotype = Channel.of([[],[]]) + } + + // + // Create posfile channel + // + + if (params.posfile) { + ch_posfile = Channel + .fromSamplesheet("posfile") + .map { + meta, file -> + [ meta, file ] + }} else { + ch_posfile = [[]] + } emit: - samplesheet = ch_samplesheet - versions = ch_versions + input = ch_input // [ [meta], file, index ] + input_truth = ch_input_truth // [ [meta], file, index ] + fasta = ch_ref_gen // [ [genome], fasta, fai ] + panel = ch_panel // [ [panel, chr], vcf, index ] + depth = ch_depth // [ [depth], depth ] + regions = ch_regions // [ [chr, region], region ] + map = ch_map // [ [map], map ] + posfile = ch_posfile // [ [chr], txt ] + versions = ch_versions } /* @@ -156,21 +284,58 @@ workflow PIPELINE_COMPLETION { // def validateInputParameters() { genomeExistsError() + // Check that only genome or fasta is provided + assert params.genome == null || params.fasta == null, "Either --genome or --fasta must be provided" + assert !(params.genome == null && params.fasta == null), "Only one of --genome or --fasta must be provided" + + // Check that a step is provided + assert params.step, "A step must be provided" + + // Check that at least one tool is provided + if (params.step.split(',').contains("impute") || params.step.split(',').contains("panelprep")) { + assert params.tools, "No tools provided" + } } // -// Validate channels from input samplesheet +// Check if all input files have the same extension // -def validateInputSamplesheet(input) { - def (metas, fastqs) = input[1..2] +def 
getAllFilesExtension(ch_input) { + files_ext = ch_input + .map { + if (it[1] instanceof String) { + return it[1].split("\\.").last() + } else if (it[1] instanceof Path) { + return it[1].getName().split("\\.").last() + } else if (it[1] instanceof ArrayList) { + if (it[1] == []) { + return null + } else { + error "Array not supported" + } + } else { + println it[1].getClass() + error "Type not supported" + } + } // Extract files extensions + .toList() // Collect extensions into a list + .map { extensions -> + if (extensions.unique().size() != 1) { + println "Extensions: ${extensions}" + error "All input files must have the same extension" + } + return extensions[0] + } +} - // Check that multiple runs of the same sample are of the same datatype i.e. single-end / paired-end - def endedness_ok = metas.collect{ it.single_end }.unique().size == 1 - if (!endedness_ok) { - error("Please check input samplesheet -> Multiple runs of a sample must be of the same datatype i.e. single-end or paired-end: ${metas[0].id}") - } - return [ metas[0], fastqs ] +// +// Validate channels from input samplesheet +// +def validateInputSamplesheet(input) { + def (meta, bam, bai) = input + // Check that individual IDs are unique + // no validation for the moment } // // Get attribute from genome config file e.g. 
fasta diff --git a/subworkflows/local/vcf_chr_check/main.nf b/subworkflows/local/vcf_chr_check/main.nf new file mode 100644 index 00000000..3c9d5f79 --- /dev/null +++ b/subworkflows/local/vcf_chr_check/main.nf @@ -0,0 +1,76 @@ +include { VCFCHREXTRACT as VCFCHRBFR } from '../../../modules/local/vcfchrextract/main.nf' +include { VCFCHREXTRACT as VCFCHRAFT } from '../../../modules/local/vcfchrextract/main.nf' +include { VCF_CHR_RENAME } from '../vcf_chr_rename/main.nf' + +workflow VCF_CHR_CHECK { + take: + ch_vcf // channel: [ [id], vcf, index ] + ch_fasta // channel: [ [id], fasta, fai ] + + main: + + ch_versions = Channel.empty() + + // Get contig names from the VCF + VCFCHRBFR(ch_vcf.map{ metaV, vcf, csi -> [metaV, vcf] }) + ch_versions = ch_versions.mix(VCFCHRBFR.out.versions) + + // Check if the contig names are the same as the reference + chr_disjoint = check_chr(VCFCHRBFR.out.chr, ch_vcf, ch_fasta) + + if (params.rename_chr == true) { + // Generate the chromosome renaming file + VCF_CHR_RENAME( + chr_disjoint.to_rename.map{meta, vcf, index, nb -> [meta, vcf, index]}, + ch_fasta + ) + ch_versions = ch_versions.mix(VCF_CHR_RENAME.out.versions) + + // Check if modification has solved the problem + VCFCHRAFT(VCF_CHR_RENAME.out.vcf_renamed.map{ metaV, vcf, csi -> [metaV, vcf] }) + ch_versions = ch_versions.mix(VCFCHRAFT.out.versions) + + chr_disjoint_after = check_chr(VCFCHRAFT.out.chr, VCF_CHR_RENAME.out.vcf_renamed, ch_fasta) + + chr_disjoint_after.to_rename.map{ + error 'Even after renaming errors are still present. Please check that contigs name in vcf and fasta file are equivalent.' + } + ch_vcf_renamed = VCF_CHR_RENAME.out.vcf_renamed + + } else { + chr_disjoint.to_rename.map { + error 'Some contig names in the VCF do not match the reference genome. Please set `rename_chr` to `true` to rename the contigs.' 
+ } + ch_vcf_renamed = Channel.empty() + } + + ch_vcf_out = chr_disjoint.no_rename + .map{meta, vcf, csi, chr -> [meta, vcf, csi]} + .mix(ch_vcf_renamed) + + emit: + vcf = ch_vcf_out // [ meta, vcf, csi ] + versions = ch_versions // channel: [ versions.yml ] +} + + +def check_chr(ch_chr, ch_vcf, ch_fasta){ + chr_checked = ch_chr + .combine(ch_vcf, by:0) + .combine(ch_fasta) + .map{metaI, chr, vcf, csi, metaG, fasta, fai -> + [ + metaI, vcf, csi, + chr.readLines()*.split(' ').collect{it[0]}, + fai.readLines()*.split('\t').collect{it[0]} + ] + } + .map { meta, vcf, csi, chr, fai -> + [meta, vcf, csi, (chr-fai).size()] + } + .branch{ + no_rename: it[3] == 0 + to_rename: it[3] > 0 + } + return chr_checked +} diff --git a/subworkflows/local/vcf_chr_check/tests/main.nf.test b/subworkflows/local/vcf_chr_check/tests/main.nf.test new file mode 100644 index 00000000..da76fdc9 --- /dev/null +++ b/subworkflows/local/vcf_chr_check/tests/main.nf.test @@ -0,0 +1,174 @@ +nextflow_workflow { + + name "Test Subworkflow VCF_CHR_CHECK" + script "../main.nf" + + workflow "VCF_CHR_CHECK" + + tag "subworkflows" + tag "subworkflows_local" + tag "subworkflows/vcf_chr_check" + tag "vcf_chr_check" + + tag "bcftools" + tag "bcftools/annotate" + tag "bcftools/index" + tag "gawk" + + test("Rename: panel chr + fasta chr") { + config "./nextflow_rename.config" + when { + workflow { + """ + fai_file = Channel.of('chr22\t10000\t7\t60\t61', 'chr21\t10000\t7\t60\t61').collectFile(name: 'chr21_22.fai', newLine: true) + input[0] = Channel.fromList([ + [ + [id: "chr22"], + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/vcf/dbsnp_146.hg38.vcf.gz",checkIfExists:true), + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/vcf/dbsnp_146.hg38.vcf.gz.tbi",checkIfExists:true) + ], + [ + [id: "chr21"], + 
file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/chr21/simulated_sv.vcf.gz",checkIfExists:true), + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/chr21/simulated_sv.vcf.gz.tbi",checkIfExists:true) + ] + ]) + input[1] = Channel.of([[id:"GRCh37"],[]]) + .combine(fai_file) + """ + } + } + + then { + assertAll( + { assert workflow.success }, + { assert snapshot(workflow.out).match() } + ) + } + } + + test("Rename: panel chr + fasta no chr") { + config "./nextflow_rename.config" + when { + workflow { + """ + fai_file = Channel.of('22\t10000\t7\t60\t61', '21\t10000\t7\t60\t61').collectFile(name: '21_22.fai', newLine: true) + input[0] = Channel.fromList([ + [ + [id: "chr22"], + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/vcf/dbsnp_146.hg38.vcf.gz",checkIfExists:true), + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/vcf/dbsnp_146.hg38.vcf.gz.tbi",checkIfExists:true) + ], + [ + [id: "chr21"], + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/chr21/simulated_sv.vcf.gz",checkIfExists:true), + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/chr21/simulated_sv.vcf.gz.tbi",checkIfExists:true) + ] + ]) + input[1] = Channel.of([[id:"GRCh37"],[]]) + .combine(fai_file) + """ + } + } + + then { + assertAll( + { assert workflow.success }, + { assert snapshot(workflow.out).match() } + ) + } + } + + test("Rename: panel no chr + fasta chr") { + config "./nextflow_rename.config" + when { + workflow { + """ + fai_file = Channel.of( + 'chr1\t10000\t7\t60\t61','chr2\t10000\t7\t60\t61','chr3\t10000\t7\t60\t61','chr4\t10000\t7\t60\t61','chr5\t10000\t7\t60\t61','chr6\t10000\t7\t60\t61', + 
'chr7\t10000\t7\t60\t61','chr8\t10000\t7\t60\t61','chr9\t10000\t7\t60\t61','chr10\t10000\t7\t60\t61','chr11\t10000\t7\t60\t61','chr12\t10000\t7\t60\t61', + 'chr13\t10000\t7\t60\t61','chr14\t10000\t7\t60\t61','chr15\t10000\t7\t60\t61','chr16\t10000\t7\t60\t61','chr17\t10000\t7\t60\t61','chr18\t10000\t7\t60\t61', + 'chr19\t10000\t7\t60\t61','chr20\t10000\t7\t60\t61','chr21\t10000\t7\t60\t61','chr22\t10000\t7\t60\t61', + 'chrX\t10000\t7\t60\t61','chrY\t10000\t7\t60\t61', 'chrMT\t10000\t7\t60\t61' + ).collectFile(name: 'chr.fai', newLine: true) + input[0] = Channel.fromList([ + [ + [id: "22"], + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/test_models.vcf.gz",checkIfExists:true), + [] + ] + ]) + input[1] = Channel.of([[id:"GRCh37"],[]]) + .combine(fai_file) + """ + } + } + + then { + assertAll( + { assert workflow.success }, + { assert snapshot(workflow.out).match() } + ) + } + } + + test("Error : missing renaming params") { + config "./nextflow.config" + when { + workflow { + """ + input[0] = Channel.fromList([ + [ + [id: "multi"], + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/test.rnaseq.vcf.gz",checkIfExists:true), + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/test.rnaseq.vcf.gz.tbi",checkIfExists:true) + ], + [ + [id: "chr21"], + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/chr21/simulated_sv.vcf.gz",checkIfExists:true), + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/chr21/simulated_sv.vcf.gz.tbi",checkIfExists:true) + ] + ]) + input[1] = Channel.of([[id:"GRCh37"],[], + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/genome.fasta.fai",checkIfExists:true) + ]) + """ + } + } + + then { + 
assertAll( + { assert workflow.failed }, + { assert workflow.errorReport.contains("Some contig names in the VCF do not match the reference genome. Please set `rename_chr` to `true` to rename the contigs.")} + ) + } + } + test("Error : still difference after renaming"){ + config "./nextflow_rename.config" + when { + workflow { + """ + input[0] = Channel.fromList([ + [ + [id: "multi"], + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/NA24385_sv.vcf.gz",checkIfExists:true), + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/NA24385_sv.vcf.gz.tbi",checkIfExists:true) + ] + ]) // Error due to multiple contigs name in header not present in fasta file + input[1] = Channel.of([ + [id:"GRCh37"], + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/genome.fasta",checkIfExists:true), + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/genome.fasta.fai",checkIfExists:true) + ]) + """ + } + } + + then { + assertAll( + { assert workflow.failed }, + { assert workflow.errorReport.contains("Even after renaming errors are still present. 
Please check that contigs name in vcf and fasta file are equivalent.")} + ) + } + } +} diff --git a/subworkflows/local/vcf_chr_check/tests/main.nf.test.snap b/subworkflows/local/vcf_chr_check/tests/main.nf.test.snap new file mode 100644 index 00000000..10f7f443 --- /dev/null +++ b/subworkflows/local/vcf_chr_check/tests/main.nf.test.snap @@ -0,0 +1,145 @@ +{ + "Rename: panel chr + fasta chr": { + "content": [ + { + "0": [ + [ + { + "id": "chr21" + }, + "/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/chr21/simulated_sv.vcf.gz", + "/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/chr21/simulated_sv.vcf.gz.tbi" + ], + [ + { + "id": "chr22" + }, + "/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/vcf/dbsnp_146.hg38.vcf.gz", + "/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/vcf/dbsnp_146.hg38.vcf.gz.tbi" + ] + ], + "1": [ + "versions.yml:md5,395e1cde3f38a30f5d80769972ba23d8", + "versions.yml:md5,ad4c5338cd27e20789c70e28b8c74a42" + ], + "vcf": [ + [ + { + "id": "chr21" + }, + "/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/chr21/simulated_sv.vcf.gz", + "/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/chr21/simulated_sv.vcf.gz.tbi" + ], + [ + { + "id": "chr22" + }, + "/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/vcf/dbsnp_146.hg38.vcf.gz", + "/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/vcf/dbsnp_146.hg38.vcf.gz.tbi" + ] + ], + "versions": [ + "versions.yml:md5,395e1cde3f38a30f5d80769972ba23d8", + "versions.yml:md5,ad4c5338cd27e20789c70e28b8c74a42" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-27T17:21:13.588561053" + }, + "Rename: panel no chr + fasta chr": { + "content": [ + { + "0": [ + [ + { + "id": "22" + }, + "22_chrrename.vcf.gz:md5,070a96d1053a64f2de2132ee8800847c", + "22_chrrename.vcf.gz.csi:md5,e190b690b4b0a4d088231862e5408582" + ] + ], + "1": [ 
+ "versions.yml:md5,395e1cde3f38a30f5d80769972ba23d8", + "versions.yml:md5,ad4c5338cd27e20789c70e28b8c74a42", + "versions.yml:md5,e576f40503c3506c782228485d06fbf1" + ], + "vcf": [ + [ + { + "id": "22" + }, + "22_chrrename.vcf.gz:md5,070a96d1053a64f2de2132ee8800847c", + "22_chrrename.vcf.gz.csi:md5,e190b690b4b0a4d088231862e5408582" + ] + ], + "versions": [ + "versions.yml:md5,395e1cde3f38a30f5d80769972ba23d8", + "versions.yml:md5,ad4c5338cd27e20789c70e28b8c74a42", + "versions.yml:md5,e576f40503c3506c782228485d06fbf1" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-27T17:21:39.92481538" + }, + "Rename: panel chr + fasta no chr": { + "content": [ + { + "0": [ + [ + { + "id": "chr21" + }, + "chr21_chrrename.vcf.gz:md5,22785a5d7ec1132f766efae5f8e00adf", + "chr21_chrrename.vcf.gz.csi:md5,b5b5fd753ee54ebd3c8e4b1fe2261cdb" + ], + [ + { + "id": "chr22" + }, + "chr22_chrrename.vcf.gz:md5,23de9b4db1406415806e627969cec749", + "chr22_chrrename.vcf.gz.csi:md5,ba370ca13289fee4be59253a1f4609e2" + ] + ], + "1": [ + "versions.yml:md5,395e1cde3f38a30f5d80769972ba23d8", + "versions.yml:md5,ad4c5338cd27e20789c70e28b8c74a42", + "versions.yml:md5,e576f40503c3506c782228485d06fbf1" + ], + "vcf": [ + [ + { + "id": "chr21" + }, + "chr21_chrrename.vcf.gz:md5,22785a5d7ec1132f766efae5f8e00adf", + "chr21_chrrename.vcf.gz.csi:md5,b5b5fd753ee54ebd3c8e4b1fe2261cdb" + ], + [ + { + "id": "chr22" + }, + "chr22_chrrename.vcf.gz:md5,23de9b4db1406415806e627969cec749", + "chr22_chrrename.vcf.gz.csi:md5,ba370ca13289fee4be59253a1f4609e2" + ] + ], + "versions": [ + "versions.yml:md5,395e1cde3f38a30f5d80769972ba23d8", + "versions.yml:md5,ad4c5338cd27e20789c70e28b8c74a42", + "versions.yml:md5,e576f40503c3506c782228485d06fbf1" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-27T17:21:28.214969089" + } +} \ No newline at end of file diff --git a/subworkflows/local/vcf_chr_check/tests/nextflow.config 
b/subworkflows/local/vcf_chr_check/tests/nextflow.config new file mode 100644 index 00000000..ff02f295 --- /dev/null +++ b/subworkflows/local/vcf_chr_check/tests/nextflow.config @@ -0,0 +1,14 @@ +params { + max_memory = '7.GB' + rename_chr = false +} + +process { + withName: BCFTOOLS_ANNOTATE { + ext.args = [ + "-Oz", + "--no-version" + ].join(' ') + ext.prefix = { "${meta.id}_chrrename" } + } +} diff --git a/subworkflows/local/vcf_chr_check/tests/nextflow_rename.config b/subworkflows/local/vcf_chr_check/tests/nextflow_rename.config new file mode 100644 index 00000000..d048cbcb --- /dev/null +++ b/subworkflows/local/vcf_chr_check/tests/nextflow_rename.config @@ -0,0 +1,14 @@ +params { + max_memory = '7.GB' + rename_chr = true +} + +process { + withName: BCFTOOLS_ANNOTATE { + ext.args = [ + "-Oz", + "--no-version" + ].join(' ') + ext.prefix = { "${meta.id}_chrrename" } + } +} diff --git a/subworkflows/local/vcf_chr_check/tests/tags.yml b/subworkflows/local/vcf_chr_check/tests/tags.yml new file mode 100644 index 00000000..d090629e --- /dev/null +++ b/subworkflows/local/vcf_chr_check/tests/tags.yml @@ -0,0 +1,2 @@ +subworkflows/vcf_chr_check: + - subworkflows/local/vcf_chr_check/** diff --git a/subworkflows/local/vcf_chr_rename/main.nf b/subworkflows/local/vcf_chr_rename/main.nf new file mode 100644 index 00000000..20c2e967 --- /dev/null +++ b/subworkflows/local/vcf_chr_rename/main.nf @@ -0,0 +1,40 @@ +include { BCFTOOLS_ANNOTATE } from '../../../modules/nf-core/bcftools/annotate' +include { BCFTOOLS_INDEX } from '../../../modules/nf-core/bcftools/index' +include { GAWK as FAITOCHR } from '../../../modules/nf-core/gawk' + +workflow VCF_CHR_RENAME { + take: + ch_vcf // channel: [ [id], vcf, index ] + ch_fasta // channel: [ [id], fasta, fai ] + + main: + + ch_versions = Channel.empty() + + // Generate the chromosome renaming file + FAITOCHR( + ch_fasta.map{ metaG, fasta, fai -> [metaG, fai] }, + Channel.of( + 'BEGIN {FS="\\t"} NR==1 { if ($1 ~ /^chr/) { col1=""; 
col2="chr" } else { col1="chr"; col2="" } } { sub(/^chr/, "", $1); if ($1 ~ /^[0-9]+|[XYMT]$/) print col1$1, col2$1; else print $1, $1 }' + ).collectFile(name:"program.txt") + ) + ch_versions = ch_versions.mix(FAITOCHR.out.versions) + + // Rename the chromosome without prefix + BCFTOOLS_ANNOTATE( + ch_vcf // channel: [ [id], vcf, index ] + .combine(Channel.of([[],[],[]])) + .combine(FAITOCHR.out.output.map{it[1]}) + ) + ch_versions = ch_versions.mix(BCFTOOLS_ANNOTATE.out.versions.first()) + + BCFTOOLS_INDEX(BCFTOOLS_ANNOTATE.out.vcf) + ch_versions = ch_versions.mix(BCFTOOLS_INDEX.out.versions.first()) + + ch_vcf_renamed = BCFTOOLS_ANNOTATE.out.vcf + .combine(BCFTOOLS_INDEX.out.csi, by:0) + + emit: + vcf_renamed = ch_vcf_renamed // [ meta, vcf, csi ] + versions = ch_versions // channel: [ versions.yml ] +} diff --git a/subworkflows/local/vcf_chr_rename/tests/main.nf.test b/subworkflows/local/vcf_chr_rename/tests/main.nf.test new file mode 100644 index 00000000..d8d9c4e4 --- /dev/null +++ b/subworkflows/local/vcf_chr_rename/tests/main.nf.test @@ -0,0 +1,52 @@ +nextflow_workflow { + + name "Test Subworkflow VCF_CHR_RENAME" + script "../main.nf" + + config "./nextflow.config" + + workflow "VCF_CHR_RENAME" + + tag "subworkflows" + tag "subworkflows_local" + tag "subworkflows/vcf_chr_rename" + tag "vcf_chr_rename" + + tag "bcftools" + tag "bcftools/annotate" + tag "bcftools/index" + tag "gawk" + + test("Should run without error") { + when { + workflow { + """ + input[0] = Channel.fromList([ + [ + [id: "multi"], + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/NA24385_sv.vcf.gz",checkIfExist:true), + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/NA24385_sv.vcf.gz.tbi",checkIfExist:true) + ], + [ + [id: "chr21"], + 
file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/chr21/simulated_sv.vcf.gz",checkIfExist:true), + file("https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/vcf/chr21/simulated_sv.vcf.gz.tbi",checkIfExist:true) + ] + ]) + input[1] = Channel.of([ + [id:"GRCh37"], + file("https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/reference_genome/21_22/hs38DH.chr21_22.fa",checkIfExist:true), + file("https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/reference_genome/21_22/hs38DH.chr21_22.fa.fai",checkIfExist:true) + ]) + """ + } + } + + then { + assertAll( + { assert workflow.success }, + { assert snapshot(workflow.out).match() } + ) + } + } +} diff --git a/subworkflows/local/vcf_chr_rename/tests/main.nf.test.snap b/subworkflows/local/vcf_chr_rename/tests/main.nf.test.snap new file mode 100644 index 00000000..52c5f8fe --- /dev/null +++ b/subworkflows/local/vcf_chr_rename/tests/main.nf.test.snap @@ -0,0 +1,55 @@ +{ + "Should run without error": { + "content": [ + { + "0": [ + [ + { + "id": "chr21" + }, + "chr21_chrrename.vcf.gz:md5,39cd8e316cd9b9282b8289d69d81260b", + "chr21_chrrename.vcf.gz.csi:md5,3bbbb50b0dd3515d380eabe0013cde19" + ], + [ + { + "id": "multi" + }, + "multi_chrrename.vcf.gz:md5,5f6f1ca261270d55eec054368f3d9587", + "multi_chrrename.vcf.gz.csi:md5,5d175780d5611d962430bff3377f649f" + ] + ], + "1": [ + "versions.yml:md5,176431a832f84d4c329f6d1e9c74d203", + "versions.yml:md5,260c4004a4bb0936c43f932e50de9c19", + "versions.yml:md5,3698013e288e15d392e1cd3e22d2022a" + ], + "vcf_renamed": [ + [ + { + "id": "chr21" + }, + "chr21_chrrename.vcf.gz:md5,39cd8e316cd9b9282b8289d69d81260b", + "chr21_chrrename.vcf.gz.csi:md5,3bbbb50b0dd3515d380eabe0013cde19" + ], + [ + { + "id": "multi" + }, + "multi_chrrename.vcf.gz:md5,5f6f1ca261270d55eec054368f3d9587", + "multi_chrrename.vcf.gz.csi:md5,5d175780d5611d962430bff3377f649f" + ] + 
], + "versions": [ + "versions.yml:md5,176431a832f84d4c329f6d1e9c74d203", + "versions.yml:md5,260c4004a4bb0936c43f932e50de9c19", + "versions.yml:md5,3698013e288e15d392e1cd3e22d2022a" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-27T17:18:53.771496074" + } +} \ No newline at end of file diff --git a/subworkflows/local/vcf_chr_rename/tests/nextflow.config b/subworkflows/local/vcf_chr_rename/tests/nextflow.config new file mode 100644 index 00000000..cf2f7a63 --- /dev/null +++ b/subworkflows/local/vcf_chr_rename/tests/nextflow.config @@ -0,0 +1,13 @@ +params { + max_memory = '7.GB' +} + +process { + withName: BCFTOOLS_ANNOTATE { + ext.args = [ + "-Oz", + "--no-version" + ].join(' ') + ext.prefix = { "${meta.id}_chrrename" } + } +} diff --git a/subworkflows/local/vcf_chr_rename/tests/tags.yml b/subworkflows/local/vcf_chr_rename/tests/tags.yml new file mode 100644 index 00000000..f75be2bf --- /dev/null +++ b/subworkflows/local/vcf_chr_rename/tests/tags.yml @@ -0,0 +1,2 @@ +subworkflows/vcf_chr_rename: + - subworkflows/local/vcf_chr_rename/** diff --git a/subworkflows/local/vcf_concatenate_bcftools/main.nf b/subworkflows/local/vcf_concatenate_bcftools/main.nf new file mode 100644 index 00000000..583b6070 --- /dev/null +++ b/subworkflows/local/vcf_concatenate_bcftools/main.nf @@ -0,0 +1,33 @@ +include { BCFTOOLS_CONCAT } from '../../../modules/nf-core/bcftools/concat' +include { BCFTOOLS_INDEX } from '../../../modules/nf-core/bcftools/index' + +workflow VCF_CONCATENATE_BCFTOOLS { + + take: + ch_vcf_tbi // channel: [ val(meta), vcf, tbi ] + + main: + + ch_versions = Channel.empty() + + // Remove chromosome from meta + ch_vcf_tbi_grouped = ch_vcf_tbi.map{ meta, vcf, tbi -> [['id' : meta.id], vcf, tbi] } + + // Group by ID + ch_vcf_tbi_grouped = ch_vcf_tbi_grouped.groupTuple( by:0 ) + + // Ligate and concatenate chunks + BCFTOOLS_CONCAT(ch_vcf_tbi_grouped) + ch_versions = 
ch_versions.mix(BCFTOOLS_CONCAT.out.versions.first()) + + // Index concatenated VCF + BCFTOOLS_INDEX(BCFTOOLS_CONCAT.out.vcf) + ch_versions = ch_versions.mix(BCFTOOLS_INDEX.out.versions.first()) + + // Join VCFs and TBIs + ch_vcf_tbi_join = BCFTOOLS_CONCAT.out.vcf.join(BCFTOOLS_INDEX.out.tbi) + + emit: + vcf_tbi_join = ch_vcf_tbi_join // channel: [ meta, vcf, tbi ] + versions = ch_versions // channel: [ versions.yml ] +} diff --git a/subworkflows/local/vcf_concordance_glimpse2/main.nf b/subworkflows/local/vcf_concordance_glimpse2/main.nf new file mode 100644 index 00000000..bc76d0f2 --- /dev/null +++ b/subworkflows/local/vcf_concordance_glimpse2/main.nf @@ -0,0 +1,60 @@ +include { GLIMPSE2_CONCORDANCE } from '../../../modules/nf-core/glimpse2/concordance' +include { GAWK as CONCATENATE } from '../../../modules/nf-core/gawk' +include { ADD_COLUMNS } from '../../../modules/local/addcolumns' +include { GUNZIP } from '../../../modules/nf-core/gunzip' + +workflow VCF_CONCORDANCE_GLIMPSE2 { + + take: + ch_vcf_emul // VCF file with imputed genotypes [[id, chr, region, panel, simulate, tools], vcf, csi] + ch_vcf_truth // VCF file with truth genotypes [[id, chr, region], vcf, csi] + ch_vcf_freq // VCF file with panel frequencies [[panel, chr], vcf, csi] + ch_region // Regions to process [[chr, region], region] + + main: + + ch_versions = Channel.empty() + ch_multiqc_files = Channel.empty() + + ch_concordance = ch_vcf_emul + .join(ch_vcf_truth) + .combine(ch_vcf_freq) + .combine(ch_region.map{[it[1]]}.collect().toList()) + .map{metaI, emul, e_csi, truth, t_csi, metaP, freq, f_csi, regions -> + [metaI, emul, e_csi, truth, t_csi, freq, f_csi, [], regions] + } + + GLIMPSE2_CONCORDANCE ( + ch_concordance, + [[], [], params.bins, [], []], + params.min_val_gl, params.min_val_dp + ) + ch_versions = ch_versions.mix(GLIMPSE2_CONCORDANCE.out.versions.first()) + + ch_multiqc_files = ch_multiqc_files.mix(GLIMPSE2_CONCORDANCE.out.errors_cal.map{meta, txt -> [txt]}) + ch_multiqc_files = 
ch_multiqc_files.mix(GLIMPSE2_CONCORDANCE.out.errors_grp.map{meta, txt -> [txt]}) + ch_multiqc_files = ch_multiqc_files.mix(GLIMPSE2_CONCORDANCE.out.errors_spl.map{meta, txt -> [txt]}) + ch_multiqc_files = ch_multiqc_files.mix(GLIMPSE2_CONCORDANCE.out.rsquare_grp.map{meta, txt -> [txt]}) + ch_multiqc_files = ch_multiqc_files.mix(GLIMPSE2_CONCORDANCE.out.rsquare_spl.map{meta, txt -> [txt]}) + ch_multiqc_files = ch_multiqc_files.mix(GLIMPSE2_CONCORDANCE.out.rsquare_per_site.map{meta, txt -> [txt]}) + + GUNZIP(GLIMPSE2_CONCORDANCE.out.errors_grp) + ch_versions = ch_versions.mix(GUNZIP.out.versions.first()) + ADD_COLUMNS(GUNZIP.out.gunzip) + ch_versions = ch_versions.mix(ADD_COLUMNS.out.versions.first()) + + CONCATENATE( + ADD_COLUMNS.out.txt + .map{meta, txt -> [["id":"TestQuality"], txt]} + .groupTuple(), + Channel.of( + '(NR == 1) || (FNR > 1)' + ).collectFile(name:"program.txt") + ) + ch_versions = ch_versions.mix(CONCATENATE.out.versions.first()) + + emit: + stats = CONCATENATE.out.output // [ meta, txt ] + versions = ch_versions // channel: [ versions.yml ] + multiqc_files = ch_multiqc_files +} diff --git a/subworkflows/local/vcf_concordance_glimpse2/tests/main.nf.test b/subworkflows/local/vcf_concordance_glimpse2/tests/main.nf.test new file mode 100644 index 00000000..e1aca9a5 --- /dev/null +++ b/subworkflows/local/vcf_concordance_glimpse2/tests/main.nf.test @@ -0,0 +1,156 @@ +nextflow_workflow { + + name "Test Subworkflow VCF_CONCORDANCE_GLIMPSE2" + script "../main.nf" + config "./nextflow.config" + + workflow "VCF_CONCORDANCE_GLIMPSE2" + + tag "subworkflows" + tag "subworkflows_local" + tag "subworkflows/vcf_concordance_glimpse2" + tag "vcf_concordance_glimpse2" + + tag "bcftools" + tag "bcftools/index" + tag "glimpse" + tag "glimpse/phase" + tag "glimpse/concordance" + + test("vcf_concordance_glimpse2") { + setup { + run("GLIMPSE_PHASE") { + script "../../../../modules/nf-core/glimpse/phase/main.nf" + process { + """ + ch_sample = Channel.of('NA12878 2', 
'NA12878_2 2').collectFile(name: 'sampleinfos.txt', newLine: true) + region = Channel.fromList([ + ["chr21:16600000-16750000","chr21:16650000-16700000"] + ]) + input_vcf = Channel.fromList([ + [[ id:'NA12878', chr:'21', region:'chr21:16650000-16700000', panel: '1000GP', depth:'1', tools: 'Glimpse'], // meta map + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.1x.vcf.gz", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.1x.vcf.gz.csi", checkIfExists: true)], + [[ id:'NA12878_2', chr:'21', region:'chr21:16650000-16700000', panel: '1000GP', depth:'0.5', tools: 'Glimpse2'], // meta map + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.1x.vcf.gz", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.1x.vcf.gz.csi", checkIfExists: true)] + ]) + ref_panel = Channel.of([ + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf", checkIfExists: true), + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.bcf.csi", checkIfExists: true) + ]) + ch_map = Channel.of([ + file(params.modules_testdata_base_path + "delete_me/glimpse/chr21.b38.gmap.gz", checkIfExists: true), + ]) + + input[0] = input_vcf + | combine(ch_sample) + | combine(region) + | combine(ref_panel) + | combine(ch_map) + """ + } + } + run("BCFTOOLS_INDEX") { + script "../../../../modules/nf-core/bcftools/index/main.nf" + process { + """ + input[0] = GLIMPSE_PHASE.out.phased_variants + """ + } + } + } + when { + workflow { + """ + allele_freq = Channel.of([ + [panel:'1000GP', chr:'21'], // meta map + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.sites.vcf.gz",checkIfExists:true), + file(params.modules_testdata_base_path + "delete_me/glimpse/1000GP.chr21.noNA12878.s.sites.vcf.gz.csi",checkIfExists:true) + ]) + truth = Channel.fromList([ + [[id:'NA12878', 
chr:'21', region:'chr21:16650000-16700000'], // meta map + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.bcf",checkIfExists:true), + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.bcf.csi",checkIfExists:true)], + [[id:'NA12878_2', chr:'21', region:'chr21:16650000-16700000'], // meta map + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.bcf",checkIfExists:true), + file(params.modules_testdata_base_path + "delete_me/glimpse/NA12878.chr21.s.bcf.csi",checkIfExists:true)] + ]) + estimate = GLIMPSE_PHASE.out.phased_variants + | join (BCFTOOLS_INDEX.out.csi) + input[0] = estimate + input[1] = truth + input[2] = allele_freq + """ + } + } + + then { + assertAll( + { assert workflow.success }, + { assert snapshot(workflow.out).match() } + ) + } + + } + + test("vcf_concordance_glimpse2 direct") { + when { + workflow { + """ + allele_freq = Channel.fromList([ + [ + [panel:'1000GP', chr:'21'], // meta map + file(params.phaseimpute_testdata_path + "panel/21/1000GP.chr21.s.norel.sites.vcf.gz",checkIfExists:true), + file(params.phaseimpute_testdata_path + "panel/21/1000GP.chr21.s.norel.sites.vcf.gz.csi",checkIfExists:true) + ], + [ + [panel:'1000GP', chr:'22'], // meta map + file(params.phaseimpute_testdata_path + "panel/22/1000GP.chr22.s.norel.sites.vcf.gz",checkIfExists:true), + file(params.phaseimpute_testdata_path + "panel/22/1000GP.chr22.s.norel.sites.vcf.gz.csi",checkIfExists:true) + ] + ]) + truth = Channel.fromList([ + [[id:'NA12878', chr:'21', region:'chr21:16570000-16610000'], // meta map + file(params.phaseimpute_testdata_path + "individuals/NA12878/NA12878.s.bcf",checkIfExists:true), + file(params.phaseimpute_testdata_path + "individuals/NA12878/NA12878.s.bcf.csi",checkIfExists:true)], + [[id:'NA12878', chr:'22', region:'chr22:16570000-16610000'], // meta map + file(params.phaseimpute_testdata_path + "individuals/NA12878/NA12878.s.bcf",checkIfExists:true), + 
file(params.phaseimpute_testdata_path + "individuals/NA12878/NA12878.s.bcf.csi",checkIfExists:true)], + [[id:'NA19401', chr:'21', region:'chr21:16570000-16610000'], // meta map + file(params.phaseimpute_testdata_path + "individuals/NA19401/NA19401.s.bcf",checkIfExists:true), + file(params.phaseimpute_testdata_path + "individuals/NA19401/NA19401.s.bcf.csi",checkIfExists:true)], + [[id:'NA19401', chr:'22', region:'chr22:16570000-16610000'], // meta map + file(params.phaseimpute_testdata_path + "individuals/NA19401/NA19401.s.bcf",checkIfExists:true), + file(params.phaseimpute_testdata_path + "individuals/NA19401/NA19401.s.bcf.csi",checkIfExists:true)] + ]) + estimate = Channel.fromList([ + [[id:'NA12878', chr:'21', region:'chr21:16650000-16700000'], // meta map + file(params.phaseimpute_testdata_path + "individuals/NA12878/NA12878.s_imputed.bcf",checkIfExists:true), + file(params.phaseimpute_testdata_path + "individuals/NA12878/NA12878.s_imputed.bcf.csi",checkIfExists:true)], + [[id:'NA12878', chr:'22', region:'chr22:16650000-16700000'], // meta map + file(params.phaseimpute_testdata_path + "individuals/NA12878/NA12878.s_imputed.bcf",checkIfExists:true), + file(params.phaseimpute_testdata_path + "individuals/NA12878/NA12878.s_imputed.bcf.csi",checkIfExists:true)], + [[id:'NA19401', chr:'21', region:'chr21:16650000-16700000'], // meta map + file(params.phaseimpute_testdata_path + "individuals/NA19401/NA19401.s_imputed.bcf",checkIfExists:true), + file(params.phaseimpute_testdata_path + "individuals/NA19401/NA19401.s_imputed.bcf.csi",checkIfExists:true)], + [[id:'NA19401', chr:'22', region:'chr22:16650000-16700000'], // meta map + file(params.phaseimpute_testdata_path + "individuals/NA19401/NA19401.s_imputed.bcf",checkIfExists:true), + file(params.phaseimpute_testdata_path + "individuals/NA19401/NA19401.s_imputed.bcf.csi",checkIfExists:true)] + ]) + input[0] = estimate + input[1] = truth + input[2] = allele_freq + """ + } + } + + then { + assertAll( + { assert 
workflow.success }, + { assert snapshot(workflow.out).match() } + ) + } + + } +} diff --git a/subworkflows/local/vcf_concordance_glimpse2/tests/main.nf.test.snap b/subworkflows/local/vcf_concordance_glimpse2/tests/main.nf.test.snap new file mode 100644 index 00000000..880fa8a2 --- /dev/null +++ b/subworkflows/local/vcf_concordance_glimpse2/tests/main.nf.test.snap @@ -0,0 +1,58 @@ +{ + "vcf_concordance_glimpse2 direct": { + "content": [ + { + "0": [ + + ], + "1": [ + + ], + "stats": [ + + ], + "versions": [ + + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-04-12T16:24:33.217544644" + }, + "vcf_concordance_glimpse2": { + "content": [ + { + "0": [ + [ + { + "id": "TestQuality" + }, + "TestQuality.txt:md5,865f1cf1a32256467010c10bfef1fa04" + ] + ], + "1": [ + + ], + "stats": [ + [ + { + "id": "TestQuality" + }, + "TestQuality.txt:md5,865f1cf1a32256467010c10bfef1fa04" + ] + ], + "versions": [ + + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-04-12T16:22:59.476875738" + } +} \ No newline at end of file diff --git a/subworkflows/local/vcf_concordance_glimpse2/tests/nextflow.config b/subworkflows/local/vcf_concordance_glimpse2/tests/nextflow.config new file mode 100644 index 00000000..8b9e3c3f --- /dev/null +++ b/subworkflows/local/vcf_concordance_glimpse2/tests/nextflow.config @@ -0,0 +1,9 @@ +params { + max_memory = '7.GB' +} + +process { + withName: 'VCF_CONCORDANCE_GLIMPSE2:CONCATENATE' { + ext.suffix = "txt" + } +} diff --git a/subworkflows/local/vcf_concordance_glimpse2/tests/tags.yml b/subworkflows/local/vcf_concordance_glimpse2/tests/tags.yml new file mode 100644 index 00000000..35cfc8a3 --- /dev/null +++ b/subworkflows/local/vcf_concordance_glimpse2/tests/tags.yml @@ -0,0 +1,2 @@ +subworkflows/vcf_concordance_glimpse2: + - subworkflows/local/vcf_concordance_glimpse2/** diff --git a/subworkflows/local/vcf_normalize_bcftools/vcf_normalize_bcftools.nf 
b/subworkflows/local/vcf_normalize_bcftools/vcf_normalize_bcftools.nf new file mode 100644 index 00000000..e96e4a02 --- /dev/null +++ b/subworkflows/local/vcf_normalize_bcftools/vcf_normalize_bcftools.nf @@ -0,0 +1,47 @@ +include { BCFTOOLS_NORM } from '../../../modules/nf-core/bcftools/norm/main' +include { BCFTOOLS_VIEW } from '../../../modules/nf-core/bcftools/view/main' +include { BCFTOOLS_INDEX } from '../../../modules/nf-core/bcftools/index/main' +include { BCFTOOLS_INDEX as BCFTOOLS_INDEX_2} from '../../../modules/nf-core/bcftools/index/main' +include { BCFTOOLS_INDEX as BCFTOOLS_INDEX_3} from '../../../modules/nf-core/bcftools/index/main' +include { BCFTOOLS_CONVERT } from '../../../modules/nf-core/bcftools/convert/main' + + +workflow VCF_NORMALIZE_BCFTOOLS { + take: + ch_vcf // channel: [ [id, chr], vcf, index ] + ch_fasta // channel: [ [genome], fasta, fai ] + + main: + + ch_versions = Channel.empty() + ch_fasta = ch_fasta.map { meta, fasta, fai -> [meta, fasta] } + + // Join duplicated biallelic sites into multiallelic records + BCFTOOLS_NORM(ch_vcf, ch_fasta) + + // Index multiallelic VCF + BCFTOOLS_INDEX(BCFTOOLS_NORM.out.vcf) + + // Join multiallelic VCF and TBI + ch_multiallelic_vcf_tbi = BCFTOOLS_NORM.out.vcf.join(BCFTOOLS_INDEX.out.tbi) + + // Remove all multiallelic records: + BCFTOOLS_VIEW(ch_multiallelic_vcf_tbi, [], [], []) + + // Index biallelic VCF + BCFTOOLS_INDEX_2(BCFTOOLS_VIEW.out.vcf) + + // Join biallelic VCF and TBI + ch_biallelic_vcf_tbi = BCFTOOLS_VIEW.out.vcf.join(BCFTOOLS_INDEX_2.out.tbi) + + // Convert VCF to Hap and Legend files + BCFTOOLS_CONVERT(ch_biallelic_vcf_tbi, ch_fasta, []) + + // Output hap and legend files + ch_hap_legend = BCFTOOLS_CONVERT.out.hap.join(BCFTOOLS_CONVERT.out.legend) + + emit: + vcf_tbi = ch_biallelic_vcf_tbi // channel: [ [id, chr], vcf, tbi ] + hap_legend = ch_hap_legend // channel: [ [id, chr] '.hap', '.legend' ] + versions = ch_versions // channel: [ versions.yml ] +} diff --git 
a/subworkflows/local/vcf_phase_panel/main.nf b/subworkflows/local/vcf_phase_panel/main.nf new file mode 100644 index 00000000..a54ca8e6 --- /dev/null +++ b/subworkflows/local/vcf_phase_panel/main.nf @@ -0,0 +1,40 @@ +include { VCF_PHASE_SHAPEIT5 } from '../../../subworkflows/nf-core/vcf_phase_shapeit5/main' + +workflow VCF_PHASE_PANEL { + take: + ch_vcf // channel: [ [id, chr], vcf, index ] + ch_panel_norm + ch_panel_sites + ch_panel_tsv + + main: + + ch_versions = Channel.empty() + + // Phase panel + if (params.phased == false) { + VCF_PHASE_SHAPEIT5(ch_vcf + .map { meta, vcf, csi -> [meta, vcf, csi, [], meta.region] }, + Channel.of([[],[],[]]).collect(), + Channel.of([[],[],[]]).collect(), + Channel.of([[],[]]).collect()) + ch_versions = ch_versions.mix(VCF_PHASE_SHAPEIT5.out.versions) + ch_panel_phased = VCF_PHASE_SHAPEIT5.out.variants_phased + .combine(VCF_PHASE_SHAPEIT5.out.variants_index, by: 0) + } else { + ch_panel_phased = ch_vcf + } + + ch_panel = ch_panel_norm + .combine(ch_panel_sites, by: 0) + .combine(ch_panel_tsv, by: 0) + .combine(ch_panel_phased, by: 0) + .map{ metaIC, norm, n_index, sites, s_index, tsv, t_index, phased, p_index + -> [[panel:metaIC.id, chr:metaIC.chr ], norm, n_index, sites, s_index, tsv, t_index, phased, p_index] + } + + emit: + vcf_tbi = ch_panel_phased + panel = ch_panel + versions = ch_versions // channel: [ versions.yml ] +} diff --git a/subworkflows/local/vcf_region/main.nf b/subworkflows/local/vcf_region/main.nf new file mode 100644 index 00000000..d4337793 --- /dev/null +++ b/subworkflows/local/vcf_region/main.nf @@ -0,0 +1,35 @@ +include { BCFTOOLS_VIEW as VIEW_VCF_REGION } from '../../../modules/nf-core/bcftools/view/main.nf' +include { BCFTOOLS_INDEX as VCF_INDEX } from '../../../modules/nf-core/bcftools/index/main.nf' + + +workflow VCF_REGION { + take: + ch_vcf // channel: [ [id], vcf ] + ch_region // channel: [ [region], val(region) ] + ch_fasta // channel: [ fasta ] + + main: + + ch_versions = Channel.empty() + + // 
Filter the region of interest of the panel file + ch_input_region = ch_vcf + .combine(ch_fasta) + .combine(ch_region) + .map{ metaI, vcf, index, fasta, metaR, region -> + [metaI + metaR, vcf, index, region+",chr"+region]} + + VIEW_VCF_REGION(ch_input_region, [], [], []) + ch_versions = ch_versions.mix(VIEW_VCF_REGION.out.versions.first()) + + VCF_INDEX(VIEW_VCF_REGION.out.vcf) + ch_versions = ch_versions.mix(VCF_INDEX.out.versions.first()) + + ch_vcf_region = VIEW_VCF_REGION.out.vcf + .combine(VCF_INDEX.out.csi, by:0) + + emit: + vcf_region = ch_vcf_region // channel: [ metaIR, vcf, index ] + versions = ch_versions // channel: [ versions.yml ] + +} diff --git a/subworkflows/local/vcf_sites_extract_bcftools/main.nf b/subworkflows/local/vcf_sites_extract_bcftools/main.nf new file mode 100644 index 00000000..d3d30674 --- /dev/null +++ b/subworkflows/local/vcf_sites_extract_bcftools/main.nf @@ -0,0 +1,74 @@ +include { BCFTOOLS_VIEW as VIEW_VCF_SNPS } from '../../../modules/nf-core/bcftools/view/main.nf' +include { BCFTOOLS_VIEW as VIEW_VCF_SITES } from '../../../modules/nf-core/bcftools/view/main.nf' +include { BCFTOOLS_INDEX } from '../../../modules/nf-core/bcftools/index/main.nf' +include { BCFTOOLS_INDEX as BCFTOOLS_INDEX_2 } from '../../../modules/nf-core/bcftools/index/main.nf' +include { TABIX_BGZIP } from '../../../modules/nf-core/tabix/bgzip/main' +include { TABIX_TABIX } from '../../../modules/nf-core/tabix/tabix/main' +include { BCFTOOLS_QUERY } from '../../../modules/nf-core/bcftools/query/main.nf' +include { BCFTOOLS_QUERY as BCFTOOLS_QUERY_STITCH} from '../../../modules/nf-core/bcftools/query/main.nf' +include { GAWK as GAWK_STITCH } from '../../../modules/nf-core/gawk' + + + +workflow VCF_SITES_EXTRACT_BCFTOOLS { + take: + ch_vcf // channel: [ [id, chr], vcf, index ] + + main: + + ch_versions = Channel.empty() + + // Extract only SNPs from VCF + VIEW_VCF_SNPS(ch_vcf, [], [], []) + ch_versions = ch_versions.mix(VIEW_VCF_SNPS.out.versions.first()) + + // Index
SNPs + BCFTOOLS_INDEX(VIEW_VCF_SNPS.out.vcf) + ch_versions = ch_versions.mix(BCFTOOLS_INDEX.out.versions.first()) + + // Join VCF and Index + ch_panel_norm = VIEW_VCF_SNPS.out.vcf.combine(BCFTOOLS_INDEX.out.csi, by:0) + + // Extract sites positions + VIEW_VCF_SITES( ch_panel_norm,[], [], []) + ch_versions = ch_versions.mix(VIEW_VCF_SITES.out.versions.first()) + + // Index extracted sites + BCFTOOLS_INDEX_2(VIEW_VCF_SITES.out.vcf) + ch_versions = ch_versions.mix(BCFTOOLS_INDEX_2.out.versions.first()) + + // Join extracted sites and index + ch_panel_sites = VIEW_VCF_SITES.out.vcf.combine(BCFTOOLS_INDEX_2.out.csi, by:0) + + // Create TSVs for different tools + + // Convert to TSV with structure for Glimpse + BCFTOOLS_QUERY(ch_panel_sites, [], [], []) + ch_versions = ch_versions.mix(BCFTOOLS_QUERY.out.versions.first()) + + // Compress TSV + TABIX_BGZIP(BCFTOOLS_QUERY.out.output) + ch_versions = ch_versions.mix(TABIX_BGZIP.out.versions.first()) + + // Index compressed TSV + TABIX_TABIX(TABIX_BGZIP.out.output) + ch_versions = ch_versions.mix(TABIX_TABIX.out.versions.first()) + + // Join compressed TSV and index + ch_panel_tsv = TABIX_BGZIP.out.output.combine(TABIX_TABIX.out.tbi, by: 0) + + // TSV for STITCH + // Convert position file to tab-separated file + BCFTOOLS_QUERY_STITCH(ch_panel_sites, [], [], []) + ch_posfile = BCFTOOLS_QUERY_STITCH.out.output + + // Remove multiallelic positions from tsv + GAWK_STITCH(ch_posfile, []) + + emit: + panel_tsv = ch_panel_tsv + vcf_tbi = ch_panel_norm + panel_sites = ch_panel_sites + posfile = GAWK_STITCH.out.output + versions = ch_versions // channel: [ versions.yml ] +} diff --git a/subworkflows/nf-core/multiple_impute_glimpse2/main.nf b/subworkflows/nf-core/multiple_impute_glimpse2/main.nf new file mode 100644 index 00000000..b300f771 --- /dev/null +++ b/subworkflows/nf-core/multiple_impute_glimpse2/main.nf @@ -0,0 +1,73 @@ +include { GLIMPSE2_CHUNK } from '../../../modules/nf-core/glimpse2/chunk/main' +include { 
GLIMPSE2_SPLITREFERENCE } from '../../../modules/nf-core/glimpse2/splitreference/main' +include { GLIMPSE2_PHASE } from '../../../modules/nf-core/glimpse2/phase/main' +include { GLIMPSE2_LIGATE } from '../../../modules/nf-core/glimpse2/ligate/main' +include { BCFTOOLS_INDEX as INDEX_PHASE } from '../../../modules/nf-core/bcftools/index/main.nf' +include { BCFTOOLS_INDEX as INDEX_LIGATE } from '../../../modules/nf-core/bcftools/index/main.nf' + +workflow MULTIPLE_IMPUTE_GLIMPSE2 { + + take: + ch_input // channel (mandatory): [ meta, vcf, csi, infos ] + ch_ref // channel (mandatory): [ meta, vcf, csi, region ] + ch_map // channel (optional): [ meta, map ] + ch_fasta // channel (optional): [ meta, fasta, index ] + chunk_model // string: model used to chunk the reference panel + + main: + + ch_versions = Channel.empty() + + // Chunk reference panel + GLIMPSE2_CHUNK ( ch_ref, ch_map, chunk_model ) + ch_versions = ch_versions.mix( GLIMPSE2_CHUNK.out.versions.first() ) + + chunk_output = GLIMPSE2_CHUNK.out.chunk_chr + .splitCsv(header: ['ID', 'Chr', 'RegionBuf', 'RegionCnk', 'WindowCm', + 'WindowMb', 'NbTotVariants', 'NbComVariants'], + sep: "\t", skip: 0) + .map { meta, it -> [meta, it["RegionBuf"], it["RegionCnk"]]} + + // Split reference panel in bin files + split_input = ch_ref.map{ meta, ref, index, region -> [meta, ref, index]} + .combine(chunk_output, by: 0) + + GLIMPSE2_SPLITREFERENCE( split_input, ch_map ) + ch_versions = ch_versions.mix( GLIMPSE2_SPLITREFERENCE.out.versions.first() ) + + phase_input = ch_input.combine( GLIMPSE2_SPLITREFERENCE.out.bin_ref ) + .map{ input_meta, input_file, input_index, input_infos, + panel_meta, panel_bin -> + [input_meta, input_file, input_index, input_infos, + [], [], panel_bin, [], []] + }/* Remove unnecessary meta maps + add null index as we use a bin file, + add null value for input and output region as we use a bin file */ + + // Phase input files for each reference bin files + indexing + GLIMPSE2_PHASE ( phase_input, 
ch_fasta ) // [meta, vcf, index, sample_infos, regionin, regionout, regionindex, ref, ref_index, map], [ meta, fasta, index ] + ch_versions = ch_versions.mix( GLIMPSE2_PHASE.out.versions.first() ) + + INDEX_PHASE ( GLIMPSE2_PHASE.out.phased_variant ) + ch_versions = ch_versions.mix( INDEX_PHASE.out.versions.first() ) + + // Ligate all phased files into one and index it + ligate_input = GLIMPSE2_PHASE.out.phased_variant + .groupTuple() + .combine( INDEX_PHASE.out.csi + .groupTuple(), + by: 0 ) + + GLIMPSE2_LIGATE ( ligate_input ) + ch_versions = ch_versions.mix( GLIMPSE2_LIGATE.out.versions.first() ) + + INDEX_LIGATE ( GLIMPSE2_LIGATE.out.merged_variants ) + ch_versions = ch_versions.mix( INDEX_LIGATE.out.versions.first() ) + + emit: + chunk_chr = GLIMPSE2_CHUNK.out.chunk_chr // channel: [ val(meta), txt ] + merged_variants = GLIMPSE2_LIGATE.out.merged_variants // channel: [ val(meta), bcf ] + merged_variants_index = INDEX_LIGATE.out.csi // channel: [ val(meta), csi ] + + versions = ch_versions // channel: [ versions.yml ] +} diff --git a/subworkflows/nf-core/multiple_impute_glimpse2/meta.yml b/subworkflows/nf-core/multiple_impute_glimpse2/meta.yml new file mode 100644 index 00000000..6fea6251 --- /dev/null +++ b/subworkflows/nf-core/multiple_impute_glimpse2/meta.yml @@ -0,0 +1,64 @@ +name: "multiple_impute_glimpse2" +description: Impute VCF/BCF, CRAM and BAM files with GLIMPSE2 +keywords: + - glimpse + - chunk + - phase + - ligate + - split_reference +components: + - glimpse2/chunk + - glimpse2/phase + - glimpse2/ligate + - glimpse2/splitreference + - bcftools/index +input: + - ch_input: + type: file + description: | + Target dataset in CRAM, BAM or VCF/BCF format. + Index file of the input file. + File with sample names and ploidy information. + Structure: [ meta, file, index, txt ] + - ch_ref: + type: file + description: | + Reference panel of haplotypes in VCF/BCF format. + Index file of the Reference panel file.
+ Target region, usually a full chromosome (e.g. chr20:1000000-2000000 or chr20). + The file could possibly be without GT field (for efficiency reasons a file containing only the positions is recommended). + Structure: [ meta, vcf, csi, region ] + - ch_map: + type: file + description: | + File containing the genetic map. + Structure: [ meta, gmap ] + - ch_fasta: + type: file + description: | + Reference genome in fasta format. + Reference genome index in fai format + Structure: [ meta, fasta, fai ] +output: + - chunk_chr: + type: file + description: | + Tab delimited output txt file containing buffer and imputation regions. + Structure: [meta, txt] + - merged_variants: + type: file + description: | + Output VCF/BCF file for the merged regions. + Phased information (HS field) is updated accordingly for the full region. + Structure: [ val(meta), bcf ] + - merged_variants_index: + type: file + description: Index file of the ligated phased variants files. + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@LouisLeNezet" +maintainers: + - "@LouisLeNezet" diff --git a/subworkflows/nf-core/vcf_impute_glimpse/main.nf b/subworkflows/nf-core/vcf_impute_glimpse/main.nf new file mode 100644 index 00000000..94262e34 --- /dev/null +++ b/subworkflows/nf-core/vcf_impute_glimpse/main.nf @@ -0,0 +1,59 @@ +include { GLIMPSE_CHUNK } from '../../../modules/nf-core/glimpse/chunk/main' +include { GLIMPSE_PHASE } from '../../../modules/nf-core/glimpse/phase/main' +include { GLIMPSE_LIGATE } from '../../../modules/nf-core/glimpse/ligate/main' +include { BCFTOOLS_INDEX as INDEX_PHASE } from '../../../modules/nf-core/bcftools/index/main.nf' +include { BCFTOOLS_INDEX as INDEX_LIGATE } from '../../../modules/nf-core/bcftools/index/main.nf' + +workflow VCF_IMPUTE_GLIMPSE { + + take: + ch_input // channel (mandatory): [ meta, vcf, csi, sample, region, ref, ref_index, map ] + + main: + + ch_versions = Channel.empty() + + 
input_chunk = ch_input.map{ + meta, vcf, csi, sample, region, ref, ref_index, map -> + [ meta, vcf, csi, region] + } + + GLIMPSE_CHUNK ( input_chunk ) + ch_versions = ch_versions.mix( GLIMPSE_CHUNK.out.versions ) + + chunk_output = GLIMPSE_CHUNK.out.chunk_chr + .splitCsv(header: ['ID', 'Chr', 'RegionIn', 'RegionOut', 'Size1', 'Size2'], sep: "\t", skip: 0) + .map { meta, it -> [meta, it["RegionIn"], it["RegionOut"]]} + + phase_input = ch_input.map{ meta, vcf, csi, sample, region, ref, ref_index, map -> [meta, vcf, csi, sample, ref, ref_index, map]} + .combine(chunk_output, by: 0) + .map{meta, vcf, csi, sample, ref, ref_index, map, regionin, regionout -> + [meta, vcf, csi, sample, regionin, regionout, ref, ref_index, map]} + + GLIMPSE_PHASE ( phase_input ) // [meta, vcf, index, sample_infos, regionin, regionout, ref, ref_index, map] + ch_versions = ch_versions.mix(GLIMPSE_PHASE.out.versions ) + + INDEX_PHASE ( GLIMPSE_PHASE.out.phased_variants ) + ch_versions = ch_versions.mix( INDEX_PHASE.out.versions ) + + // Ligate all phased files in one and index it + ligate_input = GLIMPSE_PHASE.out.phased_variants + .groupTuple( by: 0 ) + .combine( INDEX_PHASE.out.csi + .groupTuple( by: 0 ), + by: 0 + ) + + GLIMPSE_LIGATE ( ligate_input ) + ch_versions = ch_versions.mix(GLIMPSE_LIGATE.out.versions ) + + INDEX_LIGATE ( GLIMPSE_LIGATE.out.merged_variants ) + ch_versions = ch_versions.mix( INDEX_LIGATE.out.versions ) + + emit: + chunk_chr = GLIMPSE_CHUNK.out.chunk_chr // channel: [ val(meta), txt ] + merged_variants = GLIMPSE_LIGATE.out.merged_variants // channel: [ val(meta), bcf ] + merged_variants_index = INDEX_LIGATE.out.csi // channel: [ val(meta), csi ] + + versions = ch_versions // channel: [ versions.yml ] +} diff --git a/subworkflows/nf-core/vcf_impute_glimpse/meta.yml b/subworkflows/nf-core/vcf_impute_glimpse/meta.yml new file mode 100644 index 00000000..81b3b4d5 --- /dev/null +++ b/subworkflows/nf-core/vcf_impute_glimpse/meta.yml @@ -0,0 +1,51 @@ +# 
yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json +name: "vcf_impute_glimpse" +description: Impute VCF/BCF files with Glimpse +keywords: + - glimpse + - chunk + - phase + - ligate +components: + - glimpse/chunk + - glimpse/phase + - glimpse/ligate + - bcftools/index +input: + - ch_input: + type: file + description: | + Target dataset in VCF/BCF format defined at all variable positions. + Index file of the input VCF/BCF file containing genotype likelihoods. + File with sample names and ploidy information. + Target region, usually a full chromosome (e.g. chr20:1000000-2000000 or chr20). + Reference panel of haplotypes in VCF/BCF format. + Index file of the Reference panel file. + File containing the genetic map. + The file could possibly be without GT field (for efficiency reasons a file containing only the positions is recommended). + Structure: [ meta, vcf, csi, txt, region, ref_vcf, ref_csi, gmap ] +output: + - chunk_chr: + type: file + description: | + Tab delimited output txt file containing buffer and imputation regions. + Structure: [meta, txt] + - merged_variants: + type: file + description: | + Output phased VCF/BCF file for the merged regions. + Phased information (HS field) is updated accordingly for the full region. + Structure: [ val(meta), bcf ] + - merged_variants_index: + type: file + description: | + Index output of phased VCF/BCF file for the merged regions. 
+ Structure: [ val(meta), csi ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@LouisLeNezet" +maintainers: + - "@LouisLeNezet" diff --git a/subworkflows/nf-core/vcf_impute_glimpse/tests/main.nf.test b/subworkflows/nf-core/vcf_impute_glimpse/tests/main.nf.test new file mode 100644 index 00000000..46db4244 --- /dev/null +++ b/subworkflows/nf-core/vcf_impute_glimpse/tests/main.nf.test @@ -0,0 +1,101 @@ +nextflow_workflow { + + name "Test Workflow VCF_IMPUTE_GLIMPSE" + script "../main.nf" + workflow "VCF_IMPUTE_GLIMPSE" + + tag "subworkflows" + tag "bcftools/index" + tag "subworkflows_nfcore" + tag "subworkflows/vcf_impute_glimpse" + tag "glimpse/phase" + tag "glimpse/ligate" + tag "glimpse/chunk" + + test("Should run without failures") { + config "./nextflow.config" + + when { + params { + outdir = "tests/results" + } + workflow { + """ + samples_infos = Channel.of('NA12878 2').collectFile(name: 'sampleinfos.txt') + + ch_panel = Channel.fromList([ + [[ ref:'ref_panel'], + file("https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/panel/21_22/1000GP.chr21_22.s.norel.bcf", + checkIfExists: true), + file("https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/panel/21_22/1000GP.chr21_22.s.norel.bcf.csi", + checkIfExists: true)], + [[ ref:'ref_panel2'], + file("https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/panel/21_22/1000GP.chr21_22.s.norel.bcf", + checkIfExists: true), + file("https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/panel/21_22/1000GP.chr21_22.s.norel.bcf.csi", + checkIfExists: true)] + ]) + region = Channel.fromList([ + [[chr: "chr21", region: "chr21:16600000-16800000"], "chr21:16600000-16800000"], + [[chr: "chr22", region: "chr22:16600000-16800000"], "chr22:16600000-16800000"] + ]) + + input_vcf = Channel.fromList([ + [[ id:'NA12878'], // meta map + 
file("https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA12878/NA12878.s.1x.bcf", checkIfExists: true), + file("https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA12878/NA12878.s.1x.bcf.csi", checkIfExists: true), + ], + [[ id:'NA19401'], // meta map + file("https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA19401/NA19401.s.1x.bcf", checkIfExists: true), + file("https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA19401/NA19401.s.1x.bcf.csi", checkIfExists: true), + ] + ]) + input_vcf_multiple = input_vcf + .combine( samples_infos ) + .combine( region ) + .map{ metaI, vcf, index, sample, metaCR, region -> + [metaI + metaCR, vcf, index, sample, region ] + } + + ch_map = Channel.fromList([ + [[ chr: "chr21"], + file("https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/reference_genome/GRCh38.chr21.s.map", checkIfExists: true) + ], + [[ chr: "chr22"], + file("https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/reference_genome/GRCh38.chr22.s.map", checkIfExists: true) + ] + ]) + + // Combine input and map depending on chromosome name + ch_input_map = input_vcf_multiple + .map{ metaIRC, vcf, index, sample, region -> + [metaIRC.subMap(["chr"]), metaIRC, vcf, index, sample, region] + } + .combine(ch_map, by: 0) + .map{ metaC, metaIRC, vcf, index, sample, region, map -> + [metaIRC, vcf, index, sample, region, map] } + + // Combine input and map to reference panel could also be done by chromosome + input[0] = ch_input_map + .combine(ch_panel) + .map{ metaIRC, vcf, index, sample, region, map, metaP, ref, ref_index -> + [ metaIRC + metaP, vcf, index, sample, region, ref, ref_index, map ] + } + """ + } + } + + then { + def lines = path(workflow.out.merged_variants.get(0).get(1)).linesGzip.last() + assertAll( + { assert workflow.success }, + { assert 
snapshot(workflow.out.versions).match("versions") }, + { assert snapshot(workflow.out.chunk_chr).match("chunk_chr") }, + { assert workflow.out.merged_variants.size() == 8}, + { assert snapshot(lines).match("merged") } + ) + } + + } + +} diff --git a/subworkflows/nf-core/vcf_impute_glimpse/tests/main.nf.test.snap b/subworkflows/nf-core/vcf_impute_glimpse/tests/main.nf.test.snap new file mode 100644 index 00000000..287930cd --- /dev/null +++ b/subworkflows/nf-core/vcf_impute_glimpse/tests/main.nf.test.snap @@ -0,0 +1,146 @@ +{ + "chunk_chr": { + "content": [ + [ + [ + { + "id": "NA12878", + "chr": "chr21", + "region": "chr21:16600000-16800000", + "ref": "ref_panel2" + }, + "NA12878_chr21:16600000-16800000_chunk.txt:md5,775240b195e782b3b83adf52e0d17089" + ], + [ + { + "id": "NA12878", + "chr": "chr21", + "region": "chr21:16600000-16800000", + "ref": "ref_panel" + }, + "NA12878_chr21:16600000-16800000_chunk.txt:md5,775240b195e782b3b83adf52e0d17089" + ], + [ + { + "id": "NA12878", + "chr": "chr22", + "region": "chr22:16600000-16800000", + "ref": "ref_panel2" + }, + "NA12878_chr22:16600000-16800000_chunk.txt:md5,f5270ed0faa4f9697618444b267442ce" + ], + [ + { + "id": "NA12878", + "chr": "chr22", + "region": "chr22:16600000-16800000", + "ref": "ref_panel" + }, + "NA12878_chr22:16600000-16800000_chunk.txt:md5,f5270ed0faa4f9697618444b267442ce" + ], + [ + { + "id": "NA19401", + "chr": "chr21", + "region": "chr21:16600000-16800000", + "ref": "ref_panel2" + }, + "NA19401_chr21:16600000-16800000_chunk.txt:md5,775240b195e782b3b83adf52e0d17089" + ], + [ + { + "id": "NA19401", + "chr": "chr21", + "region": "chr21:16600000-16800000", + "ref": "ref_panel" + }, + "NA19401_chr21:16600000-16800000_chunk.txt:md5,775240b195e782b3b83adf52e0d17089" + ], + [ + { + "id": "NA19401", + "chr": "chr22", + "region": "chr22:16600000-16800000", + "ref": "ref_panel2" + }, + "NA19401_chr22:16600000-16800000_chunk.txt:md5,f5270ed0faa4f9697618444b267442ce" + ], + [ + { + "id": "NA19401", + "chr": 
"chr22", + "region": "chr22:16600000-16800000", + "ref": "ref_panel" + }, + "NA19401_chr22:16600000-16800000_chunk.txt:md5,f5270ed0faa4f9697618444b267442ce" + ] + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-18T23:16:59.014613786" + }, + "versions": { + "content": [ + [ + "versions.yml:md5,227d8e960e4382d8a615e040b874fc27", + "versions.yml:md5,227d8e960e4382d8a615e040b874fc27", + "versions.yml:md5,227d8e960e4382d8a615e040b874fc27", + "versions.yml:md5,227d8e960e4382d8a615e040b874fc27", + "versions.yml:md5,227d8e960e4382d8a615e040b874fc27", + "versions.yml:md5,227d8e960e4382d8a615e040b874fc27", + "versions.yml:md5,227d8e960e4382d8a615e040b874fc27", + "versions.yml:md5,227d8e960e4382d8a615e040b874fc27", + "versions.yml:md5,73621eae1bfd89c2ceb009524fe680d4", + "versions.yml:md5,73621eae1bfd89c2ceb009524fe680d4", + "versions.yml:md5,73621eae1bfd89c2ceb009524fe680d4", + "versions.yml:md5,73621eae1bfd89c2ceb009524fe680d4", + "versions.yml:md5,73621eae1bfd89c2ceb009524fe680d4", + "versions.yml:md5,73621eae1bfd89c2ceb009524fe680d4", + "versions.yml:md5,73621eae1bfd89c2ceb009524fe680d4", + "versions.yml:md5,73621eae1bfd89c2ceb009524fe680d4", + "versions.yml:md5,7ae4d2b0252f9382dd08d783b7a234d2", + "versions.yml:md5,7ae4d2b0252f9382dd08d783b7a234d2", + "versions.yml:md5,7ae4d2b0252f9382dd08d783b7a234d2", + "versions.yml:md5,7ae4d2b0252f9382dd08d783b7a234d2", + "versions.yml:md5,7ae4d2b0252f9382dd08d783b7a234d2", + "versions.yml:md5,7ae4d2b0252f9382dd08d783b7a234d2", + "versions.yml:md5,7ae4d2b0252f9382dd08d783b7a234d2", + "versions.yml:md5,7ae4d2b0252f9382dd08d783b7a234d2", + "versions.yml:md5,a802158fea97c36620863658efb7ae68", + "versions.yml:md5,a802158fea97c36620863658efb7ae68", + "versions.yml:md5,a802158fea97c36620863658efb7ae68", + "versions.yml:md5,a802158fea97c36620863658efb7ae68", + "versions.yml:md5,a802158fea97c36620863658efb7ae68", + "versions.yml:md5,a802158fea97c36620863658efb7ae68", + 
"versions.yml:md5,a802158fea97c36620863658efb7ae68", + "versions.yml:md5,a802158fea97c36620863658efb7ae68", + "versions.yml:md5,e37bdea2d40f36ce8546f87c3e572c96", + "versions.yml:md5,e37bdea2d40f36ce8546f87c3e572c96", + "versions.yml:md5,e37bdea2d40f36ce8546f87c3e572c96", + "versions.yml:md5,e37bdea2d40f36ce8546f87c3e572c96", + "versions.yml:md5,e37bdea2d40f36ce8546f87c3e572c96", + "versions.yml:md5,e37bdea2d40f36ce8546f87c3e572c96", + "versions.yml:md5,e37bdea2d40f36ce8546f87c3e572c96", + "versions.yml:md5,e37bdea2d40f36ce8546f87c3e572c96" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-18T23:16:58.336134471" + }, + "merged": { + "content": [ + "chr21\t16609998\t21:16609998:A:G\tA\tG\t.\t.\tRAF=0.00125156;AF=0;INFO=1\tGT:DS:GP:HS\t0/0:0:1,0,0:0" + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-18T23:16:59.786459305" + } +} \ No newline at end of file diff --git a/subworkflows/nf-core/vcf_impute_glimpse/tests/nextflow.config b/subworkflows/nf-core/vcf_impute_glimpse/tests/nextflow.config new file mode 100644 index 00000000..09ab6858 --- /dev/null +++ b/subworkflows/nf-core/vcf_impute_glimpse/tests/nextflow.config @@ -0,0 +1,12 @@ +process { + withName: GLIMPSE_CHUNK { + ext.prefix = { "${meta.id}_${meta.region}_chunk" } + ext.args = "--window-size 50000 --buffer-size 1000" + } + withName: GLIMPSE_PHASE { + ext.prefix = { "${meta.id}_${meta.region}_${meta.ref}_phase_${input_region.replace(":","_")}" } + } + withName: GLIMPSE_LIGATE { + ext.prefix = { "${meta.id}_${meta.chr}_${meta.ref}_ligate" } + } +} diff --git a/subworkflows/nf-core/vcf_impute_glimpse/tests/tags.yml b/subworkflows/nf-core/vcf_impute_glimpse/tests/tags.yml new file mode 100644 index 00000000..24003ec0 --- /dev/null +++ b/subworkflows/nf-core/vcf_impute_glimpse/tests/tags.yml @@ -0,0 +1,2 @@ +subworkflows/vcf_impute_glimpse: + - subworkflows/nf-core/vcf_impute_glimpse/** diff --git 
a/subworkflows/nf-core/vcf_phase_shapeit5/main.nf b/subworkflows/nf-core/vcf_phase_shapeit5/main.nf new file mode 100644 index 00000000..51061373 --- /dev/null +++ b/subworkflows/nf-core/vcf_phase_shapeit5/main.nf @@ -0,0 +1,96 @@ +include { BEDTOOLS_MAKEWINDOWS } from '../../../modules/nf-core/bedtools/makewindows/main.nf' +include { SHAPEIT5_PHASECOMMON } from '../../../modules/nf-core/shapeit5/phasecommon/main' +include { SHAPEIT5_LIGATE } from '../../../modules/nf-core/shapeit5/ligate/main' +include { BCFTOOLS_INDEX as VCF_BCFTOOLS_INDEX_1 } from '../../../modules/nf-core/bcftools/index/main.nf' +include { BCFTOOLS_INDEX as VCF_BCFTOOLS_INDEX_2 } from '../../../modules/nf-core/bcftools/index/main.nf' + +workflow VCF_PHASE_SHAPEIT5 { + + take: + ch_vcf // channel (mandatory): [ val(meta), path(vcf), path(csi), path(pedigree), val(region) ] + ch_ref // channel (optional) : [ val(meta), path(ref), path(csi) ] + ch_scaffold // channel (optional) : [ val(meta), path(scaffold), path(csi) ] + ch_map // channel (optional) : [ val(meta), path(map)] + + main: + + ch_versions = Channel.empty() + + // A file containing the region to phase must be generated in Chr \tab Start \tab End format + // Because the meta map must be preserved, the following steps are required + + // Keep the meta map and the region in two separate channels, but keep the id field to link them back + ch_region = ch_vcf + .multiMap { meta, vcf, csi, pedigree, region -> + metadata: [ meta.id, meta] + region : [ meta.id, region] + } + + // Create the File in bed format and use the meta id for the file name + ch_merged_region = ch_region.region + .collectFile { metaid, region -> ["${metaid}.bed", region.replace(":","\t").replace("-","\t")] } + .map { file -> [file.baseName, file] } + + // Link back the meta map with the file + ch_region_file = ch_region.metadata + .join(ch_merged_region, failOnMismatch:true, failOnDuplicate:true) + .map { mid, meta, region_file -> [meta, region_file]} +
BEDTOOLS_MAKEWINDOWS(ch_region_file) + ch_versions = ch_versions.mix(BEDTOOLS_MAKEWINDOWS.out.versions.first()) + + ch_chunk_output = BEDTOOLS_MAKEWINDOWS.out.bed + .splitCsv(header: ['Chr', 'Start', 'End'], sep: "\t", skip: 0) + .map { meta, it -> [meta, it["Chr"]+":"+it["Start"]+"-"+it["End"]]} + + // Count the number of chunks + ch_chunks_number = BEDTOOLS_MAKEWINDOWS.out.bed + .map { meta, bed -> [meta, bed.countLines().intValue()]} + + ch_phase_input = ch_vcf + .map { meta, vcf, index, pedigree, region -> + [meta, vcf, index, pedigree] } + .combine(ch_chunk_output, by:0) + .map { meta, vcf, index, pedigree, chunk -> + [meta + [id: "${meta.id}_${chunk.replace(":","-")}"], // The meta.id field needs to be modified to be unique for each chunk + vcf, index, pedigree, chunk]} + + SHAPEIT5_PHASECOMMON ( ch_phase_input, + ch_ref, + ch_scaffold, + ch_map ) + ch_versions = ch_versions.mix(SHAPEIT5_PHASECOMMON.out.versions.first()) + + VCF_BCFTOOLS_INDEX_1(SHAPEIT5_PHASECOMMON.out.phased_variant) + ch_versions = ch_versions.mix(VCF_BCFTOOLS_INDEX_1.out.versions.first()) + + ch_ligate_input = SHAPEIT5_PHASECOMMON.out.phased_variant + .join(VCF_BCFTOOLS_INDEX_1.out.csi, failOnMismatch:true, failOnDuplicate:true) + .map{ meta, vcf, csi -> + def newmeta = meta + [id: meta.id.split("_")[0..-2].join("_")] + [newmeta, vcf, csi]} + .combine(ch_chunks_number, by:0) + .map{meta, vcf, csi, chunks_num -> + [groupKey(meta, chunks_num), vcf, csi]} + .groupTuple() + .map{ meta, vcf, csi -> + [ meta, + vcf.sort { a, b -> + def aStart = a.getName().split('-')[-2].toInteger() + def bStart = b.getName().split('-')[-2].toInteger() + aStart <=> bStart}, + csi]} + + SHAPEIT5_LIGATE(ch_ligate_input) + ch_versions = ch_versions.mix(SHAPEIT5_LIGATE.out.versions.first()) + + VCF_BCFTOOLS_INDEX_2(SHAPEIT5_LIGATE.out.merged_variants) + ch_versions = ch_versions.mix(VCF_BCFTOOLS_INDEX_2.out.versions.first()) + + emit: + bed = BEDTOOLS_MAKEWINDOWS.out.bed // channel: [ val(meta), bed ] + variants_phased
= SHAPEIT5_LIGATE.out.merged_variants // channel: [ val(meta), vcf ] + variants_index = VCF_BCFTOOLS_INDEX_2.out.csi // channel: [ val(meta), csi ] + versions = ch_versions // channel: [ versions.yml ] +} + diff --git a/subworkflows/nf-core/vcf_phase_shapeit5/meta.yml b/subworkflows/nf-core/vcf_phase_shapeit5/meta.yml new file mode 100644 index 00000000..54c8cd01 --- /dev/null +++ b/subworkflows/nf-core/vcf_phase_shapeit5/meta.yml @@ -0,0 +1,71 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json +name: "vcf_phase_shapeit5" +description: Phase vcf panel with Shapeit5 tools +keywords: + - chunk + - phase + - ligate + - index + - vcf +components: + - bedtools/makewindows + - shapeit5/phasecommon + - shapeit5/ligate + - bcftools/index +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test' ] + - ch_vcf: + type: file + description: | + Target dataset in VCF/BCF format defined at all variable positions. + Index file of the input VCF/BCF file containing genotype likelihoods. + Pedigree information in the following format: offspring father mother. + Target region, usually a full chromosome (e.g. chr20:1000000-2000000 or chr20). + The file could possibly be without GT field (for efficiency reasons a file containing only the positions is recommended). + Structure: [ val(meta), path(vcf), path(csi), path(pedigree), val(region) ] + - ch_ref: + type: file + description: | + Reference panel of haplotypes in VCF/BCF format. + Index file of the Reference panel file. + Structure: [ val(meta), path(ref), path(csi) ] + - ch_scaffold: + type: file + description: | + Scaffold of haplotypes in VCF/BCF format. + Index file of the Scaffold of haplotypes file. + Structure: [ val(meta), path(scaffold), path(csi) ] + - ch_map: + type: file + description: File containing the genetic map. 
+ Structure: [val(meta), path(map)] +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test' ] + - bed: + type: file + description: BED file containing the windows + pattern: "*.bed" + - variants_phased: + type: file + description: Phased haplotypes in VCF/BCF format. + pattern: "*.{vcf,bcf,vcf.gz,bcf.gz}" + - variants_index: + type: file + description: CSI bcftools index + pattern: "*.csi" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@LouisLeNezet" +maintainers: + - "@LouisLeNezet" diff --git a/tests/config/env_nf.yml b/tests/config/env_nf.yml new file mode 100644 index 00000000..e3b11408 --- /dev/null +++ b/tests/config/env_nf.yml @@ -0,0 +1,13 @@ +name: env_nf +channels: + - conda-forge + - bioconda + - anaconda + - defaults +dependencies: + - openjdk>=17.0 + - nextflow>=23.10 + - singularity>=3.8 + - nf-core>=2.13.0 + - prettier>=3.0 + - nf-test>=0.8 diff --git a/tests/config/nf-test.config b/tests/config/nf-test.config new file mode 100644 index 00000000..32ca7b47 --- /dev/null +++ b/tests/config/nf-test.config @@ -0,0 +1,52 @@ +params { + publish_dir_mode = "copy" + singularity_pull_docker_container = false + test_data_base = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules' + modules_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/' + phaseimpute_testdata_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/' +} + +process { + cpus = 2 + memory = 3.GB + time = 2.h +} + +profiles { + singularity { + singularity.enabled = true + singularity.autoMounts = true + } + conda { + conda.enabled = true + } + mamba { + conda.enabled = true + conda.useMamba = true + } + podman { + podman.enabled = true + podman.userEmulation = true + podman.runOptions = "--runtime crun --platform linux/x86_64 --systemd=always" + } + docker { + docker.enabled = true + 
docker.userEmulation = false + docker.fixOwnership = true + docker.runOptions = '--platform=linux/amd64 -u $(id -u):$(id -g)' + } +} + +docker.registry = 'quay.io' +podman.registry = 'quay.io' +singularity.registry = 'quay.io' + +// Increase time available to build Conda environment +conda { createTimeout = "120 min" } + +// Load test_data.config containing paths to test data +includeConfig 'test_data.config' + +manifest { + nextflowVersion = '!>=23.04.0' +} diff --git a/tests/config/test_data.config b/tests/config/test_data.config new file mode 100644 index 00000000..d514c9c9 --- /dev/null +++ b/tests/config/test_data.config @@ -0,0 +1,729 @@ +// README: +// https://github.com/nf-core/test-datasets/blob/modules/README.md + +params { + // Base directory for test data + test_data_base = "https://raw.githubusercontent.com/nf-core/test-datasets/modules" + + test_data { + 'sarscov2' { + 'genome' { + genome_fasta = "${params.test_data_base}/data/genomics/sarscov2/genome/genome.fasta" + genome_fasta_gz = "${params.test_data_base}/data/genomics/sarscov2/genome/genome.fasta.gz" + genome_fasta_fai = "${params.test_data_base}/data/genomics/sarscov2/genome/genome.fasta.fai" + genome_fasta_txt_zst = "${params.test_data_base}/data/genomics/sarscov2/genome/genome.fasta.txt.zst" + genome_dict = "${params.test_data_base}/data/genomics/sarscov2/genome/genome.dict" + genome_gff3 = "${params.test_data_base}/data/genomics/sarscov2/genome/genome.gff3" + genome_gff3_gz = "${params.test_data_base}/data/genomics/sarscov2/genome/genome.gff3.gz" + genome_gtf = "${params.test_data_base}/data/genomics/sarscov2/genome/genome.gtf" + genome_paf = "${params.test_data_base}/data/genomics/sarscov2/genome/genome.paf" + genome_sizes = "${params.test_data_base}/data/genomics/sarscov2/genome/genome.sizes" + transcriptome_fasta = "${params.test_data_base}/data/genomics/sarscov2/genome/transcriptome.fasta" + proteome_fasta = "${params.test_data_base}/data/genomics/sarscov2/genome/proteome.fasta" + 
transcriptome_paf = "${params.test_data_base}/data/genomics/sarscov2/genome/transcriptome.paf" + + test_bed = "${params.test_data_base}/data/genomics/sarscov2/genome/bed/test.bed" + test_bed_gz = "${params.test_data_base}/data/genomics/sarscov2/genome/bed/test.bed.gz" + test2_bed = "${params.test_data_base}/data/genomics/sarscov2/genome/bed/test2.bed" + test_bed12 = "${params.test_data_base}/data/genomics/sarscov2/genome/bed/test.bed12" + baits_bed = "${params.test_data_base}/data/genomics/sarscov2/genome/bed/baits.bed" + bed_autosql = "${params.test_data_base}/data/genomics/sarscov2/genome/bed/bed6alt.as" + + reference_cnn = "${params.test_data_base}/data/genomics/sarscov2/genome/cnn/reference.cnn" + + kraken2 = "${params.test_data_base}/data/genomics/sarscov2/genome/db/kraken2" + kraken2_tar_gz = "${params.test_data_base}/data/genomics/sarscov2/genome/db/kraken2.tar.gz" + + kraken2_bracken = "${params.test_data_base}/data/genomics/sarscov2/genome/db/kraken2_bracken" + kraken2_bracken_tar_gz = "${params.test_data_base}/data/genomics/sarscov2/genome/db/kraken2_bracken.tar.gz" + + kaiju = "${params.test_data_base}/data/genomics/sarscov2/genome/db/kaiju" + kaiju_tar_gz = "${params.test_data_base}/data/genomics/sarscov2/genome/db/kaiju.tar.gz" + + kofamscan_profiles_tar_gz = "${params.test_data_base}/data/genomics/sarscov2/genome/db/kofamscan/profiles.tar.gz" + kofamscan_ko_list_gz = "${params.test_data_base}/data/genomics/sarscov2/genome/db/kofamscan/ko_list.gz" + + ncbi_taxmap_zip = "${params.test_data_base}/data/genomics/sarscov2/genome/db/maltextract/ncbi_taxmap.zip" + taxon_list_txt = "${params.test_data_base}/data/genomics/sarscov2/genome/db/maltextract/taxon_list.txt" + + mmseqs_tar_gz = "${params.test_data_base}/data/genomics/sarscov2/genome/db/mmseqs.tar.gz" + + all_sites_fas = "${params.test_data_base}/data/genomics/sarscov2/genome/alignment/all_sites.fas" + informative_sites_fas = 
"${params.test_data_base}/data/genomics/sarscov2/genome/alignment/informative_sites.fas" + + contigs_genome_maf_gz = "${params.test_data_base}/data/genomics/sarscov2/genome/alignment/last/contigs.genome.maf.gz" + contigs_genome_par = "${params.test_data_base}/data/genomics/sarscov2/genome/alignment/last/contigs.genome.par" + lastdb_tar_gz = "${params.test_data_base}/data/genomics/sarscov2/genome/alignment/last/lastdb.tar.gz" + + baits_interval_list = "${params.test_data_base}/data/genomics/sarscov2/genome/picard/baits.interval_list" + targets_interval_list = "${params.test_data_base}/data/genomics/sarscov2/genome/picard/targets.interval_list" + regions_txt = "${params.test_data_base}/data/genomics/sarscov2/genome/graphtyper/regions.txt" + } + 'illumina' { + test_single_end_bam = "${params.test_data_base}/data/genomics/sarscov2/illumina/bam/test.single_end.bam" + test_single_end_sorted_bam = "${params.test_data_base}/data/genomics/sarscov2/illumina/bam/test.single_end.sorted.bam" + test_single_end_sorted_bam_bai = "${params.test_data_base}/data/genomics/sarscov2/illumina/bam/test.single_end.sorted.bam.bai" + test_paired_end_bam = "${params.test_data_base}/data/genomics/sarscov2/illumina/bam/test.paired_end.bam" + test_paired_end_sorted_bam = "${params.test_data_base}/data/genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam" + test_paired_end_sorted_bam_bai = "${params.test_data_base}/data/genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam.bai" + test_paired_end_methylated_bam = "${params.test_data_base}/data/genomics/sarscov2/illumina/bam/test.paired_end.methylated.bam" + test_paired_end_methylated_sorted_bam = "${params.test_data_base}/data/genomics/sarscov2/illumina/bam/test.paired_end.methylated.sorted.bam" + test_paired_end_methylated_sorted_bam_bai = "${params.test_data_base}/data/genomics/sarscov2/illumina/bam/test.paired_end.methylated.sorted.bam.bai" + test_unaligned_bam = 
"${params.test_data_base}/data/genomics/sarscov2/illumina/bam/test.unaligned.bam" + + test_1_fastq_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/fastq/test_1.fastq.gz" + test_2_fastq_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/fastq/test_2.fastq.gz" + test_interleaved_fastq_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/fastq/test_interleaved.fastq.gz" + test_1_fastq_txt_zst = "${params.test_data_base}/data/genomics/sarscov2/illumina/fastq/test_1.fastq.txt.zst" + test2_1_fastq_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/fastq/test2_1.fastq.gz" + test2_2_fastq_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/fastq/test2_2.fastq.gz" + test_methylated_1_fastq_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/fastq/test.methylated_1.fastq.gz" + test_methylated_2_fastq_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/fastq/test.methylated_2.fastq.gz" + + test_bedgraph = "${params.test_data_base}/data/genomics/sarscov2/illumina/bedgraph/test.bedgraph" + + test_bigwig = "${params.test_data_base}/data/genomics/sarscov2/illumina/bigwig/test.bigwig" + + test_wig_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/wig/test.wig.gz" + + test_baserecalibrator_table = "${params.test_data_base}/data/genomics/sarscov2/illumina/gatk/test.baserecalibrator.table" + + test_computematrix_mat_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/deeptools/test.computeMatrix.mat.gz" + + test_bcf = "${params.test_data_base}/data/genomics/sarscov2/illumina/vcf/test.bcf" + + test_vcf = "${params.test_data_base}/data/genomics/sarscov2/illumina/vcf/test.vcf" + test_vcf_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/vcf/test.vcf.gz" + test_vcf_gz_tbi = "${params.test_data_base}/data/genomics/sarscov2/illumina/vcf/test.vcf.gz.tbi" + test2_vcf = "${params.test_data_base}/data/genomics/sarscov2/illumina/vcf/test2.vcf" + test2_vcf_gz = 
"${params.test_data_base}/data/genomics/sarscov2/illumina/vcf/test2.vcf.gz" + test2_vcf_gz_tbi = "${params.test_data_base}/data/genomics/sarscov2/illumina/vcf/test2.vcf.gz.tbi" + test2_vcf_targets_tsv_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/vcf/test2.targets.tsv.gz" + test3_vcf = "${params.test_data_base}/data/genomics/sarscov2/illumina/vcf/test3.vcf" + test3_vcf_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/vcf/test3.vcf.gz" + test3_vcf_gz_tbi = "${params.test_data_base}/data/genomics/sarscov2/illumina/vcf/test3.vcf.gz.tbi" + + contigs_fasta = "${params.test_data_base}/data/genomics/sarscov2/illumina/fasta/contigs.fasta" + scaffolds_fasta = "${params.test_data_base}/data/genomics/sarscov2/illumina/fasta/scaffolds.fasta" + + assembly_gfa = "${params.test_data_base}/data/genomics/sarscov2/illumina/gfa/assembly.gfa" + assembly_gfa_bgz = "${params.test_data_base}/data/genomics/sarscov2/illumina/gfa/assembly.gfa.bgz" + assembly_gfa_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/gfa/assembly.gfa.gz" + assembly_gfa_zst = "${params.test_data_base}/data/genomics/sarscov2/illumina/gfa/assembly.gfa.zst" + + test_single_end_bam_readlist_txt = "${params.test_data_base}/data/genomics/sarscov2/illumina/picard/test.single_end.bam.readlist.txt" + + SRR13255544_tar_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/sra/SRR13255544.tar.gz" + SRR11140744_tar_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/sra/SRR11140744.tar.gz" + } + 'nanopore' { + test_sorted_bam = "${params.test_data_base}/data/genomics/sarscov2/nanopore/bam/test.sorted.bam" + test_sorted_bam_bai = "${params.test_data_base}/data/genomics/sarscov2/nanopore/bam/test.sorted.bam.bai" + + fast5_tar_gz = "${params.test_data_base}/data/genomics/sarscov2/nanopore/fast5/fast5.tar.gz" + + test_fastq_gz = "${params.test_data_base}/data/genomics/sarscov2/nanopore/fastq/test.fastq.gz" + + test_sequencing_summary = 
"${params.test_data_base}/data/genomics/sarscov2/nanopore/sequencing_summary/test.sequencing_summary.txt" + } + 'metagenome' { + classified_reads_assignment = "${params.test_data_base}/data/genomics/sarscov2/metagenome/test_1.kraken2.reads.txt" + kraken_report = "${params.test_data_base}/data/genomics/sarscov2/metagenome/test_1.kraken2.report.txt" + krona_taxonomy = "${params.test_data_base}/data/genomics/sarscov2/metagenome/krona_taxonomy.tab" + seqid2taxid_map = "${params.test_data_base}/data/genomics/sarscov2/metagenome/seqid2taxid.map" + nodes_dmp = "${params.test_data_base}/data/genomics/sarscov2/metagenome/nodes.dmp" + names_dmp = "${params.test_data_base}/data/genomics/sarscov2/metagenome/names.dmp" + } + } + 'mus_musculus' { + 'genome' { + rnaseq_samplesheet = "${params.test_data_base}/data/genomics/mus_musculus/rnaseq_expression/SRP254919.samplesheet.csv" + rnaseq_genemeta = "${params.test_data_base}/data/genomics/mus_musculus/rnaseq_expression/SRP254919.gene_meta.tsv" + rnaseq_contrasts = "${params.test_data_base}/data/genomics/mus_musculus/rnaseq_expression/SRP254919.contrasts.csv" + rnaseq_matrix = "${params.test_data_base}/data/genomics/mus_musculus/rnaseq_expression/SRP254919.salmon.merged.gene_counts.top1000cov.tsv" + deseq_results = "${params.test_data_base}/data/genomics/mus_musculus/rnaseq_expression/SRP254919.salmon.merged.deseq2.results.tsv" + } + 'illumina' { + test_1_fastq_gz = "${params.test_data_base}/data/genomics/mus_musculus/mageck/ERR376998.small.fastq.gz" + test_2_fastq_gz = "${params.test_data_base}/data/genomics/mus_musculus/mageck/ERR376999.small.fastq.gz" + } + 'csv' { + count_table = "${params.test_data_base}/data/genomics/mus_musculus/mageck/count_table.csv" + library = "${params.test_data_base}/data/genomics/mus_musculus/mageck/yusa_library.csv" + } + 'txt' { + design_matrix = "${params.test_data_base}/data/genomics/mus_musculus/mageck/design_matrix.txt" + } + } + 'homo_sapiens' { + '10xgenomics' { + cellranger { + 
test_10x_10k_pbmc_5fb_fastq_1_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc/fastqs/5gex/5fb/subsampled_sc5p_v2_hs_PBMC_10k_5fb_S1_L001_R1_001.fastq.gz" + test_10x_10k_pbmc_5fb_fastq_2_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc/fastqs/5gex/5fb/subsampled_sc5p_v2_hs_PBMC_10k_5fb_S1_L001_R2_001.fastq.gz" + test_10x_10k_pbmc_5gex_fastq_1_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc/fastqs/5gex/5gex/subsampled_sc5p_v2_hs_PBMC_10k_5gex_S1_L001_R1_001.fastq.gz" + test_10x_10k_pbmc_5gex_fastq_2_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc/fastqs/5gex/5gex/subsampled_sc5p_v2_hs_PBMC_10k_5gex_S1_L001_R2_001.fastq.gz" + test_10x_10k_pbmc_b_fastq_1_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc/fastqs/bcell/subsampled_sc5p_v2_hs_PBMC_10k_b_S1_L001_R1_001.fastq.gz" + test_10x_10k_pbmc_b_fastq_2_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc/fastqs/bcell/subsampled_sc5p_v2_hs_PBMC_10k_b_S1_L001_R2_001.fastq.gz" + test_10x_10k_pbmc_t_fastq_1_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc/fastqs/tcell/subsampled_sc5p_v2_hs_PBMC_10k_t_S1_L001_R1_001.fastq.gz" + test_10x_10k_pbmc_t_fastq_2_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc/fastqs/tcell/subsampled_sc5p_v2_hs_PBMC_10k_t_S1_L001_R2_001.fastq.gz" + test_10x_10k_pbmc_feature_ref_csv = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc/sc5p_v2_hs_PBMC_10k_multi_5gex_5fb_b_t_feature_ref.csv" + + test_10x_10k_pbmc_cmo_cmo_fastq_1_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc_cmo/fastqs/cmo/subsampled_SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K_1_multiplexing_capture_S1_L001_R1_001.fastq.gz" + 
test_10x_10k_pbmc_cmo_cmo_fastq_2_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc_cmo/fastqs/cmo/subsampled_SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K_1_multiplexing_capture_S1_L001_R2_001.fastq.gz" + test_10x_10k_pbmc_cmo_gex1_fastq_1_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc_cmo/fastqs/gex_1/subsampled_SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K_1_gex_S2_L001_R1_001.fastq.gz" + test_10x_10k_pbmc_cmo_gex1_fastq_2_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc_cmo/fastqs/gex_1/subsampled_SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K_1_gex_S2_L001_R2_001.fastq.gz" + test_10x_10k_pbmc_cmo_gex2_fastq_1_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc_cmo/fastqs/gex_2/subsampled_SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K_2_gex_S1_L001_R1_001.fastq.gz" + test_10x_10k_pbmc_cmo_gex2_fastq_2_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc_cmo/fastqs/gex_2/subsampled_SC3_v3_NextGem_DI_CellPlex_Human_PBMC_10K_2_gex_S1_L001_R2_001.fastq.gz" + test_10x_10k_pbmc_cmo_feature_ref_csv = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/10k_pbmc_cmo/10k_pbmc_cmo_count_feature_reference.csv" + + test_10x_5k_cmvpos_tcells_ab_fastq_1_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/5k_cmvpos_tcells/fastqs/ab/subsampled_5k_human_antiCMV_T_TBNK_connect_AB_S2_L004_R1_001.fastq.gz" + test_10x_5k_cmvpos_tcells_ab_fastq_2_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/5k_cmvpos_tcells/fastqs/ab/subsampled_5k_human_antiCMV_T_TBNK_connect_AB_S2_L004_R2_001.fastq.gz" + test_10x_5k_cmvpos_tcells_gex1_fastq_1_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/5k_cmvpos_tcells/fastqs/gex_1/subsampled_5k_human_antiCMV_T_TBNK_connect_GEX_1_S1_L001_R1_001.fastq.gz" + 
test_10x_5k_cmvpos_tcells_gex1_fastq_2_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/5k_cmvpos_tcells/fastqs/gex_1/subsampled_5k_human_antiCMV_T_TBNK_connect_GEX_1_S1_L001_R2_001.fastq.gz" + test_10x_5k_cmvpos_tcells_vdj_fastq_1_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/5k_cmvpos_tcells/fastqs/vdj/subsampled_5k_human_antiCMV_T_TBNK_connect_VDJ_S1_L001_R1_001.fastq.gz" + test_10x_5k_cmvpos_tcells_vdj_fastq_2_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/5k_cmvpos_tcells/fastqs/vdj/subsampled_5k_human_antiCMV_T_TBNK_connect_VDJ_S1_L001_R2_001.fastq.gz" + test_10x_5k_cmvpos_tcells_feature_ref_csv = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/5k_cmvpos_tcells/5k_human_antiCMV_T_TBNK_connect_Multiplex_count_feature_reference.csv" + + test_10x_vdj_ref_json = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/references/vdj/refdata-cellranger-vdj-GRCh38-alts-ensembl-5.0.0/reference.json" + test_10x_vdj_ref_fasta = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/references/vdj/refdata-cellranger-vdj-GRCh38-alts-ensembl-5.0.0/fasta/regions.fa" + test_10x_vdj_ref_suppfasta = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger/references/vdj/refdata-cellranger-vdj-GRCh38-alts-ensembl-5.0.0/fasta/supp_regions.fa" + + test_scATAC_1_fastq_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger-atac/test_scATAC_S1_L001_R1_001.fastq.gz" + test_scATAC_2_fastq_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger-atac/test_scATAC_S1_L001_R2_001.fastq.gz" + test_scATAC_3_fastq_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger-atac/test_scATAC_S1_L001_R3_001.fastq.gz" + test_scATAC_I_fastq_gz = 
"${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/cellranger-atac/test_scATAC_S1_L001_I1_001.fastq.gz" + } + spaceranger { + test_10x_ffpe_cytassist_fastq_1_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/spaceranger/human-brain-cancer-11-mm-capture-area-ffpe-2-standard_v2_ffpe_cytassist/CytAssist_11mm_FFPE_Human_Glioblastoma_2_S1_L001_R1_001.fastq.gz" + test_10x_ffpe_cytassist_fastq_2_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/spaceranger/human-brain-cancer-11-mm-capture-area-ffpe-2-standard_v2_ffpe_cytassist/CytAssist_11mm_FFPE_Human_Glioblastoma_2_S1_L001_R2_001.fastq.gz" + test_10x_ffpe_cytassist_image = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/spaceranger/human-brain-cancer-11-mm-capture-area-ffpe-2-standard_v2_ffpe_cytassist/CytAssist_11mm_FFPE_Human_Glioblastoma_image.tif" + test_10x_ffpe_cytassist_probeset = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/spaceranger/human-brain-cancer-11-mm-capture-area-ffpe-2-standard_v2_ffpe_cytassist/CytAssist_11mm_FFPE_Human_Glioblastoma_probe_set.csv" + + test_10x_ffpe_v1_fastq_1_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/spaceranger/human-ovarian-cancer-1-standard_v1_ffpe/Visium_FFPE_Human_Ovarian_Cancer_S1_L001_R1_001.fastq.gz" + test_10x_ffpe_v1_fastq_2_gz = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/spaceranger/human-ovarian-cancer-1-standard_v1_ffpe/Visium_FFPE_Human_Ovarian_Cancer_S1_L001_R2_001.fastq.gz" + test_10x_ffpe_v1_image = "${params.test_data_base}/data/genomics/homo_sapiens/10xgenomics/spaceranger/human-ovarian-cancer-1-standard_v1_ffpe/Visium_FFPE_Human_Ovarian_Cancer_image.jpg" + } + } + 'genome' { + genome_elfasta = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.elfasta" + genome_fasta = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.fasta" + genome_fasta_fai = 
"${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.fasta.fai" + genome_fasta_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.fasta.gz" + genome_fasta_gz_fai = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.fasta.gz.fai" + genome_fasta_gz_gzi = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.fasta.gz.gzi" + genome_strtablefile = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome_strtablefile.zip" + genome_dict = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.dict" + genome_gff3 = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.gff3" + genome_gtf = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.gtf" + genome_interval_list = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.interval_list" + genome_multi_interval_bed = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.multi_intervals.bed" + genome_blacklist_interval_bed = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.blacklist_intervals.bed" + genome_sizes = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.sizes" + genome_bed = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.bed" + genome_header = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.header" + genome_bed_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.bed.gz" + genome_bed_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.bed.gz.tbi" + genome_elsites = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.elsites" + transcriptome_fasta = "${params.test_data_base}/data/genomics/homo_sapiens/genome/transcriptome.fasta" + genome2_fasta = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome2.fasta" + genome_chain_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.chain.gz" + genome_annotated_interval_tsv = 
"${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.annotated_intervals.tsv" + genome_mt_gb = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.NC_012920_1.gb" + genome_preprocessed_count_tsv = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.preprocessed_intervals.counts.tsv" + genome_preprocessed_interval_list = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.preprocessed_intervals.interval_list" + genome_ploidy_model = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.ploidy_model.tar.gz" + genome_ploidy_calls = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.ploidy_calls.tar.gz" + genome_germline_cnv_model = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.germline_cnv_model.tar.gz" + genome_germline_cnv_calls = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome.germline_cnv_calls.tar.gz" + genome_motifs = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome_motifs.txt" + genome_config = "${params.test_data_base}/data/genomics/homo_sapiens/genome/genome_config.json" + + genome_1_fasta = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr1/genome.fasta.gz" + genome_1_gtf = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr1/genome.gtf" + + genome_21_sdf = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/sequence/genome_sdf.tar.gz" + genome_21_fasta = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/sequence/genome.fasta" + genome_21_fasta_fai = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/sequence/genome.fasta.fai" + genome_21_gencode_gtf = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/sequence/chr21_gencode.gtf" + genome_21_dict = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/sequence/genome.dict" + genome_21_sizes = 
"${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/sequence/genome.sizes" + genome_21_interval_list = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/sequence/genome.interval_list" + genome_21_annotated_bed = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/sequence/annotated.bed" + genome_21_multi_interval_bed = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/sequence/multi_intervals.bed" + genome_21_multi_interval_antitarget_bed = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/sequence/multi_intervals.antitarget.bed" + genome_21_multi_interval_bed_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/sequence/multi_intervals.bed.gz" + genome_21_multi_interval_bed_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/sequence/multi_intervals.bed.gz.tbi" + genome_21_chromosomes_dir = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/sequence/chromosomes.tar.gz" + genome_21_reference_cnn = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/sequence/reference_chr21.cnn" + genome_21_eigenstrat_snp = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/sequence/chr_21.snp" + genome_21_stitch_posfile = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/sequence/dbsnp_138.hg38.first_10_biallelic_sites.tsv" + + dbsnp_146_hg38_elsites = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/dbsnp_146.hg38.elsites" + dbsnp_146_hg38_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/dbsnp_146.hg38.vcf.gz" + dbsnp_146_hg38_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/dbsnp_146.hg38.vcf.gz.tbi" + gnomad_r2_1_1_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/gnomAD.r2.1.1.vcf.gz" + gnomad_r2_1_1_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/gnomAD.r2.1.1.vcf.gz.tbi" + 
mills_and_1000g_indels_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/mills_and_1000G.indels.vcf.gz" + mills_and_1000g_indels_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/mills_and_1000G.indels.vcf.gz.tbi" + syntheticvcf_short_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/syntheticvcf_short.vcf.gz" + syntheticvcf_short_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/syntheticvcf_short.vcf.gz.tbi" + syntheticvcf_short_score = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/syntheticvcf_short.score" + gnomad_r2_1_1_sv_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/gnomAD.r2.1.1-sv.vcf.gz" + gnomad2_r2_1_1_sv_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/gnomAD2.r2.1.1-sv.vcf.gz" + + hapmap_3_3_hg38_21_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/germlineresources/hapmap_3.3.hg38.vcf.gz" + hapmap_3_3_hg38_21_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/germlineresources/hapmap_3.3.hg38.vcf.gz.tbi" + res_1000g_omni2_5_hg38_21_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/germlineresources/1000G_omni2.5.hg38.vcf.gz" + res_1000g_omni2_5_hg38_21_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/germlineresources/1000G_omni2.5.hg38.vcf.gz.tbi" + res_1000g_phase1_snps_hg38_21_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/germlineresources/1000G_phase1.snps.hg38.vcf.gz" + res_1000g_phase1_snps_hg38_21_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/germlineresources/1000G_phase1.snps.hg38.vcf.gz.tbi" + dbsnp_138_hg38_21_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/germlineresources/dbsnp_138.hg38.vcf.gz" + dbsnp_138_hg38_21_vcf_gz_tbi = 
"${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/germlineresources/dbsnp_138.hg38.vcf.gz.tbi" + gnomad_r2_1_1_21_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/germlineresources/gnomAD.r2.1.1.vcf.gz" + gnomad_r2_1_1_21_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/germlineresources/gnomAD.r2.1.1.vcf.gz.tbi" + mills_and_1000g_indels_21_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/germlineresources/mills_and_1000G.indels.hg38.vcf.gz" + mills_and_1000g_indels_21_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/germlineresources/mills_and_1000G.indels.hg38.vcf.gz.tbi" + haplotype_map = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/germlineresources/haplotype_map.txt" + dbNSFP_4_1a_21_hg38_txt_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/germlineresources/dbNSFP4.1a.21.txt.gz" + dbNSFP_4_1a_21_hg38_txt_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/genome/chr21/germlineresources/dbNSFP4.1a.21.txt.gz.tbi" + + index_salmon = "${params.test_data_base}/data/genomics/homo_sapiens/genome/index/salmon" + repeat_expansions = "${params.test_data_base}/data/genomics/homo_sapiens/genome/loci/repeat_expansions.json" + justhusky_ped = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/ped/justhusky.ped" + justhusky_minimal_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/ped/justhusky_minimal.vcf.gz" + justhusky_minimal_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/ped/justhusky_minimal.vcf.gz.tbi" + + vcfanno_tar_gz = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/vcfanno/vcfanno_grch38_module_test.tar.gz" + vcfanno_toml = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vcf/vcfanno/vcfanno.toml" + updsites_bed = "${params.test_data_base}/data/genomics/homo_sapiens/genome/updsites.bed" + + 
prg_input = "${params.test_data_base}/data/genomics/homo_sapiens/genome/PRG_test.zip" + crispr_functional_counts = "${params.test_data_base}/data/genomics/homo_sapiens/genome/tsv/functional_genomics_counts.tsv" + crispr_functional_library = "${params.test_data_base}/data/genomics/homo_sapiens/genome/tsv/library_functional_genomics.tsv" + + vep_cache = "${params.test_data_base}/data/genomics/homo_sapiens/genome/vep.tar.gz" + affy_array_samplesheet = "${params.test_data_base}/data/genomics/homo_sapiens/array_expression/GSE38751.csv" + affy_array_celfiles_tar = "${params.test_data_base}/data/genomics/homo_sapiens/array_expression/GSE38751_RAW.tar" + + } + 'pangenome' { + pangenome_fa = "${params.test_data_base}/data/pangenomics/homo_sapiens/pangenome.fa" + pangenome_fa_bgzip = "${params.test_data_base}/data/pangenomics/homo_sapiens/pangenome.fa.gz" + pangenome_fa_bgzip_fai = "${params.test_data_base}/data/pangenomics/homo_sapiens/pangenome.fa.gz.fai" + pangenome_fa_bgzip_gzi = "${params.test_data_base}/data/pangenomics/homo_sapiens/pangenome.fa.gz.gzi" + pangenome_paf = "${params.test_data_base}/data/pangenomics/homo_sapiens/pangenome.paf" + pangenome_paf_gz = "${params.test_data_base}/data/pangenomics/homo_sapiens/pangenome.paf.gz" + pangenome_seqwish_gfa = "${params.test_data_base}/data/pangenomics/homo_sapiens/pangenome.seqwish.gfa" + pangenome_smoothxg_gfa = "${params.test_data_base}/data/pangenomics/homo_sapiens/pangenome.smoothxg.gfa" + pangenome_gfaffix_gfa = "${params.test_data_base}/data/pangenomics/homo_sapiens/pangenome.gfaffix.gfa" + 'odgi' { + pangenome_og = "${params.test_data_base}/data/pangenomics/homo_sapiens/odgi/pangenome.og" + pangenome_lay = "${params.test_data_base}/data/pangenomics/homo_sapiens/odgi/pangenome.lay" + } + } + 'illumina' { + test_paired_end_sorted_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test.paired_end.sorted.bam" + test_paired_end_sorted_bam_bai = 
"${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test.paired_end.sorted.bam.bai" + test_paired_end_name_sorted_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test.paired_end.name.sorted.bam" + test_paired_end_markduplicates_sorted_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test.paired_end.markduplicates.sorted.bam" + test_paired_end_markduplicates_sorted_bam_bai = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test.paired_end.markduplicates.sorted.bam.bai" + test_paired_end_markduplicates_sorted_referencesn_txt = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test.paired_end.markduplicates.sorted.referencesn.txt" + test_paired_end_recalibrated_sorted_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test.paired_end.recalibrated.sorted.bam" + test_paired_end_recalibrated_sorted_bam_bai = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test.paired_end.recalibrated.sorted.bam.bai" + test_paired_end_umi_consensus_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test.paired_end.umi_consensus.bam" + test_paired_end_umi_converted_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test.paired_end.umi_converted.bam" + test_paired_end_umi_grouped_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test.paired_end.umi_grouped.bam" + test_paired_end_umi_histogram_txt = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test.paired_end.umi_histogram.txt" + test_paired_end_umi_unsorted_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test.paired_end.umi_unsorted.bam" + test_paired_end_umi_unsorted_tagged_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test.paired_end.unsorted_tagged.bam" + test_paired_end_hla = 
"${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/example_hla_pe.bam" + test_paired_end_hla_sorted_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/example_hla_pe.sorted.bam" + test_paired_end_hla_sorted_bam_bai = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/example_hla_pe.sorted.bam.bai" + test_rna_paired_end_sorted_chr6_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test.rna.paired_end.sorted.chr6.bam" + test_rna_paired_end_sorted_chr6_bam_bai = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test.rna.paired_end.sorted.chr6.bam.bai" + + test2_paired_end_sorted_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test2.paired_end.sorted.bam" + test2_paired_end_sorted_bam_bai = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test2.paired_end.sorted.bam.bai" + test2_paired_end_name_sorted_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test2.paired_end.name.sorted.bam" + test2_paired_end_markduplicates_sorted_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test2.paired_end.markduplicates.sorted.bam" + test2_paired_end_markduplicates_sorted_bam_bai = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test2.paired_end.markduplicates.sorted.bam.bai" + test2_paired_end_recalibrated_sorted_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test2.paired_end.recalibrated.sorted.bam" + test2_paired_end_recalibrated_sorted_bam_bai = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test2.paired_end.recalibrated.sorted.bam.bai" + test2_paired_end_umi_consensus_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test2.paired_end.umi_consensus.bam" + test2_paired_end_umi_converted_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test2.paired_end.umi_converted.bam" + 
test2_paired_end_umi_grouped_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test2.paired_end.umi_grouped.bam" + test2_paired_end_umi_histogram_txt = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test2.paired_end.umi_histogram.txt" + test2_paired_end_umi_unsorted_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test2.paired_end.umi_unsorted.bam" + test2_paired_end_umi_unsorted_tagged_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test2.paired_end.unsorted_tagged.bam" + test_paired_end_duplex_umi_unmapped_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test.paired_end.duplex_umi_unmapped.bam" + test_paired_end_duplex_umi_mapped_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test.paired_end.duplex_umi_mapped.bam" + test_paired_end_duplex_umi_mapped_tagged_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test.paired_end.duplex_umi_mapped_tagged.bam" + test_paired_end_duplex_umi_grouped_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test.paired_end.duplex_umi_grouped.bam" + test_paired_end_duplex_umi_duplex_consensus_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/umi/test.paired_end.duplex_umi_duplex_consensus.bam" + + mitochon_standin_recalibrated_sorted_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/mitochon_standin.recalibrated.sorted.bam" + mitochon_standin_recalibrated_sorted_bam_bai = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/mitochon_standin.recalibrated.sorted.bam.bai" + test_illumina_mt_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test_illumina_mt.bam" + test_illumina_mt_bam_bai = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test_illumina_mt.bam.bai" + + test3_single_end_markduplicates_sorted_bam = 
"${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/test3.single_end.markduplicates.sorted.bam" + + read_group_settings_txt = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bam/read_group_settings.txt" + + test_paired_end_sorted_cram = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/cram/test.paired_end.sorted.cram" + test_paired_end_sorted_cram_crai = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/cram/test.paired_end.sorted.cram.crai" + test_paired_end_markduplicates_sorted_cram = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/cram/test.paired_end.markduplicates.sorted.cram" + test_paired_end_markduplicates_sorted_cram_crai = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/cram/test.paired_end.markduplicates.sorted.cram.crai" + test_paired_end_recalibrated_sorted_cram = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/cram/test.paired_end.recalibrated.sorted.cram" + test_paired_end_recalibrated_sorted_cram_crai = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/cram/test.paired_end.recalibrated.sorted.cram.crai" + + test2_paired_end_sorted_cram = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/cram/test2.paired_end.sorted.cram" + test2_paired_end_sorted_cram_crai = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/cram/test2.paired_end.sorted.cram.crai" + test2_paired_end_markduplicates_sorted_cram = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/cram/test2.paired_end.markduplicates.sorted.cram" + test2_paired_end_markduplicates_sorted_cram_crai = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/cram/test2.paired_end.markduplicates.sorted.cram.crai" + test2_paired_end_recalibrated_sorted_cram = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/cram/test2.paired_end.recalibrated.sorted.cram" + test2_paired_end_recalibrated_sorted_cram_crai = 
"${params.test_data_base}/data/genomics/homo_sapiens/illumina/cram/test2.paired_end.recalibrated.sorted.cram.crai" + test3_paired_end_recalibrated_sorted_cram = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/cram/test3.paired_end.recalibrated.sorted.cram" + test3_paired_end_recalibrated_sorted_cram_crai = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/cram/test3.paired_end.recalibrated.sorted.cram.crai" + + test_1_fastq_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/fastq/test_1.fastq.gz" + test_2_fastq_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/fastq/test_2.fastq.gz" + test_umi_1_fastq_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/fastq/test.umi_1.fastq.gz" + test_umi_2_fastq_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/fastq/test.umi_2.fastq.gz" + test2_1_fastq_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/fastq/test2_1.fastq.gz" + test2_2_fastq_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/fastq/test2_2.fastq.gz" + test2_umi_1_fastq_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/fastq/test2.umi_1.fastq.gz" + test2_umi_2_fastq_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/fastq/test2.umi_2.fastq.gz" + test_rnaseq_1_fastq_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/fastq/test_rnaseq_1.fastq.gz" + test_rnaseq_2_fastq_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/fastq/test_rnaseq_2.fastq.gz" + test_paired_end_duplex_umi_1_fastq_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/fastq/test_duplex_umi_1.fastq.gz" + test_paired_end_duplex_umi_2_fastq_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/fastq/test_duplex_umi_2.fastq.gz" + + test_baserecalibrator_table = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/test.baserecalibrator.table" + test2_baserecalibrator_table = 
"${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/test2.baserecalibrator.table" + test_pileups_table = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/test.pileups.table" + test2_pileups_table = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/test2.pileups.table" + + test_paired_end_sorted_dragstrmodel = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/test_paired_end_sorted_dragstrmodel.txt" + + test_genomicsdb_tar_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/test_genomicsdb.tar.gz" + test_pon_genomicsdb_tar_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/test_pon_genomicsdb.tar.gz" + + test2_haplotc_ann_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/haplotypecaller_calls/test2_haplotc.ann.vcf.gz" + test2_haplotc_ann_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/haplotypecaller_calls/test2_haplotc.ann.vcf.gz.tbi" + test_haplotc_cnn_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/haplotypecaller_calls/test_haplotcaller.cnn.vcf.gz" + test_haplotc_cnn_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/haplotypecaller_calls/test_haplotcaller.cnn.vcf.gz.tbi" + + test2_haplotc_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/haplotypecaller_calls/test2_haplotc.vcf.gz" + test2_haplotc_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/haplotypecaller_calls/test2_haplotc.vcf.gz.tbi" + + test2_recal = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/variantrecalibrator/test2.recal" + test2_recal_idx = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/variantrecalibrator/test2.recal.idx" + test2_tranches = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/variantrecalibrator/test2.tranches" + test2_allele_specific_recal = 
"${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/variantrecalibrator/test2_allele_specific.recal" + test2_allele_specific_recal_idx = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/variantrecalibrator/test2_allele_specific.recal.idx" + test2_allele_specific_tranches = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/variantrecalibrator/test2_allele_specific.tranches" + + test_test2_paired_mutect2_calls_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/paired_mutect2_calls/test_test2_paired_mutect2_calls.vcf.gz" + test_test2_paired_mutect2_calls_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/paired_mutect2_calls/test_test2_paired_mutect2_calls.vcf.gz.tbi" + test_test2_paired_mutect2_calls_vcf_gz_stats = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/paired_mutect2_calls/test_test2_paired_mutect2_calls.vcf.gz.stats" + test_test2_paired_mutect2_calls_f1r2_tar_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/paired_mutect2_calls/test_test2_paired_mutect2_calls.f1r2.tar.gz" + test_test2_paired_mutect2_calls_artifact_prior_tar_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/test_test2_paired_mutect2_calls.artifact-prior.tar.gz" + test_test2_paired_segmentation_table = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/test_test2_paired.segmentation.table" + test_test2_paired_contamination_table = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/test_test2_paired.contamination.table" + + test_genome_vcf = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gvcf/test.genome.vcf" + test_genome_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gvcf/test.genome.vcf.gz" + test_genome_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gvcf/test.genome.vcf.gz.tbi" + test_genome_vcf_idx = 
"${params.test_data_base}/data/genomics/homo_sapiens/illumina/gvcf/test.genome.vcf.idx" + + test_genome_vcf_ud = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/svd/test.genome.vcf.UD" + test_genome_vcf_mu = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/svd/test.genome.vcf.mu" + test_genome_vcf_bed = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/svd/test.genome.vcf.bed" + + test2_genome_vcf = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gvcf/test2.genome.vcf" + test2_genome_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gvcf/test2.genome.vcf.gz" + test2_genome_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gvcf/test2.genome.vcf.gz.tbi" + test2_genome_vcf_idx = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gvcf/test2.genome.vcf.idx" + + test_genome21_indels_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/vcf/test.genome_21.somatic_sv.vcf.gz" + test_genome21_indels_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/vcf/test.genome_21.somatic_sv.vcf.gz.tbi" + + test_mpileup = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/mpileup/test.mpileup.gz" + test2_mpileup = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/mpileup/test2.mpileup.gz" + + test_broadpeak = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/broadpeak/test.broadPeak" + test2_broadpeak = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/broadpeak/test2.broadPeak" + + test_narrowpeak = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/narrowpeak/test.narrowPeak" + test2_narrowpeak = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/narrowpeak/test2.narrowPeak" + + test_yak = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/yak/test.yak" + test2_yak = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/yak/test2.yak" + + 
cutandrun_bedgraph_test_1 = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bedgraph/cutandtag_h3k27me3_test_1.bedGraph" + cutandrun_bedgraph_test_2 = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bedgraph/cutandtag_igg_test_1.bedGraph" + + empty_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/vcf/empty.vcf.gz" + empty_vcf_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/vcf/empty.vcf.gz.tbi" + + simulated_sv = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/vcf/chr21/simulated_sv.vcf.gz" + simulated_sv_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/vcf/chr21/simulated_sv.vcf.gz.tbi" + simulated_sv2 = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/vcf/chr21/simulated_sv2.vcf.gz" + simulated_sv2_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/vcf/chr21/simulated_sv2.vcf.gz.tbi" + + test_rnaseq_vcf = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/vcf/test.rnaseq.vcf" + test_sv_vcf = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/vcf/sv_query.vcf.gz" + test_sv_vcf_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/vcf/sv_query.vcf.gz.tbi" + genmod_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/vcf/genmod.vcf.gz" + genmod_annotate_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/vcf/test_annotate.vcf.gz" + genmod_models_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/vcf/test_models.vcf.gz" + genmod_score_vcf_gz = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/vcf/test_score.vcf.gz" + + test_mito_vcf = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/vcf/NA12878_chrM.vcf.gz" + + test_pytor = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/pytor/test.pytor" + rank_model = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/genmod/svrank_model_-v1.8-.ini" + + 
test_flowcell = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bcl/flowcell.tar.gz" + test_flowcell_samplesheet = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/bcl/flowcell_samplesheet.csv" + + varlociraptor_scenario = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/varlociraptor/scenario.yml" + + contig_ploidy_priors_table = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/gatk/contig_ploidy_priors_table.tsv" + + purecn_ex1_bam = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/purecn/purecn_ex1.bam" + purecn_ex1_bai = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/purecn/purecn_ex1.bam.bai" + purecn_ex1_interval = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/purecn/purecn_ex1_intervals.txt" + purecn_ex1_normal = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/purecn/purecn_ex1_normal.txt.gz" + purecn_ex2_normal = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/purecn/purecn_ex2_normal.txt.gz" + purecn_normalpanel_vcf = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/purecn/purecn_normalpanel.vcf.gz" + purecn_normalpanel_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/illumina/purecn/purecn_normalpanel.vcf.gz.tbi" + } + 'pacbio' { + primers = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/fasta/primers.fasta" + alz = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/bam/alz.bam" + alzpbi = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/bam/alz.bam.pbi" + ccs = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/bam/alz.ccs.bam" + ccs_fa = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/fasta/alz.ccs.fasta" + ccs_fa_gz = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/fasta/alz.ccs.fasta.gz" + ccs_fq = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/fastq/alz.ccs.fastq" + ccs_fq_gz = 
"${params.test_data_base}/data/genomics/homo_sapiens/pacbio/fastq/alz.ccs.fastq.gz" + ccs_xml = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/xml/alz.ccs.consensusreadset.xml" + hifi = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/fastq/test_hifi.fastq.gz" + lima = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/bam/alz.ccs.fl.NEB_5p--NEB_Clontech_3p.bam" + refine = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/bam/alz.ccs.fl.NEB_5p--NEB_Clontech_3p.flnc.bam" + cluster = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/bam/alz.ccs.fl.NEB_5p--NEB_Clontech_3p.flnc.clustered.bam" + singletons = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/bam/alz.ccs.fl.NEB_5p--NEB_Clontech_3p.flnc.clustered.singletons.bam" + aligned = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/bam/alz.ccs.fl.NEB_5p--NEB_Clontech_3p.flnc.clustered.singletons.merged.aligned.bam" + alignedbai = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/bam/alz.ccs.fl.NEB_5p--NEB_Clontech_3p.flnc.clustered.singletons.merged.aligned.bam.bai" + genemodel1 = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/bed/alz.ccs.fl.NEB_5p--NEB_Clontech_3p.flnc.clustered.singletons.merged.aligned_tc.bed" + genemodel2 = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/bed/alz.ccs.fl.NEB_5p--NEB_Clontech_3p.flnc.clustered.singletons.merged.aligned_tc.2.bed" + filelist = "${params.test_data_base}/data/genomics/homo_sapiens/pacbio/txt/filelist.txt" + } + 'scramble' { + fasta = "${params.test_data_base}/data/genomics/homo_sapiens/scramble/test.fa" + fasta_fai = "${params.test_data_base}/data/genomics/homo_sapiens/scramble/test.fa.fai" + bam = "${params.test_data_base}/data/genomics/homo_sapiens/scramble/test.bam" + bam_bai = "${params.test_data_base}/data/genomics/homo_sapiens/scramble/test.bam.bai" + cram = "${params.test_data_base}/data/genomics/homo_sapiens/scramble/test.cram" + cram_crai = 
"${params.test_data_base}/data/genomics/homo_sapiens/scramble/test.cram.crai" + bed = "${params.test_data_base}/data/genomics/homo_sapiens/scramble/test.bed" + } + 'gene_set_analysis' { + gct = "${params.test_data_base}/data/genomics/homo_sapiens/gene_set_analysis/P53_6samples_collapsed_symbols.gct" + cls = "${params.test_data_base}/data/genomics/homo_sapiens/gene_set_analysis/P53_6samples.cls" + gmx = "${params.test_data_base}/data/genomics/homo_sapiens/gene_set_analysis/c1.symbols.reduced.gmx" + } + 'cnvkit' { + amplicon_cnr = "https://raw.githubusercontent.com/etal/cnvkit/v0.9.9/test/formats/amplicon.cnr" + amplicon_cns = "https://raw.githubusercontent.com/etal/cnvkit/v0.9.9/test/formats/amplicon.cns" + } + } + 'bacteroides_fragilis' { + 'genome' { + genome_fna_gz = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/genome/genome.fna.gz" + genome_gbff_gz = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/genome/genome.gbff.gz" + genome_paf = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/genome/genome.paf" + genome_gff_gz = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/genome/genome.gff.gz" + + } + 'hamronization' { + genome_abricate_tsv = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/hamronization/genome.abricate.tsv" + genome_mapping_potential_arg = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/hamronization/genome.mapping.potential.ARG" + } + 'illumina' { + test1_contigs_fa_gz = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/illumina/fasta/test1.contigs.fa.gz" + test1_1_fastq_gz = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/illumina/fastq/test1_1.fastq.gz" + test1_2_fastq_gz = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/illumina/fastq/test1_2.fastq.gz" + test2_1_fastq_gz = 
"${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/illumina/fastq/test2_1.fastq.gz" + test2_2_fastq_gz = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/illumina/fastq/test2_2.fastq.gz" + test1_paired_end_bam = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/illumina/bam/test1.bam" + test1_paired_end_sorted_bam = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/illumina/bam/test1.sorted.bam" + test1_paired_end_sorted_bam_bai = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/illumina/bam/test1.sorted.bam.bai" + test2_paired_end_bam = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/illumina/bam/test2.bam" + test2_paired_end_sorted_bam = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/illumina/bam/test2.sorted.bam" + test2_paired_end_sorted_bam_bai = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/illumina/bam/test2.sorted.bam.bai" + } + 'nanopore' { + test_fastq_gz = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/nanopore/fastq/test.fastq.gz" + overlap_paf = "${params.test_data_base}/data/genomics/prokaryotes/bacteroides_fragilis/nanopore/overlap.paf" + } + } + 'candidatus_portiera_aleyrodidarum' { + 'genome' { + genome_fasta = "${params.test_data_base}/data/genomics/prokaryotes/candidatus_portiera_aleyrodidarum/genome/genome.fasta" + genome_sizes = "${params.test_data_base}/data/genomics/prokaryotes/candidatus_portiera_aleyrodidarum/genome/genome.sizes" + genome_aln_gz = "${params.test_data_base}/data/genomics/prokaryotes/candidatus_portiera_aleyrodidarum/genome/genome.aln.gz" + genome_aln_nwk = "${params.test_data_base}/data/genomics/prokaryotes/candidatus_portiera_aleyrodidarum/genome/genome.aln.nwk" + proteome_fasta = "${params.test_data_base}/data/genomics/prokaryotes/candidatus_portiera_aleyrodidarum/genome/proteome.fasta" + test1_gff = 
"${params.test_data_base}/data/genomics/prokaryotes/candidatus_portiera_aleyrodidarum/genome/gff/test1.gff" + test2_gff = "${params.test_data_base}/data/genomics/prokaryotes/candidatus_portiera_aleyrodidarum/genome/gff/test2.gff" + test3_gff = "${params.test_data_base}/data/genomics/prokaryotes/candidatus_portiera_aleyrodidarum/genome/gff/test3.gff" + } + 'illumina' { + test_1_fastq_gz = "${params.test_data_base}/data/genomics/prokaryotes/candidatus_portiera_aleyrodidarum/illumina/fastq/test_1.fastq.gz" + test_2_fastq_gz = "${params.test_data_base}/data/genomics/prokaryotes/candidatus_portiera_aleyrodidarum/illumina/fastq/test_2.fastq.gz" + test_se_fastq_gz = "${params.test_data_base}/data/genomics/prokaryotes/candidatus_portiera_aleyrodidarum/illumina/fastq/test_se.fastq.gz" + } + 'nanopore' { + test_fastq_gz = "${params.test_data_base}/data/genomics/prokaryotes/candidatus_portiera_aleyrodidarum/nanopore/fastq/test.fastq.gz" + } + } + 'haemophilus_influenzae' { + 'genome' { + genome_fna_gz = "${params.test_data_base}/data/genomics/prokaryotes/haemophilus_influenzae/genome/genome.fna.gz" + genome_aln_gz = "${params.test_data_base}/data/genomics/prokaryotes/haemophilus_influenzae/genome/genome.aln.gz" + genome_aln_nwk = "${params.test_data_base}/data/genomics/prokaryotes/haemophilus_influenzae/genome/genome.aln.nwk" + } + } + 'generic' { + 'csv' { + test_csv = "${params.test_data_base}/data/generic/csv/test.csv" + } + 'notebooks' { + rmarkdown = "${params.test_data_base}/data/generic/notebooks/rmarkdown/rmarkdown_notebook.Rmd" + ipython_md = "${params.test_data_base}/data/generic/notebooks/jupyter/ipython_notebook.md" + ipython_ipynb = "${params.test_data_base}/data/generic/notebooks/jupyter/ipython_notebook.ipynb" + } + 'tar' { + tar_gz = "${params.test_data_base}/data/generic/tar/hello.tar.gz" + } + 'tsv' { + test_tsv = "${params.test_data_base}/data/generic/tsv/test.tsv" + } + 'txt' { + hello = "${params.test_data_base}/data/generic/txt/hello.txt" + } + 'cooler'{ 
+ test_pairix_pair_gz = "${params.test_data_base}/data/genomics/homo_sapiens/cooler/cload/hg19/hg19.GM12878-MboI.pairs.subsample.blksrt.txt.gz" + test_pairix_pair_gz_px2 = "${params.test_data_base}/data/genomics/homo_sapiens/cooler/cload/hg19/hg19.GM12878-MboI.pairs.subsample.blksrt.txt.gz.px2" + test_pairs_pair = "${params.test_data_base}/data/genomics/homo_sapiens/cooler/cload/hg19/hg19.sample1.pairs" + test_tabix_pair_gz = "${params.test_data_base}/data/genomics/homo_sapiens/cooler/cload/hg19/hg19.GM12878-MboI.pairs.subsample.sorted.possrt.txt.gz" + test_tabix_pair_gz_tbi = "${params.test_data_base}/data/genomics/homo_sapiens/cooler/cload/hg19/hg19.GM12878-MboI.pairs.subsample.sorted.possrt.txt.gz.tbi" + hg19_chrom_sizes = "${params.test_data_base}/data/genomics/homo_sapiens/cooler/cload/hg19/hg19.chrom.sizes" + test_merge_cool = "${params.test_data_base}/data/genomics/homo_sapiens/cooler/merge/toy/toy.symm.upper.2.cool" + test_merge_cool_cp2 = "${params.test_data_base}/data/genomics/homo_sapiens/cooler/merge/toy/toy.symm.upper.2.cp2.cool" + + } + 'pairtools' { + mock_4dedup_pairsam = "${params.test_data_base}/data/genomics/homo_sapiens/pairtools/mock.4dedup.pairsam" + mock_4flip_pairs = "${params.test_data_base}/data/genomics/homo_sapiens/pairtools/mock.4flip.pairs" + mock_chrom_sizes = "${params.test_data_base}/data/genomics/homo_sapiens/pairtools/mock.chrom.sizes" + mock_pairsam = "${params.test_data_base}/data/genomics/homo_sapiens/pairtools/mock.pairsam" + mock_sam = "${params.test_data_base}/data/genomics/homo_sapiens/pairtools/mock.sam" + frag_bed = "${params.test_data_base}/data/genomics/homo_sapiens/pairtools/frag.bed" + } + 'config' { + ncbi_user_settings = "${params.test_data_base}/data/generic/config/ncbi_user_settings.mkfg" + } + 'unsorted_data' { + 'unsorted_text' { + genome_file = "${params.test_data_base}/data/generic/unsorted_data/unsorted_text/test.genome" + intervals = 
"${params.test_data_base}/data/generic/unsorted_data/unsorted_text/test.bed" + numbers_csv = "${params.test_data_base}/data/generic/unsorted_data/unsorted_text/test.csv" + } + } + } + 'proteomics' { + 'msspectra' { + ups_file1 = "${params.test_data_base}/data/proteomics/msspectra/OVEMB150205_12.raw" + ups_file2 = "${params.test_data_base}/data/proteomics/msspectra/OVEMB150205_14.raw" + } + 'database' { + yeast_ups = "${params.test_data_base}/data/proteomics/database/yeast_UPS.fasta" + } + 'maxquant' { + mq_contrasts = "${params.test_data_base}/data/proteomics/maxquant/MaxQuant_contrasts.csv" + mq_proteingroups = "${params.test_data_base}/data/proteomics/maxquant/MaxQuant_proteinGroups.txt" + mq_samplesheet = "${params.test_data_base}/data/proteomics/maxquant/MaxQuant_samplesheet.tsv" + mq_proteus_mat = "${params.test_data_base}/data/proteomics/maxquant/proteus.raw_MaxQuant_proteingroups_tab.tsv" + } + 'parameter' { + maxquant = "${params.test_data_base}/data/proteomics/parameter/mqpar.xml" + } + 'idfile' { + openms_idxml = "${params.test_data_base}/data/proteomics/openms_idxml/BSA_QC_file.idXML" + } + } + 'galaxea_fascicularis' { + hic { + pretext = "${params.test_data_base}/data/genomics/eukaryotes/galaxea_fascicularis/hic/jaGalFasc40_2.pretext" + } + } + 'deilephila_porcellus' { + 'mito' { + ref_fa = "${params.test_data_base}/data/genomics/eukaryotes/deilephila_porcellus/mito/MW539688.1.fasta" + ref_gb = "${params.test_data_base}/data/genomics/eukaryotes/deilephila_porcellus/mito/MW539688.1.gb" + hifi_reads = "${params.test_data_base}/data/genomics/eukaryotes/deilephila_porcellus/mito/ilDeiPorc1.HiFi.reads.fa" + contigs = "${params.test_data_base}/data/genomics/eukaryotes/deilephila_porcellus/mito/ilDeiPorc1.contigs.fa" + } + } + 'imaging' { + 'h5' { + plant_wga = "${params.test_data_base}/data/imaging/h5/plant_wga.h5" + plant_wga_prob = "${params.test_data_base}/data/imaging/h5/plant_wga_probabilities.h5" + } + 'ilp' { + plant_wga_multicut = 
"${params.test_data_base}/data/imaging/ilp/plant_wga.multicut.ilp" + plant_wga_pixel_class = "${params.test_data_base}/data/imaging/ilp/plant_wga.pixel_prob.ilp" + } + 'tiff' { + mouse_heart_wga = "${params.test_data_base}/data/imaging/tiff/mindagap.mouse_heart.wga.tiff" + } + 'ome-tiff' { + cycif_tonsil_channels = "${params.test_data_base}/data/imaging/ome-tiff/cycif-tonsil-channels.csv" + cycif_tonsil_cycle1 = "${params.test_data_base}/data/imaging/ome-tiff/cycif-tonsil-cycle1.ome.tif" + cycif_tonsil_cycle2 = "${params.test_data_base}/data/imaging/ome-tiff/cycif-tonsil-cycle2.ome.tif" + cycif_tonsil_cycle3 = "${params.test_data_base}/data/imaging/ome-tiff/cycif-tonsil-cycle3.ome.tif" + cycif_tonsil_dfp = "${params.test_data_base}/data/imaging/ome-tiff/cycif-tonsil-dfp.ome.tif" + cycif_tonsil_ffp = "${params.test_data_base}/data/imaging/ome-tiff/cycif-tonsil-ffp.ome.tif" + } + 'registration' { + markers = "${params.test_data_base}/data/imaging/registration/markers.csv" + cycle1 = "${params.test_data_base}/data/imaging/ome-tiff/cycif-tonsil-cycle1.ome.tif" + cycle2 = "${params.test_data_base}/data/imaging/ome-tiff/cycif-tonsil-cycle2.ome.tif" + } + 'segmentation' { + markers = "${params.test_data_base}/data/imaging/segmentation/markers.csv" + image = "${params.test_data_base}/data/imaging/segmentation/cycif_tonsil_registered.ome.tif" + } + 'quantification' { + markers = "${params.test_data_base}/data/imaging/quantification/markers.csv" + image = "${params.test_data_base}/data/imaging/quantification/cycif_tonsil_registered.ome.tif" + mask = "${params.test_data_base}/data/imaging/quantification/cell.ome.tif" + } + 'downstream' { + markers = "${params.test_data_base}/data/imaging/downstream/markers.csv" + cell_feature_array = "${params.test_data_base}/data/imaging/downstream/cycif_tonsil_cell.csv" + } + 'background_subtraction' { + markers = "${params.test_data_base}/data/imaging/background_subtraction/markers.csv" + image = 
"${params.test_data_base}/data/imaging/background_subtraction/cycif_tonsil_registered.ome.tif" + } + 'core_detection' { + image = "${params.test_data_base}/data/imaging/core_detection/single_core_dapi.tif" + } + } + } +} diff --git a/tests/csv/map.csv b/tests/csv/map.csv new file mode 100644 index 00000000..bf33ce2c --- /dev/null +++ b/tests/csv/map.csv @@ -0,0 +1,3 @@ +chr,map +chr21,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/reference_genome/21/GRCh38_chr21.s.map +chr22,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/reference_genome/22/GRCh38_chr22.s.map diff --git a/tests/csv/panel.csv b/tests/csv/panel.csv new file mode 100644 index 00000000..8a5c58b1 --- /dev/null +++ b/tests/csv/panel.csv @@ -0,0 +1,3 @@ +panel,chr,vcf,index +1000GP.s.norel,chr21,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/panel/21/1000GP.chr21.s.norel.bcf,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/panel/21/1000GP.chr21.s.norel.bcf.csi +1000GP.s.norel,chr22,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/panel/22/1000GP.chr22.s.norel.bcf,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/panel/22/1000GP.chr22.s.norel.bcf.csi diff --git a/tests/csv/panel_full.csv b/tests/csv/panel_full.csv new file mode 100644 index 00000000..782b4a78 --- /dev/null +++ b/tests/csv/panel_full.csv @@ -0,0 +1,23 @@ +panel,chr,vcf,index +1000GP.s.norel,chr1,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr1.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr1.filtered.shapeit2-duohmm-phased.vcf.gz.tbi 
+1000GP.s.norel,chr2,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr2.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr2.filtered.shapeit2-duohmm-phased.vcf.gz.tbi +1000GP.s.norel,chr3,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr3.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr3.filtered.shapeit2-duohmm-phased.vcf.gz.tbi +1000GP.s.norel,chr4,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr4.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr4.filtered.shapeit2-duohmm-phased.vcf.gz.tbi +1000GP.s.norel,chr5,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr5.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr5.filtered.shapeit2-duohmm-phased.vcf.gz.tbi 
+1000GP.s.norel,chr6,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr6.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr6.filtered.shapeit2-duohmm-phased.vcf.gz.tbi +1000GP.s.norel,chr7,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr7.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr7.filtered.shapeit2-duohmm-phased.vcf.gz.tbi +1000GP.s.norel,chr8,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr8.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr8.filtered.shapeit2-duohmm-phased.vcf.gz.tbi +1000GP.s.norel,chr9,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr9.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr9.filtered.shapeit2-duohmm-phased.vcf.gz.tbi 
+1000GP.s.norel,chr10,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr10.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr10.filtered.shapeit2-duohmm-phased.vcf.gz.tbi +1000GP.s.norel,chr11,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr11.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr11.filtered.shapeit2-duohmm-phased.vcf.gz.tbi +1000GP.s.norel,chr12,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr12.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr12.filtered.shapeit2-duohmm-phased.vcf.gz.tbi +1000GP.s.norel,chr13,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr13.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr13.filtered.shapeit2-duohmm-phased.vcf.gz.tbi 
+1000GP.s.norel,chr14,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr14.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr14.filtered.shapeit2-duohmm-phased.vcf.gz.tbi +1000GP.s.norel,chr15,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr15.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr15.filtered.shapeit2-duohmm-phased.vcf.gz.tbi +1000GP.s.norel,chr16,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr16.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr16.filtered.shapeit2-duohmm-phased.vcf.gz.tbi +1000GP.s.norel,chr17,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr17.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr17.filtered.shapeit2-duohmm-phased.vcf.gz.tbi 
+1000GP.s.norel,chr18,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr18.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr18.filtered.shapeit2-duohmm-phased.vcf.gz.tbi +1000GP.s.norel,chr19,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr19.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr19.filtered.shapeit2-duohmm-phased.vcf.gz.tbi +1000GP.s.norel,chr20,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr20.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr20.filtered.shapeit2-duohmm-phased.vcf.gz.tbi +1000GP.s.norel,chr21,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr21.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr21.filtered.shapeit2-duohmm-phased.vcf.gz.tbi 
+1000GP.s.norel,chr22,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr22.filtered.shapeit2-duohmm-phased.vcf.gz,http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/CCDG_14151_B01_GRM_WGS_2020-08-05_chr22.filtered.shapeit2-duohmm-phased.vcf.gz.tbi diff --git a/tests/csv/posfile.csv b/tests/csv/posfile.csv new file mode 100644 index 00000000..d5a92024 --- /dev/null +++ b/tests/csv/posfile.csv @@ -0,0 +1,2 @@ +chr,file +chr22,"https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/panel/22/chr22_posfile_stitch.txt" diff --git a/tests/csv/region.csv b/tests/csv/region.csv new file mode 100644 index 00000000..7ef04608 --- /dev/null +++ b/tests/csv/region.csv @@ -0,0 +1,3 @@ +chr,start,end +chr21,16570000,16610000 +chr22,16570000,16610000 diff --git a/tests/csv/sample_bam.csv b/tests/csv/sample_bam.csv new file mode 100644 index 00000000..17e3a87e --- /dev/null +++ b/tests/csv/sample_bam.csv @@ -0,0 +1,4 @@ +sample,file,index +NA12878,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA12878/NA12878.s.1x.bam,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA12878/NA12878.s.1x.bam.bai +NA19401,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA19401/NA19401.s.1x.bam,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA19401/NA19401.s.1x.bam.bai +NA20359,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA20359/NA20359.s.1x.bam,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA20359/NA20359.s.1x.bam.bai diff --git a/tests/csv/sample_sim.csv b/tests/csv/sample_sim.csv new file mode 100644 index 00000000..cb6be1c1 --- /dev/null +++ b/tests/csv/sample_sim.csv @@ -0,0 +1,4 @@ 
+sample,file,index +NA12878,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA12878/NA12878.s.bam,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA12878/NA12878.s.bam.bai +NA19401,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA19401/NA19401.s.bam,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA19401/NA19401.s.bam.bai +NA20359,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA20359/NA20359.s.bam,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA20359/NA20359.s.bam.bai diff --git a/tests/csv/sample_sim_full.csv b/tests/csv/sample_sim_full.csv new file mode 100644 index 00000000..592282cc --- /dev/null +++ b/tests/csv/sample_sim_full.csv @@ -0,0 +1,2 @@ +sample,file,index +#TODO find bam not in 1000G panel diff --git a/tests/csv/sample_validate_imputed.csv b/tests/csv/sample_validate_imputed.csv new file mode 100644 index 00000000..3f3da2e2 --- /dev/null +++ b/tests/csv/sample_validate_imputed.csv @@ -0,0 +1,4 @@ +sample,file,index +NA12878,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA12878/NA12878.s_imputed.bcf,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA12878/NA12878.s_imputed.bcf.csi +NA19401,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA19401/NA19401.s_imputed.bcf,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA19401/NA19401.s_imputed.bcf.csi +NA20359,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA20359/NA20359.s_imputed.bcf,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA20359/NA20359.s_imputed.bcf.csi diff --git a/tests/csv/sample_validate_truth.csv b/tests/csv/sample_validate_truth.csv new file 
mode 100644 index 00000000..828bad9c --- /dev/null +++ b/tests/csv/sample_validate_truth.csv @@ -0,0 +1,4 @@ +sample,file,index +NA12878,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA12878/NA12878.s.bcf,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA12878/NA12878.s.bcf.csi +NA19401,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA19401/NA19401.s.bcf,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA19401/NA19401.s.bcf.csi +NA20359,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA20359/NA20359.s.bcf,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA20359/NA20359.s.bcf.csi diff --git a/tests/csv/sample_vcf.csv b/tests/csv/sample_vcf.csv new file mode 100644 index 00000000..e1e92a6c --- /dev/null +++ b/tests/csv/sample_vcf.csv @@ -0,0 +1,4 @@ +sample,file,index +NA12878,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA12878/NA12878.s.1x.bcf,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA12878/NA12878.s.1x.bcf.csi +NA19401,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA19401/NA19401.s.1x.bcf,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA19401/NA19401.s.1x.bcf.csi +NA20359,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA20359/NA20359.s.1x.bcf,https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/individuals/NA20359/NA20359.s.1x.bcf.csi diff --git a/tests/test_default.yml b/tests/test_default.yml new file mode 100644 index 00000000..d7994f56 --- /dev/null +++ b/tests/test_default.yml @@ -0,0 +1,7 @@ +- name: Run default pipeline + command: nextflow run main.nf -profile test --outdir results --genome GRCh37 + tags: + - default + 
files: + - path: results/csv/markduplicates.csv + md5sum: 0d6120bb99e92f6810343270711ca53e diff --git a/workflows/phaseimpute.nf b/workflows/phaseimpute.nf deleted file mode 100644 index d6f7bece..00000000 --- a/workflows/phaseimpute.nf +++ /dev/null @@ -1,98 +0,0 @@ -/* -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - IMPORT MODULES / SUBWORKFLOWS / FUNCTIONS -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -*/ - -include { FASTQC } from '../modules/nf-core/fastqc/main' -include { MULTIQC } from '../modules/nf-core/multiqc/main' -include { paramsSummaryMap } from 'plugin/nf-validation' -include { paramsSummaryMultiqc } from '../subworkflows/nf-core/utils_nfcore_pipeline' -include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline' -include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_phaseimpute_pipeline' - -/* -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - RUN MAIN WORKFLOW -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -*/ - -workflow PHASEIMPUTE { - - take: - ch_samplesheet // channel: samplesheet read in from --input - - main: - - ch_versions = Channel.empty() - ch_multiqc_files = Channel.empty() - - // - // MODULE: Run FastQC - // - FASTQC ( - ch_samplesheet - ) - ch_multiqc_files = ch_multiqc_files.mix(FASTQC.out.zip.collect{it[1]}) - ch_versions = ch_versions.mix(FASTQC.out.versions.first()) - - // - // Collate and save software versions - // - softwareVersionsToYAML(ch_versions) - .collectFile( - storeDir: "${params.outdir}/pipeline_info", - name: 'nf_core_pipeline_software_mqc_versions.yml', - sort: true, - newLine: true - ).set { ch_collated_versions } - - // - // MODULE: MultiQC - // - ch_multiqc_config = Channel.fromPath( - "$projectDir/assets/multiqc_config.yml", checkIfExists: true) - ch_multiqc_custom_config = 
params.multiqc_config ? - Channel.fromPath(params.multiqc_config, checkIfExists: true) : - Channel.empty() - ch_multiqc_logo = params.multiqc_logo ? - Channel.fromPath(params.multiqc_logo, checkIfExists: true) : - Channel.empty() - - summary_params = paramsSummaryMap( - workflow, parameters_schema: "nextflow_schema.json") - ch_workflow_summary = Channel.value(paramsSummaryMultiqc(summary_params)) - - ch_multiqc_custom_methods_description = params.multiqc_methods_description ? - file(params.multiqc_methods_description, checkIfExists: true) : - file("$projectDir/assets/methods_description_template.yml", checkIfExists: true) - ch_methods_description = Channel.value( - methodsDescriptionText(ch_multiqc_custom_methods_description)) - - ch_multiqc_files = ch_multiqc_files.mix( - ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')) - ch_multiqc_files = ch_multiqc_files.mix(ch_collated_versions) - ch_multiqc_files = ch_multiqc_files.mix( - ch_methods_description.collectFile( - name: 'methods_description_mqc.yaml', - sort: true - ) - ) - - MULTIQC ( - ch_multiqc_files.collect(), - ch_multiqc_config.toList(), - ch_multiqc_custom_config.toList(), - ch_multiqc_logo.toList() - ) - - emit: - multiqc_report = MULTIQC.out.report.toList() // channel: /path/to/multiqc_report.html - versions = ch_versions // channel: [ path(versions.yml) ] -} - -/* -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - THE END -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -*/ diff --git a/workflows/phaseimpute/main.nf b/workflows/phaseimpute/main.nf new file mode 100644 index 00000000..ab9cc73d --- /dev/null +++ b/workflows/phaseimpute/main.nf @@ -0,0 +1,354 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + IMPORT MODULES / SUBWORKFLOWS / FUNCTIONS +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +*/ + +// +// MODULE: 
Installed directly from nf-core/modules +// +include { MULTIQC } from '../../modules/nf-core/multiqc/main' +include { paramsSummaryMap } from 'plugin/nf-validation' +include { paramsSummaryMultiqc } from '../../subworkflows/nf-core/utils_nfcore_pipeline' +include { softwareVersionsToYAML } from '../../subworkflows/nf-core/utils_nfcore_pipeline' +include { methodsDescriptionText } from '../../subworkflows/local/utils_nfcore_phaseimpute_pipeline' +include { getAllFilesExtension } from '../../subworkflows/local/utils_nfcore_phaseimpute_pipeline' + +// +// SUBWORKFLOW: Consisting of a mix of local and nf-core/modules +// + +// Simulate subworkflows +include { BAM_REGION } from '../../subworkflows/local/bam_region' +include { BAM_DOWNSAMPLE } from '../../subworkflows/local/bam_downsample' + +// Panelprep subworkflows +include { VCF_CHR_CHECK } from '../../subworkflows/local/vcf_chr_check' +include { VCF_NORMALIZE_BCFTOOLS } from '../../subworkflows/local/vcf_normalize_bcftools/vcf_normalize_bcftools' +include { VCF_SITES_EXTRACT_BCFTOOLS } from '../../subworkflows/local/vcf_sites_extract_bcftools' +include { VCF_PHASE_PANEL } from '../../subworkflows/local/vcf_phase_panel' +include { PREPARE_POSFILE_TSV } from '../../subworkflows/local/prepare_input_stitch/prepare_posfile_tsv' + +// GLIMPSE subworkflows +include { VCF_IMPUTE_GLIMPSE as VCF_IMPUTE_GLIMPSE1 } from '../../subworkflows/nf-core/vcf_impute_glimpse' +include { COMPUTE_GL as GL_TRUTH } from '../../subworkflows/local/compute_gl' +include { COMPUTE_GL as GL_INPUT } from '../../subworkflows/local/compute_gl' +include { VCF_CONCATENATE_BCFTOOLS as CONCAT_GLIMPSE1} from '../../subworkflows/local/vcf_concatenate_bcftools' + +// QUILT subworkflows +include { MAKE_CHUNKS } from '../../subworkflows/local/make_chunks/make_chunks' +include { IMPUTE_QUILT } from '../../subworkflows/local/impute_quilt/impute_quilt' +include { VCF_CONCATENATE_BCFTOOLS as CONCAT_QUILT } from '../../subworkflows/local/vcf_concatenate_bcftools' 
+ +// STITCH subworkflows +include { PREPARE_INPUT_STITCH } from '../../subworkflows/local/prepare_input_stitch/prepare_input_stitch' +include { BAM_IMPUTE_STITCH } from '../../subworkflows/local/bam_impute_stitch/bam_impute_stitch' +include { VCF_CONCATENATE_BCFTOOLS as CONCAT_STITCH } from '../../subworkflows/local/vcf_concatenate_bcftools' + +// CONCAT subworkflows +include { VCF_CONCATENATE_BCFTOOLS as CONCAT_TRUTH } from '../../subworkflows/local/vcf_concatenate_bcftools' +include { VCF_CONCATENATE_BCFTOOLS as CONCAT_PANEL } from '../../subworkflows/local/vcf_concatenate_bcftools' + +// Concordance subworkflows +include { VCF_CONCORDANCE_GLIMPSE2 } from '../../subworkflows/local/vcf_concordance_glimpse2' + + +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + RUN MAIN WORKFLOW +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +*/ + +workflow PHASEIMPUTE { + + take: + ch_input_impute // channel: input file [ [id], file, index ] + ch_input_sim // channel: input file [ [id], file, index ] + ch_input_validate // channel: input file [ [id], file, index ] + ch_input_validate_truth // channel: truth file [ [id], file, index ] + ch_fasta // channel: fasta file [ [genome], fasta, fai ] + ch_panel // channel: panel file [ [id, chr], chr, vcf, index ] + ch_region // channel: region to use [ [chr, region], region] + ch_depth // channel: depth select [ [depth], depth ] + ch_map // channel: genetic map [ [chr], map] + ch_posfile // channel: posfile [ [chr], txt] + ch_versions // channel: versions of software used + + main: + + ch_multiqc_files = Channel.empty() + + // + // Simulate data if asked + // + if (params.step.split(',').contains("simulate") || params.step.split(',').contains("all")) { + // Output channel of simulate process + ch_sim_output = Channel.empty() + + // Test if the input are all bam files + getAllFilesExtension(ch_input_sim) + .map{ if (it != "bam") { + error "All input 
files must be in BAM format to perform simulation" + } } + + // Split the bam into the region specified + BAM_REGION(ch_input_sim, ch_region, ch_fasta) + ch_versions = ch_versions.mix(BAM_REGION.out.versions) + + // Initialize channel to impute + ch_bam_to_impute = Channel.empty() + + if (params.depth) { + // Downsample input to desired depth + BAM_DOWNSAMPLE( + BAM_REGION.out.bam_region, + ch_depth, + ch_fasta + ) + ch_versions = ch_versions.mix(BAM_DOWNSAMPLE.out.versions) + ch_multiqc_files = ch_multiqc_files.mix(BAM_DOWNSAMPLE.out.coverage.map{ [it[1]] }) + ch_input_impute = BAM_DOWNSAMPLE.out.bam_emul + ch_input_validate_truth = BAM_REGION.out.bam_region + } + + if (params.genotype) { + error "Genotype simulation not yet implemented" + } + } + + // + // Prepare panel + // + if (params.step.split(',').contains("panelprep") || params.step.split(',').contains("validate") || params.step.split(',').contains("all")) { + // Check chr prefix and remove if necessary + VCF_CHR_CHECK(ch_panel, ch_fasta) + ch_versions = ch_versions.mix(VCF_CHR_CHECK.out.versions) + + // Normalize indels in panel + VCF_NORMALIZE_BCFTOOLS(VCF_CHR_CHECK.out.vcf, ch_fasta) + ch_versions = ch_versions.mix(VCF_NORMALIZE_BCFTOOLS.out.versions) + + // Extract sites from normalized vcf + VCF_SITES_EXTRACT_BCFTOOLS(VCF_NORMALIZE_BCFTOOLS.out.vcf_tbi) + ch_versions = ch_versions.mix(VCF_SITES_EXTRACT_BCFTOOLS.out.versions) + + // Phase panel + VCF_PHASE_PANEL(VCF_SITES_EXTRACT_BCFTOOLS.out.vcf_tbi, + VCF_SITES_EXTRACT_BCFTOOLS.out.vcf_tbi, + VCF_SITES_EXTRACT_BCFTOOLS.out.panel_sites, + VCF_SITES_EXTRACT_BCFTOOLS.out.panel_tsv) + ch_versions = ch_versions.mix(VCF_PHASE_PANEL.out.versions) + + // Generate channels (to be simplified) + ch_panel_sites_tsv = VCF_PHASE_PANEL.out.panel + .map{ metaPC, norm, n_index, sites, s_index, tsv, t_index, phased, p_index + -> [metaPC, sites, tsv] + } + CONCAT_PANEL(VCF_PHASE_PANEL.out.panel + .map{ metaPC, norm, n_index, sites, s_index, tsv, t_index, phased, 
p_index + -> [[id:metaPC.panel], sites, s_index] + } + ) + ch_panel_sites = CONCAT_PANEL.out.vcf_tbi_join + ch_versions = ch_versions.mix(CONCAT_PANEL.out.versions) + + ch_panel_phased = VCF_PHASE_PANEL.out.panel + .map{ metaPC, norm, n_index, sites, s_index, tsv, t_index, phased, p_index + -> [metaPC, phased, p_index] + } + // Prepare posfile stitch + PREPARE_POSFILE_TSV(VCF_SITES_EXTRACT_BCFTOOLS.out.panel_sites, ch_fasta) + ch_versions = ch_versions.mix(PREPARE_POSFILE_TSV.out.versions) + + // Create chunks from reference VCF + MAKE_CHUNKS(ch_panel) + ch_versions = ch_versions.mix(MAKE_CHUNKS.out.versions) + } + + if (params.step.split(',').contains("impute") || params.step.split(',').contains("all")) { + // Output channel of input process + ch_impute_output = Channel.empty() + if (params.tools.split(',').contains("glimpse1")) { + println "Impute with Glimpse1" + // Glimpse1 subworkflow + GL_INPUT( // Compute GL for input data once per panel + ch_input_impute, + ch_panel_sites_tsv, + ch_fasta + ) + ch_multiqc_files = ch_multiqc_files.mix(GL_INPUT.out.multiqc_files) + ch_versions = ch_versions.mix(GL_INPUT.out.versions) + + impute_input = GL_INPUT.out.vcf // [metaIPC, vcf, index] + .map {metaIPC, vcf, index -> [metaIPC.subMap("panel", "chr"), metaIPC, vcf, index] } + .combine(ch_panel_phased, by: 0) + .combine(Channel.of([[]])) + .map { metaPC, metaIPC, vcf, index, panel, p_index, sample -> + [metaPC.subMap("chr"), metaIPC, vcf, index, panel, p_index, sample]} + .combine(ch_region + .map {metaCR, region -> [metaCR.subMap("chr"), metaCR, region]}, + by: 0) + .combine(ch_map, by: 0) + .map{ + metaC, metaIPC, vcf, index, panel, p_index, sample, metaCR, region, map + -> [metaIPC+metaCR.subMap("Region"), vcf, index, sample, region, panel, p_index, map] + } //[ metaIPCR, vcf, csi, sample, region, ref, ref_index, map ] + + VCF_IMPUTE_GLIMPSE1(impute_input) + output_glimpse1 = VCF_IMPUTE_GLIMPSE1.out.merged_variants + 
.combine(VCF_IMPUTE_GLIMPSE1.out.merged_variants_index, by: 0) + .map{ metaIPCR, vcf, csi -> [metaIPCR + [tools: "Glimpse1"], vcf, csi] } + ch_multiqc_files = ch_multiqc_files.mix(VCF_IMPUTE_GLIMPSE1.out.chunk_chr.map{ [it[1]]}) + ch_versions = ch_versions.mix(VCF_IMPUTE_GLIMPSE1.out.versions) + + // Add to output channel + ch_impute_output = ch_impute_output.mix(output_glimpse1) + + // Concatenate by chromosomes + CONCAT_GLIMPSE1(output_glimpse1) + ch_versions = ch_versions.mix(CONCAT_GLIMPSE1.out.versions) + + // Add results to input validate + ch_input_validate = ch_input_validate.mix(CONCAT_GLIMPSE1.out.vcf_tbi_join) + + } + if (params.tools.split(',').contains("glimpse2")) { + error "Glimpse2 not yet implemented" + // Glimpse2 subworkflow + } + + if (params.tools.split(',').contains("stitch")) { + print("Impute with STITCH") + + // Obtain the user's posfile if provided or calculate it from ref panel file + if (params.posfile) { // User supplied posfile + ch_posfile = ch_posfile + } else if (params.panel && params.step.split(',').contains("panelprep")) { // Panelprep posfile + ch_posfile = PREPARE_POSFILE_TSV.out.posfile + } else { + error "No posfile or reference panel preparation was included" + } + // Prepare inputs + PREPARE_INPUT_STITCH(ch_posfile, ch_fasta, ch_input_impute) + ch_versions = ch_versions.mix(PREPARE_INPUT_STITCH.out.versions) + + // Impute with STITCH + BAM_IMPUTE_STITCH ( PREPARE_INPUT_STITCH.out.stitch_parameters, + PREPARE_INPUT_STITCH.out.stitch_samples, + ch_fasta ) + ch_versions = ch_versions.mix(BAM_IMPUTE_STITCH.out.versions) + + // Output channel to concat + ch_impute_output = ch_impute_output.mix(BAM_IMPUTE_STITCH.out.vcf_tbi) + + // Concatenate by chromosomes + CONCAT_STITCH(BAM_IMPUTE_STITCH.out.vcf_tbi) + ch_versions = ch_versions.mix(CONCAT_STITCH.out.versions) + + // Add results to input validate + ch_input_validate = ch_input_validate.mix(CONCAT_STITCH.out.vcf_tbi_join) + + } + + if (params.tools.split(',').contains("quilt")) 
{ + print("Impute with QUILT") + + // Impute BAMs with QUILT + IMPUTE_QUILT(VCF_NORMALIZE_BCFTOOLS.out.hap_legend, ch_input_impute, MAKE_CHUNKS.out.chunks) + ch_versions = ch_versions.mix(IMPUTE_QUILT.out.versions) + + // Add to output channel + ch_impute_output = ch_impute_output.mix(IMPUTE_QUILT.out.vcf_tbi) + + // Concatenate by chromosomes + CONCAT_QUILT(IMPUTE_QUILT.out.vcf_tbi) + ch_versions = ch_versions.mix(CONCAT_QUILT.out.versions) + + // Add results to input validate + ch_input_validate = ch_input_validate.mix(CONCAT_QUILT.out.vcf_tbi_join) + } + } + + if (params.step.split(',').contains("validate") || params.step.split(',').contains("all")) { + ch_truth_vcf = Channel.empty() + // Get extension of input files + truth_ext = getAllFilesExtension(ch_input_validate_truth) + + // Channels for branching + ch_truth = ch_input_validate_truth + .combine(truth_ext) + .branch { + bam: it[3] == 'bam' + vcf: it[3] =~ 'vcf|bcf' + } + + GL_TRUTH( + ch_truth.bam.map { [it[0], it[1], it[2]] }, + ch_panel_sites_tsv, + ch_fasta + ) + ch_multiqc_files = ch_multiqc_files.mix(GL_TRUTH.out.multiqc_files) + ch_versions = ch_versions.mix(GL_TRUTH.out.versions) + + // Mix the original vcf and the computed vcf + ch_truth_vcf = ch_truth.vcf + .map { [it[0], it[1], it[2]] } + .mix(GL_TRUTH.out.vcf) + + // Concatenate by chromosomes + // CONCAT_TRUTH(ch_truth_vcf) + // ch_versions = ch_versions.mix(CONCAT_TRUTH.out.versions) + + // Compute concordance analysis + VCF_CONCORDANCE_GLIMPSE2( + ch_input_validate, + ch_truth_vcf, + ch_panel_sites, + ch_region + ) + ch_multiqc_files = ch_multiqc_files.mix(VCF_CONCORDANCE_GLIMPSE2.out.multiqc_files) + ch_versions = ch_versions.mix(VCF_CONCORDANCE_GLIMPSE2.out.versions) + } + + if (params.step.split(',').contains("refine")) { + error "refine step not yet implemented" + } + + // + // Collate and save software versions + // + softwareVersionsToYAML(ch_versions) + .collectFile(storeDir: "${params.outdir}/pipeline_info", name: 
'nf_core_pipeline_software_mqc_versions.yml', sort: true, newLine: true) + .set { ch_collated_versions } + + // + // MODULE: MultiQC + // + ch_multiqc_config = Channel.fromPath("$projectDir/assets/multiqc_config.yml", checkIfExists: true) + ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath(params.multiqc_config, checkIfExists: true) : Channel.empty() + ch_multiqc_logo = params.multiqc_logo ? Channel.fromPath(params.multiqc_logo, checkIfExists: true) : Channel.empty() + summary_params = paramsSummaryMap(workflow, parameters_schema: "nextflow_schema.json") + ch_workflow_summary = Channel.value(paramsSummaryMultiqc(summary_params)) + ch_multiqc_custom_methods_description = params.multiqc_methods_description ? file(params.multiqc_methods_description, checkIfExists: true) : file("$projectDir/assets/methods_description_template.yml", checkIfExists: true) + ch_methods_description = Channel.value(methodsDescriptionText(ch_multiqc_custom_methods_description)) + ch_multiqc_files = ch_multiqc_files.mix(ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')) + ch_multiqc_files = ch_multiqc_files.mix(ch_collated_versions) + ch_multiqc_files = ch_multiqc_files.mix(ch_methods_description.collectFile(name: 'methods_description_mqc.yaml', sort: false)) + + MULTIQC ( + ch_multiqc_files.collect(), + ch_multiqc_config.toList(), + ch_multiqc_custom_config.toList(), + ch_multiqc_logo.toList() + ) + + emit: + multiqc_report = MULTIQC.out.report.toList() // channel: /path/to/multiqc_report.html + versions = ch_versions // channel: [ path(versions.yml) ] +} + +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + THE END +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +*/
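Step gating in `PHASEIMPUTE` works by checking whether a token appears in the comma-separated `params.step` and `params.tools` strings. The minimal sketch below restates those conditions in Python so they can be checked in isolation. The function names (`parse_list`, `active_blocks`, `stitch_posfile_source`) and the file-name arguments are hypothetical and are not part of the pipeline. The sketch only mirrors the Groovy `if` conditions as they appear in `workflows/phaseimpute/main.nf` above.

```python
from typing import Dict, Optional, Set


def parse_list(value: str) -> Set[str]:
    """Split a comma-separated parameter (e.g. --step or --tools) into tokens.

    Like Groovy's String.split(','), this does not trim whitespace.
    """
    return set(value.split(","))


def active_blocks(step: str) -> Dict[str, bool]:
    """Return which top-level blocks of PHASEIMPUTE would run for a --step value."""
    steps = parse_list(step)
    run_all = "all" in steps
    return {
        "simulate": run_all or "simulate" in steps,
        # Panel preparation also runs when only "validate" is requested.
        "panelprep": run_all or bool(steps & {"panelprep", "validate"}),
        "impute": run_all or "impute" in steps,
        "validate": run_all or "validate" in steps,
        # In the diff, "refine" is not enabled by "all".
        "refine": "refine" in steps,
    }


def stitch_posfile_source(step: str, posfile: Optional[str], panel: Optional[str]) -> str:
    """Mirror the posfile selection in the STITCH branch as written in the diff."""
    if posfile:
        return "user"  # corresponds to ch_posfile from --posfile
    if panel and "panelprep" in parse_list(step):
        return "panelprep"  # corresponds to PREPARE_POSFILE_TSV.out.posfile
    raise ValueError("No posfile or reference panel preparation was included")
```

With these assumptions, `--step all` enables panel preparation, so `PREPARE_POSFILE_TSV` runs. However, the STITCH branch checks only for the literal `panelprep` token. As a result, `stitch_posfile_source("all", None, "panel.csv")` raises an error. You can avoid this path by supplying `--posfile` explicitly or by listing `panelprep` in `--step` (for example, `impute,panelprep`).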
Process Name                 Software   Version
CUSTOM_DUMPSOFTWAREVERSIONS  python     3.11.7
                             yaml       5.4.1
TOOL1                        tool1      0.11.9
TOOL2                        tool2      1.9
Workflow                     Nextflow
File type: Conventional base calls