diff --git a/CHANGELOG.md b/CHANGELOG.md index ebdf5f4a..57bc2391 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -60,6 +60,7 @@ Initial release of nf-core/phaseimpute, created with the [nf-core](https://nf-co - [#135](https://github.com/nf-core/phaseimpute/pull/135) - Impute by batch of 100 individuals by default using `--batch_size` parameter. All individuals BAM files are gathered and VCF are allowed for glimpse1 and glimpse2. Channel preprocessing of stitch is done in stitch subworkflow. Genotype likelihood computation for glimpse1 is now done outside of the subworkflow and merge the resulting vcf with all the samples. New test added to check batch separation. Improve `usage.md` documentation. Add validation to initialisation of the pipeline to ensure compatibility between tools, steps and the files provided by the user. - [#139](https://github.com/nf-core/phaseimpute/pull/139) - Update all nf-core modules - [#146](https://github.com/nf-core/phaseimpute/pull/146) - Remove conda CI check for PR due to nextflow error +- [#144](https://github.com/nf-core/phaseimpute/pull/144) - Documentation updates ### `Fixed` @@ -81,3 +82,4 @@ Initial release of nf-core/phaseimpute, created with the [nf-core](https://nf-co [Maxime U Garcia](https://github.com/maxulysse) [Matias Romero Victorica](https://github.com/mrvictorica) [Nicolas Schcolnicov](https://github.com/nschcolnicov) +[Hemanoel Passarelli](https://github.com/hemanoel) diff --git a/README.md b/README.md index 2301c5c7..70895708 100644 --- a/README.md +++ b/README.md @@ -20,23 +20,18 @@ ## Introduction -**nf-core/phaseimpute** is a bioinformatics pipeline to phase and impute genetic data. Different steps are available, each corresponding to a dedicated mode. +**nf-core/phaseimpute** is a bioinformatics pipeline to phase and impute genetic data. 
The pipeline is constituted of five main steps: -### Main steps of the pipeline - -The **phaseimpute** pipeline consists of 5 main steps: - -| Metro map | Modes | -| ---------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| metromap | - **Panel preparation**: Phasing, QC, variant filtering, variant annotation of the reference panel
- **Imputation**: Impute the target dataset on the reference panel
- **Simulate**: Simulation of the target dataset from high quality target data
- **Concordance**: Concordance between the target dataset and a truth dataset | +| Metro map | Modes | +| ---------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| metromap | - **Check chromosomes names**: Validates the presence of the different contigs in all variants and alignment files, ensuring data compatibility for further processing
- **Panel preparation**: Performs the phasing, QC, variant filtering, and variant annotation of the reference panel
- **Imputation**: Imputes genotypes in the target dataset using the reference panel
- **Simulate**: Generates simulated datasets from high-quality target data for testing and validation purposes.
- **Concordance**: Evaluates the accuracy of imputation by comparing the imputed data against a truth dataset. | ## Usage > [!NOTE] > If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data. -The basic usage of this pipeline is to impute a target dataset based on a phased panel. -First, prepare a samplesheet with your input data that looks as follows: +The primary function of this pipeline is to impute a target dataset based on a phased panel. Begin by preparing a samplesheet with your input data, formatted as follows: `samplesheet.csv`: @@ -45,10 +40,9 @@ sample,file,index SAMPLE_1X,/path/to/.,/path/to/. ``` -Each row represents a BAM or CRAM file along with its index file. All input files need to be of the same extension. -For some tools and steps, you will also need to submit a samplesheet with the reference panel. +Each row represents either a BAM or a CRAM file along with its corresponding index file. All input files must share the same file extension. -A final samplesheet file for the reference panel may look something like the one below. This is for 3 chromosomes. +For certain tools and steps within the pipeline, you will also need to provide a samplesheet for the reference panel. Here's an example of what a final samplesheet for a reference panel might look like, covering three chromosomes: ```csv panel,chr,vcf,index @@ -57,7 +51,9 @@ Phase3,2,ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf. 
Phase3,3,ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz,ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.csi ``` -Now, you can run the pipeline using: +## Running the pipeline + +Execute the pipeline with the following command: ```bash nextflow run nf-core/phaseimpute \ @@ -101,6 +97,7 @@ We thank the following people for their extensive assistance in the development - Saul Pierotti - Eugenia Fontecha - Matias Romero Victorica +- Hemanoel Passarelli ## Contributions and Support diff --git a/docs/usage.md b/docs/usage.md index 412fb540..8037a2d3 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -6,9 +6,11 @@ ## Introduction +The **nf-core/phaseimpute** pipeline performs genomic phasing and imputation. Its main steps are chromosome checking, panel preparation, imputation, simulation, and concordance. + ## Samplesheet input -You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below. +You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use the `--input` parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below. ```bash --input '[path to samplesheet file]' @@ -16,7 +18,7 @@ You will need to create a samplesheet with information about the samples you wou ### Structure -The samplesheet can have as many columns as you desire, however, there is a strict requirement for at least 3 columns to match those defined in the table below. +The samplesheet can have as many columns as you desire. However, there is a strict requirement for at least 3 columns to match those defined in the table below. 
A final samplesheet file may look something like the one below. This is for 6 samples. @@ -40,7 +42,7 @@ An [example samplesheet](../assets/samplesheet.csv) has been provided with the p ## Samplesheet reference panel -You will need to create a samplesheet with information about the reference panel you would like to use. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below. +You will need to create a samplesheet with information about the reference panel you would like to use. Use the `--panel` parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below. ```bash --panel '[path to samplesheet file]' @@ -68,7 +70,7 @@ An [example samplesheet](../assets/samplesheet_reference.csv) has been provided ## Samplesheet posfile -You will need a samplesheet with information about the reference panel sites for using the `--steps [impute,validate]`. You can generate this samplesheet from `--steps panelprep`. Use this parameter to specify its location. It has to be a comma-separated file with at least 5 columns, and a header row as shown in the examples below. +You will need a samplesheet with information about the reference panel sites for using the `--steps [impute,validate]`. You can generate this samplesheet from `--steps panelprep`. Use the `--posfile` parameter to specify its location. It has to be a comma-separated file with at least 5 columns, and a header row as shown in the examples below. ```bash --posfile '[path to samplesheet file]' @@ -112,7 +114,7 @@ chr21:16609476_A_G 16609476 A G chr21:16609525_T_A 16609525 T A ``` -## Genome reference +## Reference genome Remember to use the same reference genome for all the files. 
You can specify the [reference genome](https://nf-co.re/docs/usage/reference_genomes) using: @@ -128,12 +130,28 @@ or you can specify a custom genome using: ## Running the pipeline -The typical command for running the pre-processing of the panel and imputation of samples is as follows: +A quick example that runs only the imputation step: + +```bash +nextflow run nf-core/phaseimpute \ + --input samplesheet.csv \ + --steps impute \ + --chunks chunks.csv \ + --posfile posfile_legend.csv \ + --outdir results \ + --genome GRCh38 \ + --panel panel.csv \ + --tools glimpse1 \ + -profile docker + +``` + +The typical command for running the pre-processing of the panel and imputation of samples is shown below: ```bash nextflow run nf-core/phaseimpute \ --input samplesheet.csv \ - --steps panelprep,impute + --steps panelprep,impute \ --outdir results \ --genome GRCh37 \ -profile docker @@ -150,21 +168,15 @@ work # Directory containing the nextflow working files # Other nextflow hidden files, eg. history of pipeline runs and old logs. ``` -If you wish to repeatedly use the same parameters for multiple runs, rather than specifying each flag in the command, you can specify these in a params file. - -Pipeline settings can be provided in a `yaml` or `json` file via `-params-file `. +To reuse the same settings across several runs without repeating each parameter on the command line, you can set them once in a parameter file. -:::warning -Do not use `-c ` to specify parameters as this will result in errors. Custom config files specified with `-c` must only be used for [tuning process resource specifications](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources), other infrastructural tweaks (such as output directories), or module arguments (args). 
-::: - -The above pipeline run specified with a params file in yaml format: +You can provide pipeline settings in a `yaml` or `json` file, which can be specified using the `-params-file` option: ```bash nextflow run nf-core/phaseimpute -profile docker -params-file params.yaml ``` -with: +Example of a `params.yaml` file: ```yaml title="params.yaml" input: './samplesheet.csv' @@ -173,7 +185,11 @@ genome: 'GRCh37' <...> ``` -You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch). +:::warning +Do not use `-c ` to specify parameters as this will result in errors. Custom config files specified with `-c` must only be used for [tuning process resource specifications](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources), other infrastructural tweaks (such as output directories), or module arguments (args). +::: + +You can also generate `YAML` or `JSON` files using the [nf-core/launch](https://nf-co.re/launch) tool, which guides you through creating files that can be used directly with `-params-file`. ### Check of the contigs name