Documentation updates (nf-core#144)
* updates in the README and usage files

* updated README file

* documents after prettier

* lastest update in usage file

* Conda check removed and seed parameters added for QUILT (nf-core#146)

* Remove conda from PR test

* Add seed for stitch and quilt

* Bump version in nf-core yml

* Update changelog

* updates in the README and usage files

* Fix linting

* CHANGELOG prettier

* alteration added to the Changed section

---------

Co-authored-by: Louis LE NEZET <58640615+LouisLeNezet@users.noreply.github.com>
hemanoel and LouisLeNezet authored Oct 29, 2024
1 parent 679f50e commit a790ca1
Showing 3 changed files with 46 additions and 31 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -60,6 +60,7 @@ Initial release of nf-core/phaseimpute, created with the [nf-core](https://nf-co
- [#135](https://github.com/nf-core/phaseimpute/pull/135) - Impute by batch of 100 individuals by default using `--batch_size` parameter. All individuals BAM files are gathered and VCF are allowed for glimpse1 and glimpse2. Channel preprocessing of stitch is done in stitch subworkflow. Genotype likelihood computation for glimpse1 is now done outside of the subworkflow and merge the resulting vcf with all the samples. New test added to check batch separation. Improve `usage.md` documentation. Add validation to initialisation of the pipeline to ensure compatibility between tools, steps and the files provided by the user.
- [#139](https://github.com/nf-core/phaseimpute/pull/139) - Update all nf-core modules
- [#146](https://github.com/nf-core/phaseimpute/pull/146) - Remove conda CI check for PR due to nextflow error
- [#144](https://github.com/nf-core/phaseimpute/pull/144) - Documentation updates

### `Fixed`

@@ -81,3 +82,4 @@ Initial release of nf-core/phaseimpute, created with the [nf-core](https://nf-co
[Maxime U Garcia](https://github.com/maxulysse)
[Matias Romero Victorica](https://github.com/mrvictorica)
[Nicolas Schcolnicov](https://github.com/nschcolnicov)
[Hemanoel Passarelli](https://github.com/hemanoel)
25 changes: 11 additions & 14 deletions README.md
@@ -20,23 +20,18 @@

## Introduction

**nf-core/phaseimpute** is a bioinformatics pipeline to phase and impute genetic data. Different steps are available, each corresponding to a dedicated mode.
**nf-core/phaseimpute** is a bioinformatics pipeline to phase and impute genetic data. The pipeline consists of five main steps:

### Main steps of the pipeline

The **phaseimpute** pipeline consists of 5 main steps:

| Metro map | Modes |
| ---------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| <img src="docs/images/metro/MetroMap.svg" alt="metromap" width="800"/> | - **Panel preparation**: Phasing, QC, variant filtering, variant annotation of the reference panel <br> - **Imputation**: Impute the target dataset on the reference panel <br> - **Simulate**: Simulation of the target dataset from high quality target data <br> - **Concordance**: Concordance between the target dataset and a truth dataset |
| Metro map | Modes |
| ---------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| <img src="docs/images/metro/MetroMap.svg" alt="metromap" width="800"/> | - **Check chromosome names**: Validates the presence of the different contigs in all variant and alignment files, ensuring data compatibility for further processing <br> - **Panel preparation**: Performs the phasing, QC, variant filtering, and variant annotation of the reference panel <br> - **Imputation**: Imputes genotypes in the target dataset using the reference panel <br> - **Simulate**: Generates simulated datasets from high-quality target data for testing and validation purposes <br> - **Concordance**: Evaluates the accuracy of imputation by comparing the imputed data against a truth dataset |

## Usage

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
The basic usage of this pipeline is to impute a target dataset based on a phased panel.
First, prepare a samplesheet with your input data that looks as follows:
The primary function of this pipeline is to impute a target dataset based on a phased panel. Begin by preparing a samplesheet with your input data, formatted as follows:

`samplesheet.csv`:

@@ -45,10 +40,9 @@ sample,file,index
SAMPLE_1X,/path/to/.<bam/cram>,/path/to/.<bai,crai>
```

Each row represents a BAM or CRAM file along with its index file. All input files need to be of the same extension.
For some tools and steps, you will also need to submit a samplesheet with the reference panel.
Each row represents either a BAM or a CRAM file along with its corresponding index file. Ensure that all input files share the same file extension.
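
The two constraints described here (the `sample,file,index` header and a single shared file extension) can be checked before launching a run. The following is only a minimal sketch: the function name `check_samplesheet` is hypothetical and not part of the pipeline, and it assumes nothing beyond the header shown in the example.

```python
import csv
import io
from pathlib import PurePath

REQUIRED = ("sample", "file", "index")  # header shown in the README example


def check_samplesheet(text: str) -> list[str]:
    """Return a list of problems found in a samplesheet given as CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED if c not in header]
    if problems:
        return problems
    rows = list(reader)
    # All input files must share one extension (e.g. all .bam or all .cram).
    extensions = {PurePath(row["file"] or "").suffix.lower() for row in rows}
    if len(extensions) > 1:
        problems.append(f"mixed file extensions: {sorted(extensions)}")
    return problems
```

An empty list means the sheet passed both checks. The pipeline performs its own validation at start-up, so this helper is only a convenience for catching mistakes early.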

A final samplesheet file for the reference panel may look something like the one below. This is for 3 chromosomes.
For certain tools and steps within the pipeline, you will also need to provide a samplesheet for the reference panel. Here's an example of what a final samplesheet for a reference panel might look like, covering three chromosomes:

```csv
panel,chr,vcf,index
@@ -57,7 +51,9 @@ Phase3,2,ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.
Phase3,3,ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz,ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.csi
```
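
Because the rows of the panel samplesheet differ only in the chromosome number, the file can be generated rather than typed by hand. The sketch below assumes the file-name pattern of the example rows and an index located next to each VCF with a `.csi` suffix; the helper name `panel_samplesheet` is hypothetical.

```python
import csv
import io

# File-name pattern taken from the example rows above.
VCF_PATTERN = "ALL.chr{chrom}.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"


def panel_samplesheet(panel: str, chromosomes: list[str]) -> str:
    """Build the panel samplesheet as CSV text, one row per chromosome."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["panel", "chr", "vcf", "index"])
    for chrom in chromosomes:
        vcf = VCF_PATTERN.format(chrom=chrom)
        # Assumption: the index sits next to the VCF and uses the .csi suffix.
        writer.writerow([panel, chrom, vcf, vcf + ".csi"])
    return out.getvalue()
```

Calling `panel_samplesheet("Phase3", ["1", "2", "3"])` yields the header plus one row per chromosome in the same layout as the example.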

Now, you can run the pipeline using:
## Running the pipeline

Execute the pipeline with the following command:

```bash
nextflow run nf-core/phaseimpute \
@@ -101,6 +97,7 @@ We thank the following people for their extensive assistance in the development
- Saul Pierotti
- Eugenia Fontecha
- Matias Romero Victorica
- Hemanoel Passarelli

## Contributions and Support

50 changes: 33 additions & 17 deletions docs/usage.md
@@ -6,17 +6,19 @@
## Introduction

The **nf-core/phaseimpute** pipeline is designed to perform genomic phasing and imputation. Key functionalities include chromosome checking, panel preparation, imputation, simulation, and concordance.

## Samplesheet input

You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below.
You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use the `--input` parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below.

```bash
--input '[path to samplesheet file]'
```

### Structure

The samplesheet can have as many columns as you desire, however, there is a strict requirement for at least 3 columns to match those defined in the table below.
The samplesheet can have as many columns as you desire. However, there is a strict requirement for at least 3 columns to match those defined in the table below.

A final samplesheet file may look something like the one below. This is for 6 samples.

@@ -40,7 +42,7 @@ An [example samplesheet](../assets/samplesheet.csv) has been provided with the p

## Samplesheet reference panel

You will need to create a samplesheet with information about the reference panel you would like to use. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below.
You will need to create a samplesheet with information about the reference panel you would like to use. Use the `--panel` parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below.

```bash
--panel '[path to samplesheet file]'
@@ -68,7 +70,7 @@ An [example samplesheet](../assets/samplesheet_reference.csv) has been provided

## Samplesheet posfile

You will need a samplesheet with information about the reference panel sites for using the `--steps [impute,validate]`. You can generate this samplesheet from `--steps panelprep`. Use this parameter to specify its location. It has to be a comma-separated file with at least 5 columns, and a header row as shown in the examples below.
You will need a samplesheet with information about the reference panel sites when running `--steps [impute,validate]`. You can generate this samplesheet with `--steps panelprep`. Use the `--posfile` parameter to specify its location. It has to be a comma-separated file with at least 5 columns, and a header row as shown in the examples below.

```bash
--posfile '[path to samplesheet file]'
@@ -112,7 +114,7 @@ chr21:16609476_A_G 16609476 A G
chr21:16609525_T_A 16609525 T A
```
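
The rows shown in this hunk appear to follow a four-field layout in which the first field encodes the other three as `<chrom>:<position>_<ref>_<alt>`. The header is not visible here, so the field names used below are an assumption. Under that assumption, a row can be sanity-checked as follows; the function name `check_legend_row` is hypothetical.

```python
def check_legend_row(line: str) -> bool:
    """True if the ID field equals '<chrom>:<position>_<ref>_<alt>'."""
    fields = line.split()
    if len(fields) != 4:
        return False
    # Assumed field order: ID, position, reference allele, alternate allele.
    variant_id, position, ref, alt = fields
    chrom, _, rest = variant_id.partition(":")
    return bool(chrom) and rest == f"{position}_{ref}_{alt}"
```

A row whose ID and position disagree, or that has a missing field, returns `False`.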

## Genome reference
## Reference genome

Remember to use the same reference genome for all the files. You can specify the [reference genome](https://nf-co.re/docs/usage/reference_genomes) using:

@@ -128,12 +130,28 @@ or you can specify a custom genome using:

## Running the pipeline

The typical command for running the pre-processing of the panel and imputation of samples is as follows:
A quick example that runs only the imputation step:

```bash
nextflow run nf-core/phaseimpute \
--input samplesheet.csv \
--steps impute \
--chunks chunks.csv \
--posfile posfile_legend.csv \
--outdir results \
--genome GRCh38 \
--panel panel.csv \
--tools glimpse1 \
-profile docker
```

The typical command for running the pre-processing of the panel and imputation of samples is shown below:

```bash
nextflow run nf-core/phaseimpute \
--input samplesheet.csv \
--steps panelprep,impute
--steps panelprep,impute \
--outdir results \
--genome GRCh37 \
-profile docker
@@ -150,21 +168,15 @@ work # Directory containing the nextflow working files
# Other nextflow hidden files, eg. history of pipeline runs and old logs.
```

If you wish to repeatedly use the same parameters for multiple runs, rather than specifying each flag in the command, you can specify these in a params file.

Pipeline settings can be provided in a `yaml` or `json` file via `-params-file <file>`.
To facilitate multiple runs of the pipeline with consistent settings without specifying each parameter in the command line, you can use a parameter file. This allows for setting parameters once and reusing them across different executions.

:::warning
Do not use `-c <file>` to specify parameters as this will result in errors. Custom config files specified with `-c` must only be used for [tuning process resource specifications](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources), other infrastructural tweaks (such as output directories), or module arguments (args).
:::

The above pipeline run specified with a params file in yaml format:
You can provide pipeline settings in a `yaml` or `json` file, which can be specified using the `-params-file` option:

```bash
nextflow run nf-core/phaseimpute -profile docker -params-file params.yaml
```

with:
Example of a `params.yaml` file:

```yaml title="params.yaml"
input: './samplesheet.csv'
@@ -173,7 +185,11 @@ genome: 'GRCh37'
<...>
```
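
Since `-params-file` also accepts JSON, the same settings can be written from a script using only the standard library. This is a minimal sketch: the keys mirror the flags of the command-line example (flag name without the leading `--`), and the helper name `write_params` and the output file name are hypothetical.

```python
import json

# Same settings as the command-line example: each key is the flag name without "--".
params = {
    "input": "./samplesheet.csv",
    "steps": "panelprep,impute",
    "outdir": "./results",
    "genome": "GRCh37",
}


def write_params(path: str, settings: dict) -> None:
    """Write pipeline settings as JSON for use with -params-file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(settings, handle, indent=2)
        handle.write("\n")
```

After calling `write_params("params.json", params)`, the file can be passed to the pipeline with `-params-file params.json`.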

You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).
:::warning
Do not use `-c <file>` to specify parameters as this will result in errors. Custom config files specified with `-c` must only be used for [tuning process resource specifications](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources), other infrastructural tweaks (such as output directories), or module arguments (args).
:::

You can also generate `YAML` or `JSON` files using the [nf-core/launch](https://nf-co.re/launch) tool, which guides you through creating files that can be used directly with `-params-file`.

### Check of the contigs name
