Update usage.md
edited usage doc
johnoooh authored Jul 31, 2024
1 parent ca817ab commit 2d482fe
Showing 1 changed file with 13 additions and 41 deletions.
54 changes: 13 additions & 41 deletions docs/usage.md
@@ -8,45 +8,25 @@

## Samplesheet input

You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below.
You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a tab-separated (TSV) file with 4 columns and a header row, as shown in the examples below.

```bash
--input '[path to samplesheet file]'
```

### Multiple runs of the same sample

The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes:

```csv title="samplesheet.csv"
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz
sample maf facets_hisens_cncf hla_file
tumor_normal temp_test_somatic_unfiltered.maf facets_hisens.cncf.txt winners.hla.txt
```

### Full samplesheet

The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 3 columns to match those defined in the table below.

A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 6 samples, where `TREATMENT_REP3` has been sequenced twice.

```csv title="samplesheet.csv"
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz
CONTROL_REP3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz
TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz,
TREATMENT_REP2,AEG588A5_S5_L003_R1_001.fastq.gz,
TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz,
TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz,
```

| Column | Description |
| --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| `sample` | Custom sample name. |
| `maf` | The path to a MAF file output by the TEMPO pipeline. |
| `facets_hisens_cncf` | The path to the hisens cncf file output by FACETS. |
| `hla_file` | The path to an HLA file output by Polysolver. |

An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.
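
Before launching a run, it can help to sanity-check the samplesheet layout. The following is a minimal sketch, assuming the four columns are separated by single tab characters and named exactly as in the table above; `check_samplesheet` is a hypothetical helper written for illustration and is not part of the pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, not part of the pipeline: verify that a samplesheet
# has the expected header and exactly four tab-separated fields per row.
# Assumption: fields are separated by single tab characters.
check_samplesheet() {
    awk -F'\t' '
        NR == 1 {
            if ($0 != "sample\tmaf\tfacets_hisens_cncf\thla_file") {
                print "unexpected header: " $0
                bad = 1
            }
            next
        }
        NF != 4 {
            print "line " NR ": expected 4 fields, found " NF
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Write the example row from this page to a scratch file and check it.
sheet="$(mktemp)"
printf 'sample\tmaf\tfacets_hisens_cncf\thla_file\n' > "$sheet"
printf 'tumor_normal\ttemp_test_somatic_unfiltered.maf\tfacets_hisens.cncf.txt\twinners.hla.txt\n' >> "$sheet"

if check_samplesheet "$sheet"; then
    echo "samplesheet OK"
fi
rm -f "$sheet"
```

The function returns a non-zero status and prints the offending line numbers when a row does not have four fields, so it can be used as a guard in a wrapper script before calling `nextflow run`.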

@@ -55,10 +35,13 @@ An [example samplesheet](../assets/samplesheet.csv) has been provided with the p
The typical command for running the pipeline is as follows:

```bash
nextflow run mskcc/neoantigenpipeline --input ./samplesheet.csv --outdir ./results --genome GRCh37 -profile docker
nextflow run mskcc/neoantigenpipeline \
-profile prod,<docker/singularity> \
--input samplesheet.csv \
--outdir <OUTDIR>
```

This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
This will launch the pipeline with the `prod` profile and either the `docker` or the `singularity` profile. See below for more information about profiles.

Note that the pipeline will create the following files in your working directory:

@@ -88,7 +71,6 @@ with `params.yaml` containing:
```yaml
input: './samplesheet.csv'
outdir: './results/'
genome: 'GRCh37'
<...>
```

@@ -146,17 +128,7 @@ If `-profile` is not specified, the pipeline will run locally and expect all sof
- A generic configuration profile to be used with [Docker](https://docker.com/)
- `singularity`
- A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/)
- `podman`
- A generic configuration profile to be used with [Podman](https://podman.io/)
- `shifter`
- A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/)
- `charliecloud`
- A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/)
- `apptainer`
- A generic configuration profile to be used with [Apptainer](https://apptainer.org/)
- `conda`
- A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter, Charliecloud, or Apptainer.

### `-resume`

Specify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. For input to be considered the same, not only the names must be identical but the files' contents as well. For more info about this parameter, see [this blog post](https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html).