Seqkit #59

Draft
wants to merge 11 commits into
base: dev
Changes from 7 commits
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@ Initial release of nf-core/seqinspector, created with the [nf-core](https://nf-c
- [#20](https://github.com/nf-core/seqinspector/pull/20) Use tags to generate group reports
- [#13](https://github.com/nf-core/seqinspector/pull/13) Generate reports per run, per project and per lane.
- [#49](https://github.com/nf-core/seqinspector/pull/49) Merge with template 3.0.2.
- [#59](https://github.com/nf-core/seqinspector/pull/59) Seqkit Stats TSV output.
sarahjeeeze marked this conversation as resolved.

### `Fixed`

4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -18,6 +18,10 @@

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [Seqkit](https://doi.org/10.1371/journal.pone.0163962)

> Shen W, Le S, Li Y, Hu F (2016) SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLoS ONE 11(10): e0163962.

sarahjeeeze marked this conversation as resolved.
## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
1 change: 1 addition & 0 deletions README.md
@@ -33,6 +33,7 @@

1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
3. Provide Seqkit summary stats ([`SeqkitStats`](https://bioinf.shenwei.me/seqkit/usage/#stats))

## Usage

4 changes: 4 additions & 0 deletions conf/modules.config
@@ -22,6 +22,10 @@ process {
ext.args = '--quiet'
}

withName: SEQKIT_STATS {
ext.args = ''
Member

Don't you want to publish the results in the outdir as well? Or only in MultiQC?

Author

Yeah, it is defined here. I guess you would only override it if you don't want your output published in the standard way.

publishDir = [

Member

Oooh fancy - a central publishDir directive for all modules ?!? That's smart and has totally escaped me...sorry then for the false accusations!

}

withName: 'MULTIQC_GLOBAL' {
ext.args = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' }
publishDir = [
13 changes: 13 additions & 0 deletions docs/output.md
@@ -12,6 +12,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

- [FastQC](#fastqc) - Raw read QC
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
- [SeqkitStats](#seqkitstats) - Per sample TSV file with summary statistics
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

### FastQC
@@ -59,6 +60,18 @@ nf-core/seqinspector will generate the following MultiQC reports:

Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>.

### SeqkitStats

<details markdown="1">
<summary>Output files</summary>

- `seqkit/`
MatthiasZepper marked this conversation as resolved.
- `*.tsv`: Per sample TSV file with summary statistics.

</details>

[SeqkitStats](https://bioinf.shenwei.me/seqkit/usage/#stats) gives general quality metrics about your sequenced reads, including average read length, GC content (%) and N50. For further reading and documentation, see the [Seqkit help pages](https://bioinf.shenwei.me/seqkit/).
Member

It is true that SeqkitStats computes some quality metrics, but to the best of my knowledge it is more useful for FASTA files and genomic assemblies than for sequencing reads?

For example, for an Illumina run, an average read length prior to trimming should be known, because it corresponds to the number of cycles. For Nanopore, admittedly, such a statistic is more useful, so you may want to structure the documentation accordingly?
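
To make the point about read-length statistics concrete, here is a minimal Python sketch of how an N50 can be computed from a list of read lengths. The function name `n50` and both sets of example lengths are hypothetical. The code assumes the common textbook definition of N50 (the length at which the running total of descending-sorted lengths first reaches half of all bases), which may differ in detail from what SeqKit implements.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed textbook definition (not taken from SeqKit's source): sort the
    lengths in descending order and return the length at which the running
    total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical fixed-cycle short reads: nearly every read has the same length,
# so the N50 simply restates the cycle count.
illumina_like = [301] * 98 + [35, 150]

# Hypothetical long reads with a broad length distribution.
nanopore_like = [500, 1200, 3000, 8000, 15000, 20000]

print(n50(illumina_like))
print(n50(nanopore_like))
```

With fixed-length reads the N50 collapses to the read length that is already known from the run setup, whereas for a broad length distribution it summarizes something that is not known in advance.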

Author

@sarahjeeeze sarahjeeeze Oct 30, 2024

Updated the text a little. Should I specifically mention that this is more useful for Nanopore data? For Illumina etc. you do still get N50s etc., but I agree it is less useful. Want me to add a Nanopore test? :D

file	format	type	num_seqs	sum_len	min_len	avg_len	max_len	Q1	Q2	Q3	sum_gap	N50	N50_num	Q20(%)	Q30(%)	AvgQual	GC(%)
sample1_R1.fastq.gz	FASTQ	DNA	1377513	411872902	35	299.0	301	300.0	301.0	301.0	0	301	1	99.10	97.54	29.31	38.53
sample1_R2.fastq.gz	FASTQ	DNA	1377513	411840994	35	299.0	301	300.0	301.0	301.0	0	301	1	97.11	93.54	25.78	38.54
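
The example output above is plain tab-separated text, so it can be read with standard tooling. The following Python sketch parses the two rows shown and lists files whose `Q30(%)` falls below a cutoff. The helper names `read_stats` and `low_q30` and the 95.0 threshold are assumptions made for illustration; they are not part of the pipeline or of SeqKit.

```python
import csv
import io

# The header and two rows copied from the example output above (tab-separated).
EXAMPLE_TSV = (
    "file\tformat\ttype\tnum_seqs\tsum_len\tmin_len\tavg_len\tmax_len\t"
    "Q1\tQ2\tQ3\tsum_gap\tN50\tN50_num\tQ20(%)\tQ30(%)\tAvgQual\tGC(%)\n"
    "sample1_R1.fastq.gz\tFASTQ\tDNA\t1377513\t411872902\t35\t299.0\t301\t"
    "300.0\t301.0\t301.0\t0\t301\t1\t99.10\t97.54\t29.31\t38.53\n"
    "sample1_R2.fastq.gz\tFASTQ\tDNA\t1377513\t411840994\t35\t299.0\t301\t"
    "300.0\t301.0\t301.0\t0\t301\t1\t97.11\t93.54\t25.78\t38.54\n"
)


def read_stats(handle):
    """Parse a tab-separated stats table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def low_q30(rows, threshold=95.0):
    """Return file names whose Q30(%) is below the (hypothetical) threshold."""
    return [row["file"] for row in rows if float(row["Q30(%)"]) < threshold]


rows = read_stats(io.StringIO(EXAMPLE_TSV))
flagged = low_q30(rows)
print(flagged)
```

For a real run, `io.StringIO(EXAMPLE_TSV)` would be replaced by an open file handle on one of the published `*.tsv` files.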


### Pipeline information

<details markdown="1">
5 changes: 5 additions & 0 deletions modules.json
@@ -14,6 +14,11 @@
"branch": "master",
"git_sha": "cf17ca47590cc578dfb47db1c2a44ef86f89976d",
"installed_by": ["modules"]
},
"seqkit/stats": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
}
}
},
5 changes: 5 additions & 0 deletions modules/nf-core/seqkit/stats/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

34 changes: 34 additions & 0 deletions modules/nf-core/seqkit/stats/main.nf

48 changes: 48 additions & 0 deletions modules/nf-core/seqkit/stats/meta.yml

141 changes: 141 additions & 0 deletions modules/nf-core/seqkit/stats/tests/main.nf.test
