QC pipeline with script only #103

Open
khoahoc0508 opened this issue Nov 7, 2022 · 4 comments

Comments

@khoahoc0508

Hello,
I want to update to the new version, but I cannot see how to run QC when running only the scripts, without tracking the database. I would like to use it as in the older version (FastQC and samtools QC). Could you give me a guide so that I can QC my data before analysis?

Sincerely,
Trung

@martinghunt
Member

The QC pipeline just runs FastQC and samtools stats/plot-bamstats. You could run these commands, but they are nothing special, just thin wrappers around those programs:

clockwork samtools_qc reference.fasta reads.1.fastq reads.2.fastq output_dir

clockwork fastqc outdir reads.1.fastq reads.2.fastq
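If you would rather skip the wrappers entirely, the same kind of QC can be approximated by calling the tools directly. The Python sketch below assumes only that the standard `fastqc`, `minimap2`, `samtools` and `plot-bamstats` command-line tools are on your PATH. Using minimap2 as the read mapper, the output file names, and the helper names `qc_steps`/`run_qc` are all hypothetical choices and may not match what clockwork does internally.

```python
import os
import subprocess
from typing import List, Optional, Tuple

# One QC step: (command argv, file to capture stdout into, or None).
Step = Tuple[List[str], Optional[str]]


def qc_steps(ref: str, reads1: str, reads2: str, outdir: str) -> List[Step]:
    """Build FastQC + samtools QC commands for one paired-end sample.

    Hypothetical sketch: minimap2 is an assumed mapper; clockwork's own
    samtools_qc step may map the reads differently.
    """
    sam = os.path.join(outdir, "mapped.sam")
    bam = os.path.join(outdir, "mapped.sorted.bam")
    stats = os.path.join(outdir, "samtools.stats")
    return [
        # FastQC report for each read file, written into outdir.
        (["fastqc", "-o", outdir, reads1, reads2], None),
        # Map paired-end short reads to the reference, SAM output.
        (["minimap2", "-a", "-x", "sr", "-o", sam, ref, reads1, reads2], None),
        # Coordinate-sort the alignments into a BAM file.
        (["samtools", "sort", "-o", bam, sam], None),
        # Summary statistics; samtools stats writes to stdout.
        (["samtools", "stats", bam], stats),
        # Plots generated from the stats file, prefixed "plot".
        (["plot-bamstats", "-p", os.path.join(outdir, "plot"), stats], None),
    ]


def run_qc(ref: str, reads1: str, reads2: str, outdir: str) -> None:
    """Run every step in order; raises CalledProcessError on the first failure."""
    os.makedirs(outdir, exist_ok=True)  # FastQC needs the output directory to exist
    for argv, stdout_path in qc_steps(ref, reads1, reads2, outdir):
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as fh:
                subprocess.run(argv, check=True, stdout=fh)
```

For example, `run_qc("reference.fasta", "reads.1.fastq", "reads.2.fastq", "qc_out")`. Building the command lists separately from running them makes it easy to print or inspect the steps before executing anything.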

@khoahoc0508
Author

khoahoc0508 commented Nov 13, 2022

Thank you, @martinghunt; it works flawlessly. Now I can switch entirely to the new version.
Anyway, could you advise on minimum quality requirements for input paired-end files? I am still confused about this.

Sincerely,
Trung

@martinghunt
Member

How you decide a sample is bad and remove it is up to you :) There's no set method of doing so and it depends on what analysis you're doing.

You could remove samples up front, eg if (making up example numbers) <90% of the genome has coverage >20X. Or if a low % of reads map or the reads are low quality (eg error rate from samtools).
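The up-front checks can be scripted against the output of `samtools stats` (its `SN` summary lines) and `samtools depth -a` (one line per reference position). The sketch below is illustrative only. The 90%-of-positions-above-20X cut-off is the made-up example from above. The 90% mapped-reads and 1% error-rate cut-offs, and the function names, are hypothetical.

```python
from typing import Dict, Iterable


def parse_sn_lines(stats_lines: Iterable[str]) -> Dict[str, float]:
    """Collect the 'SN' summary numbers from `samtools stats` output."""
    summary: Dict[str, float] = {}
    for line in stats_lines:
        if not line.startswith("SN\t"):
            continue
        # e.g. "SN\terror rate:\t1.6e-03\t# mismatches / bases mapped (cigar)"
        fields = line.rstrip("\n").split("\t")
        summary[fields[1].rstrip(":")] = float(fields[2])
    return summary


def fraction_above_depth(depth_lines: Iterable[str], min_depth: int = 20) -> float:
    """Fraction of positions in `samtools depth -a` output with depth > min_depth."""
    total = above = 0
    for line in depth_lines:
        if not line.strip():
            continue
        depth = int(line.split("\t")[2])  # columns: contig, position, depth
        total += 1
        if depth > min_depth:
            above += 1
    return above / total if total else 0.0


def sample_passes(
    summary: Dict[str, float],
    covered_fraction: float,
    min_covered: float = 0.9,     # made-up example number
    min_mapped: float = 0.9,      # hypothetical
    max_error_rate: float = 0.01, # hypothetical
) -> bool:
    """Apply example thresholds for coverage, mapping rate and error rate."""
    total = summary.get("raw total sequences", 0.0)
    mapped_fraction = summary.get("reads mapped", 0.0) / total if total else 0.0
    return (
        covered_fraction >= min_covered
        and mapped_fraction >= min_mapped
        and summary.get("error rate", 1.0) <= max_error_rate
    )
```

Keeping parsing and thresholds in separate functions makes it easy to tune the cut-offs for your own analysis without touching the parsing code.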

You could remove samples after variant calling, eg for TB if a sample has >10k variants, or if it has a lot of "heterozygous" calls (both those things suggest contamination).
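The post-variant-calling checks could be sketched the same way, reading a VCF as plain text. Assumptions: the sample's genotype is in a `GT` FORMAT field; only records with FILTER `PASS` or `.` are counted; a record counts as a variant when its genotype carries a non-reference allele; and a call is treated as "heterozygous" when its genotype has two different alleles. The 10,000-variant cut-off is the TB example from above. The 5% heterozygous fraction and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def count_calls(vcf_lines: Iterable[str], sample_index: int = 0) -> Tuple[int, int]:
    """Return (variant records, heterozygous calls) for one sample in a VCF."""
    variants = hets = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if fields[6] not in ("PASS", "."):
            continue  # FILTER column: keep only passing/unfiltered records
        fmt_keys = fields[8].split(":")
        sample_vals = fields[9 + sample_index].split(":")
        gt = dict(zip(fmt_keys, sample_vals)).get("GT", ".")
        # Split phased or unphased genotypes, dropping missing alleles.
        alleles = [a for a in gt.replace("|", "/").split("/") if a != "."]
        if not alleles or all(a == "0" for a in alleles):
            continue  # no call, or homozygous reference
        variants += 1
        if len(set(alleles)) > 1:
            hets += 1
    return variants, hets


def looks_contaminated(
    variants: int,
    hets: int,
    max_variants: int = 10_000,      # TB example from the thread
    max_het_fraction: float = 0.05,  # hypothetical
) -> bool:
    """Flag samples with too many variants or too many heterozygous calls."""
    if variants > max_variants:
        return True
    return variants > 0 and hets / variants > max_het_fraction
```

Because TB is haploid, a genuinely clean sample should have almost no heterozygous calls, which is why a high fraction of them is a useful contamination signal.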

@khoahoc0508
Author

Thanks very much, @martinghunt. These recommendations are helpful. I previously used clockwork when it was part of the SP3 platform developed by the University of Oxford, but that platform is now being shut down, so I am following its workflows step by step, and there are some parts I cannot handle.

Sincerely,
Trung
