tags |
---|
ggg, ggg2021, ggg201b |
Contents:
[toc]
I will use the data in: https://github.com/ctb/2022-ggg-201b-assembly-collapse
Final notebook from class: here
I cheated and made a Snakefile that generates the reads for each coverage level directly :)

    rule all:
        input:
            "SRR2584857_quast",
            "SRR2584857_annot",
            expand("SRR2584857.{cover}C_quast", cover=range(1, 41)),
            expand("SRR2584857.{cover}C_annot", cover=range(1, 41))

    # take the first N reads from each file to get roughly {cover}x coverage
    rule subset_wc:
        input:
            r1 = "{sample}_1.fastq.gz",
            r2 = "{sample}_2.fastq.gz",
        output:
            r1 = "{sample}.{cover}C_1.fastq.gz",
            r2 = "{sample}.{cover}C_2.fastq.gz",
        params:
            # number of FASTQ lines to keep per file (4 lines per read)
            n_lines = lambda w: int(float(w.cover) * 4.5e6 / 100 / 2) * 4
        # head exits early and breaks the pipe; '|| true' keeps the rule from failing
        shell: """
            gunzip -c {input.r1} | head -n {params.n_lines} | gzip > {output.r1} || true
            gunzip -c {input.r2} | head -n {params.n_lines} | gzip > {output.r2} || true
        """

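
For the curious, here is a quick sketch of the arithmetic behind `n_lines`, pulled out of the Snakefile so you can run it on its own. The constant names are mine, and I am assuming that `4.5e6` is the approximate genome size in bp, `100` is the read length, and the `/ 2` splits the reads across the two paired files. The numbers themselves come straight from the `params:` line.

```python
# Sketch of the n_lines calculation from the subset_wc rule.
# Names below are illustrative; the interpretation of each constant is an assumption.
GENOME_SIZE = 4.5e6     # assumed: approximate genome size in bp
READ_LENGTH = 100       # assumed: read length in bp
FILES_PER_PAIR = 2      # assumed: reads are split across the _1 and _2 files
LINES_PER_RECORD = 4    # a FASTQ record is 4 lines


def n_lines(cover):
    """Number of FASTQ lines to take from each file for a given coverage."""
    # Snakemake wildcards arrive as strings, hence the float() conversion.
    reads_per_file = int(float(cover) * GENOME_SIZE / READ_LENGTH / FILES_PER_PAIR)
    return reads_per_file * LINES_PER_RECORD


if __name__ == "__main__":
    for cover in (1, 10, 40):
        print(f"{cover}x coverage: head -n {n_lines(cover)}")
```

Under those assumptions, 1x coverage works out to 22,500 reads (90,000 lines) from each file, and the count scales linearly up to 40x.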
Amanda gave a great introduction to the statistics last week, and I want to pick up where she left off, using the binder.
Start by running the RNASeq binder:
Reminder, to run the analysis:
- start the binder
- at the command line, run `snakemake -j 4 --use-conda`
- open `rnaseq-workflow.Rmd`
- knit to HTML
(may cover in lab 10)