Rabia Fidan edited this page Feb 4, 2022 · 3 revisions
  • Create an example genomes.fasta from public sequences

  • Deal with alts: produce all combinations of alts, each at a relatively lower proportion (1/2^n)

  • Tune bowtie2 so that amplicon dropout occurs in line with experiments

  • Tune bowtie2 to allow ACGT -> N substitutions in primer sites; at the moment only single N insertions are allowed

  • What to do with the genome ends (amplicons 1 and 98)? At the moment they are dropped fairly consistently, because the leftmost and rightmost primer sites don't exist on most of the genomes we look at; the Wuhan reference is the exception.

  • Simulate other PCR products. How? Chimeras -> Simera; point mutations -> with a script? Ignore chimeras for now.

  • Test the 'exact' amplicon distribution method (Nicola's method) to make sure it works as intended; it is already implemented but needs testing. On the same point, Nicola asked for multinomial sampling from each set of reads and also for the ability to read from a SAM or BAM file. This is currently not supported.

  • The cruddiness parameter should be settable per genome, and a natural place for that information is the TSV file containing the genome abundances. If there's a command-line value, use that for all the genomes; if there isn't, look in the genome abundances file; otherwise use a default.

  • In the loop over reads, keep track of each pass through the outer loop (over genomes).
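
The lookup order described for the cruddiness parameter (command-line value first, then the genome abundances file, then a default) could be sketched roughly as below. This is only an illustration under assumptions: the column names `genome`, `abundance` and `cruddiness`, the helper names, and the default of `0.0` are all hypothetical and not taken from the existing code.

```python
import csv
import io

# Hypothetical default; the real default value is not specified on this page.
DEFAULT_CRUDDINESS = 0.0


def read_abundances(handle):
    """Read a tab-separated abundances file into {genome name: row dict}.

    Assumes a header row with a 'genome' column (hypothetical name).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["genome"]: row for row in reader}


def resolve_cruddiness(genome, cli_value, abundance_rows,
                       default=DEFAULT_CRUDDINESS):
    """Pick the cruddiness for one genome using the order in the TODO item."""
    # 1. A command-line value applies to every genome.
    if cli_value is not None:
        return cli_value
    # 2. Otherwise look for a per-genome value in the abundances file.
    raw = abundance_rows.get(genome, {}).get("cruddiness")
    if raw not in (None, ""):
        return float(raw)
    # 3. Otherwise fall back to the default.
    return default


# Tiny in-memory example: genome B has no cruddiness value of its own.
EXAMPLE_TSV = "genome\tabundance\tcruddiness\nA\t0.7\t0.2\nB\t0.3\t\n"
rows = read_abundances(io.StringIO(EXAMPLE_TSV))
```

With this example file and no command-line value, genome `A` would use its own value from the file and genome `B` would fall back to the default; passing a command-line value overrides both.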

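For the multinomial sampling request, one possible reading is: given a total number of reads and a relative weight per amplicon, draw how many reads each amplicon receives. The sketch below uses only the Python standard library; the function name, the weights dictionary and the amplicon labels are hypothetical, and this is not the 'exact' method that is already implemented.

```python
import random
from collections import Counter


def multinomial_counts(weights, n_reads, rng=None):
    """Draw per-amplicon read counts from a multinomial distribution.

    weights: {amplicon name: non-negative relative weight} (hypothetical input)
    n_reads: total number of reads to distribute
    rng:     optional random.Random instance, for reproducible draws
    """
    rng = rng or random.Random()
    names = list(weights)
    # n_reads independent categorical draws are one multinomial sample.
    draws = rng.choices(names, weights=[weights[n] for n in names], k=n_reads)
    counts = Counter(draws)
    # Report every amplicon, including those that received no reads.
    return {name: counts.get(name, 0) for name in names}


# Example: amplicon_2 has zero weight, which mimics a dropped amplicon.
example = multinomial_counts(
    {"amplicon_1": 1.0, "amplicon_2": 0.0, "amplicon_3": 3.0},
    n_reads=1000,
    rng=random.Random(42),
)
```

The counts always sum to `n_reads`, and an amplicon with zero weight never receives reads.
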