Commit

docs updates for mutagenize
nickzoic committed Nov 2, 2023
1 parent f3a60d5 commit f172db2
Showing 2 changed files with 44 additions and 2 deletions.
1 change: 1 addition & 0 deletions countess/plugins/fastq.py
@@ -22,6 +22,7 @@ class LoadFastqPlugin(PandasInputFilesPlugin):

name = "FASTQ Load"
description = "Loads counts from FASTQ files containing either variant or barcodes"
link = "https://countess-project.github.io/CountESS/included-plugins/#fastq-load"
version = VERSION

file_types = [("FASTQ", ".fastq"), ("FASTQ (gzipped)", ".gz")]
45 changes: 43 additions & 2 deletions docs/included-plugins/index.md
@@ -7,7 +7,7 @@ Source code for the included plugins can be found in the repository at

CountESS is not limited to these included plugins: anyone can write and publish CountESS plugins. See [Other Plugins](../other-plugins/) for some examples or [Writing CountESS Plugins](../writing-plugins/) to write your own.

## Built-in Plugins: File Formats
## File Formats

### CSV Reader

@@ -71,7 +71,7 @@ skip
output
: for each regex group, the column name and data type it should be stored as.

## Built-in Plugins: Data Manipulation
## Data Manipulation

### Regex Tool

@@ -108,6 +108,47 @@ when pivoted with index on Variant, pivot on Replicate and expanding Count becom
| `2` | `6` | `0` | `8` |
| `3` | `0` | `0` | `19` |

## Bioinformatics

### FASTQ Load

Reads one or more [FASTQ](https://maq.sourceforge.net/fastq.shtml) files, optionally gzipped.
Returns a datatable with columns:

sequence
: the sequence, as a DNA or RNA string of A, C, G, T.

header
: the header string from the FASTQ read (the line starting with `@`)

A minimum average quality filter can be applied.

If "Group by sequence?" is selected, an additional column "count" is added and the sequences are grouped. The header is reduced to the common prefix of all headers in the FASTQ file.
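
The behaviour described above can be sketched in plain Python. This is a minimal illustration, not the plugin's implementation: the function names `parse_fastq`, `mean_quality` and `load_fastq` are hypothetical, the quality scores are assumed to be Phred+33 encoded, records are assumed to be exactly four lines each, and the common prefix is assumed to be taken over the reads that pass the quality filter.

```python
import os
from collections import Counter


def parse_fastq(text):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    lines = [line for line in text.splitlines() if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, sequence, _separator, quality = lines[i:i + 4]
        yield header, sequence, quality


def mean_quality(quality, offset=33):
    """Average quality score, assuming Phred+33 encoding (an assumption)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def load_fastq(text, min_avg_quality=0, group_by_sequence=False):
    """Return rows with 'sequence' and 'header', plus 'count' when grouped."""
    kept = [
        (header, sequence)
        for header, sequence, quality in parse_fastq(text)
        if quality and mean_quality(quality) >= min_avg_quality
    ]
    if not group_by_sequence:
        return [{"sequence": s, "header": h} for h, s in kept]

    # One shared header: the common prefix of every header that was kept.
    prefix = os.path.commonprefix([h for h, _ in kept])
    counts = Counter(s for _, s in kept)
    return [
        {"sequence": s, "header": prefix, "count": n}
        for s, n in counts.items()
    ]


EXAMPLE = """@run1:1
ACGT
+
IIII
@run1:2
ACGT
+
IIII
@run1:3
TTTT
+
!!!!
"""

grouped = load_fastq(EXAMPLE, min_avg_quality=20, group_by_sequence=True)
```

With this made-up input, the third read is dropped by the quality filter and the two remaining reads collapse into a single row with a count of 2.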

### Mutagenize

Takes a sequence from the plugin configuration and returns all possible mutations of that sequence for each of the selected mutation types. Only one kind of mutation is applied at a time.

#### Parameters

All Single Mutations?
: Include all single-nucleotide variants (SNVs) of the sequence

All Single Deletes?
: Include all single base deletions of the sequence

All Triple Deletes?
: Include all triple base deletions (e.g. aligned deletions) of the sequence

All Single Inserts?
: Include all single base insertions of the sequence

All Triple Inserts?
: Include all triple base insertions (e.g. aligned insertions) of the sequence

Remove Duplicates?
: Different deletions and insertions can produce the same sequence. For example, inserting `A` into `CAAT` can produce `CAAAT` in three different ways, and single deletions can produce `CAT` in two different ways. If "Remove Duplicates?" is selected, duplicates are removed and an additional column "count" is added to indicate how many ways each sequence can be produced.
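
The duplicate counts in that example can be checked with a short Python sketch. It is illustrative only and does not reproduce the plugin's code: the helper names `single_mutations`, `deletions` and `insertions` are hypothetical, the alphabet is assumed to be `ACGT`, and deletions and insertions are assumed to be allowed at every offset (the plugin may restrict triple operations to aligned positions).

```python
from collections import Counter
from itertools import product

BASES = "ACGT"  # assumed alphabet


def single_mutations(sequence):
    """Yield every sequence that differs from the input by one substituted base."""
    for i, old in enumerate(sequence):
        for new in BASES:
            if new != old:
                yield sequence[:i] + new + sequence[i + 1:]


def deletions(sequence, size=1):
    """Yield every sequence with `size` consecutive bases removed."""
    for i in range(len(sequence) - size + 1):
        yield sequence[:i] + sequence[i + size:]


def insertions(sequence, size=1):
    """Yield every sequence with `size` bases inserted at any position."""
    for i in range(len(sequence) + 1):
        for inserted in product(BASES, repeat=size):
            yield sequence[:i] + "".join(inserted) + sequence[i:]


# Counting occurrences mirrors the "count" column added when duplicates are removed.
insert_counts = Counter(insertions("CAAT", 1))
delete_counts = Counter(deletions("CAAT", 1))

print(insert_counts["CAAAT"])  # 3
print(delete_counts["CAT"])    # 2
```

Using a `Counter` keeps one entry per distinct sequence while recording how many operations produced it, which is the same information the "count" column carries.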

### Variant Caller

Turns a DNA sequence into an HGVS variant code by comparing it to a reference sequence.