This repository has been archived by the owner on Nov 5, 2021. It is now read-only.

Python scripts for data file parsing and bulk shell script analyses for downstream data processing tools

matt-sd-watson/python_parsers

python_parsers

Python scripts that parse various data files to extract relevant biological information.

parse_PDB accepts a file in Protein Data Bank (PDB) format (such as 5kk.pdb) and reports how the atomic coordinates are distributed between hydrophilic and hydrophobic amino acids.
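
As a rough illustration of this kind of parsing (not the script's actual code), the sketch below reads ATOM records using the fixed column positions of the PDB format and groups the coordinates by residue class. The function names, the membership of the `HYDROPHOBIC` set, and the use of a centroid as the summary statistic are all assumptions made for the example.

```python
from collections import defaultdict

# Assumption: one common hydrophobic/hydrophilic split; other schemes differ.
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO", "GLY"}


def classify_atoms(pdb_lines):
    """Group ATOM coordinates by the class of the residue they belong to."""
    coords = defaultdict(list)
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, REMARK, TER and other record types
        residue = line[17:20].strip()  # residue name, columns 18-20
        xyz = (
            float(line[30:38]),  # x, columns 31-38
            float(line[38:46]),  # y, columns 39-46
            float(line[46:54]),  # z, columns 47-54
        )
        group = "hydrophobic" if residue in HYDROPHOBIC else "hydrophilic"
        coords[group].append(xyz)
    return coords


def centroid(points):
    """Mean x, y, z of a non-empty list of coordinate tuples."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))
```

A typical call would be `classify_atoms(open("5kk.pdb"))`, followed by `centroid(...)` on each group to compare where the two residue classes sit in the structure.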

parse_chromosome accepts a .txt coding sequence or FASTA file (such as drosophila_2L.txt) and retrieves all coding sequences with their corresponding gene names using matched indices. It then searches the nucleotide sequences for a specific helix-turn-helix motif and returns a list of candidate genes containing this motif, which can be used in a BioMart search for gene ontology.
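
The general approach can be sketched as follows, under some assumptions: the input is treated as FASTA, the gene name is taken to be the first whitespace-separated token of each header, and the motif is passed in as a regular expression. The function names are hypothetical and the actual helix-turn-helix pattern used by the script is not reproduced here.

```python
import re


def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            tokens = line[1:].split()
            # Assumption: the gene name is the first token of the header.
            name, chunks = (tokens[0] if tokens else ""), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def genes_with_motif(lines, pattern):
    """Return the names of all records whose sequence matches the pattern."""
    motif = re.compile(pattern)
    return [name for name, seq in read_fasta(lines) if motif.search(seq)]
```

For example, `genes_with_motif(open("drosophila_2L.txt"), pattern)` would return the candidate gene names, where `pattern` is whatever nucleotide-level regular expression encodes the motif of interest.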

adaptor_parse accepts a folder and finds any fastq files held within the subdirectories of the main input directory. It parses these files to retrieve their multiplexing adaptors and creates bulk scripts that run cutadapt for adaptor trimming prior to alignment.
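
A minimal sketch of this workflow is shown below. How the script actually identifies the adaptor is not stated, so this example assumes an Illumina-style header whose last colon-separated field holds the index sequence, and it treats that sequence as the adaptor purely for illustration. The function names and the `.trimmed` output naming are also hypothetical; `-a` and `-o` are standard cutadapt options.

```python
import os
import shlex


def find_fastq(root):
    """Yield the path of every fastq file below the root directory."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith((".fastq", ".fq")):
                yield os.path.join(dirpath, name)


def barcode_from_header(path):
    """Assumption: the first header ends with ':<index sequence>'."""
    with open(path) as handle:
        header = handle.readline().strip()
    return header.rsplit(":", 1)[-1]


def cutadapt_commands(root):
    """Build one cutadapt command line per fastq file found under root."""
    commands = []
    for path in find_fastq(root):
        adaptor = barcode_from_header(path)
        base, ext = os.path.splitext(path)
        trimmed = base + ".trimmed" + ext  # hypothetical output name
        commands.append(
            "cutadapt -a %s -o %s %s"
            % (adaptor, shlex.quote(trimmed), shlex.quote(path))
        )
    return commands
```

Writing `"\n".join(cutadapt_commands(root))` to a `.sh` file gives a bulk script that can be run before alignment.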

kallisto_quant accepts two .txt files as input: one containing a list of partial filenames for RNA-seq libraries, and another containing the list of absolute/full paths for the same libraries. It generates a shell script that will quantify all of the libraries using kallisto quant in the single read format.
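
The script generation could look roughly like the sketch below. It assumes the two lists pair up by position, and the index filename, output directory naming, and fragment length values are placeholders rather than the script's real defaults. The `--single`, `-l` and `-s` flags are the ones kallisto requires for single-end quantification.

```python
import shlex


def read_list(path):
    """Read the non-empty lines of a text file into a list."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def kallisto_script(names, paths, index="transcripts.idx",
                    frag_len=200, frag_sd=20):
    """Return the text of a shell script with one kallisto call per library.

    Assumptions: names[i] belongs to paths[i]; index, frag_len and frag_sd
    are placeholder values to be replaced with real ones.
    """
    if len(names) != len(paths):
        raise ValueError("name and path lists must have the same length")
    lines = ["#!/bin/bash"]
    for name, path in zip(names, paths):
        lines.append(
            "kallisto quant -i %s -o %s --single -l %s -s %s %s"
            % (shlex.quote(index), shlex.quote(name + "_quant"),
               frag_len, frag_sd, shlex.quote(path))
        )
    return "\n".join(lines) + "\n"
```

In use, the two inputs would come from `read_list("names.txt")` and `read_list("paths.txt")` (hypothetical filenames), and the returned text would be written to a `.sh` file.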

dna_conversion accepts a nucleotide coding file and returns all open reading frames along with the amino acid equivalent of each codon.
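
One way to sketch this, assuming the standard genetic code and a scan of the three forward reading frames only (the script may also handle the reverse strand or define open reading frames differently), is shown below. The function name and the returned tuple layout are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}


def find_orfs(dna):
    """Return (start, end, protein) for each ATG...stop run in frames 0-2.

    start and end are 0-based positions in dna; end lies just past the stop
    codon. Codons containing unknown bases are translated as 'X'.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        protein, start = None, None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            amino = CODON_TABLE.get(codon, "X")
            if protein is None:
                if codon == "ATG":  # open a new reading frame
                    protein, start = ["M"], i
            elif amino == "*":  # stop codon closes the frame
                orfs.append((start, i + 3, "".join(protein)))
                protein = None
            else:
                protein.append(amino)
    return orfs
```

Reading frames that start with ATG but never reach a stop codon are dropped in this sketch. That is a design choice rather than a requirement.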
