jancr/sequtils

Sequtils Tutorial

Collection of Classes and functions for working with biological sequences

Overview

There are two public classes:

  1. SequencePoint, useful for emulating mutations, SNPs, PTMs, etc. Its two most important attributes are:

    • SequencePoint.pos, the human-readable number, counting from 1
    • SequencePoint.index, the Python-readable number, counting from 0
  2. SequenceRange, useful for emulating proteins, domains, secondary structure, etc.

    • Its 3 most important attributes are:

      • SequenceRange.start is a SequencePoint pointing to the first amino acid
      • SequenceRange.stop is a SequencePoint pointing to the last amino acid
      • SequenceRange.slice: the Python slice object built from start and stop, used to index strings
    • It also has the following two properties for easy conversion to tuples:

      • SequenceRange.pos: tuple containing (self.start.pos, self.stop.pos)
      • SequenceRange.index: tuple containing (self.start.index, self.stop.index)
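
The relationship between pos, index and slice can be sketched in plain Python. This is a minimal illustration of the convention described above, not the sequtils implementation; the `Point` class and `range_slice` function are hypothetical stand-ins introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical stand-in for SequencePoint (not the sequtils class)."""
    pos: int  # 1-based, human-readable position

    @property
    def index(self) -> int:
        # 0-based index, usable directly on Python strings
        return self.pos - 1


def range_slice(start: Point, stop: Point) -> slice:
    # stop is inclusive as a position, so the exclusive slice end is stop.index + 1
    return slice(start.index, stop.index + 1)


first, last = Point(2), Point(5)
peptide_slice = range_slice(first, last)
print(first.index, last.index)     # 1 4
print("MKTIYFVAG"[peptide_slice])  # KTIY
```

Because the stop position is inclusive, the slice end equals the stop position itself, which is why a range from 98 to 127 corresponds to slice(97, 127).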

Example Usage

Example code: let's make glucagon.

>>> from sequtils import SequenceRange, SequencePoint
>>> glucagon_sequence = ("MKTIYFVAGLLIMLVQGSWQHALQDTEENPRSFPASQTEAHEDPDEMNEDKRHSQGTFTS"
...                      "DYSKYLDSRRAQDFVQWLMNTKRNRNNIAKRHDEFERHAEGTFTSDVSSYLEGQAAKEFI"
...                      "AWLVKGRGRRDFPEEVAIAEELGRRHADGSFSDEMSTILDNLATRDFINWLIQTKITDKK")
>>> glucagon = SequenceRange(1, seq=glucagon_sequence)
>>> glucagon
SequenceRange(1, 180, seq="MKTIY..ITDKK")

We now have a protein object whose stop was inferred from the sequence. Next, glp1 is a peptide within that protein:

>>> glp1 = SequenceRange(98, 127, full_sequence=glucagon_sequence)
>>> glp1
SequenceRange(98, 127, seq="HAEGTFTSDVSSYLEGQAAKEFIAWLVKGR")

A SequenceRange from 98 to 127 is created, with the peptide sequence inferred from the protein sequence

Let's see the start and stop attributes of the peptide:

>>> glp1.start
SequencePoint(98)

>>> glucagon_sequence[glp1.start.index] == glp1.seq[0]
True

>>> glp1.stop
SequencePoint(127)

>>> glucagon_sequence[glp1.stop.index] == glp1.seq[-1]
True

Let's use the slice object to cut the peptide sequence out of the protein:

>>> glp1.slice
slice(97, 127, None)

>>> glucagon_sequence[glp1.slice]
'HAEGTFTSDVSSYLEGQAAKEFIAWLVKGR'

>>> glp1.seq == glucagon.seq[glp1.slice]
True

GLP-1 is famous for having a canonical G[KR][KR] motif. This motif is the 3 C-terminal flanking amino acids, i.e. the 3 residues immediately after glp1.stop. Let's find it:

>>> motif = SequenceRange(1 + glp1.stop.pos, 3 + glp1.stop.pos)
>>> glucagon.seq[motif.slice]
'GRR'
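
As a cross-check that does not depend on sequtils, the same flanking residues can be extracted with ordinary string slicing and tested against the G[KR][KR] pattern with the standard `re` module. The variable names below are chosen for this sketch only.

```python
import re

glucagon_sequence = ("MKTIYFVAGLLIMLVQGSWQHALQDTEENPRSFPASQTEAHEDPDEMNEDKRHSQGTFTS"
                     "DYSKYLDSRRAQDFVQWLMNTKRNRNNIAKRHDEFERHAEGTFTSDVSSYLEGQAAKEFI"
                     "AWLVKGRGRRDFPEEVAIAEELGRRHADGSFSDEMSTILDNLATRDFINWLIQTKITDKK")

peptide_stop_pos = 127  # 1-based position of the last GLP-1 residue
flank_length = 3

# A 1-based stop position equals the 0-based index of the residue after it,
# so the flank starts at index peptide_stop_pos.
flank = glucagon_sequence[peptide_stop_pos:peptide_stop_pos + flank_length]
matches_motif = re.fullmatch(r"G[KR][KR]", flank) is not None

print(flank)          # GRR
print(matches_motif)  # True
```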

Math Examples

The objects also support math, so let's repeat the above using math. But first, an explanation.

All math on these objects is performed on the indexes, thus:

>>> SequencePoint(1) + SequencePoint(1)
SequencePoint(1)

>>> SequenceRange(1, 1) + SequenceRange(1, 1)
SequenceRange(1, 1, seq=None)

This is because SequencePoint(1).index is 0, and 0 + 0 = 0, which corresponds to position 1.

The above code is equivalent to the following:

>>> SequencePoint.from_index((SequencePoint(1).index + SequencePoint(1).index))
SequencePoint(1)
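
The index-based rule can be written out with plain integers. This is only a sketch of the arithmetic described above, assuming positions are 1-based and that bare numbers are treated as index offsets; `add_positions` and `shift_position` are hypothetical helper names, not part of sequtils.

```python
def add_positions(pos_a: int, pos_b: int) -> int:
    """Add two 1-based positions by summing their 0-based indexes."""
    index_sum = (pos_a - 1) + (pos_b - 1)
    return index_sum + 1  # convert the resulting index back to a position


def shift_position(pos: int, offset: int) -> int:
    """Move a 1-based position by a plain number, treated as an index offset."""
    return (pos - 1) + offset + 1


print(add_positions(1, 1))     # 1, because index 0 + index 0 = index 0
print(shift_position(127, 1))  # 128
```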

For scalars the math is intuitive: the scalar is added to both start and stop.

>>> SequenceRange(2, 5) + 2
SequenceRange(4, 7, seq=None)

>>> SequenceRange(2, 5, seq="EVIL") + 2
SequenceRange(4, 7, seq="EVIL")

It also works for non-scalars, but then seq becomes None because the length has changed:

>>> SequenceRange(2, 5, seq="EVIL") + SequenceRange(3, 6)
SequenceRange(4, 10, seq=None)

If you add numbers or tuples, the code assumes that they are indexes. Thus the following three approaches all give the GRR motif by moving glp1.stop by (1, 3).

Create a new object by moving glp1.stop:

>>> SequenceRange(glp1.stop + 1, glp1.stop + 3)
SequenceRange(128, 130, seq=None)

Create a new object via math. Here we perform SequencePoint + SequenceRange:

>>> glp1.stop + SequenceRange.from_index(1, 3)
SequenceRange(128, 130, seq=None)

>>> glp1.stop + SequenceRange(2, 4)
SequenceRange(128, 130, seq=None)

Convert the SequencePoint to a SequenceRange and then add an offset tuple. Note that SequencePoint only knows 'scalar' math, so we have to either convert it to a SequenceRange, as here, or convert the (1, 3) tuple to a SequenceRange, as we did above:

>>> SequenceRange(glp1.stop) + (1, 3)
SequenceRange(128, 130, seq=None)
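
The range arithmetic in this section can be reproduced on plain (start, stop) position pairs. The sketch below assumes the rule stated above, that numbers and tuples are index offsets; `shift_range` is a hypothetical helper written for this illustration and says nothing about how sequtils implements its operators.

```python
from typing import Tuple, Union

Offset = Union[int, Tuple[int, int]]


def shift_range(start_pos: int, stop_pos: int, offset: Offset) -> Tuple[int, int]:
    """Shift a 1-based (start, stop) pair by index offsets, returning positions."""
    if isinstance(offset, int):
        offset = (offset, offset)  # a scalar moves both ends equally
    start_index = (start_pos - 1) + offset[0]
    stop_index = (stop_pos - 1) + offset[1]
    return start_index + 1, stop_index + 1


print(shift_range(2, 5, 2))            # (4, 7)
# Positions (3, 6) have indexes (2, 5), so they act as the offset (2, 5)
print(shift_range(2, 5, (2, 5)))       # (4, 10)
# Moving a single point at position 127 by (1, 3) gives the motif range
print(shift_range(127, 127, (1, 3)))   # (128, 130)
```

When the two offsets differ, the length of the range changes, which is consistent with seq being reset to None in the examples above.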
