Build search config via QueryHub
Patpat adopts a two-dimensional (protein-level and peptide-level) strategy for searching public databases. Before searching, Patpat needs to determine the search config, i.e., identify the protein and peptide sequences that should be searched.
Please see the project README.
from patpat import hub

q = hub.QueryHub()
q.identifier = '<UniProt identifier>'
q.simple_query()
conf_ = q.get_query_config()
Let's get started!
First, QueryHub is instantiated, and QueryHub's protein_querier is set to UniProtProteinQuerier().
This class fetches relevant metadata from the UniProt database via protein identifiers.
By the way, the classes associated with queriers are stored in the querier module.
Each querier receives its parameters via the set_params() interface.
Here, we pass in the identifier and call protein_query() to perform the query.
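from patpat import hub, querier

q = hub.QueryHub()
# Protein-level query: fetch metadata for the target protein from UniProt.
q.protein_querier = querier.UniProtProteinQuerier()
q.protein_querier.set_params(accession='<UniProt identifier>')
q.protein_query()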
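The arrangement described above (a hub that holds a replaceable querier, which receives its parameters through set_params()) can be pictured with a small stand-alone sketch. This is not Patpat's implementation: the classes Querier, StubProteinQuerier and Hub, the canned organism and sequence, and the accession 'EXAMPLE' are all hypothetical, and the stub performs no network access.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class Querier(ABC):
    """Hypothetical base class: parameters go in via set_params()."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}

    def set_params(self, **params: Any) -> None:
        # Store keyword parameters for the next query.
        self.params.update(params)

    @abstractmethod
    def query(self) -> Dict[str, Any]:
        """Run the query and return the collected metadata."""


class StubProteinQuerier(Querier):
    """Stand-in for a UniProt-backed querier; returns canned, made-up data."""

    def query(self) -> Dict[str, Any]:
        return {
            "accession": self.params["accession"],
            "organism": "example organism",
            "sequence": "MAGKTLLEEKPSR",
        }


class Hub:
    """Hypothetical hub that delegates the protein-level query to its querier."""

    def __init__(self) -> None:
        self.protein_querier: Optional[Querier] = None
        self.organism: Optional[str] = None
        self.fasta: Dict[str, str] = {}

    def protein_query(self) -> None:
        if self.protein_querier is None:
            raise ValueError("protein_querier is not set")
        result = self.protein_querier.query()
        # Keep the results on the hub so later steps can reuse them.
        self.organism = result["organism"]
        self.fasta = {"sequence": result["sequence"]}


demo_hub = Hub()
demo_hub.protein_querier = StubProteinQuerier()
demo_hub.protein_querier.set_params(accession="EXAMPLE")
demo_hub.protein_query()
print(demo_hub.organism, demo_hub.fasta["sequence"])
```

Because the hub only relies on set_params() and a query method, a different querier could be swapped in without changing the calling code, which is presumably why the querier is exposed as an attribute.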
At this stage, we want to identify the peptides suitable for the search.
Again, we set peptide_querier to LocalPeptideQuerier(), a class that queries a local FASTA file for peptides that meet the requirements.
A peptide that meets them is generally referred to as a unique peptide (or proteotypic peptide) of the target protein.
q.peptide_querier = querier.LocalPeptideQuerier()
To identify the peptides that specifically reflect the target protein, we use in-silico enzymatic digestion, in which the proteome sequences and the target protein are processed with the same parameters.
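The idea can be sketched in a few lines of plain Python: digest every protein with the same rule, then keep the target's peptides that no other protein produces. This is an illustration only, not Patpat's code. The trypsin rule is a simplified assumption (cleave after K or R unless followed by P), and the identifiers and sequences are made up.

```python
import re
from typing import Dict, Set

# Simplified trypsin rule (an assumption of this sketch): cleave after K or R,
# but not when the next residue is P.
TRYPSIN_SITES = re.compile(r"(?<=[KR])(?!P)")


def digest(sequence: str) -> Set[str]:
    """Return the fully cleaved peptides of a sequence (no missed cleavages)."""
    # Splitting on a zero-width pattern requires Python 3.7+.
    return {piece for piece in TRYPSIN_SITES.split(sequence) if piece}


def unique_peptides(target_id: str, proteome: Dict[str, str]) -> Set[str]:
    """Peptides of the target that no other protein in the proteome produces."""
    background: Set[str] = set()
    for protein_id, sequence in proteome.items():
        if protein_id != target_id:
            background |= digest(sequence)
    return digest(proteome[target_id]) - background


# Toy proteome with made-up identifiers and sequences.
toy_proteome = {
    "TARGET": "MAGKTLLEEKPSR",
    "OTHER": "MAGKAAVR",
}
print(unique_peptides("TARGET", toy_proteome))  # {'TLLEEKPSR'}
```

Here MAGK is produced by both proteins, so only TLLEEKPSR identifies the target unambiguously.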
Several LocalPeptideQuerier() parameters that can be set are described below:
- sequence: Sequence information for the target protein. If a protein-level query has been performed, sequence information can be obtained from QueryHub.fasta['sequence'].
- organism: Organism information for the target protein. If a protein-level query has been performed, organism information can be obtained from QueryHub.organism.
- digestion_params: Optional, digestion parameters. Patpat's in-silico enzymatic digestion is supported by the pyteomics package.
- source: Optional, a local FASTA file path. If it is not given, a file can be selected automatically based on organism information.
In this example, the digestion parameters and the local FASTA path are left at their default selection; users can set them explicitly if required.
Please refer to the relevant documentation for the enzymatic rules.
After passing in the parameters, call peptide_query() to start the query.
digestion_params_ = {
    'rules': 'trypsin',
    'miss': 1,
    'min_length': 7,
    'max_length': 35,
}
source_ = None

q.peptide_querier = querier.LocalPeptideQuerier()
q.peptide_querier.set_params(sequence=q.fasta['sequence'],
                             organism=q.organism,
                             digestion_params=digestion_params_,
                             source=source_)
q.peptide_query()

conf_ = q.get_query_config()
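To get a feel for what the miss, min_length and max_length settings do, the following stand-alone sketch digests a made-up sequence twice. It is not how Patpat performs digestion (Patpat relies on pyteomics, whose cleavage rules are more complete). The digest() function and its simplified trypsin rule are assumptions of this example, with argument names chosen to mirror the keys of digestion_params_.

```python
import re
from typing import Optional, Set


def digest(sequence: str, miss: int = 0, min_length: int = 1,
           max_length: Optional[int] = None) -> Set[str]:
    """Tryptic digest with missed cleavages and a peptide length filter."""
    # Simplified trypsin rule: cleave after K or R unless followed by P.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    peptides: Set[str] = set()
    for start in range(len(pieces)):
        # Join up to `miss` extra neighbouring pieces to model missed cleavages.
        for end in range(start, min(start + miss, len(pieces) - 1) + 1):
            peptide = "".join(pieces[start:end + 1])
            too_long = max_length is not None and len(peptide) > max_length
            if len(peptide) >= min_length and not too_long:
                peptides.add(peptide)
    return peptides


sequence = "MAGKTLLEEKPSRAAVR"  # made-up sequence
print(sorted(digest(sequence)))
# ['AAVR', 'MAGK', 'TLLEEKPSR']
print(sorted(digest(sequence, miss=1, min_length=7, max_length=35)))
# ['MAGKTLLEEKPSR', 'TLLEEKPSR', 'TLLEEKPSRAAVR']
```

Allowing one missed cleavage adds the longer joined peptides, while the length window drops the short fragments MAGK and AAVR.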
Got the search config! Now it's time for the search part: Search for datasets via MapperHub.