
Build search config via QueryHub

Weiheng Liao edited this page Oct 25, 2022 · 3 revisions


Patpat searches public databases with a two-dimensional strategy: one at the protein level and one at the peptide level. Before searching, Patpat must determine the search config, that is, identify the protein and the peptide sequences to search for.
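As a rough mental model of the two dimensions, the sketch below pairs protein-level terms with peptide-level sequences in one object. SearchConfig and all_terms() are hypothetical names used only for illustration; they are not Patpat's API, and the object actually returned by get_query_config() may be structured quite differently. The peptide strings are placeholders, not real peptides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchConfig:
    """Hypothetical two-dimensional search config (not Patpat's format)."""

    # Protein-level terms, e.g. a UniProt accession.
    protein_terms: List[str] = field(default_factory=list)
    # Peptide-level terms: unique peptides of the target protein.
    peptides: List[str] = field(default_factory=list)

    def all_terms(self) -> List[str]:
        """All search terms, protein-level first, duplicates dropped."""
        seen = set()
        ordered = []
        for term in self.protein_terms + self.peptides:
            if term not in seen:
                seen.add(term)
                ordered.append(term)
        return ordered


conf = SearchConfig(protein_terms=["<UniProt identifier>"],
                    peptides=["PEPTIDEK", "SAMPLEPEPTIDER"])
print(conf.all_terms())
# ['<UniProt identifier>', 'PEPTIDEK', 'SAMPLEPEPTIDER']
```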

1 Quick build config

See the project README for details. In short:

```python
from patpat import hub  # assumes Patpat is installed and importable as `patpat`

q = hub.QueryHub()
q.identifier = '<UniProt identifier>'
q.simple_query()

conf_ = q.get_query_config()
```

2 A little harder: Pluggable QueryHub

Let's get started!

2.1 Protein-level

First, instantiate QueryHub and set its protein_querier to UniProtProteinQuerier(). This class fetches the protein's metadata from the UniProt database using its identifier. All querier classes live in the querier module.

Each querier receives its parameters through the set_params() interface. Here, we pass in the identifier and then call protein_query() to run the query.

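```python
from patpat import hub, querier  # assumes Patpat is importable as `patpat`

q = hub.QueryHub()

# Plug in the UniProt-backed protein querier and give it the accession.
q.protein_querier = querier.UniProtProteinQuerier()
q.protein_querier.set_params(accession='<UniProt identifier>')
q.protein_query()
```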
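The pluggable design can be pictured with a minimal, self-contained sketch that assumes only what this page describes: a querier receives parameters through set_params(), and the hub runs whichever querier is plugged in. ProteinQuerier, DictProteinQuerier and MiniHub are hypothetical names, not Patpat classes; the dict-backed querier stands in for a UniProt lookup so the example runs offline with toy data.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class ProteinQuerier(ABC):
    """Hypothetical base class: parameters first, then query()."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}

    def set_params(self, **kwargs: Any) -> None:
        # Keep the keyword parameters for the later query() call.
        self.params.update(kwargs)

    @abstractmethod
    def query(self) -> Dict[str, str]:
        """Return protein metadata for the configured parameters."""


class DictProteinQuerier(ProteinQuerier):
    """Offline stand-in for a UniProt querier: looks records up in a dict."""

    def __init__(self, records: Dict[str, Dict[str, str]]) -> None:
        super().__init__()
        self.records = records

    def query(self) -> Dict[str, str]:
        return self.records[self.params["accession"]]


class MiniHub:
    """Hypothetical hub that runs whichever querier is plugged in."""

    def __init__(self) -> None:
        self.protein_querier: Optional[ProteinQuerier] = None
        self.metadata: Dict[str, str] = {}

    def protein_query(self) -> None:
        if self.protein_querier is None:
            raise RuntimeError("protein_querier is not set")
        self.metadata = self.protein_querier.query()


# Usage mirrors the QueryHub steps above, with toy data.
mini = MiniHub()
mini.protein_querier = DictProteinQuerier(
    {"P00001": {"organism": "Homo sapiens", "sequence": "MKTAYIAK"}}
)
mini.protein_querier.set_params(accession="P00001")
mini.protein_query()
print(mini.metadata["organism"])  # Homo sapiens
```

Swapping in a different querier only changes the line that assigns protein_querier; the hub code stays the same, which is the point of making the querier pluggable.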

2.2 Peptide-level

At this stage, we identify the peptides that are suitable for searching. This time, we set peptide_querier to LocalPeptideQuerier(), a class that selects qualifying peptides based on a local FASTA file. A qualifying peptide is generally a unique peptide (or proteotypic peptide) of the target protein.

```python
q.peptide_querier = querier.LocalPeptideQuerier()
```

To find the peptides that are specific to the target protein, Patpat uses in-silico digestion: the proteome sequences and the target protein are digested with the same parameters, so that the target's peptides can be checked against those of the rest of the proteome.
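The idea can be sketched in plain Python. This is a simplified stand-in for the pyteomics-based digestion Patpat actually uses: it applies only the common trypsin rule (cleave after K or R unless the next residue is P), enumerates peptides with up to `miss` missed cleavages inside a length window, and keeps the target's peptides that no other protein in the proteome produces. The function names, the dict-based proteome and the toy sequences are assumptions for illustration.

```python
from typing import Dict, Set


def digest(seq: str, miss: int = 1, min_length: int = 7, max_length: int = 35) -> Set[str]:
    """Simplified tryptic digestion: cleave after K/R unless followed by P."""
    # Cleavage positions, including both ends of the sequence.
    sites = [0]
    for i in range(len(seq) - 1):
        if seq[i] in "KR" and seq[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(seq))

    peptides = set()
    for start in range(len(sites) - 1):
        # Join 1..miss+1 consecutive fragments (0..miss missed cleavages).
        for end in range(start + 1, min(start + miss + 2, len(sites))):
            peptide = seq[sites[start]:sites[end]]
            if min_length <= len(peptide) <= max_length:
                peptides.add(peptide)
    return peptides


def unique_peptides(target_id: str, proteome: Dict[str, str], **params) -> Set[str]:
    """Peptides of the target protein that no other proteome entry produces."""
    target = digest(proteome[target_id], **params)
    others: Set[str] = set()
    for acc, seq in proteome.items():
        if acc != target_id:
            others |= digest(seq, **params)
    return target - others


# Toy proteome: P1 shares the peptide LLLLLLLR with P2.
proteome = {
    "P1": "GGGGGGGKLLLLLLLRAAAAAAAK",
    "P2": "LLLLLLLRWWWWWWWK",
}
print(sorted(unique_peptides("P1", proteome, miss=0)))
# ['AAAAAAAK', 'GGGGGGGK']
```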

The parameters of LocalPeptideQuerier() are:

  • sequence: Sequence of the target protein. After a protein-level query, it is available as QueryHub.fasta['sequence'].
  • organism: Organism of the target protein. After a protein-level query, it is available as QueryHub.organism.
  • digestion_params: Optional. Digestion parameters; Patpat's in-silico digestion is backed by the pyteomics package.
  • source: Optional. Path to a local FASTA file; if omitted, a file is selected automatically based on the organism.
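This page does not say how the FASTA file is chosen when source is omitted. The sketch below only illustrates the underlying idea: read a UniProt-style FASTA file and keep the entries whose OS= field matches the target organism. parse_fasta, organism_of and proteome_for are hypothetical helpers, not Patpat functions, and the two entries in the example are made up.

```python
import re
from typing import Dict, Iterator, NamedTuple, Optional


class FastaRecord(NamedTuple):
    header: str    # header line without the leading '>'
    sequence: str  # sequence lines joined together


def parse_fasta(text: str) -> Iterator[FastaRecord]:
    """Yield one record per '>' header; multi-line sequences are joined."""
    header: Optional[str] = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield FastaRecord(header, "".join(chunks))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield FastaRecord(header, "".join(chunks))


# UniProt headers carry the organism as "OS=<name>" before the next "XX=" field.
OS_FIELD = re.compile(r"\bOS=(.+?)(?=\s+[A-Z]{2}=|$)")


def organism_of(header: str) -> Optional[str]:
    match = OS_FIELD.search(header)
    return match.group(1) if match else None


def proteome_for(text: str, organism: str) -> Dict[str, str]:
    """Map accession -> sequence for entries of the given organism."""
    proteome = {}
    for rec in parse_fasta(text):
        if organism_of(rec.header) == organism:
            parts = rec.header.split("|")
            # "sp|ACCESSION|ENTRY_NAME ..." -> ACCESSION; else the first word.
            accession = parts[1] if len(parts) > 2 else rec.header.split()[0]
            proteome[accession] = rec.sequence
    return proteome


example = """\
>sp|P00001|EX1_HUMAN Example one OS=Homo sapiens OX=9606 GN=EX1 PE=1 SV=1
MKTAY
IAKQR
>sp|P00002|EX2_MOUSE Example two OS=Mus musculus OX=10090 GN=EX2 PE=1 SV=1
MGGGK
"""
print(proteome_for(example, "Homo sapiens"))  # {'P00001': 'MKTAYIAKQR'}
```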

In this example, both the digestion parameters and the local FASTA path follow the default selection; you can set either yourself if required. See the pyteomics documentation for the available cleavage rules. After passing in the parameters, call peptide_query() to run the query.

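```python
# Digestion parameters for the in-silico digestion (see the list above).
digestion_params_ = {
    'rules': 'trypsin',  # cleavage rule
    'miss': 1,           # missed cleavages allowed
    'min_length': 7,     # shortest peptide kept
    'max_length': 35}    # longest peptide kept
source_ = None  # None: select the local FASTA file from the organism

q.peptide_querier = querier.LocalPeptideQuerier()
q.peptide_querier.set_params(sequence=q.fasta['sequence'],  # from the protein-level query
                             organism=q.organism,
                             digestion_params=digestion_params_,
                             source=source_)
q.peptide_query()
```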
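Because digestion_params is optional, it can help to think of it as a set of overrides on top of default values. The sketch below assumes that framing: it treats the values from the example above as the defaults (this page describes the example as using the default selection, but Patpat's internal defaults are not shown here), overlays user-supplied keys, and rejects unknown keys or inconsistent lengths. resolve_digestion_params is a hypothetical helper, not part of Patpat.

```python
from typing import Any, Dict, Optional

# Values copied from the example above; treated here as illustrative defaults.
EXAMPLE_DIGESTION_PARAMS: Dict[str, Any] = {
    "rules": "trypsin",
    "miss": 1,
    "min_length": 7,
    "max_length": 35,
}


def resolve_digestion_params(user: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Overlay user-supplied digestion parameters on the example defaults."""
    merged = dict(EXAMPLE_DIGESTION_PARAMS)
    if user:
        unknown = set(user) - set(merged)
        if unknown:
            raise KeyError(f"unknown digestion parameter(s): {sorted(unknown)}")
        merged.update(user)
    if merged["miss"] < 0:
        raise ValueError("miss must be >= 0")
    if not 0 < merged["min_length"] <= merged["max_length"]:
        raise ValueError("require 0 < min_length <= max_length")
    return merged


print(resolve_digestion_params({"miss": 2}))
# {'rules': 'trypsin', 'miss': 2, 'min_length': 7, 'max_length': 35}
```

Failing early on a misspelled key such as "missed" avoids silently digesting with parameters the user did not intend.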

2.3 Get the search config

```python
conf_ = q.get_query_config()
```

You now have the search config, so it's time for the search part! Next: Search for datasets via MapperHub
