Weiheng Liao edited this page Dec 3, 2022 · 4 revisions

Q&A

Welcome to the Q&A page. I hope it solves your problem!


How to check the connectivity of Patpat with external databases

1. Testing the connectivity of individual databases

Use the patpat.checker module. For example, to test the PRIDE database, use the patpat.checker.PrideChecker class:

from patpat import checker

c = checker.PrideChecker()
m = c.check()

c.check() returns patpat.mapper.PrideMapper() if Patpat can connect to the PRIDE database. Otherwise, it returns None.
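In a script it can help to stop early when a check fails. The helper below is a minimal sketch and not part of Patpat: `require_mapper` is a hypothetical name, and it relies only on the behaviour described above, namely that check() returns a mapper on success and None otherwise.

```python
def require_mapper(checker_obj):
    """Return the mapper from checker_obj.check(), or raise if the check failed.

    Hypothetical helper, not part of Patpat. It assumes only that check()
    returns a mapper on success and None otherwise.
    """
    mapper = checker_obj.check()
    if mapper is None:
        name = type(checker_obj).__name__
        raise ConnectionError(f"{name}: connectivity check failed")
    return mapper
```

With Patpat installed, `m = require_mapper(checker.PrideChecker())` would either give you the mapper or raise ConnectionError.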

2. Testing the connectivity of all databases supported by Patpat

The CheckerHub class in the hub module integrates all the checker.<database>Checker classes that Patpat supports.

from patpat import hub

c = hub.CheckerHub()
t = c.check()

As above, hub.CheckerHub().check() returns a list of the Mappers whose databases Patpat can connect to.

As you can see, the design of the Checker classes also follows the factory pattern.


What exactly does Patpat output, and how should I view it?

Patpat, as a dataset search framework, outputs dataset metadata. At the end of "Quick Start - Search for datasets via MapperHub", we mentioned that Patpat search results are stored in patpat_envs/result/<task_uuid>, but that description is not very detailed. Apologies to our users!

The structure of Patpat's output files is as follows:

patpat_envs/
    |-- result/
        |-- <task_uuid>
            |-- result.tsv
            |-- result.json
            

As you can see, Patpat writes its output in .tsv and .json formats. That answer alone is unlikely to satisfy you, so the sections below describe what each of the two files contains.
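Before opening the files, it can be handy to build their paths from the task UUID. The following is a small sketch using pathlib; `result_paths` is a hypothetical helper, not a Patpat function, and it assumes the directory is named patpat_envs as in Quick Start and that `run_dir` is the directory containing it.

```python
from pathlib import Path


def result_paths(task_uuid, run_dir="."):
    """Build the expected paths of result.tsv and result.json for one task.

    Hypothetical helper, not part of Patpat. run_dir is assumed to be the
    directory that contains patpat_envs/.
    """
    task_dir = Path(run_dir) / "patpat_envs" / "result" / task_uuid
    return task_dir / "result.tsv", task_dir / "result.json"
```

The function only builds paths. It does not check that the files exist, so call `.exists()` on the results if the task may not have finished.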

1. result.tsv

Each line in result.tsv holds three pieces of information about a project: the title, the summary and the website. Users can open the project page in the public database directly via the URL.

.tsv is a plain-text format that stores data in a tabular structure; see Wiki for details. Most table editing software, such as Microsoft Excel, can open it.
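If you would rather read result.tsv in Python than in a spreadsheet, the standard csv module is enough. The sketch below makes no claim about Patpat's exact file layout: `read_result_tsv` is a hypothetical helper, and because this page does not say whether the file begins with a header row, it returns the rows unchanged.

```python
import csv


def read_result_tsv(path):
    """Read result.tsv into a list of rows, each row a list of strings.

    Sketch only: no header row or column names are assumed, so rows are
    returned exactly as they appear in the file.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return [row for row in csv.reader(handle, delimiter="\t")]
```

Going by the description above, each row should hold the title, the summary and the website, but inspect the first row before relying on that order.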

2. result.json

.json is an "attribute-value" structured file that can be easily stringified. See Wiki for details.


The "<Database name>" in result.json is determined by the Mapper class selected by the user. In other words, searching several databases produces one set of results per database. Intuitive, right? The attributes of each project are determined by the source database, and Patpat does not guarantee their content. However, Patpat constructs a few attributes itself:

  • summary: summary of the project
  • website: website of the project
  • protein: protein-level mapping via the public database
  • peptides: peptide-level mapping via the public database

The content of result.json is structured as follows:

  {
  "<Database name>":
      {"<Project id>":
          {"attr1": "value1",
           "attr2": "value2",
           ...
           "attrN": "valueN"
          },
       "<Project id>":
          {
          ...
          },
       ...
      },
  "<Database name>":
      {"<Project id>":
       ...
      },
  ...
  }

A bit abstract! Let us look at an example.

  {"iProX": 
      {"PXD006512":
          {"title": "Proteomics identifies new therapeutic targets of early-stage hepatocellular carcinoma",
           "summary": "Hepatocellular carcinoma (HCC) accounts for approximately 90% of primary liver cancers, the...
           ...
          }
        ...
      }
   ...
  }
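To make the nesting concrete, the sketch below walks a result laid out as database name, then project id, then attributes, and flattens it into tuples. `list_projects` is a hypothetical helper, not a Patpat function. The "title" key is taken from the iProX example above; other databases may not provide it, so a missing title becomes None.

```python
import json


def list_projects(result):
    """Flatten the nested result into (database, project_id, title) tuples.

    `result` is the dict obtained by loading result.json, laid out as
    database name -> project id -> attributes. A missing "title" yields None.
    """
    rows = []
    for database, projects in result.items():
        for project_id, attrs in projects.items():
            rows.append((database, project_id, attrs.get("title")))
    return rows


# In-memory stand-in for result.json, reusing the title from the example above.
sample = json.loads(
    '{"iProX": {"PXD006512": {"title": "Proteomics identifies new '
    'therapeutic targets of early-stage hepatocellular carcinoma"}}}'
)
for database, project_id, title in list_projects(sample):
    print(database, project_id, title)
```

For a real task, replace `sample` with the dict returned by `json.load()` on the task's result.json.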

A bit dizzying! So how should you use result.json? Patpat provides a built-in function for this.

import os
from patpat import utility

os.chdir(os.path.dirname(os.path.expanduser('~/patpat_envs')))  # run from the parent directory of patpat_envs; expanduser resolves '~'

uuid = '<uuid>'
t = utility.get_result_from_file(task=uuid)

print(t)
