Welcome to the Q&A page. I hope it solves your problem!
Try using the `patpat.checker` module. Take the PRIDE database as an example, using the `patpat.checker.PrideChecker` class.
from patpat import checker
c = checker.PrideChecker()
m = c.check()
`c.check()` returns `patpat.mapper.PrideMapper()` if Patpat's connection to the PRIDE database is working. Otherwise, it returns `None`.
The `CheckerHub` class in the `hub` module integrates the Patpat-supported `checker.<database>Checker` classes.
from patpat import hub
c = hub.CheckerHub()
t = c.check()
As above, `hub.CheckerHub().check()` returns a list of the `Mapper` objects for the databases that are properly connected to Patpat.
It is easy to see that the design of the `Checker` classes also follows the factory pattern.
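The factory idea can be illustrated with a small, self-contained sketch. It is not Patpat's actual implementation: `DemoMapper`, `BaseChecker`, `OnlineChecker`, `OfflineChecker`, and `SimpleCheckerHub` are hypothetical names invented here, and the connection test is faked so the example runs without network access.

```python
from typing import List, Optional, Type


class DemoMapper:
    """Hypothetical mapper; stands in for a real database mapper."""


class BaseChecker:
    """Hypothetical checker: a small factory that builds a mapper on success."""

    mapper_cls: Type = DemoMapper

    def is_reachable(self) -> bool:
        raise NotImplementedError

    def check(self) -> Optional[object]:
        # Create the mapper only when the connection test passes.
        return self.mapper_cls() if self.is_reachable() else None


class OnlineChecker(BaseChecker):
    def is_reachable(self) -> bool:
        return True  # pretend the database answered


class OfflineChecker(BaseChecker):
    def is_reachable(self) -> bool:
        return False  # pretend the database is down


class SimpleCheckerHub:
    """Hypothetical hub: runs every checker and keeps the working mappers."""

    def __init__(self, checkers: List[BaseChecker]) -> None:
        self.checkers = checkers

    def check(self) -> List[object]:
        mappers = (checker.check() for checker in self.checkers)
        return [m for m in mappers if m is not None]


usable = SimpleCheckerHub([OnlineChecker(), OfflineChecker()]).check()
print([type(m).__name__ for m in usable])
```

The caller never constructs a mapper directly; it asks a checker, which decides whether a mapper should exist at all.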
Patpat, as a dataset search framework, outputs dataset metadata.
At the end of Quick Start - Search for datasets via MapperHub, we mentioned that Patpat search results are stored in `patpat_envs/result/<task_uuid>`, but that description was not very detailed. Apologies to our users!
The following is the structure of Patpat's output file:
patpat_envs/
|-- result/
    |-- <task_uuid>/
        |-- result.tsv
        |-- result.json
As you can see, Patpat writes its output in `.json` and `.tsv` formats.
That answer alone is unlikely to satisfy you, so here is more detail about what information these two files contain.
Each line in `result.tsv` contains three pieces of information about a project: the title, the summary, and the website. Users can access the project page in the public database directly via the URL.
`.tsv` is a text file format that stores data in a tabular structure; see the Wiki for details.
It is supported by most spreadsheet software, such as Microsoft Excel.
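Because `result.tsv` is plain tab-separated text, it can also be read with Python's standard `csv` module. The sketch below rests on assumptions: the column names `title`, `summary`, and `website` and the presence of a header row are guesses based on the description above, and the sample row is placeholder data, not real Patpat output.

```python
import csv
import io

# Placeholder content standing in for a real result.tsv (assumed header row).
sample_tsv = (
    "title\tsummary\twebsite\n"
    "Example project\tExample summary\thttps://example.org/project\n"
)

# With a real file, replace io.StringIO(...) with
# open(path, newline="", encoding="utf-8").
reader = csv.DictReader(io.StringIO(sample_tsv), delimiter="\t")
rows = list(reader)

for row in rows:
    print(row["title"], "->", row["website"])
```

If the real file has no header row, `csv.reader` with `delimiter="\t"` would be the safer choice, with columns accessed by position.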
`.json` is an "attribute-value" structured file that can be easily stringified. See the Wiki for details.
The "<Database name>" key in `result.json` is determined by the `Mapper` class selected by the user.
In other words, searching several databases produces one block of results per database. Intuitive, right?
In addition, the attributes are determined by the database, and Patpat does not guarantee that information.
However, there are a few attributes that Patpat constructs:
- `summary`: Summary of the project
- `website`: Website of the project
- `protein`: Protein-level mapping via the public database
- `peptides`: Peptide-level mapping via the public database
The content of `result.json` is structured as follows:
{
  "<Database name>": {
    "<Project id>": {
      "attr1": "value1",
      "attr2": "value2",
      ...
      "attrN": "valueN"
    },
    "<Project id>": {
      ...
    },
    ...
  },
  "<Database name>": {
    "<Project id>": ...
  },
  ...
}
A bit abstract! Let us look at an example.
{
  "iProX": {
    "PXD006512": {
      "title": "Proteomics identifies new therapeutic targets of early-stage hepatocellular carcinoma",
      "summary": "Hepatocellular carcinoma (HCC) accounts for approximately 90% of primary liver cancers, the...
      ...
    }
    ...
  }
  ...
}
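Once parsed, this nesting (database name, then project id, then attributes) maps directly onto nested dictionaries. Here is a minimal sketch using only the standard `json` module, with the sample trimmed to the one attribute shown in full above:

```python
import json

sample = """
{"iProX":
  {"PXD006512":
    {"title": "Proteomics identifies new therapeutic targets of early-stage hepatocellular carcinoma"}
  }
}
"""

data = json.loads(sample)

titles = {}
for database, projects in data.items():
    for project_id, attrs in projects.items():
        # .get() avoids a KeyError, since attributes vary by database.
        titles[(database, project_id)] = attrs.get("title")

print(titles)
```

Using `.get()` instead of direct indexing is deliberate: apart from the few attributes Patpat constructs, the available keys depend on the database.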
A bit dizzy? To help you work with `result.json`, Patpat provides built-in functions.
import os
from patpat import utility

# Run from the parent directory of patpat_envs.
# expanduser() is needed because os.chdir() does not expand "~".
os.chdir(os.path.dirname(os.path.expanduser('~/patpat_envs')))
uuid = '<uuid>'  # replace with your task UUID
t = utility.get_result_from_file(task=uuid)
print(t)
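If you would rather read the file yourself, the layout described earlier (`patpat_envs/result/<task_uuid>/result.json`) can be opened with the standard library. This is a sketch, not Patpat's own loader: `load_result` is a hypothetical helper, and the temporary directory, the task id `demo-task`, and the placeholder title exist only to make the example self-contained.

```python
import json
import tempfile
from pathlib import Path


def load_result(envs_dir, task_uuid):
    """Hypothetical helper: read <envs_dir>/result/<task_uuid>/result.json."""
    path = Path(envs_dir) / "result" / task_uuid / "result.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


# Build a throwaway directory tree that mimics the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    task_dir = Path(tmp) / "patpat_envs" / "result" / "demo-task"
    task_dir.mkdir(parents=True)
    (task_dir / "result.json").write_text(
        json.dumps({"iProX": {"PXD006512": {"title": "placeholder title"}}}),
        encoding="utf-8",
    )
    loaded = load_result(Path(tmp) / "patpat_envs", "demo-task")

print(list(loaded["iProX"]))
```

With a real run, pass the path to your own `patpat_envs` directory and the task UUID in place of the temporary values.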