| title | redirect_from |
|---|---|
| The WikiPathways Semantic Web Portal | /index.php/Portal:Semantic_Web |
This portal describes the Semantic Web features of the WikiPathways databases, such as the Resource Description Framework (RDF) translation, the ontology, and the new nanopublications.
The WikiPathways RDF is provided as part of the monthly releases and contains the Curated and Reactome pathways. The RDF is split into two parts: the GPMLRDF part, which is a direct translation of the content of the GPML files, and the WPRDF part, which contains the biology represented in the GPML.
The WikiPathways vocabularies describe the semantic information about the pathway, its data nodes, and its interactions, while the GPML vocabulary describes the graphical information, that is, how the pathway diagram is laid out and drawn.
If you use the RDF, the vocabularies, or the nanopublications, please cite the following paper:
Waagmeester, A., Kutmon, M., Riutta, A., Miller, R., Willighagen, E. L., Evelo, C. T., Pico, A. R., Jun. 2016. Using the semantic web for rapid integration of WikiPathways with other biological online data resources. PLoS Comput Biol 12 (6), e1004989. doi:10.1371/journal.pcbi.1004989.
For the pathway content, please follow the "How to cite WikiPathways" instructions.
Visit our new Snorql interface at sparql.wikipathways.org. It supports the following steps:
1. Select a query from the list.
2. Press the green query button to execute the selected query.
3. View the results on the same page.
4. Load your own list of example queries from GitHub by adding the link.
Due to an Apache update, we are now creating RDF data according to SPARQL 1.1.
However, our SPARQL endpoint, which runs on Virtuoso, still uses SPARQL 1.0.
This affects how strings are queried and might affect federated queries.
Please remove the ^^xsd:string suffix from string literals in your queries, as shown in the example below.
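For existing queries that still carry the suffix, the removal can be automated. The following is a minimal sketch in Java and is not part of the WikiPathways tooling; the class name `XsdStringSuffix` and the method `strip` are hypothetical. It performs a purely textual replacement, so it assumes the suffix never occurs inside a string literal or comment that should be kept.

```java
import java.util.regex.Pattern;

class XsdStringSuffix {
    // Matches the datatype suffix in prefixed form (^^xsd:string) or as a full IRI.
    private static final Pattern SUFFIX = Pattern.compile(
        "\\^\\^(xsd:string|<http://www\\.w3\\.org/2001/XMLSchema#string>)");

    // Return the query text with every xsd:string datatype suffix removed.
    static String strip(String query) {
        return SUFFIX.matcher(query).replaceAll("");
    }

    public static void main(String[] args) {
        String before = "?organism rdfs:label \"Mus musculus\"^^xsd:string .";
        // prints: ?organism rdfs:label "Mus musculus" .
        System.out.println(strip(before));
    }
}
```

Other datatype suffixes, such as ^^xsd:integer, are left untouched by this sketch.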
We provide a SPARQL endpoint for querying the data.
We have a large collection of general example queries, federated queries, and metabolite-related example queries.
For example, to list all pathways per instance of a particular gene or protein (wp:GeneProduct), you can use the following SPARQL:
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
# The label is projected under a fresh name: SPARQL 1.1 does not allow (expr AS ?v) when ?v is already bound in the WHERE clause
SELECT DISTINCT
  ?pathway
  (str(?label) AS ?geneProductLabel)
WHERE {
  ?geneProduct a wp:GeneProduct .
  ?geneProduct rdfs:label ?label .
  ?geneProduct dcterms:isPartOf ?pathway .
  ?pathway a wp:Pathway .
  FILTER regex(str(?label), "CYP") .
}
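To run the same query for other gene or protein names, the label pattern can be turned into a parameter. The following is a minimal sketch in Java using only the standard library; the class and method names are hypothetical, and naming the projected variable `?geneProductLabel` is an illustrative choice. The pattern is inserted as a SPARQL string literal, so backslashes and double quotes are escaped, but `regex` still interprets it as a regular expression.

```java
class GeneProductQuery {
    // Escape backslashes and double quotes so the text is a valid SPARQL string literal.
    static String escapeLiteral(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Build the query that lists pathways per gene product whose label matches the pattern.
    static String build(String labelPattern) {
        return String.join("\n",
            "PREFIX wp: <http://vocabularies.wikipathways.org/wp#>",
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
            "PREFIX dcterms: <http://purl.org/dc/terms/>",
            "SELECT DISTINCT ?pathway (str(?label) AS ?geneProductLabel)",
            "WHERE {",
            "  ?geneProduct a wp:GeneProduct ;",
            "               rdfs:label ?label ;",
            "               dcterms:isPartOf ?pathway .",
            "  ?pathway a wp:Pathway .",
            "  FILTER regex(str(?label), \"" + escapeLiteral(labelPattern) + "\")",
            "}");
    }

    public static void main(String[] args) {
        // Print the query for labels containing "CYP".
        System.out.println(build("CYP"));
    }
}
```

The resulting string can be pasted into the endpoint or passed to any of the client libraries described on this page.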
The Semantic Web WikiPathways comes in two flavors: as RDF (beta) and as nanopublications (very experimental).
You can download the WikiPathways RDF from [here](http://data.wikipathways.org/current/rdf/). The WikiPathways RDF is split into two parts: the GPMLRDF part, which is a direct translation of the content of the GPML files, and the WPRDF part, which contains the harmonized biological information present in the GPML.
There is an RDF API available. The Perl example below URL-encodes a query, sends it to the endpoint as part of the URL, and retrieves the results as CSV.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;   # HTTPS URLs additionally require LWP::Protocol::https
use URI::Escape;
# Elements that require curation attention in mouse pathways.
# Assumption: the prefixes used here (dc, gpml, dcterms, rdfs, wp, foaf) are predefined on the endpoint.
my $sparql = "SELECT DISTINCT ?wpIdentifier ?elementneedsattention ?elementLabel
WHERE {
  ?pathway dc:title ?title .
  ?elementneedsattention a gpml:requiresCurationAttention .
  ?elementneedsattention dcterms:isPartOf ?pathway .
  ?elementneedsattention rdfs:label ?elementLabel .
  ?pathway wp:organism ?organism .
  ?pathway foaf:page ?page .
  ?pathway dc:identifier ?wpIdentifier .
  ?organism rdfs:label \"Mus musculus\" .
}
ORDER BY ?wpIdentifier";
# URL-encode the query and request the results as CSV
my $url = 'https://sparql.wikipathways.org/sparql?default-graph-uri=&query='.uri_escape($sparql).'&format=text%2Fcsv&timeout=0&debug=on';
my $content = get($url);
die "Couldn't get $url" unless defined $content;
print $content;
For Java there are several options; here we use the Jena framework:
// Package names are those of Apache Jena 3.0 and later; Jena 2.x used com.hp.hpl.jena.query instead
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
public class JavaCodeExample {
    public static void main(String[] args) {
        String sparqlQueryString = "SELECT * WHERE {?s ?p ?o} LIMIT 10";
        Query query = QueryFactory.create(sparqlQueryString);
        // try-with-resources closes the query execution once the results are read
        try (QueryExecution queryExecution = QueryExecutionFactory.sparqlService("https://sparql.wikipathways.org/sparql", query)) {
            ResultSet resultSet = queryExecution.execSelect();
            // Print each result row as tab-separated subject, predicate, object
            while (resultSet.hasNext()) {
                QuerySolution solution = resultSet.next();
                System.out.print(solution.get("s"));
                System.out.print("\t" + solution.get("p"));
                System.out.println("\t" + solution.get("o"));
            }
        }
    }
}
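When adding Jena as a dependency is not practical, the URL-based approach of the Perl example can also be sketched with the HTTP client built into the JDK (Java 11 or later). The class and method names below are hypothetical. The `query` and `format` parameters are the ones used in the Perl example. It is an assumption that the endpoint accepts `+` for spaces in the query string, which is what `URLEncoder` produces.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class SparqlCsvClient {
    static final String ENDPOINT = "https://sparql.wikipathways.org/sparql";

    // Build the request URL: the query is URL-encoded and CSV is requested as the result format.
    static String buildUrl(String endpoint, String sparql) {
        return endpoint
            + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8)
            + "&format=" + URLEncoder.encode("text/csv", StandardCharsets.UTF_8);
    }

    // Send a GET request and return the response body (the CSV text).
    static String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String url = buildUrl(ENDPOINT, "SELECT * WHERE {?s ?p ?o} LIMIT 10");
        System.out.println(url);
        // The endpoint is only contacted when --fetch is passed on the command line.
        if (args.length > 0 && args[0].equals("--fetch")) {
            System.out.println(fetch(url));
        }
    }
}
```

Keeping URL construction separate from the network call makes it possible to inspect the URL before sending anything to the endpoint.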
For PHP, we recommend ARC2: Easy RDF and SPARQL for LAMP systems.
The R package rrdf can be found and installed from GitHub.
library(rrdf)
sparql.remote(
  "https://sparql.wikipathways.org/sparql",
  "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"
)
Another option is to use the SPARQL package (tested on Ubuntu 18.04.5 LTS, RStudio version 1.4.1717, R version 4.1.0 (2021-05-18)).
- Note the backslashes in front of the quotation marks in the VALUES clause; they escape the quotation marks inside the R string.
- Note that this query is an example of how to perform a UNION query in WikiPathways.
if (!"SPARQL" %in% installed.packages()) {
  install.packages("SPARQL")
}
library(SPARQL)
## Connect to the WikiPathways endpoint
endpointwp <- "https://sparql.wikipathways.org/sparql/"
## Count the proteins and metabolites in three selected pathways
queryDatanodeContent <-
"
select distinct (str(?wpid) as ?pathway) (str(?title) as ?pathwayTitle) (count(distinct ?hgncIdProtein) AS ?ProteinsInPWs) (count(distinct ?chebiMetabolite) AS ?MetabolitesInPWs)
where {
  VALUES ?wpid {\'WP4224\' \'WP4225\' \'WP4571\' }
  ?datanode dcterms:identifier ?id ;
            dcterms:isPartOf ?pathwayRes .
  ?pathwayRes a wp:Pathway ;
              dcterms:identifier ?wpid ;
              dc:title ?title .
  {?datanode a wp:Protein ;
             wp:bdbHgncSymbol ?hgncIdProtein .}
  UNION
  {?datanode a wp:Metabolite ;
             wp:bdbChEBI ?chebiMetabolite .}
} ORDER BY ASC(?wpid)
"
resultsDatanodeContent <- SPARQL(endpointwp, queryDatanodeContent, curl_args = list(useragent = R.version.string))
## The results element holds the result table
showresultsDatanodeContent <- resultsDatanodeContent$results
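The VALUES clause in the query above can also be assembled from a list of pathway identifiers. The following is a minimal sketch, written in Java to match the Jena example earlier on this page; the class and method names are hypothetical. The check that every identifier is `WP` followed by digits is an assumption based on the three identifiers used above.

```java
import java.util.List;
import java.util.stream.Collectors;

class ValuesClause {
    // Render a clause such as VALUES ?wpid { 'WP4224' 'WP4225' } for use inside a WHERE block.
    static String forPathways(List<String> wpIds) {
        for (String id : wpIds) {
            // Reject anything that is not "WP" plus digits, so no quotes can leak into the query.
            if (!id.matches("WP\\d+")) {
                throw new IllegalArgumentException("Not a pathway identifier: " + id);
            }
        }
        String quoted = wpIds.stream()
            .map(id -> "'" + id + "'")
            .collect(Collectors.joining(" "));
        return "VALUES ?wpid { " + quoted + " }";
    }

    public static void main(String[] args) {
        // prints: VALUES ?wpid { 'WP4224' 'WP4225' 'WP4571' }
        System.out.println(forPathways(List.of("WP4224", "WP4225", "WP4571")));
    }
}
```

In Java, single quotes inside a double-quoted string need no backslash.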
The code below works in both the JavaScript and the Groovy console:
rdf.sparqlRemote(
  "https://sparql.wikipathways.org/sparql",
  "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"
)
For quick and easy querying, we recommend using curl (Linux and OS X):
curl -F "query=SELECT * WHERE {?s ?p ?o} LIMIT 10" https://sparql.wikipathways.org/sparql
The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement no. 115191, resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies’ in-kind contribution.