SADI Semantic Web Services framework
- Getting started
- The page at csv2rdf4lod gives a walk-through introduction to an existing SADI service -- from the client's perspective.
- FAqT Services are built using SADI.
- Maven for the Java tech stack.
This page provides developer notes for those who want to write a SADI service. Although the initial focus was on the Python implementation, there are some bits about using Java that will evolve. We originally used Python because we found it much easier to develop, maintain, and deploy. However, we have since decided to switch to Java to implement SADI services because the Python stack has been too brittle in our use (e.g., it can't parse non-ASCII RDF, it fails to execute SPARQL-as-string in SuRF, and it "randomly" hits AttributeError:n3 on larger POSTed inputs).
- Using Python
- Using Java
- Using Java (take 2)
- Using Python (take 2)
- SADI vs. other web services frameworks
The SADI Semantic Web Services framework is web services done right, and we're excited to incorporate it as a fundamental design element of DataFAQs. This page provides information about how to set up your own SADI service, and thus a DataFAQs FAqT service.
For a walk through on how to talk to an existing SADI service, see the page on csv2rdf4lod's wiki.
The following technologies are stacked together to create your sadi.py service:
- your sadi.py service
- sadi.py
- SuRF
- rdflib
- python
Jim McCusker contributed a Python implementation to the SADI code base, adding a third language to the two that already existed (Java and Perl).
The following command will add Jim McCusker's sadi.py into your Python installation (but make sure you reference the latest egg, listed here):
sudo easy_install http://sadi.googlecode.com/files/sadi-0.1.5-py2.6.egg
If you want to build sadi.py yourself, use:
svn checkout http://sadi.googlecode.com/svn/trunk/python/sadi.python sadi.python
cd sadi.python
python setup.py bdist_egg
sudo easy_install dist/sadi-0.1.4-py2.6.egg
Report sadi.py problems to Jim on http://code.google.com/p/sadi/issues/list.
To check which version of sadi.py you have installed:
bash-3.2$ easy_install -n sadi
Searching for sadi
Best match: sadi 0.1.2
Processing sadi-0.1.2-py2.6.egg
sadi 0.1.2 is already the active version in easy-install.pth
Using /Library/Python/2.6/site-packages/sadi-0.1.2-py2.6.egg
Processing dependencies for sadi
Finished processing dependencies for sadi
sadi.py services accept Turtle when the HTTP request header "Content-Type" is text/turtle (preferred) or application/x-turtle (will work).
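As a sketch, here is how a client POST with the right Content-Type could look using only Python's standard library; the service URL and payload are hypothetical placeholders, and the request is built but not sent:

```python
import urllib.request

# Hypothetical local sadi.py service URL (see the curl examples below).
SERVICE_URL = "http://localhost:9090/ContextualInverseFunctional"

# Hypothetical Turtle input; a real service expects instances of its input class.
turtle = b"""
@prefix ex: <http://example.org/> .
ex:something a ex:InputClass .
"""

# Send the Turtle with the preferred Content-Type header.
req = urllib.request.Request(
    SERVICE_URL,
    data=turtle,
    headers={"Content-Type": "text/turtle"},  # or application/x-turtle
)

# urllib infers POST because a request body is present.
print(req.get_method())
print(req.get_header("Content-type"))
# urllib.request.urlopen(req) would actually invoke the service.
```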
SuRF is an object-mapping library that lets you work with RDF data as if they were Python objects. SuRF has a Google Code project and mailing list, but is primarily documented at http://packages.python.org/SuRF/. SuRF provides a handful of vocabulary namespaces by default.
One essential plug-in for SuRF is the one that implements the SPARQL Protocol. If you start using it with code like:
self.logd = Store( reader = "sparql_protocol",
writer = "sparql_protocol",
endpoint = "http://logd.tw.rpi.edu:8890/sparql")
you might get the error:
<class 'surf.plugin.manager.PluginNotFoundException'>: The <sparql_protocol> READER plugin was not found
To resolve it (according to their docs), run:
sudo easy_install -U surf.sparql_protocol
Cosmin, May 2012: You can submit all your questions to the SuRF mailing list (https://groups.google.com/group/surfrdf); less severe issues are likely to be solved fairly quickly.
Access rdflib's graph from a SuRF store with store.reader.graph
sadi.py and SuRF build on top of rdflib, an RDF API for Python.
sadi.py's Turtle parse issue is fixable with curl's --data-binary option.
See Python notes.
Running services/sadi/contextual-inverse-functional/contextual-inverse-functional.rpy:
python contextual-inverse-functional.rpy
will launch the service at http://localhost:9090/ContextualInverseFunctional.
Calling the service with one of its examples:
curl -LO https://raw.github.com/timrdf/DataFAQs/master/services/sadi/contextual-inverse-functional/sample-inputs/myPa.ttl
curl -H "Content-Type: text/turtle" -d @myPa.ttl http://localhost:9090/ContextualInverseFunctional
will return owl:sameAs triples to instances from a query against LOGD's SPARQL endpoint (a few commented lines show an alternative way to draw from the Turtle file http://homepages.rpi.edu/~lebot/lod-links/state-fips-dbpedia.ttl instead):
@prefix owl: <http://www.w3.org/2002/07/owl#> .
<http://example.org/id/myPA#myPA>
a <http://purl.org/twc/ontology/cif.owl#SameResource>;
owl:sameAs
<http://dbpedia.org/resource/Pennsylvania>,
<http://logd.tw.rpi.edu/id/us/state/Pennsylvania>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1146/value-of/state_abbrv/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1148/value-of/state_abbrv/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1149/value-of/state_abbrv/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1292/value-of/lstate09/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1292/value-of/mstate09/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1330/District-size-order/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1356/typed/state_abbreviation/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1536/typed/state/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1930/typed/state/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/1930/value-of/candidate_state/PA>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/353/typed/state/PA>,
<http://logd.tw.rpi.edu/source/eia-doe-gov/dataset/fossil_fuel_consumption/2001_2007_final/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/eia-doe-gov/dataset/fossil_fuel_consumption/2008_final/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/eia-doe-gov/dataset/fossil_fuel_consumption/2009_2010_preliminary/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/eia-doe-gov/dataset/net_generation/2003_2004_final/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/eia-doe-gov/dataset/net_generation/2005_2007_final/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/eia-doe-gov/dataset/net_generation/2008_final/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/eia-doe-gov/dataset/net_generation/2009_2010_preliminary/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/epa-gov/dataset/crn-stations/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/ncdc-noaa-gov/dataset/us-climate-reference-network/value-of/state/PA>,
<http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/table1-anrf-zt/typed/state/PA>,
<http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/table2-anrf/typed/state/PA>,
<http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/table3-anrf/typed/state/PA>,
<http://logd.tw.rpi.edu/source/nitrd-gov/dataset/nsf_awards/typed/state/PA>,
<http://sws.geonames.org/6254927/>,
<http://www.rdfabout.com/rdf/usgov/geo/us/PA> .
Sample FAqT deployment describes how to deploy this service using twistd.
Jim started a collection of services for LOGD in its Google Code svn.
http://sadiframework.org/registry/ allows others to submit SADI service URIs, whose descriptions are available from a SPARQL endpoint in the graph named <http://sadiframework.org/registry/>. A wrapper to that endpoint is available here. SADI services registered at http://sadiframework.org/registry/ are listed at http://sadiframework.org/registry/services. See also the Resources tab.
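As a sketch of how one might query that endpoint over the SPARQL Protocol, using only the standard library (the endpoint URL below is an assumption; check the registry pages for the actual location):

```python
import urllib.parse

# Assumed endpoint location; not confirmed by the registry docs.
ENDPOINT = "http://sadiframework.org/registry/sparql"

# Peek at whatever is in the registry's named graph.
query = """
SELECT ?s ?p ?o
WHERE {
  GRAPH <http://sadiframework.org/registry/> { ?s ?p ?o }
}
LIMIT 10
"""

# SPARQL Protocol: the query travels in the 'query' parameter of a GET.
url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
print(url[:80])
# urllib.request.urlopen(url) would run the query (ask for an RDF or
# SPARQL-results Accept type in a real client).
```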
The SuRF SPARQL-as-string failure mentioned above looks like this:
bash-3.2$ python
Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from surf import *
>>> store = Store(reader="rdflib", writer="rdflib", rdflib_store="IOMemory")
>>> session = Session(store)
>>> store.reader.graph.parse(open('arrayexpress-e-afmx-1.ttl'),format='n3')
<Graph identifier=_5ed9652e-1b37-4793-8b4a-f61edb081bb6 (<class 'rdflib.graph.Graph'>)>
>>> query='''prefix void: <http://rdfs.org/ns/void#>
... prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#>
... prefix dcterms: <http://purl.org/dc/terms/>
... select distinct ?group
... where {
... <http://thedatahub.org/en/dataset/arrayexpress-e-afmx-1>
... a datafaqs:CKANDataset;
... dcterms:isPartOf ?group .
... ?group a datafaqs:CKANGroup .
... }
... '''
>>> results = store.execute_sparql(query)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.7/site-packages/SuRF-1.1.4_r352-py2.7.egg/surf/store.py", line 200, in execute_sparql
return self.reader.execute_sparql(sparql_query, format = format)
File "/Library/Python/2.7/site-packages/surf.rdflib-1.0.0_r338-py2.7.egg/surf_rdflib/reader.py", line 87, in execute_sparql
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 326, in loads
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 360, in decode
TypeError: expected string or buffer
Jim updated sadi.py to eliminate the SuRF dependency; it now uses just rdflib 4. He still points to https://code.google.com/p/sadi/wiki/BuildingServicesInPython for its documentation, and suggests using virtualenv.
Flask: http://flask.pocoo.org/docs/
virtualenv --no-site-packages MY_NEW_SADI_ENV_DIR
(jump ahead to Using Java take 2)
The SADI folks offer a tutorial for setting up a SADI service implemented in Java. It is the best place to start.
The following steps use the skeleton they provide to recreate a VisKO service that converts PostScript to PDF. The steps avoid Eclipse because the Maven run configuration that the SADI folks offer doesn't appear in our setup.
Step 1: Grab the skeleton and uncompress it.
Step 2: cd sadi-services
and create templated Java by running the following Maven command. I reuse class names for service names to keep things straightforward. serviceClass is the Java class that will be created (the corresponding directory structure is created, too). inputClass becomes an @InputClass Java annotation; outputClass, @OutputClass; contactEmail, @ContactEmail; and serviceName, @Name.
mvn ca.wilkinsonlab.sadi:sadi-generator:generate-service \
-DserviceClass=edu.rpi.tw.test.data.document.PostscriptToPDF \
-DinputClass=https://raw.github.com/timrdf/vsr/master/ontologies/vsr.ttl.owl#Postscript \
-DoutputClass=https://raw.github.com/timrdf/vsr/master/ontologies/vsr.ttl.owl#PDF \
-DcontactEmail=lebot@school.edu \
-DserviceName=PostscriptToPDF
All options:
sadi-generator:generate-service
A goal that generates the skeleton of a SADI service.
This goal has the following parameters:
serviceName
The name of the service, which will also be used in the path to the
service servlet. This parameter is required.
serviceClass
The fully-qualified name of the Java class that will implement the
service. This parameter is required.
serviceURL
The URL of the service. This parameter is optional and not normally
required, except in certain baroque network configurations.
serviceRDF
A URL or local path to a service description in RDF. This parameter
is optional, but can be used instead of specifying all of the other
parameters separately.
serviceDescription
The service description. This parameter is optional.
serviceProvider
The service provider. This parameter is optional.
contactEmail
A contact email address for the service. This parameter is required.
authoritative
Whether or not the service is authoritative. This parameter is
optional, defaulting to false.
async
Whether or not the service is asynchronous. This parameter is
optional, defaulting to false.
inputClass
The URI of the service input class. This parameter is required and
the URI must resolve to an OWL class definition.
outputClass
The URI of the service output class. This parameter is required and
the URI must resolve to an OWL class definition.
parameterClass
The URI of the service parameter class. This parameter is optional,
but if specified the URI must resolve to an OWL class definition.
Step 3: Add an implementation to the Java file that Maven just created. At this point, you'll need to know the Jena API. Hopefully I'll add Sesame support soon.
vi src/main/java/edu/rpi/tw/test/data/document/PostscriptToPDF.java
@Override
public void processInput(Resource input, Resource output)
{
    /* your code goes here
     * (add properties to output node based on properties of input node...)
     */
    Resource newPDF = input.getModel().createResource("http://example.org/newly-created-PDF-from-given-PS.pdf");
    input.addProperty(Vocab.alternateOf, newPDF);
}
Step 4: Make sure any RDF vocabulary you created is resolvable to an RDFS/OWL description.
Step 5: Compile and deploy the service
mvn org.mortbay.jetty:jetty-maven-plugin:run
Step 6: See service listed at http://localhost:8080/sadi-services/
Step 7: Invoke the service
curl -d @sample-input-for-PostscriptToPDF.ttl.rdf http://localhost:8080/sadi-services/PostscriptToPDF
sample-input-for-PostscriptToPDF.ttl.rdf:
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:nie="http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:vsr="https://raw.github.com/timrdf/vsr/master/ontologies/vsr.ttl.owl#">
<rdf:Description rdf:about="http://www.adobetutorialz.com/content_images/AdobeTechnologies/PostScript/manylines.ps">
<rdf:type rdf:resource="https://raw.github.com/timrdf/vsr/master/ontologies/vsr.ttl.owl#Postscript"/>
</rdf:Description>
<rdf:Description rdf:about="http://www.adobetutorialz.com/content_images/AdobeTechnologies/PostScript/manylines.ps">
<nie:mimeType>application/postscript</nie:mimeType>
</rdf:Description>
</rdf:RDF>
The RDF/XML above was created from the following Turtle with:
rapper -g -o rdfxml sample-input-for-PostscriptToPDF.ttl > sample-input-for-PostscriptToPDF.ttl.rdf
sample-input-for-PostscriptToPDF.ttl:
@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
@prefix vsr: <https://raw.github.com/timrdf/vsr/master/ontologies/vsr.ttl.owl#> .
<http://www.adobetutorialz.com/content_images/AdobeTechnologies/PostScript/manylines.ps>
a vsr:Postscript;
nie:mimeType "application/postscript";
.
The test-service target is worth looking into...
$ mvn ca.wilkinsonlab.sadi:sadi-tester:test-service \
      -DserviceURL=http://localhost:8080/sadi-services/hello \
      -Dinput=http://sadiframework.org/test/hello-input.rdf \
      -Dexpected=http://sadiframework.org/test/hello-output.rdf
When using Eclipse Indigo, change the launch configuration type in doc/generate sadi service.launch to:
<launchConfiguration type="org.eclipse.m2e.Maven2LaunchConfigurationType">
This is because the zip that the SADI people offer targets an older Eclipse.
Thanks to Nick del Rio for his create war.launch. Plop it into your doc/eclipse/ directory (next to generate sadi service.launch from sadi-service-skeleton-0.1.1-e3.7.zip), refresh within Eclipse, and it'll be available from Run -> Run Configurations... -> Maven Build.
After dropping target/sadi-services.war into a Tomcat webapps/ directory, http://localhost:8080/sadi-services/ will list the services.
Reimplementing lift-ckan.py: from projects/DataFAQs/github/DataFAQs/src/java/sadi-services, run the mvn "generate-service" target as below. You can edit create-new-sadi-java-file and just source it. Unfortunately, the classes that you specify MUST resolve to an OWL description. So, if the class that you use doesn't, just use rdfs:Resource for both and change it in the Java afterwards.
mvn ca.wilkinsonlab.sadi:sadi-generator:generate-service \
    -DserviceName=lift-ckan \
    -DserviceClass=edu.rpi.tw.data.quality.sadi.ckan.LiftCKAN \
    -DinputClass=http://purl.org/twc/vocab/datafaqs#CKANDataset \
    -DoutputClass=http://purl.org/twc/vocab/datafaqs#CKANDataset \
    -DcontactEmail=lebot@rpi.edu
-DcontactEmail=lebot@rpi.edu
webapp/WEB-INF/web.xml maps the requested URL to the Java class name using the following two snippets. For example, this enables requests to http://localhost:8080/sadi-services/faqt/sparql-service-description/named-graphs to invoke the processInput(Resource input, Resource output) method on the Java class edu.rpi.tw.data.quality.sadi.faqt.sparql_service_description.NamedGraphs.
<servlet-mapping>
<servlet-name>named-graphs</servlet-name>
<url-pattern>/faqt/sparql-service-description/named-graphs</url-pattern>
</servlet-mapping>
...
<servlet>
<servlet-name>named-graphs</servlet-name>
<servlet-class>edu.rpi.tw.data.quality.sadi.faqt.sparql_service_description.NamedGraphs</servlet-class>
</servlet>
mvn ca.wilkinsonlab.sadi:sadi-generator:generate-service \
    -DserviceClass=edu.rpi.tw.data.quality.sadi.faqt.lodcloud.basic.TaggedWithLOD \
    -DinputClass=http://purl.org/twc/vocab/datafaqs#CKANDataset \
    -DoutputClass=http://purl.org/twc/vocab/datafaqs#Evaluated \
    -DcontactEmail=lebot@rpi.edu \
    -DserviceName=tagged-with-lod
mvn ca.wilkinsonlab.sadi:sadi-generator:generate-service \
    -DserviceClass=edu.rpi.tw.data.quality.sadi.faqt.lodcloud.minimal.TaggedWithTopic \
    -DinputClass=http://purl.org/twc/vocab/datafaqs#CKANDataset \
    -DoutputClass=http://purl.org/twc/vocab/datafaqs#Evaluated \
    -DcontactEmail=lebot@rpi.edu \
    -DserviceName=tagged-with-topic
Project-level workflow to develop a Java SADI service:
- Deploy the dependency utilities/ckanclient-j/github/CKANClient-J into the local ~/.m2/repository:
  ant dist
  source pom.sh
  (A good intro to using gson, and how to handle lists in gson; gson is on mvnrepository.)
- Edit Java source code, e.g. LiftCKAN.java
- Build the .war from DataFAQs/src/java/sadi-services:
  mvn org.apache.maven.plugins:maven-war-plugin:2.2:war
  -> target/sadi-services.war = services/sadi/sadi-services.war
- Plop into a Tomcat webapps/ and see the listing at e.g. http://aquarius.tw.rpi.edu/projects/datafaqstest/sadi-services/ (see utilities/apache-tomcat/notes.rtf; currently running version 7.0.34)
- Prizms will install it for you if you let it.
The SADI people don't like using issues to track unresolved "questions", so I need to keep track of the emails here.
Open:
- Building sadi google code repo: http://groups.google.com/group/sadi-dev/browse_thread/thread/7bd734300e5a4419#
- 502 on long service execution: http://groups.google.com/group/sadi-dev/browse_thread/thread/8cc886675d9e356c
- Unable to resolve datafaqs namespace: http://code.google.com/p/sadi/issues/detail?id=11
Not critical:
- sparql query not an endpoint: http://groups.google.com/group/sadi-dev/browse_thread/thread/decbd1e3853b22cc
Resolved:
- eclipse generate service mvn: http://groups.google.com/group/sadi-discuss/browse_thread/thread/bd677aba61bbb681
- http://knoesis.org/library/resource.php?id=750 (from http://lists.w3.org/Archives/Public/public-semweb-lifesci/2012Jan/0001.html)
- http://code.google.com/p/sadi/wiki/StandardsComparison
- http://restdesc.org/
- http://www.w3.org/mid/08AE3015BD5DF149951910A5E851F010B7B918@MAIL-02.io-informatics.com
- http://www.w3.org/mid/EFA3516724DFF7419395367D7ADCE633286B3424FF@DCPWVMBXC1VS2.mdanderson.edu
- Hydra, an RDF vocabulary to describe Web APIs: http://www.hydra-cg.com/spec/latest/core/ http://www.markus-lanthaler.com/research/hydra-a-vocabulary-for-hypermedia-driven-web-apis.pdf
If you're planning to just use existing evaluation services:
- Skip ahead to see how to set up a FAqT Brick and get some results asap.
If you're trying to write an evaluation service:
- FAqT Service will describe how to steal our template to create an evaluation service that others can call.
- Sample FAqT deployment will describe how to deploy the SADI service that you just developed.
- GSON is a very nice Java library to map JSON into Java objects (and vice versa).