CKAN lodcloud RDF vocabulary
- Getting started - the starting point for DataFAQs.
- CKAN - walks through how to create and update dataset listings, both manually and automatically.
The CKAN infrastructure and its flagship instance http://datahub.io can be used for any kind of dataset, and the lodcloud group is a Data Hub group that focuses on listing the bubbles in the LOD Cloud diagram. So, beyond listing a dataset on Data Hub, a Linked Data publisher must clear additional hurdles to be included in the LOD Cloud diagram. These requirements are a Good Thing, since they provide a concrete target for Linked Data publishers to achieve.
- URIs must dereference (Guidelines).
- Must cite (or be cited by) 50 or more URIs in other datasets (Guidelines).
- Describe your dataset according to Guidelines (we'll expand on this in this page, since it is less than straightforward).
- Tag newly added data sets with lod (lodcloud group, Guidelines).
- Validate to level three.
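As a rough aid for the first requirement, here is a minimal Python sketch that checks whether a URI resolves when RDF is requested. The function names, the Accept header value, and the "any 2xx after redirects" test are our own assumptions, not the Guidelines' definition of dereferencing.

```python
import urllib.request

# Assumed preference order for RDF serializations; adjust as needed.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_deref_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks for RDF via content negotiation."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


def dereferences(uri: str, timeout: float = 10.0) -> bool:
    """Return True if the URI resolves (following redirects) with a 2xx status."""
    try:
        with urllib.request.urlopen(build_deref_request(uri), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError and HTTPError are both subclasses of OSError
        return False
```

Calling `dereferences(...)` on a handful of URIs from your dataset gives a quick sanity check before running the validator; it does not replace the validator.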
After you validate to level three, cygri or Anja needs to add your dataset to the lodcloud group, which is done by email, per public-lod:
- "If you write an email to me or Anja, then we add it as soon as we get around to it."
- "If you don't write an email, then it will be added whenever we do the next update of the diagram."
This page outlines the metadata that the lodcloud group uses to describe each bubble of the LOD Cloud diagram, and describes some services that can help you along in the process.
The following two pages are intended to be "equivalent" descriptions for how to annotate the CKAN entry for your LOD Cloud bubble:
- http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation (we'll primarily refer to this one)
- http://wifo5-03.informatik.uni-mannheim.de/lodcloud/ckan/validator/levels.html
Although "Most of it can be expressed in VoID", it isn't clear how, since there doesn't seem to be an authoritative RDF representation for a LOD Cloud bubble. Some work that has attempted it:
- There's an RDFization that takes most of CKANmetainformation into account at http://semantic.ckan.net/ [cygri email].
- http://dsi.lod-cloud.net/ shows a query of RDF describing lodcloud [email to public-lod].
- https://bitbucket.org/ww/ckanrdf/overview produces RDF from a CKAN entry, but it doesn't account for the lodcloud conventions established in CKANmetainformation.
- Will Waites has done quite a bit of really nice work, which he describes here. The RDF representation that gockan provided was really great. I cached an example back when it was running.
Since there isn't an authoritative RDFization, we created our own. Hopefully, all of these will converge at some point.
- add-metadata.py is a SADI service that accepts a "proper" RDF description of a LOD Cloud bubble and updates the corresponding CKAN entry.
  - e.g., POSTing twc-healthdata.ttl to add-metadata.py modifies http://datahub.io/dataset/twc-healthdata.
- lift-ckan.py is a SADI service that accepts an instance of datafaqs:CKANDataset and returns a "proper" RDF description as a LOD Cloud bubble.
  - e.g., POSTing twc-healthdata.ttl to lift-ckan.py returns twc-healthdata.ttl.out.
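POSTing a Turtle file to one of these services might look like the sketch below. The add-metadata URL is the one given further down this page; the helper names and the `text/turtle` Content-Type and Accept headers are assumptions about what the services expect, so treat this as a starting point rather than the services' documented interface.

```python
import urllib.request

# URL given on this page for add-metadata; the lift-ckan URL is not stated here.
ADD_METADATA_URL = "http://sparql.tw.rpi.edu/services/datafaqs/ckan/add-metadata"


def build_sadi_post(service_url: str, turtle: bytes) -> urllib.request.Request:
    """Build a POST request carrying a Turtle document as the body."""
    return urllib.request.Request(
        service_url,
        data=turtle,
        # Assumed media types; the services may accept or return others.
        headers={"Content-Type": "text/turtle", "Accept": "text/turtle"},
        method="POST",
    )


def post_turtle_file(service_url: str, path: str, timeout: float = 30.0) -> bytes:
    """POST the file at `path` to the service and return the raw response body."""
    with open(path, "rb") as f:
        request = build_sadi_post(service_url, f.read())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

For example, `post_turtle_file(ADD_METADATA_URL, "twc-healthdata.ttl")` would send the description used in the example above.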
When validator reports: TODO
Find an interesting example resource:
prefix void: <http://rdfs.org/ns/void#>
select distinct ?dataset ?eg
where {
  ?dataset void:exampleResource ?eg
  filter(!regex(str(?eg), 'thing'))
}
limit 1
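To run the query above from a script, one option is a plain GET with a `query` parameter, which is how SPARQL endpoints are commonly addressed. This sketch only builds the request URL; the endpoint `http://example.org/sparql` is a placeholder, since this page does not say which endpoint to use.

```python
import urllib.parse

# The query from this page, kept as-is apart from whitespace.
EXAMPLE_RESOURCE_QUERY = """\
prefix void: <http://rdfs.org/ns/void#>
select distinct ?dataset ?eg
where {
  ?dataset void:exampleResource ?eg
  filter(!regex(str(?eg), 'thing'))
}
limit 1
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Return a GET URL that passes the query in the `query` parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


# Placeholder endpoint (an assumption, not from this page).
url = sparql_get_url("http://example.org/sparql", EXAMPLE_RESOURCE_QUERY)
```

The resulting `url` can be fetched with curl or `urllib.request.urlopen`, typically with an Accept header naming the result format you want.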
- http://datahub.io/dataset/twc-healthdata -> "Resources (edit)"
- "New resource..." in lower left.
- "Link to a file"
- Paste in the File URL (e.g. http://purl.org/twc/health/source/hub-healthdata-gov/dataset/hospital-compare/version/2012-Jul-17/provider/010054) and click Add.
- Name: Example URI
- Description: An example URI in this dataset.
- Resource type: Data file.
- Size (bytes): (enter value from: curl > b; du -sh b)
- Mimetype: application/rdf+xml
- Mimetype (inner): application/rdf+xml
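The form values above can be gathered in one place before typing them in. Note that `du -sh` reports human-readable disk usage rather than an exact byte count, so the sketch below takes the size from the length of the downloaded content instead. The function name and the dictionary keys are illustrative labels mirroring the form fields; they are not claimed to match CKAN's API field names.

```python
def describe_example_resource(url: str, body: bytes) -> dict:
    """Collect the values for the "Link to a file" form for an example URI.

    `body` is the content downloaded from `url` (e.g. the file saved by curl).
    """
    return {
        "file_url": url,
        "name": "Example URI",
        "description": "An example URI in this dataset.",
        "resource_type": "Data file",
        "size_bytes": len(body),  # exact number of bytes in the download
        "mimetype": "application/rdf+xml",
        "mimetype_inner": "application/rdf+xml",
    }
```

For a saved download, `describe_example_resource(uri, open("b", "rb").read())` yields the size to enter in the "Size (bytes)" field.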
When validator reports: Missing download(s). Please provide a link to the download file(s) if available in the Downloads & Resources section, using one of the following formats: application/rdf+xml, text/turtle, application/x-ntriples, application/x-nquads, application/x-trig, text/n3.
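Before re-running the validator, a quick local check of your resources' mimetypes against the formats named in that message can save a round trip. This is a sketch only: the function name is ours, and the exact-string comparison is an assumption about how the validator matches formats.

```python
# Formats listed in the validator's "Missing download(s)" message.
ACCEPTED_DOWNLOAD_MIMETYPES = frozenset({
    "application/rdf+xml",
    "text/turtle",
    "application/x-ntriples",
    "application/x-nquads",
    "application/x-trig",
    "text/n3",
})


def has_accepted_download(resource_mimetypes) -> bool:
    """Return True if at least one resource uses an accepted RDF mimetype."""
    return any(m in ACCEPTED_DOWNLOAD_MIMETYPES for m in resource_mimetypes)
```

For instance, a dataset whose only resource is `text/html` would fail this check, while adding a `text/turtle` dump would pass it.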
Although this LOD Cloud metadata would be naturally described in RDF, using CKAN to list the bubbles requires that we contort the descriptions into the schema that CKAN uses to describe its datasets.
From URI: http://purl.org/twc/arrayexpress/E-AFMX-1 use thedatahub local identifier arrayexpress-e-afmx-1, which gets a URI: http://thedatahub.org/en/dataset/arrayexpress-e-afmx-1
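The mapping in that example can be sketched as a small function. The rule used here (drop the `http://purl.org/twc/` namespace, lowercase, and replace runs of other characters with a hyphen) is inferred from this single example and is an assumption, as are the function names.

```python
import re

DATAHUB_DATASET_BASE = "http://thedatahub.org/en/dataset/"


def datahub_identifier(uri: str, namespace: str = "http://purl.org/twc/") -> str:
    """Derive a thedatahub local identifier from a dataset URI (assumed rule)."""
    if not uri.startswith(namespace):
        raise ValueError("URI is not in the expected namespace: " + uri)
    local = uri[len(namespace):].lower()
    # Collapse anything other than a-z and 0-9 into single hyphens.
    return re.sub(r"[^a-z0-9]+", "-", local).strip("-")


def datahub_dataset_url(identifier: str) -> str:
    """Return the thedatahub page URI for a local identifier."""
    return DATAHUB_DATASET_BASE + identifier
```

With the URI from the example, `datahub_identifier` gives `arrayexpress-e-afmx-1`, and `datahub_dataset_url` gives the thedatahub URI shown above.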
Step 1: add to a group
POST this to http://sparql.tw.rpi.edu/services/datafaqs/ckan/add-metadata:
d
- Listing twc healthdata as a LOD Cloud Bubble describes how healthdata.tw.rpi.edu uses this technique.