
CKAN lodcloud RDF vocabulary

timrdf edited this page Jan 5, 2013 · 87 revisions

What's first

  • Getting started - the starting point for DataFAQs.
  • CKAN - walks through how to create and update dataset listings, both manually and automatically.

What we'll cover

The CKAN infrastructure and its flagship instance http://datahub.io can be used to list any kind of dataset. The lodcloud group is a Data Hub group that focuses on listing the bubbles in the LOD Cloud diagram, so a Linked Data publisher must clear some additional hurdles (beyond listing the dataset on Data Hub) to be included in the diagram. These requirements are a Good Thing, since they give Linked Data publishers a concrete target to achieve.

  • URIs must dereference (Guidelines).
  • Must cite (or be cited by) 50 or more URIs in other datasets (Guidelines).
  • Describe your dataset according to the Guidelines (we expand on this below, since it is not straightforward).
  • Tag newly added datasets with lod (lodcloud group, Guidelines).
  • Validate to level three.

After your dataset validates to level three, cygri or Anja must add it to the lodcloud group. As cygri explained on the public-lod mailing list:

  • "If you write an email to me or Anja, then we add it as soon as we get around to it."
  • "If you don't write an email, then it will be added whenever we do the next update of the diagram."

This page outlines the metadata that the lodcloud group uses to describe each bubble of the LOD Cloud diagram, and describes some services that can help you along in the process.

Let's get to it

The following two pages are intended to be "equivalent" descriptions for how to annotate the CKAN entry for your LOD Cloud bubble:

Although "Most of it can be expressed in VoID", it isn't clear how, since there doesn't seem to be an authoritative RDF representation for a LOD Cloud bubble. Some work that has attempted it:

Since there isn't an authoritative RDFization, we created our own. Hopefully, all of these will converge at some point.

How to handle Level 2 (minimal): Missing example URI

When validator reports: Missing example URI. Please provide an example URI if available in the Downloads & Resources section, using one of the following formats: example/rdf+xml, example/turtle, example/ntriples, example/x-ntriples, example/x-quads, example/rdfa, example/x-trig.

Find an interesting example resource:

prefix void: <http://rdfs.org/ns/void#> 

select distinct ?dataset ?eg 
where {
  ?dataset void:exampleResource ?eg
  filter(!regex(str(?eg),'thing'))
}
limit 1
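If you would rather run the query above from a script than paste it into a SPARQL form, the SPARQL 1.1 Protocol lets you send it as a `query` parameter on an HTTP GET and ask for JSON results. This is a minimal sketch: `ENDPOINT` is a hypothetical placeholder for whichever endpoint holds your voiD metadata, and the request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint URL: point this at the SPARQL endpoint holding your voiD metadata.
ENDPOINT = "http://example.org/sparql"

QUERY = """prefix void: <http://rdfs.org/ns/void#>
select distinct ?dataset ?eg
where {
  ?dataset void:exampleResource ?eg
  filter(!regex(str(?eg),'thing'))
}
limit 1
"""


def build_query_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def bindings(results_json, var):
    """Return every value bound to `var` in a SPARQL JSON results document."""
    doc = json.loads(results_json)
    return [row[var]["value"] for row in doc["results"]["bindings"] if var in row]


# To run it against a live endpoint (needs network access):
# with urllib.request.urlopen(build_query_request(ENDPOINT, QUERY)) as resp:
#     print(bindings(resp.read().decode("utf-8"), "eg"))
```

The first value returned for `eg` is a candidate to enter as the example URI.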

Then, click around the CKAN web UI:

Or, assert it in RDF and POST it to add-metadata.py:

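@base  <http://datahub.io/> .
@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .
@prefix void:     <http://rdfs.org/ns/void#> .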

<dataset/twc-healthdata>
   a datafaqs:CKANDataset;
   void:exampleResource
      <http://purl.org/twc/health/source/hub-healthdata-gov/dataset/hospital-compare/version/2012-Jul-17/provider/010054>;
.
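The POST itself is an ordinary HTTP request with the Turtle as its body. A minimal sketch, assuming the add-metadata service URL given in the "Old" section of this page and assuming the service accepts a `text/turtle` body (SADI services may also expect RDF/XML, so check the service description); the `datafaqs` namespace is likewise an assumption. The request is built but not sent.

```python
import urllib.request

# Service URL from the "Old" section of this page; the accepted Content-Type is an assumption.
ADD_METADATA = "http://sparql.tw.rpi.edu/services/datafaqs/ckan/add-metadata"

TURTLE = """@base  <http://datahub.io/> .
@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .
@prefix void:     <http://rdfs.org/ns/void#> .

<dataset/twc-healthdata>
   a datafaqs:CKANDataset;
   void:exampleResource
      <http://purl.org/twc/health/source/hub-healthdata-gov/dataset/hospital-compare/version/2012-Jul-17/provider/010054>;
.
"""


def build_post(turtle, service=ADD_METADATA):
    """Wrap a Turtle description in an HTTP POST request for the service."""
    return urllib.request.Request(
        service,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


# Sending it (needs network access; the response format is not described here):
# with urllib.request.urlopen(build_post(TURTLE)) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```

The same helper works for every "assert it in RDF and POST it" step on this page; only the Turtle changes.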

How to handle Level 2 (minimal): Missing download(s)

When validator reports: Missing download(s). Please provide a link to the download file(s) if available in the Downloads & Resources section, using one of the following formats: application/rdf+xml, text/turtle, application/x-ntriples, application/x-nquads, application/x-trig, text/n3.

Find the data dumps using:

prefix void: <http://rdfs.org/ns/void#> 

select distinct ?dump
where {
  ?dataset void:dataDump ?dump
}
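Each dump found by the query above still needs one of the format strings listed in the validator message when you add it as a CKAN resource. A minimal sketch that guesses the format from the file name; the extension-to-format table and the handling of compressed files are our assumptions, not rules defined by CKAN or the validator.

```python
import urllib.parse

# Download formats accepted by the validator (copied from its message above),
# keyed by file extension. The extension mapping itself is an assumption.
FORMAT_BY_EXTENSION = {
    ".rdf": "application/rdf+xml",
    ".ttl": "text/turtle",
    ".nt": "application/x-ntriples",
    ".nq": "application/x-nquads",
    ".trig": "application/x-trig",
    ".n3": "text/n3",
}

# Assumption: a compressed dump is labelled with the format of its content.
COMPRESSION_SUFFIXES = (".gz", ".bz2", ".zip")


def dump_format(url):
    """Guess the CKAN resource format for a dump URL, or None if unrecognised."""
    path = urllib.parse.urlparse(url).path.lower()
    for suffix in COMPRESSION_SUFFIXES:
        if path.endswith(suffix):
            path = path[: -len(suffix)]
            break
    for extension, fmt in FORMAT_BY_EXTENSION.items():
        if path.endswith(extension):
            return fmt
    return None
```

A `None` result means the dump needs its format chosen by hand.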

Then, click around the CKAN web UI:

Or, assert it in RDF and POST it to add-metadata.py:

@base  <http://datahub.io/> .

<dataset/twc-healthdata>
   a datafaqs:CKANDataset;

How to handle Level 3: Missing publisher information

When validator reports: Missing publisher information. Please provide a tag indicating if the dataset was published by the producer of the data, or by a third party. The tag should be one of: published-by-producer, published-by-third-party.

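@base  <http://datahub.io/> .
@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .
@prefix moat:     <http://moat-project.org/ns#> .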

<dataset/twc-healthdata>
   a datafaqs:CKANDataset;
   moat:taggedWithTag #<tag/published-by-producer>; 
                      <tag/published-by-third-party>; 
.

How to handle Level 3: Missing proprietary vocabulary information

When validator reports: Missing proprietary vocabulary information. Please provide a tag indicating if the dataset does not contain proprietary vocabulary terms, or if it contains proprietary terms, if they are dereferenceable or not. The tag should be one of: deref-vocab, no-deref-vocab, no-proprietary-vocab.

Then, click around the CKAN web UI:

Or, assert it in RDF and POST it to add-metadata.py:

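@base  <http://datahub.io/> .
@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .
@prefix moat:     <http://moat-project.org/ns#> .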

<dataset/twc-healthdata>
   a datafaqs:CKANDataset;
   moat:taggedWithTag #<tag/no-proprietary-vocab>;
                       <tag/no-deref-vocab>;
                      #<tag/deref-vocab>;
.
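The tag examples above rely on commenting out all but one tag in each family. A minimal sketch of a helper that enforces "exactly one tag per family" before producing the Turtle; the tag names are copied from the validator messages on this page, while the family labels (`publisher`, `vocabulary`, ...) and the helper itself are hypothetical. The output assumes the `@base` and `@prefix` declarations used in the examples.

```python
# Mutually exclusive tag families, copied from the validator messages on this page.
# The family names ("publisher", ...) are hypothetical labels for this sketch.
TAG_FAMILIES = {
    "publisher": {"published-by-producer", "published-by-third-party"},
    "vocabulary": {"deref-vocab", "no-deref-vocab", "no-proprietary-vocab"},
    "mappings": {"vocab-mappings", "no-vocab-mappings"},
    "provenance": {"provenance-metadata", "no-provenance-metadata"},
}


def tag_turtle(dataset, choices):
    """Render the chosen tags as one moat:taggedWithTag statement.

    `choices` maps a family name to a single tag, so a family can never
    receive two tags. Unknown families raise KeyError; wrong tags raise ValueError.
    """
    if not choices:
        raise ValueError("choose at least one tag")
    objects = []
    for family, tag in sorted(choices.items()):
        allowed = TAG_FAMILIES[family]
        if tag not in allowed:
            raise ValueError(f"{tag!r} is not one of {sorted(allowed)}")
        objects.append(f"<tag/{tag}>")
    return (f"<{dataset}>\n"
            f"   a datafaqs:CKANDataset;\n"
            f"   moat:taggedWithTag {', '.join(objects)};\n"
            f".\n")
```

Calling `tag_turtle("dataset/twc-healthdata", {"publisher": "published-by-third-party", "vocabulary": "no-deref-vocab"})` yields a statement with both tags as objects of a single `moat:taggedWithTag`.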

How to handle Level 3: Missing instance namespace

When validator reports: Missing instance namespace. Please provide the namespace used for instances of the dataset. For example, the namespace for DBpedia instances is http://dbpedia.org/resource/. This will be used to detect who links to your dataset.

Then, click around the CKAN web UI:

Or, assert it in RDF and POST it to add-metadata.py:

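@base  <http://datahub.io/> .
@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .
@prefix void:     <http://rdfs.org/ns/void#> .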

<dataset/twc-healthdata>
   a datafaqs:CKANDataset;
   void:uriSpace "http://purl.org/twc/health/";
.
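The validator says the instance namespace is used to detect who links to your dataset. Assuming that detection is a plain string-prefix match on URIs (the validator's actual method is not described here), a minimal sketch of counting links that point into the namespace from outside it:

```python
# Instance namespace from the example above.
URI_SPACE = "http://purl.org/twc/health/"


def in_uri_space(uri, uri_space=URI_SPACE):
    """Plain string-prefix test: does `uri` belong to the dataset's namespace?"""
    return uri.startswith(uri_space)


def inbound_links(triples, uri_space=URI_SPACE):
    """Count (s, p, o) triples whose subject is outside the namespace and object inside it."""
    return sum(1 for s, _p, o in triples
               if not in_uri_space(s, uri_space) and in_uri_space(o, uri_space))
```

A prefix that is too short (for example `http://purl.org/`) would sweep in other datasets, so the namespace should be as specific as your URIs allow.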

How to handle Level 3: Missing voiD or Semantic Sitemap

When validator reports: Missing voiD or Semantic Sitemap. Please provide a link to a voiD description or XML Sitemap if available in the Downloads & Resources section, using one of the following formats: meta/void, meta/sitemap.

Then, click around the CKAN web UI:

Or, assert it in RDF and POST it to add-metadata.py:

@base  <http://datahub.io/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://healthdata.tw.rpi.edu/void>
   a void:DatasetDescription;
   foaf:primaryTopic <dataset/twc-healthdata>;
.
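Instead of clicking through the web UI, the voiD link can also be added as a CKAN resource with format meta/void through CKAN's action API. A minimal sketch, assuming the instance exposes the version 3 action API at `/api/3/action/resource_create` and accepts your API key in the `Authorization` header; the package name `twc-healthdata` comes from the examples above and `API-KEY` is a placeholder. The request is built but not sent.

```python
import json
import urllib.request

CKAN = "http://datahub.io"  # assumption: the CKAN instance hosting the listing


def resource_create_request(package_id, url, fmt, api_key, name=None):
    """Build a POST to CKAN's resource_create action for one resource."""
    body = {"package_id": package_id, "url": url, "format": fmt}
    if name:
        body["name"] = name
    return urllib.request.Request(
        CKAN + "/api/3/action/resource_create",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


req = resource_create_request(
    "twc-healthdata", "http://healthdata.tw.rpi.edu/void", "meta/void", "API-KEY")
# urllib.request.urlopen(req) would send it (needs network access and a valid key).
```

The same call covers the other resource formats on this page, such as example/turtle or text/turtle.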

How to handle Level 3: Missing information on vocabulary mappings

When validator reports: Missing information on vocabulary mappings. Indicate whether mappings for proprietary vocabulary terms are provided (owl:equivalentClass, owl:equivalentProperty, rdfs:subClassOf, and/or rdfs:subPropertyOf links, or mappings expressed as RIF rules or using the R2R Mapping Language) by using the tag vocab-mappings. Use no-vocab-mappings otherwise.

Then, click around the CKAN web UI:

Or, assert it in RDF and POST it to add-metadata.py:

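@base  <http://datahub.io/> .
@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .
@prefix moat:     <http://moat-project.org/ns#> .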

<dataset/twc-healthdata>
   a datafaqs:CKANDataset;
   moat:taggedWithTag <tag/vocab-mappings>;
.

How to handle Level 3: Missing vocabularies used

When validator reports: Missing vocabularies used. Provide vocabularies used by the data set as tags, e.g. format-skos, format-foaf.

Find the vocabs used with:

prefix rdfs:       <http://www.w3.org/2000/01/rdf-schema#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>

select distinct ?vocab
where {
  {?dataset conversion:uses_predicate ?p . 
   ?p rdfs:isDefinedBy ?vocab
  }
  union {
   ?dataset conversion:uses_class ?c . 
   ?c rdfs:isDefinedBy ?vocab
  }
}
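The validator wants vocabularies as tags such as format-skos and format-foaf, while the query above returns namespace URIs. A minimal sketch of that translation; only format-skos and format-foaf appear in the validator message, so the prefix table and the other tag names are assumptions following the same "format-" pattern. Namespaces without a known prefix are returned separately for manual handling.

```python
# Hypothetical prefix table; extend it with the vocabularies your dataset uses.
PREFIXES = {
    "http://www.w3.org/2004/02/skos/core#": "skos",
    "http://xmlns.com/foaf/0.1/": "foaf",
    "http://purl.org/linked-data/cube#": "qb",
    "http://www.w3.org/ns/prov#": "prov",
    "http://www.w3.org/2002/07/owl#": "owl",
    "http://www.w3.org/2000/01/rdf-schema#": "rdfs",
}


def format_tags(vocabularies):
    """Return (sorted format-* tags, sorted namespaces with no known prefix)."""
    tags, unknown = set(), set()
    for ns in vocabularies:
        prefix = PREFIXES.get(ns)
        if prefix is None:
            unknown.add(ns)
        else:
            tags.add("format-" + prefix)
    return sorted(tags), sorted(unknown)
```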

Then, click around the CKAN web UI:

Or, assert it in RDF and POST it to add-metadata.py:

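@base  <http://datahub.io/> .
@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .
@prefix void:     <http://rdfs.org/ns/void#> .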

<dataset/twc-healthdata>
   a datafaqs:CKANDataset;
   void:vocabulary
                   <http://purl.org/linked-data/cube#>,
                   <http://purl.org/vocab/vann/>,
                   <http://usefulinc.com/ns/doap#>,
                   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>,
                   <http://www.w3.org/2000/01/rdf-schema#>,
                   <http://www.w3.org/2000/10/swap/pim/contact#>,
                   <http://www.w3.org/2002/07/owl#>,
                   <http://www.w3.org/2006/vcard/ns#>,
                   <http://www.w3.org/ns/prov#>,
                   <http://xmlns.com/foaf/0.1/>;
.

How to handle Level 3: Missing information on provenance metadata

When validator reports: Missing information on provenance metadata. Indicates whether the data set provides provenance meta-information (creator of the data set, creation date, maybe creation method) as document meta-information or via a voiD description. For instance, using the dc:creator or dc:date properties. Use the tag provenance-metadata / no-provenance-metadata.

Then, click around the CKAN web UI:

Or, assert it in RDF and POST it to add-metadata.py:

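@base  <http://datahub.io/> .
@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .
@prefix moat:     <http://moat-project.org/ns#> .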

<dataset/twc-healthdata>
   a datafaqs:CKANDataset;
   moat:taggedWithTag <tag/provenance-metadata>;
                      #<tag/no-provenance-metadata>;
.

How to handle Level 3: Missing mapping(s)

When validator reports: Missing mapping(s). If the data set provides vocabulary mappings to other vocabularies, provide a link to the mapping file in the Resources section, using the following format: mapping/. Replace with the mapping/rule language used, like R2R or RIF.

TODO: determine query that will give the eparam files:

<http://purl.org/twc/health/source/hub-healthdata-gov/dataset/food-recalls/version/2012-May-08/conversion/enhancement/1>
   a conversion:LayerDataset, void:Dataset;
   conversion:conversion_process [
      a conversion:EnhancementConversionProcess;

Then, click around the CKAN web UI:

Or, assert it in RDF and POST it to add-metadata.py:

@base  <http://datahub.io/> .

<dataset/twc-healthdata>
   a datafaqs:CKANDataset;
   TODO
.

How to handle Level N: next

When validator reports: TODO

Then, click around the CKAN web UI:

Or, assert it in RDF and POST it to add-metadata.py:

@base  <http://datahub.io/> .

<dataset/twc-healthdata>
   a datafaqs:CKANDataset;
   TODO
.

Old

Using an RDF vocabulary to describe CKAN lodcloud dataset listings, and using SADI services to produce and consume those descriptions so that they can be listed on CKAN through the CKAN API.

Describing Linked Data using Linked Data

Although this LOD Cloud metadata would be naturally described in RDF, using CKAN to list the bubbles requires that we contort the descriptions into the schema that CKAN uses to describe their datasets.

Adding all descriptions to ArrayExpress experiment E-AFMX-1 "from scratch"

For the URI http://purl.org/twc/arrayexpress/E-AFMX-1, use the Data Hub local identifier arrayexpress-e-afmx-1, which yields the URI http://thedatahub.org/en/dataset/arrayexpress-e-afmx-1
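The identifier in this example appears to be the last two path segments of the source URI, lowercased and joined with a hyphen. A minimal sketch of that rule, inferred from this single example and therefore hypothetical; it also enforces what we understand CKAN's name constraints to be (2 to 100 characters drawn from lowercase letters, digits, `-` and `_`).

```python
import re
import urllib.parse

DATASET_BASE = "http://thedatahub.org/en/dataset/"  # from the example above


def ckan_name(uri, segments=2):
    """Hypothetical rule: join the last `segments` path segments, lowercased."""
    parts = [p for p in urllib.parse.urlparse(uri).path.split("/") if p]
    name = "-".join(parts[-segments:]).lower()
    # Replace anything outside CKAN's allowed characters with a hyphen.
    name = re.sub(r"[^a-z0-9_-]+", "-", name).strip("-")
    if not 2 <= len(name) <= 100:
        raise ValueError(f"cannot derive a valid CKAN name from {uri!r}")
    return name


name = ckan_name("http://purl.org/twc/arrayexpress/E-AFMX-1")
dataset_uri = DATASET_BASE + name
```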

Step 1: add to a group

POST this to http://sparql.tw.rpi.edu/services/datafaqs/ckan/add-metadata:


What's next
