The goal of this project is to provide reliable, high-quality search over RDF Schemas and OWL Ontologies:
- Search for classes, properties and vocabularies
- Index based on ElasticSearch and Lucene
- Bootstrap the index from a Linked Open Vocabularies (LOV) dump
- Load individual RDFS and OWL files into the index
- Search using the RESTful ElasticSearch API
This is how you get an index up and running, and filled with data.
The recommended way to install ElasticSearch on OS X is with Homebrew. After Homebrew is set up and configured, simply run:
brew install elasticsearch
To do: Add instructions for other operating systems...
For development use, the easiest way to start ElasticSearch is with the provided configuration file:
elasticsearch -f -D es.config=elasticsearch.yml
The -f flag starts ElasticSearch in the foreground so you can stop it with Ctrl+C.
The -D option instructs ElasticSearch to use the elasticsearch.yml configuration file. This configuration places data and logs into a subdirectory elasticsearch within this repository. For production use, you may want to use a different setup.
You need Maven. Install it if necessary (brew install maven on OS X).
mvn package
This compiles and assembles the command-line app. The result is two things:
- A gzipped version of the command-line app is generated in target/vocidex-cli.tar.gz and can be deployed wherever you like
- An uncompressed version of the app is in target/vocidex-cli/vocidex and can be used directly
From inside the generated app's directory, the command-line tools can be run by invoking bin/appname.
# go to CLI build dir
cd target/vocidex-cli/vocidex
# Download LOV N-Quads dump as lov_aggregator.nq, takes a while
curl -o lov_aggregator.nq http://lov.okfn.org/dataset/lov/agg/lov_aggregator.rdf
# Load it, takes a while
bin/index-lov elasticsearch localhost lov lov_aggregator.nq
curl 'http://localhost:9200/lov/class,property,vocabulary/_search?q=test&pretty=1'
If this returns a longish JSON response, all is good.
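As a more precise check than eyeballing the response length, the hit count can be read from the JSON. The following is a minimal sketch, not part of Vocidex: the helper name `count_hits` is hypothetical, and the sample response is hand-written rather than real LOV output. It assumes the usual ElasticSearch response shape, where `hits.total` is an integer in older releases and an object with a `value` key in newer ones.

```python
import json

def count_hits(response_text):
    """Return the number of matching documents reported in a search response."""
    response = json.loads(response_text)
    total = response["hits"]["total"]
    # Older ElasticSearch releases report an integer here; newer ones
    # report an object such as {"value": 3, "relation": "eq"}.
    if isinstance(total, dict):
        return total["value"]
    return total

# Hand-written sample shaped like a search response (not real LOV output).
sample = '{"took": 5, "hits": {"total": 2, "hits": [{"_type": "class"}, {"_type": "property"}]}}'
print(count_hits(sample))
```

A count greater than zero for a common keyword suggests the index was populated.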
The create-index tool connects to an ElasticSearch cluster and initializes a new index for use with Vocidex. To see its syntax:
bin/create-index
Example invocation:
# Adds an index called 'lov' on the 'elasticsearch' cluster
bin/create-index elasticsearch localhost lov
The add-vocabulary tool reads an RDFS or OWL file and indexes any terms defined therein in an ElasticSearch index. To see its syntax:
bin/add-vocabulary
Example invocation:
# Indexes SKOS into the 'skos' index on the 'elasticsearch' cluster
bin/add-vocabulary elasticsearch localhost skos http://www.w3.org/2004/02/skos/core
The index-lov tool populates an ElasticSearch index with the contents of the Linked Open Vocabularies dump. The dump can be obtained from the LOV site (the example invocation below uses its download URL). The file needs to be downloaded and its extension changed to .nq, because otherwise Jena gets confused. It really is an N-Quads file, not an RDF/XML file. To see the tool's syntax:
bin/index-lov
Example invocation:
# Download LOV dump with right name
curl -o lov_aggregator.nq http://lov.okfn.org/dataset/lov/agg/lov_aggregator.rdf
# Indexes the dump into an index called 'lov' on the 'elasticsearch' cluster
bin/index-lov elasticsearch localhost lov lov_aggregator.nq
Once the ElasticSearch index is populated, the standard REST-based ElasticSearch APIs can be used to run searches.
The following example searches for classes, properties and vocabularies in the lov index, using the keyword test:
curl 'http://localhost:9200/lov/class,property,vocabulary/_search?q=test&pretty=1'
Equivalent to:
curl -XPOST 'http://localhost:9200/lov/class,property,vocabulary/_search?pretty=1' -d '{"query":{"match":{"_all":"test"}}}'
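The same request can be assembled programmatically. This is a minimal sketch using only the Python standard library; the function name `build_search` and its parameters are hypothetical, and it only builds the URL and body shown in the curl examples above without sending anything.

```python
import json
from urllib.parse import quote, urlencode

def build_search(index, types, keyword, host="localhost", port=9200):
    """Return (url, body) for a keyword search over the given document types."""
    # Types are joined with commas, as in /lov/class,property,vocabulary/_search
    type_list = ",".join(quote(t) for t in types)
    path = "/".join([quote(index), type_list, "_search"])
    url = "http://%s:%d/%s?%s" % (host, port, path, urlencode({"pretty": 1}))
    # Same body as the -d argument of the curl -XPOST example.
    body = json.dumps({"query": {"match": {"_all": keyword}}})
    return url, body

url, body = build_search("lov", ["class", "property", "vocabulary"], "test")
print(url)
print(body)
```

The returned pair can be passed to any HTTP client as a POST request.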
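Autocomplete is supported through the *.autocomplete fields, which are pre-tokenized at index time (using edge_ngram [1;100]) and indexed. The following example looks up classes and properties whose prefixed name or URI starts with foaf: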
curl -XPOST 'http://localhost:9200/lov/class,property/_search?pretty=1' -d '{
  "fields" : ["uri", "prefixed", "localName"],
  "query" : {
    "multi_match" : {
      "query": "foaf:",
      "fields": ["prefixed.autocomplete","uri.autocomplete"],
      "type" : "match_phrase"
    }
  }
}'
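To see why a query such as foaf: can match foaf:Person, it helps to look at what edge n-gram tokenization produces. The sketch below is an illustration in plain Python, not the ElasticSearch analyzer itself: the function `edge_ngrams` is hypothetical, it treats the [1;100] range as minimum and maximum gram length (an assumption), and it ignores any lowercasing or other filters the real index configuration may apply.

```python
def edge_ngrams(text, min_gram=1, max_gram=100):
    """Return the prefixes of text with lengths from min_gram to max_gram."""
    longest = min(len(text), max_gram)
    return [text[:n] for n in range(min_gram, longest + 1)]

grams = edge_ngrams("foaf:Person")
print(grams[:5])          # ['f', 'fo', 'foa', 'foaf', 'foaf:']
print("foaf:" in grams)   # True: a typed prefix matches an indexed gram
print("Person" in grams)  # False: only leading substrings are indexed
```

Because every leading substring is stored as its own token, a partially typed value can be matched with an ordinary term lookup instead of a slower wildcard query.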
Initializing Eclipse files:
mvn eclipse:eclipse -DdownloadSources -DdownloadJavadocs
Running the tests:
mvn test
Use the issue tracker to discuss stuff, and feel free to submit pull requests.
Vocidex works by creating a JSON document for each entity to be indexed (classes, properties, datatypes, vocabularies), and putting them into an ElasticSearch index. Here we document the structure of these JSON documents.
Note: “term array” is a JSON array of objects, each with uri and label keys.
Term documents (classes, properties and datatypes) share the following keys:
- type: class, property, or datatype
- uri: absolute URI
- uri.autocomplete: edge_ngram tokenized for autocomplete over uri
- prefix: namespace prefix, either provided by LOV or manually at index time; may be absent
- localName: part after the last hash/slash
- localName.autocomplete: edge_ngram tokenized for autocomplete over localName
- prefixed: prefixed name (e.g., foaf:Person), or absent if no prefix
- prefixed.autocomplete: edge_ngram tokenized for autocomplete over prefixed
- label: rdfs:label or similar property, or a string synthesized from the local name
- comment: rdfs:comment or similar property; may be absent
- vocabulary: LOV metadata about the vocabulary; may be absent
  - uri
  - prefix
  - label
  - homepage
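The relationship between uri, localName, prefix and prefixed can be sketched in a few lines. This is an illustration of the key descriptions above, not the actual Vocidex code: the function names `split_uri` and `term_document` are hypothetical, the label fallback simply reuses the local name (how Vocidex synthesizes labels is not specified here), and the *.autocomplete keys are left out on the assumption that the index derives them from the base fields.

```python
def split_uri(uri):
    """Split a URI after its last hash or slash into (namespace, local name)."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]

def term_document(term_type, uri, prefix=None, label=None, comment=None):
    """Build a dict with the common term keys; optional keys are omitted if unknown."""
    _, local_name = split_uri(uri)
    doc = {
        "type": term_type,
        "uri": uri,
        "localName": local_name,
        # Assumption: fall back to the local name when no label is given.
        "label": label if label is not None else local_name,
    }
    if prefix is not None:
        doc["prefix"] = prefix
        doc["prefixed"] = prefix + ":" + local_name
    if comment is not None:
        doc["comment"] = comment
    return doc

doc = term_document("class", "http://xmlns.com/foaf/0.1/Person", prefix="foaf")
print(doc["localName"], doc["prefixed"])
```

Omitting prefix leaves both prefix and prefixed out of the document, which mirrors the "may be absent" wording in the key list.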
Class documents have the term keys listed above, plus:
- superclasses: term array
- disjointClasses: term array
- equivalentClasses: term array
Property documents have the term keys listed above, plus:
- domains: term array
- ranges: term array; each member also has either an isDatatype or isClass field with value true
- superproperties: term array
- inverseProperties: term array
- equivalentProperties: term array
- isAnnotationProperty: boolean
- isObjectProperty: boolean
- isDatatypeProperty: boolean
- isFunctionalProperty: boolean
- isInverseFunctionalProperty: boolean
- isTransitiveProperty: boolean
- isSymmetricProperty: boolean
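For orientation, here is what a property document with term arrays might look like, written as a Python dict. This is a hand-written illustration for foaf:knows, not output captured from Vocidex: the subset of keys shown, and whether false booleans are stored or omitted, are assumptions. The helper `is_term_array` is hypothetical and only checks the "term array" shape described in the note above.

```python
def is_term_array(value):
    """True if value is a list of objects that each carry 'uri' and 'label'."""
    return isinstance(value, list) and all(
        isinstance(item, dict) and "uri" in item and "label" in item
        for item in value
    )

person = {"uri": "http://xmlns.com/foaf/0.1/Person", "label": "Person"}

# Illustrative document only; real documents may carry more keys.
knows = {
    "type": "property",
    "uri": "http://xmlns.com/foaf/0.1/knows",
    "prefix": "foaf",
    "localName": "knows",
    "prefixed": "foaf:knows",
    "label": "knows",
    "domains": [person],
    # Members of ranges are marked as either a class or a datatype.
    "ranges": [dict(person, isClass=True)],
    "isObjectProperty": True,
}

print(is_term_array(knows["domains"]), is_term_array(knows["ranges"]))
```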
Datatype documents have only the term keys listed above.
Vocabulary documents have the following keys:
- type: vocabulary
- uri: absolute URI as per LOV
- uri.autocomplete: edge_ngram tokenized for autocomplete over uri
- prefix: conventional prefix as per LOV
- prefix.autocomplete: edge_ngram tokenized for autocomplete over prefix
- label: as for terms
- shortLabel: curated short-form label as per LOV; may be absent
- comment: as for terms
- homepage: URL from LOV metadata; may be absent