DATAFAQS environment variables

What's first

What we'll cover

This page describes each shell environment variable, how it is used, and how it should be set to effect changes in how DataFAQs is deployed or behaves.

Let's get to it

The df- scripts used to manage the FAqT Brick are guided by shell environment variables. This page lists the variables that can influence the df- scripts. df-vars.sh will show you all of the variables that can affect operations, along with their current values and the defaults used when they are not set.

CSV2RDF4LOD_HOME                                      /opt/csv2rdf4lod-automation
DATAFAQS_HOME                                         /opt/DataFAQs
DATAFAQS_BASE_URI                                     !!! -- MUST BE SET -- !!! source datafaqs-source-me.sh
  
DATAFAQS_LOG_DIR                                      (not required)
  
DATAFAQS_PUBLISH_METADATA_GRAPH_NAME                  (will default to: http://www.w3.org/ns/sparql-servi...)
DATAFAQS_PUBLISH_TDB                                  (will default to: false)
DATAFAQS_PUBLISH_TDB_DIR                              (will default to: VVV/publish/tdb/)
  
DATAFAQS_PUBLISH_VIRTUOSO                             (will default to: false)
CSV2RDF4LOD_CONVERT_DATA_ROOT                         (not required, but vload will copy files when loading)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME                     (will default to: /opt/virtuoso)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH                (will default to: /opt/virtuoso/bin/isql)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT                     (will default to: 1111)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME                 (will default to: dba)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD                 (will default to: dba)
  
CSV2RDF4LOD_CONCURRENCY                               (will default to: 1)
  
see documentation for variables in: https://github.com/timrdf/DataFAQs/wiki/DATAFAQS-environment-variables

DATAFAQS_BASE_URI

Any URI that DataFAQs creates will be situated within DATAFAQS_BASE_URI, rooted under $DATAFAQS_BASE_URI/datafaqs/.

For example, if DATAFAQS_BASE_URI is set at the machine level, e.g. to http://sparql.tw.rpi.edu, URIs such as http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1 will be created.

export DATAFAQS_BASE_URI='http://sparql.tw.rpi.edu'

void:dataDumps of the RDF graphs collected by DataFAQS will be placed under $DATAFAQS_BASE_URI/datafaqs/dumps/.

For another example, when the DataFAQs node will be at http://aquarius.tw.rpi.edu/projects/datafaqs/, use:

export DATAFAQS_BASE_URI='http://aquarius.tw.rpi.edu/projects'

Places in the implementation that use DATAFAQS_BASE_URI:

  • Used by bin/df-epoch.sh to name named graphs and data dumps (e.g. <$DATAFAQS_BASE_URI/datafaqs/epoch/$epoch/config/faqt-services> and void:dataDump <{{DATAFAQS_BASE_URI}}/datafaqs/dump/{{DUMP}}>)
  • Used by bin/df-epoch-metadata.py to determine URIs of graphs, named graphs, epochs, faqt services, datasets, etc. (e.g. <{{DATAFAQS_BASE_URI}}/datafaqs/epoch/{{EPOCH}}>)
  • Used by packages/faqt.python/faqt/faqt.py to self-report provenance of any FAqT Service that extends this superclass (see example below). Note that this environment variable must be accessible to the service via mod_env (e.g. PassEnv, SetEnv, or /etc/apache2/envvars) if the services are deployed using mod_python; a configuration sketch follows the example below.
# curl http://lod.melagrid.org/services/sadi/faqt/vocabulary/uses/prov

<#> a mygrid:serviceDescription;
    rdfs:label "prov";
    dcterms:subject <>;
...
<#attribution-afe7ae56-7483-11e2-8185-96c47a284d2a>
    a prov:Attribution;
    prov:agent   <http://howdy.melagrid.org/datafaqs/services/sadi/faqt/vocabulary/uses/prov>; # ADDED (was <>)
    prov:hadPlan <https://raw.github.com/timrdf/DataFAQs/master/services/sadi/faqt/vocabulary/uses/prov.py> .
...
<http://howdy.melagrid.org/datafaqs/services/sadi/faqt/vocabulary/uses/prov> # ADDED (was <>)
   a datafaqs:FAqTService,
     prov:Agent, foaf:Agent;                                                 # ADDED
   rdfs:seeAlso  <https://github.com/timrdf/DataFAQs/wiki/FAqT-Service>;     # ADDED
   prov:generatedAtTime "2013-02-11T19:46:11.527095"^^xsd:dateTime .         # ADDED
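
The mod_env hook mentioned above might look like the following in the Apache site configuration (a sketch; the file path and base URI are illustrative):

# e.g. in /etc/apache2/sites-available/datafaqs (path varies by deployment):
SetEnv DATAFAQS_BASE_URI http://sparql.tw.rpi.edu

# or, to pass through a value already exported in /etc/apache2/envvars:
PassEnv DATAFAQS_BASE_URI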

DATAFAQS_PUBLISH_THROUGHOUT_EPOCH

When creating a new epoch with $DATAFAQS_HOME/bin/df-epoch, if DATAFAQS_PUBLISH_THROUGHOUT_EPOCH is set to true, the RDF will be loaded into the triple store as it is produced. Loading at each step in the epoch allows clients to monitor the development of an epoch from the web.

export DATAFAQS_PUBLISH_THROUGHOUT_EPOCH='true'

Note that either $DATAFAQS_PUBLISH_TDB or $DATAFAQS_PUBLISH_VIRTUOSO must be true for this to take effect.

DATAFAQS_PUBLISH_METADATA_GRAPH_NAME

Defaults to http://www.w3.org/ns/sparql-service-description#NamedGraph:

export DATAFAQS_PUBLISH_METADATA_GRAPH_NAME='http://www.w3.org/ns/sparql-service-description#NamedGraph'

DATAFAQS_PUBLISH_TDB
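
Defaults to false. A minimal sketch enabling it, with an illustrative directory (DATAFAQS_PUBLISH_TDB_DIR otherwise takes the default shown in the df-vars.sh listing above):

export DATAFAQS_PUBLISH_TDB='true'
export DATAFAQS_PUBLISH_TDB_DIR=`pwd`/publish/tdb/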

DATAFAQS_PUBLISH_VIRTUOSO

If DATAFAQS_PUBLISH_VIRTUOSO is true, $DATAFAQS_HOME/bin/df-load-triple-store.sh will load the Virtuoso triple store using the settings:

  • CSV2RDF4LOD_CONVERT_DATA_ROOT
  • CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT
  • CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH
  • CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME
  • CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD
  • CSV2RDF4LOD_CONCURRENCY (defaults to 1; the number of threads used to load triples)

This reuses the settings from csv2rdf4lod-automation.
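
For example, to publish to a local Virtuoso instance with the stock settings (these values mirror the defaults shown by df-vars.sh above; change the password on any real installation):

export DATAFAQS_PUBLISH_VIRTUOSO='true'
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME='/opt/virtuoso'
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH='/opt/virtuoso/bin/isql'
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT='1111'
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME='dba'
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD='dba'
export CSV2RDF4LOD_CONCURRENCY='1'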

DATAFAQS_PUBLISH_ALLEGROGRAPH

This is not yet implemented.

DATAFAQS_PUBLISH_SESAME

Useful bits for installing Sesame Server in Tomcat:

Done once within console.sh:

> connect http://localhost:8080/openrdf-sesame .
> create native .
Please specify values for the following variables:
Repository ID [native]:          spo-balance    
Repository title [Native store]: spo-balance
Triple indexes [spoc,posc]:      spoc,posc,cspo,cpos,cops,cosp,opsc,ospc
Repository created

Done for each file/graph to load. The commands live in two script files, shown here via cat:

cat load.sc clear.sc
connect http://localhost:8080/openrdf-sesame.
open spo-balance.
load input_112b6d7443852b15aa3153fa41d7ebf3.rdf into http://xmlns.com/foaf/0.1 .
exit .

connect http://localhost:8080/openrdf-sesame.
open spo-balance.
clear http://xmlns.com/foaf/0.1 .
exit .
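
One way to run these without an interactive session is to feed each script to the Sesame console on standard input (assuming console.sh reads commands from stdin; $SESAME_HOME is an illustrative name for your Sesame installation directory):

$SESAME_HOME/bin/console.sh < load.sc
$SESAME_HOME/bin/console.sh < clear.sc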

DATAFAQS_LOG_DIR

If you don't care about the logs, point it at /dev/null; if you want to poke around in them, use the current directory:

export DATAFAQS_LOG_DIR=/dev/null

export DATAFAQS_LOG_DIR=`pwd`

DATAFAQS_PROVENANCE_CODE_RAW_BASE and DATAFAQS_PROVENANCE_CODE_PAGE_BASE

  • DATAFAQS_PROVENANCE_CODE_RAW_BASE is the base URI for the version-controlled source code of the FAqT Service. This environment variable is accessed by the superclass of all FAqT Services (faqt.py in the faqt.python package) to provide PROV-O assertions about the services.
  • DATAFAQS_PROVENANCE_CODE_PAGE_BASE is the base URI for the page about the source code (as opposed to the source code itself). It is also used by the faqt.python package to provide PROV-O assertions about the services.
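
For example, values consistent with the prov:hadPlan URI in the example above (illustrative; point them at your own fork or branch as appropriate):

export DATAFAQS_PROVENANCE_CODE_RAW_BASE='https://raw.github.com/timrdf/DataFAQs/master'
export DATAFAQS_PROVENANCE_CODE_PAGE_BASE='https://github.com/timrdf/DataFAQs/blob/master'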

CSV2RDF4LOD_HOME
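
Points to the local csv2rdf4lod-automation installation, whose settings DataFAQs reuses when publishing (see above). For example, matching the value shown by df-vars.sh:

export CSV2RDF4LOD_HOME='/opt/csv2rdf4lod-automation'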

TDBROOT
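
The environment variable that Jena TDB's command-line tools use to locate the TDB installation; presumably needed when DATAFAQS_PUBLISH_TDB is enabled. An illustrative value:

export TDBROOT='/opt/jena/TDB'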

X_CKAN_API_Key

CKAN requires an API key to POST changes, and so the python ckanclient follows suit. The FAqT Services access the key from the environment variable X_CKAN_API_Key, so that the key is not hard-coded into the service implementations.

The python services access the environment variable using code similar to:

import os
import sys
import ckanclient

# Instantiate the CKAN client; fail fast if the API key is absent or empty.
key = os.environ.get('X_CKAN_API_Key', '')
if len(key) <= 1:
   print 'ERROR: https://github.com/timrdf/DataFAQs/wiki/Missing-CKAN-API-Key'
   sys.exit(1)
ckan = ckanclient.CkanClient(base_location='http://datahub.io'+'/api', api_key=key)
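
Set the key in the environment before invoking the services (placeholder value shown):

export X_CKAN_API_Key='<your-datahub-api-key>'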

Developer notes:

What's next
