
# DATAFAQS environment variables


### What's first

### What we'll cover

We'll discuss each shell environment variable, how it is used, and how it should be set to effect changes in how DataFAQs is deployed or behaves.

### Let's get to it

The `df-` scripts used to manage the FAqT Brick are guided by shell environment variables. This page lists the variables that can influence the `df-` scripts. Running `df-vars.sh` will show all of the variables that can affect operations, along with their current values and their defaults when not set.

```
CSV2RDF4LOD_HOME                                      /opt/csv2rdf4lod-automation
DATAFAQS_HOME                                         /opt/DataFAQs
DATAFAQS_BASE_URI                                     !!! -- MUST BE SET -- !!! source datafaqs-source-me.sh

DATAFAQS_LOG_DIR                                      (not required)

DATAFAQS_PUBLISH_METADATA_GRAPH_NAME                  (will default to: http://www.w3.org/ns/sparql-servi...)
DATAFAQS_PUBLISH_TDB                                  (will default to: false)
DATAFAQS_PUBLISH_TDB_DIR                              (will default to: VVV/publish/tdb/)

DATAFAQS_PUBLISH_VIRTUOSO                             (will default to: false)
CSV2RDF4LOD_CONVERT_DATA_ROOT                         (not required, but vload will copy files when loading)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME                     (will default to: /opt/virtuoso)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH                (will default to: /opt/virtuoso/bin/isql)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT                     (will default to: 1111)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME                 (will default to: dba)
CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD                 (will default to: dba)

CSV2RDF4LOD_CONCURRENCY                               (will default to: 1)

see documentation for variables in: https://github.com/timrdf/DataFAQs/wiki/DATAFAQS-environment-variables
```

### DATAFAQS_BASE_URI

Any URI that DataFAQs creates will be situated within DATAFAQS_BASE_URI, rooted under $DATAFAQS_BASE_URI/datafaqs/.

For example, if DATAFAQS_BASE_URI is set at the machine level to http://sparql.tw.rpi.edu, URIs like http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1 will be created.

```
export DATAFAQS_BASE_URI='http://sparql.tw.rpi.edu'
```

The void:dataDumps of the RDF graphs collected by DataFAQs will be placed under $DATAFAQS_BASE_URI/datafaqs/dumps/.

For another example, when the DataFAQs node will be at http://aquarius.tw.rpi.edu/projects/datafaqs/, use:

```
export DATAFAQS_BASE_URI='http://aquarius.tw.rpi.edu/projects'
```

Places in the implementation that use DATAFAQS_BASE_URI:

* Used by bin/df-epoch.sh to name named graphs (e.g. `<$DATAFAQS_BASE_URI/datafaqs/epoch/$epoch/config/faqt-services>` and the void:dataDump `<{{DATAFAQS_BASE_URI}}/datafaqs/dump/{{DUMP}}>`)
* Used by bin/df-epoch-metadata.py to determine the URIs of graphs, named graphs, epochs, FAqT services, datasets, etc. (e.g. `<{{DATAFAQS_BASE_URI}}/datafaqs/epoch/{{EPOCH}}>`)
* Used by packages/faqt.python/faqt/faqt.py to self-report some provenance of any FAqT Service.
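
A minimal sketch of how these URIs compose, assuming bash (the `epoch` value and the FAqT index `1` are illustrative, taken from the example above):

```bash
export DATAFAQS_BASE_URI='http://sparql.tw.rpi.edu'

epoch='2012-01-19'                                   # illustrative epoch identifier
epoch_uri="$DATAFAQS_BASE_URI/datafaqs/epoch/$epoch" # .../datafaqs/epoch/2012-01-19
faqt_uri="$epoch_uri/faqt/1"                         # .../datafaqs/epoch/2012-01-19/faqt/1
config_graph="$epoch_uri/config/faqt-services"       # named graph used by df-epoch.sh

echo "$faqt_uri"   # http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1
```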

### DATAFAQS_PUBLISH_THROUGHOUT_EPOCH

When creating a new epoch with $DATAFAQS_HOME/bin/df-epoch, if DATAFAQS_PUBLISH_THROUGHOUT_EPOCH is set to true, the RDF will be loaded into the triple store as it is produced. Loading at each step of the epoch allows clients to monitor the development of the epoch from the web.

```
export DATAFAQS_PUBLISH_THROUGHOUT_EPOCH='true'
```

Note that either $DATAFAQS_PUBLISH_TDB or $DATAFAQS_PUBLISH_VIRTUOSO must be true for this to take effect.
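
A minimal bash sketch of that guard (illustrative only, not the actual df-epoch code):

```bash
# Publish intermediate RDF mid-epoch only when explicitly enabled
# and at least one triple-store target is configured.
if [ "$DATAFAQS_PUBLISH_THROUGHOUT_EPOCH" = "true" ]; then
   if [ "$DATAFAQS_PUBLISH_TDB" = "true" ] || [ "$DATAFAQS_PUBLISH_VIRTUOSO" = "true" ]; then
      : # ... load the RDF produced so far, e.g. with df-load-triple-store.sh ...
   fi
fi
```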

### DATAFAQS_PUBLISH_METADATA_GRAPH_NAME

Defaults to `http://www.w3.org/ns/sparql-service-description#NamedGraph`:

```
export DATAFAQS_PUBLISH_METADATA_GRAPH_NAME='http://www.w3.org/ns/sparql-service-description#NamedGraph'
```

### DATAFAQS_PUBLISH_TDB
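
`df-vars.sh` above reports that this defaults to `false`, with `DATAFAQS_PUBLISH_TDB_DIR` defaulting to `VVV/publish/tdb/`. A sketch of enabling it; the directory below is an assumed example, not a confirmed default:

```bash
export DATAFAQS_PUBLISH_TDB='true'
export DATAFAQS_PUBLISH_TDB_DIR='/opt/DataFAQs/publish/tdb/'   # assumed path; run df-vars.sh to see your default
```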

### DATAFAQS_PUBLISH_VIRTUOSO

If DATAFAQS_PUBLISH_VIRTUOSO is true, $DATAFAQS_HOME/bin/df-load-triple-store.sh will load the Virtuoso triple store using the following settings:

* CSV2RDF4LOD_CONVERT_DATA_ROOT
* CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT
* CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH
* CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME
* CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD
* CSV2RDF4LOD_CONCURRENCY (defaults to 1; the number of threads used to load triples)

This reuses the settings from csv2rdf4lod-automation.
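
For example, publishing to a local Virtuoso with the stock defaults reported by `df-vars.sh` above:

```bash
export DATAFAQS_PUBLISH_VIRTUOSO='true'
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME='/opt/virtuoso'
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH='/opt/virtuoso/bin/isql'
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT='1111'
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_USERNAME='dba'
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD='dba'
export CSV2RDF4LOD_CONCURRENCY='1'   # number of loading threads; 1 is the default
```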

### DATAFAQS_PUBLISH_ALLEGROGRAPH

This is not implemented.

### DATAFAQS_PUBLISH_SESAME

* `DATAFAQS_PUBLISH_SESAME_HOME='~/utilities/sesame/openrdf-sesame-2.6.10/'`
* `DATAFAQS_PUBLISH_SESAME_SERVER='http://localhost:8080/openrdf-sesame'`
* `DATAFAQS_PUBLISH_SESAME_REPOSITORY_ID='spo-balance'`

The load.sc and clear.sc scripts given to the Sesame console:

```
$ cat load.sc
connect http://localhost:8080/openrdf-sesame.
open spo-balance.
load input_112b6d7443852b15aa3153fa41d7ebf3.rdf into http://xmlns.com/foaf/0.1 .
exit .

$ cat clear.sc
connect http://localhost:8080/openrdf-sesame.
open spo-balance.
clear http://xmlns.com/foaf/0.1 .
exit .
```

Creating the spo-balance repository from the Sesame console:

```
> create native.
Please specify values for the following variables:
Repository ID [native]: spo-balance
Repository title [Native store]: spo-balance
Triple indexes [spoc,posc]: spoc,posc
Repository created
```
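
A sketch of running those scripts, assuming the Sesame SDK's `bin/console.sh` (which reads its commands from standard input):

```bash
# Load the dump into the spo-balance repository, then clear the graph later.
$DATAFAQS_PUBLISH_SESAME_HOME/bin/console.sh < load.sc
$DATAFAQS_PUBLISH_SESAME_HOME/bin/console.sh < clear.sc
```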


### DATAFAQS_LOG_DIR

The following scripts write their logs to $DATAFAQS_LOG_DIR:

* [df-epoch.sh](https://github.com/timrdf/DataFAQs/blob/master/bin/df-epoch.sh)
* [df-load-triple-store.sh](https://github.com/timrdf/DataFAQs/blob/master/bin/df-load-triple-store.sh)
* [df-clear-triple-store.sh](https://github.com/timrdf/DataFAQs/blob/master/bin/df-clear-triple-store.sh)

If you don't care about the logs, set it to `/dev/null`; if you want to poke around, use ``export DATAFAQS_LOG_DIR=`pwd` ``.
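
Both settings as exports:

```bash
export DATAFAQS_LOG_DIR=/dev/null   # discard the logs entirely
export DATAFAQS_LOG_DIR=`pwd`       # or: collect the logs in the current directory
```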

### DATAFAQS_PROVENANCE_CODE_RAW_BASE

The base URI for the version-controlled source code of the FAqT Services.
Used in [[faqt.python package]] to provide PROV-O assertions about the services.

### DATAFAQS_PROVENANCE_CODE_PAGE_BASE

See DATAFAQS_PROVENANCE_CODE_RAW_BASE. This is the base URI for the human-readable page presenting the code (not the raw source code).
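
Assumed example values, following GitHub's raw vs. blob URL conventions for the DataFAQs repository (these exact values are not confirmed by this page):

```bash
export DATAFAQS_PROVENANCE_CODE_RAW_BASE='https://raw.github.com/timrdf/DataFAQs/master'    # assumed
export DATAFAQS_PROVENANCE_CODE_PAGE_BASE='https://github.com/timrdf/DataFAQs/blob/master'  # assumed
```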

### CSV2RDF4LOD_HOME
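
`df-vars.sh` above reports a default of `/opt/csv2rdf4lod-automation`:

```bash
export CSV2RDF4LOD_HOME='/opt/csv2rdf4lod-automation'
```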

### TDBROOT
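
TDBROOT is Jena TDB's conventional variable for its installation directory; the path below is an assumed example:

```bash
export TDBROOT='/opt/jena/TDB'     # assumed install location
export PATH="$PATH:$TDBROOT/bin"   # puts tdbloader and the other TDB tools on the PATH
```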

### X_CKAN_API_Key

The ckanclient module (http://pypi.python.org/pypi/ckanclient) requires an API key. The FAqT Services access the key from this environment variable.
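
For example (the value below is a placeholder, not a real key):

```bash
export X_CKAN_API_Key='00000000-0000-0000-0000-000000000000'   # placeholder; use your own CKAN API key
```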

### What's next

* [[Installing DataFAQs]]
* [[Prototype deployment details]]
* [[Sample FAqT deployment]]