ai-data-importer-service

A docker container to load data from a triplestore or csv and convert it to json for easy use in AI algorithms.

Info

The data inporter service handles the loading and converting of data in both csv as SPARQL format. Depending on the data-source, the data is converted to a uniform datastructures that is easily usable for AI algorithms. If the source is a csv file, the file is loaded and parsed. In the case of a SPARQL query, the query is executed on a triple store after which the data is converted to the same format.

Getting started

services:
  ai-data-importer:
    image: redpencil/ai-data-importer
    links:
      - db:database
    environment:
      LOG_LEVEL: "debug"
      MODE: "production"
      ACCEPT_FILENAME: "true"
      ACCEPT_OPTIONS: "true"
      MU_SPARQL_ENDPOINT: "https://dbpedia.org/sparql/"
      MU_SPARQL_UPDATEPOINT: "http://database:8890/sparql"
    volumes:
      - ./config/dataconvert:/config
      - ./share:/share

Reference

Environment variables

ACCEPT_FILENAME if true, the user of the endpoints provided by this service are allowed to submit a filename as argument. If false, a default filename is used and submitted filenames are ignored.
ACCEPT_OPTIONS if true, the user of the endpoints provided by this service are allowed to submit a specific options as argument. If false, a default settings is used and submitted options are ignored.
LOG_LEVEL takes the same options as defined in the Python logging module.
MU_SPARQL_ENDPOINT is used to configure the SPARQL query endpoint.
- By default this is set to http://database:8890/sparql. In that case the triple store used in the backend should be linked to the microservice container as database.
MU_SPARQL_UPDATEPOINT is used to configure the SPARQL update endpoint.
- By default this is set to http://database:8890/sparql. In that case the triple store used in the backend should be linked to the microservice container as database.
MU_APPLICATION_GRAPH specifies the graph in the triple store the microservice will work in.
- By default this is set to http://mu.semte.ch/application. The graph name can be used in the service via settings.graph.
MU_SPARQL_TIMEOUT is used to configure the timeout (in seconds) for SPARQL queries.

File locations

SPARQL files should be located in config/dataconvert/

csv files should be located in share/

Endpoints

`GET /data/query`: load data from a query

Arguments:

filename: filename of file housing the SPARQL query, default input.sparql
limit: Number of items to be fetched from triple store per request, default 1000
global_limit: Total amount of items to be fetched, default all

`GET /data/file`: load data from a csv file

Arguments:

filename: filename of the csv file, default input.csv
columns: columns that should be selected and used, default all

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.drone.yml		.drone.yml
Dockerfile		Dockerfile
README.md		README.md
file_handler.py		file_handler.py
requirements.txt		requirements.txt
web.py		web.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ai-data-importer-service

Info

Getting started

Reference

Environment variables

File locations

Endpoints

`GET /data/query`: load data from a query

`GET /data/file`: load data from a csv file

About

Releases

Packages

Contributors 2

Languages

stijnrosaer/ai-data-importer-service

Folders and files

Latest commit

History

Repository files navigation

ai-data-importer-service

Info

Getting started

Reference

Environment variables

File locations

Endpoints

GET /data/query: load data from a query

GET /data/file: load data from a csv file

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

`GET /data/query`: load data from a query

`GET /data/file`: load data from a csv file

Packages