inventory.data.gov

Aaron D Borden edited this page Dec 5, 2019 · 24 revisions

Also known as Inventory, inventory.data.gov is used by federal agencies to manage metadata for their datasets. Inventory generates the agency's data.json, which must be hosted on the agency's website (e.g. agency.gov/data.json). Inventory is a CKAN instance and can host datasets in addition to metadata.

Environments

Instance     URL
Production   inventory.data.gov
Staging      inventory-datagov.dev-ocsit.bsp.gsa.gov
CI           inventory.ci.datagov.us

Dependencies

Sub-components:

  • ckan
  • datapusher

Services:

  • apache2
  • rds
  • redis
  • s3
  • solr

Logs

  • /var/log/inventory/ckan.custom.log
  • /var/log/inventory/ckan.error.log
  • /var/log/inventory/datapusher.custom.log
  • /var/log/inventory/datapusher.error.log

Common tasks

Importing from data.json

ckanpyimport is used to onboard new agencies to inventory.data.gov. It imports datasets from a data.json file.
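Before importing, it can help to check what the data.json file actually contains. The following is a minimal sketch, not part of ckanpyimport: it assumes the file follows the Project Open Data layout, with entries under a top-level "dataset" list, and the helper name summarize_data_json is hypothetical.

```python
import json


def summarize_data_json(text):
    """Return (dataset_count, untitled_count) for a data.json document.

    Assumes the Project Open Data layout: a top-level "dataset" list
    whose entries are objects with a "title" field.
    """
    catalog = json.loads(text)
    datasets = catalog.get("dataset", [])
    # Entries without a title are worth reviewing before an import.
    untitled = [d for d in datasets if not d.get("title")]
    return len(datasets), len(untitled)


# Tiny inline sample; in practice, read the agency's data.json from disk.
sample = '{"dataset": [{"title": "A", "identifier": "a-1"}, {"identifier": "a-2"}]}'
count, untitled = summarize_data_json(sample)
print(f"{count} datasets, {untitled} without a title")
```

The dataset count is also useful for judging how long the import is likely to run.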

The import script will create duplicates without warning, so if the organization already has datasets, you should probably delete them all first.
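To see whether an import would create duplicates, one option is to compare the titles in the data.json file with the titles already in the organization. This is only a sketch under assumptions: matching on title is an illustrative choice, not how ckanpyimport identifies datasets, and the list of existing titles has to come from elsewhere (for example, CKAN's package_search API action). The function name find_existing is hypothetical.

```python
def find_existing(incoming, existing_titles):
    """Return titles from incoming data.json entries that already exist.

    incoming: list of dataset dicts from data.json (each may have "title").
    existing_titles: titles of datasets already in the organization.
    Comparison ignores case and surrounding whitespace (an assumption).
    """
    existing = {t.strip().lower() for t in existing_titles}
    return [
        d["title"]
        for d in incoming
        if d.get("title") and d["title"].strip().lower() in existing
    ]


incoming = [{"title": "Budget 2019"}, {"title": "Fleet Report"}]
overlap = find_existing(incoming, ["budget 2019 ", "Energy Use"])
print(overlap)  # titles that would be duplicated by an import
```

An empty result suggests the organization has no overlapping datasets by title; anything else should be cleaned up first.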

Run the import from the jumpbox under nohup or tmux so that disconnecting your session does not interrupt the script. The script can take a while depending on how many packages need to be imported (roughly 2 hours for 1000 datasets). You should also test against staging before running against production.
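The figure of about 2 hours per 1000 datasets gives a rough way to estimate how long a session needs to stay alive. The sketch below assumes the runtime scales linearly with the number of datasets, which the figure suggests but does not guarantee; the function name is hypothetical.

```python
from datetime import timedelta

# Reference figure from this page: about 2 hours for 1000 datasets.
REFERENCE_SECONDS = 2 * 60 * 60
REFERENCE_DATASETS = 1000


def estimate_import_time(dataset_count):
    """Estimate import duration, assuming linear scaling (an assumption)."""
    seconds = dataset_count * REFERENCE_SECONDS / REFERENCE_DATASETS
    return timedelta(seconds=seconds)


print(estimate_import_time(2500))  # prints 5:00:00
```

Treat the result as an order-of-magnitude guide when deciding between nohup and tmux and when scheduling the production run.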
