cleanventory

A ZeroPM R package

The goal of cleanventory is to provide simple functionality to clean and partially curate data sets of common chemical inventories. The aim is to document every step, from the raw (downloaded) files to the final tables.

cleanventory aims to correctly identify all missing values in the data sets and to validate CAS Registry Numbers (CAS RNs) where present. It additionally offers functionality to convert all special characters into ASCII characters.
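Every CAS RN carries a check digit. Working from the right, the digits before the check digit are multiplied by 1, 2, 3 and so on, summed, and taken modulo 10, and the result must equal the check digit. The sketch below shows that rule in base R. `is_valid_cas()` is a hypothetical helper written for illustration, not a cleanventory function, and cleanventory's own validation may be implemented differently.

```r
# Hypothetical helper (not part of cleanventory): TRUE for well-formed
# CAS RNs whose check digit is correct, FALSE otherwise (including NA).
is_valid_cas <- function(x) {
  x <- as.character(x)
  # Format: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit
  ok <- !is.na(x) & grepl("^[0-9]{2,7}-[0-9]{2}-[0-9]$", x)
  vapply(seq_along(x), function(i) {
    if (!ok[i]) return(FALSE)
    digits <- as.integer(strsplit(gsub("-", "", x[i]), "")[[1]])
    n <- length(digits)
    body <- rev(digits[-n])  # rightmost non-check digit gets weight 1
    sum(body * seq_along(body)) %% 10 == digits[n]
  }, logical(1))
}

is_valid_cas(c("1333-74-0", "7439-93-2", "1333-74-1", "not a CAS", NA))
#> [1]  TRUE  TRUE FALSE FALSE FALSE
```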

The dependencies of cleanventory are kept as minimal as possible: openxlsx for handling .xlsx files, and the trio of pdftools, magick and tesseract to extract data from (image-based) .pdf files.

In addition, we suggest bit64::as.integer64() to correctly handle the us_tsca$cas_reg_no and us_cdr$chemical_id_wo_dashes columns, which are kept as double for compatibility.
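A CAS RN has at most ten digits, so a dash-free value such as 1333740 is stored exactly in a double. If you need the conventional dashed notation, for example to compare us_tsca$cas_reg_no against a character cas_no column, a minimal base-R sketch follows. `cas_from_number()` is a hypothetical helper, not part of cleanventory, and bit64::as.integer64() remains the suggested route for integer handling.

```r
# Hypothetical helper (not part of cleanventory): convert dash-free CAS RNs
# stored as doubles (e.g. 1333740) into the dashed form ("1333-74-0").
cas_from_number <- function(x) {
  out <- rep(NA_character_, length(x))
  keep <- !is.na(x)
  # "%.0f" prints the whole number without decimals or scientific notation
  s <- sprintf("%.0f", x[keep])
  n <- nchar(s)
  # Last digit is the check digit, the two before it form the middle part
  out[keep] <- paste0(
    substr(s, 1, n - 3), "-",
    substr(s, n - 2, n - 1), "-",
    substr(s, n, n)
  )
  out
}

cas_from_number(c(1333740, 7439932, NA))
```

The call above returns `c("1333-74-0", "7439-93-2", NA)`; missing values stay missing rather than being turned into placeholder strings.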

As of 2022-08-02, the following inventories are included:

Chemical Inventory	Function	Compatible Version(s)	URL
US TSCA	`read_us_tsca()`	2021-08	https://www.epa.gov/tsca-inventory
EU CLP Annex VI	`read_eu_clp()`	9, 10, 13, 14, 15, 17	https://echa.europa.eu/en/information-on-chemicals/annex-vi-to-clp
EU ECI	`read_eu_eci()`	Unknown	https://echa.europa.eu/information-on-chemicals/ec-inventory
Japan NITE	`read_jp_nite()`	March 2022	https://www.nite.go.jp/chem/english/ghs/ghs_download.html
New Zealand IoC	`read_nz_ioc()`	December 2021	https://www.epa.govt.nz/database-search/new-zealand-inventory-of-chemicals-nzioc/
South Korea NCIS	`read_kr_ncis()`	4 May 2022	https://ncis.nier.go.kr/en/mttrList.do
Australia HCIS	`read_au_hcis()`	Unknown	http://hcis.safeworkaustralia.gov.au/HazardousChemical
Australia ICI	`read_au_ici()`	10 February 2022	https://www.industrialchemicals.gov.au/search-inventory
Taiwan CSI	`read_tw_csi()`	Unknown	https://gazette.nat.gov.tw/egFront/detail.do?metaid=73440&log=detailLog, https://gazette.nat.gov.tw/egFront/detail.do?metaid=78617&log=detailLog
Philippine ICCS	`read_ph_iccs()`	2017, 2020, 2021	https://chemical.emb.gov.ph/?page_id=138
Japan CSCL	`read_jp_cscl()`	31 May 2022, 31 May 2022, 1 April 2022	https://www.nite.go.jp/en/chem/chrip/chrip_search/sltLst
Canada DSL	`read_ca_dsl()`	14 June 2022	https://pollution-waste.canada.ca/substances-search/Substance?lang=en
China IECSC	`read_cn_iecsc()`	2013	https://www.mee.gov.cn/gkml/hbb/bgg/201301/t20130131_245810.htm
Nordics SPIN	`read_xn_spin()`	2000	http://www.spin2000.net/spinmyphp/
US CDR	`read_us_cdr()`	2016, 2020	https://www.epa.gov/chemical-data-reporting
Malaysia CIMS	`read_my_cims()`	2017	https://cims.dosh.gov.my/

Installation

You can install the development version of cleanventory from GitHub with:

# install.packages("remotes")
remotes::install_github("ZeroPM-H2020/cleanventory")

Examples

The following example shows how to download and read the data set of the (current) EU CLP Annex VI (ATP 17):

library(cleanventory)

tmp <- tempdir()

url <- paste0(
  "https://echa.europa.eu/documents/10162/17218/",
  "annex_vi_clp_table_atp17_en.xlsx/",
  "4dcec79c-f277-ed68-5e1b-d435900dbe34?t=1638888918944"
)

# Local path for the downloaded workbook
path <- file.path(tmp, "annex_vi_clp_table_atp17_en.xlsx")

# Binary mode ("wb") keeps the .xlsx file intact on all platforms
download.file(url, destfile = path, quiet = TRUE, mode = "wb")

# Read and clean the Annex VI table
eu_clp <- read_eu_clp(path)

# Remove the temporary file again
invisible(file.remove(path))

head(eu_clp)
#>       index_no international_chemical_identification     ec_no     cas_no
#> 1 001-001-00-9                              hydrogen 215-605-7  1333-74-0
#> 2 001-002-00-4             aluminium lithium hydride 240-877-9 16853-85-3
#> 3 001-003-00-X                        sodium hydride 231-587-3  7646-69-7
#> 4 001-004-00-5                       calcium hydride 232-189-2  7789-78-8
#> 5 003-001-00-4                               lithium 231-102-5  7439-93-2
#> 6 003-002-00-X                        n-hexyllithium 404-950-0 21369-64-2

str(eu_clp)
#> 'data.frame':    4702 obs. of  4 variables:
#>  $ index_no                             : chr  "001-001-00-9" "001-002-00-4" "001-003-00-X" "001-004-00-5" ...
#>  $ international_chemical_identification: chr  "hydrogen" "aluminium lithium hydride" "sodium hydride" "calcium hydride" ...
#>  $ ec_no                                : chr  "215-605-7" "240-877-9" "231-587-3" "232-189-2" ...
#>  $ cas_no                               : chr  "1333-74-0" "16853-85-3" "7646-69-7" "7789-78-8" ...
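The ec_no column is returned as character. If you want a plausibility check similar to the CAS RN validation, EC numbers (format NNN-NNN-R) also carry a check digit: R = (1×N1 + 2×N2 + ... + 6×N6) mod 11, and a remainder of 10 is never issued. The sketch below is a hypothetical helper, not part of cleanventory, and is meant only as a quick consistency check.

```r
# Hypothetical helper (not part of cleanventory): TRUE for well-formed
# EC numbers whose check digit matches the weighted sum modulo 11.
is_valid_ec <- function(x) {
  x <- as.character(x)
  ok <- !is.na(x) & grepl("^[0-9]{3}-[0-9]{3}-[0-9]$", x)
  vapply(seq_along(x), function(i) {
    if (!ok[i]) return(FALSE)
    d <- as.integer(strsplit(gsub("-", "", x[i]), "")[[1]])
    # A remainder of 10 can never equal a single digit, so it yields FALSE
    sum(d[1:6] * 1:6) %% 11 == d[7]
  }, logical(1))
}

# EC numbers taken from the head(eu_clp) output above
ec <- c("215-605-7", "240-877-9", "231-587-3", "232-189-2", "231-102-5", "404-950-0")
all(is_valid_ec(ec))
#> [1] TRUE
```

On the full table, `table(is_valid_ec(eu_clp$ec_no))` would summarise how many entries pass; missing or non-standard entries are counted as FALSE.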

Acknowledgement

This R package was developed at the Norwegian Geotechnical Institute (NGI) as part of the project ZeroPM: Zero pollution of Persistent, Mobile substances. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101036756.

If you find this package useful and can afford it, please consider making a donation to a humanitarian non-profit organization, such as Sea-Watch. Thank you.
