-
Notifications
You must be signed in to change notification settings - Fork 2
/
README.Rmd
133 lines (99 loc) · 6.06 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r}
#| include = FALSE
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# cleanventory <img src="man/figures/logo.svg" align="right" height="139" />
<!-- badges: start -->
[![R-CMD-check](https://github.com/RaoulWolf/cleanventory/workflows/R-CMD-check/badge.svg)](https://github.com/RaoulWolf/cleanventory/actions)
[![Codecov test coverage](https://codecov.io/gh/RaoulWolf/cleanventory/branch/master/graph/badge.svg)](https://app.codecov.io/gh/RaoulWolf/cleanventory?branch=master)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
<!-- badges: end -->
> A [ZeroPM](https://zeropm.eu/) R package
The goal of cleanventory is to provide simple functionality to clean and
partially curate data sets of common chemical inventories. The aim is to
document every step, from the raw (downloaded) files to the final tables.
cleanventory aims to correctly identify all missing values in data sets,
validates CAS Registry Numbers (when present) and additionally offers
functionality to transform all special characters into ASCII characters.
The dependencies of cleanventory are kept at as minimal as possible:
[openxlsx](https://cran.r-project.org/web/packages/openxlsx) for handling
.xlsx files, and the trio of [pdftools](https://cran.r-project.org/web/packages/pdftools), [magick](https://cran.r-project.org/web/packages/magick) and [tesseract](https://cran.r-project.org/web/packages/tesseract) to extract
data from (image) .pdf files.
We suggest the following packages/functionalities in addition:
[`bit64::as.integer64()`](https://cran.r-project.org/web/packages/bit64) to
correctly handle the `us_tsca$cas_reg_no` and `us_cdr$chemical_id_wo_dashes`
columns (kept as `double` for compatibility).
As of `r Sys.Date()`, the following inventories are included:
| Chemical Inventory | Function | Compatible Version(s) | URL |
|:------------------ |:----------------- |:--------------------- |:--------------------------------------------------------------------------------- |
| US TSCA | `read_us_tsca()` | 2021-08 | https://www.epa.gov/tsca-inventory |
| EU CLP Annex VI | `read_eu_clp()` | 9, 10, 13, 14, 15, 17 | https://echa.europa.eu/en/information-on-chemicals/annex-vi-to-clp |
| EU ECI | `read_eu_eci()` | *Unknown* | https://echa.europa.eu/information-on-chemicals/ec-inventory |
| Japan NITE | `read_jp_nite()` | March 2022 | https://www.nite.go.jp/chem/english/ghs/ghs_download.html |
| New Zealand IoC | `read_nz_ioc()` | December 2021 | https://www.epa.govt.nz/database-search/new-zealand-inventory-of-chemicals-nzioc/ |
| South Korea NCIS | `read_kr_ncis()` | 4 May 2022 | https://ncis.nier.go.kr/en/mttrList.do |
| Australia HCIS | `read_au_hcis()` | *Unknown* | http://hcis.safeworkaustralia.gov.au/HazardousChemical |
| Australia ICI | `read_au_ici()` | 10 February 2022 | https://www.industrialchemicals.gov.au/search-inventory |
| Taiwan CSI | `read_tw_csi()` | *Unknown* | https://gazette.nat.gov.tw/egFront/detail.do?metaid=73440&log=detailLog</br>https://gazette.nat.gov.tw/egFront/detail.do?metaid=78617&log=detailLog |
| Philippine ICCS | `read_ph_iccs()` | 2017, 2020, 2021 | https://chemical.emb.gov.ph/?page_id=138 |
| Japan CSCL | `read_jp_cscl()` | 31 May 2022</br>31 May 2022</br>1 April 2022 | https://www.nite.go.jp/en/chem/chrip/chrip_search/sltLst|
| Canada DSL | `read_ca_dsl()` | 14 June 2022 | https://pollution-waste.canada.ca/substances-search/Substance?lang=en |
| China IECSC | `read_cn_iecsc()` | 2013 | https://www.mee.gov.cn/gkml/hbb/bgg/201301/t20130131_245810.htm |
| Nordics SPIN | `read_xn_spin()` | 2000 | http://www.spin2000.net/spinmyphp/ |
| US CDR | `read_us_cdr()` | 2016</br>2020 | https://www.epa.gov/chemical-data-reporting |
| Malaysia CIMS | `read_my_cims()` | 2017 | https://cims.dosh.gov.my/ |
## Installation
You can install the development version of cleanventory from
[GitHub](https://github.com/) with:
```{r install}
#| eval = FALSE
# install.packages("devtools")
remotes::install_github("ZeroPM-H2020/cleanventory")
```
## Examples
This is an example which shows you how to get the data set of the (current) EU
CLP Annex VI:
```{r, echo=FALSE}
options(timeout = max(300, getOption("timeout")))
```
```{r example}
#| message = FALSE
library(cleanventory)
tmp <- tempdir()
url <- paste0(
"https://echa.europa.eu/documents/10162/17218/",
"annex_vi_clp_table_atp17_en.xlsx/",
"4dcec79c-f277-ed68-5e1b-d435900dbe34?t=1638888918944"
)
eu_clp_file <- download.file(
url,
destfile = paste(tmp, "annex_vi_clp_table_atp17_en.xlsx", sep = "/"),
quiet = TRUE,
mode = ifelse(.Platform$OS.type == "windows", "wb", "w")
)
path <- paste(tmp, "annex_vi_clp_table_atp17_en.xlsx", sep = "/")
eu_clp <- read_eu_clp(path)
invisible(file.remove(path))
head(eu_clp)
str(eu_clp)
```
## Acknowledgement
This R package was developed at the
[Norwegian Geotechnical Institute (NGI)](https://www.ngi.no/eng) as part of the
project
[ZeroPM: Zero pollution of Persistent, Mobile substances](https://zeropm.eu/).
This project has received funding from the European Union's Horizon 2020
research and innovation programme under grant agreement No 101036756.
---
If you find this package useful and can afford it, please consider making a
donation to a humanitarian non-profit organization, such as
[Sea-Watch](https://sea-watch.org/en/). Thank you.