README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# nhtsr

<!-- badges: start -->
<!-- badges: end -->

The goal of `nhtsr` is to make it considerably easier for R users to 
interact with NHTS 2017 and 2022 datasets. The package contains eight datasets:

  - `nhts_households` and `nhts22_households`
  - `nhts_persons` and `nhts22_persons`
  - `nhts_vehicles` and `nhts22_vehicles`
  - `nhts_trips` and `nhts22_trips`
  

### Citation: 

From ORNL website:

> To recognize the valuable role of National Household Travel Survey (NHTS) data
in the transportation research process and to facilitate repeatability of the
research, users of NHTS data are asked to formally acknowledge the data source.
Where possible, this acknowledgment should take place in the form of a formal
citation, such as when writing a research report, planning document, on-line
article, and other publications. The citation can be formatted as follows:

```
U.S. Department of Transportation, Federal Highway Administration, 2022
National Household Travel Survey. URL: http://nhts.ornl.gov.
```


## Installation

You can install the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("byu-transpolab/nhts2017")
```

## Example

Each of the datasets is a properly data-typed `tibble`, derived from the 
`SPSS` files distributed by [Oak Ridge National Laboratory](https://nhts.ornl.gov/).
The variables have attribute labels that appear in RStudio's data set viewer,
and factor variables have correct labels appended.

For instance, to count the number of households completing records for each day,
we can simply do

```{r example}
library(nhtsr)
library(haven)
library(tidyverse)

nhts_households %>%
  group_by(travday) %>%
  summarise(
    count = n(),
    weighted = sum(wthhfin)
  )
```


In one departure from the NHTS public data files, the datasets are `tidy` in that
each field appears only once in the dataset. E.g., the `msasize` variable
 --- indicating the size of the metropolitan area each household resides in ---
is only appended to the `nhts_households` tibble rather than to all four tibbles.
Joining is trivial, however.

```{r joinexample}
nhts22_trips |> 
  left_join(nhts22_households, by = "houseid") |> 
  group_by(msasize) |> 
  summarise(
    mean_trip_length = weighted.mean(trpmiles, wttrdfin)
  )
```

Additionally, the `strttime` and `endtime` fields on the trips data have been
converted from four-character strings (e.g. `1310` for 1:10 PM) into R `datetime`
objects. This required setting a date, which was arbitrarily chosen to be an appropriate
weekday in October 2017 or October 2022

```{r strttime}
ggplot(nhts_trips, aes(x = strttime)) + 
  geom_histogram()
```