README.Rmd

---
output: github_document
always_allow_html: true
editor_options: 
  chunk_output_type: console
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

# COVID-19 U.S. County Model

Bayesian model of COVID-19 cases in U.S. counties.

***

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = FALSE,
  message = FALSE,
  warning = FALSE,
  error = FALSE,
  include = FALSE,
  collapse = TRUE,
  comment ="#>",
  fig.retina = 2
)
library(tidyverse)
library(rstan)
library(tidybayes)
library(sf)
library(leaflet)
library(scales)
library(glue)

rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
```

## Data

The data is from the [COVID-19 Event Risk Planner](https://github.com/appliedbinf/covid19-event-risk-planner), which combines data from several sources including the [NYTimes COVID19 data project](https://github.com/nytimes/covid-19-data) and [U.S. Census](https://www.census.gov/data/tables/time-series/demo/popest/2010s-state-total.html).
It includes U.S. county-level COVID-19 data such as number of cases, deaths, and population.

```{r get-data}
covid_cases <-
  read_csv(
    file = "https://raw.githubusercontent.com/appliedbinf/covid19-event-risk-planner/master/usa_risk_counties.csv",
    col_types = cols(
      GEOID = col_integer(),
      NAME = col_character(),
      stname = col_character(),
      cases = col_double(),
      deaths = col_double(),
      cases_past = col_double(),
      X = col_double(),
      pop = col_double(),
      Nr = col_double(),
      risk = col_double(),
      asc_bias = col_double(),
      event_size = col_double()
    )
  ) %>%
  arrange(GEOID) %>%
  remove_missing(
    na.rm = TRUE,
    vars = c("GEOID", "cases", "pop")
  ) %>%
  select(GEOID, NAME, stname, cases, pop) %>%
  distinct()

counties <-
  sf::read_sf(
    dsn = "https://raw.githubusercontent.com/appliedbinf/covid19-event-risk-planner/master/COVID19-Event-Risk-Planner/map_data/tl_2017_us_county.geojson"
  )

data <- 
  covid_cases %>%
  left_join(counties)
```

## Stan Model

I fit a hierarchical binomial model for the counts of COVID-19 cases in each U.S. county.
The model treats each county as population members and uses partial pooling to estimate county-level COVID-19 cases.
Partial pooling means the county-level COVID-19 probabilities are modeled by a distribution.
This allows for information sharing among these parameters.

The Stan model is below:

```{r model}
model_code <- "
data {
  int<lower=0> N;     // counties
  int<lower=0> y[N];  // cases
  int<lower=0> K[N];  // populations
}
parameters {
  real<lower=0, upper=1> phi;  // population chance of covid
  real<lower=1> kappa;         // population concentration
  vector<lower=0, upper=1>[N] theta;  // chance of covid
}
model {
  kappa ~ pareto(1, 1.5);  // hyperprior
  theta ~ beta(phi * kappa, (1 - phi) * kappa); // prior
  y ~ binomial(K, theta); // likelihood
}
"
```

```{stan, echo = TRUE, include=TRUE, output.var="model"}
data {
  int<lower=0> N;     // counties
  int<lower=0> y[N];  // cases
  int<lower=0> K[N];  // populations
}
parameters {
  real<lower=0, upper=1> phi;  // population chance of covid
  real<lower=1> kappa;         // population concentration
  vector<lower=0, upper=1>[N] theta;  // chance of covid
}
model {
  kappa ~ pareto(1, 1.5);  // hyperprior
  theta ~ beta(phi * kappa, (1 - phi) * kappa); // prior
  y ~ binomial(K, theta); // likelihood
}
```

```{r fit, cache=TRUE}
fit <-
  stan(
    model_code = model_code,
    data=list(
      "N"=nrow(data),
      "K"=data$pop,
      "y"=data$cases
    ),
    iter=10000,
    chains=4
  )
```

## Results

```{r post-processing}
predictions <- 
  fit %>%
  recover_types() %>%
  gather_draws(theta[i])  %>%
  group_by(i) %>%
  summarize(
    posterior_median = median(.value),
  ) %>%
  ungroup()

map_data <- 
  data %>%
  bind_cols(predictions) %>%
  mutate(
    yhat = pop * posterior_median,
    res = yhat - cases,
    stdres = res/sd(res),
    pct = (yhat - cases)/cases
  ) %>%
  st_sf()
```

### COVID-19 Rate

```{r map-estimates, include=TRUE, screenshot.opts=list(zoom = 2)}
leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  setView(lat = 37.1, lng = -95.7, zoom = 4) %>%
  addPolygons(
    data = map_data,
    color = "#444444",
    weight = 0.2,
    smoothFactor = 0.1,
    opacity = 1.0,
    fillOpacity = 0.7,
    fillColor = ~colorNumeric(
      palette = "viridis",
      domain = map_data$posterior_median
      )(posterior_median),
    highlight = highlightOptions(weight = 1),
    label = glue("{map_data$NAME}, {map_data$stname}\n{percent(map_data$posterior_median, accuracy = 0.1)}")
  ) %>%
  addLegend(
    data = map_data,
    position = "bottomright",
    pal = colorNumeric(
      palette = "viridis",
      domain = map_data$posterior_median
    ),
    values = ~posterior_median,
    title = "Rate",
    opacity = .7,
    labFormat = function(type, cuts, p) {
      percent(cuts)
      }
  )
```

### Residuals

```{r map-residuals, include=TRUE, screenshot.opts=list(zoom = 2)}
leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  setView(lat = 37.1, lng = -95.7, zoom = 4) %>%
  addPolygons(
    data = map_data,
    color = "#444444",
    weight = 0.2,
    smoothFactor = 0.1,
    opacity = 1.0,
    fillOpacity = 0.7,
    fillColor = ~colorNumeric(
      palette = "viridis",
      domain = map_data$res
    )(res),
    highlight = highlightOptions(weight = 1),
    label = glue("{map_data$NAME}, {map_data$stname}\n{percent(map_data$stdres, accuracy = 0.1)}")
  ) %>%
  addLegend(
    data = map_data,
    position = "bottomright",
    pal = colorNumeric(
      palette = "viridis",
      domain = map_data$res
    ),
    values = ~res,
    title = "Residuals",
    opacity = .7
  )
```