Add IDOT data to data-lake #617

Open: wants to merge 76 commits into base: master
Commits
14d4edf Push for raw upload (Damonamajor, Oct 3, 2024)
2ecffe1 Remove unnecessary code (Damonamajor, Oct 3, 2024)
e0c05ae Testing file (Damonamajor, Oct 3, 2024)
d145bae Modify HWY so it looks back to 2012 (Damonamajor, Oct 3, 2024)
b749f3a Minor simplifications (wrridgeway, Oct 4, 2024)
2bfc813 Add cleaning script (Damonamajor, Oct 4, 2024)
da0791d Quick edit (Damonamajor, Oct 4, 2024)
b15eafa Push correct version (Damonamajor, Oct 4, 2024)
bc1070c Remove old file (Damonamajor, Oct 7, 2024)
917c48d Working script (Damonamajor, Oct 7, 2024)
936b972 Working loop (Damonamajor, Oct 7, 2024)
1bbcce4 Billy edits (Damonamajor, Oct 7, 2024)
1e44751 text edits (Damonamajor, Oct 7, 2024)
21aa438 Use correct buckets (Damonamajor, Oct 7, 2024)
83a0e4e Text edits (Damonamajor, Oct 7, 2024)
3206571 lintr (Damonamajor, Oct 8, 2024)
59c180d lintr (Damonamajor, Oct 8, 2024)
de905ef Change geoparquet function (wrridgeway, Oct 8, 2024)
418ef5e Add filter for Cook County (Damonamajor, Oct 8, 2024)
45b7929 Comment (Damonamajor, Oct 8, 2024)
c6961d0 Add if-else statement for County (Damonamajor, Oct 8, 2024)
c0fa4a8 rename data to shapefile_data (Damonamajor, Oct 8, 2024)
266ba86 Include period instead of named file (Damonamajor, Oct 8, 2024)
cc5a18f period (Damonamajor, Oct 8, 2024)
2cb1e0c Fix s3 pathing (wrridgeway, Oct 8, 2024)
683a551 Start dbt schema (Damonamajor, Oct 8, 2024)
8eda339 Merge branch 'Investigate-IDOT-data' of github.com:ccao-data/data-arc… (Damonamajor, Oct 8, 2024)
76bfe5b Add year (Damonamajor, Oct 8, 2024)
c1d57e8 Fix path (Damonamajor, Oct 8, 2024)
89c69e7 Add renaming (Damonamajor, Oct 8, 2024)
cb883e4 Remove unecessary code (wrridgeway, Oct 9, 2024)
2014cbe Delete file (Damonamajor, Oct 9, 2024)
102b721 Remove docs (Damonamajor, Oct 9, 2024)
ad6ed46 Use FC_NAME and FCNAME (Damonamajor, Oct 9, 2024)
47b0469 Fix brackets (Damonamajor, Oct 9, 2024)
20b9c4c Get back to running (wrridgeway, Oct 9, 2024)
369e228 Make year a character (Damonamajor, Oct 9, 2024)
f07d62f Add NA handeling (Damonamajor, Oct 10, 2024)
6ec7267 Reorder columns (Damonamajor, Oct 10, 2024)
b402b77 Add docs (Damonamajor, Oct 10, 2024)
1dfd84b better commenting (Damonamajor, Oct 10, 2024)
22741ab Remove duplicated code (Damonamajor, Oct 10, 2024)
c0450ff Rename SURF_YR (Damonamajor, Oct 10, 2024)
cfafd12 Better renaming (Damonamajor, Oct 10, 2024)
8af8f07 rename traffic, fix surface_year (Damonamajor, Oct 10, 2024)
75b1bbb Add mean values (Damonamajor, Oct 17, 2024)
a86203c Update commenting (Damonamajor, Oct 17, 2024)
184b0e2 Remove extra line (Damonamajor, Oct 17, 2024)
b074eee Run function (Damonamajor, Oct 17, 2024)
4eb17fe Revert num_lanes (Damonamajor, Oct 17, 2024)
6840eb3 Get loop working (Damonamajor, Oct 17, 2024)
fb86cc6 Wrapup (Damonamajor, Oct 17, 2024)
540efdc Wrapup (Damonamajor, Oct 17, 2024)
81cfdf6 Linting (wrridgeway, Oct 17, 2024)
7f1e4c1 linting (Damonamajor, Oct 17, 2024)
fa29c88 linting (Damonamajor, Oct 17, 2024)
5a285fd Rename to shapefile_data (Damonamajor, Oct 21, 2024)
6734b5d Add commented text (Damonamajor, Oct 21, 2024)
bc37c55 remove slash (Damonamajor, Oct 21, 2024)
91fab65 remove hash (Damonamajor, Oct 21, 2024)
9eb0fe3 Updates after viewing output (Damonamajor, Oct 22, 2024)
8237ace Remove additional geom column from mutate (wrridgeway, Oct 23, 2024)
c87c58c lintr (Damonamajor, Nov 1, 2024)
d2b07d6 Merge branch 'Investigate-IDOT-data' of github.com:ccao-data/data-arc… (Damonamajor, Nov 1, 2024)
9f067ee Linting (wrridgeway, Nov 3, 2024)
2fb559a Update dbt/models/spatial/docs.md (Damonamajor, Nov 4, 2024)
fd1945b Update etl/scripts-ccao-data-raw-us-east-1/spatial/spatial-environmen… (Damonamajor, Nov 4, 2024)
bc045b0 Update etl/scripts-ccao-data-warehouse-us-east-1/spatial/spatial-envi… (Damonamajor, Nov 4, 2024)
9810b7e Billy changes (Damonamajor, Nov 4, 2024)
de0526c Working file with doc updates (Damonamajor, Nov 4, 2024)
a1b8691 Final? (Damonamajor, Nov 4, 2024)
02bbdcc Remove line at end (Damonamajor, Nov 4, 2024)
bc72c99 Rename environ (Damonamajor, Nov 4, 2024)
87da63b Add ) (Damonamajor, Nov 4, 2024)
a5df331 lintr (Damonamajor, Nov 4, 2024)
5c60c23 styler (Damonamajor, Nov 4, 2024)
16 changes: 16 additions & 0 deletions dbt/models/spatial/docs.md
@@ -493,6 +493,22 @@ Includes townships within the City of Chicago, which are technically defunct.
**Geometry:** `MULTIPOLYGON`
{% enddocs %}

# traffic

{% docs table_traffic %}

Illinois Department of Transportation data sourced from
[https://apps1.dot.illinois.gov/gist2/](https://apps1.dot.illinois.gov/gist2/).
The data focuses on five features: lanes, speed limits, traffic count, road
type, and surface type. Some columns are not present in all years of data (for
example, speed limit in 2012). Because values are not universally present, we
average numeric values for roads which overlap and have a matching name. For
example, if segment B touches segments A and C, which have speed limits of 25
and 30, the speed limit for segment B will be 27.5.

Review comment (Member): Could we do a few words about what's in the data?
Damonamajor marked this conversation as resolved.

**Geometry:** `MULTILINESTRING`
{% enddocs %}

# transit_dict

{% docs table_transit_dict %}
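A minimal sketch of the averaging rule described in the `table_traffic` docs, assuming the average is only used to fill a segment whose own value is missing. The toy data frame, the `touches` adjacency list, and the `fill_from_neighbors()` helper are hypothetical and not part of this PR; in the real cleaning script the touching relationship would presumably come from a spatial predicate rather than a hand-written list.

```r
# Hypothetical toy data: three segments of the same road. Segment B has no
# recorded speed limit.
segments <- data.frame(
  id = c("A", "B", "C"),
  road_name = c("MAIN ST", "MAIN ST", "MAIN ST"),
  speed_limit = c(25, NA, 30),
  stringsAsFactors = FALSE
)

# Hypothetical adjacency list: for each segment id, the ids it touches
touches <- list(A = "B", B = c("A", "C"), C = "B")

# Fill missing values in `column` with the mean of the non-missing values of
# touching segments that share the same road name (assumed rule)
fill_from_neighbors <- function(df, touches, column) {
  filled <- df[[column]]
  for (i in seq_len(nrow(df))) {
    if (is.na(df[[column]][i])) {
      neighbor_ids <- touches[[df$id[i]]]
      is_neighbor <- df$id %in% neighbor_ids &
        df$road_name == df$road_name[i]
      values <- df[[column]][is_neighbor]
      values <- values[!is.na(values)]
      if (length(values) > 0) {
        filled[i] <- mean(values)
      }
    }
  }
  df[[column]] <- filled
  df
}

result <- fill_from_neighbors(segments, touches, "speed_limit")
print(result)
```

With this toy input, segment B is filled with the mean of 25 and 30, matching the 27.5 example in the docs, while segments that already have a value are left unchanged.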
3 changes: 3 additions & 0 deletions dbt/models/spatial/schema.yml
@@ -174,6 +174,9 @@ sources:
- name: township
description: '{{ doc("table_township") }}'

- name: traffic
description: '{{ doc("table_traffic") }}'

- name: transit_dict
description: '{{ doc("table_transit_dict") }}'

36 changes: 33 additions & 3 deletions etl/renv.lock
@@ -142,6 +142,20 @@
],
"Hash": "ae4a925e0f6bb1b7e5fa96b739c5221a"
},
"RSocrata": {
"Package": "RSocrata",
"Version": "1.7.15-1",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
"R",
"httr",
"jsonlite",
"mime",
"plyr"
],
"Hash": "435ebea3fa736ab1317c79a5fa34fa55"
},
"Rcpp": {
"Package": "Rcpp",
"Version": "1.0.12",
@@ -1926,8 +1940,13 @@
"noctua": {
"Package": "noctua",
"Version": "2.6.2",
"Source": "Repository",
"Repository": "CRAN",
"Source": "GitHub",
"RemoteType": "github",
"RemoteHost": "api.github.com",
"RemoteUsername": "dyfanjones",
"RemoteRepo": "noctua",
"RemoteRef": "master",
"RemoteSha": "23a4cfbf537407c7a1547fc13ba771ba2eb098e0",
"Requirements": [
"DBI",
"R",
@@ -1938,7 +1957,7 @@
"utils",
"uuid"
],
"Hash": "c03d73125d695e80b35b4bb3eacf0358"
"Hash": "b3fc482d0ae2f51ed324fd3da66471b4"
},
"numDeriv": {
"Package": "numDeriv",
@@ -2276,6 +2295,17 @@
"Repository": "CRAN",
"Hash": "09eb987710984fc2905c7129c7d85e65"
},
"plyr": {
"Package": "plyr",
"Version": "1.8.9",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
"R",
"Rcpp"
],
"Hash": "6b8177fd19982f0020743fadbfdbd933"
},
"png": {
"Package": "png",
"Version": "0.1-8",
@@ -0,0 +1,85 @@
library(aws.s3)
library(dplyr)
library(httr)
library(lubridate)
library(purrr)
library(sf)
library(arrow)

# Define S3 bucket and paths
AWS_S3_RAW_BUCKET <- Sys.getenv("AWS_S3_RAW_BUCKET")
output_bucket <- file.path(
  AWS_S3_RAW_BUCKET,
  "spatial", "environment", "traffic"
)

# Get list of years for which a ZIP file is available (HTTP 200)
years <- map(2012:year(Sys.Date()), \(x) {
  if (HEAD(paste0(
    "https://apps1.dot.illinois.gov/gist2/gisdata/all",
    x, ".zip"
  ))$status_code == 200) {
    x
  }
}) %>%
  unlist()

# Process each available year and upload the shapefile for
# that specific year to S3
process_shapefiles_for_year <- map(years, \(x) {
  remote_file_path <- file.path(output_bucket, paste0(x, ".parquet"))

  # Skip everything if file already exists
  if (!object_exists(remote_file_path)) {
    # Define the URL for the shapefile ZIP file, dynamically for each year
    url <- paste0(
      "https://apps1.dot.illinois.gov/gist2/gisdata/all", x, ".zip"
    )

    # Create a temporary file to store the downloaded ZIP
    temp_zip <- tempfile(fileext = ".zip")
    temp_dir <- tempdir()

    # Download the ZIP file to a temporary location
    download.file(url = url, destfile = temp_zip)

    message(paste("Shapefile ZIP for year", x, "downloaded successfully."))

    # Unzip the file into a temporary directory
    unzip(temp_zip, exdir = temp_dir)
    message(paste(
      "Shapefile for year", x,
      "unzipped into temporary directory."
    ))

    # List files in the unzipped directory and look for the .shp files
    unzipped_files <- list.files(temp_dir, recursive = TRUE, full.names = TRUE)
    shp_file_for_year <- unzipped_files[grepl(
      paste0(
        "HWY",
        x
      ),
      unzipped_files,
      ignore.case = TRUE
    ) &
      grepl("\\.shp$", unzipped_files)]

    # Process only the shapefile that matches the current year
    if (length(shp_file_for_year) == 1) {
      # Read the shapefile into the environment using sf::st_read
      shapefile_data <- sf::st_read(shp_file_for_year) %>%
        # Add filter for Cook County. The name changes in different years.
        filter(if ("COUNTY" %in% names(.)) {
          COUNTY == "016"
        } else {
          INV_CO == "016"
        }) %>%
        mutate(year = as.character(x))

      # Save the shapefile as a GeoParquet file
      geoarrow::write_geoparquet(shapefile_data, remote_file_path)
    } else {
      message(paste0("No shapefile found for year ", x, "."))
    }
  }
})
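
Because `tempdir()` returns the same per-session directory on every iteration, files unzipped for earlier years are still present when later years are listed, so the script relies on matching both `HWY<year>` and the `.shp` extension. A small sketch of that filter in isolation, using made-up file paths and a hypothetical helper `find_year_shapefile()` (neither is part of this PR):

```r
# Return the paths that contain "HWY<year>" (case-insensitive) and end in
# ".shp", mirroring the two grepl() conditions used in the script
find_year_shapefile <- function(paths, year) {
  is_year <- grepl(paste0("HWY", year), paths, ignore.case = TRUE)
  is_shp <- grepl("\\.shp$", paths)
  paths[is_year & is_shp]
}

# Made-up listing of a shared temporary directory after two years of unzips
paths <- c(
  "tmp/HWY2012.shp",
  "tmp/hwy2012.dbf",
  "tmp/HWY2012.shp.xml",
  "tmp/HWY2013.shp"
)

match_2012 <- find_year_shapefile(paths, 2012)
match_2014 <- find_year_shapefile(paths, 2014)
print(match_2012)
print(length(match_2014))
```

Note that the extension check is case-sensitive, so a file ending in `.SHP` would not match; the `length(...) == 1` guard in the script then routes such a year to the "No shapefile found" message.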