Add IDOT data to data-lake #617
Open: Damonamajor wants to merge 76 commits into master from Investigate-IDOT-data
+435
−3
Commits
14d4edf Push for raw upload (Damonamajor)
2ecffe1 Remove unnecessary code (Damonamajor)
e0c05ae Testing file (Damonamajor)
d145bae Modify HWY so it looks back to 2012 (Damonamajor)
b749f3a Minor simplifications (wrridgeway)
2bfc813 Add cleaning script (Damonamajor)
da0791d Quick edit (Damonamajor)
b15eafa Push correct version (Damonamajor)
bc1070c Remove old file (Damonamajor)
917c48d Working script (Damonamajor)
936b972 Working loop (Damonamajor)
1bbcce4 Billy edits (Damonamajor)
1e44751 text edits (Damonamajor)
21aa438 Use correct buckets (Damonamajor)
83a0e4e Text edits (Damonamajor)
3206571 lintr (Damonamajor)
59c180d lintr (Damonamajor)
de905ef Change geoparquet function (wrridgeway)
418ef5e Add filter for Cook County (Damonamajor)
45b7929 Comment (Damonamajor)
c6961d0 Add if-else statement for County (Damonamajor)
c0fa4a8 rename data to shapefile_data (Damonamajor)
266ba86 Include period instead of named file (Damonamajor)
cc5a18f period (Damonamajor)
2cb1e0c Fix s3 pathing (wrridgeway)
683a551 Start dbt schema (Damonamajor)
8eda339 Merge branch 'Investigate-IDOT-data' of github.com:ccao-data/data-arc… (Damonamajor)
76bfe5b Add year (Damonamajor)
c1d57e8 Fix path (Damonamajor)
89c69e7 Add renaming (Damonamajor)
cb883e4 Remove unecessary code (wrridgeway)
2014cbe Delete file (Damonamajor)
102b721 Remove docs (Damonamajor)
ad6ed46 Use FC_NAME and FCNAME (Damonamajor)
47b0469 Fix brackets (Damonamajor)
20b9c4c Get back to running (wrridgeway)
369e228 Make year a character (Damonamajor)
f07d62f Add NA handeling (Damonamajor)
6ec7267 Reorder columns (Damonamajor)
b402b77 Add docs (Damonamajor)
1dfd84b better commenting (Damonamajor)
22741ab Remove duplicated code (Damonamajor)
c0450ff Rename SURF_YR (Damonamajor)
cfafd12 Better renaming (Damonamajor)
8af8f07 rename traffic, fix surface_year (Damonamajor)
75b1bbb Add mean values (Damonamajor)
a86203c Update commenting (Damonamajor)
184b0e2 Remove extra line (Damonamajor)
b074eee Run function (Damonamajor)
4eb17fe Revert num_lanes (Damonamajor)
6840eb3 Get loop working (Damonamajor)
fb86cc6 Wrapup (Damonamajor)
540efdc Wrapup (Damonamajor)
81cfdf6 Linting (wrridgeway)
7f1e4c1 linting (Damonamajor)
fa29c88 linting (Damonamajor)
5a285fd Rename to shapefile_data (Damonamajor)
6734b5d Add commented text (Damonamajor)
bc37c55 remove slash (Damonamajor)
91fab65 remove hash (Damonamajor)
9eb0fe3 Updates after viewing output (Damonamajor)
8237ace Remove additional geom column from mutate (wrridgeway)
c87c58c lintr (Damonamajor)
d2b07d6 Merge branch 'Investigate-IDOT-data' of github.com:ccao-data/data-arc… (Damonamajor)
9f067ee Linting (wrridgeway)
2fb559a Update dbt/models/spatial/docs.md (Damonamajor)
fd1945b Update etl/scripts-ccao-data-raw-us-east-1/spatial/spatial-environmen… (Damonamajor)
bc045b0 Update etl/scripts-ccao-data-warehouse-us-east-1/spatial/spatial-envi… (Damonamajor)
9810b7e Billy changes (Damonamajor)
de0526c Working file with doc updates (Damonamajor)
a1b8691 Final? (Damonamajor)
02bbdcc Remove line at end (Damonamajor)
bc72c99 Rename environ (Damonamajor)
87da63b Add ) (Damonamajor)
a5df331 lintr (Damonamajor)
5c60c23 styler (Damonamajor)
85 changes: 85 additions & 0 deletions
etl/scripts-ccao-data-raw-us-east-1/spatial/spatial-environment-traffic.R
@@ -0,0 +1,85 @@
library(aws.s3)
library(dplyr)
library(httr)
library(lubridate)
library(purrr)
library(sf)
library(arrow)

# Define S3 bucket and paths
AWS_S3_RAW_BUCKET <- Sys.getenv("AWS_S3_RAW_BUCKET")
output_bucket <- file.path(
  AWS_S3_RAW_BUCKET,
  "spatial", "environment", "traffic"
)

# Get list of available files
years <- map(2012:year(Sys.Date()), \(x) {
  if (HEAD(paste0(
    "https://apps1.dot.illinois.gov/gist2/gisdata/all",
    x, ".zip"
  ))$status_code == 200) {
    x
  }
}) %>%
  unlist()

# Function to process each year and upload shapefiles for
# that specific year to S3
process_shapefiles_for_year <- map(years, \(x) {
  remote_file_path <- file.path(output_bucket, paste0(x, ".parquet"))

  # Skip everything if file already exists
  if (!object_exists(remote_file_path)) {
    # Define the URL for the shapefile ZIP file, dynamically for each year
    url <- paste0(
      "https://apps1.dot.illinois.gov/gist2/gisdata/all", x, ".zip"
    )

    # Create a temporary file to store the downloaded ZIP
    temp_zip <- tempfile(fileext = ".zip")
    temp_dir <- tempdir()

    # Download the ZIP file to a temporary location
    download.file(url = url, destfile = temp_zip)

    message(paste("Shapefile ZIP for year", x, "downloaded successfully."))

    # Unzip the file into a temporary directory
    unzip(temp_zip, exdir = temp_dir)
    message(paste(
      "Shapefile for year", x,
      "unzipped into temporary directory."
    ))

    # List files in the unzipped directory and look for the .shp files
    unzipped_files <- list.files(temp_dir, recursive = TRUE, full.names = TRUE)
    shp_file_for_year <- unzipped_files[grepl(
      paste0(
        "HWY",
        x
      ),
      unzipped_files,
      ignore.case = TRUE
    ) &
      grepl("\\.shp$", unzipped_files)]

    # Process only the shapefile that matches the current year
    if (length(shp_file_for_year) == 1) {
      # Read the shapefile into the environment using sf::st_read
      shapefile_data <- sf::st_read(shp_file_for_year) %>%
        # Add filter for Cook County. The name changes in different years.
        filter(if ("COUNTY" %in% names(.)) {
          COUNTY == "016"
        } else {
          INV_CO == "016"
        }) %>%
        mutate(year = as.character(x))

      # Save the shapefile as a GeoParquet file
      geoarrow::write_geoparquet(shapefile_data, remote_file_path)
    } else {
      message(paste("No shapefile found for year", x, "."))
    }
  }
})
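
The two selection steps in the script above (choosing the `HWY<year>` shapefile and keeping Cook County rows under either column name) can be checked without downloading anything. What follows is a minimal sketch in base R, not code from this PR: the helper names `find_hwy_shapefile` and `filter_cook`, the sample paths, and the `value` column are all hypothetical. It also differs from the script in two small ways: it matches on the file's base name rather than the full path, and it ignores case on the `.shp` extension.

```r
# Hypothetical helpers for illustration only; they are not part of the PR.

# Return the single .shp path whose file name contains "HWY<year>",
# or NULL when there is no unambiguous match.
find_hwy_shapefile <- function(paths, year) {
  is_match <- grepl(paste0("HWY", year), basename(paths), ignore.case = TRUE) &
    grepl("\\.shp$", paths, ignore.case = TRUE)
  hits <- paths[is_match]
  if (length(hits) == 1) hits else NULL
}

# Keep Cook County rows ("016"), using whichever county column is present.
filter_cook <- function(df, code = "016") {
  county_col <- intersect(c("COUNTY", "INV_CO"), names(df))[1]
  if (is.na(county_col)) {
    stop("Neither COUNTY nor INV_CO is present.")
  }
  keep <- !is.na(df[[county_col]]) & df[[county_col]] == code
  df[keep, , drop = FALSE]
}

# Made-up file listing standing in for the unzipped archive
paths <- c(
  "/tmp/x/HWY2015.shp", "/tmp/x/HWY2015.dbf",
  "/tmp/x/hwy2014.shp", "/tmp/x/RR2015.shp"
)
find_hwy_shapefile(paths, 2015)

# Made-up attribute table using the INV_CO column name
old_layout <- data.frame(
  INV_CO = c("016", "031", NA),
  value = c(100, 200, 300)
)
filter_cook(old_layout)
```

Failing loudly when neither county column exists is a design choice of this sketch; the script as written would instead error inside `filter()` on the missing `INV_CO` column.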
Could we add a few words about what's in the data?