Update GTFS_RT ingestion pipeline for older files #156

Merged 1 commit on Aug 1, 2023


mzappitello
Contributor

GTFS_RT files generated from Sept 2019 through March 2020 have a different naming scheme and different extensions. Both changes break the way our ingestion pipeline currently works. Patch these issues and update our tests to check them.

  • ConfigType.from_filename now handles the trip updates and vehicle positions file names from this period. (Note: I ran all the other files from these dates through this function, and they were fine.)
  • GtfsRtConverter.pyarrow_from_gz now tries to choose the compression algorithm from the file extension, and forces gzip if that fails.
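
The fallback described in the second bullet can be sketched roughly as follows. This is not the actual `GtfsRtConverter.pyarrow_from_gz` code: it uses the standard library's `gzip` and `json` modules instead of pyarrow, the helper name `read_json_any_compression` is hypothetical, and it assumes the payload is JSON.

```python
import gzip
import json
from pathlib import Path


def read_json_any_compression(path: str) -> dict:
    """Hypothetical helper: pick a reader from the extension, then force gzip."""
    # First attempt: trust the file extension.
    opener = gzip.open if Path(path).suffix.lower() == ".gz" else open
    try:
        with opener(path, "rb") as handle:
            return json.loads(handle.read())
    except (OSError, ValueError):
        # The extension was misleading (for example, gzipped bytes without a
        # ".gz" suffix), so force gzip. If this also fails, the error
        # propagates to the caller.
        with gzip.open(path, "rb") as handle:
            return json.loads(handle.read())
```

Catching `ValueError` covers both `json.JSONDecodeError` and `UnicodeDecodeError`, which is what reading gzipped bytes as plain JSON raises.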

Asana Task: https://app.asana.com/0/1204931901750655/1205130684703916/f

@@ -48,18 +48,29 @@ def from_filename(cls, filename: str) -> ConfigType:
         # disable too many returns error message
         if "mbta.com_realtime_Alerts_enhanced" in filename:
             return cls.RT_ALERTS

Contributor Author


I added spacing between blocks to make it read a little easier.

rymarczy (Collaborator) commented Aug 1, 2023

LGTM

mzappitello merged commit a624fc0 into main on Aug 1, 2023
7 checks passed