feat: CSV Flatfile Pipeline Testing #154

Merged
merged 2 commits into from
Jul 25, 2023

Conversation

rymarczy
Collaborator

This PR adds a new test method, test_whole_table. It takes a CSV representation of Vehicle Position records, converts it to a Parquet file on the fly, and runs it through process_gtfs_rt_files to populate the GTFS-RT RDS tables.

After processing, a query selects records from the RDS, and the results are compared against a CSV results file to confirm that the Performance Manager processing pipelines have not fundamentally changed.

The input CSV file consists mostly of whole-route trips, three in each direction for each MBTA rail line.

This PR required updating the GTFS and GTFS-RT test files in the SPRINGBOARD test_files folder of the repository to match the date of the records in the flat file (May 8, 2023). To reduce the repository size, all non-rail data has also been stripped from the SPRINGBOARD Parquet files for RT_TRIP_UPDATES and RT_VEHICLE_POSITIONS.

This branch was developed from a commit that predates the PR that removed hash columns from the application. This was done to provide a check on the validity of the changes introduced by that PR.

Asana Task: https://app.asana.com/0/1204931901750655/1205084207879142

Contributor

@mzappitello mzappitello left a comment

two small change requests, but it looks good to me.

parquet_file = os.path.join(
springboard_dir, parquet_folder, "flat_file.parquet"
)
os.makedirs(os.path.join(springboard_dir, parquet_folder), exist_ok=True)
Contributor

we should use the pytest temporary directory fixture instead of os.makedirs.

https://docs.pytest.org/en/6.2.x/tmpdir.html

Collaborator Author

Incorporated using pytest's tmp_path fixture.
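
For reference, a minimal sketch of what the tmp_path approach could look like. The helper name flat_file_parquet_path and the folder name passed to it are hypothetical and not taken from this PR; only the flat_file.parquet file name comes from the snippet under review. tmp_path is pytest's built-in fixture that provides a unique temporary directory per test as a pathlib.Path.

```python
from pathlib import Path


def flat_file_parquet_path(base_dir: Path, parquet_folder: str) -> Path:
    """Create parquet_folder under base_dir and return the Parquet file path.

    Hypothetical helper for illustration; not the function used in the PR.
    """
    folder = base_dir / parquet_folder
    # Replaces os.makedirs(..., exist_ok=True); the folder lives under the
    # directory supplied by the caller rather than inside the repository tree.
    folder.mkdir(parents=True, exist_ok=True)
    return folder / "flat_file.parquet"


def test_flat_file_parquet_path(tmp_path: Path) -> None:
    # pytest injects tmp_path, a fresh temporary directory for this test.
    # The folder name below is an assumed example value.
    parquet_file = flat_file_parquet_path(tmp_path, "RT_VEHICLE_POSITIONS")

    assert parquet_file.parent.is_dir()
    assert parquet_file.name == "flat_file.parquet"
    # Nothing has been written yet; the test body would write the file here.
    assert not parquet_file.exists()
```

Because pytest manages the directory, the test leaves no files behind in the SPRINGBOARD test_files folder.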

)

compare_result = db_result_df.compare(csv_result_df, align_axis=1)
print(compare_result, flush=True)
Contributor

only print this if the assert fails?

Collaborator Author

Incorporated.
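
One way to surface the diff only on failure is to put it in the assertion message, sketched below. The helper name assert_frames_match is hypothetical and this is not necessarily how the change was made in the PR; the parameter names follow the snippet under review. DataFrame.compare returns only the differing cells, so an empty result means the two frames match.

```python
import pandas as pd


def assert_frames_match(db_result_df: pd.DataFrame, csv_result_df: pd.DataFrame) -> None:
    """Fail with a readable diff if the two frames differ.

    Hypothetical helper for illustration. DataFrame.compare requires both
    frames to have identical index and column labels; it raises ValueError
    otherwise.
    """
    compare_result = db_result_df.compare(csv_result_df, align_axis=1)
    # The message is only rendered when the assertion fails, so passing
    # runs stay quiet.
    assert compare_result.empty, f"DB and CSV results differ:\n{compare_result}"


if __name__ == "__main__":
    expected = pd.DataFrame({"stop_id": ["a", "b"], "dwell": [10, 20]})
    assert_frames_match(expected, expected.copy())  # passes silently
```

pytest prints the assertion message along with the failure, so no explicit print call is needed.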

Contributor

@mzappitello mzappitello left a comment

LGTM 🍰

@rymarczy rymarczy merged commit d6c8493 into main Jul 25, 2023
6 checks passed