feat: CSV Flatfile Pipeline Testing #154

rymarczy · 2023-07-19T10:07:02Z

This PR adds a new test, test_whole_table. The test takes a CSV representation of Vehicle Position records, converts it to a Parquet file on the fly, and runs it through process_gtfs_rt_files to populate the GTFS-RT RDS tables.

After processing, a query selects records from the RDS and compares them against a CSV results file to confirm that the Performance Manager processing pipelines have not fundamentally changed.
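The on-the-fly conversion step described above could look roughly like the sketch below. This is not the code from the PR: csv_to_parquet is a hypothetical helper, it assumes pandas with a Parquet engine such as pyarrow is installed, and it relies on pandas' default type inference where the real test may need explicit dtypes.

```python
from pathlib import Path

import pandas as pd


def csv_to_parquet(csv_path: Path, parquet_path: Path) -> Path:
    """Convert a CSV flat file to Parquet (hypothetical helper, not from the PR)."""
    # Make sure the destination folder exists before writing.
    parquet_path.parent.mkdir(parents=True, exist_ok=True)
    # Column types are inferred by pandas here; the real pipeline may
    # require an explicit schema instead (assumption).
    frame = pd.read_csv(csv_path)
    # to_parquet needs a Parquet engine (for example pyarrow) to be installed.
    frame.to_parquet(parquet_path, index=False)
    return parquet_path
```

The resulting file would then be handed to process_gtfs_rt_files in the same way as any other Parquet input.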

The input CSV file consists mostly of whole-route trips, three in each direction for each MBTA rail line.

This PR required updating the GTFS and GTFS-RT test files in the SPRINGBOARD test_files folder of the repository to match the date of the records in the flat file (May 8th, 2023). To reduce the repository size, all non-rail data has also been stripped from the SPRINGBOARD parquet files of RT_TRIP_UPDATES and RT_VEHICLE_POSITIONS.

This branch was developed from a commit that predates the PR that removed hash columns from the application. This was done to provide a check on the validity of the changes introduced by that PR.

Asana Task: https://app.asana.com/0/1204931901750655/1205084207879142

mzappitello

two small change requests, but it looks good to me.

mzappitello · 2023-07-19T20:48:45Z

python_src/tests/performance_manager/test_performance_manager.py

+    parquet_file = os.path.join(
+        springboard_dir, parquet_folder, "flat_file.parquet"
+    )
+    os.makedirs(os.path.join(springboard_dir, parquet_folder), exist_ok=True)


we should use the pytest temp directory fixture instead of os.makedirs.

https://docs.pytest.org/en/6.2.x/tmpdir.html

Incorporated with the pytest tmp_path fixture.
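For reference, a minimal sketch of what building the Parquet path on top of tmp_path might look like. The helper flat_file_parquet_path and the folder name passed to it are illustrative placeholders, not taken from the merged test.

```python
from pathlib import Path


def flat_file_parquet_path(base_dir: Path, parquet_folder: str) -> Path:
    """Return the Parquet target under base_dir, creating parent folders."""
    # Hypothetical helper: mirrors the os.path.join / os.makedirs pair above.
    target_dir = base_dir / parquet_folder
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / "flat_file.parquet"


def test_flat_file_location(tmp_path: Path) -> None:
    # tmp_path is a built-in pytest fixture: a unique pathlib.Path per test,
    # so nothing is written into the repository's test_files folder.
    parquet_file = flat_file_parquet_path(tmp_path, "RT_VEHICLE_POSITIONS")
    assert parquet_file.parent.is_dir()
    assert parquet_file.name == "flat_file.parquet"
```

pytest manages these directories itself, so the test needs no explicit cleanup.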

mzappitello · 2023-07-19T21:04:19Z

python_src/tests/performance_manager/test_performance_manager.py

+    )
+
+    compare_result = db_result_df.compare(csv_result_df, align_axis=1)
+    print(compare_result, flush=True)


only print this if the assert fails?

Incorporated.
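One way to surface the diff only on failure is to put it in the assert message, which Python evaluates only when the assertion fails. A minimal sketch, assuming both frames are identically labeled (a requirement of DataFrame.compare); assert_frames_match is a hypothetical helper name, not the one used in the PR.

```python
import pandas as pd


def assert_frames_match(
    db_result_df: pd.DataFrame, csv_result_df: pd.DataFrame
) -> None:
    """Stay silent on success; include the cell-level diff on failure."""
    # compare() keeps only the cells that differ, so an empty result
    # means the two frames hold the same values.
    compare_result = db_result_df.compare(csv_result_df, align_axis=1)
    # The message expression is evaluated only if the assertion fails.
    assert compare_result.empty, (
        "RDS results differ from expected CSV results:\n"
        f"{compare_result.to_string()}"
    )
```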

mzappitello

LGTM 🍰

rymarczy requested a review from mzappitello July 19, 2023 12:46

test pipeline with input and output csv files

da80368

rymarczy force-pushed the testing_with_flat_file branch from c195dc4 to da80368 July 19, 2023 15:39

mzappitello suggested changes Jul 19, 2023

incorporate PR comments

2cd4dc3

rymarczy requested a review from mzappitello July 20, 2023 15:50

mzappitello approved these changes Jul 24, 2023

rymarczy merged commit d6c8493 into main Jul 25, 2023
6 checks passed
