🚌 Convert TM data into bus vehicle events #425

Merged: 4 commits, Sep 7, 2024


mzappitello
Contributor

🚌 Convert TM data into bus vehicle events

Convert Transit Master Stop Crossing parquet files and Daily Piece of Work parquet files into dataframes describing Vehicle Events and Operator / Vehicle assignments.

Comment on lines 12 to 33
def create_dt_from_sam(
    service_date_col: pl.Expr, sam_time_col: pl.Expr
) -> pl.Expr:
    """
    add a seconds after midnight to a service date to create a datetime object.
    seconds after midnight is in boston local time, convert it to utc.
    """
    return (
        service_date_col.cast(pl.Datetime) + pl.duration(seconds=sam_time_col)
    ).map_elements(
        lambda x: BOSTON_TZ.localize(x).astimezone(UTC_TZ),
        return_dtype=pl.Datetime,
    )
Contributor Author

a nice utility function to convert seconds after midnight columns into datetime columns.
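
As a rough illustration of the conversion this helper performs, here is a minimal standard-library sketch for a single value rather than a polars column. It assumes Boston local time corresponds to the `America/New_York` zone and uses `zoneinfo` in place of the `localize` call shown in the diff; `dt_from_sam` is a hypothetical name and is not part of this PR.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: Boston local time is the America/New_York zone.
BOSTON_TZ = ZoneInfo("America/New_York")


def dt_from_sam(service_date: date, sam: int) -> datetime:
    """Add seconds after midnight to a service date and return a UTC datetime."""
    # Build a naive local datetime first, mirroring the order used in the diff.
    naive_local = datetime.combine(service_date, datetime.min.time()) + timedelta(
        seconds=sam
    )
    # Attach the Boston zone, then convert to UTC.
    return naive_local.replace(tzinfo=BOSTON_TZ).astimezone(timezone.utc)


# 08:00 local on the service date.
morning = dt_from_sam(date(2024, 9, 7), 8 * 3600)
# 25:00 service time rolls over to 01:00 local on the next calendar day.
after_midnight = dt_from_sam(date(2024, 9, 7), 25 * 3600)
```

Seconds-after-midnight values above 86400 simply roll into the next calendar day, which is how late-night service times end up on the following date.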

)


def generate_tm_events(tm_files: List[str]) -> pl.DataFrame:
Contributor Author

this will generate the tm vehicle events dataframe that can be joined with the gtfs rt dataframe to create a dataframe containing all vehicle events we're aware of.



def get_daily_work_pieces(daily_work_piece_files: List[str]) -> pl.DataFrame:
"""
Contributor Author

this will generate a dataframe of daily work pieces: who was driving what bus on a given day. there are a lot of caveats though, since i can't seem to get the joins to work perfectly. they are noted in the comments.

We want to take Transit Master data from the springboard bucket and
convert it into Bus Events that can be joined against GTFS Realtime
data. This will give us additional information about when a bus arrived
at stops as well as when it hits non-revenue timepoints.

Create a new function that takes a list of Transit Master stop crossing
parquet paths and joins the data against the Transit Master Geo Nodes,
Routes, Trips, and Vehicle tables. Adjust column names, cast them
appropriately, and apply some modifications to make them usable by later
stages in the pipeline.
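
The join described in this commit message can be sketched roughly as follows. This is a dependency-free illustration in plain Python rather than polars, and every table, key, and column name in it is hypothetical; the real Transit Master schema and the joins in `tm_ingestion.py` may differ.

```python
from typing import Any, Dict, List

# Hypothetical lookup tables keyed by Transit Master ids. The real tables
# and column names used in the PR may differ.
GEO_NODES: Dict[int, str] = {101: "70001", 102: "70002"}
ROUTES: Dict[int, str] = {7: "1"}
VEHICLES: Dict[int, str] = {55: "y1234"}


def join_stop_crossings(crossings: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Left-join each stop crossing against the lookup tables.

    Rows with no match keep a None value rather than being dropped.
    """
    events = []
    for row in crossings:
        events.append(
            {
                "stop_id": GEO_NODES.get(row["geo_node_id"]),
                "route_id": ROUTES.get(row["route_id"]),
                "vehicle_label": VEHICLES.get(row["vehicle_id"]),
                "arrival_sam": row["arrival_sam"],
            }
        )
    return events


crossings = [
    {"geo_node_id": 101, "route_id": 7, "vehicle_id": 55, "arrival_sam": 28800},
    # 999 has no entry in GEO_NODES, so its stop_id comes back as None.
    {"geo_node_id": 999, "route_id": 7, "vehicle_id": 55, "arrival_sam": 28860},
]
events = join_stop_crossings(crossings)
```

Keeping unmatched rows (left-join behavior) is a choice made for this sketch; it makes missing lookups visible as `None` instead of silently dropping crossings.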

Add test files and test cases to ensure ingestion and transformation is
happening as expected.

Process Daily Work Piece logs into a dataframe of operator / vehicle
records tied to block ids, run ids, and trip ids that can be joined
against bus vehicle events.

the remote files runtime utility was updated and merged on main while
the tm to bus events branch was completed. after rebasing, this patch
wires everything up correctly.

Coverage of commit 909796f

Summary coverage rate:
  lines......: 76.4% (2420 of 3169 lines)
  functions..: no data found
  branches...: no data found

Files changed coverage rate:
                                                                                     |Lines       |Functions  |Branches    
  Filename                                                                           |Rate     Num|Rate    Num|Rate     Num
  =========================================================================================================================
  src/lamp_py/bus_performance_manager/tm_ingestion.py                                |61.5%     26|    -     0|    -      0



github-actions bot commented Sep 7, 2024

Coverage of commit 12371c6

Summary coverage rate:
  lines......: 75.5% (2460 of 3260 lines)
  functions..: no data found
  branches...: no data found

Files changed coverage rate:
                                                                                     |Lines       |Functions  |Branches    
  Filename                                                                           |Rate     Num|Rate    Num|Rate     Num
  =========================================================================================================================
  src/lamp_py/bus_performance_manager/tm_ingestion.py                                |52.4%     21|    -     0|    -      0


rymarczy merged commit 45f1c9b into main on Sep 7, 2024
5 checks passed