Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX (LAMP_ALL_RT_fields): Use Static Schedule to filter Initial Station of Trip #398

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

rymarczy
Copy link
Collaborator

This change drops using canonical_stop_sequence for filtering out initial trip stations and instead uses static schedule data to drop any events relating to the initial canonical station of a trip.

Asana Task: https://app.asana.com/0/1205827492903547/1207100677882489

Copy link

Coverage of commit 026d983

Summary coverage rate:
  lines......: 76.6% (2125 of 2773 lines)
  functions..: no data found
  branches...: no data found

Files changed coverage rate: n/a

Download coverage report

Copy link
Contributor

@mzappitello mzappitello left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few small questions.

Comment on lines +93 to +123
" ( "
" SELECT "
" DISTINCT ON (coalesce(static_trips.branch_route_id,static_trips.trunk_route_id),static_route_patterns.direction_id,static_route_patterns.static_version_key) "
" static_route_patterns.direction_id as direction_id "
" , static_route_patterns.representative_trip_id as representative_trip_id "
" , static_trips.trunk_route_id as trunk_route_id "
" , coalesce(static_trips.branch_route_id, static_trips.trunk_route_id) as route_id "
" , static_route_patterns.static_version_key as static_version_key "
" FROM "
" static_route_patterns "
" JOIN static_trips on "
" static_route_patterns.representative_trip_id = static_trips.trip_id "
" AND static_route_patterns.static_version_key = static_trips.static_version_key "
" JOIN static_routes on "
" static_routes.route_id = static_trips.route_id "
" AND static_routes.static_version_key = static_trips.static_version_key "
" WHERE "
" static_routes.route_type < 2 "
" AND (static_route_patterns.route_pattern_typicality = 1 "
" or static_route_patterns.route_pattern_typicality = 5) "
" order by "
" coalesce(static_trips.branch_route_id, static_trips.trunk_route_id), "
" static_route_patterns.direction_id, "
" static_route_patterns.static_version_key, "
" static_route_patterns.route_pattern_typicality desc) as canon_trips "
" JOIN static_stop_times on "
" canon_trips.representative_trip_id = static_stop_times.trip_id "
" AND canon_trips.static_version_key = static_stop_times.static_version_key "
" JOIN static_stops on "
" static_stop_times.stop_id = static_stops.stop_id "
" AND static_stop_times.static_version_key = static_stops.static_version_key ) as drop_station "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanity check for my sake.

this gets every station on every trip, then its filtered out to only be the first station for every trip.

we use that for the filtering later rather than the ve.canonical_stop_sequence

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this subquery is the same one used to create canonical_stop_sequence but we are limiting it to the first parent_station in each each direction and then doing an anti-join with the drop_flag field.

Comment on lines -110 to +164
" ve.service_date, vt.route_id, vt.direction_id, vt.vehicle_id, vt.start_time"
" ve.service_date, vt.vehicle_id, vt.start_time"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was the order change requested?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't requested, but we're only doing this to reduce the parquet file size and dropping the two SORT field fields gave us a speed up in query time with almost no change in file size.

Copy link

Coverage of commit 8a23fa9

Summary coverage rate:
  lines......: 77.2% (2141 of 2773 lines)
  functions..: no data found
  branches...: no data found

Files changed coverage rate: n/a

Download coverage report

f" AND vt.service_date >= {max_start_date.strftime('%Y%m%d')} ",
f" AND ve.service_date >= {max_start_date.strftime('%Y%m%d')} ",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

faster because the base of the query is on vehicle events?

Copy link
Contributor

@mzappitello mzappitello left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍰

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants