FIX (LAMP_ALL_RT_fields): Use Static Schedule to filter Initial Station of Trip #398

rymarczy · 2024-07-10T18:52:55Z

This change stops using canonical_stop_sequence to filter out initial trip stations. Instead, it uses static schedule data to drop any events relating to the initial canonical station of a trip.

Asana Task: https://app.asana.com/0/1205827492903547/1207100677882489

github-actions · 2024-07-10T18:57:43Z

Coverage of commit `026d983`

Summary coverage rate:
  lines......: 76.6% (2125 of 2773 lines)
  functions..: no data found
  branches...: no data found

Files changed coverage rate: n/a


mzappitello

a few small questions.

mzappitello · 2024-07-11T21:34:09Z

src/lamp_py/tableau/jobs/rt_rail.py

+            "           ( "
+            "           SELECT "
+            "               DISTINCT ON (coalesce(static_trips.branch_route_id,static_trips.trunk_route_id),static_route_patterns.direction_id,static_route_patterns.static_version_key) "
+            "               static_route_patterns.direction_id as direction_id "
+            "               , static_route_patterns.representative_trip_id as representative_trip_id "
+            "               , static_trips.trunk_route_id as trunk_route_id "
+            "               , coalesce(static_trips.branch_route_id, static_trips.trunk_route_id) as route_id "
+            "               , static_route_patterns.static_version_key as static_version_key "
+            "           FROM "
+            "               static_route_patterns "
+            "           JOIN static_trips on "
+            "               static_route_patterns.representative_trip_id = static_trips.trip_id "
+            "               AND static_route_patterns.static_version_key = static_trips.static_version_key "
+            "           JOIN static_routes on "
+            "               static_routes.route_id = static_trips.route_id "
+            "               AND static_routes.static_version_key = static_trips.static_version_key "
+            "           WHERE "
+            "               static_routes.route_type < 2 "
+            "               AND (static_route_patterns.route_pattern_typicality = 1 "
+            "                   or static_route_patterns.route_pattern_typicality = 5) "
+            "           order by "
+            "               coalesce(static_trips.branch_route_id, static_trips.trunk_route_id), "
+            "               static_route_patterns.direction_id, "
+            "               static_route_patterns.static_version_key, "
+            "               static_route_patterns.route_pattern_typicality desc) as canon_trips "
+            "       JOIN static_stop_times on "
+            "           canon_trips.representative_trip_id = static_stop_times.trip_id "
+            "           AND canon_trips.static_version_key = static_stop_times.static_version_key "
+            "       JOIN static_stops on "
+            "           static_stop_times.stop_id = static_stops.stop_id "
+            "           AND static_stop_times.static_version_key = static_stops.static_version_key ) as drop_station "


sanity check for my sake.

this gets every station on every trip, then it's filtered down to only the first station for every trip.

we use that for the filtering later rather than the ve.canonical_stop_sequence

rymarczy · 2024-07-17

Yes, this subquery is the same one used to create canonical_stop_sequence, but we are limiting it to the first parent_station in each direction and then doing an anti-join with the drop_flag field.
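For readers following the thread, here is a minimal, self-contained sketch of the idea: find the first scheduled station per route and direction, then anti-join so events at that station are dropped. It is not the query from this PR. The `schedule` and `events` tables, their columns, the sample rows, and both function names are hypothetical, and it uses SQLite with a `LEFT JOIN ... IS NULL` anti-join instead of the PostgreSQL `DISTINCT ON` subquery and drop_flag field discussed above.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up schedule and event rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE schedule "
        "(route_id TEXT, direction_id INTEGER, stop_sequence INTEGER, parent_station TEXT)"
    )
    conn.execute(
        "CREATE TABLE events "
        "(trip_id TEXT, route_id TEXT, direction_id INTEGER, parent_station TEXT)"
    )
    # Hypothetical schedule: direction 0 starts at station-a, direction 1 at station-c.
    conn.executemany(
        "INSERT INTO schedule VALUES (?, ?, ?, ?)",
        [
            ("Red", 0, 1, "station-a"),
            ("Red", 0, 2, "station-b"),
            ("Red", 0, 3, "station-c"),
            ("Red", 1, 1, "station-c"),
            ("Red", 1, 2, "station-b"),
            ("Red", 1, 3, "station-a"),
        ],
    )
    # Hypothetical events: one trip in each direction visiting every station.
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        [
            ("t1", "Red", 0, "station-a"),
            ("t1", "Red", 0, "station-b"),
            ("t1", "Red", 0, "station-c"),
            ("t2", "Red", 1, "station-c"),
            ("t2", "Red", 1, "station-b"),
            ("t2", "Red", 1, "station-a"),
        ],
    )
    return conn


def drop_initial_station_events(conn: sqlite3.Connection) -> list:
    """Return events that are not at the first scheduled station of their route and direction."""
    query = """
        SELECT ev.trip_id, ev.parent_station
        FROM events AS ev
        LEFT JOIN (
            -- first scheduled station per route and direction
            SELECT st.route_id, st.direction_id, st.parent_station
            FROM schedule AS st
            JOIN (
                SELECT route_id, direction_id, MIN(stop_sequence) AS first_seq
                FROM schedule
                GROUP BY route_id, direction_id
            ) AS firsts
              ON st.route_id = firsts.route_id
             AND st.direction_id = firsts.direction_id
             AND st.stop_sequence = firsts.first_seq
        ) AS drop_station
          ON ev.route_id = drop_station.route_id
         AND ev.direction_id = drop_station.direction_id
         AND ev.parent_station = drop_station.parent_station
        -- anti-join: keep only events with no matching drop_station row
        WHERE drop_station.parent_station IS NULL
        ORDER BY ev.trip_id, ev.parent_station
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in drop_initial_station_events(build_demo_db()):
        print(row)
```

In this toy data, station-a is dropped only for the direction 0 trip and station-c only for the direction 1 trip, so the filter is direction-aware: the same station is kept when it is the last stop of the opposite direction.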

mzappitello · 2024-07-11T21:34:27Z

src/lamp_py/tableau/jobs/rt_rail.py

-            "   ve.service_date, vt.route_id, vt.direction_id, vt.vehicle_id, vt.start_time"
+            "   ve.service_date, vt.vehicle_id, vt.start_time"


was the order change requested?

rymarczy · 2024-07-17

It wasn't requested, but we're only doing this to reduce the parquet file size, and dropping the two SORT fields gave us a speed-up in query time with almost no change in file size.

github-actions · 2024-07-16T10:32:07Z

Coverage of commit `8a23fa9`

Summary coverage rate:
  lines......: 77.2% (2141 of 2773 lines)
  functions..: no data found
  branches...: no data found

Files changed coverage rate: n/a


mzappitello · 2024-07-17T14:20:34Z

src/lamp_py/tableau/jobs/rt_rail.py

-            f" AND vt.service_date >= {max_start_date.strftime('%Y%m%d')} ",
+            f" AND ve.service_date >= {max_start_date.strftime('%Y%m%d')} ",


faster because the base of the query is on vehicle events?

mzappitello

🍰

use static schedule filter

026d983

mzappitello approved these changes Jul 11, 2024


vt where faster

8a23fa9

mzappitello reviewed Jul 17, 2024


mzappitello approved these changes Jul 17, 2024

