-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Paths recording refactor #835
Open
trevorgerhardt
wants to merge
21
commits into
dev
Choose a base branch
from
paths-refactor
base: dev
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Returns times in seconds instead of converting before sending.
If we are returning a single string, concatenating values can be very redundant as IDs may be the same as `short_name`s and stop indexes would need to be parsed to be useful.
Should the tests be using the "HumanReadableIteration" results?
Refactor code around paths recording. - Reduce flags and settings from peppering inner loops - Consolidate regional / taui / single point recording - Reduce code with "side effects" preferring methods that act on passed in data - Modify single point path summaries to produce what the front-end will display.
Translate a path result to data usable by the client side.
- Remove the static initialize method from `PathResultsRecorder` - Move the `summarize` method from `PathResult` -> `RegionalPathResultSummar`
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
While working on #827 I saw a few opportunities for refactoring to improve code quality around paths recording. The central issue was consolidating the multiple ways paths are recorded. But there were additional issues that were cleaned up in related code paths along the way.
This PR is intended to be a pure refactor with no functional changes.
Terminology:
AnalysisWorkerTask
,ProfileRequest
,RegionalTask
orTravelTimeSurfaceTask
Paths recording
Paths data, which is recorded while routing and propagating, is used in 3 different ways. Single origin-destination pairs, many-to-many pairings, and with a grid (all grid cells to all other grid cells). All three have different path data requirements and produce different results.
PathResultsRecorder
For each request, a
PathsResultsRecorder
is created and initialized if that request asks for paths data. Methods likerecordPathsForTarget
handle the case if path recording is disabled internally and turn into anoop
so that the surrounding code does not need to repeatedly check flags to see if it is enabled.The new
TransitPathsPerIteration
, which records paths in theFastRaptorWorker
, works the same way.Each result type can be produced from the data collected in the new
PathResultsRecorder
.Result types
One-to-one
Mainly for single point analysis, we return a single set of paths to a specific destination which are displayed in the application UI (https://github.com/conveyal/ui/pull/1891). The results are returned in JSON.
An example of the results:
For more details about how path results get translated, see
PathResultSummary.java
(originally created in #827).Many-to-many
For regional analyses, we store multiple path results for each origin destination pair taken from a specific set of non-gridded (freeform) points. The results are stored in a CSV file.
An example row of results (full file):
For path results with multiple trip legs, corresponding entries will be joined together with a delimiter. For example, routes would look like
RED|J2
and there would be two board stops, two alight stops, two ride times, and two wait times.For details about how path results get turned into their CSV compatible data that is sent back to the broker, see
RegionalPathResultsSummary.java
in this PR (created from the summarization code in the originalPathResult.java
).Q: Should times be recorded in seconds instead of floating point minutes? Seconds are more useful for machines, floating point times are more useful for reading. Ex: 2538 seconds vs 42.3 minutes.
Taui from raster grid
For Taui sites, we store sets of route and stop indexes that reference detailed route and stop data in a corresponding file. For each grid cell, we store which path set should be used to get to every other grid cell. We store the results in a custom binary
PATHGRID
file.For details about how path results get turned into binary data, see
TauiPathResultsWriter.java
in this PR (converted fromPathWriter.java
).Further consolidation?
We produce three different sets of results from the same data in three different output formats. This is understandable, as the requirements in each situation are different. But it does make me wonder if there is some more consolidation possible, so that when we are moving between each path result type, working with the end data is more familiar. It's an open question.
Other Refactoring
AnalysisWorkerTask
/ProfileRequest
/RegionalTask
/TravelTimeSurfaceTask
We pass these full request objects around generously, making it unclear which fields of the request each class or function that uses it depends on. Additionally, a more pernicious behavior is repeatedly storing the request object as a instance field to be used later. Not only is it unclear which fields of the request object effect the given class on construction, but each usage of the request object must now be looked at to see which parts of the request effect behavior.
If we are using multiple fields from the request, it may seem "easier" to pass the whole request object as a parameter. But this may hide the fact that we are using a consistent subset of the request object in multiple places.
These groupings should be looked for across all these request objects and where those request objects are used. If there are proper groupings, we should seek to create a composed "configuration" object similar to the backend config. If there are not logical groupings, we should seek to reduce deep usage of these request objects and instead pass the specific values that are needed.
This pull-request can serve somewhat as a guide/case study to the pros and cons of this to help inform a direction to go. I attempted to use just the values necessary where I could. In some cases, I still passed the full request objects mainly because of the "cleanliness" of the surrounding code.
Side effects
Our coding style contains too many side effects. A "side effect" is when an operation creates, changes, or destroys data that was not directly passed in as a parameter or returned as a result. For example:
This is a contrived example. It does not look as bad when its not surrounded by other code. The side effect free version can be thought of as "functions that operate on inputs and return the results". Here is the "side effect free" version:
Side effect free code is easier to understand, test, and refactor. It helps keeps methods smaller, simpler, and more focused.
This is mainly a coding style (or philosophy?) preference. I am certainly not suggesting that it is something that needs to be applied 100% across the board. We are not using a functional language and will still benefit from OOP styles.
I am mainly suggesting that we keep this in mind while creating new code and refactoring old code. (Our UI repo could also benefit by re-affirming these principles).
Some questions we should ask ourselves:
TravelTimeComputer
The
TravelTimeComputer
was trivially easy to transform to a side effect free style. It was a single method that was initialized with just theAnalysisWorkerTask
andTransportNetwork
.The change was essentially:
There are comments in suggesting that the
TravelTimeComputer
andTravelTimeReducer
should be combined. This PR converts theTravelTimeComputer
to a single static method. That method could now live anywhere.PerTargetPropagater
The
PerTargetPropagater
took a bit more work, but seemed appropriate because of the new way of passing thePathResultsRecorder
around and the desire to reduce deep usage of the "request" object. The propagator is only used within theTravelTimeComputer
and therefore shuffling the setup logic from the propagator into separate steps in theTravelTimeComputer
seemed appropriate.The code I was attempting to refactor to capture paths was here:
r5/src/main/java/com/conveyal/r5/profile/PerTargetPropagater.java
Lines 249 to 272 in 44c6839
r5/src/main/java/com/conveyal/r5/profile/PerTargetPropagater.java
Lines 417 to 425 in 44c6839
The above code contained both side effects and deep checks of the request object. In the future, the transformations that remain in the
PerTargetPropagater
may have a better place to move to. For now, I tried to leave the main pieces of logic and setup within the propagater, but split it up in a way that theTravelTimeComputer
could call into it step by step.If you look at the resulting code, the propagater does not return results, but it sets time values of the passed in array, and uses a passed in instance to record paths. There are ways to create code that is purely side effect free, but I am not advocating that we need to be dogmatic about it. Merely to utilize it when it can improve code quality.
r5/src/main/java/com/conveyal/r5/analyst/TravelTimeComputer.java
Lines 408 to 425 in 52d99ad
r5/src/main/java/com/conveyal/r5/analyst/PerTargetPropagater.java
Lines 180 to 184 in 52d99ad