Paths recording refactor #835

Open
wants to merge 21 commits into
base: dev

trevorgerhardt commented Nov 2, 2022

While working on #827, I saw a few opportunities to refactor and improve code quality around paths recording. The central goal was consolidating the multiple ways paths are recorded, and I cleaned up additional issues in related code paths along the way.

This PR is intended to be a pure refactor with no functional changes.

Terminology:

  • Path: the access mode, routes, stops, and egress mode used to get from an origin to a destination.
  • Iteration: the departure time, wait times, and total duration of a specific usage of a path.
  • Path result: a combined path & iteration. Each iteration uses a single path. Each path can be used at multiple iterations.
  • Request: One of AnalysisWorkerTask, ProfileRequest, RegionalTask or TravelTimeSurfaceTask
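
To make the terminology concrete, here is a minimal sketch of how the three terms could relate as data. The class, record, and field names are hypothetical and are not the types used in this PR; it assumes Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the terminology; not the classes in this PR. */
final class PathTerminologySketch {
    /** Path: access mode, routes, stops, and egress mode from origin to destination. */
    record Path(String accessMode, List<String> routes, List<Integer> stops, String egressMode) {}

    /** Iteration: departure time, wait times, and total duration of one usage of a path. */
    record Iteration(int departureTimeSeconds, List<Integer> waitTimesSeconds, int totalDurationSeconds) {}

    /** Path result: one path combined with one iteration that used it. */
    record PathResult(Path path, Iteration iteration) {}

    /** Each iteration uses a single path, but one path may be used at many iterations. */
    static Map<Path, List<Iteration>> groupByPath(List<PathResult> results) {
        Map<Path, List<Iteration>> grouped = new LinkedHashMap<>();
        for (PathResult result : results) {
            grouped.computeIfAbsent(result.path(), p -> new ArrayList<>()).add(result.iteration());
        }
        return grouped;
    }
}
```

Grouping by path is one way to get from per-iteration path results to a count like nIterations per path.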

Paths recording

Paths data, which is recorded while routing and propagating, is used in three different ways: for single origin-destination pairs, for many-to-many pairings, and with a grid (all grid cells to all other grid cells). All three have different path data requirements and produce different results.

PathResultsRecorder

For each request, a PathResultsRecorder is created and initialized if that request asks for paths data. When path recording is disabled, methods like recordPathsForTarget turn into no-ops internally, so the surrounding code does not need to repeatedly check flags to see whether recording is enabled.

The new TransitPathsPerIteration, which records paths in the FastRaptorWorker, works the same way.
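
The no-op behavior can be sketched roughly as follows. This is an illustration of the pattern only: the class name, constructor flag, and stored data are assumptions, and only the method name recordPathsForTarget comes from this PR.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical recorder showing the pattern; not the PathResultsRecorder in this PR. */
final class NoOpRecorderSketch {
    private final boolean enabled;
    private final Map<Integer, int[]> travelTimesByTarget = new HashMap<>();

    NoOpRecorderSketch(boolean enabled) {
        this.enabled = enabled;
    }

    /** Returns immediately when recording is disabled, so callers never check a flag. */
    void recordPathsForTarget(int targetIdx, int[] travelTimesToTarget) {
        if (!enabled) return;
        // Copy the array because the caller may reuse it for the next target.
        travelTimesByTarget.put(targetIdx, travelTimesToTarget.clone());
    }

    int recordedTargetCount() {
        return travelTimesByTarget.size();
    }
}
```

The calling loop stays the same whether or not paths were requested; the single enabled check lives inside the recorder.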

Each result type can be produced from the data collected in the new PathResultsRecorder.

Result types

One-to-one

Mainly for single point analysis, we return a single set of paths to a specific destination which are displayed in the application UI (https://github.com/conveyal/ui/pull/1891). The results are returned in JSON.

An example of the results:

image

For more details about how path results get translated, see PathResultSummary.java (originally created in #827).

Many-to-many

For regional analyses, we store multiple path results for each origin-destination pair taken from a specific set of non-gridded (freeform) points. The results are stored in a CSV file.

An example row of results (full file):

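| origin | destination | routes | boardStops | alightStops | rideTimes | accessTime | egressTime | transferTime | waitTimes | totalTime | nIterations |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Grosvenor | Brookland | RED | 3542 | 4710 | 39.0 | 1.0 | 1.3 | 0.0 | 1.0 | 42.3 | 112 |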

For path results with multiple trip legs, corresponding entries will be joined together with a delimiter. For example, routes would look like RED|J2 and there would be two board stops, two alight stops, two ride times, and two wait times.
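
As a rough sketch of the joining described above, assuming the delimiter is the `|` shown in the RED|J2 example (the class and method names are hypothetical, not taken from RegionalPathResultsSummary.java):

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper for joining per-leg values into one CSV cell. */
final class LegJoinSketch {
    static final String DELIMITER = "|";

    /** Joins one value per trip leg; a single-leg path yields the bare value. */
    static String joinLegs(List<?> perLegValues) {
        StringJoiner joiner = new StringJoiner(DELIMITER);
        for (Object value : perLegValues) {
            joiner.add(String.valueOf(value));
        }
        return joiner.toString();
    }
}
```

With this helper, `joinLegs(List.of("RED", "J2"))` gives `RED|J2`, and a one-leg list such as `List.of(3542)` gives `3542`.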

For details about how path results get turned into the CSV-compatible data that is sent back to the broker, see RegionalPathResultsSummary.java in this PR (created from the summarization code in the original PathResult.java).

Q: Should times be recorded in seconds instead of floating point minutes? Seconds are more useful for machines, while floating point minutes are easier to read. Ex: 2538 seconds vs 42.3 minutes.
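
One option is to store integer seconds and convert to minutes only for display. A minimal sketch of that conversion, with hypothetical names:

```java
import java.util.Locale;

/** Hypothetical helper: keep seconds in the data, format minutes for readers. */
final class TimeUnitsSketch {
    /** Formats a duration in seconds as minutes with one decimal place. */
    static String formatMinutes(int seconds) {
        return String.format(Locale.ROOT, "%.1f", seconds / 60.0);
    }
}
```

For the example row, `formatMinutes(2538)` gives `42.3`.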

Taui from raster grid

For Taui sites, we store sets of route and stop indexes that reference detailed route and stop data in a corresponding file. For each grid cell, we store which path set should be used to get to every other grid cell. We store the results in a custom binary PATHGRID file.

For details about how path results get turned into binary data, see TauiPathResultsWriter.java in this PR (converted from PathWriter.java).

Further consolidation?

We produce three different sets of results from the same data in three different output formats. This is understandable, as the requirements in each situation are different. But it does make me wonder if there is some more consolidation possible, so that when we are moving between each path result type, working with the end data is more familiar. It's an open question.

Other Refactoring

AnalysisWorkerTask / ProfileRequest / RegionalTask / TravelTimeSurfaceTask

We pass these full request objects around generously, making it unclear which fields of the request each class or function depends on. A more pernicious behavior is repeatedly storing the request object as an instance field to be used later. Not only is it unclear which fields of the request object affect the given class on construction, but each usage of the request object must now be inspected to see which parts of the request affect behavior.

If we are using multiple fields from the request, it may seem "easier" to pass the whole request object as a parameter. But this may hide the fact that we are using a consistent subset of the request object in multiple places.

We should look for these groupings across all these request objects and wherever those request objects are used. If there are proper groupings, we should seek to create a composed "configuration" object similar to the backend config. If there are no logical groupings, we should seek to reduce deep usage of these request objects and instead pass the specific values that are needed.
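
A sketch of what such a grouping could look like, assuming Java 16+ records. Everything here is hypothetical: `FakeRequest` stands in for one of the real request classes, and the chosen fields only illustrate the idea of a small, explicit subset.

```java
/** Hypothetical example of extracting a consistent subset of a request. */
final class RequestSubsetSketch {
    /** Stand-in for a large request object; the fields are illustrative only. */
    static final class FakeRequest {
        int maxTripDurationMinutes = 120;
        double walkSpeedMetersPerSecond = 1.3;
        String unrelatedField = "not needed for propagation";
    }

    /** Only the values one consumer needs, made explicit and immutable. */
    record PropagationSettings(int maxTripDurationSeconds, double walkSpeedMetersPerSecond) {
        static PropagationSettings from(FakeRequest request) {
            return new PropagationSettings(
                    request.maxTripDurationMinutes * 60,
                    request.walkSpeedMetersPerSecond);
        }
    }
}
```

A consumer that accepts `PropagationSettings` instead of the whole request documents its dependencies in its signature, and the conversion happens once at the boundary.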

This pull request can serve as a case study of the pros and cons of each approach and help inform which direction to take. I attempted to use just the values necessary where I could. In some cases, I still passed the full request objects, mainly because of the "cleanliness" of the surrounding code.

Side effects

Our coding style contains too many side effects. A "side effect" is when an operation creates, changes, or destroys data that was not directly passed in as a parameter or returned as a result. For example:

class WithSideEffects {
  data = null;
  result = null;
  compute () {
    this.result = slowFunctionOnData(this.data)
  }
}
const sideEffect = new WithSideEffects()
sideEffect.data = [1, 2, 3, 4]
sideEffect.compute()
console.log(sideEffect.result)

This is a contrived example. It does not look as bad when it's not surrounded by other code. The side effect free version can be thought of as "functions that operate on inputs and return the results". Here is the "side effect free" version:

function computeWithoutSideEffects (data) {
  return slowFunctionOnData(data)
}
const data = [1, 2, 3, 4]
const result = computeWithoutSideEffects(data) // The key line
console.log(result)

Side effect free code is easier to understand, test, and refactor. It helps keep methods smaller, simpler, and more focused.

This is mainly a coding style (or philosophy?) preference. I am certainly not suggesting that it is something that needs to be applied 100% across the board. We are not using a functional language and will still benefit from OOP styles.

I am mainly suggesting that we keep this in mind while creating new code and refactoring old code. (Our UI repo could also benefit by re-affirming these principles).

Some questions we should ask ourselves:

  • Should this be a class method that operates on instance values or can it be a static method that only operates on inputs?
  • Is this class hard to keep side effect free because it is trying to do too much?

TravelTimeComputer

The TravelTimeComputer was trivially easy to transform to a side effect free style. It was a single method that was initialized with just the AnalysisWorkerTask and TransportNetwork.

The change was essentially:

// Before
var computer = new TravelTimeComputer(request, network)
return computer.compute()
// After
return TravelTimeComputer.compute(request, network)

There are comments suggesting that the TravelTimeComputer and TravelTimeReducer should be combined. This PR converts the TravelTimeComputer to a single static method. That method could now live anywhere.

PerTargetPropagater

The PerTargetPropagater took a bit more work, but the change was warranted by the new way of passing the PathResultsRecorder around and the desire to reduce deep usage of the "request" object. The propagater is only used within the TravelTimeComputer, so it made sense to move the setup logic from the propagater into separate steps in the TravelTimeComputer.

The code I was attempting to refactor to capture paths was here:

// Improve upon these non-transit travel times based on transit travel times to nearby stops.
// This fills in perIterationTravelTimes and perIterationPaths for one particular target.
timer.propagation.start();
propagateTransit(targetIdx);
timer.propagation.stop();
// Construct the PathScorer before extracting percentiles because the scorer needs to make a copy of
// the unsorted complete travel times.
PathScorer pathScorer = null;
if (savePaths == SavePaths.WRITE_TAUI) {
    // TODO optimization: skip this entirely if there is no transit access to the destination.
    // We know transit access is impossible in the caller when there are no reached stops.
    pathScorer = new PathScorer(perIterationTravelTimes, perIterationPaths, perIterationEgress);
} else if (savePaths == SavePaths.ALL_DESTINATIONS) {
    // For regional tasks, return paths to all targets.
    // Typically used with freeform destinations fewer in number than gridded destinations.
    travelTimeReducer.recordPathsForTarget(targetIdx, perIterationTravelTimes, perIterationPaths,
            perIterationEgress);
} else if (savePaths == SavePaths.ONE_DESTINATION && targetIdx == destinationIndexForPaths) {
    // Return paths to the single target destination specified (by toLat/toLon in a single-point
    // analysis, or by the origin-destination pairing implied by a oneToOne regional analysis).
    travelTimeReducer.recordPathsForTarget(0, perIterationTravelTimes, perIterationPaths,
            perIterationEgress);
}

    perIterationTravelTimes[iteration] = timeToReachTarget;
    if (pathsToStopsForIteration != null) {
        Path path = pathsToStopsForIteration.get(iteration)[stop];
        if (path != null) {
            perIterationPaths[iteration] = path;
            perIterationEgress[iteration] = egress;
        }
    }
}

The above code contained both side effects and deep checks of the request object. In the future, the transformations that remain in the PerTargetPropagater may have a better place to move to. For now, I tried to leave the main pieces of logic and setup within the propagater, but split it up in a way that the TravelTimeComputer could call into it step by step.

If you look at the resulting code, the propagater does not return results. Instead, it sets time values in the passed-in array and uses a passed-in instance to record paths. There are ways to create code that is purely side effect free, but I am not advocating that we be dogmatic about it, merely that we use the style when it can improve code quality.

timer.propagation.start();
PerTargetPropagater.propagateToTarget(
    targetIdx,
    travelTimesToTarget,
    pathsRecorder,
    costTables,
    modeSpeeds,
    modeTimeLimits,
    invertedTravelTimesToStops,
    maxTripDurationSeconds
);
timer.propagation.stop();
// Record paths accumulated from propagation.
pathsRecorder.recordPathsForTarget(
    targetIdx,
    travelTimesToTarget
);

// To reach this target in this iteration, alighting at this stop and proceeding by this egress mode is
// faster than any previously checked stop/egress mode combination. Because that's the case, update the
// best known travel time and the corresponding path.
travelTimesToTarget[iteration] = newTimeToTarget;
pathsRecorder.setTargetIterationValues(iteration, stopIndex, egressTimeAndMode);

Returns times in seconds instead of converting before sending.
If we are returning a single string, concatenating values can be very redundant as IDs may be the same as `short_name`s and stop indexes would need to be parsed to be useful.
Should the tests be using the "HumanReadableIteration" results?
Refactor code around paths recording.
- Reduce flags and settings from peppering inner loops
- Consolidate regional / taui / single point recording
- Reduce code with "side effects" preferring methods that act on passed in data
- Modify single point path summaries to produce what the front-end will display.
Translate a path result to data usable by the client side.
- Remove the static initialize method from `PathResultsRecorder`
- Move the `summarize` method from `PathResult` -> `RegionalPathResultSummar`