Paths recording refactor #835

trevorgerhardt · 2022-11-02T10:41:01Z

While working on #827 I saw a few opportunities for refactoring to improve code quality around paths recording. The central issue was consolidating the multiple ways paths are recorded. But there were additional issues that were cleaned up in related code paths along the way.

This PR is intended to be a pure refactor with no functional changes.

Terminology:

Path: the access mode, routes, stops, and egress mode used to get from an origin to a destination.
Iteration: the departure time, wait times, and total duration of a specific usage of a path.
Path result: a combined path & iteration. Each iteration uses a single path. Each path can be used at multiple iterations.
Request: One of AnalysisWorkerTask, ProfileRequest, RegionalTask or TravelTimeSurfaceTask

Paths recording

Paths data, which is recorded while routing and propagating, is used in three different ways: for single origin-destination pairs, for many-to-many pairings, and with a grid (all grid cells to all other grid cells). Each has different path data requirements and produces different results.

`PathResultsRecorder`

For each request, a PathResultsRecorder is created and initialized if that request asks for paths data. Methods like recordPathsForTarget handle the disabled case internally and become no-ops, so the surrounding code does not need to repeatedly check flags to see whether path recording is enabled.

The new TransitPathsPerIteration, which records paths in the FastRaptorWorker, works the same way.

Each result type can be produced from the data collected in the new PathResultsRecorder.
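A minimal Java sketch of the "disabled means no-op" pattern described above. PathsRecorderSketch, its constructor flag, and recordedTargetCount are hypothetical names for illustration only; the real PathResultsRecorder holds considerably more state than the copied travel times stored here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of a recorder that becomes a no-op when paths were not requested.
 * Not the r5 PathResultsRecorder; names and fields are illustrative only.
 */
class PathsRecorderSketch {
    private final boolean enabled;
    private final List<Integer> recordedTargets = new ArrayList<>();
    // Copies of per-iteration travel times, one array per recorded target (illustrative payload).
    private final List<int[]> recordedTimes = new ArrayList<>();

    /** Built once per request from whether that request asks for paths. */
    PathsRecorderSketch(boolean requestAsksForPaths) {
        this.enabled = requestAsksForPaths;
    }

    /** Safe to call from inner loops: returns immediately when recording is disabled. */
    void recordPathsForTarget(int targetIdx, int[] travelTimesToTarget) {
        if (!enabled) return;
        recordedTargets.add(targetIdx);
        // Copy because the caller may reuse the array for the next target.
        recordedTimes.add(Arrays.copyOf(travelTimesToTarget, travelTimesToTarget.length));
    }

    int recordedTargetCount() {
        return recordedTargets.size();
    }
}
```

With this shape, the propagation loop calls recordPathsForTarget for every target unconditionally, and the enabled check lives in exactly one place.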

Result types

One-to-one

Mainly for single-point analysis, we return a single set of paths to a specific destination, which are displayed in the application UI (https://github.com/conveyal/ui/pull/1891). The results are returned as JSON.

An example of the results:

For more details about how path results get translated, see PathResultSummary.java (originally created in #827).

Many-to-many

For regional analyses, we store multiple path results for each origin-destination pair, taken from a specific set of non-gridded (freeform) points. The results are stored in a CSV file.

An example row of results (full file):

origin	destination	routes	boardStops	alightStops	rideTimes	accessTime	egressTime	transferTime	waitTimes	totalTime	nIterations
Grosvenor	Brookland	RED	3542	4710	39.0	1.0	1.3	0.0	1.0	42.3	112

For path results with multiple trip legs, corresponding entries will be joined together with a delimiter. For example, routes would look like RED|J2 and there would be two board stops, two alight stops, two ride times, and two wait times.
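A minimal Java sketch of that joining step, assuming each leg's values are already available as a list in leg order. LegJoinSketch and join are hypothetical names; RegionalPathResultsSummary may format values differently (for example, the number formatting of ride times).

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

/** Hypothetical sketch: collapse per-leg values into one CSV cell. */
class LegJoinSketch {
    // Delimiter between legs, matching the RED|J2 example.
    static final String LEG_DELIMITER = "|";

    /** Joins one value per leg (route names, stop indexes, ride times...) in leg order. */
    static String join(List<?> perLegValues) {
        return perLegValues.stream()
                .map(Objects::toString)
                .collect(Collectors.joining(LEG_DELIMITER));
    }
}
```

A single-leg path produces a cell with no delimiter at all, so single-leg and multi-leg rows share the same columns.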

For details about how path results are turned into the CSV-compatible data that is sent back to the broker, see RegionalPathResultsSummary.java in this PR (created from the summarization code in the original PathResult.java).

Q: Should times be recorded in seconds instead of floating point minutes? Seconds are more useful for machines, floating point times are more useful for reading. Ex: 2538 seconds vs 42.3 minutes.
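For reference, a hypothetical helper (not part of r5) showing the conversion behind the example, assuming minutes are rounded to one decimal place as in the sample row. It also makes the trade-off concrete: the rounding is lossy, so 2538 and 2539 seconds both become 42.3.

```java
/** Hypothetical helper; not part of r5. Rounds to tenths of a minute, as in the sample CSV row. */
class TimeUnitsSketch {
    static double secondsToRoundedMinutes(int seconds) {
        // Scale to tenths of a minute, round to the nearest whole tenth, then scale back.
        return Math.round(seconds / 60.0 * 10) / 10.0;
    }
}
```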

Taui from raster grid

For Taui sites, we store sets of route and stop indexes that reference detailed route and stop data in a corresponding file. For each grid cell, we store which path set should be used to get to every other grid cell. We store the results in a custom binary PATHGRID file.
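The indirection described above (store each distinct path set once, then have each cell refer to it by index) can be sketched as a small interning table. This is a hypothetical illustration assuming a path set is represented as a list of route and stop indexes; it is not the PATHGRID layout written by TauiPathResultsWriter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: each distinct path set is stored once and referenced by index. */
class PathSetTableSketch {
    private final Map<List<Integer>, Integer> indexOfPathSet = new HashMap<>();
    private final List<List<Integer>> pathSets = new ArrayList<>();

    /** Returns the index of the given path set, adding it to the table if it is new. */
    int indexFor(List<Integer> pathSet) {
        Integer existing = indexOfPathSet.get(pathSet);
        if (existing != null) return existing;
        // Immutable copy so a caller mutating its list cannot corrupt the map key.
        List<Integer> copy = List.copyOf(pathSet);
        int index = pathSets.size();
        pathSets.add(copy);
        indexOfPathSet.put(copy, index);
        return index;
    }

    int distinctPathSetCount() {
        return pathSets.size();
    }
}
```

For one origin, each destination cell would then store only indexFor(pathSetForCell), so cells that share a path set cost a single int each.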

For details about how path results get turned into binary data, see TauiPathResultsWriter.java in this PR (converted from PathWriter.java).

Further consolidation?

We produce three different sets of results from the same data, in three different output formats. This is understandable, as the requirements in each situation are different. But it makes me wonder whether more consolidation is possible, so that working with the end data feels more familiar when moving between path result types. It's an open question.

Other Refactoring

`AnalysisWorkerTask` / `ProfileRequest` / `RegionalTask` / `TravelTimeSurfaceTask`

We pass these full request objects around generously, which makes it unclear which fields of the request each class or function depends on. A more pernicious pattern is storing the request object as an instance field to be used later, which we do repeatedly. Not only is it unclear which fields of the request object affect the class on construction, but every later usage of the stored request must also be examined to see which parts of the request affect behavior.

If we are using multiple fields from the request, it may seem "easier" to pass the whole request object as a parameter. But this may hide the fact that we are using a consistent subset of the request object in multiple places.

We should look for these groupings across all of these request objects and the places where they are used. If there are proper groupings, we should create a composed "configuration" object similar to the backend config. If there are no logical groupings, we should reduce deep usage of these request objects and instead pass only the specific values that are needed.
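A minimal sketch of the "configuration object" option, using a Java record (Java 16+). RequestSketch stands in for a large request object, and PropagationConfig groups the only values one consumer reads. All class and field names here are hypothetical, chosen for illustration rather than taken from the r5 request classes.

```java
/** Stand-in for a large request object; fields are hypothetical. */
class RequestSketch {
    int maxTripDurationMinutes = 120;
    double walkSpeedMetersPerSecond = 1.3;
    int monteCarloDraws = 220;   // not needed by propagation
    String scenarioId = "base";  // not needed by propagation
}

/** Hypothetical grouping of the request values that propagation actually depends on. */
record PropagationConfig(int maxTripDurationSeconds, double walkSpeedMetersPerSecond) {
    static PropagationConfig fromRequest(RequestSketch request) {
        // Unit conversion happens once here instead of at every use site.
        return new PropagationConfig(request.maxTripDurationMinutes * 60, request.walkSpeedMetersPerSecond);
    }
}

class PropagationConfigUser {
    /** The signature states exactly which inputs matter; the rest of the request is out of reach. */
    static boolean withinTimeLimit(PropagationConfig config, int travelTimeSeconds) {
        return travelTimeSeconds <= config.maxTripDurationSeconds();
    }
}
```

The other option from the paragraph, passing specific values, would be withinTimeLimit(int maxTripDurationSeconds, int travelTimeSeconds). The record only pays off once the same group of values travels together through several layers.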

This pull request can serve somewhat as a guide or case study of the pros and cons of this approach, to help inform which direction to take. I used only the necessary values where I could. In some cases, I still passed the full request objects, mainly to keep the surrounding code clean.

Side effects

Our coding style contains too many side effects. A "side effect" is when an operation creates, changes, or destroys data that was not directly passed in as a parameter or returned as a result. For example:

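```js
// Stand-in for an expensive computation, so the example runs.
const slowFunctionOnData = (data) => data.map((x) => x * 2)

class WithSideEffects {
  data = null;
  result = null;
  compute () {
    this.result = slowFunctionOnData(this.data)
  }
}
const sideEffect = new WithSideEffects()
sideEffect.data = [1, 2, 3, 4]
sideEffect.compute()
console.log(sideEffect.result) // reads the field that compute() set as a side effect
```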

This is a contrived example, and it does not look as bad when it's not surrounded by other code. The side effect free version can be thought of as "functions that operate on inputs and return the results". Here is the "side effect free" version:

```js
function computeWithoutSideEffects (data) {
  return slowFunctionOnData(data)
}
const data = [1, 2, 3, 4]
const result = computeWithoutSideEffects(data) // The key line
console.log(result)
```

Side effect free code is easier to understand, test, and refactor. It helps keep methods smaller, simpler, and more focused.

This is mainly a coding style (or philosophy?) preference. I am certainly not suggesting that it is something that needs to be applied 100% across the board. We are not using a functional language and will still benefit from OOP styles.

I am mainly suggesting that we keep this in mind while creating new code and refactoring old code. (Our UI repo could also benefit from re-affirming these principles.)

Some questions we should ask ourselves:

Should this be a class method that operates on instance values or can it be a static method that only operates on inputs?
Is this class hard to keep side effect free because it is trying to do too much?

`TravelTimeComputer`

The TravelTimeComputer was trivially easy to transform to a side effect free style: it had a single method, and each instance was initialized with just the AnalysisWorkerTask and TransportNetwork.

The change was essentially:

```java
// Before
var computer = new TravelTimeComputer(request, network);
return computer.compute();

// After
return TravelTimeComputer.compute(request, network);
```

There are comments in the code suggesting that the TravelTimeComputer and TravelTimeReducer should be combined. This PR converts the TravelTimeComputer into a single static method, which could now live anywhere.

`PerTargetPropagater`

The PerTargetPropagater took a bit more work, but the change seemed appropriate given the new way of passing the PathResultsRecorder around and the desire to reduce deep usage of the "request" object. The propagater is only used within the TravelTimeComputer, so moving its setup logic into separate steps in the TravelTimeComputer made sense.

The code I was attempting to refactor to capture paths was here:

r5/src/main/java/com/conveyal/r5/profile/PerTargetPropagater.java

Lines 249 to 272 in 44c6839

```java
// Improve upon these non-transit travel times based on transit travel times to nearby stops.
// This fills in perIterationTravelTimes and perIterationPaths for one particular target.
timer.propagation.start();
propagateTransit(targetIdx);
timer.propagation.stop();

// Construct the PathScorer before extracting percentiles because the scorer needs to make a copy of
// the unsorted complete travel times.
PathScorer pathScorer = null;
if (savePaths == SavePaths.WRITE_TAUI) {
    // TODO optimization: skip this entirely if there is no transit access to the destination.
    // We know transit access is impossible in the caller when there are no reached stops.
    pathScorer = new PathScorer(perIterationTravelTimes, perIterationPaths, perIterationEgress);
} else if (savePaths == SavePaths.ALL_DESTINATIONS) {
    // For regional tasks, return paths to all targets.
    // Typically used with freeform destinations fewer in number than gridded destinations.
    travelTimeReducer.recordPathsForTarget(targetIdx, perIterationTravelTimes, perIterationPaths,
            perIterationEgress);
} else if (savePaths == SavePaths.ONE_DESTINATION && targetIdx == destinationIndexForPaths) {
    // Return paths to the single target destination specified (by toLat/toLon in a single-point
    // analysis, or by the origin-destination pairing implied by a oneToOne regional analysis).
    travelTimeReducer.recordPathsForTarget(0, perIterationTravelTimes, perIterationPaths,
            perIterationEgress);
}
```

r5/src/main/java/com/conveyal/r5/profile/PerTargetPropagater.java

Lines 417 to 425 in 44c6839

```java
    perIterationTravelTimes[iteration] = timeToReachTarget;
    if (pathsToStopsForIteration != null) {
        Path path = pathsToStopsForIteration.get(iteration)[stop];
        if (path != null) {
            perIterationPaths[iteration] = path;
            perIterationEgress[iteration] = egress;
        }
    }
}
```

The above code contained both side effects and deep checks of the request object. In the future, the transformations that remain in the PerTargetPropagater may have a better place to move to. For now, I tried to leave the main pieces of logic and setup within the propagater, but split it up in a way that the TravelTimeComputer could call into it step by step.

In the resulting code, the propagater does not return results; instead it sets time values in the passed-in array and uses a passed-in instance to record paths. There are ways to make this code purely side effect free, but I am not advocating that we be dogmatic about it, only that we use the style where it improves code quality.

r5/src/main/java/com/conveyal/r5/analyst/TravelTimeComputer.java

Lines 408 to 425 in 52d99ad

```java
timer.propagation.start();
PerTargetPropagater.propagateToTarget(
        targetIdx,
        travelTimesToTarget,
        pathsRecorder,
        costTables,
        modeSpeeds,
        modeTimeLimits,
        invertedTravelTimesToStops,
        maxTripDurationSeconds
);
timer.propagation.stop();

// Record paths accumulated from propagation.
pathsRecorder.recordPathsForTarget(
        targetIdx,
        travelTimesToTarget
);
```

r5/src/main/java/com/conveyal/r5/analyst/PerTargetPropagater.java

Lines 180 to 184 in 52d99ad

```java
// To reach this target in this iteration, alighting at this stop and proceeding by this egress mode is
// faster than any previously checked stop/egress mode combination. Because that's the case, update the
// best known travel time and the corresponding path.
travelTimesToTarget[iteration] = newTimeToTarget;
pathsRecorder.setTargetIterationValues(iteration, stopIndex, egressTimeAndMode);
```
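To make the shape of the refactored propagation step concrete outside of r5, here is a simplified, hypothetical sketch: for each iteration it keeps the fastest way to reach one target via any stop, improves a caller-owned array in place, and reports the winning stop to a recorder. PropagationSketch, BestStopRecorder, and the single egress time per stop are assumptions for illustration; the real propagater works from cost tables, mode speeds, and mode time limits.

```java
import java.util.Arrays;

/** Hypothetical, simplified sketch of per-target propagation; not the r5 implementation. */
class PropagationSketch {
    /** Minimal stand-in for the paths recorder: remembers the best stop per iteration. */
    static class BestStopRecorder {
        final int[] bestStopPerIteration;

        BestStopRecorder(int iterations) {
            bestStopPerIteration = new int[iterations];
            Arrays.fill(bestStopPerIteration, -1); // -1 means no transit path improved on walking
        }

        void setTargetIterationValues(int iteration, int stopIndex) {
            bestStopPerIteration[iteration] = stopIndex;
        }
    }

    /**
     * travelTimesToTarget is owned by the caller and already holds non-transit times. It is
     * improved in place, and each improvement is reported to the recorder.
     */
    static void propagateToTarget(int[][] timesToStops, int[] egressSecondsFromStop,
                                  int[] travelTimesToTarget, BestStopRecorder recorder) {
        for (int iteration = 0; iteration < travelTimesToTarget.length; iteration++) {
            for (int stop = 0; stop < egressSecondsFromStop.length; stop++) {
                int timeAtStop = timesToStops[iteration][stop];
                if (timeAtStop == Integer.MAX_VALUE) continue; // stop not reached in this iteration
                int newTimeToTarget = timeAtStop + egressSecondsFromStop[stop];
                if (newTimeToTarget < travelTimesToTarget[iteration]) {
                    travelTimesToTarget[iteration] = newTimeToTarget;
                    recorder.setTargetIterationValues(iteration, stop);
                }
            }
        }
    }
}
```

Because the time update and the recorder call happen together, the recorder always reflects the stop behind the best known time for each iteration, without the propagater needing to know whether or how paths will be summarized.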


Refactor code around paths recording.
- Reduce flags and settings from peppering inner loops
- Consolidate regional / taui / single point recording
- Reduce code with "side effects" preferring methods that act on passed in data
- Modify single point path summaries to produce what the front-end will display.


trevorgerhardt added 20 commits September 22, 2022 15:09

Paths: return machine usable results

743d879

Returns times in seconds instead of converting before sending.

Improve stopString and routeString outputs

cd950c9

If we are returning a single string, concatenating values can be very redundant as IDs may be the same as `short_name`s and stop indexes would need to be parsed to be useful.

Round the minutes in the SimpsonDesertTests

0ed82ea

Should the tests be using the "HumanReadableIteration" results?

Merge branch 'dev' into path-summaries

12c1e39

Add a PathResultSummary class

4d3600b

Translate a path result to data usable by the client side.

Merge branch 'path-summaries' into paths-refactor

936914f

Remove now unused HumanReadableIteration

cf33d90

Clean up comments

ad5269a

Check for path result values before summarizing

c17c0bb

Revert changes to TransitLayer and RouteSequence

6652105

Update field names to include seconds

39a309a

Merge branch 'path-summaries' into paths-refactor

9f22a23

Remove unused methods

219b262

Additional clean up

8183417

Allow getPathResults to return null

7391d25

Merge branch 'dev' into paths-refactor

44b81f0

Use static creation method

95cb044

Move the propagater into analyst

9490c4a

- Remove the static initialize method from `PathResultsRecorder`
- Move the `summarize` method from `PathResult` -> `RegionalPathResultSummar`

Rename parameter

52d99ad

trevorgerhardt marked this pull request as ready for review November 10, 2022 06:58

trevorgerhardt enabled auto-merge (squash) November 10, 2022 06:58

trevorgerhardt requested review from abyrd and ansoncfit November 10, 2022 06:59

trevorgerhardt added optimization cleanup labels Nov 10, 2022

Merge branch 'dev' into paths-refactor

13c5b5a
