Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nx-cugraph: faster shortest_path #4739

Open
wants to merge 6 commits into
base: branch-24.12
Choose a base branch
from

Conversation

eriknw
Copy link
Contributor

@eriknw eriknw commented Oct 24, 2024

For larger graphs, nearly all the time was spent creating the dict of lists of paths. I couldn't find a better way to create these, nor could I find an approach to compute more in PLC or cupy. So, the solution in this PR is to avoid computing until needed!

This now returns a Mapping instead of a dict. Will anybody care or notice that the return type isn't strictly a dict?

This is currently only for unweighted bfs. We should also do weighted sssp paths. Update to do sssp too.

Also, this currently recurses. If we like the approach in this PR, we should update computing the paths on demand to not recurse. Updated to not recurse to avoid any chance of hitting recursion limit, which would have been unlikely, but possible.

If all paths are used--and, hence, computed in PathMapping--then overall performance remains comparable. Hence, this PR speeds up performance (sometimes by a lot!) or keeps it the same, and the cost is a non-dict return types, which I think is okay for backends to do, because duck-typing is a thing in Python.

For larger graphs, nearly all the time was spent creating the dict of lists of paths.
I couldn't find a better way to create these, nor could I find an approach to compute
more in PLC or cupy. So, the solution in this PR is to avoid computing until needed!

This now returns a `Mapping` instead of a `dict`. Will anybody care or notice that
the return type isn't strictly a dict?

This is currently only for unweighted bfs. We should also do weighted sssp paths.

Also, this currently recurses. If we like the approach in this PR, we should update
computing the paths on demand to not recurse.
@eriknw eriknw added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python nx-cugraph labels Oct 24, 2024
@eriknw eriknw requested a review from a team as a code owner October 24, 2024 22:32
@ChuckHastings
Copy link
Collaborator

If you can define the behavior you want, we could explore adding some PLC functionality to improve this.

@eriknw
Copy link
Contributor Author

eriknw commented Oct 25, 2024

If you can define the behavior you want, we could explore adding some PLC functionality to improve this.

Thanks @ChuckHastings, this is what I tried to do initially, but I couldn't determine anything that would actually help, because the slow part is creating Python lists for each node. Maybe we could get a factor of 2 or so if we moved the original code to Cython (but I'm skeptical), but this just feels awkward.

@ChuckHastings
Copy link
Collaborator

Did you look at

* @brief Extract BFS or SSSP paths from a cugraph_paths_result_t
?

This function should return a dense matrix where each row represents the path from a destination matrix back to the source from the BFS. So result[0][0] would be the first destination, result[0][1] would be the predecessor, etc. back to the source. Any vertices beyond the source on a given path would be set to invalid_vertex_id.

Would that make things faster? You wouldn't have to do the predecessor lookups, you would already have contiguous values in GPU memory.

@rlratzel
Copy link
Contributor

Thanks, @eriknw. Please also include the before and after output from the relevant pytest benchmarks.

the cost is a non-dict return types, which I think is okay for backends to do, because duck-typing is a thing in Python.

We should discuss this at the next NetworkX dispatching meeting.

@eriknw
Copy link
Contributor Author

eriknw commented Nov 6, 2024

Did you look at

* @brief Extract BFS or SSSP paths from a cugraph_paths_result_t

Thanks for sharing that, I didn't know about it! I did experiment as if such a thing (or CSR-like path results) could be done in C, and I found it didn't help performance.

@eriknw
Copy link
Contributor Author

eriknw commented Nov 6, 2024

Please also include the before and after output from the relevant pytest benchmarks.

Benchmarks before this PR:
shortest_paths_before

Benchmarks with this PR:
shortest_paths_after

@eriknw
Copy link
Contributor Author

eriknw commented Nov 7, 2024

We should discuss this at the next NetworkX dispatching meeting.

I brought this up during the nx dispatching meeting today. All present thought this approach is perfectly fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmarks improvement Improvement / enhancement to an existing function non-breaking Non-breaking change nx-cugraph python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants