Remove The Hashmap from Shorted Path for Centrality Computation#1307
Remove The Hashmap from Shorted Path for Centrality Computation#1307IvanIsCoding merged 24 commits intoQiskit:mainfrom
Conversation
Pull Request Test Coverage Report for Build 22083427732Details
💛 - Coveralls |
IvanIsCoding
left a comment
There was a problem hiding this comment.
I think minimizing the number of hashing operations that happen during the shortest path is a good idea in general. But I am not convinced this works, we need to benchmark this more carefully.
I have a feeling this method will be slower for directed graphs, where some nodes are not able to reach all other nodes in the graph. In those cases, having a small hashmap with the nodes that can be reached is much faster than having a large vector with mostly non-visited entries.
|
|
||
| for node in graph.node_identifiers() { | ||
| predecessors.insert(node, Vec::new()); | ||
| sigma.insert(node, 0.0); | ||
| distance.insert(node, -1); | ||
| } |
There was a problem hiding this comment.
Se how the hashmap was full filled with value for all node of the graph, so replacing with a vec of size of node bound will no be a problem for cache efficiency, because hashmap was already full when the algorithm started
There was a problem hiding this comment.
So I was not involved in the original review of #799. Your version is definetely better than what was submitted in #799. With that being said, I'd like to compare it to a more optimized version like in https://github.com/IvanIsCoding/rustworkx/blob/f9de1db7fd5f9efafe4d6f5d9012ae2c75364081/rustworkx-core/src/centrality.rs#L1137. But that can come in a follow up PR.
I heard your argument and i agree that we should benchmark to be sure that it will be faster when the hashmap was not full filled |
IvanIsCoding
left a comment
There was a problem hiding this comment.
I will stand corrected that indeed this is better than the code from #799. I was not involved in the review of #799, but to be honest we can do much better than hashing every single node in the graph (multiple) times.
Your numbers are nice but not very reproducible at the moment, I suggest you test both edge_betweness_centrality and betweness_centrality with two best/worst case scenarios from Python:
- https://www.rustworkx.org/apiref/rustworkx.generators.directed_path_graph.html -> worst case scenario, very sparse braph
- https://www.rustworkx.org/apiref/rustworkx.generators.complete_graph.html -> best case scenario for the optimization
With the benchmark code, we can reuse it to testperformance improvements. I think at the end, the status of thee code will be:
- Undirected graphs always call the vector version (this PR)
- Directed graphs call an optimized version with hash map (future PR)
|
|
||
| for node in graph.node_identifiers() { | ||
| predecessors.insert(node, Vec::new()); | ||
| sigma.insert(node, 0.0); | ||
| distance.insert(node, -1); | ||
| } |
There was a problem hiding this comment.
So I was not involved in the original review of #799. Your version is definetely better than what was submitted in #799. With that being said, I'd like to compare it to a more optimized version like in https://github.com/IvanIsCoding/rustworkx/blob/f9de1db7fd5f9efafe4d6f5d9012ae2c75364081/rustworkx-core/src/centrality.rs#L1137. But that can come in a follow up PR.
|
Also, last but not least add a test to test edge betweness centrality with a deleted node: rustworkx/tests/graph/test_centrality.py Line 62 in 30897c5 |
|
Hello, I have run some benchmark with the graphs you mentionned before. For the Directed graph of 10000 nodes :With hashmap : 2.5117154121398926 secondes For Complete graph of 1000 nodes :With hashmap : 4.231908559799194 secondes if you want to test by your self |
Okay i will do it later |
|
There are some Clippy failures, but they are from the update to Rust 1.89. They are not your fault. you can leave the PR as is, I’ll tweak it and merge it during 0.18. Right now, reviews for 0.17 are a bit stuck |
|
I ran "cargo test -all" with the code up to date and i get this error Test error trace |
|
@IvanIsCoding Hello, any news about this PR ? |
|
I will take a look at this today |
IvanIsCoding
left a comment
There was a problem hiding this comment.
So there was an actual bug in the code due to node_count(), I wrote a test case to catch it. Switching things to node_bound() fixed it.
This should be good to go now.

Hello,
I removed the hashmap for the shorted path for centrality.
This may improve performance.
Tell me what do u think about :)