Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Radius CPU computation with KDTree.query() when number of neighbors is higher than max_num_neighbors #126

Open
aboulch opened this issue Apr 7, 2022 · 5 comments
Labels
enhancement New feature or request

Comments

@aboulch
Copy link

aboulch commented Apr 7, 2022

Hello,

In the radius search CPU, you are using:

tree = scipy.spatial.cKDTree(x.detach().numpy())
_, col = tree.query( y.detach().numpy(), k=max_num_neighbors, distance_upper_bound=r + 1e-8)

It seems that there is no guaranty that returned indices are randomly selected if there are more points within range r than max_num_neighbors.

Here is a sample code:

data = np.random.rand(100000, 3)
tree = KDTree(data)
distances, ids = tree.query([0.5,0.5,0.5], k=32, distance_upper_bound=0.2)
print(distances.max())
---> 0.044963456313535384 (reproducible if launched several times)

It seems that it searches for the knn and return only the neighbors within r.
The right function may be query_ball_point?

@aboulch
Copy link
Author

aboulch commented Apr 7, 2022

The GPU version does as expected.

@rusty1s
Copy link
Owner

rusty1s commented Apr 7, 2022

Are you using an older version of torch-cluster? In the current version, we no longer make use of scipy for this, see https://github.com/rusty1s/pytorch_cluster/blob/master/torch_cluster/radius.py.

@aboulch
Copy link
Author

aboulch commented Apr 7, 2022

Thanks for qucik response. I was not on the right version of the code documentation.
I use the 1.5.9 version of the code see bellow

In an IPython session:

In [5]: import torch_cluster

In [6]: torch_cluster.__version__
Out[6]: '1.5.9'

In [7]: from torch_cluster import radius

In [9]: import torch

In [10]: data = torch.rand(1000000, 3)

In [11]: query = torch.tensor([[0.5,0.5,0.5]])

# Compute distance of the further neighbor

# CPU
In [12]: (data[radius(data, query, r=0.2, max_num_neighbors=32)[1]] - 0.5).norm(dim=1).max()
Out[12]: tensor(0.0607)

In [13]: (data[radius(data, query, r=0.2, max_num_neighbors=64)[1]] - 0.5).norm(dim=1).max()
Out[13]: tensor(0.0701)

In [14]: (data[radius(data, query, r=0.2, max_num_neighbors=128)[1]] - 0.5).norm(dim=1).max()
Out[14]: tensor(0.0824)

# GPU
In [15]: (data[radius(data.cuda(), query.cuda(), r=0.2, max_num_neighbors=32)[1]] - 0.5).norm(dim=1).max()
Out[15]: tensor(0.1985)

In [16]: (data[radius(data.cuda(), query.cuda(), r=0.2, max_num_neighbors=64)[1]] - 0.5).norm(dim=1).max()
Out[16]: tensor(0.2000)

In [17]: (data[radius(data.cuda(), query.cuda(), r=0.2, max_num_neighbors=128)[1]] - 0.5).norm(dim=1).max()
Out[17]: tensor(0.2000)

CPU and GPU do not give the same results.

@aboulch
Copy link
Author

aboulch commented Apr 8, 2022

Hello,
Problem could be that nanoflann does not produce results in a random order when lookning for the neighbors in a given radius. This is due the tree traversal process.
A solution could be to shuffle the index list?

radius_cpu.cpp l94

std::vector<std::pair<size_t, scalar_t>> ret_matches;
size_t num_matches = mat_index.index->radiusSearch(
    y_data + i * y.size(1), r * r, ret_matches, params);

// proposition
std::random_shuffle(ret_matches.begin(), ret_matches.end());

for (size_t j = 0; j < std::min(num_matches, (size_t)max_num_neighbors); j++) {
            out_vec.push_back(x_start + ret_matches[j].first);
            out_vec.push_back(i);
}

@rusty1s
Copy link
Owner

rusty1s commented Apr 8, 2022

Yes, our GPU implementation is non-deterministic while the CPU one is deterministic (it will pick the closest points). Do you think this is a problem? I am happy to add some form of random shuffling to the CPU implementation (at best controllable via an input argument?). Let me know if you have interest in contributing this feature!

@rusty1s rusty1s added the enhancement New feature or request label Apr 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants