
[BUG] Multi-GPU DBSCAN is broken #6110

Open
vikcost opened this issue Oct 14, 2024 · 3 comments
Labels
? - Needs Triage (Need team to review and classify) · bug (Something isn't working)

Comments


vikcost commented Oct 14, 2024

Below is a minimal version of a test script for multi-GPU DBSCAN.
I have six RTX 4090s in my machine that I want to utilize.

I observe memory allocations and de-allocations on my GPUs, but DBSCAN fails to return any result.

Any idea where the issue might be coming from?

import numpy as np
from cuml.dask.cluster import DBSCAN
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    cluster = LocalCUDACluster()
    client = Client(cluster)
    embs = np.random.randn(100_000, 256)
    dbscan = DBSCAN(
        client=client,
        eps=0.25,
        min_samples=5,
        metric="cosine",
    ).fit(embs)
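For scale, a rough back-of-envelope on the data size (a sketch; the sizes below are derived from the reproducer's shape, and the 8-byte element size assumes np.random.randn's float64 default):

```python
# Rough memory estimate for the reproducer's input (illustrative sketch).
n_rows, n_cols = 100_000, 256
bytes_per_float64 = 8  # np.random.randn returns float64

input_mb = n_rows * n_cols * bytes_per_float64 / 1e6
# A naive dense pairwise-distance matrix would need n_rows**2 entries;
# cuML batches this stage internally, but the number illustrates why the
# eps-neighborhood queries dominate the memory budget.
pairwise_gb = n_rows**2 * bytes_per_float64 / 1e9

print(f"input: {input_mb:.1f} MB, full pairwise matrix: {pairwise_gb:.1f} GB")
```

So the input itself is only ~205 MB, while an unbatched distance matrix would be ~80 GB, which is why the batching warnings appear in the logs later in this thread.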

Environment details:

  • Environment location: Bare-metal
  • Linux Distro/Architecture: Ubuntu 22.04
  • GPU Model/Driver: RTX4090 driver 550.107
  • CUDA: 12.4
  • Method of cuDF & cuML install: pip
@vikcost vikcost added the ? - Needs Triage (Need team to review and classify) and bug (Something isn't working) labels Oct 14, 2024
divyegala (Member) commented

@vikcost can you explain what you mean by DBSCAN failing to return any result? Does that mean there is a crash or something else going on?


vikcost commented Nov 4, 2024

@divyegala

After reviewing and waiting longer, I do see DBSCAN return clustering results on a dataset of 1,000,000 data points.
However, I expected quicker performance from setting rmm_pool_size="24GB", yet computation time slightly increased, from 303 s to 312 s.

cluster = LocalCUDACluster(protocol="ucx", rmm_pool_size="24GB")

This is unexpected, given that RMM is designed for advanced memory management. Am I setting these parameters the wrong way?
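One way to reason about this observation (a toy cost model with assumed numbers, not a measurement): an RMM pool mainly amortizes per-allocation cudaMalloc/cudaFree cost, so if nearly all of the 303 s is spent in the distance and clustering kernels, the pool can only shave off the small allocation fraction:

```python
# Toy cost model (hypothetical numbers): an allocator pool removes
# allocation overhead but leaves kernel compute time untouched.
compute_s = 300.0        # assumed time in distance/cluster kernels
alloc_overhead_s = 3.0   # assumed time in cudaMalloc/cudaFree calls

without_pool = compute_s + alloc_overhead_s
with_pool = compute_s    # pool reuses memory, so overhead is ~0

print(f"best-case saving from pooling: {without_pool - with_pool:.0f} s")
```

Under this model the pool's best case is a few seconds either way, so a 303 s vs. 312 s difference is within the noise of a compute-bound workload rather than evidence of misconfiguration.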


vikcost commented Nov 4, 2024

However, when I run clustering on 5,000,000 data points,
I don't see the typical log output shown below:

[W] [22:35:28.663183] Batch size limited by the chosen integer type (4 bytes). 3998 -> 2147. Using the larger integer type might result in better performance
[W] [22:35:32.380082] Batch size limited by the chosen integer type (4 bytes). 3998 -> 2147. Using the larger integer type might result in better performance
...

Also, GPU utilization is 0% and the script doesn't show any signs of activity. How would one estimate the run time of multi-GPU DBSCAN as a function of the number of data points?
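On the run-time question: brute-force DBSCAN neighborhood search scales as O(n²) in the number of points, so a crude first-order extrapolation from the 1M-point timing reported above looks like this (a sketch only; it assumes purely quadratic scaling and ignores batching, memory limits, and inter-GPU communication):

```python
# Quadratic extrapolation from the 1M-point baseline (assumption:
# run time is dominated by O(n^2) pairwise-distance work).
baseline_points = 1_000_000
baseline_seconds = 303.0  # measured for 1M points earlier in this thread

def estimate_seconds(n_points):
    """Extrapolate run time assuming t ~ n^2 scaling."""
    return baseline_seconds * (n_points / baseline_points) ** 2

print(f"5M points: ~{estimate_seconds(5_000_000) / 3600:.1f} hours")
```

Under that assumption, 5M points would take roughly 25× the 1M-point run, i.e. on the order of two hours, so a script that looks idle for many minutes is not necessarily hung.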

2 participants