
[BUG] Multi-GPU DBSCAN is broken #6110

Open
vikcost opened this issue Oct 14, 2024 · 3 comments
Labels
? - Needs Triage (Need team to review and classify) · bug (Something isn't working)

Comments


vikcost commented Oct 14, 2024

Below is a minimal version of a test script for multi-GPU DBSCAN.
I have six RTX 4090s in my machine that I want to utilize.

I observe memory allocations and de-allocations on my GPUs, but DBSCAN fails to return any result.

Any idea where the issue might be coming from?

import numpy as np
from cuml.dask.cluster import DBSCAN
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    cluster = LocalCUDACluster()
    client = Client(cluster)
    embs = np.random.randn(100_000, 256)
    dbscan = DBSCAN(
        client=client,
        eps=0.25,
        min_samples=5,
        metric="cosine",
    ).fit(embs)
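For scale, a rough back-of-envelope on the data size (a sketch; the sizes below are derived from the reproducer's shape, and the 8-byte element size assumes np.random.randn's float64 default):

```python
# Rough memory estimate for the reproducer's input (illustrative sketch).
n_rows, n_cols = 100_000, 256
bytes_per_float64 = 8  # np.random.randn returns float64

input_mb = n_rows * n_cols * bytes_per_float64 / 1e6
# A naive dense pairwise-distance matrix would need n_rows**2 entries;
# cuML batches this stage internally, but the number illustrates why the
# eps-neighborhood queries dominate the memory budget.
pairwise_gb = n_rows**2 * bytes_per_float64 / 1e9

print(f"input: {input_mb:.1f} MB, full pairwise matrix: {pairwise_gb:.1f} GB")
```

So the input itself is only ~205 MB, while an unbatched distance matrix would be ~80 GB, which is why the batching warnings appear in the logs later in this thread.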

Environment details:

  • Environment location: Bare-metal
  • Linux Distro/Architecture: Ubuntu 22.04
  • GPU Model/Driver: RTX4090 driver 550.107
  • CUDA: 12.4
  • Method of cuDF & cuML install: pip
@vikcost vikcost added the ? - Needs Triage (Need team to review and classify) and bug (Something isn't working) labels Oct 14, 2024
divyegala (Member) commented

@vikcost can you explain what you mean by DBSCAN failing to return any result? Does that mean there is a crash or something else going on?


vikcost commented Nov 4, 2024

@divyegala

After reviewing and waiting longer, I do see DBSCAN return clustering results on a dataset of 1,000,000 data points.
However, I expected quicker performance from setting rmm_pool_size="24GB", yet computation time slightly increased, from 303 s to 312 s.

cluster = LocalCUDACluster(protocol="ucx", rmm_pool_size="24GB")

This is unexpected, given that RMM is designed for advanced memory management. Am I setting these parameters the wrong way?
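One way to reason about this observation (a toy cost model with assumed numbers, not a measurement): an RMM pool mainly amortizes per-allocation cudaMalloc/cudaFree cost, so if nearly all of the 303 s is spent in the distance and clustering kernels, the pool can only shave off the small allocation fraction:

```python
# Toy cost model (hypothetical numbers): an allocator pool removes
# allocation overhead but leaves kernel compute time untouched.
compute_s = 300.0        # assumed time in distance/cluster kernels
alloc_overhead_s = 3.0   # assumed time in cudaMalloc/cudaFree calls

without_pool = compute_s + alloc_overhead_s
with_pool = compute_s    # pool reuses memory, so overhead is ~0

print(f"best-case saving from pooling: {without_pool - with_pool:.0f} s")
```

Under this model the pool's best case is a few seconds either way, so a 303 s vs. 312 s difference is within the noise of a compute-bound workload rather than evidence of misconfiguration.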


vikcost commented Nov 4, 2024

However, when I run clustering on 5,000,000 data points,
I don't see the typical log output shown below:

[W] [22:35:28.663183] Batch size limited by the chosen integer type (4 bytes). 3998 -> 2147. Using the larger integer type might result in better performance
[W] [22:35:32.380082] Batch size limited by the chosen integer type (4 bytes). 3998 -> 2147. Using the larger integer type might result in better performance
...

Also, GPU utilization is 0% and the script doesn't show any signs of activity. How would one estimate the run time of multi-GPU DBSCAN as a function of the number of data points?
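On the run-time question: brute-force DBSCAN neighborhood search scales as O(n²) in the number of points, so a crude first-order extrapolation from the 1M-point timing reported above looks like this (a sketch only; it assumes purely quadratic scaling and ignores batching, memory limits, and inter-GPU communication):

```python
# Quadratic extrapolation from the 1M-point baseline (assumption:
# run time is dominated by O(n^2) pairwise-distance work).
baseline_points = 1_000_000
baseline_seconds = 303.0  # measured for 1M points earlier in this thread

def estimate_seconds(n_points):
    """Extrapolate run time assuming t ~ n^2 scaling."""
    return baseline_seconds * (n_points / baseline_points) ** 2

print(f"5M points: ~{estimate_seconds(5_000_000) / 3600:.1f} hours")
```

Under that assumption, 5M points would take roughly 25× the 1M-point run, i.e. on the order of two hours, so a script that looks idle for many minutes is not necessarily hung.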

2 participants