multi-GPU vs. single GPU - scvi 1.1.x branch #2364

Open
zhenxingjian opened this issue Dec 8, 2023 · 3 comments

Comments

@zhenxingjian

I've successfully installed scvi==1.1.x (main branch) and confirmed that I can train the model on 1 GPU.
However, with multi-GPU training I run into the following problem.

Batch_size = 512.
for 1 GPU:
x.shape = (512, 1178)
for 2 GPUs:
x.shape = (1, 512, 1178)

Because of this dimension mismatch, almost nothing in the code runs; the one_hot function and FCLayers fail, for example.

Is there a quick fix for this, or should I manually adjust the dimensions throughout the code (maybe with x.squeeze(0) in the outermost nn.Module) to match?
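
For reference, the squeeze idea can be sketched as below. This is only an illustration under stated assumptions: `drop_singleton_leading_dim` is a hypothetical helper, not part of scvi-tools, and NumPy arrays stand in for the tensors (PyTorch tensors expose the same `ndim`, `shape`, and `reshape`). It removes the extra leading axis only when that axis has size 1, so the single-GPU shape passes through unchanged.

```python
import numpy as np


def drop_singleton_leading_dim(x):
    """Return x without an extra leading axis of size 1, if present.

    Hypothetical helper for illustration; not an scvi-tools function.
    2-D inputs, and 3-D inputs whose first axis is larger than 1,
    are returned unchanged.
    """
    if x.ndim == 3 and x.shape[0] == 1:
        return x.reshape(x.shape[1], x.shape[2])
    return x


# Shapes reported in this issue (batch size 512, 1178 features).
single_gpu = np.zeros((512, 1178))
multi_gpu = np.zeros((1, 512, 1178))

print(drop_singleton_leading_dim(single_gpu).shape)  # (512, 1178)
print(drop_singleton_leading_dim(multi_gpu).shape)   # (512, 1178)
```

Checking the size of the leading axis, rather than squeezing unconditionally, avoids silently collapsing a real leading dimension.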

@martinkim0
Contributor

Hi @zhenxingjian, what model are you using for this? I'll note that we have only tested multi-GPU training on scVI.

@canergen
Member

canergen commented Dec 9, 2023

Hi @martinkim0, scVI works with multiple samples (e.g. n_samples_per_mc_run). That layout, (n_samples, n_batch, n_genes), looks similar to the multi-GPU structure. Quite a few others, like scANVI, do not handle n_samples correctly and raise dimension errors. Adapting them is major work. I was confused by this today and instead made scANVI work with n_samples=1.
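
To make the (n_samples, n_batch, n_genes) layout concrete, here is a minimal sketch of one generic way to run a 2-D-only function on such input: flatten the two leading axes, apply the function, then restore them. It is not how scvi-tools handles n_samples; `apply_2d_fn` and `row_sums` are hypothetical names, and NumPy stands in for the tensor library.

```python
import numpy as np


def apply_2d_fn(fn, x):
    """Apply fn, which expects (n_rows, n_features), to 2-D or 3-D input.

    A 3-D input is treated as (n_samples, n_batch, n_features): the two
    leading axes are merged before calling fn and split again afterwards.
    """
    if x.ndim == 2:
        return fn(x)
    n_samples, n_batch, n_features = x.shape
    out = fn(x.reshape(n_samples * n_batch, n_features))
    return out.reshape(n_samples, n_batch, -1)


def row_sums(x2d):
    """Toy stand-in for a layer that only accepts 2-D input."""
    assert x2d.ndim == 2, "expects (n_rows, n_features)"
    return x2d.sum(axis=1, keepdims=True)


x = np.ones((3, 4, 5))  # (n_samples, n_batch, n_genes)
print(apply_2d_fn(row_sums, x).shape)  # (3, 4, 1)
```

The same wrapper accepts the plain (n_batch, n_genes) case, which is why a size-1 leading axis from multi-GPU training would go through it unchanged in meaning.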

@zhenxingjian
Author

> Hi @zhenxingjian, what model are you using for this? I'll note that we have only tested multi-GPU training on scVI.

I'm following the multiVI setup.
If scVI has been tested with multi-GPU training, I can try modifying my code to follow the scVI setup and see whether that enables multi-GPU training.

martinkim0 added the P0 label Jul 12, 2024
martinkim0 added this to the scvi-tools 1.3 milestone Jul 12, 2024