Two small issues #4
Hi! Regarding the performance: you should expect it to be slower than the respective instant-ngp C++ version, but more work on the performance side is coming. Additionally, you can try disabling rendering during training (and vice versa, disabling training when rendering) for better performance. |
Also, I don't think you need to dev Nerf.jl for NerfGUI.jl, unless you want to modify it. And lastly, we have a basic benchmark in Nerf.jl which you can run with:
using Nerf
Nerf.benchmark()
and report the numbers here. |
No, I was referring to https://github.com/JuliaNeuralGraphics/NerfGUI.jl/blob/main/src/NerfGUI.jl#L18, which still has the issue. For the benchmark I see (Windows 11, Nvidia RTX 4090)
I also just wanted to say I'm sorry if my first post came off as critical. I think this project is extremely cool; I was just trying to figure out what my expectations should be (and based on your comment of 25 seconds for 1k steps versus my benchmark showing 78 seconds for 10 iterations, it does seem there's some kind of large discrepancy).
|
So it runs the benchmark twice: the first 10 iterations are to make sure everything is compiled. As for |
For the 1,000 iterations it took
|
Wow, that is extremely slow!
julia> using Nerf, ProfileCanvas
julia> config_file = joinpath(pkgdir(Nerf), "data", "raccoon_sofa2", "transforms.json");
julia> dataset = Nerf.Dataset(Nerf.Backend; config_file);
julia> model = Nerf.BasicModel(Nerf.BasicField(Nerf.Backend));
julia> trainer = Nerf.Trainer(model, dataset; n_rays=1024);
julia> @profview Nerf.trainer_benchmark(trainer, 10); # Ignore it, since it includes compilation time
julia> @profview Nerf.trainer_benchmark(trainer, 10); # Report this one. |
(Sorry for the zip; it turns out GitHub doesn't let you attach HTML files.) One thing I noticed when running the 1k iterations yesterday, and again in today's profile, is that there are "bursts" where it will do 5-15 iterations in less than a second, and then the next few iterations will take multiple seconds each. I'm not sure if it's the garbage collector or maybe device transfers, but it's definitely not the case that all iterations are equally slow; it is very irregular: sometimes quite fast and other times extremely slow. |
Those profiling results aren't very useful, unfortunately... But since you are seeing bursts with fast iterations, it may be due to GC. |
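If you want to confirm that, a quick check (just a sketch; it assumes the trainer object from the profiling snippet above) is to look at the GC time reported by Base's @timed:
julia> stats = @timed Nerf.trainer_benchmark(trainer, 10);
julia> stats.gctime / stats.time  # fraction of wall time spent in garbage collection
julia> GC.enable_logging(true)    # Julia 1.8+: print a line for every GC pause
If gctime is a large fraction of the total, the slow iterations should line up with GC pauses.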
Ok, well I guess we can close this anyway. Maybe one last question: you mentioned that you've run it on a 6 GB GPU. That was one other weird thing I noticed -- when I run Nerf.jl it always instantly allocates all 24 GB that the 4090 has, and I had wondered about that as well -- is that intentional, or is that something I could also look into (probably at the CUDA.jl level)? Thanks for the help, and sorry for the chatty "issue." |
It allocates that memory, but it does not use all of it. This is due to GC not freeing unused arrays immediately, which means, from the CUDA.jl perspective, that those arrays (and the underlying memory) are still in use. And when the memory pool grows to the maximum size (24 GB in your case), it forcibly triggers GC, freeing the memory and potentially releasing it back to the OS, which is expensive.
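You can inspect this yourself with plain CUDA.jl calls (a small sketch, not Nerf.jl API):
julia> using CUDA
julia> CUDA.memory_status()      # shows how much of the pool is reserved vs. actually allocated
julia> GC.gc(); CUDA.reclaim()   # run GC and return unused pool memory to the driver
Newer CUDA.jl versions also support the JULIA_CUDA_SOFT_MEMORY_LIMIT environment variable to cap how large the pool is allowed to grow, though I haven't measured its effect here.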
I propose we leave it open, since this huge drop in performance is worrying. |
If you can, try running another Nerf.jl benchmark; it tests the performance of a single kernel without allocations.
Currently it tests the performance of the HashGridEncoding kernel and spherical harmonics. |
Hello! I pulled the latest version of NerfGUI.jl and Nerf.jl to my local machine, and in NerfGUI I entered package mode and "dev'd" Nerf.jl. But it crashed for me with errors about an undefined DEVICE. I prepared a PR to fix it, but as I was about to push it I noticed that the issue is already fixed on the nerf-update branch. So I thought I would ask whether it is time to merge that branch to main, or is there a potential issue?
My second question (which may be related) is that the training performance after the fix is extremely slow, on the order of a minute or two per iteration. I'm using CUDA with an RTX 4090, so I was expecting a few frames per second, but looking around I couldn't find a reference iteration time, and even in the JuliaCon '23 talk I didn't see training speed. Is there something wrong on my end, or is < 1 fps during training expected?
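For reference, the steps I used to dev the local checkout were roughly the following (the relative path is specific to my machine, so adjust it as needed):
julia> using Pkg
julia> Pkg.activate(".")                 # activate the NerfGUI.jl project
julia> Pkg.develop(path="../Nerf.jl")    # point it at the local Nerf.jl checkout
julia> Pkg.instantiate()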