Using with CUDA Solver? #18
Comments
You need to convert
After that you can use About your example, |
Hi @amontoison, I gave that a go but no success! Do you mean something like
Note the change to |
I think what @amontoison tries to say is if you use

    using SparseArrays, LinearAlgebra, CUDA, IncompleteLU
    using IncompleteLU: ILUFactorization
    import LinearAlgebra: ldiv!

    struct CudaILUFactorization{TL, TU}
        L::TL
        U::TU
    end

    function CudaILUFactorization(f::ILUFactorization)
        L = UnitLowerTriangular(cu(f.L))
        U = UpperTriangular(transpose(cu(f.U)))
        return CudaILUFactorization(L, U)
    end

    LinearAlgebra.ldiv!(f::CudaILUFactorization, x) = ldiv!(f.U, ldiv!(f.L, x))

    function example()
        A = sprand(Float32, 1000, 1000, 10 / 1000) + 100I
        fact = ilu(A, τ = 0.001f0);
        cuda_fact = CudaILUFactorization(fact);
        x = rand(Float32, 1000);
        return norm(Array(ldiv!(cuda_fact, cu(x))) - fact \ x)
    end

s.t.

    julia> example()
    2.401256f-8 |
Now, I'm not 100% sure anymore why I wasn't using the UnitLowerTriangular + Transpose types on the CPU. IIRC there was some performance or dispatch issue back in the day. Maybe that's resolved by now. I should check. |
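For readers without a GPU, the same wrapper idea can be tried on the CPU with only the standard libraries. This is a minimal sketch, not the package's implementation: it assumes the storage convention implied by the code above (L kept strictly lower with an implicit unit diagonal, U kept transposed), and it fakes the stored factors from an exact, unpivoted LU of a small dense matrix. The names `Lstored` and `Ustored` are hypothetical.

```julia
using SparseArrays, LinearAlgebra

# Small diagonally dominant matrix, so LU without pivoting is safe.
A = [4.0 1.0 0.0;
     2.0 5.0 1.0;
     0.0 3.0 6.0]
F = lu(A, NoPivot())

# Assumed storage convention (taken from the thread's code, not verified here):
# L without its unit diagonal, U transposed.
Lstored = sparse(F.L - I)
Ustored = sparse(transpose(F.U))

# Wrap the stored factors so that `\` dispatches to triangular solves.
L = UnitLowerTriangular(Lstored)
U = UpperTriangular(transpose(Ustored))

b = [1.0, 2.0, 3.0]
x = U \ (L \ b)   # forward sweep, then backward sweep
```

Because the factors here come from an exact LU, `x` should agree with `A \ b` up to rounding. With an incomplete factorization it would only approximate it.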
@Omega-xyZac you forgot the factors
@haampie The code for the backward and forward sweeps was slow, if I remember correctly. The last release, 2.4.0, should include these modifications, Harmen, but I don't know why I have the old behavior of |
Thank you both for all the help. I really appreciate it. I'm still having issues with the GPU version vs. the CPU: the CPU converges in 40 iterations, whilst the GPU never converges and hits the iteration limit. I have attached an example A and b along with the script. I would appreciate any ideas. A&b.zip
Probably still doing something silly here... |
I found the mistakes:
But as I explained before, it's not relevant to use The best solution for your problem is from my point of view:
|
Also, at least on the CPU, computing the incomplete Cholesky factorization is much more efficient than computing the incomplete LU decomposition, because it barely requires bookkeeping for indices. |
Thank you both for the info. The main issue that I was having with |
|
Using one of the larger systems I have
where F is ic02 and Precilu is ilu with a drop tolerance of 3.0. |
Hello @haampie , If the matrix is ill-conditioned, say
|
Hi,
I'm interested in using this package with a GPU CG method from IterativeMethods. I think only the forward_substitution and backward_substitution methods are required to extend this to CUDA. I've tried using CUDA's sv2! with little success. Any thoughts?
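As background for the question, the two sweeps that apply an LU-type preconditioner can be sketched in plain Julia. This is an illustrative dense version under stated assumptions, not the package's code: `forward_sweep` and `backward_sweep` are hypothetical names, L is assumed to be strictly lower triangular with an implicit unit diagonal, and U is assumed to be upper triangular with a nonzero diagonal.

```julia
using LinearAlgebra

# Solve (I + L) * y = b, where L is strictly lower triangular.
function forward_sweep(L::AbstractMatrix, b::AbstractVector)
    y = float.(b)                 # work on a copy
    for i in eachindex(y), j in 1:i-1
        y[i] -= L[i, j] * y[j]
    end
    return y
end

# Solve U * x = y, where U is upper triangular with a nonzero diagonal.
function backward_sweep(U::AbstractMatrix, y::AbstractVector)
    x = float.(y)                 # work on a copy
    n = length(x)
    for i in n:-1:1
        for j in i+1:n
            x[i] -= U[i, j] * x[j]
        end
        x[i] /= U[i, i]
    end
    return x
end

# Check against an exact, unpivoted LU of a small diagonally dominant matrix.
A = [4.0 1.0 0.0;
     2.0 5.0 1.0;
     0.0 3.0 6.0]
F = lu(A, NoPivot())
b = [1.0, 2.0, 3.0]
x = backward_sweep(F.U, forward_sweep(F.L - I, b))
```

Each row of a sweep depends on the rows before it, which is why a GPU version typically relies on a dedicated sparse triangular solve rather than a direct port of loops like these.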