How to solve Ax=b when A is a Sparse Array ? #925

Open
crazyfireji opened this issue Nov 4, 2024 · 3 comments

Comments

@crazyfireji

As in the title: if A is a sparse array, we can solve the system with x = A\b, but that doesn't work on the GPU (CUDA).

@amontoison
Member

Hi @crazyfireji!

If A is a sparse matrix on the GPU, you can use a Krylov solver depending on the properties and shape of A:

  • x, stats = cg(A, b, atol=1e-10, rtol=0.0) if A is symmetric positive definite.
  • x, stats = minres(A, b, atol=1e-10, rtol=0.0) if A is symmetric indefinite.
  • x, stats = gmres(A, b, atol=1e-10, rtol=0.0) if A is square and unsymmetric.
  • x, stats = lsmr(A, b, atol=1e-10, rtol=0.0) if A is rectangular.

I’ve detailed how to set this up for each GPU backend --> here <--
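
A minimal sketch of the setup described above, assuming an NVIDIA GPU with CUDA.jl and Krylov.jl installed. The test matrix, its size, and the variable names are illustrative assumptions, not taken from this thread. The matrix is built on the CPU, copied to the GPU as a `CuSparseMatrixCSR`, and passed to `cg` with the tolerances from the list above.

```julia
using CUDA, CUDA.CUSPARSE
using Krylov, SparseArrays, LinearAlgebra

# Illustrative symmetric positive definite matrix (tridiagonal, diagonally dominant)
n = 1_000
A_cpu = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(4.0, n), 1 => fill(-1.0, n - 1))
b_cpu = A_cpu * ones(n)   # right-hand side chosen so the exact solution is all ones

# Move the matrix and the right-hand side to the GPU
A_gpu = CuSparseMatrixCSR(A_cpu)
b_gpu = CuVector(b_cpu)

# A is symmetric positive definite, so CG applies
x_gpu, stats = cg(A_gpu, b_gpu; atol=1e-10, rtol=0.0)

println("solved: ", stats.solved, " after ", stats.niter, " iterations")

# Copy the solution back to the CPU if it is needed there
x = Array(x_gpu)
```

For the other cases in the list, `minres`, `gmres` and `lsmr` are called with the same arguments.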

@crazyfireji
Author

But the efficiency is very low when my A is very large (more than 200000 x 200000). Is there a more efficient function you can recommend?

@amontoison
Member

amontoison commented Nov 6, 2024

@crazyfireji Can you give more details, please?
I don't know which architecture you want to run your code on (NVIDIA, AMD or Intel?) or what the properties of A are.

For large systems, you need a preconditioner to speed up convergence.
There are many examples in the documentation.
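
To illustrate the effect of a preconditioner, here is a small sketch that runs on the CPU for brevity. It uses a Jacobi (diagonal) preconditioner, chosen here only because it is the simplest option, not because it was suggested in this thread. The matrix, the tolerances, and the variable names are assumptions made for the example. Krylov.jl solvers accept a preconditioner through the `M` keyword, and `M` should approximate the inverse of A.

```julia
using Krylov, SparseArrays, LinearAlgebra

# Illustrative SPD matrix whose diagonal spans several orders of magnitude
n = 1_000
d = collect(range(3.0, 1000.0; length=n))
A = spdiagm(-1 => fill(-1.0, n - 1), 0 => d, 1 => fill(-1.0, n - 1))
b = ones(n)

# Jacobi preconditioner: approximate inv(A) by inverting only the diagonal
M = Diagonal(1 ./ d)

# Same solver, without and with the preconditioner
x_plain, stats_plain = cg(A, b; atol=0.0, rtol=1e-8)
x_prec, stats_prec = cg(A, b; M=M, atol=0.0, rtol=1e-8)

println("iterations without preconditioner:     ", stats_plain.niter)
println("iterations with Jacobi preconditioner: ", stats_prec.niter)
```

On a GPU the preconditioner also has to operate on device arrays. The incomplete factorizations shown in the documentation are usually a stronger choice than a plain diagonal scaling.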

If the system is "just" 200 000 x 200 000, I can also recommend CUDSS.jl, but it only works on NVIDIA GPUs, and the system should not be too ill-conditioned:
https://github.com/exanauts/CUDSS.jl
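
A rough sketch of a direct sparse solve with CUDSS.jl, following the usage pattern in its README as I understand it. It assumes an NVIDIA GPU with CUDA.jl and CUDSS.jl installed. The test matrix and variable names are illustrative, and the structure `"G"` (general) and view `'F'` (full matrix) arguments are my choice for this example.

```julia
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

# Illustrative sparse, well-conditioned square matrix built on the CPU
n = 1_000
A_cpu = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(4.0, n), 1 => fill(-1.0, n - 1))
b_cpu = rand(n)

# Move the data to the GPU; x_gpu receives the solution
A_gpu = CuSparseMatrixCSR(A_cpu)
b_gpu = CuVector(b_cpu)
x_gpu = CUDA.zeros(Float64, n)

# "G" = general matrix, 'F' = the full matrix is stored
solver = CudssSolver(A_gpu, "G", 'F')

# The three phases of a sparse direct solve
cudss("analysis", solver, x_gpu, b_gpu)
cudss("factorization", solver, x_gpu, b_gpu)
cudss("solve", solver, x_gpu, b_gpu)

# Check the residual on the GPU
r_gpu = b_gpu - A_gpu * x_gpu
println("residual norm: ", norm(r_gpu))
```

The analysis and factorization phases typically dominate the cost, so a factorization is worth reusing when several right-hand sides share the same matrix.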
