
add symmetric flag to PowerMethod #1470

Merged
Merged 21 commits from power_method_symmetric into master on Sep 28, 2023

Conversation

@paskino (Contributor) commented Apr 18, 2023

Describe your changes

This pull request introduces a new argument to the power method, called "method".

If method="direct_only" then the power method computes $x_{k+1} = A (x_k / | x_{k}|)$ then $|A|= |x_k|$.

If method="composed_with_adjoint" then the power method computes $x_{k+1} = A^TA(x_k/|x_k|)$. $|A| = \sqrt{|x_k|}$, i.e. it applies the power method to $A^TA$ before taking the square root.

If method="auto" then the code checks to see if the operator range and domain are equal, if they are then it uses the "direct_only" method, if they aren't or there are no range and domain geometries defined, it uses the "composed_with_adjoint" method.

The LinearOperator::Calculate_norm method has been updated to use the "composed_with_adjoint" method so that it matches the behaviour of e.g. MATLAB's norm.

A check_convergence flag has been added for the return_all=True case, to check whether the tolerance has been reached by the end of the iterations. This is only relevant when a user calls the power method directly with return_all=True.
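
For illustration, the three options can be sketched in plain NumPy roughly as follows (a minimal sketch with an assumed function name, a fixed iteration count and a dense matrix in place of a CIL operator; not the actual CIL PowerMethod implementation):

```python
import numpy as np

def power_method_sketch(A, method="auto", n_iter=100):
    """Minimal sketch of the three options; not the CIL PowerMethod code."""
    if method == "auto":
        # use the operator directly only when its domain and range match
        method = "direct_only" if A.shape[0] == A.shape[1] else "composed_with_adjoint"

    x = np.ones(A.shape[1])
    norm = np.linalg.norm(x)
    for _ in range(n_iter):
        y = A @ (x / norm)            # x_{k+1} = A (x_k / ||x_k||)
        if method == "composed_with_adjoint":
            y = A.T @ y               # iterate on A^T A instead of A
        x = y
        norm = np.linalg.norm(x)

    # "direct_only" estimates the largest eigenvalue magnitude of A;
    # "composed_with_adjoint" estimates sqrt(largest eigenvalue of A^T A) = ||A||_2
    return norm if method == "direct_only" else np.sqrt(norm)
```

Note that for a square but non-normal operator the two variants estimate different quantities: the largest eigenvalue magnitude versus the 2-norm.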

Link relevant issues

closes #1468

Checklist when you are ready to request a review

  • I have performed a self-review of my code
  • I have added docstrings in line with the guidance in the developer guide
  • I have implemented unit tests that cover any new or modified functionality
  • CHANGELOG.md has been updated with any functionality change
  • Request review from all relevant developers
  • Change pull request label to 'Waiting for review'

Contribution Notes

Please read and adhere to the developer guide and local patterns and conventions.

  • The content of this Pull Request (the Contribution) is intentionally submitted for inclusion in CIL (the Work) under the terms and conditions of the Apache-2.0 License.
  • I confirm that the contribution does not violate any intellectual property rights of third parties

closes #1468 

Signed-off-by: Edoardo Pasca <edo.paskino@gmail.com>
@paskino (Contributor, Author) commented Apr 21, 2023

Add as a unit test a check with a blurring operator, which should evaluate the same for symmetric=True and symmetric=False.

@paskino requested a review from jakobsj on April 21, 2023
@jakobsj (Contributor) commented Apr 26, 2023

I'm looking into this.

@jakobsj (Contributor) previously requested changes on May 2, 2023

Thanks for adding this - this is an important fix and I believe it is correctly implemented. My only comment is that currently two terms are used "symmetric" and "diagonalisable". It is unclear if we mean the same by these. For example, in the old version "symmetric" was checked by whether the range and domain are the same, which is not sufficient. For clarity, I recommend replacing "symmetric" by "diagonalisable".

@MargaretDuff (Member) commented Aug 23, 2023

I am not sure that this is correct, especially when we leave the default as False. In the case where A is not normal, i.e. $A^TA \neq AA^T$, the eigenvalues of $A^TA$ are not necessarily the squares of the eigenvalues of $A$. For example

$$ A=\left( \begin{matrix} 1 & 0\\ 1 & 2 \end{matrix} \right) $$

has eigenvalues 1 and 2 and is diagonalisable but not normal. In this case

$$ A^TA=\left( \begin{matrix} 2 & 2\\ 2 & 4 \end{matrix} \right) $$

has eigenvalues $3\pm\sqrt{5}$. This is why the test is failing.
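
A quick NumPy check (illustration only, not part of this PR) confirms the discrepancy:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 2.0]])

print(np.linalg.eigvals(A))         # eigenvalues 1 and 2
print(np.linalg.eigvals(A.T @ A))   # eigenvalues 3 -/+ sqrt(5), roughly 0.764 and 5.236
print(np.sqrt(np.max(np.linalg.eigvals(A.T @ A))))   # ~2.288, the 2-norm of A, not 2
```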

@MargaretDuff added the bug label on Aug 23, 2023
@MargaretDuff (Member) commented Aug 23, 2023

I have done two things. First, I changed symmetric to square: if the matrix is not square, we find the absolute value of the largest eigenvalue of $A^*A$ and take its square root; if the matrix is square, we run the power method as usual.

Second, there are cases where the power method won't converge, so I added a warning for when we reach the maximum number of iterations without converging.

N.B. we care about diagonalisability because, without it, the power method may not converge. We care about a matrix being symmetric because the power method then converges more quickly, but this speed-up doesn't actually require any change in the code!

I think the confusion here and in the initial issue was between symmetric (the matrix contents) and square (the matrix size).

The question is now whether it deals with the issue: #1468

@MargaretDuff added the Waiting for review label and removed the bug label on Aug 23, 2023
@MargaretDuff (Member):
To deal with issue #1468, specifically the matrix A = {{0, 1}, {0, 0}}, the power method now raises an error if the matrix is nilpotent, i.e. the largest eigenvalue is zero. The next question is whether this is the behaviour we want, or whether we want the power method to output zero in this case.
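
For reference, this is how the matrix from #1468 behaves (a quick NumPy illustration, not code from this PR):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])           # nilpotent: A @ A is the zero matrix

print(np.linalg.eigvals(A))          # [0. 0.] -- every eigenvalue is zero
x = np.array([1.0, 1.0])
x = A @ (x / np.linalg.norm(x))      # first step of the direct iteration
print(x)                             # [0.7071..., 0.]
print(A @ (x / np.linalg.norm(x)))   # [0. 0.] -- the direct iteration collapses to zero
print(np.linalg.norm(A, 2))          # 1.0 -- the 2-norm (via A^T A) is still well defined
```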

@jakobsj (Contributor) commented Aug 25, 2023

Thank you @MargaretDuff for looking into this so carefully. I think you are right about the source of confusion and the suggestion to keep using the matrix itself in case of a square matrix.

On the case of nilpotent matrices - I note that for example MATLAB's eig and eigs happily return zero(s). I could see us moving ahead with returning zero, which seems to me more useful if the user just wants to know what the largest eigenvalue is. Then let, for example, FISTA deal with checking whether the value is zero, i.e. whether it can be used for a step-size calculation.

I think the confusion around symmetric possibly also had to do with ensuring a real-valued largest eigenvalue. How do we handle it if this is not the case? For example, what would happen now if we have a real-valued, square but non-symmetric matrix like, in matlab notation, A = [[-1,2], [-3,-4]], which has complex eigenvalues -2.5 +/- 1.9365i? Does the power method converge to anything? To one of these complex-valued eigenvalues, and if so which one, since their magnitudes are the same?

@MargaretDuff (Member):
> Thank you @MargaretDuff for looking into this so carefully. I think you are right about the source of confusion and the suggestion to keep using the matrix itself in case of a square matrix.
>
> On the case of nilpotent matrices - I note that for example MATLAB's eig and eigs happily return zero(s). I could see us moving ahead with returning zero, which seems to me more useful if the user just wants to know what the largest eigenvalue is. Then let, for example, FISTA deal with checking whether the value is zero, i.e. whether it can be used for a step-size calculation.

Yep, this would deal with Kris' issue but we should check if it breaks anything else.

> I think the confusion around symmetric possibly also had to do with ensuring a real-valued largest eigenvalue. How do we handle it if this is not the case? For example, what would happen now if we have a real-valued, square but non-symmetric matrix like, in matlab notation, A = [[-1,2], [-3,-4]], which has complex eigenvalues -2.5 +/- 1.9365i? Does the power method converge to anything? To one of these complex-valued eigenvalues, and if so which one, since their magnitudes are the same?

I think if the matrix has a unique eigenvalue of largest magnitude then we should be fine for convergence. If there is not a unique largest eigenvalue but the matrix is Hermitian, then we should also be fine converging to the magnitude of the largest eigenvalue, although the eigenvector won't converge (we don't use that, so that's fine). In your example it doesn't converge: I think because the dominant eigenvalues are a complex-conjugate pair of equal magnitude, so the real iterates never settle on a single eigenvector. The case of A = [[-1,2], [-3,-4]] doesn't converge correctly and is worrying. Maybe a discussion about how to check convergence?
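
To make this concrete, a small NumPy experiment (illustration only, not code in this PR) shows the direct iteration failing to settle on that matrix, while iterating on $A^*A$ still recovers the 2-norm:

```python
import numpy as np

A = np.array([[-1.0,  2.0],
              [-3.0, -4.0]])

print(np.linalg.eigvals(A))          # -2.5 +/- 1.9365j, equal magnitudes

# Direct iteration: the norm estimate keeps oscillating instead of converging
x = np.array([1.0, 1.0])
for k in range(8):
    x = A @ (x / np.linalg.norm(x))
    print(k, np.linalg.norm(x))

# A^T A is symmetric positive semi-definite, so the power method on it converges;
# the square root of its largest eigenvalue equals the 2-norm of A
print(np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A))), np.linalg.norm(A, 2))   # both ~5.117
```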

@jakobsj (Contributor) commented Aug 30, 2023

I haven't looked enough into how the power method behaves in such a case. I wonder if this is beyond the present issue and could be dealt with - for now - by creating a separate issue to keep track of it, and simply throwing an error or warning in case it does not converge within a specified number of iterations. Could we consider allowing the user to pass a "force" flag or similar if they know what they are doing and definitely want whatever value the method has reached in the specified number of iterations? Should we also (do we already?) allow the user to specify the number of iterations and/or tolerance?

@MargaretDuff (Member):
After a chat with @paskino I have:

  • Kept the change from symmetric to square
  • Added a convergence check in the return_all list, so it is optional
  • Added an optional domain_is_range flag. If this is True, the power method is run on the operator A; if False, it returns the square root of the result of the power method applied to A^TA; and if left as the default, None, it checks the range and domain and decides whether it thinks the operator is square.

Open for a chat about what to name things

@paskino (Contributor, Author) commented Sep 26, 2023

Summary:

If the operator $A$ is square, i.e. it has the same range and domain:

$x_{k+1} = A (x_k / \|x_k\|)$, then $\|A\| = \|x_k\|$

If $A$ is not square then we can calculate the norm of $A^TA$ instead. If $A$ is square and normal ($A^TA = AA^T$), then we can apply this rule and do $x_{k+1} = A^TA(x_k/\|x_k\|)$, $\|A\| = \sqrt{\|x_k\|}$.

@MargaretDuff (Member) commented Sep 26, 2023

With @gfardell, @paskino and @jakobsj we discussed:

  • That the call to norm by the linear operator should return the norm calculated by $\|A\|^2=\text{largest eigenvalue}(A^*A)$, and that this fits with MATLAB's behaviour (a short numerical check of this relation is sketched after this list). To do this, we could either set the power method to, as default, iterate on $A^*A$, or we could update the call to the power method in LinearOperator.calculate_norm with a flag to force iterates on $A^*A$. We preferred the latter because then the power method could still be used as a stand-alone method.
  • We discussed what to call the flag in the power method to force iterates on $A$ or $A^*A$, possible names include compose_with_adjoint, direct_only, range_and_domain_are_equal, domain_equals_range... The names "square" and "symmetric" are not correct.
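
As a sanity check of the norm relation above (illustration only, using an assumed small non-square example), the square root of the largest eigenvalue of $A^*A$ coincides with the 2-norm that e.g. MATLAB's norm or numpy.linalg.norm(A, 2) returns:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # non-square, so only the A^T A route applies

# square root of the largest eigenvalue of A^T A ...
via_adjoint = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))
# ... equals the largest singular value, i.e. the 2-norm that MATLAB's norm(A) returns
print(via_adjoint, np.linalg.norm(A, 2))   # both ~9.5080
```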

@MargaretDuff marked this pull request as ready for review on September 26, 2023
@MargaretDuff (Member):
@gfardell and I just had a chat about the power method. We decided that, instead of a flag with options True, False and None, we would have a flag called "method" with options "auto", "direct_only" and "composed_with_adjoint". Any thoughts on these names?

@gfardell (Member) left a review comment:

I've suggested some logic and parameter changes. You also need to check in the file with the LinearOperator::Calculate_norm update.

Eight review comments on Wrappers/Python/cil/optimisation/operators/Operator.py (outdated, resolved)
@gfardell added the Merge pending jenkins label and removed the Waiting for review label on Sep 27, 2023
Signed-off-by: Gemma Fardell <47746591+gfardell@users.noreply.github.com>
@gfardell (Member):

The documentation looks a little off to me. The bold section in the note, as well as the types of some of the arguments, seem not to have rendered as expected.

@MargaretDuff (Member):
I think that looks better (at least locally)

@MargaretDuff (Member):
And the build looks better


@gfardell merged commit d6b245a into master on Sep 28, 2023
3 checks passed
@gfardell deleted the power_method_symmetric branch on September 28, 2023

Successfully merging this pull request may close these issues.

PowerMethod is too optimistic in using the symmetric version, causing failures for SIRF resamplers