Replies: 3 comments 5 replies
-
I actually developed an adaptive selection of `gamma`:

```python
# adaptive gamma
# assumes F, G, K and normK are defined as in the PDHG notebook
from cil.optimisation.algorithms import PDHG

algo = PDHG(f=F, g=G, operator=K)
algo.max_iteration = 5000
algo.update_objective_interval = 20
for i in range(100):
    algo.run(20)
    # re-estimate gamma from the current iterate and rescale the step sizes
    gamma = algo.solution.norm()
    sigma = gamma / normK
    tau = 1 / (gamma * normK)
    algo.sigma = sigma
    algo.tau = tau
```

This is the comparison of the objective value of the adaptive vs the default choice of `gamma`:

[plot: objective value, adaptive vs default `gamma`]
-
This is an interesting idea! How do you initialize the primal and dual variables of PDHG? One of the proven convergence rates for PDHG bounds the (ergodic) primal-dual gap after $N$ iterations by

$$\frac{1}{N}\left(\frac{\|x_0 - x^\star\|^2}{2\tau} + \frac{\|y_0 - y^\star\|^2}{2\sigma}\right).$$

If `tau = gamma * beta` and `sigma = beta / gamma`, then the "optimal" `gamma` (which minimizes the constant in the rate) is

$$\gamma_2 = \frac{\|x_0 - x^\star\|}{\|y_0 - y^\star\|}.$$

How does `gamma2` relate to `gamma1 = norm(x*)`? In general these are probably quite different, but perhaps they are similar for your examples?
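For reference, a sketch of the minimization that gives $\gamma_2$ under the parametrization above:

$$C(\gamma) = \frac{\|x_0 - x^\star\|^2}{2\gamma\beta} + \frac{\gamma\,\|y_0 - y^\star\|^2}{2\beta}, \qquad C'(\gamma) = 0 \;\Longrightarrow\; \gamma_2 = \frac{\|x_0 - x^\star\|}{\|y_0 - y^\star\|}.$$

In particular, with the common initialization $x_0 = 0$, $y_0 = 0$, this reduces to $\gamma_2 = \|x^\star\| / \|y^\star\|$, which shares the numerator of $\gamma_1 = \|x^\star\|$ but is scaled by the norm of the dual solution.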
-
Do you see the same improvement with `KullbackLeibler`? Also, one more thing: I noticed a while ago some problems with the primal/dual/pdgap objectives when using ASTRA on the GPU. I believe this is due to the adjoint mismatch, so for the comparison I would choose the CPU backend.
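A quick way to check for such an adjoint mismatch is the dot-product test; a minimal sketch, assuming a CIL-style operator `K` (e.g. the projection operator used in the experiments):

```python
# Dot-product (adjoint) test: for an exact adjoint pair,
# <K x, y> == <x, K^T y> for any x, y. A large relative
# discrepancy points to an adjoint mismatch in the backend.
x = K.domain_geometry().allocate('random')
y = K.range_geometry().allocate('random')
lhs = K.direct(x).dot(y)    # <K x, y>
rhs = x.dot(K.adjoint(y))   # <x, K^T y>
print(abs(lhs - rhs) / abs(lhs))
```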
-
After discussing with @gschramm at Fully3D about his heuristic choice of step sizes in SPDHG, I made a few experiments on PDHG based on the PDHG notebook.
His idea is rather simple. One can introduce a parameter `gamma` that controls the ratio between the step sizes for the primal and dual variables, `tau` and `sigma` respectively. Actually, the docstring says so:

- `CIL/Wrappers/Python/cil/optimisation/algorithms/PDHG.py`, lines 89 to 90 in `39b6f7a`
- `CIL/Wrappers/Python/cil/optimisation/algorithms/PDHG.py`, line 150 in `39b6f7a`
- `CIL/Wrappers/Python/cil/optimisation/algorithms/PDHG.py`, line 163 in `39b6f7a`
What I had not realised when talking to @gschramm is that he was using preconditioning. However, one can use the following step sizes:
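A sketch of that choice, consistent with the adaptive snippet earlier in the thread and with the docstring:

```python
# gamma-parametrized step sizes: sigma * tau * normK**2 == 1,
# which satisfies the PDHG convergence condition sigma * tau <= 1 / normK**2
sigma = gamma / normK
tau = 1 / (gamma * normK)
```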
OK, so now the problem of the choice of step sizes has become the problem of the choice of `gamma`. @gschramm's idea is that the fastest convergence is obtained for `gamma` equal to the norm of the solution. So one should first run another algorithm, like OSEM or FBP, calculate that norm, and then use it. If I understood correctly, this makes the SPDHG iteration similar to OSEM when the data fidelity is `KullbackLeibler`.
In the case of CT the data fidelity is normally `L2NormSquared`, given the different noise model. At any rate, with all the differences (data fidelity, no preconditioning), I tried to choose `gamma` as @gschramm does. Actually, I ran a grid search over values of `gamma` starting around the norm of the solution. So this is what I did:
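A minimal sketch of the grid search, assuming `F`, `G`, `K` and `normK` as above, and a hypothetical reference reconstruction `recon` (e.g. from FBP) to estimate the solution norm; the names `gs` and `g` match the plot below:

```python
import numpy as np

# grid of gamma values around the norm of a reference solution
gamma0 = recon.norm()
gs = gamma0 * np.logspace(-2, 2, 9)

objectives = {}
for g in gs:
    algo = PDHG(f=F, g=G, operator=K,
                sigma=g / normK, tau=1 / (g * normK))
    algo.max_iteration = 5000
    algo.update_objective_interval = 100
    algo.run(100)
    # record the objective reached after 100 iterations for this gamma
    objectives[g] = algo.objective[-1]
```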
Below I plot the objective value of PDHG after 100 iterations as a function of `gamma`, or rather of the values `gs`. As you can see, there is a minimum at `g = 3e-1`, and the default choice of `sigma=1.` is marked with a red dot.

[plot: objective value after 100 iterations vs `gs`, red dot marking the default `sigma=1.`]

If you choose the `gamma` at the minimum, you get the same objective value (and reconstruction) as the default choice in 1/5 of the iterations, i.e. 1000 vs 5000.
Below I show the ratio of the objective values of the two PDHG optimisations. As you can see, the default choice is always higher than the optimal choice. (The x axis is in hundreds of iterations.)

[plot: ratio of the objective values, default vs optimal `gamma`]