
ApproximateGradientSumFunction and SGFunction #1345

Closed

Conversation

Contributor

@epapoutsellis epapoutsellis commented Aug 17, 2022

Describe your changes

SubsetSumFunction

The SubsetSumFunction is defined as

$$ \frac{1}{n} \sum_{i=1}^{n}f_{i}(x)$$

where $n$ is the number of subsets and $f_{i}$ are smooth functions.

Example

$$ \frac{1}{n}\sum_{i=1}^{n} F_{i}(x) = \frac{1}{n}\sum_{i=1}^{n}||A_{i} x - b_{i}||^{2} $$

f = SubsetSumFunction([LeastSquares(Ai, b=bi)]*n for Ai,bi in zip(A_subsets, b_subsets)) 

Note

At the moment, there are 2 sampling methods: random and sequential. For the random case, we have the following.

  • sampling=random, replacement=True : means that the next_subset method can select the same subset_num
  • sampling=random, replacement=False : means that the next_subset method selects each subset_num uniquely.

Example

a = [1,2,3,4,5,6]
num_subset = 2
  • sampling=random, replacement=True: [3,1,2], [4,2,6]
  • sampling=random, replacement=False: [1,3,2], [6,4,5]
  • sampling=sequential: [1,2,3], [4,5,6]

SGDFunction

Example

f = SGDFunction([LeastSquares(Ai, b=bi)]*n for Ai,bi in zip(A_subsets, b_subsets), sampling="random", replacement=False) 

GD vs SGD in CIL

We can solve the LeastSquares minimisation problem $$\min_{x}\|Ax - b\|^{2}$$
  • Deterministically, using GD with objective_function = LeastSquares
  • Stochastically, using GD with objective_function = SGDFunction(list of LeastSquares); a sketch of both set-ups follows below
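
A minimal sketch of the two set-ups (the variables A, b, A_subsets, b_subsets and the image geometry ig are assumed from this description, not defined here):

# Deterministic: GD on the full least-squares objective
f_full = LeastSquares(A, b=b)
gd = GD(initial=ig.allocate(0), objective_function=f_full, max_iteration=100)
gd.run(verbose=1)

# Stochastic: the same GD algorithm, but the objective is an SGDFunction
f_subsets = [LeastSquares(Ai, b=bi) for Ai, bi in zip(A_subsets, b_subsets)]
f_sgd = SGDFunction(f_subsets, sampling="random", replacement=False)
sgd = GD(initial=ig.allocate(0), objective_function=f_sgd, max_iteration=100)
sgd.run(verbose=1)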

Proximal Gradient Algorithms in CIL

For non-smooth objectives we can solve $$\min_{x\geq 0}||Ax - b||^{2}$$

  • Deterministically, using ISTA/FISTA with f = LeastSquares, g = IndicatorBox(lower=0)
  • Stochastically, using ISTA/FISTA with f = SGDFunction(list of LeastSquares), g = IndicatorBox(lower=0)
G = IndicatorBox(lower=0)
F = LeastSquares(A, b = noisy_sino)
initial = ig.allocate(0)
ista = ISTA(initial=initial, f=F, g=G, update_objective_interval = 1, 
            max_iteration = 50)
ista.run(verbose=1)

F_sgd1 = SGDFunction(f_subsets, sampling="random")
G = IndicatorBox(lower=0) 
num_epochs = 10
initial = ig.allocate(0)
s_ista1 = ISTA(initial=initial, 
            f=F_sgd1,
            g=G, update_objective_interval = n_subsets, 
            max_iteration = num_epochs * n_subsets)
s_ista1.run(verbose=1)

Describe any testing you have performed

  • Add tests for SubsetSumFunction
  • Add tests for SGDFunction [wip]

Demos

Link relevant issues

Close #1341
Close #1343

From the Hackathon-000-Stochastic-Algorithms repository
Close 11
Close 10
Close 9

Checklist when you are ready to request a review

  • I have performed a self-review of my code
  • I have added docstrings in line with the guidance in the developer guide
  • I have implemented unit tests that cover any new or modified functionality
  • CHANGELOG.md has been updated with any functionality change
  • Request review from all relevant developers
  • Change pull request label to 'Waiting for review'

Contribution Notes

Please read and adhere to the developer guide and local patterns and conventions.

  • The content of this Pull Request (the Contribution) is intentionally submitted for inclusion in CIL (the Work) under the terms and conditions of the Apache-2.0 License.
  • I confirm that the contribution does not violate any intellectual property rights of third parties

@epapoutsellis
Contributor Author

@KrisThielemans (could not add you in the reviewers list)

epapoutsellis added a commit to epapoutsellis/CIL that referenced this pull request Aug 23, 2022
epapoutsellis added a commit to epapoutsellis/CIL that referenced this pull request Aug 25, 2022
Contributor

@KrisThielemans KrisThielemans left a comment


I suggest some changes, but in particular to the design:

  • I really don't think using 1/n is a good idea. It means that an SGDFunction is not a direct replacement of the original, but a replacement of 1/num_subsets orig_function. This means that every derived algorithm needs to know this, and adjust step size etc.
  • I suggest a change to have a subset_gradient function (without next_subset())
  • There is no reason to put the preconditioner in SGDFunction. In principle, it could sit in SubsetSumFunction, but in fact, it could sit in Function. I'm not so sure you want to have it in this hierarchy at all though. A gradient preconditioner is something for gradient algorithms, not for functions.

self.num_functions = len(functions)
if self.num_functions < 2:
    raise ValueError('At least 2 functions need to be passed')
Contributor

not sure why we'd want to throw an error here. Seems more useful to handle only 1 function, or even zero.


r"""SubsetSumFunction represents the following sum

.. math:: \frac{1}{n}\sum_{i=1}^{n} F_{i} = \frac{1}{n}(F_{1} + F_{2} + ... + F_{n})
Contributor

why did we choose the 1/n normalisation? I don't find it very natural in some cases (not in KL, but also not when the penalty is one of the functions). Also, it doesn't really fit the "SumFunction" class.

return (1/self.num_subsets) * super(SubsetSumFunction, self).__call__(x)

def _full_gradient(self, x, out=None):
r""" Computes the full gradient at :code:`x`. It is the sum of all the gradients for each function. """
Contributor

why _full_gradient and not full_gradient?

Also, currently doc is wrong as it is the average of the gradients of all functions

Contributor

doc is correct now. still remains why we'd call this _full_gradient

Contributor Author

_full_gradient is used in SVRG although I am not sure if it is needed, see here, @zeljkozeljko any ideas?


the "old" SubsetSumFunction class had a _full_gradient method, and SVRG had an attribute called full_gradient, which was computed (updated) once every couple of epochs and is the estimator of the full gradient of the objective at the snapshot/anchored image. This attribute is then used at every update, and I would think we want to store it (or at least have the option of storing it), and not recompute it at every update (otherwise why not use full GD). As to the naming, I'm not sure if the choice of the underscore in the method was because of some CIL convention; as regards the attribute in SVRG, it can be renamed (e.g. we could have snapshot_full_gradient and then also snapshot_image, or something similar), and reserve full_gradient for the SubsetSumFunction class.

Contributor

can be resolved now


.. math:: \frac{1}{n}\sum_{i=1}^{n} F_{i} = \frac{1}{n}(F_{1} + F_{2} + ... + F_{n})

where :math:`n` is the number of subsets.
Contributor

Should add some doc on what this intends to represent, i.e. the gradient is an approximation using some subsets/history etc.

return (1/self.num_subsets) * super(SubsetSumFunction, self).gradient(x, out=out)

def gradient(self, x, out=None):
""" Computes the gradient for each subset function at :code:`x`."""
Contributor

wrong doc

Contributor

In fact, I think it would be better to let this function call next_subset() and then an (undefined) subset_gradient(). All derived classes then have to define subset_gradient(), and never need to know anything about next_subset() (I think). It also means that a user can call subset_gradient() without "advancing" the subset
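
A minimal sketch of that split, keeping the naming already used in this PR (SubsetSumFunction, next_subset); this is illustrative only, not the PR's code:

class SubsetSumFunction(SumFunction):

    def gradient(self, x, out=None):
        """Approximate gradient: advance to the next subset, then delegate."""
        self.next_subset()
        return self.subset_gradient(x, out=out)

    def subset_gradient(self, x, out=None):
        """Gradient estimate for the current subset; defined by derived classes."""
        raise NotImplementedError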

@paskino
Contributor

paskino commented Sep 15, 2022

In your description you have 2 examples @epapoutsellis

f = SubsetSumFunction([LeastSquares(Ai, b=bi)]*n for Ai,bi in zip(A_subsets, b_subsets)) 

and

f = SGDFunction([LeastSquares(Ai, b=bi)]*n for Ai,bi in zip(A_subsets, b_subsets), sampling="random", replacement=False) 

I think they should be as follows:

f = SubsetSumFunction(*[LeastSquares(Ai, b=bi) for Ai,bi in zip(A_subsets, b_subsets)]) 

and

f = SGDFunction(*[LeastSquares(Ai, b=bi) for Ai,bi in zip(A_subsets, b_subsets)], \
    sampling="random", replacement=False) 

Is it true?

@KrisThielemans
Contributor

I think the *n has to be there to compensate for the 1/n in SubsetSumFunction (which is one reason why I think the latter is a bad idea)

@epapoutsellis
Contributor Author

epapoutsellis commented Sep 16, 2022

I suggest some changes, but in particular to the design:

  • I really don't think using 1/n is a good idea. It means that an SGDFunction is not a direct replacement of the original, but a replacement of 1/num_subsets orig_function. This means that every derived algorithm needs to know this, and adjust step size etc.
  • I suggest a change to have a subset_gradient function (without next_subset())
  • There is no reason to put the preconditioner in SGDFunction. In principle, it could sit in SubsetSumFunction, but in fact, it could sit in Function. I'm not so sure you want to have it in this hierarchy at all though. A gradient preconditioner is something for gradient algorithms, not for functions.

For the preconditioner

Short answer: Totally agree, the preconditioner should be a parameter for each of the algorithms, e.g., GD, ISTA, FISTA that will follow the f.gradient(x, out=out) step. This is something we discussed with @zeljkozeljko .

Long answer: However, we do not have this option at the moment for our proximal gradient algorithms. In addition, the preconditioner should be a callable method. With this setup, we can cover the cases where

$$ x_{k+1}= x_{k} - \gamma_{k}D(x_{k}, iteration) \nabla f(x_{k}) $$

a) preconditioner is fixed $\frac{1}{A^{T}\mathbb{1}}$
b) depends on the iterates $\frac{x_{k}+\delta}{A^{T}\mathbb{1}}$
c) depends on the subsets_num, i.e., activate/deactivate it after some epochs or iterations.
d) None

Notice that in case c) the (deterministic) proximal gradient algorithms have no info about the number of subsets. At the moment, this is known only by the SubsetSumFunction (SGDFunction, SAGFunction, etc). So if we have a callable method for the precond step that depends on (x, iteration), the user has all the information to define it.

self.objective_function.gradient(self.x, out=self.x_update)
self.x_update *= apply_precond(self.x, self.iteration)
self.x.sapyb(1.0, self.x_update, -self.step_size, out=self.x)

Notice that the same can be applied to adaptive step sizes, e.g., a decreasing step size in SGD that satisfies certain conditions. Unless we have a fixed step size or another method such as the Armijo rule, it should be a callable method. See for instance SGDClassifier. Finally, all of the above should and can work in ISTA and FISTA.
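
As an illustration of the callable form described above, case b) could look roughly like this (a sketch only; it assumes a CIL-like linear operator with adjoint() and range_geometry(), and the helper name make_preconditioner is not part of this PR):

def make_preconditioner(A, delta=1e-6):
    # fixed sensitivity image A^T 1
    sensitivity = A.adjoint(A.range_geometry().allocate(1.0))
    def apply_precond(x, iteration):
        # D(x_k, k) = (x_k + delta) / (A^T 1)
        return (x + delta) / sensitivity
    return apply_precond

# inside the algorithm update, following the snippet above:
# self.x_update *= apply_precond(self.x, self.iteration)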

@epapoutsellis
Contributor Author

I suggest some changes, but in particular to the design:

  • I really don't think using 1/n is a good idea. It means that an SGDFunction is not a direct replacement of the original, but a replacement of 1/num_subsets orig_function. This means that every derived algorithm needs to know this, and adjust step size etc.
  • I suggest a change to have a subset_gradient function (without next_subset())
  • There is no reason to put the preconditioner in SGDFunction. In principle, it could sit in SubsetSumFunction, but in fact, it could sit in Function. I'm not so sure you want to have it in this hierarchy at all though. A gradient preconditioner is something for gradient algorithms, not for functions.

For the 1/n convention

Short answer: I think it is important to have the 1/n weight in the base class SubsetSumFunction (which inherits from SumFunction). In this way we will not have problems when comparing with other algorithms, e.g., deterministic ones, and will not have to adjust future algorithms from the literature where the 1/n is included when finding the argmin.

Long answer: In the first hackathon, we implemented SAG based on this review paper. Then I think an AverageFunction class was created, but we decided with @ClaireDelplancke @zeljkozeljko @Imraj-Singh @BillyTang to remove it and fix the scalings in the SubsetSumFunction.

Example: In CIL we can solve the following LS problem

$$ \text{argmin}_{x} f(x) = \frac{1}{2}||Ax - d||^{2} $$

  1. using the LeastSquares function and the GD algorithm (Deterministic)
  2. using the SumFunction function and the GD algorithm (Deterministic)
  3. using the SGDFunction function and the GD algorithm (Stochastic)

For 3) the SGDFunction has the $\frac{1}{n}$ in the call method. Therefore to match the min for the deterministic cases we need $\frac{1}{n}$ in front of the LeastSquares and the SumFunction.

For 1) :

f = LeastSquares(Aop, b=bop, c = 0.5/n) 
initial = ig.allocate()
gd = GD(initial = initial, objective_function = f, max_iteration=n_epochs,
       update_objective_interval=1)
gd.run(verbose=1)

The default step_size of GD is $\frac{2*0.99}{L}$ and we have

$$ x_{k+1} = x_{k} - \frac{2 * 0.99}{L} * \frac{1}{n}A^{T}(Ax_{k}-d) = x_{k} - \frac{2 * 0.99}{\frac{||A^{T}A||}{n}} * \frac{1}{n}A^{T}(Ax_{k}-d) = x_{k} - \frac{2 * 0.99}{||A^{T}A||} * A^{T}(Ax_{k}-d)$$

For 2) : Here, we assume that $$f(x) : = \frac{1}{n}\sum_{i=1}^{n} f_{i}(x), \quad f_{i}(x) = \frac{1}{2}||A_{i}x-d_{i}||^{2} $$
Assuming that we have split $A$ into $A_{i}$'s and $d$ into $d_{i}$'s, we can have

average_sum = (1./n)*SumFunction(*fi_cil)
average_sum_gd = GD(initial = initial, objective_function = average_sum,  max_iteration=n_epochs,
       update_objective_interval=1)
average_sum_gd.run(verbose=1)

Here we have,

$$ L = \frac{1}{n}\sum_{i=1}^{n} L_{i} = \frac{1}{n}\sum_{i=1}^{n} ||A_{i}^{T}A_{i}|| $$

$$ x_{k+1} = x_{k} - \frac{2 * 0.99}{L} * \frac{1}{n}\sum_{i=1}^{n}A_{i}^{T}(A_{i}x_{k}-d_{i}) = x_{k} - \frac{2 * 0.99}{ \sum_{i=1}^{n} ||A_{i}^{T}A_{i}|| } \sum_{i=1}^{n}A_{i}^{T}(A_{i}x_{k}-d_{i})$$

Finally for the SGD case, we have

sgd_fun = SGDFunction(fi_cil)
step_size = 0.005
sgd = GD(initial = initial, objective_function = sgd_fun, 
        step_size = step_size, max_iteration=n_epochs*n_subsets,
       update_objective_interval=n_subsets)
sgd.run(verbose=1)

and

$$ x_{k+1} = x_{k} - \gamma_{k} \nabla f_{i_{k}}(x_{k}) = x_{k} - \gamma_{k} A_{i_{k}}^{T}(A_{i_{k}} x_{k} - d_{i_{k}}) $$

Notice that in cases 1) and 2), there is no dependence on the number of subsets, so finding the argmin is fine. However, when comparing the objective values of the deterministic vs stochastic algorithms, we need the $\frac{1}{n}$ in front of LeastSquares and SumFunction.

Comment on lines 363 to 364
self.data_passes = [0]
self.sampling = sampling

@zeljkozeljko zeljkozeljko Sep 22, 2022


Might be useful to also track the subset_num used per iteration, by say creating an attribute self.subset_use = [subset_init], and then appending to it in each iteration


The current implementation does not seem to track the subset_use. One can create an attribute self.subset_use here, and then append to it after calling next_subset with a given function (or appending to it within the next_subset method, but then making sure that svrg does not call next_subset directly after it has updated the snapshots)


Also, data_passes doesn't seem to be tracked/updated (even though it is initialised here). Was this the intended use?

Comment on lines 408 to 410
else:
raise NotImplementedError("Only {} and {} are implemented at the moment.".format("random","sequential"))


this is the point where one could append self.subset_num to the list self.subset_use

@paskino
Contributor

paskino commented Sep 29, 2022

I'm trying to resume the conversation we had with @KrisThielemans @epapoutsellis @gfardell

With this PR we want to propose a structure in CIL for stochastic optimisation with gradient-type algorithms. A number of choices have already been made, the most relevant being that we decided not to develop stochastic algorithms but rather a special class of functions that, plugged into an algorithm like GD, makes it behave like SGD. The same applies to ISTA and FISTA.

As @epapoutsellis wrote we want to be able to solve optimisation problems of the form

$$ \text{argmin}_u F(u, b) + \alpha R(u) $$

where $F$ is a data fitting term, $b$ is the measured data, $R$ is the regulariser and $\alpha$ the regularisation parameter. $F$ is supposed to be smooth, while the regulariser might not be, and in such case we'd use an appropriate algorithm.

At the bottom of this PR we rewrite $F$ as a separable sum

$$ F(u,b) = \frac{1}{n} \sum_{i=1}^{n} f_i(u,b) $$

The base class for this in CIL is the SubsetSumFunction. @KrisThielemans argued that the presence of the $1/n$ multiplicative factor makes this not exactly a separable sum but an averaged separable sum, so the name SubsetSumFunction would not describe the object correctly, and proposed a name containing average.

Most of the algorithms we are planning to implement have been published following this convention. The optimisation problem becomes

$$ \text{argmin}_u \frac{1}{n} \sum_{i=1}^{n} f_i(u,b) + \alpha R(u) $$

I quote @epapoutsellis from a private communication,

If you change the number of subsets, you change the strength of your fidelity term, then you need to adjust the strength of your regulariser, i.e., alpha, in order to have the same argmin in every optimisation problem.

Also, the objective value is not immediately comparable between two setups with different numbers of subsets; it needs to be scaled accordingly.

This means that with the convention

$$ F(u,b) = \frac{1}{n} \sum_{i=1}^{n} f_i(u,b) $$

the user will need to be extra careful in scaling the regularisation parameter and the objective value with the number of subsets.

However, if we were to choose the convention in which the separable sum really is a separable sum, i.e. without the $1/n$ scaling:

$$ F(u,b) = \sum_{i=1}^{n} f_i(u,b) $$

  1. the name SubsetSumFunction makes sense (to @KrisThielemans )
  2. we would not have any problem comparing algorithms with different numbers of subsets

Conversely, it would add a burden on the developer team as:

  1. each subset function needs to take into account the missing $1/n$, which seems to be the convention in the papers
  2. comparison with literature results (which follow the $1/n$ convention) will need to be appropriately scaled
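
For concreteness, the two conventions differ only by an overall factor of $n$ in the objective, so they share the same minimiser provided the regularisation parameter is rescaled accordingly:

$$ \text{argmin}_u \frac{1}{n} \sum_{i=1}^{n} f_i(u,b) + \alpha R(u) = \text{argmin}_u \sum_{i=1}^{n} f_i(u,b) + n\alpha R(u) $$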

@epapoutsellis
Contributor Author

I wonder if we can agree on how the user sets the sampling strategies, as these are also useful to SPDHG, see #1345 (comment).

Come to think of it, I think it might be better to isolate the "subset number generator" in a separate class. At some point, you'd want to use non-uniform sampling (in fact, already used in the spdhg paper if the penalty is a "subset" I believe). You don't really want people to have to modify SubsetSumFunction to enter their favorite sampler. It's probably also clearer if we just accept a Python discrete RNG in the constructor.

Yes, this is what we discussed in the previous meeting. The new class (suggested name: SubsetGenerator) will implement the next_subset method. And the new next_subset method will have only

self.subset_num = SubsetGenerator ( num_subset, `sampling_method`)

@epapoutsellis
Contributor Author

updated the link in my comment, which now points to #1344 (comment) which describes a "subset number generator" class, in fact.

that class is somewhat different. It points to an "all angles in all subsets" generator. That is something useful as well, but it is related to the generation of the Ai etc. This isn't quite what we need here I think, as this is just about "which subset (or 'i') do I choose next". This is just a random (or sequential or whatever) number from 0....N.

Yes, the class in #1344 (comment) will be used to generate operators (Ai) and data (di) and is prior to running the algorithm. The next_subset method which has the two basic strategies (replacement=True or False) is related to which function (fi) is going to be selected and how in every iteration/epoch.

@paskino
Contributor

paskino commented Nov 3, 2022

Why closing this?

@epapoutsellis
Contributor Author

Why closing this?

I reopened it

@epapoutsellis epapoutsellis reopened this Nov 3, 2022
@KrisThielemans
Contributor

this is what we discussed in the previous meeting. The new class (suggested name: SubsetGenerator) will implement the next_subset method. And the new next_subset method will have only

self.subset_num = SubsetGenerator ( num_subset, `sampling_method`)

I think this is slightly wrong. I suggest something like

generator =  SubsetGenerator ( num_subsets, `sampling_method`)
subset_func = SGFunction(fi_cil, generator)
...

# somewhere in SGFunction
 subset_num = self.subset_num_generator()

@epapoutsellis
Contributor Author

@KrisThielemans , @paskino is the following ok?

import numpy as np

class SubsetGenerator(object):

    def __init__(self, num_subsets, sampling_method = "random", replacement = True, shuffle ="random"):

        self.num_subsets = num_subsets
        self.sampling_method = sampling_method
        self.replacement = replacement
        self.shuffle = shuffle        
        self.subset_num = -1
        self.index = 0       

        if self.replacement is False:
                        
            # create a list of subsets without replacement, first permutation
            self.list_of_subsets = np.random.choice(range(self.num_subsets),self.num_subsets, replace=False)            
                       
            if self.shuffle not in ["random","single"]:
                raise NotImplementedError("Only {} and {} are implemented for the replacement=False case.".format("random","single"))            
                         
    def next_subset(self):
            
            if self.sampling_method=="random" :
                
                if self.replacement is False:
                    
                    # list of subsets already permuted
                    self.subset_num = self.list_of_subsets[self.index]
                    self.index+=1                
                                    
                    if self.index == self.num_subsets:
                        self.index=0
                        
                        # For random shuffle, at the end of each epoch, we permute the list again                    
                        if self.shuffle=="random":                    
                            self.list_of_subsets = np.random.choice(range(self.num_subsets),self.num_subsets, replace=False)                                                                                          
                else:
                    
                    # at each iteration (not epoch) a subset is randomly selected
                    self.subset_num = np.random.randint(0, self.num_subsets)
                
            elif self.sampling_method=="sequential":    
                
                if self.subset_num + 1 >= self.num_subsets:
                    self.subset_num = 0                
                else:
                    self.subset_num += 1         
            else:
                raise NotImplementedError("Only {} and {} are implemented at the moment.".format("random","sequential"))          
            
            return self.subset_num

and is used as

generator = SubsetGenerator(self.n_subsets)
self.F_SG = SGFunction(self.fi_cil, generator) 

and the next_subset method of SubsetSumFunction is

    def next_subset(self):
        
        # Select the next subset using the subset_generator
        
        self.subset_num = self.subset_generator.next_subset()
        self.subsets_used.append(self.subset_num)

@KrisThielemans
Contributor

I think this is nice. What you wrote is not a SubsetGenerator, but a SubsetNumberGenerator.

However, it could be made even nicer if you'd be able to make it a Python generator, close to the C++ "distributions".

Usage would be

generator = make_subset_num_generator(6)
self.F_SG = SGFunction(self.fi_cil, generator)
...
def next_subset(self):
    self.subset_num = next(self.subset_generator)

I had hoped that the Python random number generator classes would follow this paradigm, such that we could just as well have said

rng = numpy.random.default_rng()
# doesn't exist
generator = np.random.int_distribution(rng,0, num_subsets) 

but it seems that the NumPy people don't like this (or are thinking mostly in terms of returning arrays, which isn't needed here).

Supposing you like this design, I'm not so sure how to implement it... Possibly something like this

def make_subset_num_generator(self, num_subsets, sampling_method = "random", replacement = True, shuffle ="random"):
     # code from __init__
      self.num_subsets = num_subsets
      etc      

     while(True):
        # code from next_subset 
        if self.replacement is False:
             etc
       yield subset_num

It would probably be more efficient to put the yield statement wherever you currently assign, but who cares about efficiency here?

Note that this is no longer a class, so I've changed naming convention (using the one from STIR/SIRF, not sure what CIL uses for functions).

@epapoutsellis
Contributor Author

No problem to make it function/generator, something like

def subset_num_generator(num_subsets, sampling_method = "random", replacement = True, shuffle ="random"):

        subset_num = -1
        index = 0       

        if replacement is False:
                        
            # create a list of subsets without replacement, first permutation
            list_of_subsets = np.random.choice(range(num_subsets),num_subsets, replace=False)            
                       
            if shuffle not in ["random","single"]:
                raise NotImplementedError("Only {} and {} are implemented for the replacement=False case.".format("random","single"))            

        while(True):                
            
            if sampling_method=="random" :
            
                if replacement is False:
                    
                    # list of subsets already permuted
                    subset_num = list_of_subsets[index]
                    index+=1                
                                    
                    if index == num_subsets:
                        index=0
                        
                        # For random shuffle, at the end of each epoch, we permute the list again                    
                        if shuffle=="random":                    
                            list_of_subsets = np.random.choice(range(num_subsets),num_subsets, replace=False)                                                                                          
                else:
                    
                    # at each iteration (not epoch) a subset is randomly selected
                    subset_num = np.random.randint(0, num_subsets)
                
            elif sampling_method=="sequential":    
                
                if subset_num + 1 >= num_subsets:
                    subset_num = 0                
                else:
                    subset_num += 1         
            else:
                raise NotImplementedError("Only {} and {} are implemented at the moment.".format("random","sequential"))          
            
            yield subset_num

sng = subset_num_generator(10, replacement=False)
for i in range(20):

    if i>=1 and np.mod(i,10)==0:
        print("new epoch")
        print(next(sng))
    else:
        print(next(sng))
4
0
1
6
8
2
9
7
5
3
new epoch
5
9
4
8
7
0
1
3
6
2

@paskino is this ok? Where should we add this function? At the moment, I have created a new script inside the Functions directory.

@epapoutsellis epapoutsellis changed the title from Subset sum function and SGDFunction to ApproximateGradientSumFunction and SGFunction Nov 18, 2022
@epapoutsellis
Contributor Author

epapoutsellis commented Nov 18, 2022

@KrisThielemans, @paskino, @jakobsj @zeljkozeljko

Changes after Hackathon meeting (17/11/2022) and CIL-Dev meeting (18/11/2022):

  • A utilities directory is added in cil.optimisation; it contains FunctionNumberGenerator.py

  • The accepted/implemented sampling_method values for FunctionNumberGenerator.py are: random, random_permutation, fixed_permutation.

    • random: every function is selected randomly, with replacement.
    fng = FunctionNumberGenerator(10, sampling_method="random")
    generated_numbers=[print(next(fng),end=' ') for _ in range(2*10)] # epochs = 2
    0 8 3 0 3 0 1 0 9 6 0 8 6 4 0 3 6 0 1 7 
    • random_permutation: every function is selected randomly without replacement. At the end of each epoch, i.e., after all the functions have been selected, the list of functions is permuted again.
    fng = FunctionNumberGenerator(10, sampling_method="random_permutation")
    generated_numbers=[print(next(fng),end=' ') for _ in range(2*10)]
    8 1 2 6 9 3 7 0 4 5 0 6 1 9 3 8 5 2 7 4 
    • fixed_permutation: the list of functions is permuted one time at the beginning (without replacement) and is used for all epochs
    fng = FunctionNumberGenerator(10, sampling_method="fixed_permutation")
    generated_numbers=[print(next(fng),end=' ') for _ in range(2*10)]
    5 6 4 1 0 9 3 7 8 2 5 6 4 1 0 9 3 7 8 2
  • SubsetSumFunction is renamed to ApproximateGradientSumFunction

  • ApproximateGradientSumFunction can accept either a string for the selection of functions or an instance of the FunctionNumberGenerator

  • The subset term is removed from ApproximateGradientSumFunction and SGFunction

  • Unittests for ApproximateGradientSumFunction and SGFunction are updated

Example code for the LS problem

import numpy as np
from cil.optimisation.algorithms import GD
from cil.optimisation.functions import ApproximateGradientSumFunction, LeastSquares, SGFunction
from cil.optimisation.utilities import FunctionNumberGenerator
from cil.optimisation.operators import MatrixOperator
from cil.framework import VectorData


# create LS problem (Ax = b)
np.random.seed(10)
n = 50
m = 500
n_subsets = 10

Anp = np.random.uniform(0,1, (m, n)).astype('float32')
xnp = np.random.uniform(0,1, (n,)).astype('float32')
bnp = Anp.dot(xnp)

Ai = np.vsplit(Anp, n_subsets) 
bi = [bnp[i:i+n] for i in range(0, m, n)]     

Aop = MatrixOperator(Anp)
bop = VectorData(bnp) 
ig = Aop.domain        
x_cil = ig.allocate('random')

fi_cil = []
for i in range(n_subsets):   
    Ai_cil = MatrixOperator(Ai[i])
    bi_cil = VectorData(bi[i])
    fi_cil.append(LeastSquares(Ai_cil, bi_cil, c=1.0))

    
F_SG = SGFunction(fi_cil)  # default selection = "random" 

# or 
# F_SG = SGFunction(fi_cil, selection="random_permutation")  

# or 
# generator = FunctionNumberGenerator(n_subsets, sampling_method="fixed_permutation")
# F_SG = SGFunction(fi_cil, generator)

step_size = 1./F_SG.L
initial = ig.allocate()
epochs = 200
sgd = GD(initial = initial, objective_function = F_SG, step_size = step_size,
            max_iteration = epochs * n_subsets, 
            update_objective_interval = 10*n_subsets)
sgd.run(verbose=1)   

print("Function selected")
k=0
for i in range(0, epochs, 20):
    print("Epoch : {}, function used : {}".format(i, F_SG.functions_used[k:n_subsets+k]))
    k+=n_subsets

# import cvxpy
# u_cvxpy = cvxpy.Variable(ig.shape[0])
# objective = cvxpy.Minimize( 0.5*cvxpy.sum_squares(Aop.A @ u_cvxpy - bop.array))
# p = cvxpy.Problem(objective)
# p.solve(verbose=True, solver=cvxpy.SCS, eps=1e-4)     

# np.testing.assert_allclose(p.value, sgd.objective[-1], atol=1e-1)
# np.testing.assert_allclose(u_cvxpy.value, sgd.solution.array, atol=1e-1)    
     Iter   Max Iter     Time/Iter            Objective
                               [s]                     
        0       2000         0.000          9.99876e+04
      100       2000         0.006          4.66201e+01
      200       2000         0.006          1.37573e+01
      300       2000         0.006          4.64783e+00
      400       2000         0.006          1.73196e+00
      500       2000         0.006          7.08509e-01
      600       2000         0.005          2.99555e-01
      700       2000         0.005          1.28043e-01
      800       2000         0.005          5.58902e-02
      900       2000         0.005          2.55412e-02
     1000       2000         0.005          1.11664e-02
     1100       2000         0.005          5.23994e-03
     1200       2000         0.005          2.30367e-03
     1300       2000         0.005          1.05134e-03
     1400       2000         0.005          4.96243e-04
     1500       2000         0.005          2.25485e-04
     1600       2000         0.005          1.04333e-04
     1700       2000         0.005          4.82236e-05
     1800       2000         0.005          2.21038e-05
     1900       2000         0.005          1.01833e-05
     2000       2000         0.005          4.69173e-06
-------------------------------------------------------
     2000       2000         0.005          4.69173e-06
Stop criterion has been reached.

Function selected
Epoch : 0, function used : [4, 0, 9, 6, 0, 1, 5, 1, 5, 5]
Epoch : 20, function used : [1, 3, 6, 4, 6, 3, 8, 5, 6, 8]
Epoch : 40, function used : [5, 4, 8, 9, 3, 1, 8, 5, 6, 0]
Epoch : 60, function used : [3, 4, 3, 6, 1, 6, 0, 9, 5, 8]
Epoch : 80, function used : [6, 5, 6, 3, 2, 0, 9, 1, 9, 7]
Epoch : 100, function used : [1, 0, 8, 6, 3, 0, 6, 3, 5, 2]
Epoch : 120, function used : [5, 9, 7, 7, 2, 0, 8, 8, 3, 7]
Epoch : 140, function used : [9, 3, 6, 0, 7, 7, 1, 1, 8, 8]
Epoch : 160, function used : [8, 7, 8, 0, 4, 7, 5, 9, 1, 5]
Epoch : 180, function used : [2, 2, 3, 1, 0, 0, 0, 0, 5, 5]

@epapoutsellis
Contributor Author

@zeljkozeljko thank you for your quick comments. In my latest commit, I implemented them as you suggested, but I changed them again today after the discussion we had with @paskino and @jakobsj. I have not spent time fixing the docs.

@KrisThielemans, @paskino, @jakobsj, @zeljkozeljko before starting to review this PR, could we agree on the following:

  • Name and signature of the FunctionNumberGenerator.
  • Name and signature of the ApproximateGradientSumFunction.

Contributor

@KrisThielemans KrisThielemans left a comment


Looks mostly ok to me. I don't know CIL naming conventions. In STIR/SIRF, CamelCase like FunctionNumberGenerator implies it is a class (which it isn't). That is one for @paskino .

One problem is duplication of doc on the generator in all derived classes, which will be terrible.
Possible solutions:

  • refer to doc of FunctionNumberGenerator (or whatever that ends up being called).

  • add a member set_selection_generator to ApproximateGradientSumFunction. Someone might want to change this over epochs. Doc of all __init__ functions could then refer to the doc of that member (which is only written once).

  • give up passing strings in all those constructors and enforce that it has to be a generator.

-----------
functions : list(functions)
A list of functions: :code:`[F_{1}, F_{2}, ..., F_{n}]`. Each function is assumed to be smooth function with an implemented :func:`~Function.gradient` method.
selection : :obj:`string`, Default = :code:`random`. It can be :code:`random`, :code:`random_permutation`, :code:`fixed_permutation`.
Contributor

good idea to allow a string here, but the doc needs to mention the generator option. Also, I recommend referring to the doc of FunctionNumberGenerator for allowed strings. Otherwise things are going to desynchronise.

Comment on lines +31 to +34
sampling : :obj:`string`, Default = :code:`random`
Selection process for each function in the list. It can be :code:`random` or :code:`sequential`.
replacement : :obj:`boolean`. Default = :code:`True`
The same subset can be selected when :code:`replacement=True`.
Contributor

somehow avoid duplication as this is incorrect


.. math:: \partial F_{i}(x) .

The ith function is selected from the :meth:`~SubsetSumFunction.next_subset` method.
Contributor

incorrect

Note
----

The :meth:`~SGFunction.gradient` computes the `gradient` of one function from the list :math:`[F_{1},\cdots,F_{n}]`,
Contributor

times N


def approximate_gradient(self,subset_num, x, out):

""" Returns the gradient of the selected function at :code:`x`. The function is selected using the :meth:`~SubsetSumFunction.next_subset`
Contributor

times N


initialise_tests()

class TestApproximateGradientSumFunction(unittest.TestCase):
Contributor

no test for the string argument


if replacement is False:
# create a list of functions without replacement, first permutation
list_of_functions = np.random.choice(range(num_functions),num_functions, replace=False)
Contributor

I guess this variable should be called list_of_function_nums. I'd also prefix it with _ to make it "private"

Selection process for each function in the list. It can be :code:`random`, :code:`random_permutation`, :code:`fixed_permutation`.
- :code:`random`: Every function is selected randomly with replacement.
- :code:`random_permutation`: Every function is selected randomly without replacement. After selecting all the functions in the list, i.e., after one epoch, the list is randomly permuted.
- :code:`fixed_permuation`: Every function is selected randomly without replacement and the list of function is permuted only once.
Contributor

I don't see why anyone would want to use fixed_permutation. If anything, they'd want to specify an order. No harm in keeping it of course.

Probably the only 2 fixed orders that make sense are 0, 1, 2, 3, ... and 0, n/2, n/4, 3n/4, ... (with the last appropriate when the functions correspond to projections ordered by angle). Maybe you could call these "fixed_sequential" and "fixed_staggered"?

@robbietuk would have code to implement the "staggered" approach
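
For illustration, a staggered order of this kind could be generated as follows (a sketch only; the name staggered_order is hypothetical and not part of this PR):

def staggered_order(n):
    """Indices 0..n-1 ordered so that consecutive entries are far apart,
    e.g. staggered_order(8) -> [0, 4, 2, 6, 1, 3, 5, 7]."""
    order = []
    step = n
    while step >= 1:
        for i in range(0, n, step):
            if i not in order:
                order.append(i)
        step //= 2
    return order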


# Select the next function using the selection:=function_number_generator
self.function_num = next(self.selection)
self.functions_used.append(self.function_num)
Contributor

this should do a range check and write a nice error message if out of range
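
For example (a sketch only; it reuses the names from the snippet above and assumes the num_functions attribute set in the constructor):

# Select the next function and validate the index before using it
self.function_num = next(self.selection)
if not (0 <= self.function_num < self.num_functions):
    raise ValueError(
        "Selection generator returned {}, expected an index in [0, {})".format(
            self.function_num, self.num_functions))
self.functions_used.append(self.function_num)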

@epapoutsellis
Contributor Author

Thank you @KrisThielemans for your comments. Before addressing your comments and @zeljkozeljko's, I think we need to have a final decision on the design. Please have a look at #1377.

@KrisThielemans
Contributor

You might consider using the new interface of numpy.random which uses (quite confusingly named) Generators, see https://numpy.org/doc/stable/reference/random/index.html

rng = np.random.default_rng(optional_seed)
rints = rng.integers(low=0, high=10, size=3)

it would allow passing a seed such that the generated sequence is reproducible (cannot do that with np.random.choice as you don't know who else is going to call it). Of course, that could be done later.
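
For instance, the uniform-with-replacement branch could draw from a seeded numpy Generator (a sketch, not the current implementation):

import numpy as np

def function_num_generator(num_functions, seed=None):
    # reproducible uniform sampling with replacement
    rng = np.random.default_rng(seed)
    while True:
        yield int(rng.integers(0, num_functions))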

@KrisThielemans
Contributor

Recording part of an email conversation here on adding capabilities for non-uniform random sampling. @epapoutsellis had a proposal but noted that algorithms might need to ask what those probabilities are. I think that leads back to the need of having a class. At that point, a "generator" is probably not useful anymore and we can revert to having an iterator interface

class IndexGenerator:
  def __init__(lots of arguments here)
  def get_probabilities(self)
  def uses_replacement(self)
  # either this is an iterable object 
  def iter(self) # could return a generator as far as I know
  # or it is already an iterator
  def __iter__(self)
  def __next__(self)

and we'd have to decide to use a string + extra arguments

ig = IndexGenerator(num_functions, "random", probabilities=[.1, .3, .2, …])
gen = ig.iter()
next(gen)

or use a hierarchy of derived classes, e.g.

  class RandomIndexGenerator(IndexGenerator):
     def __init__(either_num_functions_or_list_of_probabilities, replacement=True)
     ...
  
 ig = RandomIndexGenerator (5)

where I think the latter design is more easily extendable.
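
A compact runnable sketch of the derived-class design (the class names are the ones proposed above, not existing CIL classes; only the with-replacement case is shown):

import numpy as np

class IndexGenerator:
    def get_probabilities(self):
        raise NotImplementedError
    def uses_replacement(self):
        raise NotImplementedError
    def __iter__(self):
        return self

class RandomIndexGenerator(IndexGenerator):
    def __init__(self, num_functions, probabilities=None, replacement=True, seed=None):
        self.num_functions = num_functions
        # default to uniform probabilities
        self.probabilities = probabilities or [1.0 / num_functions] * num_functions
        self.replacement = replacement
        self._rng = np.random.default_rng(seed)
    def get_probabilities(self):
        return self.probabilities
    def uses_replacement(self):
        return self.replacement
    def __next__(self):
        # draw one index according to the stored probabilities (with replacement)
        return int(self._rng.choice(self.num_functions, p=self.probabilities))

# usage
gen = RandomIndexGenerator(5)
index = next(gen)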

@epapoutsellis
Contributor Author

@KrisThielemans @paskino @jakobsj see below my proposal. I created a base class RandomSampling that creates random indices or batches (e.g. mini-batch SGD). Depending on the batch_size, we control whether a RandomIndex or a RandomBatch instance is created. The signature is the same for these two classes and only the __next__ method differs. I also included some static methods in the base class that work for different batch_sizes. I guess we need to decide on the names; I have no objection.

import numpy as np
import random

class RandomSampling():
    
    def __new__(cls, num_indices, batch_size=1, prob = None, replace = True, shuffle=False, seed = None):
        
        cls.batch_size = batch_size
        
        if cls.batch_size == 1:
            return super(RandomSampling, cls).__new__(RandomIndex)
        else:
            return super(RandomSampling, cls).__new__(RandomBatch)
    
    def __init__(self, num_indices, batch_size=1, prob = None, replace = True, shuffle=False, seed = None ):
        
        self.num_indices = num_indices
        self.batch_size = batch_size
        self.equal_size_batches = self.num_indices%self.batch_size==0        
        if self.equal_size_batches:
            self.num_batches = self.num_indices//self.batch_size
        else:
            self.num_batches = (self.num_indices//self.batch_size)+1        
        self.prob = prob
        self.replace = replace
        self.shuffle = shuffle
        self.indices_used = []
        self.index = 0
        np.random.seed(seed)
                            
        if self.replace is False: 
            self.list_of_indices = np.random.choice(num_indices, size=self.num_indices, p=prob, replace=False)                                        
        else:
            if shuffle is True and self.batch_size==1:
                raise ValueError("Shuffle is used only with replace=False")   
            self.list_of_indices = np.random.choice(num_indices, size=self.num_indices, p=prob, replace=True)         
            
        if self.batch_size>1: 
            self.partition_list = [self.list_of_indices[i:i + self.batch_size] for i in range(0, self.num_indices, self.batch_size)]             
            
    def show_epochs(self, epochs):
        
        total_its = epochs * self.num_indices
        for _ in range(total_its):
            next(self)
                    
        if self.batch_size==1:
            k = 0
            for i in range(epochs):
                print(" Epoch : {}, indices used : {} ".format(i, self.indices_used[k:k+self.num_indices]))    
                k+=self.num_indices 
            print("")   
        else:
            k=0
            for i in range(epochs):
                print(" Epoch : {}, batchs used : {} ".format(i, self.indices_used[k:k+self.num_batches]))                 
                k += self.num_batches
                
            
    @staticmethod    
    def uniform(num_indices, batch_size = 1, seed=None):
        return RandomSampling(num_indices,  batch_size=batch_size, prob=None, replace = True, shuffle = False, seed = seed)
    
    @staticmethod
    def non_uniform(num_indices, prob, batch_size = 1, seed=None):
        return RandomSampling(num_indices, batch_size=batch_size, replace = True, prob=prob, shuffle = False, seed = seed) 
    
    @staticmethod    
    def uniform_no_replacement(num_indices, batch_size = 1, shuffle=False, seed=None):
        return RandomSampling(num_indices, batch_size=batch_size, prob=None, replace = False, shuffle = shuffle, seed = seed) 
    
    @staticmethod    
    def non_uniform_no_replacement(num_indices, prob, batch_size = 1, shuffle=False, seed=None):
        return RandomSampling(num_indices, batch_size=batch_size, prob=prob, replace = False, shuffle = shuffle, seed = seed)     
        
    @staticmethod
    def single_shuffle(num_indices, batch_size = 1, seed=None):
        return RandomSampling(num_indices, batch_size = batch_size, replace = False, shuffle = False, seed = seed)
    
    @staticmethod
    def random_shuffle(num_indices, batch_size = 1, seed=None):
        return RandomSampling(num_indices, batch_size = batch_size, replace = False, shuffle = True, seed = seed)              
            

class RandomBatch(RandomSampling):
            
    def __next__(self):
        
        tmp = list(self.partition_list[self.index])
        self.indices_used.append(tmp)         
        self.index+=1
        
        if self.index==len(self.partition_list):
            self.index=0            
            
            if self.shuffle is True:
                self.list_of_indices = np.random.choice(self.num_indices, size=self.num_indices, p=self.prob, replace=self.replace)        
                self.partition_list = [self.list_of_indices[i:i + self.batch_size] for i in range(0, self.num_indices, self.batch_size)]
                                               
        return tmp
    
class RandomIndex(RandomSampling):   
    
    def __next__(self):
        
        if self.replace is False:

            index_num = self.list_of_indices[self.index]
            self.index+=1                

            if self.index == self.num_indices:
                self.index = 0                
                if self.shuffle is True:                    
                    self.list_of_indices = np.random.choice(self.num_indices, size=self.num_indices, p=self.prob, replace=False)                                                                                         
        else:

            index_num = np.random.choice(self.num_indices, size=1, p=self.prob, replace=True).item()

        self.indices_used.append(index_num)

        return index_num  

Examples

Our users can do, in the case of batch_size=1

rs = RandomSampling.uniform(10)
rs.show_epochs(epochs=3)
 Epoch : 0, indices used : [8, 1, 4, 8, 5, 9, 1, 9, 0, 3] 
 Epoch : 1, indices used : [3, 4, 9, 2, 4, 0, 7, 9, 4, 1] 
 Epoch : 2, indices used : [9, 6, 6, 8, 9, 6, 2, 0, 1, 5] 
rs = RandomSampling.random_shuffle(10, seed=100)
rs.show_epochs(epochs=3)
 Epoch : 0, indices used : [7, 6, 1, 5, 4, 2, 0, 3, 9, 8] 
 Epoch : 1, indices used : [7, 8, 5, 3, 4, 6, 0, 1, 9, 2] 
 Epoch : 2, indices used : [9, 7, 1, 8, 5, 4, 3, 2, 6, 0] 
tmp = [random.random() for i in range(10)]
tmp_sum = sum(tmp)
prob = [i/tmp_sum for i in tmp]
rs = RandomSampling.non_uniform(10, prob=prob)
rs.show_epochs(epochs=3)
 Epoch : 0, indices used : [2, 5, 7, 7, 2, 1, 6, 6, 2, 7] 
 Epoch : 1, indices used : [7, 4, 9, 9, 7, 1, 6, 1, 4, 9] 
 Epoch : 2, indices used : [6, 5, 6, 2, 5, 2, 8, 2, 2, 2] 

And for the case where batch_size>1

rs = RandomSampling.uniform(10, batch_size=5)
rs.show_epochs(epochs=3)
 Epoch : 0, batchs used : [[5, 7, 9, 5, 9], [1, 4, 6, 9, 6]] 
 Epoch : 1, batchs used : [[5, 7, 9, 5, 9], [1, 4, 6, 9, 6]] 
 Epoch : 2, batchs used : [[5, 7, 9, 5, 9], [1, 4, 6, 9, 6]] 
rs = RandomSampling.single_shuffle(10, batch_size=2, seed=100)
rs.show_epochs(epochs=3)
 Epoch : 0, batchs used : [[7, 6], [1, 5], [4, 2], [0, 3], [9, 8]] 
 Epoch : 1, batchs used : [[7, 6], [1, 5], [4, 2], [0, 3], [9, 8]] 
 Epoch : 2, batchs used : [[7, 6], [1, 5], [4, 2], [0, 3], [9, 8]] 

Comparison ISTA, proxSGD, proxMBSGD

For the 3D walnut dataset, I reconstructed one 2D slice with

 Acquisition Geometry 2D: (1601, 392) with labels ('angle', 'horizontal')
 Image Geometry 2D: (392, 392) with labels ('horizontal_y', 'horizontal_x')

For the optimal $x^{*}$, I used FISTA with 1000 iterations.

  • num_subsets=100 and n_epochs=40.
  • The subsets are generated using AcquisitionGeometrySubsetGenerator with random_permutation.
  • For both proxSGD and proxMBSGD, I use uniform sampling with replacement and batch_size=5 for the latter.
  • step_size is the same for both.
rs2 = RandomSampling(len(f_subsets), batch_size=5) 
F2 = SGFunctionBase(f_subsets, selection=rs2)

For the proxMBSGD we need to add some lines in the approximate_gradient method as

    if isinstance(self.function_num, int):
        self.functions[self.function_num].gradient(x, out=out)
        out*=self.num_functions        
    else:
        tmp = 0
        # function_num is a list of lists, prange???
        for i in self.function_num:
            tmp += self.functions[i].gradient(x)        
        out.fill(tmp)
        out *= self.selection.num_batches

[Plots of the objective decrease and reconstructed images comparing ISTA, proxSGD and proxMBSGD.]

@MargaretDuff
Member

Superseded by #1550
