ApproximateGradientSumFunction and SGFunction #1345
Conversation
@KrisThielemans (could not add you in the reviewers list)
I suggest some changes, but in particular to the design:

- I really don't think using 1/n is a good idea. It means that an `SGDFunction` is not a direct replacement of the original, but a replacement of `1/num_subsets * orig_function`. This means that every derived algorithm needs to know this, and adjust step size etc.
- I suggest a change to have a `subset_gradient` function (without `next_subset()`).
- There is no reason to put the preconditioner in `SGDFunction`. In principle, it could sit in `SubsetSumFunction`, but in fact, it could sit in `Function`. I'm not so sure you want to have it in this hierarchy at all though. A gradient preconditioner is something for gradient algorithms, not for functions.
```python
self.num_functions = len(functions)
if self.num_functions < 2:
    raise ValueError('At least 2 functions need to be passed')
```
not sure why we'd want to throw an error here. Seems more useful to handle only 1 function, or even zero.
```python
r"""SubsetSumFunction represents the following sum

.. math:: \frac{1}{n}\sum_{i=1}^{n} F_{i} = \frac{1}{n}(F_{1} + F_{2} + ... + F_{n})
```
why did we choose the 1/n normalisation? I don't find it very natural in some cases (not in KL, but also not when a penalty is one of the functions). Also, it doesn't really fit the `SumFunction` class.
```python
return (1/self.num_subsets) * super(SubsetSumFunction, self).__call__(x)

def _full_gradient(self, x, out=None):
    r""" Computes the full gradient at :code:`x`. It is the sum of all the gradients for each function. """
```
why `_full_gradient` and not `full_gradient`? Also, currently the doc is wrong, as it is the average of the gradients of all functions.
doc is correct now. It still remains why we'd call this `_full_gradient`.
`_full_gradient` is used in SVRG, although I am not sure if it is needed, see here. @zeljkozeljko any ideas?
the "old" `SubsetSumFunction` class had a `_full_gradient` method, and SVRG had an attribute called `full_gradient`, which was computed (updated) once every couple of epochs and is the estimator of the full gradient of the objective at the snapshot/anchored image. This attribute is then used at every update, and I would think we want to store it (or at least have the option of storing it) and not recompute it at every update (otherwise why not use full GD). As to the naming, I'm not sure if the choice of the underscore in the method was because of some CIL convention, but the attribute in SVRG can of course be renamed (e.g. we could have `snapshot_full_gradient` and then also `snapshot_image`, or something similar), and reserve `full_gradient` for the `SubsetSumFunction` class.
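A minimal sketch of the SVRG estimator described above (illustrative names, not the CIL implementation):

```python
def svrg_gradient_estimate(f_i, x, snapshot_image, snapshot_full_gradient):
    # v = grad f_i(x) - grad f_i(x_snapshot) + stored full gradient at the snapshot;
    # only the first two terms are computed per update, the last is the stored attribute
    return f_i.gradient(x) - f_i.gradient(snapshot_image) + snapshot_full_gradient
```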
can be resolved now
```python
.. math:: \frac{1}{n}\sum_{i=1}^{n} F_{i} = \frac{1}{n}(F_{1} + F_{2} + ... + F_{n})

where :math:`n` is the number of subsets.
```
Should add some doc on what this intends to represent, i.e. the gradient is an approximation using some subsets/history etc.
```python
return (1/self.num_subsets) * super(SubsetSumFunction, self).gradient(x, out=out)

def gradient(self, x, out=None):
    """ Computes the gradient for each subset function at :code:`x`."""
```
wrong doc
In fact, I think it would be better to let this function call `next_subset()` and then an (undefined) `subset_gradient()`. All derived classes then have to define `subset_gradient()`, and never need to know anything about `next_subset()` (I think). It also means that a user can call `subset_gradient()` without "advancing" the subset.
In your description you have 2 examples @epapoutsellis:

```python
f = SubsetSumFunction([LeastSquares(Ai, b=bi)]*n for Ai,bi in zip(A_subsets, b_subsets))
```
and
```python
f = SGDFunction([LeastSquares(Ai, b=bi)]*n for Ai,bi in zip(A_subsets, b_subsets),
                sampling="random", replacement=False)
```
I think they should be as follows:
```python
f = SubsetSumFunction(*[LeastSquares(Ai, b=bi) for Ai,bi in zip(A_subsets, b_subsets)])
```
and
```python
f = SGDFunction(*[LeastSquares(Ai, b=bi) for Ai,bi in zip(A_subsets, b_subsets)],
                sampling="random", replacement=False)
```
Is that right?
For the preconditioner

Short answer: totally agree, the preconditioner should be a parameter for each of the algorithms.

Long answer: however, we do not have this option atm for our proximal gradient algorithms. In addition, there are several cases to cover, e.g. a) the preconditioner is fixed, … Notice that in case c) the (deterministic) proximal gradient algorithms have no info about the number of subsets; atm, this is known only from the … Something like:

```python
self.objective_function.gradient(self.x, out=self.x_update)
self.x_update *= apply_precond(self.x, self.iteration)
self.x.sapyb(1.0, self.x_update, -self.step_size, out=self.x)
```

Notice that the same can be applied for an adaptive step size, e.g., a decreasing step size in SGD that satisfies certain conditions, unless we have a fixed step size or another method.
For the 1/n convention

Short answer: I think it is important to have the `1/n` convention.

Long answer: in the first hackathon, we implemented the `1/n` convention. Example: in CIL we can solve the following LS problem in three ways.

For 1):
```python
f = LeastSquares(Aop, b=bop, c = 0.5/n)
initial = ig.allocate()
gd = GD(initial = initial, objective_function = f, max_iteration=n_epochs,
        update_objective_interval=1)
gd.run(verbose=1)
```

For 2): here, we assume that
```python
average_sum = (1./n)*SumFunction(*fi_cil)
average_sum_gd = GD(initial = initial, objective_function = average_sum, max_iteration=n_epochs,
                    update_objective_interval=1)
average_sum_gd.run(verbose=1)
```

Finally, for the SGD case 3), we have
```python
sgd_fun = SGDFunction(fi_cil)
step_size = 0.005
sgd = GD(initial = initial, objective_function = sgd_fun,
         step_size = step_size, max_iteration=n_epochs*n_subsets,
         update_objective_interval=n_subsets)
sgd.run(verbose=1)
```

Notice that in cases 1) and 2), there is no dependence on the number of subsets. Therefore finding the …
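A minimal numeric check of this scaling (plain NumPy, not CIL code): with the `1/n` convention the gradient shrinks by a factor `n`, so a gradient algorithm must enlarge its step size by `n` to take the same step.

```python
import numpy as np

n = 4
grads = [np.array([1.0, 2.0]) * (i + 1) for i in range(n)]
g_sum = sum(grads)   # gradient under the plain SumFunction convention
g_avg = g_sum / n    # gradient under the 1/n convention
step = 0.1
# the same update requires a step size n times larger with the 1/n convention
assert np.allclose(step * g_sum, (n * step) * g_avg)
```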
```python
self.data_passes = [0]
self.sampling = sampling
```
Might be useful to also track the `subset_num` used per iteration, by say creating an attribute `self.subset_use = [subset_init]`, and then appending to it in each iteration.
The current implementation does not seem to track the `subset_use`. One can create an attribute `self.subset_use` here, and then append to it after calling `next_subset` with a given function (or append to it within the `next_subset` method, but then make sure that SVRG does not call `next_subset` directly after it has updated the snapshots).
Also, `data_passes` doesn't seem to be tracked/updated (even though it is initialised here). Was this the intended use?
```python
else:
    raise NotImplementedError("Only {} and {} are implemented at the moment.".format("random","sequential"))
```
this is the point where one could append `self.subset_num` to the list `self.subset_use`
I'm trying to summarise the conversation we had with @KrisThielemans @epapoutsellis @gfardell.

With this PR we want to propose a structure in CIL for stochastic optimisation with gradient-type algorithms. A number of choices have already been made, the most relevant being that we decided not to develop stochastic algorithms, but rather a special class of functions that, plugged into an algorithm like `GD`, makes it behave stochastically. As @epapoutsellis wrote, we want to be able to solve optimisation problems written as a sum of functions. At the bottom of this PR we rewrite the description accordingly. The base class for this in CIL is the `SumFunction`. Most of the algorithms we are planning to implement have been published following the `1/n` convention.

I quote @epapoutsellis from a private communication:

Also, the objective value is not immediately comparable between 2 set-ups with a different number of subsets: it needs to be scaled by that number. This means that, with the `1/n` convention, the user will need to be extra careful in scaling the regularisation parameter and the objective value with the number of subsets. However, if we were to choose the convention for which the `SubsetSumFunction` is a plain separable sum, i.e. it does not have the `1/n` factor:
Conversely, it would add a burden on the developer team as: …
Yes, this is what we discussed in the previous meeting. The new class (suggested name: …)
Yes, the class in #1344 (comment) will be used to generate the operators (Ai) and data (di), and is used prior to running the algorithm. The …
Why closing this?
I reopened it.
I think this is slightly wrong. I suggest something like
```python
generator = SubsetGenerator(num_subsets, sampling_method)
subset_func = SGFunction(fi_cil, generator)
...
# somewhere in SGFunction
subset_num = self.subset_num_generator()
```
@KrisThielemans, @paskino is the following ok?

```python
import numpy as np

class SubsetGenerator(object):

    def __init__(self, num_subsets, sampling_method="random", replacement=True, shuffle="random"):
        self.num_subsets = num_subsets
        self.sampling_method = sampling_method
        self.replacement = replacement
        self.shuffle = shuffle
        self.subset_num = -1
        self.index = 0

        if self.replacement is False:
            # create a list of subsets without replacement, first permutation
            self.list_of_subsets = np.random.choice(range(self.num_subsets), self.num_subsets, replace=False)
            if self.shuffle not in ["random", "single"]:
                raise NotImplementedError("Only {} and {} are implemented for the replacement=False case.".format("random", "single"))

    def next_subset(self):
        if self.sampling_method == "random":
            if self.replacement is False:
                # list of subsets already permuted
                self.subset_num = self.list_of_subsets[self.index]
                self.index += 1
                if self.index == self.num_subsets:
                    self.index = 0
                    # For random shuffle, at the end of each epoch, we permute the list again
                    if self.shuffle == "random":
                        self.list_of_subsets = np.random.choice(range(self.num_subsets), self.num_subsets, replace=False)
            else:
                # at each iteration (not epoch) a subset is randomly selected
                self.subset_num = np.random.randint(0, self.num_subsets)
        elif self.sampling_method == "sequential":
            if self.subset_num + 1 >= self.num_subsets:
                self.subset_num = 0
            else:
                self.subset_num += 1
        else:
            raise NotImplementedError("Only {} and {} are implemented at the moment.".format("random", "sequential"))
        return self.subset_num
```

and is used as

```python
generator = SubsetGenerator(self.n_subsets)
self.F_SG = SGFunction(self.fi_cil, generator)
```

and the `next_subset` method becomes

```python
def next_subset(self):
    # Select the next subset using the subset_generator
    self.subset_num = self.subset_generator.next_subset()
    self.subsets_used.append(self.subset_num)
```
I think this is nice. What you wrote is not a Python generator though. However, it could be made even nicer if you'd be able to make it a Python generator, close to the C++ "distributions". Usage would be

```python
generator = make_subset_num_generator(6)
self.F_SG = SGFunction(self.fi_cil, generator)
...
def next_subset(self):
    self.subset_num = next(self.subset_generator)
```

I had hoped that the Python random number generator classes would follow this paradigm, such that we could just as well have said …, but it seems that the NumPy people don't like this (or are thinking mostly in terms of returning arrays, which isn't needed here).

Supposing you like this design, I'm not so sure how to implement it... Possibly something like this

```python
def make_subset_num_generator(num_subsets, sampling_method="random", replacement=True, shuffle="random"):
    # code from __init__ goes here
    ...
    while True:
        # code from next_subset goes here
        if replacement is False:
            ...
        yield subset_num
```

It would probably be more efficient to put the … Note that this is no longer a class, so I've changed naming convention (using the one from STIR/SIRF, not sure what CIL uses for functions).
No problem to make it a function/generator, something like

```python
import numpy as np

def subset_num_generator(num_subsets, sampling_method="random", replacement=True, shuffle="random"):
    subset_num = -1
    index = 0

    if replacement is False:
        # create a list of subsets without replacement, first permutation
        list_of_subsets = np.random.choice(range(num_subsets), num_subsets, replace=False)
        if shuffle not in ["random", "single"]:
            raise NotImplementedError("Only {} and {} are implemented for the replacement=False case.".format("random", "single"))

    while True:
        if sampling_method == "random":
            if replacement is False:
                # list of subsets already permuted
                subset_num = list_of_subsets[index]
                index += 1
                if index == num_subsets:
                    index = 0
                    # For random shuffle, at the end of each epoch, we permute the list again
                    if shuffle == "random":
                        list_of_subsets = np.random.choice(range(num_subsets), num_subsets, replace=False)
            else:
                # at each iteration (not epoch) a subset is randomly selected
                subset_num = np.random.randint(0, num_subsets)
        elif sampling_method == "sequential":
            if subset_num + 1 >= num_subsets:
                subset_num = 0
            else:
                subset_num += 1
        else:
            raise NotImplementedError("Only {} and {} are implemented at the moment.".format("random", "sequential"))
        yield subset_num
```

```python
sng = subset_num_generator(10, replacement=False)
for i in range(20):
    if i >= 1 and np.mod(i, 10) == 0:
        print("new epoch")
    print(next(sng))
```

```
4
0
1
6
8
2
9
7
5
3
new epoch
5
9
4
8
7
0
1
3
6
2
```

@paskino is this ok? Where should we add this function? At the moment, I have created a new script inside …
@KrisThielemans, @paskino, @jakobsj @zeljkozeljko Changes after Hackathon meeting (17/11/2022) and CIL-Dev meeting (18/11/2022):
Example code for the LS problem:

```python
import numpy as np
from cil.optimisation.algorithms import GD
from cil.optimisation.functions import ApproximateGradientSumFunction, LeastSquares, SGFunction
from cil.optimisation.utilities import FunctionNumberGenerator
from cil.optimisation.operators import MatrixOperator
from cil.framework import VectorData

# create LS problem (Ax = b)
np.random.seed(10)
n = 50
m = 500
n_subsets = 10

Anp = np.random.uniform(0,1, (m, n)).astype('float32')
xnp = np.random.uniform(0,1, (n,)).astype('float32')
bnp = Anp.dot(xnp)

Ai = np.vsplit(Anp, n_subsets)
bi = [bnp[i:i+n] for i in range(0, m, n)]

Aop = MatrixOperator(Anp)
bop = VectorData(bnp)
ig = Aop.domain
x_cil = ig.allocate('random')

fi_cil = []
for i in range(n_subsets):
    Ai_cil = MatrixOperator(Ai[i])
    bi_cil = VectorData(bi[i])
    fi_cil.append(LeastSquares(Ai_cil, bi_cil, c=1.0))

F_SG = SGFunction(fi_cil)  # default selection = "random"
# or
# F_SG = SGFunction(fi_cil, selection="random_permutation")
# or
# generator = FunctionNumberGenerator(n_subsets, sampling_method="fixed_permutation")
# F_SG = SGFunction(fi_cil, generator)

step_size = 1./F_SG.L
initial = ig.allocate()
epochs = 200
sgd = GD(initial = initial, objective_function = F_SG, step_size = step_size,
         max_iteration = epochs * n_subsets,
         update_objective_interval = 10*n_subsets)
sgd.run(verbose=1)

print("Function selected")
k = 0
for i in range(0, epochs, 20):
    print("Epoch : {}, function used : {}".format(i, F_SG.functions_used[k:n_subsets+k]))
    k += n_subsets

# import cvxpy
# u_cvxpy = cvxpy.Variable(ig.shape[0])
# objective = cvxpy.Minimize( 0.5*cvxpy.sum_squares(Aop.A @ u_cvxpy - bop.array))
# p = cvxpy.Problem(objective)
# p.solve(verbose=True, solver=cvxpy.SCS, eps=1e-4)
# np.testing.assert_allclose(p.value, sgd.objective[-1], atol=1e-1)
# np.testing.assert_allclose(u_cvxpy.value, sgd.solution.array, atol=1e-1)
```
```
     Iter   Max Iter   Time/Iter [s]    Objective
        0       2000           0.000    9.99876e+04
      100       2000           0.006    4.66201e+01
      200       2000           0.006    1.37573e+01
      300       2000           0.006    4.64783e+00
      400       2000           0.006    1.73196e+00
      500       2000           0.006    7.08509e-01
      600       2000           0.005    2.99555e-01
      700       2000           0.005    1.28043e-01
      800       2000           0.005    5.58902e-02
      900       2000           0.005    2.55412e-02
     1000       2000           0.005    1.11664e-02
     1100       2000           0.005    5.23994e-03
     1200       2000           0.005    2.30367e-03
     1300       2000           0.005    1.05134e-03
     1400       2000           0.005    4.96243e-04
     1500       2000           0.005    2.25485e-04
     1600       2000           0.005    1.04333e-04
     1700       2000           0.005    4.82236e-05
     1800       2000           0.005    2.21038e-05
     1900       2000           0.005    1.01833e-05
     2000       2000           0.005    4.69173e-06
-------------------------------------------------------
     2000       2000           0.005    4.69173e-06
Stop criterion has been reached.

Function selected
Epoch : 0, function used : [4, 0, 9, 6, 0, 1, 5, 1, 5, 5]
Epoch : 20, function used : [1, 3, 6, 4, 6, 3, 8, 5, 6, 8]
Epoch : 40, function used : [5, 4, 8, 9, 3, 1, 8, 5, 6, 0]
Epoch : 60, function used : [3, 4, 3, 6, 1, 6, 0, 9, 5, 8]
Epoch : 80, function used : [6, 5, 6, 3, 2, 0, 9, 1, 9, 7]
Epoch : 100, function used : [1, 0, 8, 6, 3, 0, 6, 3, 5, 2]
Epoch : 120, function used : [5, 9, 7, 7, 2, 0, 8, 8, 3, 7]
Epoch : 140, function used : [9, 3, 6, 0, 7, 7, 1, 1, 8, 8]
Epoch : 160, function used : [8, 7, 8, 0, 4, 7, 5, 9, 1, 5]
Epoch : 180, function used : [2, 2, 3, 1, 0, 0, 0, 0, 5, 5]
```
@zeljkozeljko thank you for your quick comments. In my latest commit, I implemented things as you suggested, but I changed them again today after the discussion we had with @paskino, @jakobsj. I have not spent time fixing docs. @KrisThielemans, @paskino, @jakobsj, @zeljkozeljko, before starting to review this PR, could we agree on the following:
Looks mostly ok to me. I don't know CIL naming conventions; in STIR/SIRF, CamelCase like `FunctionNumberGenerator` implies it is a class (which it isn't). That is one for @paskino.

One problem is duplication of doc on the generator in all derived classes, which will be terrible. Possible solutions:

- refer to doc of `FunctionNumberGenerator` (or whatever that ends up being called).
- add a member `set_selection_generator` to `ApproximateGradientSumFunction`. Someone might want to change this over epochs. Doc of all `__init__` functions could then refer to the doc of that member (which is only written once).
- give up passing strings in all those constructors and enforce that it has to be a generator.
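A minimal sketch of the second option (hypothetical code, assuming the names above):

```python
class ApproximateGradientSumFunction(SumFunction):

    def set_selection_generator(self, selection):
        """Set the generator used to pick the next function.

        `selection` can be a generator; see FunctionNumberGenerator for the
        accepted strings. Documented once, here, and referenced from every
        derived __init__.
        """
        self.selection = selection
```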
```
-----------
functions : list(functions)
    A list of functions: :code:`[F_{1}, F_{2}, ..., F_{n}]`. Each function is assumed to be a smooth function with an implemented :func:`~Function.gradient` method.
selection : :obj:`string`, Default = :code:`random`. It can be :code:`random`, :code:`random_permutation`, :code:`fixed_permutation`.
```
good idea to allow a string here, but the doc needs to mention the generator option. Also, I recommend referring to the doc of `FunctionNumberGenerator` for allowed strings. Otherwise things are going to desynchronise.
```
sampling : :obj:`string`, Default = :code:`random`
    Selection process for each function in the list. It can be :code:`random` or :code:`sequential`.
replacement : :obj:`boolean`. Default = :code:`True`
    The same subset can be selected when :code:`replacement=True`.
```
somehow avoid duplication as this is incorrect
```
.. math:: \partial F_{i}(x) .

The ith function is selected from the :meth:`~SubsetSumFunction.next_subset` method.
```
incorrect
```
Note
----

The :meth:`~SGFunction.gradient` computes the `gradient` of one function from the list :math:`[F_{1},\cdots,F_{n}]`,
```
times N
```python
def approximate_gradient(self, subset_num, x, out):

    """ Returns the gradient of the selected function at :code:`x`. The function is selected using the :meth:`~SubsetSumFunction.next_subset`
```
times N
```python
initialise_tests()

class TestApproximateGradientSumFunction(unittest.TestCase):
```
no test for the string argument
```python
if replacement is False:
    # create a list of functions without replacement, first permutation
    list_of_functions = np.random.choice(range(num_functions), num_functions, replace=False)
```
I guess this variable should be called `list_of_function_nums`. I'd also prefix it with `_` to make it "private".
```
Selection process for each function in the list. It can be :code:`random`, :code:`random_permutation`, :code:`fixed_permutation`.

- :code:`random`: Every function is selected randomly with replacement.
- :code:`random_permutation`: Every function is selected randomly without replacement. After selecting all the functions in the list, i.e., after one epoch, the list is randomly permuted.
- :code:`fixed_permutation`: Every function is selected randomly without replacement and the list of functions is permuted only once.
```
I don't see why anyone would want to use `fixed_permutation`. If anything, they'd want to specify an order. No harm in keeping it of course.

Probably the only 2 fixed orders that make sense are 0,1,2,3,... and 0,n/2,n/4,3n/4,... (with the latter appropriate when the functions correspond to projections ordered by angle). Maybe you could call these "fixed_sequential" and "fixed_staggered"?

@robbietuk would have code to implement the "staggered" approach.
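A hypothetical sketch of the two fixed orders mentioned above (not STIR's actual code):

```python
def fixed_sequential(n):
    return list(range(n))

def fixed_staggered(n):
    # visit indices with successively halved strides: 0, n/2, n/4, 3n/4, ...
    order, seen, stride = [], set(), n
    while stride >= 1:
        for i in range(0, n, stride):
            if i not in seen:
                order.append(i)
                seen.add(i)
        stride //= 2
    return order

print(fixed_staggered(8))  # [0, 4, 2, 6, 1, 3, 5, 7]
```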
```python
# Select the next function using the selection := function_number_generator
self.function_num = next(self.selection)
self.functions_used.append(self.function_num)
```
this should do a range check and write a nice error message if out of range
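For instance, a minimal sketch of such a check (hypothetical names, matching the quoted snippet):

```python
self.function_num = next(self.selection)
if not (0 <= self.function_num < self.num_functions):
    raise IndexError("Generated function number {} is outside the valid range [0, {})."
                     .format(self.function_num, self.num_functions))
self.functions_used.append(self.function_num)
```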
Thank you @KrisThielemans for your comments. Before addressing yours and @zeljkozeljko's, I think we need to have a final decision on the design. Please have a look at #1377.
You might consider using the new interface of NumPy's random number generation:

```python
rng = np.random.default_rng(optional_seed)
rints = rng.integers(low=0, high=10, size=3)
```

It would allow passing a seed such that the generated sequence is reproducible (cannot do that with …).
Recording part of an email conversation here on adding capabilities for non-uniform random sampling. @epapoutsellis had a proposal, but noted that algorithms might need to ask what those probabilities are. I think that leads back to the need of having a class. At that point, a "generator" is probably not useful anymore and we can revert to having an iterator interface

```python
class IndexGenerator:
    def __init__(self):  # lots of arguments here
        ...
    def get_probabilities(self):
        ...
    def uses_replacement(self):
        ...
    # either this is an iterable object
    def iter(self):  # could return a generator as far as I know
        ...
    # or it is already an iterator
    def __iter__(self):
        ...
    def __next__(self):
        ...
```

and we'd have to decide between using a string + extra arguments, or passing such an object, where I think the latter design is more easily extendable.
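A minimal concrete instance of that interface might look like this (hypothetical names, uniform sampling with replacement only):

```python
import numpy as np

class UniformIndexGenerator:
    """Iterator yielding uniformly sampled indices with replacement."""

    def __init__(self, num_indices, seed=None):
        self.num_indices = num_indices
        self._rng = np.random.default_rng(seed)

    def get_probabilities(self):
        return np.full(self.num_indices, 1.0 / self.num_indices)

    def uses_replacement(self):
        return True

    def __iter__(self):
        return self

    def __next__(self):
        return int(self._rng.integers(self.num_indices))
```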
@KrisThielemans @paskino @jakobsj see below my proposal. I created a base class `RandomSampling`:

```python
import numpy as np
import random

class RandomSampling():

    def __new__(cls, num_indices, batch_size=1, prob=None, replace=True, shuffle=False, seed=None):
        cls.batch_size = batch_size
        if cls.batch_size == 1:
            return super(RandomSampling, cls).__new__(RandomIndex)
        else:
            return super(RandomSampling, cls).__new__(RandomBatch)

    def __init__(self, num_indices, batch_size=1, prob=None, replace=True, shuffle=False, seed=None):
        self.num_indices = num_indices
        self.batch_size = batch_size
        self.equal_size_batches = self.num_indices % self.batch_size == 0
        if self.equal_size_batches:
            self.num_batches = self.num_indices // self.batch_size
        else:
            self.num_batches = (self.num_indices // self.batch_size) + 1
        self.prob = prob
        self.replace = replace
        self.shuffle = shuffle
        self.indices_used = []
        self.index = 0
        np.random.seed(seed)

        if self.replace is False:
            self.list_of_indices = np.random.choice(num_indices, size=self.num_indices, p=prob, replace=False)
        else:
            if shuffle is True and self.batch_size == 1:
                raise ValueError("Shuffle is used only with replace=False")
            self.list_of_indices = np.random.choice(num_indices, size=self.num_indices, p=prob, replace=True)

        if self.batch_size > 1:
            self.partition_list = [self.list_of_indices[i:i + self.batch_size] for i in range(0, self.num_indices, self.batch_size)]

    def show_epochs(self, epochs):
        if self.batch_size == 1:
            for _ in range(epochs * self.num_indices):
                next(self)
            k = 0
            for i in range(epochs):
                print(" Epoch : {}, indices used : {} ".format(i, self.indices_used[k:k + self.num_indices]))
                k += self.num_indices
            print("")
        else:
            for _ in range(epochs * self.num_batches):
                next(self)
            k = 0
            for i in range(epochs):
                print(" Epoch : {}, batches used : {} ".format(i, self.indices_used[k:k + self.num_batches]))
                k += self.num_batches

    @staticmethod
    def uniform(num_indices, batch_size=1, seed=None):
        return RandomSampling(num_indices, batch_size=batch_size, prob=None, replace=True, shuffle=False, seed=seed)

    @staticmethod
    def non_uniform(num_indices, prob, batch_size=1, seed=None):
        return RandomSampling(num_indices, batch_size=batch_size, replace=True, prob=prob, shuffle=False, seed=seed)

    @staticmethod
    def uniform_no_replacement(num_indices, batch_size=1, shuffle=False, seed=None):
        return RandomSampling(num_indices, batch_size=batch_size, prob=None, replace=False, shuffle=shuffle, seed=seed)

    @staticmethod
    def non_uniform_no_replacement(num_indices, prob, batch_size=1, shuffle=False, seed=None):
        return RandomSampling(num_indices, batch_size=batch_size, prob=prob, replace=False, shuffle=shuffle, seed=seed)

    @staticmethod
    def single_shuffle(num_indices, batch_size=1, seed=None):
        return RandomSampling(num_indices, batch_size=batch_size, replace=False, shuffle=False, seed=seed)

    @staticmethod
    def random_shuffle(num_indices, batch_size=1, seed=None):
        return RandomSampling(num_indices, batch_size=batch_size, replace=False, shuffle=True, seed=seed)


class RandomBatch(RandomSampling):

    def __next__(self):
        tmp = list(self.partition_list[self.index])
        self.indices_used.append(tmp)
        self.index += 1
        if self.index == len(self.partition_list):
            self.index = 0
            if self.shuffle is True:
                self.list_of_indices = np.random.choice(self.num_indices, size=self.num_indices, p=self.prob, replace=self.replace)
                self.partition_list = [self.list_of_indices[i:i + self.batch_size] for i in range(0, self.num_indices, self.batch_size)]
        return tmp


class RandomIndex(RandomSampling):

    def __next__(self):
        if self.replace is False:
            index_num = self.list_of_indices[self.index]
            self.index += 1
            if self.index == self.num_indices:
                self.index = 0
                if self.shuffle is True:
                    self.list_of_indices = np.random.choice(self.num_indices, size=self.num_indices, p=self.prob, replace=False)
        else:
            index_num = np.random.choice(self.num_indices, size=1, p=self.prob, replace=True).item()
        self.indices_used.append(index_num)
        return index_num
```

Examples

Our users can do, in the case of `batch_size=1`:

```python
rs = RandomSampling.uniform(10)
rs.show_epochs(epochs=3)
```
```
 Epoch : 0, indices used : [8, 1, 4, 8, 5, 9, 1, 9, 0, 3]
 Epoch : 1, indices used : [3, 4, 9, 2, 4, 0, 7, 9, 4, 1]
 Epoch : 2, indices used : [9, 6, 6, 8, 9, 6, 2, 0, 1, 5]
```

```python
rs = RandomSampling.random_shuffle(10, seed=100)
rs.show_epochs(epochs=3)
```
```
 Epoch : 0, indices used : [7, 6, 1, 5, 4, 2, 0, 3, 9, 8]
 Epoch : 1, indices used : [7, 8, 5, 3, 4, 6, 0, 1, 9, 2]
 Epoch : 2, indices used : [9, 7, 1, 8, 5, 4, 3, 2, 6, 0]
```

```python
tmp = [random.random() for i in range(10)]
tmp_sum = sum(tmp)
prob = [i/tmp_sum for i in tmp]
rs = RandomSampling.non_uniform(10, prob=prob)
rs.show_epochs(epochs=3)
```
```
 Epoch : 0, indices used : [2, 5, 7, 7, 2, 1, 6, 6, 2, 7]
 Epoch : 1, indices used : [7, 4, 9, 9, 7, 1, 6, 1, 4, 9]
 Epoch : 2, indices used : [6, 5, 6, 2, 5, 2, 8, 2, 2, 2]
```

And for the case where `batch_size>1`:

```python
rs = RandomSampling.uniform(10, batch_size=5)
rs.show_epochs(epochs=3)
```
```
 Epoch : 0, batches used : [[5, 7, 9, 5, 9], [1, 4, 6, 9, 6]]
 Epoch : 1, batches used : [[5, 7, 9, 5, 9], [1, 4, 6, 9, 6]]
 Epoch : 2, batches used : [[5, 7, 9, 5, 9], [1, 4, 6, 9, 6]]
```

```python
rs = RandomSampling.single_shuffle(10, batch_size=2, seed=100)
rs.show_epochs(epochs=3)
```
```
 Epoch : 0, batches used : [[7, 6], [1, 5], [4, 2], [0, 3], [9, 8]]
 Epoch : 1, batches used : [[7, 6], [1, 5], [4, 2], [0, 3], [9, 8]]
 Epoch : 2, batches used : [[7, 6], [1, 5], [4, 2], [0, 3], [9, 8]]
```

Comparison

…
Superseded by #1550
Describe your changes

- Adds the `SubsetSumFunction` based on the Hackathon-000-Stochastic-Algorithms script and removes the `AverageSumFunction`.
- Adds the `SGDFunction` based on the Hackathon-000-Stochastic-Algorithms script. I include the `SGDFunction` in this PR because it is the easiest stochastic function class.

SubsetSumFunction

The `SubsetSumFunction` is defined as

$$\frac{1}{n}\sum_{i=1}^{n} f_{i}$$

where $n$ is the number of subsets and $f_{i}$ are smooth functions.

Example

Note

At the moment, there are 2 `sampling` methods: `random` and `sequential`. For the `random` case, we have the following:

- `sampling=random`, `replacement=True`: means that the `next_subset` method can select the same `subset_num` again.
- `sampling=random`, `replacement=False`: means that the `next_subset` method selects each `subset_num` uniquely.

Example

- `random`, replacement=True: [3,1,2], [4,2,6]
- `random`, replacement=False: [1,3,2], [6,4,5]
- `sequential`: [1,2,3], [4,5,6].

SGDFunction

Example

GD vs SGD in CIL

We can solve the LeastSquares minimisation problem $$\min_{x}\|Ax - b\|^{2}$$ in two ways (a hedged sketch follows this list):

- `GD`: `objective_function = LeastSquares`
- `GD`: `objective_function = SGDFunction(list of LeastSquares)`
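A sketch of the two set-ups (assuming `A_subsets`/`b_subsets` partition the problem as in the discussion above; names are illustrative):

```python
# full-gradient GD on the whole objective
f_full = LeastSquares(A, b=b)
gd = GD(initial=initial, objective_function=f_full, step_size=step_size,
        max_iteration=n_epochs)

# same algorithm, stochastic gradient via SGDFunction
f_stoch = SGDFunction([LeastSquares(Ai, b=bi) for Ai, bi in zip(A_subsets, b_subsets)])
sgd = GD(initial=initial, objective_function=f_stoch, step_size=step_size,
         max_iteration=n_epochs * n_subsets)
```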
Proximal Gradient Algorithms in CIL

For non-smooth objectives we can solve $$\min_{x>0}\|Ax - b\|^{2}$$

- `ISTA/FISTA`: `f = LeastSquares`, `g = IndicatorBox(lower=0)`
- `ISTA/FISTA`: `f = SGDFunction(list of LeastSquares)`, `g = IndicatorBox(lower=0)`

Describe any testing you have performed

- `SubsetSumFunction`
- `SGDFunction`
- [wip] Demos

Link relevant issues

- Close #1341
- Close #1343

From the Hackathon-000-Stochastic-Algorithms repository

- Close 11
- Close 10
- Close 9

Checklist when you are ready to request a review

Contribution Notes

Please read and adhere to the developer guide and local patterns and conventions.