ApproximateGradientSumFunction class and example SGFunction for Stochastic Gradient Descent (#1550)

Signed-off-by: Margaret Duff <43645617+MargaretDuff@users.noreply.github.com>
Co-authored-by: epapoutsellis <epapoutsellis@gmail.com>
Co-authored-by: gfardell <gemma.fardell@stfc.ac.uk>
Co-authored-by: Edoardo Pasca <edo.paskino@gmail.com>
4 people authored Mar 26, 2024
1 parent 755801d commit 040f214
Showing 8 changed files with 663 additions and 3 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -40,6 +40,7 @@
- Improved import error/warning messages
- New adjoint operator
- Bug fix for complex matrix adjoint
- Added the ApproximateGradientSumFunction and SGFunction classes, so that stochastic gradient algorithms can be created by combining deterministic algorithms with functions that provide an approximate gradient


* 23.1.0
8 changes: 7 additions & 1 deletion NOTICE.txt
@@ -32,6 +32,8 @@ Institutions Key:
9 - Swansea University
10 - University of Warwick
11 - University of Helsinki
12 - Australian e-Health Research, Australia
13 - KU Leuven

CIL Developers in date order:
Edoardo Pasca (2017 – present) - 1
@@ -49,14 +51,18 @@ Srikanth Nagella (2017-2018) - 1
Daniil Kazantsev (2018) - 3
Ryan Warr (2019) - 3
Tomas Kulhanek (2019) - 1
Claire Delplancke (2019 - 2022) - 7
Matthias Ehrhardt (2019 - 2023) - 7
Richard Brown (2020-2021) - 5
Sam Tygier (2022) - 1
Andrew Sharits (2022) - 8
Kyle Pidgeon (2023) - 1
Letizia Protopapa (2023) - 1
Tommi Heikkilä (2023) - 11
Ashley Gillman (2024) - 12
Zeljko Kerata (2024) - 5
Evgueni Ovtchinnikov (2024) - 1
Georg Schramm (2024) - 13

CIL Advisory Board:
Llion Evans - 9
239 changes: 239 additions & 0 deletions Wrappers/Python/cil/optimisation/functions/ApproximateGradientSumFunction.py
@@ -0,0 +1,239 @@
# Copyright 2024 United Kingdom Research and Innovation
# Copyright 2024 The University of Manchester
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Authors:
# - CIL Developers, listed at: https://github.com/TomographicImaging/CIL/blob/master/NOTICE.txt
# - Daniel Deidda (National Physical Laboratory, UK)
# - Claire Delplancke (Electricite de France, Research and Development)
# - Ashley Gillman (Australian e-Health Res. Ctr., CSIRO, Brisbane, Queensland, Australia)
# - Zeljko Kerata (Department of Computer Science, University College London, UK)
# - Evgueni Ovtchinnikov (STFC - UKRI)
# - Georg Schramm (Department of Imaging and Pathology, Division of Nuclear Medicine, KU Leuven, Leuven, Belgium)



from cil.optimisation.functions import SumFunction
from cil.optimisation.utilities import Sampler
import numbers
from abc import ABC, abstractmethod
import numpy as np


class ApproximateGradientSumFunction(SumFunction, ABC):
r"""ApproximateGradientSumFunction represents the following sum
.. math:: \sum_{i=0}^{n-1} f_{i} = (f_{0} + f_{1} + ... + f_{n-1})
where there are :math:`n` functions. This function class has two ways of calling gradient
- `full_gradient` calculates the gradient of the sum :math:`\sum_{i=0}^{n-1} \nabla f_{i}`
- `gradient` calls an `approximate_gradient` function which may be less computationally expensive to calculate than the full gradient
This class is an abstract class.
Parameters
-----------
functions : `list` of functions
A list of functions: :math:`[f_{0}, f_{1}, ..., f_{n-1}]`. Each function is assumed to be smooth with an implemented :func:`~Function.gradient` method. All functions must have the same domain. The number of functions (equivalently the length of the list) must be strictly greater than 1.
sampler: An instance of a CIL Sampler class ( :meth:`~optimisation.utilities.sampler`) or of another class which has a :code:`next` method implemented to output integers in :math:`\{0,...,n-1\}`.
This sampler is called each time :code:`gradient` is called and sets the internal :code:`function_num` passed to the :code:`approximate_gradient` function. Default is :code:`Sampler.random_with_replacement(len(functions))`.
Note
-----
We ensure that the approximate gradient is of a similar order of magnitude to the full gradient calculation. For example, in the :code:`SGFunction` we approximate the full gradient by :math:`n\nabla f_i` for an index :math:`i` given by the sampler.
The multiplication by :math:`n` is a choice to more easily allow comparisons between stochastic and non-stochastic methods and between stochastic methods with varying numbers of subsets.
Note
-----
Each time :code:`gradient` is called the class keeps track of which functions have been used to calculate the gradient. This may be useful for debugging or plotting after using this function in an iterative algorithm.
- The property :code:`data_passes_indices` is a list of lists holding the indices of the functions that are processed in each call of `gradient`. This list is updated each time `gradient` is called by appending a list of the indices of the functions used to calculate the gradient.
- The property :code:`data_passes` is a list of floats that holds the amount of data that has been processed up until each call of `gradient`. This list is updated each time `gradient` is called by appending the proportion of the data used when calculating the approximate gradient since the class was initialised (a full gradient calculation would be 1 full data pass). Warning: if your functions do not contain an equal amount of data, for example if your data was not partitioned into equal batches, then you must first use the `set_data_partition_weights` function for this to be accurate.
Note
----
The :meth:`~ApproximateGradientSumFunction.gradient` method returns the approximate gradient, which depends on an index provided by the :code:`sampler`.
Example
-------
This class is an abstract base class, so we give an example using the SGFunction child class.
Consider the objective to minimise:
.. math:: \sum_{i=0}^{n-1} f_{i}(x) = \sum_{i=0}^{n-1}\|A_{i} x - b_{i}\|^{2}
>>> list_of_functions = [LeastSquares(Ai, b=bi) for Ai, bi in zip(A_subsets, b_subsets)]
>>> sampler = Sampler.sequential(len(list_of_functions))
>>> f = SGFunction(list_of_functions, sampler=sampler)
>>> f.full_gradient(x)
This will return :math:`\sum_{i=0}^{n-1} \nabla f_{i}(x)`
>>> f.gradient(x)
As per the approximate gradient implementation in the SGFunction, this will return :math:`n\nabla f_{0}(x)`. The index `0` is chosen because we used a `sequential` sampler and this is the first call of `gradient`.
>>> f.gradient(x)
This will return :math:`n\nabla f_{1}(x)` because we used a `sequential` sampler and this is the second call of `gradient`.
"""

def __init__(self, functions, sampler=None):

if sampler is None:
sampler = Sampler.random_with_replacement(len(functions))

if not isinstance(functions, list):
raise TypeError("Input to functions should be a list of functions")
if not hasattr(sampler, "next"):
raise ValueError('The provided sampler must have a `next` method')

self.sampler = sampler

self._partition_weights = [1 / len(functions)] * len(functions)

self._data_passes_indices = []

super(ApproximateGradientSumFunction, self).__init__(*functions)

def __call__(self, x):
r"""Returns the value of the sum of functions at :math:`x`.
.. math:: (f_{0} + f_{1} + ... + f_{n-1})(x) = f_{0}(x) + f_{1}(x) + ... + f_{n-1}(x)
Parameters
----------
x : DataContainer
Returns
--------
float
the value of the SumFunction at x
"""
return super(ApproximateGradientSumFunction, self).__call__(x)

def full_gradient(self, x, out=None):
r"""Returns the value of the full gradient of the sum of functions at :math:`x`.
.. math:: \nabla_x(f_{0} + f_{1} + ... + f_{n-1})(x) = \nabla_xf_{0}(x) + \nabla_xf_{1}(x) + ... + \nabla_xf_{n-1}(x)
Parameters
----------
x : DataContainer
out: DataContainer, optional. The result is written to `out` if it is provided; if `None`, a new DataContainer is returned. Default is `None`.
Returns
--------
DataContainer
The value of the gradient of the sum function at x, or nothing if the result is written to `out`
"""

return super(ApproximateGradientSumFunction, self).gradient(x, out=out)

@abstractmethod
def approximate_gradient(self, x, function_num, out=None):
""" Returns the approximate gradient at a given point :code:`x` given a `function_number` in {0,...,len(functions)-1}.
Parameters
----------
x : DataContainer
out: DataContainer, optional. The result is written to `out` if it is provided; if `None`, a new DataContainer is returned. Default is `None`.
function_num: `int`
Between 0 and the number of functions in the list
Returns
--------
DataContainer
the value of the approximate gradient of the sum function at :code:`x` given a `function_num` in {0,...,len(functions)-1}
"""
pass

def gradient(self, x, out=None):
""" Selects a random function using the `sampler` and then calls the approximate gradient at :code:`x`
Parameters
----------
x : DataContainer
out: DataContainer, optional. The result is written to `out` if it is provided; if `None`, a new DataContainer is returned. Default is `None`.
Returns
--------
DataContainer
the value of the approximate gradient of the sum function at :code:`x`
"""

self.function_num = self.sampler.next()

self._update_data_passes_indices([self.function_num])

return self.approximate_gradient(x, self.function_num, out=out)


def _update_data_passes_indices(self, indices):
""" Internal function that updates the list of lists containing the function indices used to calculate the approximate gradient.
Parameters
----------
indices: list
List of indices used to calculate the approximate gradient in a given iteration
"""
self._data_passes_indices.append(indices)

def set_data_partition_weights(self, weights):
""" Setter for the partition weights used to calculate the data passes
Parameters
----------
weights: list of non-negative floats that sum to one.
The proportion of the data held in each function. Equivalent to the proportions that you partitioned your data into.
"""
if len(weights) != len(self.functions):
raise ValueError(
'The provided weights must be a list the same length as the number of functions')

if abs(sum(weights) - 1) > 1e-6:
raise ValueError('The provided weights must sum to one')

if any(np.array(weights) < 0):
raise ValueError(
'The provided weights must be greater than or equal to zero')

self._partition_weights = weights

@property
def data_passes_indices(self):
""" The property :code:`data_passes_indices` is a list of lists holding the indices of the functions that are processed in each call of `gradient`. This list is updated each time `gradient` is called by appending a list of the indices of the functions used to calculate the gradient. """
return self._data_passes_indices

@property
def data_passes(self):
""" The property :code:`data_passes` is a list of floats that holds the amount of data that has been processed up until each call of `gradient`. This list is updated each time `gradient` is called by appending the proportion of the data used when calculating the approximate gradient since the class was initialised (a full gradient calculation would be 1 full data pass). Warning: if your functions do not contain an equal `amount` of data, for example your data was not partitioned into equal batches, then you must first use the `set_data_partition_weights" function for this to be accurate. """
data_passes = []
for el in self._data_passes_indices:
try:
data_passes.append(data_passes[-1])
except IndexError:
data_passes.append(0)
for i in el:
data_passes[-1] += self._partition_weights[i]
return data_passes
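
To make the gradient and data-pass bookkeeping above concrete, here is a minimal usage sketch built on the concrete SGFunction child class added in this commit. A_subsets, b_subsets and x are hypothetical, pre-partitioned operators, data and an image that are assumed to exist already; only LeastSquares, Sampler and SGFunction are names taken from the commit itself.

# Hypothetical sketch: A_subsets, b_subsets and x are assumed to exist
# (e.g. data partitioned into 4 equal batches); they are not defined in this commit.
from cil.optimisation.functions import LeastSquares, SGFunction
from cil.optimisation.utilities import Sampler

list_of_functions = [LeastSquares(Ai, b=bi) for Ai, bi in zip(A_subsets, b_subsets)]
sampler = Sampler.sequential(len(list_of_functions))
f = SGFunction(list_of_functions, sampler=sampler)

f.gradient(x)  # approximate gradient n * grad f_0(x) (sequential sampler, first call)
f.gradient(x)  # approximate gradient n * grad f_1(x) (second call)

# With 4 equally weighted functions, each call processes 1/4 of the data:
print(f.data_passes_indices)  # [[0], [1]]
print(f.data_passes)          # [0.25, 0.5]
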
3 changes: 3 additions & 0 deletions Wrappers/Python/cil/optimisation/functions/Function.py
@@ -377,6 +377,9 @@ def __add__(self, other):
else:
return super(SumFunction, self).__add__(other)

@property
def num_functions(self):
return len(self.functions)

class ScaledFunction(Function):

82 changes: 82 additions & 0 deletions Wrappers/Python/cil/optimisation/functions/SGFunction.py
@@ -0,0 +1,82 @@
# Copyright 2024 United Kingdom Research and Innovation
# Copyright 2024 The University of Manchester
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Authors:
# - CIL Developers, listed at: https://github.com/TomographicImaging/CIL/blob/master/NOTICE.txt
# - Daniel Deidda (National Physical Laboratory, UK)
# - Claire Delplancke (Electricite de France, Research and Development)
# - Ashley Gillman (Australian e-Health Res. Ctr., CSIRO, Brisbane, Queensland, Australia)
# - Zeljko Kerata (Department of Computer Science, University College London, UK)
# - Evgueni Ovtchinnikov (STFC - UKRI)
# - Georg Schramm (Department of Imaging and Pathology, Division of Nuclear Medicine, KU Leuven, Leuven, Belgium)

from .ApproximateGradientSumFunction import ApproximateGradientSumFunction
from .Function import SumFunction

class SGFunction(ApproximateGradientSumFunction):

r"""
Stochastic gradient function, a child class of :code:`ApproximateGradientSumFunction`. It defines, from a list of functions :math:`\{f_0,...,f_{n-1}\}`, a `SumFunction`, :math:`f_0+...+f_{n-1}`. Each time `gradient` is called, the :code:`sampler` provides an index :math:`i \in \{0,...,n-1\}`
and the :code:`gradient` method returns the approximate gradient :math:`n \nabla_x f_i(x)`. This can be used with the :code:`cil.optimisation.algorithms` algorithm :code:`GD` to give a stochastic gradient descent algorithm.
Parameters
-----------
functions : `list` of functions
A list of functions: :code:`[f_{0}, f_{1}, ..., f_{n-1}]`. Each function is assumed to be smooth with an implemented :func:`~Function.gradient` method. Although CIL does not define the domain of a :code:`Function`, all functions are assumed to have the same domain. The number of functions must be strictly greater than 1.
sampler: An instance of a CIL Sampler class ( :meth:`~optimisation.utilities.sampler`) or of another class which has a :code:`next` method implemented to output integers in {0,...,n-1}.
This sampler is called each time gradient is called and sets the internal :code:`function_num` passed to the :code:`approximate_gradient` function. Default is :code:`Sampler.random_with_replacement(len(functions))`.
"""

def __init__(self, functions, sampler=None):
super(SGFunction, self).__init__(functions, sampler)


def approximate_gradient(self, x, function_num, out=None):

r""" Returns the gradient of the function at index `function_num` at :code:`x`.
Parameters
----------
x : DataContainer
out: DataContainer, optional. The result is written to `out` if it is provided; if `None`, a new DataContainer is returned. Default is `None`.
function_num: `int`
Between 0 and the number of functions in the list
Returns
--------
DataContainer
the value of the approximate gradient of the sum function at :code:`x` given a `function_num` in {0,...,len(functions)-1}
"""
if function_num >= self.num_functions or function_num < 0:
raise IndexError(
'The sampler has outputted an index outside the range of functions to sample from. Please ensure your sampler samples from {0,1,...,len(functions)-1} only.')

# compute gradient of the function indexed by function_num
if out is None:
out = self.functions[function_num].gradient(x)
else:
self.functions[function_num].gradient(x, out=out)

# scale wrt number of functions
out *= self.num_functions

return out
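
Since the docstring above suggests pairing SGFunction with the GD algorithm, here is a minimal sketch of a stochastic gradient descent run. x0 (the initial image), list_of_functions, the step size and the iteration count are illustrative assumptions rather than values from this commit, and the GD keyword names may differ between CIL versions.

# Hypothetical stochastic gradient descent loop; x0, list_of_functions and the
# step size are assumptions for illustration only.
from cil.optimisation.algorithms import GD
from cil.optimisation.functions import SGFunction
from cil.optimisation.utilities import Sampler

sampler = Sampler.random_with_replacement(len(list_of_functions))
f = SGFunction(list_of_functions, sampler=sampler)

sgd = GD(initial=x0, objective_function=f, step_size=0.002,
         max_iteration=100, update_objective_interval=10)
sgd.run(100)  # each iteration uses one randomly sampled f_i, scaled by n
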






3 changes: 3 additions & 0 deletions Wrappers/Python/cil/optimisation/functions/__init__.py
@@ -34,3 +34,6 @@
from .KullbackLeibler import KullbackLeibler
from .Rosenbrock import Rosenbrock
from .TotalVariation import TotalVariation
from .ApproximateGradientSumFunction import ApproximateGradientSumFunction
from .SGFunction import SGFunction
