ApproximateGradientSumFunction class and example SGFunction for Stochastic Gradient Descent (#1550)

Signed-off-by: Margaret Duff <43645617+MargaretDuff@users.noreply.github.com>
Co-authored-by: epapoutsellis <epapoutsellis@gmail.com>
Co-authored-by: gfardell <gemma.fardell@stfc.ac.uk>
Co-authored-by: Edoardo Pasca <edo.paskino@gmail.com>
1 parent 755801d · commit 040f214
Showing 8 changed files with 663 additions and 3 deletions.
239 changes: 239 additions & 0 deletions
Wrappers/Python/cil/optimisation/functions/ApproximateGradientSumFunction.py
@@ -0,0 +1,239 @@
# Copyright 2024 United Kingdom Research and Innovation
# Copyright 2024 The University of Manchester
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Authors:
# - CIL Developers, listed at: https://github.com/TomographicImaging/CIL/blob/master/NOTICE.txt
# - Daniel Deidda (National Physical Laboratory, UK)
# - Claire Delplancke (Electricite de France, Research and Development)
# - Ashley Gillman (Australian e-Health Res. Ctr., CSIRO, Brisbane, Queensland, Australia)
# - Zeljko Kerata (Department of Computer Science, University College London, UK)
# - Evgueni Ovtchinnikov (STFC - UKRI)
# - Georg Schramm (Department of Imaging and Pathology, Division of Nuclear Medicine, KU Leuven, Leuven, Belgium)

from cil.optimisation.functions import SumFunction
from cil.optimisation.utilities import Sampler
import numbers
from abc import ABC, abstractmethod
import numpy as np

class ApproximateGradientSumFunction(SumFunction, ABC):
    r"""ApproximateGradientSumFunction represents the following sum

    .. math:: \sum_{i=0}^{n-1} f_{i} = (f_{0} + f_{1} + ... + f_{n-1})

    where there are :math:`n` functions. This function class has two ways of calling gradient:

    - `full_gradient` calculates the gradient of the sum, :math:`\sum_{i=0}^{n-1} \nabla f_{i}`
    - `gradient` calls an `approximate_gradient` function, which may be less computationally expensive to calculate than the full gradient

    This class is an abstract class.

    Parameters
    ----------
    functions : `list` of functions
        A list of functions: :math:`[f_{0}, f_{1}, ..., f_{n-1}]`. Each function is assumed to be smooth with an implemented :func:`~Function.gradient` method. All functions must have the same domain. The number of functions (equivalently, the length of the list) must be strictly greater than 1.
    sampler : An instance of a CIL Sampler class (:meth:`~optimisation.utilities.sampler`) or of another class which has a :code:`next` method implemented to output integers in :math:`\{0, ..., n-1\}`.
        This sampler is called each time :code:`gradient` is called and sets the internal :code:`function_num` passed to the :code:`approximate_gradient` function. Default is :code:`Sampler.random_with_replacement(len(functions))`.

    Note
    ----
    We ensure that the approximate gradient is of a similar order of magnitude to the full gradient calculation. For example, in the :code:`SGFunction` we approximate the full gradient by :math:`n \nabla f_i` for an index :math:`i` given by the sampler. The multiplication by :math:`n` is a choice that more easily allows comparisons between stochastic and non-stochastic methods, and between stochastic methods with varying numbers of subsets.

    Note
    ----
    Each time :code:`gradient` is called the class keeps track of which functions have been used to calculate the gradient. This may be useful for debugging or plotting after using this function in an iterative algorithm.

    - The property :code:`data_passes_indices` is a list of lists holding the indices of the functions that are processed in each call of `gradient`. This list is updated each time `gradient` is called by appending a list of the indices of the functions used to calculate the gradient.
    - The property :code:`data_passes` is a list of floats that holds the amount of data that has been processed up until each call of `gradient`. This list is updated each time `gradient` is called by appending the proportion of the data used when calculating the approximate gradient since the class was initialised (a full gradient calculation would be 1 full data pass). Warning: if your functions do not contain an equal amount of data, for example if your data was not partitioned into equal batches, then you must first use the `set_data_partition_weights` function for this to be accurate.

    Note
    ----
    The :meth:`~ApproximateGradientSumFunction.gradient` method returns the approximate gradient depending on an index provided by the :code:`sampler` method.

    Example
    -------
    This class is an abstract base class, so we give an example using the SGFunction child class. Consider the objective to minimise:

    .. math:: \sum_{i=0}^{n-1} f_{i}(x) = \sum_{i=0}^{n-1} \|A_{i} x - b_{i}\|^{2}

    >>> list_of_functions = [LeastSquares(Ai, b=bi) for Ai, bi in zip(A_subsets, b_subsets)]
    >>> sampler = Sampler.sequential(len(list_of_functions))
    >>> f = SGFunction(list_of_functions, sampler=sampler)
    >>> f.full_gradient(x)
    This will return :math:`\sum_{i=0}^{n-1} \nabla f_{i}(x)`.
    >>> f.gradient(x)
    As per the approximate gradient implementation in the SGFunction, this will return :math:`n \nabla f_{0}(x)`. The choice of the `0` index is because we chose a `sequential` sampler and this is the first time we called `gradient`.
    >>> f.gradient(x)
    This will return :math:`n \nabla f_{1}(x)` because we chose a `sequential` sampler and this is the second time we called `gradient`.
    """

    def __init__(self, functions, sampler=None):

        if sampler is None:
            sampler = Sampler.random_with_replacement(len(functions))

        if not isinstance(functions, list):
            raise TypeError("Input to functions should be a list of functions")
        if not hasattr(sampler, "next"):
            raise ValueError('The provided sampler must have a `next` method')

        self.sampler = sampler

        # by default, assume the data is partitioned equally between the functions
        self._partition_weights = [1 / len(functions)] * len(functions)

        # one list of function indices is appended per call to `gradient`
        self._data_passes_indices = []

        super(ApproximateGradientSumFunction, self).__init__(*functions)

    def __call__(self, x):
        r"""Returns the value of the sum of functions at :math:`x`.

        .. math:: (f_{0} + f_{1} + ... + f_{n-1})(x) = f_{0}(x) + f_{1}(x) + ... + f_{n-1}(x)

        Parameters
        ----------
        x : DataContainer

        Returns
        -------
        float
            the value of the SumFunction at x
        """
        return super(ApproximateGradientSumFunction, self).__call__(x)

    def full_gradient(self, x, out=None):
        r"""Returns the value of the full gradient of the sum of functions at :math:`x`.

        .. math:: \nabla_x (f_{0} + f_{1} + ... + f_{n-1})(x) = \nabla_x f_{0}(x) + \nabla_x f_{1}(x) + ... + \nabla_x f_{n-1}(x)

        Parameters
        ----------
        x : DataContainer
        out : DataContainer, default `None`
            If provided, the result is written to `out`; if `None`, a new DataContainer is returned.

        Returns
        -------
        DataContainer
            the value of the gradient of the sum function at x, or nothing if the result was written to `out`
        """
        return super(ApproximateGradientSumFunction, self).gradient(x, out=out)

    @abstractmethod
    def approximate_gradient(self, x, function_num, out=None):
        """Returns the approximate gradient at a given point :code:`x` given a `function_num` in {0, ..., len(functions)-1}.

        Parameters
        ----------
        x : DataContainer
        function_num : `int`
            Between 0 and the number of functions in the list
        out : DataContainer, default `None`
            If provided, the result is written to `out`; if `None`, a new DataContainer is returned.

        Returns
        -------
        DataContainer
            the value of the approximate gradient of the sum function at :code:`x` given a `function_num` in {0, ..., len(functions)-1}
        """
        pass

    def gradient(self, x, out=None):
        """Selects a function index using the `sampler` and then calls the approximate gradient at :code:`x`.

        Parameters
        ----------
        x : DataContainer
        out : DataContainer, default `None`
            If provided, the result is written to `out`; if `None`, a new DataContainer is returned.

        Returns
        -------
        DataContainer
            the value of the approximate gradient of the sum function at :code:`x`
        """
        self.function_num = self.sampler.next()

        self._update_data_passes_indices([self.function_num])

        return self.approximate_gradient(x, self.function_num, out=out)

    def _update_data_passes_indices(self, indices):
        """Internal function that updates the list of lists containing the function indices used to calculate the approximate gradient.

        Parameters
        ----------
        indices : list
            List of indices used to calculate the approximate gradient in a given iteration
        """
        self._data_passes_indices.append(indices)

    def set_data_partition_weights(self, weights):
        """Setter for the partition weights used to calculate the data passes.

        Parameters
        ----------
        weights : list of positive floats that sum to one
            The proportion of the data held in each function. Equivalent to the proportions that you partitioned your data into.
        """
        if len(weights) != len(self.functions):
            raise ValueError(
                'The provided weights must be a list the same length as the number of functions')

        if abs(sum(weights) - 1) > 1e-6:
            raise ValueError('The provided weights must sum to one')

        if any(np.array(weights) < 0):
            raise ValueError(
                'The provided weights must be greater than or equal to zero')

        self._partition_weights = weights

    @property
    def data_passes_indices(self):
        """A list of lists holding the indices of the functions that are processed in each call of `gradient`. This list is updated each time `gradient` is called by appending a list of the indices of the functions used to calculate the gradient."""
        return self._data_passes_indices

    @property
    def data_passes(self):
        """A list of floats holding the amount of data that has been processed up until each call of `gradient`. This list is updated each time `gradient` is called by appending the proportion of the data used when calculating the approximate gradient since the class was initialised (a full gradient calculation would be 1 full data pass). Warning: if your functions do not contain an equal amount of data, for example if your data was not partitioned into equal batches, then you must first use the `set_data_partition_weights` function for this to be accurate."""
        # cumulative sum: each entry extends the previous total by the
        # partition weights of the functions used in that `gradient` call
        data_passes = []
        for el in self._data_passes_indices:
            try:
                data_passes.append(data_passes[-1])
            except IndexError:
                data_passes.append(0)
            for i in el:
                data_passes[-1] += self._partition_weights[i]
        return data_passes
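
To make the data-pass bookkeeping concrete: with the default equal partition weights over four functions, each `gradient` call adds 0.25 to the running total, so two calls give `data_passes == [0.25, 0.5]`. A minimal sketch, assuming CIL with this commit is installed and both classes are importable from cil.optimisation.functions; the arrays `A_np`, `b_np` and the 4-way row split are illustrative names, not part of the commit:

import numpy as np
from cil.framework import VectorData
from cil.optimisation.operators import MatrixOperator
from cil.optimisation.functions import LeastSquares, SGFunction
from cil.optimisation.utilities import Sampler

rng = np.random.default_rng(0)
A_np = rng.standard_normal((12, 3))  # illustrative system, 12 rows
b_np = rng.standard_normal(12)

# partition the rows into 4 equal subsets, one least-squares term per subset
functions = [LeastSquares(MatrixOperator(A_np[i::4]), b=VectorData(b_np[i::4]))
             for i in range(4)]

f = SGFunction(functions, sampler=Sampler.sequential(len(functions)))
x = VectorData(np.zeros(3))

g = f.gradient(x)             # approximate gradient: 4 * grad f_0(x)
g = f.gradient(x)             # approximate gradient: 4 * grad f_1(x)
print(f.data_passes)          # [0.25, 0.5] -- each call touches 1/4 of the data
print(f.data_passes_indices)  # [[0], [1]]
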
Wrappers/Python/cil/optimisation/functions/SGFunction.py
@@ -0,0 +1,82 @@
# Copyright 2024 United Kingdom Research and Innovation
# Copyright 2024 The University of Manchester
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Authors:
# - CIL Developers, listed at: https://github.com/TomographicImaging/CIL/blob/master/NOTICE.txt
# - Daniel Deidda (National Physical Laboratory, UK)
# - Claire Delplancke (Electricite de France, Research and Development)
# - Ashley Gillman (Australian e-Health Res. Ctr., CSIRO, Brisbane, Queensland, Australia)
# - Zeljko Kerata (Department of Computer Science, University College London, UK)
# - Evgueni Ovtchinnikov (STFC - UKRI)
# - Georg Schramm (Department of Imaging and Pathology, Division of Nuclear Medicine, KU Leuven, Leuven, Belgium)

from .ApproximateGradientSumFunction import ApproximateGradientSumFunction
from .Function import SumFunction

class SGFunction(ApproximateGradientSumFunction):

    r"""
    Stochastic gradient function, a child class of :code:`ApproximateGradientSumFunction`, which defines from a list of functions, :math:`\{f_0, ..., f_{n-1}\}`, a `SumFunction`, :math:`f_0 + ... + f_{n-1}`. Each time `gradient` is called, the :code:`sampler` provides an index :math:`i \in \{0, ..., n-1\}` and the :code:`gradient` method returns the approximate gradient :math:`n \nabla_x f_i(x)`. This can be used with the :code:`cil.optimisation.algorithms` algorithm :code:`GD` to give a stochastic gradient descent algorithm.

    Parameters
    ----------
    functions : `list` of functions
        A list of functions: :code:`[f_{0}, f_{1}, ..., f_{n-1}]`. Each function is assumed to be smooth with an implemented :func:`~Function.gradient` method. Although CIL does not define a domain of a :code:`Function`, all functions are assumed to have the same domain. The number of functions must be strictly greater than 1.
    sampler : An instance of a CIL Sampler class (:meth:`~optimisation.utilities.sampler`) or of another class which has a :code:`next` method implemented to output integers in {0, ..., n-1}.
        This sampler is called each time `gradient` is called and sets the internal :code:`function_num` passed to the :code:`approximate_gradient` function. Default is :code:`Sampler.random_with_replacement(len(functions))`.
    """

    def __init__(self, functions, sampler=None):
        super(SGFunction, self).__init__(functions, sampler)

    def approximate_gradient(self, x, function_num, out=None):
        r"""Returns the gradient of the function at index `function_num` at :code:`x`, scaled by the number of functions.

        Parameters
        ----------
        x : DataContainer
        function_num : `int`
            Between 0 and the number of functions in the list
        out : DataContainer, default `None`
            If provided, the result is written to `out`; if `None`, a new DataContainer is returned.

        Returns
        -------
        DataContainer
            the value of the approximate gradient of the sum function at :code:`x` given a `function_num` in {0, ..., len(functions)-1}
        """
        if function_num >= self.num_functions or function_num < 0:
            raise IndexError(
                'The sampler has produced an index outside the range of functions to sample from. Please ensure your sampler samples from {0, 1, ..., len(functions)-1} only.')

        # compute gradient of the function indexed by function_num
        if out is None:
            out = self.functions[function_num].gradient(x)
        else:
            self.functions[function_num].gradient(x, out=out)

        # scale by the number of functions so the result approximates the full gradient
        out *= self.num_functions

        return out
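
Because `SGFunction` exposes the standard `Function.gradient` interface, it drops straight into gradient-based CIL algorithms such as `GD`, as the docstring notes. A minimal stochastic gradient descent sketch, reusing `functions` and `x` from the earlier sketch and assuming the `GD` signature at the time of this commit; the step size and iteration count are illustrative, not tuned:

from cil.optimisation.algorithms import GD
from cil.optimisation.functions import SGFunction
from cil.optimisation.utilities import Sampler

# each GD iteration uses one randomly sampled subset; the n-times scaling in
# SGFunction keeps step sizes comparable to full-gradient GD
f = SGFunction(functions, sampler=Sampler.random_with_replacement(len(functions)))
algo = GD(initial=x, objective_function=f, step_size=1e-3, max_iteration=100)
algo.run(100)
print(algo.solution.as_array())
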