Merge pull request #3 from vincent-laurent/main
[DEV] major update
vincent-laurent authored Jul 30, 2024
2 parents 76c409d + cf7822a commit 6c5cc3a
Showing 27 changed files with 896 additions and 466 deletions.
1 change: 1 addition & 0 deletions .github/workflows/pytest.yml
@@ -33,6 +33,7 @@ jobs:
run: |
python -m pip install --upgrade pip
pip install pytest pytest-cov
pip install git+https://github.com/modAL-python/modAL.git
pip install .
Binary file modified .public/active_vs_passive.png
Binary file modified .public/example_krg.png
67 changes: 32 additions & 35 deletions README.md
@@ -1,31 +1,26 @@

# Active Strategy for surface response estimation

[![License](https://img.shields.io/badge/license-apache_2.0-blue.svg)]( https://github.com/eurobios-mews-labs/active-bagging-learning/blob/master/LICENSE)
![cov](https://github.com/eurobios-mews-labs/active-bagging-learning/blob/coverage-badge/coverage.svg)
[![Maintenance](https://img.shields.io/badge/maintained%3F-yes-green.svg)](https://GitHub.com/eurobios-mews-labs/active-bagging-learning/graphs/commit-activity)
## Installation
# Active Strategy for surface response estimation
This library provides a plug-in approach to active learning based on bagging techniques.
Bagging, or bootstrap aggregating, is an ensemble learning method designed to improve
the stability and accuracy of machine learning algorithms. By leveraging bagging,
we aim to enhance the efficiency of active learning strategies in approximating the target function $`f`$.
* The objective is to approximate a function $`f \in \mathcal{X} \rightarrow \mathbb{R}^n`$.
* **Objective:** find an estimate of $`f`$, $`\hat{f}`$, in a family of measurable functions $`\mathcal{F}`$ such that $` f^* = \underset{\hat{f} \in \mathcal{F}}{\text{argmin}} \|f - \hat{f} \| `$
* At time $`t`$ we have at our disposal a set of $`n`$ evaluations $`(x_i, f(x_i))_{i\leqslant n}`$
* All feasible points can be sampled in the domain $`\mathcal{X}`$
* This tool enables users to query new points based on an uncertainty measure.

```shell
python -m pip install git+https://gitlab.eurobios.com/vlaurent/surrogate-models.git
```

## Literature
* **Review** [Simpson2001](https://ntrs.nasa.gov/api/citations/19990087092/downloads/19990087092.pdf)

<img height="300" src="https://i.imgur.com/w571mZ7.png" width="400"/>

* **Reliability** in [[Marelli2018]](https://arxiv.org/pdf/1709.01589) using polynomial chaos expansion. The problem is to find a region $\{x ; \, g(x) \leqslant 0\}$, where $g$ is called the limit state function. *Bootstrap approach to estimate the variance.*
* **Properties of multilayer perceptron networks** [[Fukumizu2000]](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.51.1885&rep=rep1&type=pdf) for regression problems. Active learning: can resampling get trapped in local minima? Redundancy of hidden units in active learning.
* Gaussian processes using mutual information
* **Surface response methodology** [[Bezerra2008]](https://d1wqtxts1xzle7.cloudfront.net/45518928/Response_Surface_Methodology_RSM_as_a_20160510-11788-z5s7f4-with-cover-page-v2.pdf?Expires=1647600354&Signature=FWuGdH4xQIPYbo6gjfofYOvSiNCZknuwktVpgOuRU0wbBAjHhrN2a2cYCoLaqFmhLzuJNl~TeX2iXFh7rYFlAfgBwqQh6-lV29XxuU6AJTqj6lkP2MaIMHke4RMcJ6mJN39lXcfg6Ohf5D9TnD7v-Eze4fHCHbklEk9REPok6O0V3MIvx7A4XriV5Tffe5yu1HZ1fCuHBULS5PiRyuRBzKavclvPFQBPDWx5-J~y9a85oB6JGcey3VId7fvtfRUGXXn49WqHm3fJfqpLbYj62drFGjE6XcmBWm1CzBn0Guaf~ig8k6JfI9wOrErxofAkR8tjnd51VUAelB0XCY4v1A__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA) based on linear models
## Context
## Installation

Plug-in approach to active learning for surface response estimation
```shell
python -m pip install git+https://github.com/eurobios-mews-labs/active-bagging-learning.git
```

* The objective is to approximate a function $`f \in \mathcal{X} \rightarrow \mathbb{R}^n`$.
* **Objective:** find an estimate of $`f`$, $`\hat{f}`$, in a family of measurable functions $`\mathcal{F}`$ such that $` f^* = \underset{\hat{f} \in \mathcal{F}}{\text{argmin}} \|f - \hat{f} \| `$
* At time $`t`$ we have at our disposal a set of $`n`$ evaluations $`(x_i, f(x_i))_{i\leqslant n}`$
* All feasible points can be sampled in the domain $`\mathcal{X}`$

## Basic usage

@@ -35,35 +30,37 @@ import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor

from active_learning import ActiveSRLearner
from active_learning.components.active_criterion import ServiceVarianceEnsembleMethod
from active_learning import ActiveSurfaceLearner
from active_learning.components.active_criterion import VarianceEnsembleMethod
from active_learning.components.query_strategies import ServiceQueryVariancePDF
from active_learning.benchmark import functions

fun = functions.grammacy_lee_2009 # The function we want to learn
bounds = np.array(functions.bounds[fun]) # [x1 bounds, x2 bounds]
fun = functions.grammacy_lee_2009 # The function we want to learn
bounds = np.array(functions.bounds[fun]) # [x1 bounds, x2 bounds]
n = 50
X_train = pd.DataFrame(
{'x1': (bounds[0, 0] - bounds[0, 1]) * np.random.rand(n) + bounds[0, 1],
'x2': (bounds[1, 0] - bounds[1, 1]) * np.random.rand(n) + bounds[1, 1],
}) # Initiate distribution
}) # Initiate distribution
y_train = -fun(X_train)

active_criterion = ServiceVarianceEnsembleMethod( # Parameters to be used to estimate the surface response
estimator=ExtraTreesRegressor( # Base estimator for the surface
max_features=0.8, bootstrap=True)
active_criterion = VarianceEnsembleMethod( # Parameters to be used to estimate the surface response
estimator=ExtraTreesRegressor( # Base estimator for the surface
max_features=0.8, bootstrap=True)
)
query_strategy = ServiceQueryVariancePDF(bounds, num_eval=int(20000))

# QUERY NEW POINTS
active_learner = ActiveSRLearner(
active_criterion, # Active criterion yields a surface
query_strategy, # Given active criterion surface, execute query
X_train, # Input data X
y_train, # Input data y (target)
active_learner = ActiveSurfaceLearner(
active_criterion, # Active criterion yields a surface
query_strategy, # Given active criterion surface, execute query
bounds=bounds)

X_new = active_learner.query(3) # Request 3 points
active_learner.fit(
X_train, # Input data X
y_train) # Input data y (target))

X_new = active_learner.query(3) # Request 3 points
```
To use the approach, one must have at hand

@@ -79,7 +76,7 @@ To use the approach, one has to dispose of

* 1D example:

<img alt="benchmark" height="500" src=".public/example_krg.png" width="800"/>
<img alt="benchmark" height="800" src=".public/example_krg.png"/>

## Benchmark

2 changes: 1 addition & 1 deletion active_learning/__init__.py
@@ -9,7 +9,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from active_learning.base import ActiveSRLearner
from active_learning.base import ActiveSurfaceLearner
from active_learning.components import query_strategies
from active_learning.components import active_criterion
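
Since this commit renames `ActiveSRLearner` to `ActiveSurfaceLearner` (and drops the `Service` prefix on `VarianceEnsembleMethod`), downstream imports written against the old names will break. A minimal, purely illustrative alias sketch for the import side only; note that the constructor and `fit`/`query` workflow also change in `base.py` below, so an alias alone does not restore the old call signature:

```python
# Hypothetical compatibility aliases, not part of this commit.
# They only cover the renamed imports; the ActiveSurfaceLearner
# constructor and fit/query workflow changed as well (see base.py).
try:
    from active_learning import ActiveSurfaceLearner as ActiveSRLearner
    from active_learning.components.active_criterion import (
        VarianceEnsembleMethod as ServiceVarianceEnsembleMethod,
    )
except ImportError:  # pre-commit versions still expose the old names
    from active_learning import ActiveSRLearner
    from active_learning.components.active_criterion import (
        ServiceVarianceEnsembleMethod,
    )
```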

66 changes: 27 additions & 39 deletions active_learning/base.py
@@ -9,62 +9,50 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import numpy as np
import pandas as pd
from copy import deepcopy

from active_learning.components.active_criterion import IActiveCriterion
from active_learning.components.query_strategies import IQueryStrategy


class ActiveSRLearner:
class ActiveSurfaceLearner:

def __init__(
self,
active_criterion: IActiveCriterion,
query_strategy: IQueryStrategy,
X_train: pd.DataFrame,
y_train: pd.DataFrame,
bounds=None,
):
self.active_criterion = active_criterion
self.query_strategy = query_strategy
self.x_input = X_train.copy()
self.y_input = y_train.copy()
self.bounds = bounds
self.result = {}
self.iter = 0
self.budget = len(X_train)
self.x_input.index = 0 * np.ones(len(self.x_input))
self.x_new = pd.DataFrame()
self.__active_criterion = active_criterion
self.__query_strategy = query_strategy
self.__bounds = bounds

def learn(self):
self.active_criterion.fit(
self.x_input,
self.y_input)
def fit(self, X: pd.DataFrame, y):
self.active_criterion.fit(X, y)
self.__columns = X.columns

def query(self, *args):
self.learn()
self.query_strategy.set_bounds(self.bounds)
def query(self, *args) -> pd.DataFrame:
self.query_strategy.set_bounds(self.__bounds)
self.query_strategy.set_active_function(self.active_criterion.__call__)
self.x_new = pd.DataFrame(self.query_strategy.query(*args), columns=self.x_input.columns)
self.save()
x_new = pd.DataFrame(self.query_strategy.query(*args), columns=self.__columns)
return x_new

return self.x_new
@property
def active_criterion(self) -> IActiveCriterion:
return self.__active_criterion

def add_labels(self, x: pd.DataFrame, y: pd.DataFrame):
self.iter += 1
x.index = self.iter * np.ones(len(x))
y.index = self.iter * np.ones(len(x))
self.x_input = pd.concat((x, self.x_input), axis=0)
self.y_input = pd.concat((y, self.y_input), axis=0)
self.budget = len(self.x_input)
@property
def query_strategy(self) -> IQueryStrategy:
return self.__query_strategy

def save(self):
@property
def surface(self) -> callable:
return self.__active_criterion.function

self.result[self.iter] = dict(
surface=deepcopy(self.active_criterion.function),
active_criterion=deepcopy(self.active_criterion),
budget=int(self.budget),
data=self.x_input
)
@property
def predict(self) -> callable:
return self.__active_criterion.function

@property
def bounds(self) -> iter:
return self.__bounds
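
For orientation, a minimal sketch of how the refactored `ActiveSurfaceLearner` is driven after this change: the learner no longer stores the training data (the old `add_labels`/`save` bookkeeping is gone), so the caller refits and accumulates labels itself. The setup reuses the README's `grammacy_lee_2009` example; the loop, the round and query counts, and calling `learner.surface` on a DataFrame are assumptions for illustration, not part of the commit.

```python
# Illustrative sketch only (not part of this commit).
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor

from active_learning import ActiveSurfaceLearner
from active_learning.components.active_criterion import VarianceEnsembleMethod
from active_learning.components.query_strategies import ServiceQueryVariancePDF
from active_learning.benchmark import functions

fun = functions.grammacy_lee_2009          # target function from the README example
bounds = np.array(functions.bounds[fun])   # [x1 bounds, x2 bounds]

n = 50
X = pd.DataFrame(
    {"x1": (bounds[0, 0] - bounds[0, 1]) * np.random.rand(n) + bounds[0, 1],
     "x2": (bounds[1, 0] - bounds[1, 1]) * np.random.rand(n) + bounds[1, 1]})
y = -fun(X)                                # assumed to return a 1-D array/Series, as in the README

learner = ActiveSurfaceLearner(
    VarianceEnsembleMethod(
        estimator=ExtraTreesRegressor(max_features=0.8, bootstrap=True)),
    ServiceQueryVariancePDF(bounds, num_eval=20000),
    bounds=bounds)

for _ in range(5):                         # assumed number of active-learning rounds
    learner.fit(X, y)                      # refit the active criterion on all labelled data
    X_new = learner.query(3)               # ask the query strategy for 3 new points
    y_new = -fun(X_new)                    # label the new points with the true function
    X = pd.concat((X, X_new), ignore_index=True)          # caller accumulates the data now
    y = pd.concat((pd.Series(y), pd.Series(y_new)), ignore_index=True)

y_hat = learner.surface(X)                 # `surface`/`predict` expose the fitted estimate
                                           # (assumed callable on a DataFrame)
```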
16 changes: 8 additions & 8 deletions active_learning/benchmark/analyse.py
@@ -29,7 +29,7 @@
from active_learning.components import query_strategies
from active_learning.components.active_criterion import VarianceBis
from active_learning.components.sampling import latin_square
from active_learning.benchmark.test import TestingClass
from active_learning.benchmark.base import TestingClass

name = "grammacy_lee_2009_rand"
fun = functions.__dict__[name]
@@ -63,37 +63,37 @@ def get_method_for_benchmark(name):
crit = query_strategies.ServiceReject(num_eval=100)

elif name == "branin":
est = active_criterion.ServiceVarianceEnsembleMethod(
est = active_criterion.VarianceEnsembleMethod(
estimator=ensemble.ExtraTreesRegressor(bootstrap=True))
crit = query_strategies.ServiceQueryVariancePDF(num_eval=1000)

elif name == "branin_rand":
est = active_criterion.ServiceVarianceEnsembleMethod(
est = active_criterion.VarianceEnsembleMethod(
estimator=ensemble.ExtraTreesRegressor(bootstrap=True))
crit = query_strategies.ServiceQueryVariancePDF(num_eval=1000)
elif name == "himmelblau":
est = active_criterion.ServiceVarianceEnsembleMethod(
est = active_criterion.VarianceEnsembleMethod(
estimator=ensemble.ExtraTreesRegressor(bootstrap=True))
crit = query_strategies.ServiceQueryVariancePDF(num_eval=1000)

elif name == "himmelblau_rand":
est = active_criterion.ServiceVarianceEnsembleMethod(
est = active_criterion.VarianceEnsembleMethod(
estimator=ensemble.ExtraTreesRegressor(bootstrap=True))
crit = query_strategies.ServiceQueryVariancePDF(num_eval=1000)

elif name == "synthetic_2d_1":
est = active_criterion.ServiceVarianceEnsembleMethod(
est = active_criterion.VarianceEnsembleMethod(
estimator=ensemble.ExtraTreesRegressor(bootstrap=True))
crit = query_strategies.ServiceQueryVariancePDF(num_eval=1000)

elif name == "synthetic_2d_2":
est = active_criterion.ServiceVarianceEnsembleMethod(
est = active_criterion.VarianceEnsembleMethod(
estimator=ensemble.ExtraTreesRegressor(bootstrap=True,
max_samples=0.9))
crit = query_strategies.ServiceQueryVariancePDF(num_eval=1000)

else:
est = active_criterion.ServiceVarianceEnsembleMethod(
est = active_criterion.VarianceEnsembleMethod(
estimator=ensemble.ServiceExtraTreesRegressor(bootstrap=True,
max_samples=0.9,
max_features=1))