Implementation of the Brier Score and its consistency test. #232
Open: pabloitu wants to merge 3 commits into SCECcode:main from pabloitu:brier_evaluations
The main change is a new 178-line file implementing the Brier score and its consistency test; its contents are reproduced below, function by function.
```python
import numpy
import numpy as np
from scipy.stats import poisson

from csep.models import EvaluationResult
from csep.core.exceptions import CSEPCatalogException


def _brier_score_ndarray(forecast, observations):
    """ Computes the Brier (binary) score for spatial-magnitude cells
    using the formula:

        Q(Lambda, Sigma) = -2/N * sum_{i=1}^N (p_i - Ind(Sigma_i > 0))^2

    where Lambda is the forecast rate array, p_i = 1 - exp(-Lambda_i) is the
    Poisson probability of observing at least one event in cell i, Sigma is
    the observed catalog, N is the number of spatial-magnitude cells, and Ind
    is the indicator function, which is 1 if Sigma_i > 0 and 0 otherwise.

    Args:
        forecast: 2d array of forecasted rates
        observations: 2d array of observed counts
    Returns:
        brier: float, Brier score
    """
    # probability of at least one event per cell under the Poisson assumption
    prob_success = 1 - poisson.cdf(0, forecast)
    brier_cell = np.square(prob_success.ravel() - (observations.ravel() > 0))
    brier = -2 * brier_cell.sum()

    # divide by the total number of cells (product of the array dimensions)
    for n_dim in observations.shape:
        brier /= n_dim
    return brier
```
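To make the scaling concrete, here is a small worked example (not part of the PR; the toy arrays are invented for illustration) that reproduces what `_brier_score_ndarray` computes on a 2x2 grid:

```python
import numpy as np
from scipy.stats import poisson

# Toy inputs, for illustration only: forecasted rates and observed counts.
forecast = np.array([[0.5, 0.1],
                     [2.0, 0.0]])
observations = np.array([[1, 0],
                         [3, 0]])

prob_success = 1 - poisson.cdf(0, forecast)        # P(>= 1 event) = 1 - exp(-rate)
indicator = observations.ravel() > 0               # True where a cell was active
brier = -2 * np.square(prob_success.ravel() - indicator).sum() / observations.size

print(brier)  # lies in [-2, 0]; values closer to 0 mean better agreement
```

Because the squared differences are averaged and multiplied by -2, the score is always finite, which is why zero-rate bins cause no singularities here, unlike in the log-likelihood tests.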
```python
def _simulate_catalog(sim_cells, sampling_weights, random_numbers=None):
    """
    Simulates a catalog by sampling from the sampling_weights array.
    Identical to binomial_evaluations._simulate_catalog.

    Args:
        sim_cells: number of active cells to simulate
        sampling_weights: cumulative forecast rates, normalized to [0, 1]
        random_numbers: optional array of pre-drawn uniform random numbers

    Returns:
        sim_fore: simulated catalog as an array of cell counts
    """
    sim_fore = numpy.zeros(sampling_weights.shape)

    if random_numbers is None:
        # Reset simulation array to zero, but don't reallocate
        sim_fore.fill(0)
        num_active_cells = 0
        while num_active_cells < sim_cells:
            random_num = numpy.random.uniform(0, 1)
            loc = numpy.searchsorted(sampling_weights, random_num,
                                     side='right')
            if sim_fore[loc] == 0:
                sim_fore[loc] = 1
                num_active_cells += 1
    else:
        # Find insertion points using binary search inserting
        # to satisfy a[i-1] <= v < a[i]
        pnts = numpy.searchsorted(sampling_weights, random_numbers,
                                  side='right')
        # Create simulated catalog by adding to the original locations
        numpy.add.at(sim_fore, pnts, 1)

    assert sim_fore.sum() == sim_cells, "simulated the wrong number of events!"
    return sim_fore
```
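A quick sanity check, assuming `_simulate_catalog` from above is available in the session (the rates and seed below are made up): with no pre-drawn random numbers, the function activates exactly `sim_cells` distinct cells, drawn with probability proportional to the forecast rates.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, for reproducibility of this illustration

rates = np.array([0.1, 0.5, 0.2, 1.2])
sampling_weights = np.cumsum(rates) / np.sum(rates)   # monotone, ends at 1.0

sim_fore = _simulate_catalog(2, sampling_weights)
print(sim_fore, int(sim_fore.sum()))  # binary array with exactly 2 active cells
```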
```python
def _brier_score_test(forecast_data, observed_data, num_simulations=1000,
                      random_numbers=None, seed=None, verbose=True):
    """ Computes the Brier consistency test conditional on the number of
    observed active spatial-magnitude cells.

    Args:
        forecast_data: 2d array of forecasted rates for spatial_magnitude cells
        observed_data: 2d array of a catalog resampled to spatial_magnitude
            cells
        num_simulations: number of synthetic catalog simulations
        random_numbers: numpy array of random numbers to use for simulation
        seed: seed for random number generator
        verbose: print status updates

    Returns:
        qs (float), obs_brier (float), simulated_brier (list)
    """
    # Silently mask non-positive rates, as done in binomial_evaluations
    forecast_data = numpy.ma.masked_where(forecast_data <= 0.0, forecast_data)

    # set seed for reproducibility of the simulations
    if seed is not None:
        numpy.random.seed(seed)

    # used to determine where simulated earthquakes should be placed;
    # by definition of cumsum these are sorted
    sampling_weights = (numpy.cumsum(forecast_data.ravel()) /
                        numpy.sum(forecast_data))

    # data structures to store results
    simulated_brier = []
    n_active_cells = len(numpy.unique(numpy.nonzero(observed_data.ravel())))
    num_cells_to_simulate = int(n_active_cells)

    # main simulation step in this loop
    for idx in range(num_simulations):
        if random_numbers is None:
            sim_fore = _simulate_catalog(num_cells_to_simulate,
                                         sampling_weights)
        else:
            sim_fore = _simulate_catalog(num_cells_to_simulate,
                                         sampling_weights,
                                         random_numbers=random_numbers[idx, :])

        # compute Brier score of the simulated catalog
        current_brier = _brier_score_ndarray(forecast_data.data, sim_fore)

        # append to list of simulated Brier scores
        simulated_brier.append(current_brier)

        # just be verbose
        if verbose:
            if (idx + 1) % 100 == 0:
                print(f'... {idx + 1} catalogs simulated.')

    obs_brier = _brier_score_ndarray(forecast_data.data, observed_data)
    # quantile score: fraction of simulated scores at or below the observed one
    qs = numpy.sum(simulated_brier <= obs_brier) / num_simulations

    # float, float, list
    return qs, obs_brier, simulated_brier
```
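For clarity, the quantile score at the end of `_brier_score_test` is simply the fraction of simulated scores that fall at or below the observed score; a tiny numeric illustration (all values invented):

```python
import numpy as np

simulated_brier = np.array([-0.80, -0.70, -0.65, -0.60, -0.50])  # made-up test distribution
obs_brier = -0.75                                                 # made-up observed score

qs = np.sum(simulated_brier <= obs_brier) / simulated_brier.size
print(qs)  # 0.2 -> the observed score is lower (worse) than 80% of the simulated scores
```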
```python
def brier_score_test(gridded_forecast,
                     observed_catalog,
                     num_simulations=1000,
                     seed=None,
                     random_numbers=None,
                     verbose=False):
    """
    Performs the Brier conditional test on a Gridded Forecast using an
    Observed Catalog. Normalizes the forecast so the forecasted rates are
    consistent with the observations. This modification eliminates the strong
    impact that differences in the number distribution have on the forecasted
    rates.
    """
    # grid catalog onto spatial grid
    try:
        _ = observed_catalog.region.magnitudes
    except CSEPCatalogException:
        observed_catalog.region = gridded_forecast.region
    gridded_catalog_data = observed_catalog.spatial_magnitude_counts()

    # call the Brier consistency test on the gridded catalog data and forecast
    qs, obs_brier, simulated_brier = _brier_score_test(
        gridded_forecast.data,
        gridded_catalog_data,
        num_simulations=num_simulations,
        seed=seed,
        random_numbers=random_numbers,
        verbose=verbose
    )

    # populate result data structure
    result = EvaluationResult()
    result.test_distribution = simulated_brier
    result.name = 'Brier score-Test'
    result.observed_statistic = obs_brier
    result.quantile = qs
    result.sim_name = gridded_forecast.name
    result.obs_name = observed_catalog.name
    result.status = 'normal'
    result.min_mw = numpy.min(gridded_forecast.magnitudes)

    return result
```
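A hypothetical end-to-end usage sketch, not taken from the PR: the module path `csep.core.brier_evaluations`, the forecast file name, and the time window are all assumptions; the surrounding calls follow the pattern of the existing pyCSEP consistency tests.

```python
import csep
from csep.core import brier_evaluations  # assumed location of the new module
from csep.utils.time_utils import strptime_to_utc_datetime

# Placeholder forecast file and evaluation window, for illustration only.
forecast = csep.load_gridded_forecast("example_forecast.dat", name="example")
start_time = strptime_to_utc_datetime("2010-01-01 00:00:00.0")
end_time = strptime_to_utc_datetime("2015-01-01 00:00:00.0")

catalog = csep.query_comcat(start_time, end_time,
                            min_magnitude=forecast.min_magnitude)
catalog = catalog.filter_spatial(forecast.region)

result = brier_evaluations.brier_score_test(forecast, catalog,
                                            num_simulations=1000, seed=23)
print(result.quantile, result.observed_statistic)
```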
Conversations
Beware of the potential side effects of filtering away zero-rate bins, see #226 (tl;dr: allows a modeler to cheat).
For the Brier score, zero-rate bins are not problematic and should not be removed. One of the advantages of the Brier score is indeed that it is always finite, so we do not need to filter out or modify any of the rates.
How about negative rates (just in case)?
The rates should always be non-negative. If we estimate them from catalogue-based forecasts as the average number of points per bin (across synthetic catalogs), we should never get negative rates. If negative rates come from a grid-based forecast, that indicates a problem with how the model generates them. More specifically, the Brier score does not work directly with the rates: it works by estimating the probability that a bin is active (the percentage of synthetic catalogues with at least one event in that bin). Zero-rate bins therefore have zero probability, but bins with the same (positive) rate may have different probabilities depending on the variance. The rate completely determines the probability only for grid-based forecasts that assume an independent Poisson distribution for the number of events in each bin. In both cases it should (I think) be impossible to get negative rates: if the forecast is catalogue-based, this holds by construction; if the forecast is grid-based with Poisson counts, the rate is greater than or equal to zero by definition of the Poisson distribution. The same holds for a Negative Binomial distribution, whose parameters, and hence its mean, are greater than or equal to zero.
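A small sketch of the two ways of obtaining the per-bin activation probability described above (not part of the PR; the arrays and values are invented for illustration):

```python
import numpy as np

# Catalogue-based forecast: counts per bin for each synthetic catalog,
# shape (num_catalogs, num_bins). Values are illustrative.
synthetic_counts = np.array([[0, 1, 0],
                             [2, 0, 0],
                             [1, 1, 0],
                             [0, 0, 0]])
prob_active_catalog = (synthetic_counts > 0).mean(axis=0)  # fraction of catalogs with >= 1 event
print(prob_active_catalog)  # [0.5, 0.5, 0.0]

# Grid-based forecast with independent Poisson counts: the rate alone
# determines the activation probability.
rates = np.array([0.7, 0.7, 0.0])
prob_active_poisson = 1 - np.exp(-rates)
print(prob_active_poisson)  # [~0.5, ~0.5, 0.0]
```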
Thanks for the quick explainer, Francesco!
This reminds me that we should clarify that this PR is currently for grid-based forecasts only.
I only ask about negative rates to think about necessary "safety" measures (currently there are none, apart from the silent masking here and in binomial_evaluations). Since ≤ 0 rates could in principle be used for cheating in LL-based tests when masking them away (#226), I wanted to inquire whether similar things are possible with the Brier score.

Ah, sure; I mentally replaced "positive" with "non-negative" when I read your first sentence ;-) (Btw: you can always edit your posts.)
I edited and removed the last comment :) Anyway, I don't know what would happen with the Brier score when using negative rates; it depends on how we calculate the probability of activity given a negative rate. A possible solution could be to set them to zero and raise a warning saying that some of the rates were negative; this would be similar to setting the negative rates to the minimum possible (not forecasted) value. Otherwise, we could return -inf (which is not attainable by any "valid" forecast with the Brier score), which would be a clear indication that something is wrong with the forecast. I'd still throw a specific warning, though, to let the user know what is going on. Excluding them could be used to cheat by placing them in bins that a malicious user wants to exclude from the computations.
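One possible shape of the guard suggested above, clamping negative rates to zero and warning the user; this is only an illustration of the idea, not code from the PR:

```python
import warnings
import numpy as np

def _sanitize_rates(forecast_data):
    """Hypothetical helper (not in the PR): set negative rates to zero and warn."""
    forecast_data = np.asarray(forecast_data, dtype=float)
    if (forecast_data < 0).any():
        warnings.warn("Forecast contains negative rates; they were set to zero "
                      "before computing the Brier score.")
        forecast_data = np.where(forecast_data < 0, 0.0, forecast_data)
    return forecast_data
```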
Yes, spitting out at least a warning in those cases seems to be a common desire in #226. But there are good reasons not to "touch"/"fix" the original forecasts, at least regarding the zero rates. In any case, we should continue this discussion there.