Implementation of the Brier Score and its consistency test. #232
base: main
Conversation
""" | ||
|
||
# Array-masking that avoids log singularities: | ||
forecast_data = numpy.ma.masked_where(forecast_data <= 0.0, forecast_data) |
Beware of the potential side effects of filtering away zero-rate bins, see #226 (tl;dr: allows a modeler to cheat).
For the Brier score, zero-rate bins are not problematic and should not be removed. One of the advantages of the Brier score is precisely that it is always finite, so we do not need to filter out or modify any of the rates.
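For context, a minimal sketch of this idea (the function name and sign convention are illustrative, not this PR's exact implementation): under a grid-based Poisson forecast, the probability that a bin is active is 1 - exp(-λ), and the Brier score is the mean squared difference between that probability and the binary observation, so it stays finite even for zero rates.

```python
import numpy

def brier_score_poisson(rates, observations):
    """Illustrative Brier score for a grid-based Poisson forecast.

    `rates` are per-bin expected counts (lambda >= 0); `observations`
    are per-bin observed counts. Only bin activity (count > 0) matters.
    """
    rates = numpy.asarray(rates, dtype=float)
    # P(bin active) = P(N >= 1) = 1 - exp(-lambda); a zero rate simply
    # gives p = 0, so there is no singularity and nothing to mask.
    prob_active = 1.0 - numpy.exp(-rates)
    obs_active = (numpy.asarray(observations) > 0).astype(float)
    # Mean squared difference between forecast probability and outcome;
    # always finite, bounded in [0, 1].
    return numpy.mean((prob_active - obs_active) ** 2)
```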
How about negative rates (just in case)?
The rates should always be non-negative. If we estimate them from catalogue-based forecasts as the average (per synthetic catalogue) number of points per bin, we should never get negative rates. If the rates come from a grid-based forecast, a negative rate indicates a problem with how the model generates them.

More specifically, the Brier score does not work directly with the rates; it works by estimating the probability that a bin is active (the fraction of synthetic catalogues with at least one event in that bin). Zero-rate bins therefore have zero probability, but bins with the same (positive) rate may have different probabilities depending on the variance. The rate completely determines the probability only for grid-based forecasts assuming an independent Poisson distribution for the number of events in each bin.

In both cases, (I think) it should be impossible to get negative rates: if the forecast is catalogue-based, this holds by construction; if the forecast is grid-based with Poisson counts, the rate is greater than or equal to zero by definition of the Poisson distribution. The same holds for a negative binomial distribution, whose two parameters and mean are all greater than or equal to zero.
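To make the two estimators concrete, a short sketch (function names are mine, not the PR's):

```python
import numpy

def prob_active_catalog_based(synthetic_counts):
    """Per-bin fraction of synthetic catalogues with at least one event.

    `synthetic_counts` has shape (n_catalogues, n_bins); counts are
    non-negative by construction, so bins with equal mean rate can still
    get different probabilities depending on the spread across catalogues.
    """
    return numpy.mean(numpy.asarray(synthetic_counts) > 0, axis=0)

def prob_active_poisson(rates):
    """Per-bin P(N >= 1) = 1 - exp(-rate), assuming independent Poisson
    counts; here the rate alone determines the probability."""
    return 1.0 - numpy.exp(-numpy.asarray(rates, dtype=float))
```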
Thanks for the quick explainer, Francesco!
This reminds me that we should clarify that this PR is currently for grid-based forecasts only.
I only asked about negative rates to think about necessary "safety" measures (currently there are none, apart from the silent masking here and in binomial_evaluations). Since ≤ 0 rates could in principle be used for cheating in LL-based tests when masking them away (#226), I wanted to inquire whether similar things are possible with the Brier score.
The first sentence should be "The rates should be always non-negative"
Ah, sure; I mentally replaced "positive" with "non-negative" when I read your first sentence ;-) (Btw: you can always edit your posts.)
I edited and removed the last comment :) Anyway, I don't know what would happen with the Brier score when using negative rates; it depends on how we calculate the probability of activity given a negative rate. A possible solution could be to set them to zero and raise a warning saying that some of the rates were negative. This would be similar to setting the negative rates to the minimum possible (not forecasted) value. Otherwise, we could return -inf (which is not attainable by any "valid" forecast with the Brier score), which would be a clear indication that something is wrong in the forecast. I'd still throw a specific warning, though, to let the user know what is going on.
Excluding them could instead be used to cheat, by placing negative rates in bins that a malicious user wants to exclude from the computations.
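A minimal sketch of the guard discussed above (the helper name is hypothetical, and clamping-to-zero-plus-warning is just one of the options mentioned):

```python
import warnings
import numpy

def _sanitize_rates(rates):
    """Hypothetical guard: clamp negative rates to zero and warn loudly,
    instead of silently masking bins away (exploitable, see #226)."""
    rates = numpy.asarray(rates, dtype=float)
    if numpy.any(rates < 0):
        warnings.warn("Forecast contains negative rates; they were set to "
                      "zero. Check how the forecast rates were generated.")
        rates = numpy.where(rates < 0, 0.0, rates)
    return rates
```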
Yes, emitting at least a Warning in those cases seems to be a common desire in #226. But there are good reasons not to "touch"/"fix" the original forecasts - at least not the zero rates. In any case, we should continue this discussion there.
csep/core/brier_evaluations.py
Outdated
```python
    return brier


def _simulate_catalog(sim_cells, sampling_weights, sim_fore, random_numbers=None):
```
No need to pass sim_fore, as it is created in this function. Instead, move L73 (sim_fore = numpy.zeros(sampling_weights.shape)) inside here. (The same applies to core/poisson_evaluations._simulate_catalog(), from which this function derives.)
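A sketch of the suggested refactor; only the relocation of sim_fore is the actual suggestion, while the loop structure is an assumption pieced together from the excerpt below:

```python
import numpy

def _simulate_catalog(sim_cells, sampling_weights, random_numbers=None):
    # Create sim_fore here instead of receiving it as an argument.
    sim_fore = numpy.zeros(sampling_weights.shape)
    if random_numbers is None:
        random_numbers = numpy.random.rand(sim_cells)
    num_active_cells = 0
    for random_num in random_numbers:
        loc = numpy.searchsorted(sampling_weights, random_num, side='right')
        if sim_fore[loc] == 0:
            sim_fore[loc] = 1
            num_active_cells += 1
    return sim_fore
```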
csep/core/brier_evaluations.py
Outdated
```python
loc = numpy.searchsorted(sampling_weights, random_num, side='right')
if sim_fore[loc] == 0:
    sim_fore[loc] = 1
    num_active_cells = num_active_cells + 1
```
```python
num_active_cells += 1
```
csep/core/brier_evaluations.py
Outdated
```python
from csep.core.exceptions import CSEPCatalogException


def bier_score_ndarray(forecast, observations):
```
Why not prepend an underscore (_) to the function name?
Force-pushed 24338e6 to 5fb55ca (commit messages: "…unit_test that only evaluates a forecast and plot the result."; "… test. It is implemented only using a Binary calculation for spatial-magnitude bins. tests: Implemented brier score tests.")
Codecov Report

```
@@ Coverage Diff @@
##             main     #232      +/-   ##
==========================================
+ Coverage   54.95%   55.05%   +0.09%
==========================================
  Files          23       24       +1
  Lines        3963     4027      +64
  Branches      578      587       +9
==========================================
+ Hits         2178     2217      +39
- Misses       1648     1668      +20
- Partials      137      142       +5
```
@Serra314 had an important comment here, which I copy here:

We definitely have to discuss this. Since we started implementing new metrics, there is a subtle entanglement in the pycsep code base between scores, tests and the probabilistic models, and I have no idea how to address it without a major refactoring. For instance, I agree that the Brier score should accept other distributions than Poisson, but so should the log-likelihood metric (e.g., already implemented as bernoulli and negative binomial models in binomial_evaluations).

I would stick with the Poisson model for the preliminary implementation (as it was the one published by you, and soon by me), and then build on it once we decide how to categorize scores/tests/rankings in the codebase.
@Serra314 Would it be possible for you to test my latest implementation with the forecasts and Brier results of your manuscript? If okay with you, Francesco, we could include your pycsep2 manuscript's Brier paragraph in the documentation.
Added a new module that makes a rough calculation, derived from the binary LL test. Added a unit test that evaluates a forecast and plots the results.

pending:

@wsavran @Serra314 This is a preliminary implementation, and we can start reviewing the code in this same PR.
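For reviewers who want to try it, a hypothetical usage sketch; the entry point brier_score_test, the forecast path, and the result attribute are assumptions, not the final API:

```python
import csep
from csep.core import brier_evaluations  # module added in this PR

# Load any gridded forecast and a matching observed catalogue.
forecast = csep.load_gridded_forecast("path/to/forecast.dat")
catalog = csep.query_comcat(start_time=forecast.start_time,
                            end_time=forecast.end_time)

# Assumed entry point for the new Brier consistency test.
result = brier_evaluations.brier_score_test(forecast, catalog)
print(result.quantile)  # assumed attribute, mirroring other pyCSEP tests
```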