Implementation of the Brier Score and its consistency test. #232
base: main
Conversation
""" | ||
|
||
# Array-masking that avoids log singularities: | ||
forecast_data = numpy.ma.masked_where(forecast_data <= 0.0, forecast_data) |
Beware of the potential side effects of filtering away zero-rate bins, see #226 (tl;dr: allows a modeler to cheat).
For the Brier score, zero-rate bins are not problematic and should not be removed. One of the advantages of the Brier score is precisely that it is always finite, so we do not need to filter out or modify any of the rates.
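For context, a minimal sketch of this idea (the function name and sign convention are illustrative, not this PR's exact implementation): under a grid-based Poisson forecast, the probability that a bin is active is 1 - exp(-λ), and the Brier score is the mean squared difference between that probability and the binary observation, so it stays finite even for zero rates.

```python
import numpy

def brier_score_poisson(rates, observations):
    """Illustrative Brier score for a grid-based Poisson forecast.

    `rates` are per-bin expected counts (lambda >= 0); `observations`
    are per-bin observed counts. Only bin activity (count > 0) matters.
    """
    rates = numpy.asarray(rates, dtype=float)
    # P(bin active) = P(N >= 1) = 1 - exp(-lambda); a zero rate simply
    # gives p = 0, so there is no singularity and nothing to mask.
    prob_active = 1.0 - numpy.exp(-rates)
    obs_active = (numpy.asarray(observations) > 0).astype(float)
    # Mean squared difference between forecast probability and outcome;
    # always finite, bounded in [0, 1].
    return numpy.mean((prob_active - obs_active) ** 2)
```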
How about negative rates (just in case)?
The rates should always be non-negative. If we estimate them from catalogue-based forecasts as the average (per synthetic catalogue) number of points per bin, we should never get negative rates. If the rates come from a grid-based forecast, a negative rate indicates a problem with how the model generates them.

More specifically, the Brier score does not work directly with the rates; it works by estimating the probability that a bin is active (the fraction of synthetic catalogues with at least one event in that bin). Zero-rate bins therefore have zero probability, but bins with the same (positive) rate may have different probabilities depending on the variance. The rate completely determines the probability only for grid-based forecasts assuming an independent Poisson distribution for the number of events in each bin.

In both cases, (I think) it should be impossible to get negative rates: if the forecast is catalogue-based, this holds by construction; if the forecast is grid-based with Poisson counts, the rate is greater than or equal to zero by definition of the Poisson distribution. The same holds for a negative binomial distribution, whose two parameters and mean are all greater than or equal to zero.
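To make the two estimators concrete, a short sketch (function names are mine, not the PR's):

```python
import numpy

def prob_active_catalog_based(synthetic_counts):
    """Per-bin fraction of synthetic catalogues with at least one event.

    `synthetic_counts` has shape (n_catalogues, n_bins); counts are
    non-negative by construction, so bins with equal mean rate can still
    get different probabilities depending on the spread across catalogues.
    """
    return numpy.mean(numpy.asarray(synthetic_counts) > 0, axis=0)

def prob_active_poisson(rates):
    """Per-bin P(N >= 1) = 1 - exp(-rate), assuming independent Poisson
    counts; here the rate alone determines the probability."""
    return 1.0 - numpy.exp(-numpy.asarray(rates, dtype=float))
```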
Thanks for the quick explainer, Francesco!
This reminds me that we should clarify that this PR is currently for grid-based forecasts only.
I only asked about negative rates to think about necessary "safety" measures (currently there are none, apart from the silent masking here and in binomial_evaluations). Since ≤ 0 rates could in principle be used for cheating in LL-based tests when masking them away (#226), I wanted to inquire whether similar things are possible with the Brier score.
The first sentence should be "The rates should be always non-negative"
Ah, sure; I mentally replaced "positive" with "non-negative" when I read your first sentence ;-) (Btw: you can always edit your posts.)
I edited and removed the last comment :) Anyway, I don't know what would happen with the Brier score when using negative rates; it depends on how we calculate the probability of activity given a negative rate. A possible solution could be to set them to zero and raise a warning saying that some of the rates were negative. This would be similar to setting the negative rates to the minimum possible (not forecasted) value. Otherwise, we could return -inf (which is not attainable by any "valid" forecast with the Brier score), which would be a clear indication that something is wrong in the forecast. I'd still throw a specific warning, though, to let the user know what is going on.
Excluding them could instead be used to cheat, by placing negative rates in bins that a malicious user wants to exclude from the computations.
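A minimal sketch of the guard discussed above (the helper name is hypothetical, and clamping-to-zero-plus-warning is just one of the options mentioned):

```python
import warnings
import numpy

def _sanitize_rates(rates):
    """Hypothetical guard: clamp negative rates to zero and warn loudly,
    instead of silently masking bins away (exploitable, see #226)."""
    rates = numpy.asarray(rates, dtype=float)
    if numpy.any(rates < 0):
        warnings.warn("Forecast contains negative rates; they were set to "
                      "zero. Check how the forecast rates were generated.")
        rates = numpy.where(rates < 0, 0.0, rates)
    return rates
```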
Yes, emitting at least a Warning in those cases seems to be a common desire in #226. But there are good reasons not to "touch"/"fix" the original forecasts - at least not the zero rates. In any case, we should continue this discussion there.
csep/core/brier_evaluations.py
Outdated
```python
    return brier


def _simulate_catalog(sim_cells, sampling_weights, sim_fore, random_numbers=None):
```
No need to pass sim_fore, as it is created in this function. Instead, move L73 (sim_fore = numpy.zeros(sampling_weights.shape)) inside here. (The same applies to core/poisson_evaluations._simulate_catalog(), from which this function derives.)
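A sketch of the suggested refactor; only the relocation of sim_fore is the actual suggestion, while the loop structure is an assumption pieced together from the excerpt below:

```python
import numpy

def _simulate_catalog(sim_cells, sampling_weights, random_numbers=None):
    # Create sim_fore here instead of receiving it as an argument.
    sim_fore = numpy.zeros(sampling_weights.shape)
    if random_numbers is None:
        random_numbers = numpy.random.rand(sim_cells)
    num_active_cells = 0
    for random_num in random_numbers:
        loc = numpy.searchsorted(sampling_weights, random_num, side='right')
        if sim_fore[loc] == 0:
            sim_fore[loc] = 1
            num_active_cells += 1
    return sim_fore
```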
csep/core/brier_evaluations.py
Outdated
```python
loc = numpy.searchsorted(sampling_weights, random_num, side='right')
if sim_fore[loc] == 0:
    sim_fore[loc] = 1
    num_active_cells = num_active_cells + 1
```
```python
num_active_cells += 1
```
csep/core/brier_evaluations.py
Outdated
```python
from csep.core.exceptions import CSEPCatalogException


def bier_score_ndarray(forecast, observations):
```
Why not prepend an underscore (_) to the function name?
Force-pushed 24338e6 to 5fb55ca (commit messages: "…unit_test that only evaluates a forecast and plot the result."; "… test. It is implemented only using a Binary calculation for spatial-magnitude bins. tests: Implemented brier score tests.")
Codecov Report

```
@@ Coverage Diff @@
##             main     #232      +/-   ##
==========================================
+ Coverage   54.95%   55.05%   +0.09%
==========================================
  Files          23       24       +1
  Lines        3963     4027      +64
  Branches      578      587       +9
==========================================
+ Hits         2178     2217      +39
- Misses       1648     1668      +20
- Partials      137      142       +5
```
@Serra314 had an important comment here, which I copy here:

We definitely have to discuss this. Since we started implementing new metrics, there is a subtle entanglement in the pycsep code base between scores, tests and the probabilistic models, and I have no idea how to address it without a major refactoring. For instance, I agree that the Brier score should accept other distributions than Poisson, but so should the log-likelihood metric (e.g., already implemented as bernoulli and negative binomial models in binomial_evaluations).

I would stick with the Poisson model for the preliminary implementation (as it was the one published by you, and soon by me), and then build on it once we decide how to categorize scores/tests/rankings in the codebase.
@Serra314 Would it be possible for you to test my latest implementation with the forecasts and Brier results of your manuscript? If okay with you, Francesco, we could include your pycsep2 manuscript's Brier paragraph in the documentation.
Added a new module that makes a rough calculation, derived from the binary LL test. Added a unit test that evaluates a forecast and plots the results.

pending:

@wsavran @Serra314 This is a preliminary implementation, and we can start reviewing the code in this same PR.
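For reviewers who want to try it, a hypothetical usage sketch; the entry point brier_score_test, the forecast path, and the result attribute are assumptions, not the final API:

```python
import csep
from csep.core import brier_evaluations  # module added in this PR

# Load any gridded forecast and a matching observed catalogue.
forecast = csep.load_gridded_forecast("path/to/forecast.dat")
catalog = csep.query_comcat(start_time=forecast.start_time,
                            end_time=forecast.end_time)

# Assumed entry point for the new Brier consistency test.
result = brier_evaluations.brier_score_test(forecast, catalog)
print(result.quantile)  # assumed attribute, mirroring other pyCSEP tests
```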