How to deal with forecast rates ≤ 0 in any LL-based test? #226
Comments
Shouldn’t subzero rates throw an error to indicate unphysical numbers? A zero-rate cell with an earthquake should, in my view, generate a negative infinity in the rate-based tests, at least so the evaluators are aware. They may then decide on a strategy if the experiment design didn’t already.
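For context, a minimal numpy illustration (not pyCSEP code) of why a single target event in a zero-rate bin produces exactly that negative infinity, announced only by a low-level RuntimeWarning rather than an error:

```python
import numpy as np

# The joint Poisson log-likelihood contains a term n_i * log(lambda_i) for
# every bin i (the log(n_i!) term is omitted here for brevity). One observed
# event in a zero-rate bin drives the whole sum to -inf, and numpy merely
# emits "RuntimeWarning: divide by zero encountered in log".
rates = np.array([1e-3, 2e-2, 0.0])   # last bin forecasts a rate of zero...
counts = np.array([0, 1, 1])          # ...but contains a target event

log_like = np.sum(-rates + counts * np.log(rates))
print(log_like)  # -inf
# Note: a zero-rate bin with zero observed events is even worse behaved,
# because 0 * log(0) evaluates to NaN rather than 0.
```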
Currently no error is thrown when reading or processing unreasonable rates. Case 2 in #225 illustrates what happens in the T-test.
Good idea, Max, to use …
Granted, it's a rare case, but it does seem to happen with grid-based forecasts. Catalog-based forecasts with a limited number of simulations and no dedicated background simulation may be even more prone. But since target events should rarely occur in ≤ 0 rate bins, I believe that an …
Nice catch. I think the best approach is to provide a forecast format checker (e.g., prohibiting negative mean rate values) and, additionally, to raise a warning in case the forecast has zero-rate bins. However, the runtime warnings that currently appear are low-level numpy warnings, and although they are ugly, I don't think they should be silenced or bypassed. I also agree that -inf should be passed through as the given result. Personally, I wouldn't provide options (much less defaults) to "fix" the forecasts. Although it can be obvious that a forecast with negative rates needs fixing, it is not obvious for zero-rate forecasts. The choice of method would have an impact on the expected results and should perhaps be addressed by the modelers in their workflow as a modeling decision, not a testing one.
I agree with Pablo here. Grid-based forecasts should not contain zero-rate cells, so we should have a mechanism for checking this condition. If we want to use the forecasts as-is, we can implement some type of masking process in the forecasts to simply ignore these cells and carry on with the tests as normal.
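A minimal sketch of what such a masking approach could look like with numpy masked arrays (illustration only, not the existing pyCSEP implementation; it also shows why masking lets a zero-rate bin with a target event drop out of the score):

```python
import numpy as np
import numpy.ma as ma

rates = np.array([1e-3, 2e-2, 0.0, -1e-5])  # two bins with rates <= 0
counts = np.array([0, 1, 1, 0])             # ...one of them holds a target event

# Mask out the offending bins, then accumulate log-likelihood terms only
# over the valid cells. The target event in the zero-rate bin no longer
# contributes at all, which is exactly the "cheating" loophole noted above.
valid = ma.masked_where(rates <= 0, rates)
masked_ll = ma.sum(-valid + counts * ma.log(valid))
print(masked_ll)  # finite, as if the zero-rate bin had never been observed
```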
So let me summarize what we probably agree on to implement:
Things are a bit different for the Brier score (see #232 for feedback from Francesco on this issue), which is not affected by rates = 0 (not sure about negative rates). So, additionally, we could do the following inside the format checker (still to be agreed on):
(I can keep this item list and its check boxes updated based on further discussions and/or progress 👀)
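As a side note on why the Brier score tolerates zero rates while any LL-based score does not: assuming, for illustration, a per-cell probability p = 1 - exp(-rate) and a binary outcome o (the exact formulation discussed in #232 may differ), the quadratic penalty stays finite at p = 0, whereas the log term diverges:

```python
import numpy as np

rate, outcome = 0.0, 1              # zero-rate cell that nevertheless hosts an event
p = 1.0 - np.exp(-rate)             # p = 0 under this illustrative formulation
brier_term = (p - outcome) ** 2     # 1.0 -> finite (just the worst possible penalty)
log_term = outcome * np.log(p)      # -inf, with a RuntimeWarning
print(brier_term, log_term)
```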
Thank you, Marcus, for the summary! I agree with 1, 2 and 4. Some of these things I have already done in floatcsep, and they should be moved to pycsep. Will update soon about this.
Just wanted to mention, given our discussion last meeting, that having a checker for negative rates or zero rates is not trivial because of performance. We could …
Any ideas are welcome before we start testing those two ideas.
Since we only want to be informative, without modifying the forecast data at all, I believe option (1) would be sufficient: simply implement a format checker that gets called after loading the forecasts (a check on every T-test execution would indeed be overkill). I'd place the format checker in a new method ….
Likewise, we should add a format checker in …. About … in my previous comment: this is not a performance issue, as it checks only one value, not an array.
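For illustration, a standalone checker along these lines could be called once after a gridded forecast is loaded (the function name and exact messages are assumptions, not existing pyCSEP API):

```python
import warnings
import numpy as np

def check_forecast_rates(rates: np.ndarray) -> None:
    """Hypothetical one-off format check for a gridded forecast's rate array.

    Runs once after loading, so it adds no cost to the individual tests:
    negative rates are treated as an error, zero rates only as a warning.
    """
    rates = np.asarray(rates)
    if np.any(rates < 0):
        raise ValueError("Forecast contains negative rates.")
    n_zero = int(np.count_nonzero(rates == 0))
    if n_zero:
        warnings.warn(f"Forecast contains {n_zero} zero-rate bins; "
                      "LL-based scores can evaluate to -inf.")
```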
I agree with @mherrmann3 here. I think we should have a checker that gets called after loading the forecasts, along with a flag that can disable this option in case someone doesn't want to use it. An associated warning could let them know in the output that the results are unchecked. As far as the catalog forecasts go, a checker will be a lot more heavyweight, because you will have to create a gridded forecast from the catalogs in order to do the checking. This should probably be called in …
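Sketched below is how the catalog-forecast check might piggyback on that gridded aggregation; the method name get_expected_rates() and the .data attribute are assumptions about a CatalogForecast-like object, not confirmed pyCSEP calls:

```python
import numpy as np

def check_catalog_forecast(catalog_forecast) -> None:
    """Sketch only: object and method names are assumptions, not confirmed pyCSEP API."""
    # Aggregating the synthetic catalogs onto the space-magnitude grid is the
    # expensive step; that is why this check is heavier than for gridded
    # forecasts and should run only once, right after loading.
    gridded = catalog_forecast.get_expected_rates()   # assumed aggregation step
    rates = np.asarray(gridded.data)
    if np.any(rates < 0):
        raise ValueError("Aggregated catalog forecast contains negative rates.")
```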
It's not a surprise that zero (or even negative) rates cause issues in LL scores, but I believe pyCSEP does not generally check for, deal with, or warn about them in LL-based tests if forecasts contain them (an example: Akinci's HAZGRIDX in the 5-yr Italy experiment).
This affects every grid-based and catalog-based test except the N-test.
I noticed one exception: rates ≤ 0 are silently omitted in binomial_evaluations using masked arrays (numpy.ma.masked_where). But this is not an optimal treatment, because it gives a model the opportunity to cheat and game the system: in areas where it is unsure or expects low seismicity, it simply forecasts 0; if a target event should occur in such bins, it won't count in the testing. Apart from that, excluding rates ≤ 0 could trigger a corner case in the T-test when all target event rates are ≤ 0 (see case 3 in #225).
So the better approach is to replace forecast rates ≤ 0 (in a reproducible way, not randomly) with something small, preferably the minimum among all forecast rates or similar (e.g., an order of magnitude below the minimum).
It would be convenient to write a separate function for this treatment (a sketch follows below) and use it at the top of
* .core.poisson_evaluations._poisson_likelihood_test(),
* .core.poisson_evaluations._t_test_ndarray(),
* .utils.calc._compute_likelihood(), and
* .core.catalog_evaluations.magnitude_test().
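A minimal sketch of such a helper, assuming the substitute value is derived from the smallest positive rate in the forecast (the function name, the default factor, and the call sites are illustrative, not pyCSEP API):

```python
import numpy as np

def clip_nonpositive_rates(rates: np.ndarray, factor: float = 0.1) -> np.ndarray:
    """Replace rates <= 0 with a small, reproducible substitute value.

    The substitute is `factor` times the smallest positive rate (factor=0.1
    puts it one order of magnitude below the minimum, factor=1.0 uses the
    minimum itself). Raises if the forecast has no positive rates at all.
    """
    rates = np.asarray(rates, dtype=float)
    positive = rates[rates > 0]
    if positive.size == 0:
        raise ValueError("Forecast contains no positive rates.")
    return np.where(rates <= 0, factor * positive.min(), rates)

# Example: after the treatment, a target event in a formerly zero-rate bin
# yields a finite (if heavily penalized) joint Poisson log-likelihood.
rates = clip_nonpositive_rates(np.array([1e-3, 2e-2, 0.0]))
counts = np.array([0, 1, 1])
print(np.sum(-rates + counts * np.log(rates)))  # finite
```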