Bayesian analysis for calibration of input parameters #88
@WPringle based on the discussion we had today, we only need to run the best track for a couple of storms, with a couple of different ensemble sizes using Korobov sampling, right?
@SorooshMani-NOAA because it is best track, we should change how we do the ensemble perturbation. I think we need to work out that strategy first; then we run the ensemble and check that the ensemble results encompass the observations. This paper, which also looks at best track, is relevant: see Section 4.2. There they treat the errors as uniform random variables with an amplification factor that we can tune. I think we should follow their methodology for doing the track and intensity perturbations.
@SorooshMani-NOAA You can see they also perturb Manning's roughness.
Just for the sake of completeness of this discussion: Later we'll explore other parameter perturbations. |
@WPringle have you tried running the KL-NN code on the sample data? During our meeting (and also in the slides) it says the data is time dependent, but looking at the …
I haven't tried running it yet; I was just reviewing the slides and the code.
I did run the code using the sample data, and the plots on the slides are the result of that execution. I wouldn't worry about whether it is "x" or "t"; it is somewhat arbitrary, and the methodology can handle spatiotemporal data in theory. But I am pretty sure "x" refers to a time dimension here, based on how it looks.
I see, thank you |
@WPringle I forgot to also ask: have you tried it on an ensemble too? Is there any issue I need to be aware of if I want to run this on a combined maxelev ensemble dataset?
No, I did not. Just be aware of the dimensions of the data (the number of parameters, number of nodes, and number of ensemble members) and how those fit into the code. It should be able to run through to the part where it requires the observations.
OK, thank you |
@WPringle I'm testing the KL-NN (without the MCMC part) on the same ensemble I used with your KLPC. I actually subsetted the maxelev dataset using the subsetting part of your KLPC script (EnsemblePerturbation/ensembleperturbation/uncertainty_quantification/karhunen_loeve_expansion.py, line 39 at 33141b8).
Out of curiosity, is there any reason behind using KLE/PCA for reducing dimensions? Why not use other methods?
Yes, please use the PCA method we use in … What other methods are there? For the KLPC we have an analytical solution to convert the KL-fit surrogate back to the full-dimension surrogate, which is at least one advantage I can think of.
OK thanks!
I was talking in general; I know there are a bunch of dimension-reduction techniques which might be better than PCA depending on the application, so I was wondering if there was any specific reason. E.g.: https://towardsdatascience.com/11-dimensionality-reduction-techniques-you-should-know-in-2021-dcb9500d388b
I see, I didn't really think about this part. It would still be interesting to see what other methods have to offer, if we get the chance to explore.
Yeah, that's interesting! For sure explore them if you want to; we can have a meeting with Khachik sometime to discuss this. He might be able to say why we would prefer one type of method over another.
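For reference, the reduce-then-reconstruct step being discussed here can be sketched with a plain SVD in numpy; the array names and sizes below are made up for illustration, not taken from the actual EnsemblePerturbation code:

```python
import numpy as np

rng = np.random.default_rng(1)
# rows: ensemble members, columns: mesh nodes (toy sizes)
elev = rng.normal(size=(20, 500))

mean = elev.mean(axis=0)
U, s, Vt = np.linalg.svd(elev - mean, full_matrices=False)

k = 5                           # keep only the leading KL/PCA modes
coeffs = U[:, :k] * s[:k]       # reduced-dimension representation of each member
recon = mean + coeffs @ Vt[:k]  # analytical map back to the full dimension
```

The last line is the "analytical solution" advantage: the map from KL coefficients back to physical nodes is just a matrix product with the retained modes.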
Copying the email for future reference:
I tried using the …
@SorooshMani-NOAA Could you please share the code you have so far for the results we looked at yesterday with Khachik? |
@WPringle I uploaded the code to a private repo for now. I just added you and Saeed as collaborators. Note that the code must run on an already-subsetted dataset; I tried to get it to work with the minimum changes possible.
@WPringle if I want to train on the whole model but run the MCMC only on the USGS HWM stations, what should the non-station dimension values be when transforming observations using PCA? Should they be 0?
@SorooshMani-NOAA That's a good question. I suppose 0 makes sense, because in the matrix multiplication the zeros will add nothing. If it doesn't work, then we probably need to do the MCMC on the full result rather than the dimensionally reduced result. Let's try zero first and see... Just to be clear, you are asking about what dummy value to put in …
@WPringle, sorry for the late reply. Yes, that's what I meant. I'll try it and see how it goes.
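To illustrate why zero works for the projection step (this ignores the mean-centering a real PCA transform would apply): padding the unobserved nodes with zeros before multiplying by the mode matrix gives the same coefficients as projecting only the observed rows. All names and sizes below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n_nodes, n_modes = 100, 4
modes = rng.normal(size=(n_nodes, n_modes))  # toy stand-in for KL/PCA eigenmodes

station_idx = np.array([3, 17, 42])  # nodes that have HWM observations (made up)
obs_values = np.array([1.2, 0.8, 1.5])

# zero-fill the unobserved nodes before projecting onto the modes
obs_full = np.zeros(n_nodes)
obs_full[station_idx] = obs_values
coeffs_padded = modes.T @ obs_full

# identical to projecting only the observed rows of the mode matrix
coeffs_subset = modes[station_idx].T @ obs_values
```

The two coefficient vectors match exactly, which is the "zeros add nothing" argument above.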
The result is not good for sure, but this time there seems to be a method to this madness! There seems to be a bias in the MCMC error-bar results. I'm still not sure how to interpret it; what do you think @WPringle? Also please let me know how to share the …
Updated sizes: …
I guess the bias mostly comes from the fact that the model results are generally under-predicted. But it's a bit odd: we would think the MAP crosses should be closest to the observations, yet instead they sit lower down on the error bars.
@SorooshMani-NOAA I think we can use this function to transform Gaussian to uniform: https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.ndtr.html#scipy.special.ndtr This gives values in [0, 1], which can then be rescaled to [-1, 1] by multiplying by 2 and subtracting 1.
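A quick sketch of that transform with `scipy.special.ndtr` (the standard normal CDF), on toy samples:

```python
import numpy as np
from scipy.special import ndtr

rng = np.random.default_rng(0)
gaussian = rng.standard_normal(10_000)  # standard normal samples

u01 = ndtr(gaussian)  # CDF maps N(0, 1) samples to uniform values in (0, 1)
u = 2.0 * u01 - 1.0   # rescale to (-1, 1): multiply by 2, subtract 1
```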
@SorooshMani-NOAA I put my Jupyter notebook and some modifications to …
@SorooshMani-NOAA Also check out https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html for trying to figure out the best hyperparameters for the ANN model (used by this paper: https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2022JD037617).
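A minimal sketch of how `GridSearchCV` could be used to tune an ANN surrogate; the hyperparameter grid and toy data are made up, and `MLPRegressor` is just one possible network implementation, not necessarily the one used here:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 3))  # stand-in for perturbation parameters
y = X[:, 0] ** 2 + 0.5 * X[:, 1]       # toy model response

param_grid = {
    'hidden_layer_sizes': [(16,), (32,), (32, 16)],
    'alpha': [1e-4, 1e-2],  # L2 regularization strength
}
search = GridSearchCV(
    MLPRegressor(max_iter=500, random_state=0),
    param_grid,
    cv=3,  # 3-fold cross-validation for each hyperparameter combination
)
search.fit(X, y)
```

After fitting, `search.best_params_` holds the winning combination and `search.best_estimator_` is the refit model.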
@WPringle I think we missed one point during our discussion about physical vs. KL space: when using Polynomial Chaos in physical space, it's clear what the input to the physical surrogate is, but in the NN case, even if the MCMC takes care of transforming back to physical space, we still don't know what to pass to the NN in the first place. The NN only accepts KL-space input, so we still run into the same issue of transforming data from observation-points-only to KL space before passing it to the NN. So we should find a way to "transform" the neural network to accept physical inputs. Does that make sense?
No, I think it is the same, because the "input" to the surrogate is just the value of the set of parameters, while the length of the surrogate output is either the number of eigenmodes or the number of physical nodes, depending on whether we are in KL space or physical space. Of course, we also need to provide the observed data and the corresponding indices in the physical surrogate that map to the data.
Oh, I see my mistake ... thanks! I'll just need to …
Apart from that, I'm not sure if I'm using the term cross-validation correctly. Based on the description here https://scikit-learn.org/stable/modules/grid_search.html, the term refers to hyperparameter tuning (e.g. the number of hidden layers or neurons in the case of a NN), not model parameter selection. What we're really doing is a train/test split for tuning parameters. In the case of the PC model, the existing code has:

```python
cross_validator = LeaveOneOut()
regression_model = ElasticNetCV(
    fit_intercept=False,
    cv=cross_validator,
    l1_ratio=0.5,
    selection='random',
    random_state=666,
)
```

Based on https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNetCV.html my understanding is that cross-validation is actually not used here, since …
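For what it's worth, `ElasticNetCV` does use the supplied cross-validator, but only to select the regularization strength `alpha` along its candidate path; `l1_ratio` stays fixed unless a list of values is passed. A toy check of this behavior, with made-up data:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.5, 0.0, -2.0, 0.5]) + 0.1 * rng.normal(size=30)

model = ElasticNetCV(
    fit_intercept=False,
    cv=LeaveOneOut(),
    l1_ratio=0.5,        # fixed; not tuned unless given as a list
    selection='random',
    random_state=666,
)
model.fit(X, y)
# model.alpha_ holds the regularization strength chosen by leave-one-out CV
```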
Yep, so it is still doing the CV for the …
Oh yeah, I missed it. In any case, what I'm trying to say is that unless we really want to find a good number of neurons or layers, the CV approach might not make much sense for the NN, and we can just do the internal validation it already has for choosing parameters.
@SorooshMani-NOAA Hmm, I don't know about that. Maybe those CV functions aren't appropriate, but I think it's important to do CV so we reduce the overfitting problem.
There seem to be other methods to avoid overfitting, e.g.: …
And this is the observation vs. ensemble comparison (with dry-node extrapolation): …
Also, the results from the SCHISM ensemble improved by using the physical-space surrogate: …
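One built-in option along those lines, if the NN is an sklearn `MLPRegressor`, is its early-stopping mode, which holds out part of the training data and stops when the validation score plateaus. A sketch on toy data (all sizes and settings are illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(300, 3))
y = X[:, 0] ** 2 + 0.5 * X[:, 1]  # toy response surface

model = MLPRegressor(
    hidden_layer_sizes=(32,),
    early_stopping=True,    # hold out validation_fraction of the training data
    validation_fraction=0.1,
    n_iter_no_change=20,    # stop after 20 epochs without validation improvement
    max_iter=2000,
    random_state=0,
)
model.fit(X, y)
```

With `early_stopping=True`, `model.validation_scores_` records the held-out score per epoch, so you can see where training stopped.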
Action items as of the 2/21/23 meeting:
Soroosh: …
William: …
What do we need for finalizing/publishing?
@SorooshMani-NOAA See these scripts and the best_track and advisory files here for Florence: …
The …
Plumlee et al. - 2021 - High-fidelity hurricane surge forecasting using em.pdf |
Todo list:
William: …
@WPringle when I download data for all the past storms from …
What do you think of this approach? |
That sounds good to me, @SorooshMani-NOAA. What do you mean by only two …
If I get the tracks using:

```python
tracks = stormevents.nhc.track.VortexTrack.from_storm_name('florence', 2018, file_deck='a')
```

then this would be the result of …
There's an …
Yeah, maybe that is being parsed incorrectly, or the column in the ATCF meant something else.
@WPringle following our discussion about getting the forecast track, this is where I am: …
After this I still get a wrong central pressure and Rmax for some of the entries. For example, this is one such output (before I set the forecast hours to 0) for Florence 2018: …
For reference, this is part of an actual best track from a previously successful ensemble run: …
I'm still investigating what is actually causing the error in SCHISM, but is there anything that you see in the above example? I'm assuming the wrong pressures alone could be sufficient to throw the simulation off! When you get a chance, can you please review the correction done to the OFCL using CARQ here and see if it makes sense? Maybe we should review the …
UPDATE: I just saw the issue in the logs of SCHISM from PAHM: …
@SorooshMani-NOAA This looks very promising, nice job! In my methodology the initial Rmax (maybe this is from CARQ) is assumed for all forecast times in the advisory.
@SorooshMani-NOAA This is one of the big reasons why the Rmax methodology sucks: although we have a lot of information on Vmax and r34, r50, and r64, we have no estimate of Rmax in the future. So we are better off using r34, r50, and r64 in a GAHM implementation.
@WPringle, thanks. Do you have any suggestions for how to resolve this issue based on what we get above? I looked at the CARQ for the same …
@SorooshMani-NOAA Do you get Rmax in the CARQ? I think I remember now that for the pressure at the 0-hr I actually used the pressure from the 3-hr of the previous advisory (3 hours prior). If that is too complicated, then we can fill in the 0-hr using the same methodology as the forecast times.
@WPringle I believe that should be what's happening; I'm just using …
I can manually modify some of the values in my workflow script to try different things, e.g. I can try setting it to the 3-hr forecast pressure. Do you want us to change the logic in the …
Interesting ... I just noticed this in my dataframe, looking only at certain columns: …
I think the logic in StormEvents gets the first entry, meaning …
@SorooshMani-NOAA Yes, what you just showed sounds good, if that can be corrected.
@WPringle I just realized that in the original unprocessed ATCF files, the pressure fields are not …
Update: …
Update 2: The correction is calculated, but never stored.
There's a package called `PyMC3` that can be used for Bayesian analysis. There's a good "hacker book" available at: https://nbviewer.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Prologue/Prologue.ipynb
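The kind of calibration loop PyMC3 automates can be sketched by hand with a random-walk Metropolis sampler; the one-parameter "surrogate", observations, and prior bounds below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# toy one-parameter "surrogate" standing in for a trained model (made up)
def surrogate(theta):
    return np.array([1.0, 2.0, 1.5]) * theta

obs = surrogate(0.8) + rng.normal(scale=0.05, size=3)  # synthetic observations
sigma = 0.05  # assumed observation error

def log_post(theta):
    if not 0.0 < theta < 2.0:                   # uniform prior on (0, 2)
        return -np.inf
    resid = obs - surrogate(theta)
    return -0.5 * np.sum((resid / sigma) ** 2)  # Gaussian log-likelihood

samples, theta = [], 1.0
for _ in range(5000):
    proposal = theta + 0.1 * rng.standard_normal()  # random-walk step
    if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
        theta = proposal                            # accept the move
    samples.append(theta)

posterior = np.array(samples[1000:])  # drop burn-in
```

The posterior samples then concentrate around the parameter value that generated the observations; PyMC3 provides the same machinery with better samplers and diagnostics.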
Initial discussion: