Suggest changing status history into non-masked array always showing actual status #1420

nordam · 2024-10-06T18:27:03Z

At present, the array o.history['status'] is a masked array, and for a given particle, the status is always 0 (active), except in the timestep in which it became deactivated, where the status will be for example 1 for seafloor or 2 for stranded. For all later times, the status is -999 (and also masked).

When plotting a mass balance, it is convenient to know the status of a particle also at times after it was deactivated. It would therefore be handy if the status array actually held the status at all times. The status array doesn't have to be a masked array either, as any status other than 0 means the particle is not active. It could also a status number (-1, for example) representing particles that are not seeded yet.

One can of course create the array I describe post hoc, like so:

# Hack the status array to show actual status
firstlast = np.ma.notmasked_edges(o.history['status'], axis=1)
first = firstlast[0][1]
last  = firstlast[1][1]
status = o.history['status'].data
for i in range(Np):
    status[i,last[i]:] = status[i,last[i]]
status[status == -999] = -1

but is there any reason not to create the array like this in the first place?

The text was updated successfully, but these errors were encountered:

knutfrode · 2024-10-07T10:31:46Z

Yes, this is a bit awkward, but for legacy reasons:

Today, an Xarray Dataset would be an obvious choice for the datamodel. However, at the birth of OpenDrift, Xarray was not very mature, and a masked array was chosen.
In the inner datamodel, (self.environment, self.elements as available inside the update() method in any module), these are arrays corresponding to only the elements presently active. Thus the model developer does not need to worry about the bookeeping of which particles have (not yet) been seeded and deactivated etc. This helps keeping the physics/processes of the modules clean and readable - by separating this from the dirty bookeeping (in Basemodel, Basereader etc)..

And only one timestep is kept in memory (though with some caching of 100 timesteps for performance), and flushed to disk after each timestep.
But for performance reason, only active elements are written/appended to disk, thus NaN (or -999) is stored for elements that have been deactivated (not only for status, but for all other properties*).
The internal import-method compensatses by backfilling data, but yes, this is not ideal when using 3rd-pard applications for analysis.

We hope to make a new inner datamodel based on Xarray Dataset quite soon, and then this issue will be eliminated.

gauteh · 2024-10-08T07:18:15Z

@knutfrode Do you still have the experiement from our Mjølfjell-Opendrift-Workshop? I think you created an example with xarray that flushes data to disk on demand and handled by xarray?

knutfrode · 2024-10-08T10:15:30Z

I had almost forgotten that test!

Could it be this test-script? Here I am writing to zarr.
https://gist.github.com/knutfrode/20c9a55d564b934ba0a783f476a65ef3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suggest changing status history into non-masked array always showing actual status #1420

Suggest changing status history into non-masked array always showing actual status #1420

nordam commented Oct 6, 2024

knutfrode commented Oct 7, 2024 •

edited

Loading

gauteh commented Oct 8, 2024

knutfrode commented Oct 8, 2024

Suggest changing status history into non-masked array always showing actual status #1420

Suggest changing status history into non-masked array always showing actual status #1420

Comments

nordam commented Oct 6, 2024

knutfrode commented Oct 7, 2024 • edited Loading

gauteh commented Oct 8, 2024

knutfrode commented Oct 8, 2024

knutfrode commented Oct 7, 2024 •

edited

Loading