Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggest changing status history into non-masked array always showing actual status #1420

Open
nordam opened this issue Oct 6, 2024 · 3 comments

Comments

@nordam
Copy link

nordam commented Oct 6, 2024

At present, the array o.history['status'] is a masked array, and for a given particle, the status is always 0 (active), except in the timestep in which it became deactivated, where the status will be for example 1 for seafloor or 2 for stranded. For all later times, the status is -999 (and also masked).

When plotting a mass balance, it is convenient to know the status of a particle also at times after it was deactivated. It would therefore be handy if the status array actually held the status at all times. The status array doesn't have to be a masked array either, as any status other than 0 means the particle is not active. It could also a status number (-1, for example) representing particles that are not seeded yet.

One can of course create the array I describe post hoc, like so:

# Hack the status array to show actual status
firstlast = np.ma.notmasked_edges(o.history['status'], axis=1)
first = firstlast[0][1]
last  = firstlast[1][1]
status = o.history['status'].data
for i in range(Np):
    status[i,last[i]:] = status[i,last[i]]
status[status == -999] = -1

but is there any reason not to create the array like this in the first place?

@knutfrode
Copy link
Collaborator

knutfrode commented Oct 7, 2024

Yes, this is a bit awkward, but for legacy reasons:

Today, an Xarray Dataset would be an obvious choice for the datamodel. However, at the birth of OpenDrift, Xarray was not very mature, and a masked array was chosen.
In the inner datamodel, (self.environment, self.elements as available inside the update() method in any module), these are arrays corresponding to only the elements presently active. Thus the model developer does not need to worry about the bookeeping of which particles have (not yet) been seeded and deactivated etc. This helps keeping the physics/processes of the modules clean and readable - by separating this from the dirty bookeeping (in Basemodel, Basereader etc)..

And only one timestep is kept in memory (though with some caching of 100 timesteps for performance), and flushed to disk after each timestep.
But for performance reason, only active elements are written/appended to disk, thus NaN (or -999) is stored for elements that have been deactivated (not only for status, but for all other properties*).
The internal import-method compensatses by backfilling data, but yes, this is not ideal when using 3rd-pard applications for analysis.

We hope to make a new inner datamodel based on Xarray Dataset quite soon, and then this issue will be eliminated.

@gauteh
Copy link
Member

gauteh commented Oct 8, 2024

@knutfrode Do you still have the experiement from our Mjølfjell-Opendrift-Workshop? I think you created an example with xarray that flushes data to disk on demand and handled by xarray?

@knutfrode
Copy link
Collaborator

I had almost forgotten that test!

Could it be this test-script? Here I am writing to zarr.
https://gist.github.com/knutfrode/20c9a55d564b934ba0a783f476a65ef3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants