
s04_stack2 bug fixes + mov_stack improvement #372

Open · wants to merge 14 commits into master

Conversation

@asyates asyates commented Sep 27, 2024

The following things are fixed with these changes (mentioned in issue #371):

  • bugfix: reference function creation is no longer based only on Todo jobs; it now calls get_results_all using a datelist generated from build_ref_datelist.
  • bugfix: the overwrite boolean is set to False in xr_save_ccf, to avoid overwriting the previous STACK output with only the Todo jobs.
  • when computing moving stacks (-m), gaps in the day list (generated from the current jobs) are identified and filled with the dates necessary to correctly apply the rolling mean, taking the mov_stack size into account.

e.g. for days = ['2006-02-01', '2006-02-02', '2006-02-03', '2006-02-10'] and a stack size (max(mov_rolling)) of two days, the days '2006-01-30', '2006-01-31', '2006-02-08', and '2006-02-09' would be added prior to calling get_results_all.

i.e.:

                # requires: import datetime; import numpy as np; import pandas as pd
                # Calculate the maximum mov_rolling value (in days)
                max_mov_rolling = max(pd.to_timedelta(mov_stack[0]).total_seconds()
                                      for mov_stack in mov_stacks)
                # round up and cast to int so the value can feed range() below
                max_mov_rolling_days = max(1, int(np.ceil(max_mov_rolling / 86400)))

                days = sorted(days)
                days = [datetime.datetime.strptime(day, '%Y-%m-%d') for day in days]
                day_diffs = np.diff(days)
                gaps = [i + 1 for i, diff in enumerate(day_diffs) if diff.days > 1]  # indices of days preceded by a gap
                gaps.insert(0, 0)  # index 0 is also a 'gap' (previous data needed for stacking)

                all_days = list(days)
                added_days = []  # track the added days for removal before saving the CCFs

                for gap_idx in gaps:
                    start = days[gap_idx]
                    # add the preceding days needed by the rolling mean
                    for j in range(1, max_mov_rolling_days + 1):
                        preceding_day = start - datetime.timedelta(days=j)
                        if preceding_day not in all_days:
                            all_days.append(preceding_day)
                            added_days.append(preceding_day)

                added_dates = pd.to_datetime(added_days).values
                # get the CCFs needed for -m stacking
                c = get_results_all(db, sta1, sta2, filterid, components, all_days,
                                    format="xarray")

These additional days are then removed prior to saving via:

                mask = xx.times.dt.floor('D').isin(added_dates)
                xx_cleaned = xx.where(~mask, drop=True)  # remove days not associated with current jobs
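
For illustration, here is a minimal self-contained sketch of that removal step on toy data (the array and dates are made up; only the mask/where logic mirrors the snippet above):

    import numpy as np
    import pandas as pd
    import xarray as xr

    times = pd.date_range('2006-01-30', '2006-02-03', freq='D')
    xx = xr.DataArray(np.random.rand(len(times), 5),
                      dims=('times', 'taxis'),
                      coords={'times': times})

    # pretend the first two days were only pulled in for stacking
    added_dates = pd.to_datetime(['2006-01-30', '2006-01-31']).values
    mask = xx.times.dt.floor('D').isin(added_dates)
    xx_cleaned = xx.where(~mask, drop=True)  # only 2006-02-01..03 remain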

Will use this same branch to implement the Wiener filter... but these changes/fixes are more fundamental.

@asyates asyates commented Oct 2, 2024

Added:

  • reference function no longer uses jobs, i.e. running msnoise cc stack -r will always compute the reference function regardless of whether STACK jobs are Todo or Done.

To do:

  • Add the Wiener filter. I think for now I will do this without the SVD component and test, as I'm not sure how to implement the SVD decomposition cleanly in a way that all data is processed 'equally', i.e. the number of eigenvectors available could vary a lot depending on how many CCFs are being processed (see the sketch after this list).
  • update documentation
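
A quick toy illustration of that eigenvector-count concern (not code from this branch):

    import numpy as np

    for n_ccfs in (5, 50, 500):
        block = np.random.rand(n_ccfs, 200)  # n_ccfs stacks x 200 lag samples
        s = np.linalg.svd(block, compute_uv=False)
        print(n_ccfs, 'CCFs ->', s.size, 'singular values')
    # 5 -> 5, 50 -> 50, 500 -> 200: the available basis size depends on the
    # batch size, so a fixed eigenvector cut-off would treat small and large
    # batches very differently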

@ThomasLecocq (Member) commented

re: docs: this documentation approach should work, but you'll need to set up a big test folder with some data (esp. for the examples to build). My idea (didn't have time yet) is to provide two big "pooch" payloads: 1) only the raw test data & the recipe to build it into the same state as 2) the environment with all the data processed, for running the examples etc.

@asyates asyates commented Oct 3, 2024

Okay, I think it's working nicely now. I did quite a few different tests to check it works properly with gaps, with jobs processed in stages rather than all together, etc.

As said, no SVD component for now, as I'm unsure how it would work cleanly when not processing all data at the same time (i.e. to ensure all data is processed equally).

Functionality of the Wiener filter right now is:

  • if the Wiener filter is applied, the current jobs are also padded with the previous/future 2*M of CCFs, where M is the smoothing duration along the datetime axis.
  • it checks for 'continuous' CCFs to apply the Wiener filter to. Gaps shorter than M are 'ignored', i.e. the filter treats those CCFs as adjacent; otherwise the filter is applied separately to the different groups of adjacent stacks. Note, gaps are restored post-Wiener (pre-stacking).
  • for saving, the first M duration of pre-job stacks (previously pulled in for the purpose of stacking) is removed, as it sits at the 'edge' of the 'image' where the filter has fewer neighbouring points to use. The second M duration of pre-job stacks stays, however, and overwrites the previously saved stacks. The idea is that this older data, if processing in real time for example, may not have had neighbouring (future) points when first processed, but can now be updated based on the data that has since come in (to be consistent with the other data processed).

Users set three params in the config (a rough sketch of how these could map onto a 2-D Wiener window follows below):

wienerfilt: bool, False by default
wiener_Mlen: str (timedelta), smoothing along the datetime axis
wiener_Nlen: str (timedelta), smoothing along the lag-time axis
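
A minimal sketch of that mapping, assuming scipy.signal.wiener and made-up sampling intervals (the conversion is my reading of the parameters, not necessarily this branch's code):

    import numpy as np
    import pandas as pd
    from scipy.signal import wiener

    wiener_Mlen = '3d'    # smoothing along the datetime axis
    wiener_Nlen = '0.5s'  # smoothing along the lag-time axis

    dt_time = pd.Timedelta('1d')  # assumed spacing between stacked CCFs
    dt_lag = 0.05                 # assumed lag-time sampling (s)

    # window sizes in samples along each axis
    M = max(1, round(pd.to_timedelta(wiener_Mlen) / dt_time))
    N = max(1, round(pd.to_timedelta(wiener_Nlen).total_seconds() / dt_lag))

    ccf_block = np.random.rand(30, 200)  # toy block: 30 days x 200 lag samples
    filtered = wiener(ccf_block, mysize=(M, N))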

Quick example showing the final dv/v for one month of data at Ruapehu, 2-day stacks:

[image: dv/v example, Ruapehu]

Still to do: documentation

@asyates asyates commented Oct 4, 2024

A couple of smaller changes, and started adding documentation (just in s04_stack2 for now).

Ended up going down a rabbit hole regarding how much padding with data outside of the Todo jobs actually reduces the edge influence: even if you pad well beyond the width of the filter, subtle differences can still propagate to the middle of the 'image' despite the neighbouring points not changing.

Given that, I added a warning in the documentation that caution should be exercised if processing data in steps (i.e. not all at the same time).

Below is an example where I tested the dv/v after applying the Wiener filter to all data together versus in separate stages (cut-offs indicated by dashed lines; at the end of the month, for example, this simulates reading in 1 day of new data each time). Similar, but with some differences.

[image: dv/v comparison, all data at once vs. staged processing]

@asyates asyates commented Oct 4, 2024

A demonstration that padding doesn't fully prevent values further away from the edge of the image changing slightly. In the top row I add 2 rows (of 1s) at a time, e.g. to reflect new CCF data coming in, and then apply a Wiener filter of length (2, 2). You can see that the values corresponding to the original pattern show subtle differences even when four or six rows of constant value have been added.

So I am not sure there is a 'perfect' way to do it, other than maybe applying the Wiener filter over a fixed moving window, i.e. every N days, so that it is consistent when processing in different stages. But that would be pretty horrid for computation time, I imagine, so maybe just having the warning is best for cases where not all data is processed together. A small numeric reconstruction of the effect follows the figure below.

[image: Wiener filter edge-effect test]
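
A small numeric reconstruction of that test, assuming scipy.signal.wiener (when noise=None, scipy estimates the noise power globally from the whole image, which is one reason appending rows shifts every filtered value slightly):

    import numpy as np
    from scipy.signal import wiener

    rng = np.random.default_rng(0)
    base = rng.random((10, 10))        # the original 'image'
    ref = wiener(base, mysize=(2, 2))  # filtered without any extra rows

    for extra in (2, 4, 6):
        padded = np.vstack([base, np.ones((extra, 10))])  # append rows of 1s
        filt = wiener(padded, mysize=(2, 2))
        # largest change anywhere within the original block
        print(extra, 'rows added -> max change:', np.abs(filt[:10] - ref).max())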
