Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zproc run group_spectra vs. coadd_spectra #2383

Merged
merged 1 commit into from
Oct 9, 2024
Merged

zproc run group_spectra vs. coadd_spectra #2383

merged 1 commit into from
Oct 9, 2024

Conversation

sbailey
Copy link
Contributor

@sbailey sbailey commented Oct 8, 2024

This PR implements #2382 to have zproc select whether it needs to run desi_group_spectra (to generate both spectra and coadd files) or just run desi_coadd_spectra (to generate a coadd file from a pre-existing spectra file).

I tested this with a micro-prod in $CFS/desi/users/sjbailey/spectro/redux/l1.

In healpix/main/dark/100 I created dirs for healpix 10000 linking to the kibo spectra file, and healpix 10001 which did not link the spectra file.

In tiles/cumulative/1000/20210517 I created links to spectra files for all petals except petal 0.

I then rank desi_zproc --batch --nosubmit ... to create batch files and ran them for this tile / these healpix. The jobs correctly used the links if they existed (by calling only desi_coadd_spectra), and if they didn't exist it correctly called desi_group_spectra.

This is needed for the Loa production run.

@sbailey sbailey requested a review from akremin October 8, 2024 19:28
@coveralls
Copy link

Coverage Status

coverage: 30.214% (+0.02%) from 30.194%
when pulling c628433 on zproc-coadd
into d8630a4 on main.

Copy link
Member

@akremin akremin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good as always and spot checking the test directory looks good.

The resolution here solves the issue, however, it seems like it's working around the more fundamental issue that group_spectra is overwriting files instead of skipping over that step when a file exists. Other steps of the pipeline don't overwrite files. And if it were resolved with checks in group_spectra then the edits here would be unnecessary.

That being said, this is the more explicit implementation choice since calling group_spectra when we don't need to actually group the spectra is a bit silly. And the changes to group_spectra would need to then read in the spectra file which is then just a reimplementarion of coadd_spectra.

The only potential "harms" I see in this implementation are (1) it's adds complexity to the step and reduces readability and (2) that it doesn't solve the issue of overwriting files if group_spectra is called directly.

I don't have a strong opinion about the above, so I am happy to move forward with this implementation, but I'd like to get a response on the above first.

@sbailey
Copy link
Contributor Author

sbailey commented Oct 9, 2024

Thanks for the comments and raising the issue of when/where do we skip a step due to pre-existing files.

The processing wrapper scripts desi_proc and desi_zproc do input/output checks and skip steps if a file already exists, but we haen't been consistent/explicit about a policy for lower level algorithmic steps like extract spectra, sky subtraction, flux calibration, redrock, and group spectra. I think most of them will overwrite a pre-existing file just like group spectra does, though I wouldn't be surprised if some of them require a --force/--overwrite/--clobber option instead. But even in those cases, they raise an error if asked to write a pre-existing file and require an option to override it, rather than "skipping a step" which I think is only done in desi_proc and desi_zproc. i.e. adding "skip generating the spectra file if it already exists" to group_spectra would be inconsistent with how other lower-level scripts are implemented, thus I implemented that logic in desi_zproc.

The difference here is that in most cases the step is skipped via runcmd input/output checks, while in this case we actually want to call a different function depending upon whether the specta file exists. In an earlier version of the pipeline, we called group_spectra to write the spectra file and then separately called coadd_spectra to coadd it, and in that version of the pipeline this extra logic would not have been necessary since zproc+runcmd would have automatically skipped over pre-existing spectra files. But the time to re-read giant spectra files was non-trivial, so writing the coadd at the same time as grouping was added as a special case, but then also requires this additional special case logic to chose which function to call depending upon whether the spectra files exist.

@akremin
Copy link
Member

akremin commented Oct 9, 2024

Thank you, that was nice summary for posterity of how this differs from other situations. I will merge this now.

@akremin akremin merged commit 41da70a into main Oct 9, 2024
26 checks passed
@akremin akremin deleted the zproc-coadd branch October 9, 2024 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants