-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fragment LC tests #947
Fragment LC tests #947
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #947 +/- ##
===========================================
- Coverage 78.42% 26.42% -52.00%
===========================================
Files 10 9 -1
Lines 1794 1680 -114
===========================================
- Hits 1407 444 -963
- Misses 387 1236 +849 ☔ View full report in Codecov by Sentry. |
mocker.patch( | ||
'cpg_workflows.inputs.deprecated_create_cohort', | ||
lambda: _mock_cohort(conf.workflow.dataset), | ||
) | ||
|
||
# skip can_reuse, implicit skip of existence checks | ||
mocker.patch('cpg_workflows.large_cohort.combiner.can_reuse', lambda x: False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need to create a cohort for all these downstream tests? It's very hard (for me) to work out what's going on with the whole query_command
rubbish and what actually need to be passed to the run
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fair point... I don't think so. Ideally all the interactions with cohorts/sequencing groups should be done in the Stages, so the code is run during the runtime when the workflow
is initiated. The changes you made recently to combiner
implement that logic. that makes the interface with the methods way easier (run QC on these samples, instead of find some samples and run QC on them), and makes testing way simpler.
From scraping off additional bits of code from this test suite, I've found another instance of get_multicohort()
being called inside the child/job logic in SampleQC. I'd refactor this to take that info as input, instead of to be determined at runtime.
If you went a step further, the logic under test here usually takes no more than 2/3 distinct entries from configuration, often taking a default value. Those could easily be made into arguments, then the stages don't even need a live configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
I did a lot of complaining about the LC tests, here's something productive:
I don't like the LC tests
Improvement
dense.mt
)Benefits
pytest-xdist
intosetup.py
andrequirements-dev.txt
, and set the tests to run in parallel usingpytest -n auto
. This wasn't useful before because 99% of the runtime was this single test, but now it's useful.generate a dense.mt from the stored VDS and compare it to the stored dense.mt, which should be derived from the same source
. Bonus points for digging into the contents of the various MTs and HTs and making sure the content is actually legit before doing those comparisons 😬Hiccups
check-added-large-files
pre-commit hookstart_query_context()
can only be called once, so I've wrapped it in atry: except
so whichever of the parallel processes gets there first creates it, and the rest gracefully failtmp_path
directory generated intest_seqr_loader_dry
is too long to be valid in UNIX, so I've inserted a handrolledtmp_path
which creates a unique folder name, then deletes it upon test completion.tmp_path
includes a bunch of unique character strings. This is then used as the base for the Hailtmp_path
which loads on even more unique strings. The breaking point here was that when running multiple tests in parallel, the pytesttmp_path
also references the worker ID and python instance being used.Other
logging.info(f'Using VEP {vep_version}')
. It's unrelated to anything, but if the seqr_loader test fails it prints the logging to terminal, and this is logged SO MANY times.