-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generic printCovStats
#29
Merged
michaelgruenstaeudl
merged 20 commits into
michaelgruenstaeudl:master
from
alephnull7:master
Feb 10, 2024
Merged
Generic printCovStats
#29
michaelgruenstaeudl
merged 20 commits into
michaelgruenstaeudl:master
from
alephnull7:master
Feb 10, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
3 tasks
michaelgruenstaeudl
approved these changes
Feb 10, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic changes! This makes the code more readable!
michaelgruenstaeudl
merged commit Feb 10, 2024
7918dcb
into
michaelgruenstaeudl:master
5 checks passed
Hi Greg,
Your recent edits are excellent! Yes, breaking printCovValsAsTable into smaller, more readable functions is a great, which will make debugging – should it ever become necessary – much easier. Also, your interpretation about the region information (i.e., the quadripartite structure info) is correct: it is only used for referencing, naming, and – especially – grouping the coverage values.
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The functions
verboseInformation
andwriteTables
have been renamedprintCovStats
andprintCovValsAsTable
, respectively.Due to previous work I had done, specifically with the creation and use of the source sequence as
quadripRegions
when no region partitioning is done, the extension ofprintCovValsAsTable
to include such cases fit in with the way data was already being passed around. To better understand the processes performed in the function and facilitate better maintenance in the future, I refactored and broke upprintCovValsAsTable
into smaller helper functions, and in the process migrated all of these components ofprintCovStats
into a new fileverboseInformation.R
. As of now, this is the only function called fromPACVr.verboseInformation
besidescheckIREquality
(currently located inIROperations.R
), but due to the purpose and amount of code involved, it seemed necessary. On a similar note, a long overdue migration of code related to transformingread.gb
data into forms used byPACVr
has occurred, and is now housed inread.gb2PACVr.R
.In addition to the above refactoring of
printCovValsAsTable
, minor changes to the code involved were done, in the service of having the outputted coverage data's column names relate to what "regions" are being analyzed. As of now, this handles the case of quadripartite regions, where the terminology "Chromosome" is used, and when the entire source is used, the term "Source" is applied.When testing these changes to
printCovValsAsTable
and the above two cases, apart from the changes in column names and the names used for the sequences, the only difference I could see was in thelowCoverage
column forir_regions
. This aligns with my understanding of the coverage analysis done, that except for the assignment ofir_regions$lowCoverage
usingcov_regions
, the region data appears to be used for only referencing, naming, and grouping. That is, it seems that is the only place where the respective subset a sequence is a part of is taken into account or could be taken into account. As a result, the threshold used forir_regions$lowCoverage
is different when the subset is the entire source, compared to what the threshold is when the subset is the quadripartite region the sequence is a part of.Edit: Due to factoring the creation of the verbose files directory into a separate function
getVerbosePath
, a distinction betweenprintCovStats
andprintCovValsAsTable
no longer made sense, so the helper functionprintCovValsAsTable
no longer exists.