
Generic printCovStats #29

Merged: 20 commits merged on Feb 10, 2024


alephnull7 (Collaborator) commented Feb 9, 2024

The functions verboseInformation and writeTables have been renamed printCovStats and printCovValsAsTable, respectively.

Building on earlier work I had done, specifically the creation and use of the source sequence as quadripRegions when no region partitioning is done, extending printCovValsAsTable to cover such cases fit naturally with the way data was already being passed around. To make the steps performed in the function easier to follow and to ease future maintenance, I refactored printCovValsAsTable into smaller helper functions and, in the process, moved all of these components of printCovStats into a new file, verboseInformation.R. At present, this is the only function called from PACVr.verboseInformation besides checkIREquality (currently located in IROperations.R), but given its purpose and the amount of code involved, a separate file seemed warranted. Along the same lines, a long-overdue migration of the code that transforms read.gb data into the forms used by PACVr has been done; that code now lives in read.gb2PACVr.R.
In addition to the refactoring of printCovValsAsTable described above, I made minor changes to the code involved so that the column names of the output coverage data reflect which "regions" are being analyzed. Two cases are currently handled: for quadripartite regions the term "Chromosome" is used, and when the entire source is used, the term "Source" is applied.
When testing these changes to printCovValsAsTable with the two cases above, the only difference I could see, apart from the column names and the names used for the sequences, was in the lowCoverage column for ir_regions. This matches my understanding of the coverage analysis: except for the assignment of ir_regions$lowCoverage using cov_regions, the region data appears to be used only for referencing, naming, and grouping. In other words, that assignment seems to be the only place where the subset a sequence belongs to is, or could be, taken into account. As a result, the threshold used for ir_regions$lowCoverage differs depending on whether the subset is the entire source or the quadripartite region the sequence belongs to.
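To illustrate why only lowCoverage changes, here is a minimal, language-neutral sketch in Python rather than PACVr's R code. It assumes a hypothetical threshold rule (coverage below the subset's mean minus one population standard deviation); the real rule applied to cov_regions may differ. The point it shows is that the same coverage value can be flagged when compared against its own quadripartite region but not when compared against the entire source.

```python
from statistics import mean, pstdev

# HYPOTHETICAL rule (an assumption, not PACVr's actual definition):
# a window is "low coverage" if its value falls below
# mean - 1 * population standard deviation of the subset it is compared to.
def low_coverage_threshold(coverages):
    return mean(coverages) - pstdev(coverages)

def flag_low(window_cov, subset_covs):
    return window_cov < low_coverage_threshold(subset_covs)

# Toy per-window coverage values, grouped by quadripartite region.
regions = {
    "LSC": [100, 105, 98, 102],
    "IRb": [200, 210, 150, 205],
    "SSC": [90, 95, 92, 88],
    "IRa": [198, 207, 201, 203],
}
# "Source" case: all windows pooled into a single subset.
source = [c for covs in regions.values() for c in covs]

ir_window = 150  # a dip inside IRb
print("vs IRb:", flag_low(ir_window, regions["IRb"]))  # → True
print("vs Source:", flag_low(ir_window, source))       # → False
```

Because the high-coverage IRs are pooled with the lower-coverage single-copy regions in the "Source" case, the source-wide threshold drops and the IR dip is no longer flagged.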

Edit: Because creating the verbose files directory was factored out into a separate function, getVerbosePath, a distinction between printCovStats and printCovValsAsTable no longer made sense, so the helper function printCovValsAsTable no longer exists.

alephnull7 linked an issue on Feb 9, 2024 that may be closed by this pull request
alephnull7 changed the title from "Generic printCovValsAsTable" to "Generic printCovStats" on Feb 9, 2024
michaelgruenstaeudl (Owner) left a comment


Fantastic changes! This makes the code more readable!

michaelgruenstaeudl merged commit 7918dcb into michaelgruenstaeudl:master on Feb 10, 2024 (5 checks passed)
michaelgruenstaeudl (Owner) commented Feb 14, 2024 via email

Development: successfully merging this pull request may close the issue "Extension/Revision of function writeTables()".

2 participants