Discrepancies in how the two objects grch37_gene_coordinates and hg38_gene_coordinates are compiled. #48

Open
mattssca opened this issue Dec 12, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@mattssca
Contributor

The two bundled data sets, grch37_gene_coordinates and hg38_gene_coordinates, are compiled in different ways.

grch37_gene_coordinates
This object is compiled from a bundled .tsv file containing gene coordinates in the stated projection. This file can be found here. In DATASET.R under data-raw, the bundled R object is created by simply reading in the .tsv file and saving the .Rda to the data folder.

hg38_gene_coordinates
For this object, there is no .tsv file available in the repo. Instead, the package creates the bundled data object by first downloading ("curling") the gtf.gz file from Ensembl's FTP, then importing it into R with rtracklayer, performing some data wrangling, and lastly saving the .Rda object to the data folder. See these lines in DATASET.R for more info on how this is done.

Suggested Solution
I think the latter approach should be implemented to compile the grch37_gene_coordinates object as well. This adds reproducibility and traceability, and it minimizes the local footprint of the repo (by avoiding bundling big .tsv files with gene coordinates). Currently, there is no information as to where grch37_gene_coordinates.tsv comes from, what version it is, or how it was compiled.
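
As a rough illustration of what this could look like in DATASET.R, here is a minimal sketch that mirrors the download, import, wrangle, and save steps described for hg38. Everything specific in it is an assumption rather than the package's actual code: the helper names (`tidy_gene_records`, `build_grch37_gene_coordinates`) are hypothetical, the output column names are guesses, and the default URL (Ensembl's GRCh37 archive, release 87) would need to be confirmed as the intended source and version.

```r
# Hypothetical helper: keep gene-level records and a small set of columns.
# The output column names are assumptions, not the package's actual schema.
tidy_gene_records <- function(gtf_df) {
  genes <- gtf_df[gtf_df$type == "gene", , drop = FALSE]
  out <- data.frame(
    ensembl_gene_id = genes$gene_id,
    chromosome = as.character(genes$seqnames),
    start = genes$start,
    end = genes$end,
    gene_name = genes$gene_name,
    stringsAsFactors = FALSE
  )
  # Sort by chromosome (as text) and start position for a stable row order.
  out[order(out$chromosome, out$start), , drop = FALSE]
}

# Hypothetical builder: download the GTF, import it with rtracklayer, tidy it.
# The default URL is an assumption; pin whichever release is actually intended.
build_grch37_gene_coordinates <- function(
    gtf_url = "https://ftp.ensembl.org/pub/grch37/release-87/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.gtf.gz") {
  gtf_path <- tempfile(fileext = ".gtf.gz")
  download.file(gtf_url, destfile = gtf_path, mode = "wb")
  gtf <- rtracklayer::import(gtf_path)  # returns a GRanges object
  tidy_gene_records(as.data.frame(gtf))
}

# In data-raw/DATASET.R (not run here; needs network access and rtracklayer):
# grch37_gene_coordinates <- build_grch37_gene_coordinates()
# usethis::use_data(grch37_gene_coordinates, overwrite = TRUE)
```

Keeping the URL as a function argument records the source and release in the script itself, which is what gives the traceability asked for above.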

@mattssca mattssca added the enhancement New feature or request label Dec 12, 2023