Discrepancies in how the two objects grch37_gene_coordinates and hg38_gene_coordinates are compiled. #48

mattssca · 2023-12-12T18:41:29Z

It was discovered that these two data sets are compiled differently.

grch37_gene_coordinates
This object is compiled from a bundled .tsv file with gene coordinates in the stated projection. This file can be found here. In DATASET.R under data-raw, the bundled R object is created by simply reading in the .tsv file and saving the .Rda to the data folder.

hg38_gene_coordinates
For this object, there is no .tsv file available in the repo. Instead, for this projection, the package creates the bundled data object by first "curling" the gtf.gz file from Ensembl's FTP, then using rtracklayer to import it into R, performing some data wrangling, and lastly saving the .Rda object to the data folder. See these lines in DATASET.R for more info on how this is done.

Suggested Solution
I think the latter approach should be used to compile the grch37_gene_coordinates object as well. This adds reproducibility and traceability, and it minimizes the local footprint of the repo (no need to bundle big .tsv files with gene coordinates). Currently, there is no information on where grch37_gene_coordinates.tsv comes from, what version it is, or how it was compiled.
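
A minimal sketch of what that could look like for grch37, following the same download, import, wrangle and save steps described for hg38. This is not the code currently in DATASET.R. The helper name gtf_to_gene_coordinates, the output column names, the Ensembl release (87) and URL, and the output path are all assumptions for illustration and would need to be checked against the existing objects.

```r
# Sketch only: build a gene-coordinate table from an Ensembl GTF file.
# Attaching rtracklayer also attaches GenomicRanges, which the
# subsetting and as.data.frame() calls below rely on.
library(rtracklayer)

# Hypothetical helper; the name and the output columns are assumptions.
gtf_to_gene_coordinates <- function(gtf_path) {
  gtf <- import(gtf_path)                          # GRanges, GTF attributes as metadata columns
  genes <- as.data.frame(gtf[gtf$type == "gene"])  # keep gene-level records only
  data.frame(
    ensembl_gene_id = genes$gene_id,
    chromosome = as.character(genes$seqnames),
    start = genes$start,
    end = genes$end,
    gene_name = genes$gene_name,
    stringsAsFactors = FALSE
  )
}

# Assumed source: Ensembl's GRCh37 archive, release 87. Verify before use.
gtf_url <- "https://ftp.ensembl.org/pub/grch37/release-87/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.gtf.gz"

if (FALSE) {  # not run automatically: needs network access
  gtf_file <- file.path(tempdir(), basename(gtf_url))
  download.file(gtf_url, gtf_file, mode = "wb")
  grch37_gene_coordinates <- gtf_to_gene_coordinates(gtf_file)
  # Assumed output location, mirroring the data folder mentioned above.
  save(grch37_gene_coordinates, file = "data/grch37_gene_coordinates.rda")
}
```

Keeping the URL in the script records the source and release of the coordinates, which is the traceability that the bundled .tsv currently lacks.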

mattssca added the enhancement label Dec 12, 2023
