It turns out that the two gene coordinate data sets, grch37_gene_coordinates and hg38_gene_coordinates, are compiled differently.
grch37_gene_coordinates
This object is compiled from a bundled .tsv file containing gene coordinates in the GRCh37 genome build. This file can be found here. In data-raw/DATASET.R, the bundled R object is created by simply reading in the .tsv file and saving it as an .Rda file in the data folder.
hg38_gene_coordinates
For this object, there is no .tsv file in the repo. Instead, for this genome build, the package creates the bundled data object by first "curling" the gtf.gz file from Ensembl's FTP, then using rtracklayer to import it into R, performing some data wrangling, and finally saving the .Rda object to the data folder. See these lines in DATASET.R for details on how this is done.
Suggested Solution
I think the latter approach should be used to compile the grch37_gene_coordinates object as well. This would improve reproducibility and traceability, and would also shrink the repo's footprint by avoiding bundling large .tsv files with gene coordinates. Currently, there is no information about where grch37_gene_coordinates.tsv comes from, which version it is, or how it was compiled.
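A minimal sketch of what the proposed GRCh37 step in data-raw/DATASET.R could look like, mirroring the hg38 approach described above. Everything here is an assumption, not the package's actual code: the helper names `extract_gene_coordinates()` and `build_grch37_gene_coordinates()` are hypothetical, the output columns are guesses at what the bundled object should contain, and the Ensembl GRCh37 archive URL (release 87 GTF) should be verified before use. It also swaps `curl` for base R's `utils::download.file()`. The wrangling step is kept as a separate function on a plain data frame, shaped like `as.data.frame(rtracklayer::import(...))`, so it can be checked without network access.

```r
# Hypothetical sketch for data-raw/DATASET.R; names, columns and URL are assumptions.

# Assumed source: Ensembl's GRCh37 archive, release 87 GTF (verify before use).
# Pinning the release in the URL records which annotation version was used.
GRCH37_GTF_URL <- paste0(
  "https://ftp.ensembl.org/pub/grch37/release-87/gtf/homo_sapiens/",
  "Homo_sapiens.GRCh37.87.gtf.gz"
)

# Keep only gene records and a set of coordinate columns.
# `gtf_df` is expected to look like as.data.frame(rtracklayer::import(gtf_file)).
extract_gene_coordinates <- function(gtf_df) {
  genes <- gtf_df[gtf_df$type == "gene", , drop = FALSE]
  data.frame(
    gene_id    = as.character(genes$gene_id),
    gene_name  = as.character(genes$gene_name),
    chromosome = as.character(genes$seqnames),
    start      = as.integer(genes$start),
    end        = as.integer(genes$end),
    strand     = as.character(genes$strand),
    stringsAsFactors = FALSE,
    row.names = NULL
  )
}

# Download the GTF, import it with rtracklayer, wrangle it and save the .Rda
# into data/. Requires network access and the rtracklayer and usethis packages.
build_grch37_gene_coordinates <- function(url = GRCH37_GTF_URL) {
  gtf_file <- file.path(tempdir(), basename(url))
  utils::download.file(url, gtf_file, mode = "wb")
  gtf_df <- as.data.frame(rtracklayer::import(gtf_file))
  grch37_gene_coordinates <- extract_gene_coordinates(gtf_df)
  usethis::use_data(grch37_gene_coordinates, overwrite = TRUE)
  invisible(grch37_gene_coordinates)
}
```

Running `build_grch37_gene_coordinates()` from the package root would then regenerate data/grch37_gene_coordinates.rda from a documented, versioned source, so the .tsv file would no longer need to be bundled.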