Number of GeneProtein conflations and info-content values fell in 2024oct1 #355

Open
gaurav opened this issue Oct 4, 2024 · 1 comment


@gaurav
Collaborator

gaurav commented Oct 4, 2024

2024oct1 has fewer gene-protein conflations than 2024aug18 (19,701,538 vs. 21,431,316) and slightly fewer info-content values (3,345,015 vs. 3,346,582). We should figure out why.
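For scale, the size of each drop can be computed directly from the counts quoted above. This is a trivial sketch: the release labels and numbers come from this comment, but the dictionary keys and the `drop()` helper are hypothetical and not part of Babel.

```python
# Counts reported for the two Babel releases in this issue.
releases = {
    "2024aug18": {"geneprotein_conflations": 21_431_316, "info_content_values": 3_346_582},
    "2024oct1": {"geneprotein_conflations": 19_701_538, "info_content_values": 3_345_015},
}


def drop(metric: str, old: str = "2024aug18", new: str = "2024oct1") -> tuple[int, float]:
    """Return (absolute drop, percentage drop) for a metric between two releases."""
    before = releases[old][metric]
    after = releases[new][metric]
    diff = before - after
    return diff, 100.0 * diff / before


for metric in ("geneprotein_conflations", "info_content_values"):
    diff, pct = drop(metric)
    print(f"{metric}: -{diff:,} ({pct:.2f}%)")
```

Run as-is, it reports a drop of 1,729,778 conflations (about 8.07%) against only 1,567 info-content values (about 0.05%), so the conflation drop is proportionally far larger.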

@gaurav
Collaborator Author

gaurav commented Oct 7, 2024

I traced one example back and found that the gene identifier (NCBIGene:9736071) was no longer present in gene_info.gz (as downloaded from https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz), presumably because it was "discontinued on 23-Aug-2024". We previously associated this gene with UniProtKB:E0SDS8, which is still present in our database (but trying a GeneProtein conflation on this will simply return the gene). More information about prokaryotic genes discontinued by NCBI: https://www.ncbi.nlm.nih.gov/refseq/about/prokaryotes/faq/#FAQ1

I randomly checked about 10 genes that have been deleted, and they were all bacterial genes discontinued in late August.
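The spot check described above (is a given GeneID still present, and which GeneIDs disappeared between releases) could be scripted roughly as follows. This is a minimal sketch, assuming two locally saved copies of gene_info.gz (one from the 2024aug18 build and one from 2024oct1) and relying on GeneID being the second tab-separated column, with the header row starting with `#`. All function names here are hypothetical, not Babel code.

```python
import gzip
import random
from collections.abc import Iterable
from typing import Optional


def gene_ids_from_lines(lines: Iterable[str]) -> set[int]:
    """Collect GeneIDs (second tab-separated column) from gene_info-formatted lines."""
    gene_ids: set[int] = set()
    for line in lines:
        if line.startswith("#"):  # skip the header row
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1 and fields[1].isdigit():
            gene_ids.add(int(fields[1]))
    return gene_ids


def load_gene_ids(path: str) -> set[int]:
    """Stream a gzipped gene_info file without decompressing it to disk."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return gene_ids_from_lines(fh)


def removed_gene_ids(old_path: str, new_path: str) -> set[int]:
    """GeneIDs present in the older snapshot but missing from the newer one."""
    return load_gene_ids(old_path) - load_gene_ids(new_path)


def sample_ids(ids: set[int], k: int = 10, seed: Optional[int] = None) -> list[int]:
    """Pick up to k IDs at random for manual checks (sorted first so a seed is reproducible)."""
    rng = random.Random(seed)
    return rng.sample(sorted(ids), k=min(k, len(ids)))
```

For example, `removed = removed_gene_ids(old_path, new_path)` followed by `9736071 in removed` and `sample_ids(removed)` would reproduce the manual check. Storing GeneIDs as integers keeps the sets smaller, but both snapshots still have to fit in memory at once, which can be substantial for the full file.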

We could try to use a previous gene_info.gz file so that we don't lose this information, but that seems dumb. If these are mostly prokaryotic identifiers, they're unlikely to have an effect on Translator, and I assume UniProtKB will eventually update their IDs. But when we have a bit of free time, it might be worth looking into the gene history files to see if we can find new mappings for those identifiers for a future Babel release.
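A possible starting point for the gene-history idea: NCBI publishes gene_history.gz in the same FTP directory as gene_info.gz. The sketch below assumes its documented tab-separated layout (tax_id, GeneID, Discontinued_GeneID, Discontinued_Symbol, Discontinue_Date), where GeneID is `-` when no replacement exists. It follows replacement chains, since a replacement ID could itself be discontinued later. The function names are hypothetical, and this loads the whole file into a dict; filtering to the IDs of interest while parsing would be cheaper.

```python
import gzip
from collections.abc import Iterable
from typing import Optional


def parse_gene_history(lines: Iterable[str]) -> dict[int, Optional[int]]:
    """Map each Discontinued_GeneID to its replacement GeneID (None if '-')."""
    history: dict[int, Optional[int]] = {}
    for line in lines:
        if line.startswith("#"):  # header row
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        current, discontinued = fields[1], int(fields[2])
        history[discontinued] = int(current) if current.isdigit() else None
    return history


def load_gene_history(path: str) -> dict[int, Optional[int]]:
    """Read a gzipped gene_history file (assumed layout, see above)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return parse_gene_history(fh)


def resolve(gene_id: int, history: dict[int, Optional[int]]) -> Optional[int]:
    """Follow replacement links until reaching an ID that was never discontinued.

    Returns the ID itself if it is not in the history, or None if the chain
    ends in a discontinuation with no replacement.
    """
    seen: set[int] = set()
    current: Optional[int] = gene_id
    while current is not None and current in history:
        if current in seen:  # defensive guard against a malformed cycle
            return None
        seen.add(current)
        current = history[current]
    return current
```

Any ID that `resolve()` maps to a new GeneID would still need to be checked against the current gene_info.gz and re-linked to a UniProtKB protein before it could restore a GeneProtein conflation.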

gaurav added this to the Issues needing investigation milestone Oct 7, 2024