Which system codes for ChEMBL? #7
Comments
The rule I always applied to datasource system codes was
So while we may not like the identifiers.org codes, I would still recommend using them until BridgeDB as a project selects a project-wide code. As I am no longer part of the BridgeDB project, I have no input on which new codes should be approved project-wide, except of course that they should not clash with previously used (even deprecated) ones.

Christian

From: Stian Soiland-Reyes [notifications@github.com]

In commit ab02add (https://github.com/bridgedb/BridgeDb/commit/ab02add1bee33b47e45bfdee7f89190681e9bcf2), @egonw (https://github.com/egonw) added to org.bridgedb.bio/resources/org/bridgedb/bio/datasources.txt (https://github.com/bridgedb/BridgeDb/blob/ab02add1bee33b47e45bfdee7f89190681e9bcf2/org.bridgedb.bio/resources/org/bridgedb/bio/datasources.txt#L11):

ChEMBL compound Cl http://www.ebi.ac.uk/chembl/ https://www.ebi.ac.uk/chembl/compound/inspect/$id CHEMBL308052 metabolite 1 urn:miriam:chembl.compound ^CHEMBL\d+$ ChEMBL compound

Using system code Cl here clashes with the equivalent entry in org.bridgedb.rdf, which uses ChEMBLCompound. What was the reason for going with Cl? See both IdentifiersOrgDataSource.ttl (https://github.com/bridgedb/BridgeDb/blob/ab02add1bee33b47e45bfdee7f89190681e9bcf2/org.bridgedb.rdf/resources/IdentifiersOrgDataSource.ttl#L953) and IdentifiersOrgDataSource.txt (https://github.com/bridgedb/BridgeDb/blob/ab02add1bee33b47e45bfdee7f89190681e9bcf2/org.bridgedb.rdf/resources/IdentifiersOrgDataSource.txt#L84).

This (luckily) causes the IdentifersOrgReaderTest test to fail with:

Caused by: java.lang.IllegalArgumentException: System code does not match for DataSource ChEMBL compound was Cl so it can not be changed to ChEMBLCompound

The system codes used for ChEMBL within IdentifiersOrgDataSource.txt (https://github.com/bridgedb/BridgeDb/blob/ab02add1bee33b47e45bfdee7f89190681e9bcf2/org.bridgedb.rdf/resources/IdentifiersOrgDataSource.txt#L84) are not ideal:

Those are both very long, include a (wrong) version number, contain duplicates, and are inconsistent. At Identifiers.org we find the names

(but nothing for molecules, assays or target component)

Cc is already used by CCDS. After discussing this with @egonw (https://github.com/egonw) I suggest modifying org/bridgedb/bio/datasources.txt to use system codes:

CamelCasing here mimics other entries like EnMm (Ensembl Mouse). Views?
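To make the fields of the quoted datasources.txt entry concrete, here is a minimal Python sketch of how the identifier pattern and the `$id` URL template could be used together. The field values are copied from the entry above; the dictionary layout and the `resolve_url` helper are illustrative assumptions, not BridgeDb's actual API.

```python
import re

# Values taken from the quoted datasources.txt entry. The dict keys and the
# helper below are hypothetical names chosen for this sketch.
CHEMBL_COMPOUND = {
    "name": "ChEMBL compound",
    "system_code": "Cl",
    "url_pattern": "https://www.ebi.ac.uk/chembl/compound/inspect/$id",
    "id_pattern": r"^CHEMBL\d+$",
}


def resolve_url(entry, identifier):
    """Return the entry's URL for identifier, or None if the id is malformed."""
    if re.match(entry["id_pattern"], identifier) is None:
        return None
    # Plain substitution of the $id placeholder in the URL template.
    return entry["url_pattern"].replace("$id", identifier)


print(resolve_url(CHEMBL_COMPOUND, "CHEMBL308052"))
print(resolve_url(CHEMBL_COMPOUND, "308052"))
```

The first call uses the example identifier from the entry; the second lacks the CHEMBL prefix, so the pattern rejects it.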
While there are definitely codes I would have created differently, I second Christian in support of using identifiers.org codes in order to avoid re-inventing the wheel. If anyone needs to create a new identifiers.org code, you can easily submit a ticket here: http://sourceforge.net/p/identifiers-org/new-collection/new/

Anders
I also support the use of identifiers.org (http://identifiers.org) codes here.

Alasdair

On 9 September 2015 at 19:25:25, Christian Y. Brenninkmeijer (notifications@github.com) wrote:

While there are definitely codes I would have created differently, I second Christian in support of using identifiers.org codes in order to avoid re-inventing the wheel. If anyone needs to create a new identifiers.org code, you can easily submit a ticket here: http://sourceforge.net/p/identifiers-org/new-collection/new/

Anders

Alasdair J G Gray
and ensure main URL is correct This is a workaround for #16
also adds chembl.target. This fixes issue #16 and brings (at least these) system codes in line with identifiers.org. It is a bit longer than "Cl" and is arguably not a "short code", but at least it is now a bit more recognizable. Is this a controversial/breaking change (e.g. new major version)?
My proposed pull request bridgedb/BridgeDb#20 is raised as a discussion point to settle this according to what you said:

and adding the first two of these to datasources.txt of org.bridgedb.bio.

For the http://linkedchemistry.info/ identifiers I can't find any direct equivalent in ChEMBL, so I've renamed the confusing "chemblTarget", "chemblMolecule", etc.
As a side-aspect of #16 - the legacy http://linkedchemistry.info/ identifiers don't have an equivalent on https://www.ebi.ac.uk/chembl/ so I've renamed their system codes to linkedchemistry.chembl.*
See also bridgedb/BridgeDb#21 - I really struggle to do any kind of change on this.
@stain let's Skype chat in the coming week?
In commit ab02add1bee33b47e45bfdee7f89190681e9bcf2 @egonw added to org.bridgedb.bio/resources/org/bridgedb/bio/datasources.txt:

ChEMBL compound Cl http://www.ebi.ac.uk/chembl/ https://www.ebi.ac.uk/chembl/compound/inspect/$id CHEMBL308052 metabolite 1 urn:miriam:chembl.compound ^CHEMBL\d+$ ChEMBL compound

Using system code Cl here clashes with the equivalent entry in org.bridgedb.rdf, which uses ChEMBLCompound. What was the reason for going with Cl? See both IdentifiersOrgDataSource.ttl and IdentifiersOrgDataSource.txt.

This (luckily) causes the IdentifersOrgReaderTest test to fail with:

Caused by: java.lang.IllegalArgumentException: System code does not match for DataSource ChEMBL compound was Cl so it can not be changed to ChEMBLCompound

The system codes used for ChEMBL within IdentifiersOrgDataSource.txt are not ideal:

Those are both very long, include a (wrong) version number, contain duplicates, and are inconsistent.

At Identifiers.org we find the names

(but nothing for molecules, assays or target component)

Cc is already used by CCDS.

After discussing this with @egonw I suggest modifying org/bridgedb/bio/datasources.txt to use system codes:

ChC (ChEMBL compound)
ChT (ChEMBL target)
ChTC (ChEMBL Target Component), or ChP for "protein"?

CamelCasing here mimics other entries like EnMm (Ensembl Mouse).

Views?
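The two constraints discussed in this issue (a data source keeps one system code, and a code is not shared between data sources) can be sketched as a small registry. This is a Python illustration under stated assumptions: the `SystemCodeRegistry` class and its messages are hypothetical, not BridgeDb's Java implementation, and only the names and codes mentioned in this thread are used.

```python
class SystemCodeRegistry:
    """Toy registry: one system code per data source name, and vice versa."""

    def __init__(self):
        self._code_by_name = {}
        self._name_by_code = {}

    def register(self, name, code):
        # A data source that already has a code must keep it.
        existing = self._code_by_name.get(name)
        if existing is not None and existing != code:
            raise ValueError(
                f"System code does not match for DataSource {name}: "
                f"was {existing}, so it cannot be changed to {code}"
            )
        # A code already owned by another data source cannot be reused.
        owner = self._name_by_code.get(code)
        if owner is not None and owner != name:
            raise ValueError(f"System code {code} is already used by {owner}")
        self._code_by_name[name] = code
        self._name_by_code[code] = name


registry = SystemCodeRegistry()
registry.register("CCDS", "Cc")
registry.register("ChEMBL compound", "Cl")

attempts = [
    ("ChEMBL compound", "ChEMBLCompound"),  # differs from the earlier Cl
    ("ChEMBL target", "Cc"),                # Cc already belongs to CCDS
    ("ChEMBL target", "ChT"),               # one of the codes proposed above
]
for name, code in attempts:
    try:
        registry.register(name, code)
        print(f"ok: {name} -> {code}")
    except ValueError as err:
        print(f"rejected: {err}")
```

Checking both directions at registration time is what surfaces a clash early, in the same spirit as the failing test mentioned above.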