Long InChI codes crash the database refresh #401
Comments
Do we have any idea what the longest possible InChI is? Today's InChI is defined for a maximum of 1000 atoms; I read somewhere about an extension to 65K. The longest in
How is that InChIKey valid? It has too many sections (copy-paste issue?). The URL redirects OK though (DFYPFJSPLUVPFJ-QJEDTDQSSA-N). I thought we trimmed PCL to ~2000, but it seems that's sneaking through (MW 8000)? @PaulThiessen might be able to answer the InChI length question for you; I am not sure ...
I'm not actually sure about atom limits in regular InChI, but PubChem has a limit of 999 atoms (including H) for compounds (historically because that's the limit of the MOL/SDF V2000 format). I don't think there's any particular length limit for the full InChI string. The longest one in PubChem is 4789 characters (CID 160332983).
Indeed, the visible InChIKey was a cut-and-paste leftover. Fixed now.
That number is surely not coincidental ... @PaulThiessen do you know if that changed in more recent versions (that documentation was 1.04, you're now on 1.06 or 1.07 right?). I never get those log files when generating InChIs ...
We're using 1.06, although 1.07 is in the works and will be out soon. I'll ask the InChI folks directly what the current atom limit is.
Ok yes standard InChI in current versions still has a limit of 1024 atoms.
Thanks Paul!
For records with very long InChI codes, the importer doesn't fail gracefully. No validation problems are encountered, but the import crashes while trying to write the InChI code to the DB. As a result, zero records end up in the DB. CH_IUPAC is a VARCHAR(1200).
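One possible way to make this failure graceful is to check the InChI length before any write is attempted, so that a single oversized record is reported instead of aborting the whole batch. The following is only a minimal sketch in Python, not the importer's actual code: the `Record` class, its field names, and `partition_by_inchi_length` are hypothetical, and the only value taken from this issue is the 1200-character limit of CH_IUPAC.

```python
from dataclasses import dataclass

# Column limit taken from this issue: CH_IUPAC is a VARCHAR(1200).
CH_IUPAC_MAX_LEN = 1200


@dataclass
class Record:
    """Hypothetical stand-in for a parsed record; not the importer's real type."""
    accession: str  # hypothetical identifier field
    ch_iupac: str   # InChI string destined for the CH_IUPAC column


def partition_by_inchi_length(records, max_len=CH_IUPAC_MAX_LEN):
    """Split records into (importable, rejected) based on InChI length.

    Rejected entries are (accession, inchi_length) tuples so they can be
    reported to the user instead of crashing the import.
    """
    importable, rejected = [], []
    for rec in records:
        if len(rec.ch_iupac) > max_len:
            rejected.append((rec.accession, len(rec.ch_iupac)))
        else:
            importable.append(rec)
    return importable, rejected
```

Whether oversized records should be rejected like this or the column should be widened instead is a separate decision; the longest InChI in PubChem mentioned in this thread (4789 characters) would not fit in 1200 characters either way.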
Expected behaviour:
Find attached a record set of five records where one causes this problem. Note: This is a work in progress dataset used in-house and derived from Florian Huber's dataset https://zenodo.org/records/10160791 (I hope this note and the CC BY in the records fulfill the CC BY requirements...)
records.tar.gz