Multiple problems with the indexcard output format #723

Closed
enoriega opened this issue Jan 12, 2021 · 15 comments

@enoriega
Member

There are a few bugs in the indexcard output format, which I think are caused by changes made to data structures after the output format was created.

Below are all the exceptions that appeared in the log after a few days of running.

java.lang.IllegalArgumentException: requirement failed: Controllers of an Activation must be Entities!
java.lang.NegativeArraySizeException
java.lang.reflect.InvocationTargetException
java.lang.RuntimeException: ERROR: argument type 'event' not supported!
java.lang.RuntimeException: ERROR: event type conversion not supported!
java.lang.RuntimeException: ERROR: unknown event type: Disease in event:
java.lang.RuntimeException: ERROR: unknown event type: Family in event:
java.lang.RuntimeException: ERROR: unknown event type: Gene_or_gene_product in event:
java.lang.RuntimeException: ERROR: unknown event type: Simple_chemical in event:
java.util.NoSuchElementException: key not found: controlled
java.util.NoSuchElementException: key not found: controller
java.util.NoSuchElementException: key not found: theme
java.util.NoSuchElementException: next on empty iterator

The log file and a couple of papers that reproduce the errors are attached to the issue.
error.log
PMC4543788.nxml.txt
PMC5809884.nxml.txt
PMC6086911.nxml.txt

@MihaiSurdeanu, since this output format doesn't seem relevant today, are these errors worth fixing?

enoriega added the bug label Jan 12, 2021
@MihaiSurdeanu
Contributor

This is a format that really nobody uses anymore. I propose to remove it.
@kwalcock: can you please do it when you get a chance?

@kwalcock
Member

Yes. Not having looked at it yet, I wonder whether it is easier to fix than to remove. The last time something was removed, it had to be added back. However, I can certainly follow instructions.

@MihaiSurdeanu
Contributor

Either way.
For historical background, though: this format was used in an early DARPA eval and was abandoned afterwards.

@kwalcock
Member

Here are some more details about the exceptions that were thrown. Some don't seem connected to the output at all; they were problems encountered before any output was generated, in a stage I call "reading" here. Some seem to have come from formats other than indexcard.

| Exception | Format | Plan |
|---|---|---|
| java.lang.IllegalArgumentException: requirement failed: Controllers of an Activation must be Entities! | fries | |
| java.lang.NegativeArraySizeException | serial-json | |
| java.lang.reflect.InvocationTargetException | reading | |
| java.lang.RuntimeException: ERROR: argument type 'event' not supported! | indexcard | |
| java.lang.RuntimeException: ERROR: event type conversion not supported! | indexcard | Convert error to warning |
| java.lang.RuntimeException: ERROR: unknown event type: Disease in event: | indexcard | Convert error to warning |
| java.lang.RuntimeException: ERROR: unknown event type: Family in event: | indexcard | Convert error to warning |
| java.lang.RuntimeException: ERROR: unknown event type: Gene_or_gene_product in event: | indexcard | Convert error to warning |
| java.lang.RuntimeException: ERROR: unknown event type: Simple_chemical in event: | indexcard | Convert error to warning |
| java.util.NoSuchElementException: key not found: controlled | reading | |
| java.util.NoSuchElementException: key not found: controller | reading | |
| java.util.NoSuchElementException: key not found: theme | reading & fries | |
| java.util.NoSuchElementException: next on empty iterator | cmu | |
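The "Convert error to warning" plan amounts to replacing a `throw` with a logged warning and letting the caller skip the offending mention. A minimal sketch in Java, assuming a hypothetical `IndexCardTypeMapping` class with a made-up label-to-card-type table (the real indexcard code and its type names are not shown in this thread):

```java
import java.util.Map;
import java.util.Optional;
import java.util.logging.Logger;

public class IndexCardTypeMapping {
    private static final Logger LOG = Logger.getLogger(IndexCardTypeMapping.class.getName());

    // Hypothetical label -> card-type table; placeholder values, not the real indexcard names.
    private static final Map<String, String> CARD_TYPES = Map.of(
        "Activation", "activation-card",
        "Binding", "binding-card");

    // Instead of throwing RuntimeException("ERROR: unknown event type: ..."),
    // log a warning and return empty so the caller can skip this mention.
    static Optional<String> cardType(String label) {
        String type = CARD_TYPES.get(label);
        if (type == null) {
            LOG.warning("unknown event type: " + label + "; skipping mention");
            return Optional.empty();
        }
        return Optional.of(type);
    }

    public static void main(String[] args) {
        System.out.println(cardType("Binding")); // Optional[binding-card]
        // A label like Disease (from the table) now yields a warning instead of an exception.
        System.out.println(cardType("Disease")); // Optional.empty
    }
}
```

Logging at WARNING keeps these cases visible in the log without aborting the output for the whole paper.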

@MihaiSurdeanu
Contributor

Hmm. Some of these seem to be errors in the format code, while others are legitimate exceptions in the data that should be handled.
To make this more manageable, @enoriega: can you please create a unit test for each of these exceptions, ideally with a single sentence per test? Then I can take a look at each and hopefully either fix it or tell you what needs to be done.

Thanks!

enoriega self-assigned this Jan 16, 2021
@kwalcock
Member

If I had had the corresponding files, I would have already volunteered to do it. Feel free to reassign.

@enoriega
Member Author

enoriega commented Jan 18, 2021

I updated the table with a reference to a file for each error kind. The attached zip contains all the referenced input files. I'll get some example sentences for each error.

| Exception | Format | Plan | PMCID | Sentence |
|---|---|---|---|---|
| java.lang.IllegalArgumentException: requirement failed: Controllers of an Activation must be Entities! | fries | | PMC4265014 | N/A |
| java.lang.NegativeArraySizeException | serial-json | | PMC7176272 | |
| java.lang.reflect.InvocationTargetException | reading | | PMC7040422 | |
| java.lang.RuntimeException: ERROR: argument type 'event' not supported! | indexcard | | PMC3822968 | |
| java.lang.RuntimeException: ERROR: event type conversion not supported! | indexcard | Convert error to warning | PMC3822968 | |
| java.lang.RuntimeException: ERROR: unknown event type: Disease in event: | indexcard | Convert error to warning | PMC6539695 | |
| java.lang.RuntimeException: ERROR: unknown event type: Family in event: | indexcard | Convert error to warning | PMC5327768 | |
| java.lang.RuntimeException: ERROR: unknown event type: Gene_or_gene_product in event: | indexcard | Convert error to warning | PMC5985311 | |
| java.lang.RuntimeException: ERROR: unknown event type: Simple_chemical in event: | indexcard | Convert error to warning | PMC6213605 | |
| java.util.NoSuchElementException: key not found: controlled | reading | | PMC5504966 | Bacteria in the human gut can produce hydrogen gas , and hydrogen can be converted to methane in the gut by methane producing bacteria [ 15 ] . |
| java.util.NoSuchElementException: key not found: controller | reading | | PMC5809884 | ( 2 ) Noise exposure led to enhanced JNK phosphorylation and IRS1 serine phosphorylation as well as reduced Akt phosphorylation in skeletal muscles in response to exogenous insulin stimulation . |
| java.util.NoSuchElementException: key not found: theme | reading & fries | | PMC6940835 | Activated ANP is a peptide hormone consisting of 28 amino acids that binds to NPR1 , a receptor in target organs such as the kidneys and peripheral blood vessels , converting intracellular GTP into cGMP to promote the excretion of Na+ , inhibit Na+ reuptake , and induce vasodilation [ 16,17 ] . |
| java.util.NoSuchElementException: next on empty iterator | cmu | | PMC6681624 | N/A |

nxml.zip

@kwalcock
Member

kwalcock commented Jan 20, 2021

Thank you. I'll get to them soon.

@enoriega
Member Author

Thanks @kwalcock. Some comments: the errors in the rows with N/A in the sentence column are not triggered by a single sentence but by the assembly procedure, which I believe aggregates multiple interactions, so it is the corresponding document as a whole that triggers the error.
For the rows where the plan is to convert the error to a warning, I didn't bother to find a sentence.
For the rows where the sentence field is empty, I haven't been able to reproduce the error yet. Maybe some of the most recent changes fixed them, but I am still trying to locate a culprit.

@kwalcock
Member

kwalcock commented Jan 20, 2021

I'll update this as they are figured out.

| Exception | Format | Plan | PMCID | Sentence |
|---|---|---|---|---|
| java.lang.IllegalArgumentException: requirement failed: Controllers of an Activation must be Entities! | fries | Allow Regulations as controllers of an Activation | PMC4265014 | Related to these sentences: ADP promotes platelet activation through its receptors (P2Y1 and P2Y12. A novel finding is that nifedipine greatly inhibits the release of PPAR-β/-γ from activated platelets, thereby increasing the intracellular availability of PPAR-β/-γ which may enhance its cellular functions like the regulation of platelet activation. |
| java.util.NoSuchElementException: key not found: theme | fries | Just return the BioEventMention itself if there is no theme. | PMC6940835 | Activated ANP is a peptide hormone consisting of 28 amino acids that binds to NPR1 , a receptor in target organs such as the kidneys and peripheral blood vessels , converting intracellular GTP into cGMP to promote the excretion of Na+ , inhibit Na+ reuptake , and induce vasodilation [ 16,17 ] . |
| java.lang.NegativeArraySizeException | serial-json | unsolved | PMC7176272 | There seems to be an infinite loop somewhere. |
| java.lang.RuntimeException: ERROR: argument type 'event' not supported! | indexcard | Convert error to warning | PMC3822968 | The platelet glycoprotein Ibα (GPIbα) and P-selectin glycoprotein ligand (PSGL-1) receptors bind to the endothelial P-selectin initiating platelet rolling, whereas the subsequent firm adhesion is mediated through αIIbβ3 integrin and P-selectin. |
| java.lang.RuntimeException: ERROR: unknown event type: Disease in event: | indexcard | Convert error to warning | PMC6539695 | Among them, nucleotide anti-mir21 drugs inhibit colon cancer cell metastasis up-regulating PDCD4-protein levels in in vitro experiments [100]. |
| java.lang.RuntimeException: ERROR: unknown event type: Family in event: | indexcard | Convert error to warning | PMC5327768 | **TGF-β signaling proceeds through two pathways, canonically through Smad7 to activate the Smad2/3 and Smad4 binding to activate transcription and a Smad independent pathway that proceeds through the p38 MAPK and JNK [52,53].** |
| java.lang.RuntimeException: ERROR: unknown event type: Gene_or_gene_product in event: | indexcard | Convert error to warning | PMC5985311 | Inhibition of α-klotho using a neutralizing antibody specifically blocks FGF23-mediated activation of AKT/eNOS and consequently the release of NO. |
| java.lang.RuntimeException: ERROR: unknown event type: Simple_chemical in event: | indexcard | Convert error to warning | PMC6213605 | |
| java.util.NoSuchElementException: next on empty iterator | cmu | Use "NONE" as mechanism type when there is no evidence | PMC6681624 | Ankrd2 expression in the heart is potentially regulated by cardiac specific transcription factors Nkx2.5 , Hand2 , and Ankrd1 as demonstrated by their interaction with the ANKRD2 promoter or by dual luciferase assay [ 20 , 21 ] . |
| java.lang.reflect.InvocationTargetException | reading | Catch exception generated when trigger has no head | PMC7040422 | Other mechanisms involved in asthma physiopathology are the inhalation of drugs , as well as respiratory viruses [ 8] , which promote an immune response mediated by IgG antibodies . |
| java.util.NoSuchElementException: key not found: controlled | reading | Account for missing controller and controlled | PMC5504966 | Bacteria in the human gut can produce hydrogen gas , and hydrogen can be converted to methane in the gut by methane-producing bacteria [ 15 ] . |
| java.util.NoSuchElementException: key not found: controller | reading | Account for missing controller and controlled | PMC5809884 | ( 2 ) Noise exposure led to enhanced JNK phosphorylation and IRS1 serine phosphorylation as well as reduced Akt phosphorylation in skeletal muscles in response to exogenous insulin stimulation . |
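Several of the plans in this table share one pattern: a lookup that assumes an argument or a piece of evidence exists. In Scala, `Map.apply` on a missing key throws `NoSuchElementException` with the message "key not found: <key>", and `next()` on an exhausted iterator throws it with "next on empty iterator", which would produce exactly the messages listed above. The sketch below is in Java with hypothetical names (`argument`, `mechanismType`), and it models a mention's arguments as a plain `Map<String, List<String>>` rather than the project's actual mention classes. It shows the defensive versions: an `Optional` for a missing role and a "NONE" fallback when there is no evidence.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class MissingArgumentHandling {
    // Missing role -> Optional.empty() instead of an exception.
    // (A Scala Map.apply here would throw NoSuchElementException("key not found: <role>").)
    static Optional<List<String>> argument(Map<String, List<String>> arguments, String role) {
        return Optional.ofNullable(arguments.get(role));
    }

    // No evidence -> "NONE" instead of calling next() on an empty iterator.
    static String mechanismType(Iterator<String> evidenceTypes) {
        return evidenceTypes.hasNext() ? evidenceTypes.next() : "NONE";
    }

    public static void main(String[] args) {
        // A regulation for which the reader found a controlled argument but no controller.
        Map<String, List<String>> regulation = Map.of("controlled", List.of("IRS1"));
        System.out.println(argument(regulation, "controller").isPresent());       // false
        System.out.println(argument(regulation, "controlled").orElse(List.of())); // [IRS1]
        System.out.println(mechanismType(List.<String>of().iterator()));          // NONE
    }
}
```

Callers then decide per case what "missing" means, for example returning the event mention itself when there is no theme, as the plan for that row suggests.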

@kwalcock
Member

This java.lang.NegativeArraySizeException for PMC7176272 is very suspicious. It doesn't occur anywhere near any of our code that could be subtracting incorrectly. The 300KB input file takes a very, very long time to process: my computer ran overnight, and I see in the log that Enrique's run worked on it for 22 hours. When I paused it periodically, the stack was very, very long. There seemed to be about one stack frame for every single one of some 4000+ mentions, and it was building up some monster JSON structure. I couldn't easily tell whether there was some kind of loop, but I wonder whether some Mentions are linked to each other in a circle. Generating the output involves buffers that resize as they grow. If a new size is computed as Integer.MAX_VALUE + 1 (2,147,483,648, roughly 2 billion) or more, the int arithmetic wraps around to a negative number and this exception can be thrown. I think the program is trying to build about 2GB of JSON output in a single string, which could easily take all night. Has something like this happened before? Any hints are welcome before I look at it again.
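The overflow mechanism is easy to reproduce in isolation: Java `int` arithmetic wraps around silently, so a buffer-growth computation that exceeds `Integer.MAX_VALUE` (2,147,483,647) yields a negative size, and allocating an array of that size throws `NegativeArraySizeException`. A minimal sketch, assuming a hypothetical doubling policy `grownCapacity`; the thread does not identify which buffer class actually failed, and some JDK buffer classes guard against this overflow and throw `OutOfMemoryError` instead.

```java
public class BufferGrowthOverflow {
    // Hypothetical growth policy: double the capacity plus 2, using 32-bit int arithmetic.
    static int grownCapacity(int current) {
        return current * 2 + 2;
    }

    public static void main(String[] args) {
        // int arithmetic wraps silently: MAX_VALUE + 1 becomes MIN_VALUE.
        System.out.println(Integer.MAX_VALUE + 1); // -2147483648

        // A buffer already holding a little over a billion chars asks to grow.
        int current = Integer.MAX_VALUE / 2 + 10;
        int next = grownCapacity(current);
        System.out.println(next); // -2147483628

        try {
            char[] buffer = new char[next];
            System.out.println("allocated " + buffer.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException");
        }
    }
}
```

Because the failure only appears once the output approaches 2^31 elements, it says more about the size of what is being serialized than about the code that finally throws.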

@enoriega
Member Author

What you say sounds plausible, and I think this corner case is too bizarre to be worth fixing. Instead, we can keep this note in a "Knowledge Base" page somewhere in the wiki in case it ever happens again.

@kwalcock
Member

So far I haven't noticed anything in the serialization code that guards against loops, such as a set of already visited Mentions being passed around. Perhaps a short unit test can at least show what would happen if that were ever the case.
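A test of that kind could be paired with a serializer that does guard against cycles. The sketch below is hypothetical: a `Node` class stands in for a Mention, and the JSON is hand-rolled rather than produced by the project's serializer. It tracks the nodes on the current recursion path in an identity-based set and writes a `{"ref": id}` back-reference when it meets one of them again, so a two-node cycle terminates instead of recursing until the stack or the output buffer gives out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class CycleSafeSerializer {
    // Hypothetical stand-in for a Mention whose arguments point at other mentions.
    static final class Node {
        final String id;
        final List<Node> args = new ArrayList<>();
        Node(String id) { this.id = id; }
    }

    // Depth-first serialization; a node already on the current path becomes a back-reference.
    static String toJson(Node n, Set<Node> onPath) {
        if (onPath.contains(n)) {
            return "{\"ref\":\"" + n.id + "\"}";
        }
        onPath.add(n);
        StringBuilder sb = new StringBuilder("{\"id\":\"").append(n.id).append("\",\"args\":[");
        for (int i = 0; i < n.args.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(toJson(n.args.get(i), onPath));
        }
        onPath.remove(n); // leaving this node: shared (non-cyclic) children may be expanded again
        return sb.append("]}").toString();
    }

    static String toJson(Node root) {
        // Identity-based set: nodes are compared by reference, not by equals().
        Set<Node> onPath = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        return toJson(root, onPath);
    }

    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B");
        a.args.add(b);
        b.args.add(a); // A -> B -> A: a naive recursive serializer would never finish
        System.out.println(toJson(a)); // {"id":"A","args":[{"id":"B","args":[{"ref":"A"}]}]}
    }
}
```

Tracking only the current path means a mention shared by two parents is still written out under both, so only genuine cycles are cut short; tracking every visited node instead would turn those shared mentions into references as well.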

@MihaiSurdeanu
Contributor

I have seen this in the past, but very infrequently.
I agree that this sounds like an infinite loop, but I'm not sure where it's coming from.

@kwalcock
Member

Addressed by PR #724 with one moved to issue #736.
