Multiple problems with the indexcard output format #723

enoriega · 2021-01-12T21:35:41Z

There are a few bugs in the indexcard output format which I think stem from changes made to the data structures after the output format was created.

Below are all the exceptions that appear in the log after a few days of running.

java.lang.IllegalArgumentException: requirement failed: Controllers of an Activation must be Entities!
java.lang.NegativeArraySizeException
java.lang.reflect.InvocationTargetException
java.lang.RuntimeException: ERROR: argument type 'event' not supported!
java.lang.RuntimeException: ERROR: event type conversion not supported!
java.lang.RuntimeException: ERROR: unknown event type: Disease in event:
java.lang.RuntimeException: ERROR: unknown event type: Family in event:
java.lang.RuntimeException: ERROR: unknown event type: Gene_or_gene_product in event:
java.lang.RuntimeException: ERROR: unknown event type: Simple_chemical in event:
java.util.NoSuchElementException: key not found: controlled
java.util.NoSuchElementException: key not found: controller
java.util.NoSuchElementException: key not found: theme
java.util.NoSuchElementException: next on empty iterator

Please find the log file and a couple papers to reproduce it attached to the issue.
error.log
PMC4543788.nxml.txt
PMC5809884.nxml.txt
PMC6086911.nxml.txt

@MihaiSurdeanu Since this output format doesn't seem relevant today, are the errors worth fixing?


MihaiSurdeanu · 2021-01-12T21:58:21Z

This is a format that really nobody uses anymore. I propose to remove it.
@kwalcock : can you please do it when you get a chance?

kwalcock · 2021-01-12T22:04:00Z

Yes. Not having looked at it yet, I wonder whether it is easier to fix than to remove. Last time something was removed, it had to be added back. However, I can certainly follow instructions.

MihaiSurdeanu · 2021-01-12T22:06:41Z

Either way...
But, for historic background, this was a format that was used in an early DARPA eval, and was abandoned after.

kwalcock · 2021-01-16T02:15:06Z

Here are some more details about the exceptions that were thrown. Some don't seem connected to the output but were problems encountered before the output, which I call reading here. Some seem to have been for non-indexcard formats.

Exception	Format	Plan
java.lang.IllegalArgumentException: requirement failed: Controllers of an Activation must be Entities!	fries
java.lang.NegativeArraySizeException	serial-json
java.lang.reflect.InvocationTargetException	reading
java.lang.RuntimeException: ERROR: argument type 'event' not supported!	indexcard
java.lang.RuntimeException: ERROR: event type conversion not supported!	indexcard	Convert error to warning
java.lang.RuntimeException: ERROR: unknown event type: Disease in event:	indexcard	Convert error to warning
java.lang.RuntimeException: ERROR: unknown event type: Family in event:	indexcard	Convert error to warning
java.lang.RuntimeException: ERROR: unknown event type: Gene_or_gene_product in event:	indexcard	Convert error to warning
java.lang.RuntimeException: ERROR: unknown event type: Simple_chemical in event:	indexcard	Convert error to warning
java.util.NoSuchElementException: key not found: controlled	reading
java.util.NoSuchElementException: key not found: controller	reading
java.util.NoSuchElementException: key not found: theme	reading & fries
java.util.NoSuchElementException: next on empty iterator	cmu

MihaiSurdeanu · 2021-01-16T02:40:12Z

Hmm. Some of these seem to be errors in the format code. Some are legit exceptions in the data that should be handled.
To make this more manageable to fix, @enoriega: can you please create a unit test for each of these exceptions, ideally using a single sentence per test? Then I can take a look at each, and hopefully either fix them or tell you what needs to be done.

Thanks!

kwalcock · 2021-01-16T04:10:31Z

If I had had the corresponding files, I would have already volunteered to do it. Feel free to reassign.

enoriega · 2021-01-18T21:40:54Z

I updated the table with a reference to a file for each error kind. The attached zip contains all the referenced input files. I'll get some example sentences for each error.

Exception	Format	Plan	PMCID	Sentence
java.lang.IllegalArgumentException: requirement failed: Controllers of an Activation must be Entities!	fries		PMC4265014	N/A
java.lang.NegativeArraySizeException	serial-json		PMC7176272
java.lang.reflect.InvocationTargetException	reading		PMC7040422
java.lang.RuntimeException: ERROR: argument type 'event' not supported!	indexcard		PMC3822968
java.lang.RuntimeException: ERROR: event type conversion not supported!	indexcard	Convert error to warning	PMC3822968
java.lang.RuntimeException: ERROR: unknown event type: Disease in event:	indexcard	Convert error to warning	PMC6539695
java.lang.RuntimeException: ERROR: unknown event type: Family in event:	indexcard	Convert error to warning	PMC5327768
java.lang.RuntimeException: ERROR: unknown event type: Gene_or_gene_product in event:	indexcard	Convert error to warning	PMC5985311
java.lang.RuntimeException: ERROR: unknown event type: Simple_chemical in event:	indexcard	Convert error to warning	PMC6213605
java.util.NoSuchElementException: key not found: controlled	reading		PMC5504966	Bacteria in the human gut can produce hydrogen gas , and hydrogen can be converted to methane in the gut by methane producing bacteria [ 15 ] .
java.util.NoSuchElementException: key not found: controller	reading		PMC5809884	( 2 ) Noise exposure led to enhanced JNK phosphorylation and IRS1 serine phosphorylation as well as reduced Akt phosphorylation in skeletal muscles in response to exogenous insulin stimulation .
java.util.NoSuchElementException: key not found: theme	reading & fries		PMC6940835	Activated ANP is a peptide hormone consisting of 28 amino acids that binds to NPR1 , a receptor in target organs such as the kidneys and peripheral blood vessels , converting intracellular GTP into cGMP to promote the excretion of Na , inhibit Na reuptake , and induce vasodilation [ 16,17 ] .
java.util.NoSuchElementException: next on empty iterator	cmu		PMC6681624	N/A

nxml.zip

kwalcock · 2021-01-20T01:25:24Z

Thank you. I'll get to them soon.

enoriega · 2021-01-20T01:34:09Z

Thanks @kwalcock. Some comments: The errors referred to by the rows with N/A in the sentence column are not triggered by a sentence, but by the assembly procedure, which I believe is a form of aggregation of multiple interactions. The corresponding documents trigger the error.
For the rows where the plan is to convert the error to a warning, I didn't bother to find a sentence.
For the rows where the sentence field is empty, I haven't been able to reproduce the error yet. Maybe some of the most recent changes fixed them, but I am still trying to locate a culprit.

kwalcock · 2021-01-20T20:33:31Z

I'll update this as they are figured out.

Exception	Format	Plan	PMCID	Sentence
java.lang.IllegalArgumentException: requirement failed: Controllers of an Activation must be Entities!	fries	Allow Regulations as controllers of an Activation	PMC4265014	Related to these sentences: ADP promotes platelet activation through its receptors (P2Y1 and P2Y12. A novel finding is that nifedipine greatly inhibits the release of PPAR-β/-γ from activated platelets, thereby increasing the intracellular availability of PPAR-β/-γ which may enhance its cellular functions like the regulation of platelet activation.
java.util.NoSuchElementException: key not found: theme	fries	Just return the BioEventMention itself if there is no theme.	PMC6940835	Activated ANP is a peptide hormone consisting of 28 amino acids that binds to NPR1 , a receptor in target organs such as the kidneys and peripheral blood vessels , converting intracellular GTP into cGMP to promote the excretion of Na , inhibit Na reuptake , and induce vasodilation [ 16,17 ] .
java.lang.NegativeArraySizeException	serial-json	unsolved	PMC7176272	There seems to be an infinite loop somewhere.
java.lang.RuntimeException: ERROR: argument type 'event' not supported!	indexcard	Convert error to warning	PMC3822968	The platelet glycoprotein Ibα (GPIbα) and P-selectin glycoprotein ligand (PSGL-1) receptors bind to the endothelial P-selectin initiating platelet rolling, whereas the subsequent firm adhesion is mediated through αIIbβ3 integrin and P-selectin.
java.lang.RuntimeException: ERROR: unknown event type: Disease in event:	indexcard	Convert error to warning	PMC6539695	Among them, nucleotide anti-mir21 drugs inhibit colon cancer cell metastasis up-regulating PDCD4-protein levels in in vitro experiments [100].
java.lang.RuntimeException: ERROR: unknown event type: Family in event:	indexcard	Convert error to warning	PMC5327768	TGF-β signaling proceeds through two pathways, canonically through Smad7 to activate the Smad2/3 and Smad4 binding to activate transcription and a Smad independent pathway that proceeds through the p38 MAPK and JNK [52,53].
java.lang.RuntimeException: ERROR: unknown event type: Gene_or_gene_product in event:	indexcard	Convert error to warning	PMC5985311	Inhibition of α-klotho using a neutralizing antibody specifically blocks FGF23-mediated activation of AKT/eNOS and consequently the release of NO.
java.lang.RuntimeException: ERROR: unknown event type: Simple_chemical in event:	indexcard	Convert error to warning	PMC6213605
java.util.NoSuchElementException: next on empty iterator	cmu	Use "NONE" as mechanism type when there is no evidence	PMC6681624	Ankrd2 expression in the heart is potentially regulated by cardiac specific transcription factors Nkx2.5 , Hand2 , and Ankrd1 as demonstrated by their interaction with the ANKRD2 promoter or by dual luciferase assay [ 20 , 21 ] .
java.lang.reflect.InvocationTargetException	reading	Catch exception generated when trigger has no head	PMC7040422	Other mechanisms involved in asthma physiopathology are the inhalation of drugs , as well as respiratory viruses [ 8] , which promote an immune response mediated by IgG antibodies .
java.util.NoSuchElementException: key not found: controlled	reading	Account for missing controller and controlled	PMC5504966	Bacteria in the human gut can produce hydrogen gas , and hydrogen can be converted to methane in the gut by methane-producing bacteria [ 15 ] .
java.util.NoSuchElementException: key not found: controller	reading	Account for missing controller and controlled	PMC5809884	( 2 ) Noise exposure led to enhanced JNK phosphorylation and IRS1 serine phosphorylation as well as reduced Akt phosphorylation in skeletal muscles in response to exogenous insulin stimulation .
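
Two of the plans in the table, "Convert error to warning" and "Account for missing controller and controlled", follow the same pattern: record the problem and skip the mention rather than throw. A minimal Java sketch of that pattern follows. It is not the project's code: the class `LenientOutputSketch`, its methods, the string-based argument map, and the set of supported labels are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: none of these names come from the real output code.
public class LenientOutputSketch {
    // Illustrative label set only; the real formats support other event types.
    private static final Set<String> SUPPORTED =
            new HashSet<>(Arrays.asList("Regulation", "Activation"));

    private final List<String> warnings = new ArrayList<>();

    public List<String> warnings() {
        return warnings;
    }

    // A missing role yields Optional.empty() instead of a NoSuchElementException.
    public static Optional<String> argument(Map<String, String> arguments, String role) {
        return Optional.ofNullable(arguments.get(role));
    }

    // Returns a rendered card, or empty (plus a recorded warning) if the mention cannot be rendered.
    public Optional<String> card(String eventType, Map<String, String> arguments) {
        if (!SUPPORTED.contains(eventType)) {
            // Previously this kind of case raised a RuntimeException; here it is only noted.
            warnings.add("unknown event type: " + eventType);
            return Optional.empty();
        }
        Optional<String> controller = argument(arguments, "controller");
        Optional<String> controlled = argument(arguments, "controlled");
        if (!controller.isPresent() || !controlled.isPresent()) {
            warnings.add("missing controller or controlled in " + eventType);
            return Optional.empty();
        }
        return Optional.of(eventType + "(" + controller.get() + " -> " + controlled.get() + ")");
    }
}
```

With this shape, one unsupported or incomplete mention costs a warning entry and a skipped card, and the rest of the document is still written out.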

kwalcock · 2021-01-21T21:18:25Z

This java.lang.NegativeArraySizeException for PMC7176272 is very suspicious. It doesn't occur anywhere near any of our code that could be subtracting wrong. The input file of 300KB takes a very, very long time to process. My computer ran overnight and I see in the log that Enrique worked on it for 22 hours. When I paused it periodically I noticed that the stack was very, very long. It looked like there was about one stack frame for every single one of some 4000+ mentions and it was building up some monster json structure. I couldn't easily tell if there was some kind of loop, but I wonder if there are some Mentions linked to each other in a circle. In generating the output there are buffers involved which are resizing. If something is resized to Integer.MAX_VALUE + 1, which is only 2,147,483,648 or 2GB, this exception can be thrown. I think the program is trying to build 2GB of json output in a string. It might take all night to do that. Has something like this happened before? I'll accept hints that anyone can offer before looking again.
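
The arithmetic behind that guess can be sketched in a few lines of Java. This only illustrates how a capacity computed in `int` can wrap to a negative number and then fail at allocation. The class and method names (`OverflowSketch`, `grownCapacity`, `allocationFails`) are hypothetical, the doubling policy is an assumption, and it is not known which buffer in the real run overflowed.

```java
// Hypothetical sketch: not taken from the serialization code.
public class OverflowSketch {
    // Assumed growth policy: double the capacity using plain int arithmetic.
    public static int grownCapacity(int current) {
        return current * 2; // wraps silently once the result exceeds Integer.MAX_VALUE
    }

    // Returns true if allocating an array of the given size throws NegativeArraySizeException.
    public static boolean allocationFails(int size) {
        try {
            byte[] buffer = new byte[size];
            return buffer.length != size; // false for any successful allocation
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int oneGiB = 1 << 30;                // 1,073,741,824
        int doubled = grownCapacity(oneGiB); // 2^31 does not fit in an int
        System.out.println(doubled);
        System.out.println(allocationFails(doubled));
    }
}
```

Code that guards against this usually checks for overflow before allocating, for example with `Math.multiplyExact`, which throws an `ArithmeticException` rather than wrapping.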

enoriega · 2021-01-21T21:23:42Z

What you say sounds plausible. I think this corner case is too bizarre to be worth fixing. We can instead keep this note in a "Knowledge Base" somewhere in the wiki in case it happens again.

kwalcock · 2021-01-21T21:31:40Z

I haven't yet noticed in the serialization code anything that is looking out for loops, like a list of already visited Mentions being passed around. Perhaps a short unit test can at least show what would result if that were ever to happen.
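
As a rough idea of what such a guard and test could look like, here is a minimal Java sketch. It assumes a hypothetical `Node` type standing in for a Mention, and a toy JSON writer that does no string escaping; neither reflects the actual serialization code. The set of nodes on the current path is passed down the recursion, and a node that is met again is written as a short reference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: Node stands in for a Mention; labels are not JSON-escaped.
public class CycleGuardSketch {
    public static final class Node {
        public final String label;
        public final List<Node> arguments = new ArrayList<>();

        public Node(String label) {
            this.label = label;
        }
    }

    public static String serialize(Node root) {
        // Identity-based set: two distinct nodes with equal labels are not confused.
        Set<Node> inProgress = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        StringBuilder out = new StringBuilder();
        write(root, inProgress, out);
        return out.toString();
    }

    private static void write(Node node, Set<Node> inProgress, StringBuilder out) {
        if (!inProgress.add(node)) {
            // The node is already on the current path, so recursing again would never end.
            out.append("{\"ref\":\"").append(node.label).append("\"}");
            return;
        }
        out.append("{\"label\":\"").append(node.label).append("\",\"arguments\":[");
        for (int i = 0; i < node.arguments.size(); i++) {
            if (i > 0) {
                out.append(',');
            }
            write(node.arguments.get(i), inProgress, out);
        }
        out.append("]}");
        inProgress.remove(node); // shared but acyclic nodes may still be written more than once
    }
}
```

A short unit test can then link two nodes to each other and check that `serialize` returns at all. Without the guard, the same input would recurse until the stack or the output buffer gives out.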

MihaiSurdeanu · 2021-01-22T02:30:42Z

I have seen this in the past, but very infrequently...
I agree that this sounds like an infinite loop, but I'm not sure where it's coming from...

kwalcock · 2021-01-27T17:53:43Z

Addressed by PR #724 with one moved to issue #736.

enoriega added the bug label Jan 12, 2021

enoriega self-assigned this Jan 16, 2021

kwalcock mentioned this issue Jan 27, 2021

Possible infinite loop in serial-json output #736

Open

kwalcock closed this as completed Jan 27, 2021
