Possible infinite loop in serial-json output #736

Open
kwalcock opened this issue Jan 27, 2021 · 4 comments

@kwalcock
Member

As noted in #723, reach seems to hang and never make it through this particular file with this output format, at least until some internal part overflows, positive numbers become negative, and an exception is thrown a day later.

| Exception | Format | Plan | PMCID | Sentence |
|---|---|---|---|---|
| java.lang.NegativeArraySizeException | serial-json | unsolved | PMC7176272 | There seems to be an infinite loop somewhere. |

PMC7176272.nxml.txt

@kwalcock
Member Author

kwalcock commented Feb 5, 2021

There seems to be something strange going on with the Antecedents related to the Anaphoric trait. There are very long chains of them. I stopped when they got to 100. When printing output for one of them, all 100 antecedents must be printed. For the 99th, its 98 must be printed; for the 98th, its 97; and so on. This quickly explodes. There is a loop detector now and, if it is trustworthy, there are no loops, but these long chains are probably causing problems. Who knows anything about them? I am counting them as they are being output here:

case hasAntecedents if hasAntecedents.nonEmpty => hasAntecedents.map(m => m.asInstanceOf[CorefMention].jsonAST)
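
To get a feel for how fast this can grow, here is a minimal sketch. It assumes that each mention in a chain lists every earlier mention as an antecedent, and that each antecedent is serialized in full, recursively. That shape is only one reading of the description above; `Mention`, `buildChain`, `naiveVisits`, and `flatVisits` are hypothetical names for illustration, not Reach code.

```scala
// Hypothetical stand-in for a coreference mention; this is NOT Reach's CorefMention.
final case class Mention(id: Int, antecedents: Seq[Mention])

object AntecedentBlowup {
  // Builds mentions 0..n, where mention k lists all earlier mentions as antecedents
  // (an assumed shape, chosen to match the "100, then 99, then 98" description).
  def buildChain(n: Int): Mention = {
    val built = (0 to n).foldLeft(Vector.empty[Mention]) { (acc, id) =>
      acc :+ Mention(id, acc)
    }
    built.last
  }

  // Counts how many mentions a naive recursive serializer would write out:
  // the mention itself plus the full output of each of its antecedents.
  def naiveVisits(m: Mention): BigInt =
    BigInt(1) + m.antecedents.map(naiveVisits).sum

  // Alternative sketch: write each mention once and refer to antecedents by id.
  // Returns the number of distinct mentions that would be written.
  def flatVisits(m: Mention): Int = {
    val seen = scala.collection.mutable.Set.empty[Int]
    def go(x: Mention): Unit =
      if (seen.add(x.id)) x.antecedents.foreach(go)
    go(m)
    seen.size
  }

  def main(args: Array[String]): Unit = {
    val top = buildChain(20)
    println(naiveVisits(top)) // 1048576, i.e. 2^20
    println(flatVisits(top))  // 21
  }
}
```

Under these assumptions the naive count doubles with every extra link, so a chain of 100 would need on the order of 2^100 writes even though no loop exists. Writing each mention once and referring to antecedents by id keeps the output linear in the number of mentions, but whether that suits the serial-json format is a separate question.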

@MihaiSurdeanu
Contributor

This was probably done by @danebell. @danebell: any chance you can look into this probable infinite loop thing?
Thank you!

@herongrove

herongrove commented Feb 5, 2021

Absolutely. Let me take a look at it and get back to you.

@kwalcock
Member Author

kwalcock commented Feb 5, 2021

@danebell, please see branch kwalcock-loop and the test TestLoop (which I just updated). It should access PMC7176272.nxml, which is one of the test resources. It is a large file in which something like 4000 mentions are found. Finding them is not a big problem, but outputting them takes overnight and seems to break when a buffer exceeds 2 GB. I can figure it out eventually, but if someone has a large head start, they should be consulted.
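
For context on how a buffer growing past 2 GB could end in a NegativeArraySizeException, here is a small sketch. It assumes the buffer capacity is tracked as a 32-bit Int and doubled when full. That is a guess at the mechanism, not something confirmed from the stack trace; `nextCapacity` and `allocationFails` are hypothetical helpers.

```scala
object BufferOverflowSketch {
  // Hypothetical growth policy: double the capacity using 32-bit Int arithmetic.
  def nextCapacity(current: Int): Int = current * 2

  // Returns true if allocating a byte array of the given size throws
  // NegativeArraySizeException, false if the allocation succeeds.
  def allocationFails(size: Int): Boolean =
    try {
      new Array[Byte](size)
      false
    } catch {
      case _: NegativeArraySizeException => true
    }

  def main(args: Array[String]): Unit = {
    val oneGiB = 1 << 30                // 1073741824 bytes
    val doubled = nextCapacity(oneGiB)  // 2^31 does not fit in an Int and wraps around
    println(doubled)                    // -2147483648
    println(allocationFails(doubled))   // true
  }
}
```

If something like this is happening, the exception is a symptom of the output size rather than of a loop, which would fit the loop detector finding nothing.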
