Switch to SciSpacy #136

JohnGiorgi · 2019-05-20T15:26:48Z

Currently, we use spaCy for low-level NLP tasks (tokenization, sentence segmentation, POS tagging and dependency parsing). However, its models were trained on general-domain text.

The folks at AllenNLP have released SciSpaCy, a spaCy model trained on biomedical text. We should check whether this model improves performance and, if so, switch to it. This would also let us drop our custom tokenizer (less code to maintain).

Preliminary results look good, with SciSpaCy appearing to boost the performance of coreference resolution (NeuralCoref relies on the underlying spaCy model for preprocessing).
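
The issue doesn't pin down how "performance" will be measured. As one hypothetical check on the tokenization step (the part our custom tokenizer handles today), the sketch below scores a tokenizer's output against a small hand-tokenized gold sample using exact-match span F1. The function names `spans_from_tokens` and `token_f1` are made up for illustration; they are not part of spaCy, SciSpaCy, or our codebase, and the example tokenizations are invented, not real model output.

```python
from typing import List, Set, Tuple

# (start_char, end_char) of a token in the original text; hypothetical type alias.
Span = Tuple[int, int]


def spans_from_tokens(text: str, tokens: List[str]) -> List[Span]:
    """Recover character offsets for a list of token strings by scanning left to right."""
    spans = []
    cursor = 0
    for tok in tokens:
        # Raises ValueError if the tokenizer altered the text (e.g. normalized it).
        start = text.index(tok, cursor)
        end = start + len(tok)
        spans.append((start, end))
        cursor = end
    return spans


def token_f1(gold: List[Span], pred: List[Span]) -> float:
    """Exact-match F1: a predicted token counts only if both of its boundaries match gold."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(pred)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "IL-2 binds CD25."
    gold = spans_from_tokens(text, ["IL-2", "binds", "CD25", "."])
    # Invented output from a tokenizer that splits on hyphens.
    split = spans_from_tokens(text, ["IL", "-", "2", "binds", "CD25", "."])
    print(token_f1(gold, split))
```

With a spaCy `Doc`, the spans can be read directly as `(t.idx, t.idx + len(t))` for each token `t`, so `spans_from_tokens` is only needed for tokenizers that return bare strings. Exact-match F1 is deliberately strict: splitting `IL-2` into three tokens costs both precision and recall, which is the kind of biomedical-entity breakage a domain-specific model should reduce.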

JohnGiorgi self-assigned this May 20, 2019

JohnGiorgi added the enhancement, optimization, and production labels May 20, 2019

Comments

JohnGiorgi commented May 20, 2019