
Switch to SciSpacy #136

Open
JohnGiorgi opened this issue May 20, 2019 · 0 comments

@JohnGiorgi

Currently, we are using spaCy for low-level NLP tasks (tokenization, sentence segmentation, POS tagging and parsing). However, its models were trained on general-domain text.

The folks at AllenNLP have released SciSpaCy, a spaCy model trained on biomedical text. We should check whether this model improves performance and, if so, switch to it. This would also allow us to drop our custom tokenizer (less code to maintain).

Preliminary results look good: SciSpaCy appears to boost the performance of coreference resolution (NeuralCoref relies on the underlying spaCy model for preprocessing).
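A minimal sketch of what the switch could look like, assuming spaCy v3 and that the scispaCy model package `en_core_sci_sm` has been installed separately. The helpers `load_pipeline` and `annotate` are hypothetical names, not part of this project or of spaCy. The point is that only the package name passed to `spacy.load` changes; code that reads tokens, sentences, POS tags and dependency labels stays the same. If the model is not installed, the sketch falls back to a blank English pipeline with a rule-based sentencizer, which still tokenizes and segments sentences but produces no tags or parses.

```python
from __future__ import annotations

import spacy
from spacy.language import Language


def load_pipeline(model_name: str = "en_core_sci_sm") -> Language:
    """Load a spaCy pipeline by package name (hypothetical helper).

    Assumes the scispaCy package `en_core_sci_sm` was installed separately.
    If it is missing, spacy.load raises OSError and we fall back to a blank
    English pipeline plus a rule-based sentencizer (no POS tags or parses).
    """
    try:
        return spacy.load(model_name)
    except OSError:
        nlp = spacy.blank("en")
        nlp.add_pipe("sentencizer")  # spaCy v3 string-based add_pipe API
        return nlp


def annotate(nlp: Language, text: str) -> list[list[tuple[str, str, str]]]:
    """Return, per sentence, (token text, POS tag, dependency label) triples."""
    doc = nlp(text)
    return [[(tok.text, tok.pos_, tok.dep_) for tok in sent] for sent in doc.sents]


if __name__ == "__main__":
    nlp = load_pipeline()
    for sentence in annotate(nlp, "The protein binds the receptor. The complex is stable."):
        print(sentence)
```

Passing `"en_core_web_sm"` instead gives the general-domain baseline, so the same `annotate` call can be run under both models when checking whether SciSpaCy actually improves performance. Since NeuralCoref sits on top of whichever pipeline is loaded, the model choice is what carries through to coreference resolution.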

JohnGiorgi self-assigned this May 20, 2019