You can easily install discopy-data by using pip:
pip install git+https://github.com/rknaebel/discopy-data
Alternatively, clone the repository and install discopy-data from the local checkout in editable mode with pip:
pip install -e path/to/discopy-data
Discopy-data is the discopy backend that handles data structures, preprocessing, and dataset extraction.
The first script, discopy-tokenize, uses trankit for tokenization, tagging, and dependency parsing.
In addition, the second script adds constituency trees using the supar parser.
If dependency trees should be added by supar as well, add the flag -d.
discopy-tokenize -i /some/examples/wsj_0336 | discopy-add-parses -c
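The same two-step pipeline can be applied to a whole directory of raw text files. The following is only a minimal sketch: the function name parse_dir, the output directory layout, and the .parsed suffix are hypothetical, and it assumes that discopy-add-parses writes its result to standard output, as the piping above suggests.

```shell
#!/bin/sh
# Sketch: run tokenization and constituency parsing for every file in a directory.
# parse_dir and the ".parsed" suffix are illustrative names, not part of discopy-data.
set -eu

parse_dir() {
  in_dir="$1"
  out_dir="$2"
  mkdir -p "$out_dir"
  for f in "$in_dir"/*; do
    # Skip sub-directories and the unexpanded glob of an empty directory.
    [ -f "$f" ] || continue
    # Same command as above; the result is assumed to arrive on stdout.
    discopy-tokenize -i "$f" | discopy-add-parses -c > "$out_dir/$(basename "$f").parsed"
  done
}
```

Calling parse_dir /some/examples /some/output would then write one result file per input file.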
This might be useful for a neural pipeline that does not rely on linguistic features.
cat /some/text | discopy-tokenize --tokenize-only
Dataset extraction is still experimental. The available datasets are listed in cli/extract.py.
discopy-extract pdtb /data/discourse/conll2016/ --use-gpu --limit 2 | discopy-add-annotations pdtb /data/discourse/conll2016/ --simple-connectives --sense-level 2 | discopy-update-parses --dependency-parser ''
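Because the dataset path appears twice in the command above, it may be more readable to wrap the pipeline in a small shell function. This is a sketch only: the function name extract_pdtb is hypothetical, while every command and flag is copied unchanged from the line above.

```shell
#!/bin/sh
# Sketch: the extraction pipeline with the dataset directory passed once.
# extract_pdtb is an illustrative name, not part of discopy-data.
extract_pdtb() {
  data_dir="$1"
  discopy-extract pdtb "$data_dir" --use-gpu --limit 2 \
    | discopy-add-annotations pdtb "$data_dir" --simple-connectives --sense-level 2 \
    | discopy-update-parses --dependency-parser ''
}
```

Calling extract_pdtb /data/discourse/conll2016/ would then run the same three steps as the one-line version.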