This package wraps the Trankit library, so you can use Trankit models in a spaCy pipeline.
Using this wrapper, you'll be able to use the following annotations, computed by your pretrained Trankit pipeline/model:
- Statistical tokenization (reflected in the Doc and its tokens)
- Lemmatization (token.lemma and token.lemma_)
- Part-of-speech tagging (token.tag, token.tag_, token.pos, token.pos_)
- Morphological analysis (token.morph)
- Dependency parsing (token.dep, token.dep_, token.head)
- Named entity recognition (doc.ents, token.ent_type, token.ent_type_, token.ent_iob, token.ent_iob_)
- Sentence segmentation (doc.sents)
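
As a rough illustration of how these attributes can be read together, the sketch below gathers them into one dictionary per token. The helper name `token_annotations` is hypothetical and not part of spacy-trankit; it only reads the spaCy attributes listed above, so it should work on any Doc the pipeline returns.

```python
from typing import Any, Dict, Iterable, List


def token_annotations(doc: Iterable[Any]) -> List[Dict[str, str]]:
    """Collect the string-valued annotations of every token.

    Hypothetical helper (not part of spacy-trankit): it only reads
    the token attributes listed above, so a spaCy Doc or Span works.
    """
    rows = []
    for token in doc:
        rows.append(
            {
                "text": token.text,
                "lemma": token.lemma_,
                "pos": token.pos_,
                "tag": token.tag_,
                # token.morph is an object in spaCy; str() gives the feature string
                "morph": str(token.morph),
                "dep": token.dep_,
                # token.head is the syntactic parent token
                "head": token.head.text,
                "ent_type": token.ent_type_,
                "ent_iob": token.ent_iob_,
            }
        )
    return rows
```

With a loaded pipeline, calling `token_annotations(nlp("some text"))` would return one dictionary per token; which fields are non-empty depends on what the underlying Trankit model provides.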
As of v0.1.0, spacy-trankit is only compatible with spaCy v3.x. To install the most recent version:

    pip install git+https://github.com/imvladikon/spacy-trankit

Or from PyPI:

    pip install spacy-trankit
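
Because only spaCy v3.x is supported, it may be worth checking the installed spaCy version before loading a model. The following is a minimal sketch using only the standard library; the function names `is_spacy_v3` and `installed_spacy_is_v3` are illustrative and not part of spacy-trankit.

```python
from importlib.metadata import PackageNotFoundError, version


def is_spacy_v3(version_string: str) -> bool:
    """Return True if the version string has major version 3."""
    major = version_string.strip().split(".")[0]
    return major == "3"


def installed_spacy_is_v3() -> bool:
    """Check the installed spaCy distribution; False if spaCy is missing."""
    try:
        return is_spacy_v3(version("spacy"))
    except PackageNotFoundError:
        return False


if __name__ == "__main__":
    if not installed_spacy_is_v3():
        print("spacy-trankit expects spaCy v3.x; please install or upgrade spaCy.")
```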
Load a pre-trained Trankit model into a spaCy pipeline:

    import spacy_trankit

    # Initialize the pipeline
    nlp = spacy_trankit.load("en")

    doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")

    # Print the main annotations for each token
    for token in doc:
        print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)

    # Print the named entities found in the document
    print(doc.ents)
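
The example above prints token-level annotations and the document's entities. To combine sentence segmentation (doc.sents) with named entities, something like the following sketch could be used. The helper `entities_by_sentence` is hypothetical; it assumes the standard spaCy attributes `doc.sents`, `Span.ents`, `Span.text`, and `Span.label_`.

```python
from typing import Any, List, Tuple


def entities_by_sentence(doc: Any) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Pair each sentence's text with the (text, label) of its entities.

    Hypothetical helper (not part of spacy-trankit): it relies only on
    doc.sents and on the .ents, .text and .label_ attributes of spans.
    """
    results = []
    for sent in doc.sents:
        # Entities that fall inside this sentence
        ents = [(ent.text, ent.label_) for ent in sent.ents]
        results.append((sent.text, ents))
    return results
```

Given the `doc` from the example above, `entities_by_sentence(doc)` would yield one entry per sentence; the exact entities and labels depend on the Trankit model that was loaded.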
Load a model from a local path:

    import spacy_trankit

    # Initialize the pipeline from a local directory
    nlp = spacy_trankit.load_from_path(name="en", path="./cache")

    doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")

    # Print the main annotations for each token
    for token in doc:
        print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)

    # Print the named entities found in the document
    print(doc.ents)