AzTreeBank is a syntactically annotated treebank for the Azerbaijani language, following the Universal Dependencies guidelines.
The data in AzTreeBank was collected from a variety of sources, including:
- Books
- Wikipedia
- News websites (sports, politics, and other topics)
- Scientific and literary articles
The annotations in AzTreeBank were generated automatically and provide broad coverage of syntactic structures in the Azerbaijani language.
AzTreeBank was developed and is maintained by the LocalDoc team.
This dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). You are free to share and adapt the material, provided you give appropriate credit and do not use it for commercial purposes.
The corpus is entirely in Azerbaijani.
- Sentences: 75,225
- Tokens: 1,167,589
The annotations include part-of-speech (POS) tags, morphological features, and syntactic dependency relations following the Universal Dependencies schema.
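The sketch below shows one way these three annotation layers could be read, assuming the data is distributed in CoNLL-U, the standard ten-column Universal Dependencies format; this card does not state the file format, so treat that as an assumption. The names `Token`, `parse_feats`, and `parse_conllu` are hypothetical helpers, and the sample sentence is hand-written for illustration, not taken from the corpus.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    id: str
    form: str
    lemma: str
    upos: str               # universal POS tag
    feats: Dict[str, str]   # morphological features
    head: Optional[int]     # index of the syntactic head, 0 for the root
    deprel: str             # dependency relation to the head


def parse_feats(raw: str) -> Dict[str, str]:
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if raw == "_":
        return {}
    return dict(pair.split("=", 1) for pair in raw.split("|"))


def parse_conllu(text: str) -> List[List[Token]]:
    """Parse CoNLL-U text into a list of sentences (lists of tokens)."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():            # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):        # sentence-level comment
            continue
        cols = line.split("\t")
        if len(cols) != 10:             # not a well-formed token line
            continue
        if "-" in cols[0] or "." in cols[0]:
            continue                    # skip multiword ranges and empty nodes
        head = int(cols[6]) if cols[6].isdigit() else None
        current.append(Token(cols[0], cols[1], cols[2], cols[3],
                             parse_feats(cols[5]), head, cols[7]))
    if current:                         # file may lack a trailing blank line
        sentences.append(current)
    return sentences


# Hand-written illustrative sentence (assumption, not real corpus data).
rows = [
    ["1", "Mən", "mən", "PRON", "_", "Case=Nom|Number=Sing|Person=1", "3", "nsubj", "_", "_"],
    ["2", "kitab", "kitab", "NOUN", "_", "_", "3", "obj", "_", "_"],
    ["3", "oxuyuram", "oxu", "VERB", "_", "_", "0", "root", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"],
]
sample = "# text = Mən kitab oxuyuram.\n" + "\n".join("\t".join(r) for r in rows) + "\n"

sentences = parse_conllu(sample)
for tok in sentences[0]:
    print(tok.form, tok.upos, tok.feats, tok.head, tok.deprel)
```

If the corpus is shipped in another container (for example, a tabular export), the same fields would still apply, but the loading step would differ.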
For inquiries or further information, please contact the LocalDoc team at v.resad.89@gmail.com.