Standalone Python library for processing ExportXMLv2 files

License

Notifications You must be signed in to change notification settings

yv/exmldoc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ExmlDoc

https://travis-ci.org/yv/exmldoc.svg?branch=master

exmldoc is a library for loading .exml.xml files produced either by PyTree or by the ExportXMLv2 Java library and assorted tools. EXML is one of the file formats used for the TüBa-D/Z treebank of German; it can store multilayer linguistic annotations in a (mostly) human-readable format.

As long as you are working with small documents, usage is relatively simple: load a document with

::

    import exmldoc
    from exmldoc.tree import Tree

    doc = exmldoc.load('file.exml.xml')

You can then (for example) enumerate all sentences with

::

    for sent in doc.get_objects_by_class(Tree):
        # sent.span holds the start and end offsets of the sentence in doc.words
        print(doc.words[sent.span[0]:sent.span[1]])

or access the token objects with

::

    for sent in doc.get_objects_by_class(Tree):
        for token in doc.w_objs[sent.span[0]:sent.span[1]]:
            print(token.word, token.cat, token.lemma)
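The two loops above can be combined into a small helper that collects the token attributes of every sentence into plain Python lists. This is only a sketch: ``sentence_rows`` is a hypothetical name, not part of exmldoc, and it relies solely on the calls shown above (``get_objects_by_class``, ``w_objs``, ``span``, and the ``word``, ``cat`` and ``lemma`` attributes). The ``Token``, ``Sentence`` and ``FakeDoc`` classes, along with the sample tags and lemmas, are invented stand-ins so that the example runs without an EXML file; with a real document you would pass ``doc`` and ``Tree`` instead.

```python
from collections import namedtuple


def sentence_rows(doc, sentence_class):
    """Return one list of (word, cat, lemma) tuples per sentence in doc."""
    rows = []
    for sent in doc.get_objects_by_class(sentence_class):
        # The span is used as a half-open slice, as in the loops above.
        start, end = sent.span[0], sent.span[1]
        rows.append([(tok.word, tok.cat, tok.lemma)
                     for tok in doc.w_objs[start:end]])
    return rows


# --- Hypothetical stand-ins, only here to make the sketch runnable ---
Token = namedtuple("Token", ["word", "cat", "lemma"])
Sentence = namedtuple("Sentence", ["span"])


class FakeDoc(object):
    """Mimics the three document members used in this README."""

    def __init__(self, tokens, sentences):
        self.w_objs = tokens
        self.words = [tok.word for tok in tokens]
        self._sentences = sentences

    def get_objects_by_class(self, cls):
        return [s for s in self._sentences if isinstance(s, cls)]


# Invented sample data: two short sentences with made-up tags and lemmas.
demo_doc = FakeDoc(
    tokens=[
        Token("Das", "PDS", "das"),
        Token("ist", "VAFIN", "sein"),
        Token("gut", "ADJD", "gut"),
        Token(".", "$.", "."),
        Token("Ja", "PTKANT", "ja"),
        Token(".", "$.", "."),
    ],
    sentences=[Sentence(span=(0, 4)), Sentence(span=(4, 6))],
)

rows = sentence_rows(demo_doc, Sentence)
for row in rows:
    print(row)
```

With a loaded document, the equivalent call would be ``sentence_rows(doc, Tree)``.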

You can change a document and then save it with

::

    doc.save('file_processed.exml.xml')
