carex-climate-morpho

Methods

We generated our morphological dataset from the Cyperaceae species treatments (including varieties and subspecies) in Flora of China and Flora of North America. These treatments are written by botanical experts, who construct composite morphological descriptions (see http://floranorthamerica.org/Introduction for more information) by examining many specimens. The Flora of China (FoC; eFloras, 2008; http://www.efloras.org/flora_page.aspx?flora_id=2) and Flora of North America (FNA; Flora of North America Editorial Committee, 1993+; http://floranorthamerica.org) treatments were encoded as XML (Extensible Markup Language) files with minimal added document structure.

The XML files containing the treatments were processed with CharaParser (Cui, 2012); FoC treatments were parsed in 2013, and FNA treatments were parsed with CharaParser version 0.1.196 in 2020. CharaParser is a tool built specifically to annotate morphological descriptions, combining an unsupervised machine learning algorithm with a general-purpose syntactic parser (Cui, 2012). It turns morphological descriptions into fine-grained annotated XML documents by identifying organs, their characters, measurements, and so on. For example, in the sentence "Rhizomes 3–5 mm thick.", the organ is marked up as "<biological_entity id="o20841" name="rhizome" name_original="rhizomes" src="d0_s0" type="structure"> </biological_entity>". The XML files used in this project are available in our GitHub repository: https://github.com/jocelynpender/carex-climate-morpho/tree/master/data/external.
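As a hedged illustration of how such annotations can be read back, the minimal sketch below parses a CharaParser-style fragment for the same sentence and pulls out structure/character pairs. Only the `biological_entity` element and its attributes come from the example above; the `<statement>` wrapper, the `<character>` child and its attributes (`char_type`, `from`, `to`, `from_unit`, `to_unit`) are assumptions about the output format, not copied from the project's files. The sketch uses the standard-library `xml.etree.ElementTree` so it runs without extra installs; lxml's `etree` provides the same `fromstring`, `iter` and `get` calls.

```python
import xml.etree.ElementTree as ET

# Hypothetical CharaParser-style fragment for "Rhizomes 3–5 mm thick.".
# The <character> child and its attributes are ASSUMED, for illustration only.
FRAGMENT = """
<statement id="d0_s0">
  <text>Rhizomes 3–5 mm thick.</text>
  <biological_entity id="o20841" name="rhizome" name_original="rhizomes"
                     src="d0_s0" type="structure">
    <character char_type="range_value" from="3" from_unit="mm" name="thickness"
               src="d0_s0" to="5" to_unit="mm"/>
  </biological_entity>
</statement>
"""


def read_entities(xml_text):
    """Return one record per (structure, character) pair in the fragment."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for entity in root.iter("biological_entity"):
        for char in entity.iter("character"):
            records.append({
                "structure": entity.get("name"),   # normalized organ name
                "character": char.get("name"),     # e.g. "thickness"
                "from": char.get("from"),          # lower bound, as text
                "to": char.get("to"),              # upper bound, as text
                "unit": char.get("to_unit") or char.get("from_unit"),
            })
    return records


print(read_entities(FRAGMENT))
```

Values are kept as strings here; converting them to numbers is left to a later step so that non-numeric characters are not lost during extraction.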

We used custom-built Python scripts (https://github.com/jocelynpender/carex-climate-morpho) to extract the required data from the annotated XML files generated by CharaParser. Using the lxml 4.5.2 and pandas 1.1.2 (The pandas development team, 2021) packages, we transformed 612 parsed FNA XML files and 721 parsed FoC XML files into two CSV morphological datasets (one per flora). We mapped CharaParser structure and character names to our own property names using recoding tables (FNA: https://github.com/jocelynpender/carex-climate-morpho/blob/master/data/interim/fna_recode_property_names.csv; FoC: https://github.com/jocelynpender/carex-climate-morpho/blob/master/data/interim/foc_recode_property_names.csv). We included atypical measurements in the dataset (e.g., 62 mm was extracted from the sentence "leaf 13–38 (–62) mm wide" and included). For convenience, we omitted relative measurements (e.g., "proximal nonbasal bracts usually equaling or shorter than inflorescences").
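The extraction steps above (recoding names, keeping atypical values, dropping relative measurements, writing a CSV) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's script: the `RECODE` dictionary is a hypothetical stand-in for the recoding tables (whose real column layout is not assumed), the input records and the field name `atypical_to` are invented for illustration, and a relative measurement is assumed to be recognizable by having no numeric bounds. The standard-library `csv` module is used so the sketch runs anywhere; with pandas, `pandas.DataFrame(rows).to_csv(path, index=False)` would be the equivalent final step.

```python
import csv
import io

# HYPOTHETICAL stand-in for fna_recode_property_names.csv / foc_recode_property_names.csv:
# maps a (structure, character) pair to a project property name.
RECODE = {
    ("rhizome", "thickness"): "rhizome_thickness",
    ("leaf", "width"): "leaf_width",
}

# HYPOTHETICAL extracted records; "atypical_to" is an illustrative field name
# for the parenthetical extreme in "leaf 13–38 (–62) mm wide".
RECORDS = [
    {"taxon": "Carex sp. A", "structure": "leaf", "character": "width",
     "from": "13", "to": "38", "atypical_to": "62", "unit": "mm"},
    {"taxon": "Carex sp. A", "structure": "rhizome", "character": "thickness",
     "from": "3", "to": "5", "atypical_to": None, "unit": "mm"},
    # Relative measurement ("equaling or shorter than inflorescences"):
    # no numeric bounds, so the sketch drops it.
    {"taxon": "Carex sp. A", "structure": "bract", "character": "length",
     "from": None, "to": None, "atypical_to": None, "unit": None},
]

FIELDS = ["taxon", "property", "min", "max", "atypical_max", "unit"]


def _num(value):
    """Convert a text value to float, passing None through unchanged."""
    return float(value) if value is not None else None


def to_rows(records, recode):
    rows = []
    for rec in records:
        if rec["from"] is None and rec["to"] is None:
            continue  # skip relative (non-numeric) measurements
        prop = recode.get((rec["structure"], rec["character"]))
        if prop is None:
            continue  # skip structure/character pairs with no mapping
        rows.append({
            "taxon": rec["taxon"],
            "property": prop,
            "min": _num(rec["from"]),
            "max": _num(rec["to"]),
            "atypical_max": _num(rec["atypical_to"]),  # kept in its own column
            "unit": rec["unit"],
        })
    return rows


def write_csv(rows):
    """Serialize rows to CSV text; None values become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = to_rows(RECORDS, RECODE)
print(write_csv(rows))
```

Storing the atypical extreme in a separate column, rather than overwriting the typical maximum, is one possible design choice; the Methods only state that such values were included in the dataset.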

References

Cui H. 2012. CharaParser for fine-grained semantic annotation of organism morphological descriptions. Journal of the American Society for Information Science and Technology 63: 738–754.

Flora of North America Editorial Committee, eds. 1993+. Flora of North America North of Mexico. 21+ vols. New York and Oxford.

eFloras. 2008. Published on the Internet, http://www.efloras.org [accessed 2013]. Missouri Botanical Garden, St. Louis, MO, and Harvard University Herbaria, Cambridge, MA.

The pandas development team. 2021. pandas-dev/pandas: Pandas 1.2.2. Zenodo. doi:10.5281/zenodo.4524629.
