This repository is for the cleanup, organization, and archiving of demographic survey data collected as part of the Heliconia Demography Project. These procedures are carried out by executing two R scripts (see Workflow, below). An overview of the 1998-2002 surveys and the associated metadata have been submitted to Ecology for publication as a data paper; upon acceptance the demographic data will be archived in the Dryad Digital Repository. There is a separate GitHub repository for the 2023 Ecology Data Paper; that repo includes the final version of the paper (in `.pdf` format) and the `.Rmd` files containing the text and code for analyses, data summaries, figures, and tables.
This repository includes the following:
- R code used to:
  - process raw data files and correct / make changes to individual records (`code/survey_cleaning`)
  - review the clean data for anomalies, unusual records for review, etc. (`code/survey_review`)
  - prepare the version of the data to be archived at Dryad (`code/survey_archive`)
  - prepare figures for the data summaries and publications (`code/figures`)
- Data:
  - .csv files of raw data (`data/survey_raw`)
  - .csv files of clean demographic data and plot descriptors (`data/survey_clean`)
  - .csv files of any records suggested for further review (`data/survey_review`)
  - .csv files of the datasets archived at Dryad (`data/survey_archive`)
- Data validation algorithms and their output
- Summaries of the demographic data (e.g., total number of plants, total number of plants per plot, total number of seedlings per year)
- Methodological information and records, including:
  - scanned copies of the original datasheets,
  - an overview and downloadable record of plants for which ID tags were replaced during field surveys,
  - records of treefalls in plots and any damage they caused to plants,
  - maps of the demographic plots that can be downloaded in different formats.
Code: The functions in `01_clean_survey_data.R` will consolidate the 'raw' survey data, clean it, organize it in tidy form, and conduct a series of validation procedures.
- `ha_data <- clean_heliconia_data()` calls several other functions found in the folder `code/survey_cleaning`. These functions include an `.R` script for cleaning and correcting the records for plants found in each demographic plot and producing `.csv` files of 'clean' data and any records recommended for follow-up review.
- `create_plot_info_file()` will create a `.csv` file of plot-level descriptors.
- `create_tag_changes_file()` creates a `.csv` of all the plants whose tags were replaced during the field survey (necessary only if one is reviewing the survey history of individual plants using the original data sheets).
- `create_plot_treefalls_file()` creates a `.csv` with records of any new tree falls and gaps noted in the demographic plots during the survey. (NB: review of these records is currently in progress.)
- `create_plant_damage_file()` creates a `.csv` with any observations by the survey team of plants that were damaged by fallen branches or trees. (NB: review of these records is currently in progress.)
Output: The `.csv` files produced by these functions are saved to the folder `data/survey_clean`. Executing the code also creates or edits `.txt` files with the relevant file's version number and date of most recent update (see 'File Versioning', below).
File Versioning: To ensure reproducibility, users must know the precise version of a data set they used in their analyses. Below each function is a snippet of code entitled `create version files`; uncommenting and running this code will create or update the file recording the version number of the file being created (see 'Frictionless Standards').
The first time the files are 'cleaned' or 'created', a `.txt` file is automatically created that assigns the version number `1.0.0` and records the date of file creation. If a file already exists, the user is asked whether the file being created is an updated version. 'N' will execute the code without changing the version number or date; 'Y' will trigger a follow-up question asking whether the new version is a `major`, `minor`, or `patch` update. The corresponding component of the version number is then incremented by 1 (e.g., major: 1.0.0 -> 2.0.0, minor: 1.0.0 -> 1.1.0, patch: 1.0.0 -> 1.0.1).
- [NB: this was automated but is temporarily manual to allow automated validation, see details here].
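The increment rule described above can be sketched in a few lines of base R. This is only an illustration, not the repository's `create_version_file.R` code: the function name `bump_version()` is hypothetical, and resetting the lower components to zero on a major or minor update is an assumption taken from common semantic-versioning practice (it is consistent with the examples given, which all start from 1.0.0).

```r
# Hypothetical helper (not part of the repository): increment one component
# of a "major.minor.patch" version string.
bump_version <- function(version, type = c("major", "minor", "patch")) {
  type <- match.arg(type)

  # Split "1.0.0" into its three integer components.
  parts <- suppressWarnings(
    as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  )
  if (length(parts) != 3 || anyNA(parts)) {
    stop("version must look like '1.0.0'")
  }

  if (type == "major") {
    # Assumption: a major update resets minor and patch to 0.
    parts <- c(parts[1] + 1L, 0L, 0L)
  } else if (type == "minor") {
    # Assumption: a minor update resets patch to 0.
    parts <- c(parts[1], parts[2] + 1L, 0L)
  } else {
    parts[3] <- parts[3] + 1L
  }

  paste(parts, collapse = ".")
}

bump_version("1.0.0", "major")  # "2.0.0"
bump_version("1.0.0", "minor")  # "1.1.0"
bump_version("1.0.0", "patch")  # "1.0.1"
```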
Data Validation & Review: Once the file `heliconia_survey_clean.csv` has been saved to the `data/survey_clean` folder, the function `review_heliconia_data()` conducts a series of data validation procedures to flag any records to review before preparing the files to be archived at the Dryad Digital Repository.
- The functions for this review are in the folder `code/survey_review`.
- These and other validations are also carried out using the `pointblank` package; the output of the data validation process suggesting records for review is here.
- Any individual plant records that are flagged for review by `review_heliconia_data()` will be saved as `.csv` files in the folder `data/survey_review`. They can also be downloaded as .csv files from the Data Validation page.
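For readers unfamiliar with this kind of review step, the general idea of flagging records and saving them for follow-up might look like the base R sketch below. It is not the code in `code/survey_review` and does not use `pointblank`; the toy data, the column names (`plot`, `plant_id`, `year`, `height`), the two checks, and the output file name are all hypothetical.

```r
# Toy survey table; the column names are hypothetical, not the project's schema.
surveys <- data.frame(
  plot     = c("A", "A", "A", "B", "B"),
  plant_id = c(1, 1, 2, 3, 3),
  year     = c(1998, 1999, 1998, 1998, 1998),
  height   = c(25, 30, -4, 40, 41)
)

# Check 1 (illustrative): recorded heights should be positive.
bad_height <- !is.na(surveys$height) & surveys$height <= 0

# Check 2 (illustrative): at most one record per plot, plant, and year.
key <- surveys[, c("plot", "plant_id", "year")]
dup_record <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Label each flagged row with the reason it needs review.
surveys$reason <- NA_character_
surveys$reason[dup_record] <- "duplicate plant-year record"
surveys$reason[bad_height] <- "non-positive height"
flagged <- surveys[!is.na(surveys$reason), ]

# Save the flagged rows; a temporary folder is used here so the sketch
# does not write into the repository's data/survey_review folder.
out_file <- file.path(tempdir(), "flagged_records_example.csv")
write.csv(flagged, out_file, row.names = FALSE)
```

Keeping the reason alongside each flagged row makes the saved file self-explanatory when it is later compared against the original datasheets.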
**Code:** `02_create_survey_archive.R` will prepare the version of the 'clean' survey data and file of plot descriptors that are archived in Dryad.
- Uncommenting and running the snippet of code entitled `create version files` will prompt the user to answer whether they are creating an updated version of the data set and, if so, whether the version is a `major`, `minor`, or `patch` update.
- `create_dryad_file()` will then create .csv files of (1) plot descriptors and (2) the survey data that were archived in Dryad (NB: The demographic data file uploaded to Dryad excludes some of the redundant plot identification codes and the x-y coordinates of individual plants). The function generating and saving these files is found in the folder `code/survey_archive`, as is the `create_version_file.R` script used to update the `version_info.txt` file.
- The resulting .csv files are saved to the folder `data/survey_archive`.
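As a rough illustration of what preparing an archive copy involves, the sketch below drops a redundant plot code and the x-y coordinates from a toy table before writing it to disk. It is not the code behind `create_dryad_file()`; the data, every column name, and the output file name are hypothetical.

```r
# Toy 'clean' table; every column name here is hypothetical.
clean <- data.frame(
  plot        = c("A", "A", "B"),
  plot_alt_id = c(101, 101, 102),  # stands in for a redundant plot code
  plant_id    = c(1, 2, 3),
  year        = c(1998, 1998, 1998),
  x           = c(1.5, 3.0, 2.2),  # within-plot coordinates
  y           = c(4.0, 0.5, 7.1),
  height      = c(25, 30, 41)
)

# Columns to leave out of the archived version (illustrative choice).
drop_cols <- c("plot_alt_id", "x", "y")
archive <- clean[, setdiff(names(clean), drop_cols), drop = FALSE]

# Write to a temporary folder rather than data/survey_archive.
out_file <- file.path(tempdir(), "survey_archive_example.csv")
write.csv(archive, out_file, row.names = FALSE)

names(archive)  # "plot" "plant_id" "year" "height"
```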
We welcome any suggestions for package improvement or ideas for features to include in future versions. If you have Issues, Feature Requests and Pull Requests, here is how to contribute. We expect everyone contributing to the package to abide by our Code of Conduct.
- Emilio M. Bruna, University of Florida
- Eric R. Scott, University of Arizona
Please cite both the Data Paper and Dryad Repository when using these data for research, publications, teaching, etc.
If you wish to cite this repository, please cite as follows:
@misc{BrunaSurveys2023,
author = {Bruna, Emilio M. and Scott, Eric R.},
title = {Heliconia Demography Project},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
note = {data v1.0.0.},
url={https://github.com/BrunaLab/HeliconiaSurveys}
}