review climate data #212

Open
teixeirak opened this issue Jan 14, 2021 · 26 comments

@teixeirak
Member

Our latest plot of the climate data has some obviously wrong values, including a strange line of data and a handful that are out of range. I assume these came in with GROA or maybe SRDB (I think it's been a long time since I've looked at this figure).

image

@bpbond
Contributor

bpbond commented Jan 14, 2021

Ummmm I've never seen anything like that in the SRDB data 🤔 but 🤷‍♂️

@teixeirak
Member Author

I just checked and it was not there after we added SRDB. So, it presumably comes from GROA.

@bpbond
Contributor

bpbond commented Jan 14, 2021

Phew :)

teixeirak added a commit that referenced this issue Mar 28, 2021
@teixeirak teixeirak added the bug label Mar 28, 2021
@teixeirak
Member Author

teixeirak commented Mar 28, 2021

There appears to be some bad data from Taylor_2017_tari: very low precip for at least 4 sites in India (e.g., Kodayar I, IV...). Data came from here: https://data.nceas.ucsb.edu/view/knb.1274.1. @beckybanbury, do you have an intermediate data file with this climate info?

@teixeirak
Member Author

teixeirak commented Mar 28, 2021

I believe that strange line of data comes mostly or entirely from Xu_2015_proa, imported via GROA. I don't see an explanation in the paper of the climate data source, but the study system includes 164 plots spanning an elevation gradient, so presumably climate is extrapolated as a function of elevation. I'm not sure this is the most accurate approach possible for that location, but at least we have an explanation. I haven't verified against the original data yet, but it seems unlikely that this is an error in GROA or ForC.

Here's MAP vs MAT for Xu_2015_proa sites:

image

@beckybanbury
Collaborator

@teixeirak yes, that's an error - the intermediate data sheet is in this folder - it's the litterfall data file, and by the looks of things I just accidentally copied across the values from the adjacent column.

I've corrected in ForC_sites

Are there any others that look out?

@teixeirak
Member Author

Many thanks, @beckybanbury , and sorry to ping you on a weekend. No need to respond right away. (I'm pushing to solve some problems for a deadline this week, but we can just avoid questionable records at this point.)

It's hard to say if everything looks right now. Taylor 2017 has a number of records with very high precip, and so I started trying to check some. I verified that one was correct (Swer) but found one error (Wooroonooran National Park Bellenden Ker). Then, I got caught up trying to understand what's going on with the La Fortuna Forest Reserve, which has 5 sites with identical coordinates but different climate entered, but only seems to have one site when you go to the original pub. That will need to be solved, but I have to drop it for now.

teixeirak added a commit that referenced this issue Mar 28, 2021
teixeirak added a commit that referenced this issue Mar 29, 2021
@teixeirak
Member Author

I have reviewed the most egregious outliers. However, there are almost certainly some errors remaining. It would be good to check the ForC climate data against a global database to identify values that are way off (e.g., a units error during data entry).

@teixeirak teixeirak changed the title from "there seems to be some bad climate data" to "review climate data" Mar 29, 2021
@teixeirak teixeirak added enhancement and removed bug labels Mar 29, 2021
@beckybanbury
Collaborator

@teixeirak happy to help with this if you'd like, particularly the data from Taylor 2017 that I entered - I remember reviewing some of the C flux values that looked off at the time, but didn't check climate data so closely. Happy to spend some time reviewing if you'd like - just let me know how you want to approach this!

@teixeirak
Member Author

Thanks, @beckybanbury. I sent an email about this. More narrowly, figuring out what's going on with La Fortuna (see here) would be helpful.

teixeirak added a commit that referenced this issue Apr 15, 2021
#212
@beckybanbury , I added a field to flag suspect climate data.
@teixeirak
Member Author

teixeirak commented Apr 16, 2021

@beckybanbury , thanks for working on this!

Based on the plots, let's flag sites "climate.data.suspect" if any of the following are true:

  • temperature difference > 5 °C
  • warmest or coldest month difference > 7.5 °C
  • log(precip) difference > 1

You could just flag with a "1", or better yet list the variable(s) that is/are off.
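
A minimal sketch of how these rules could be applied, for illustration only. The thresholds come from the list above; the dictionary keys (`mat`, `min_temp`, `max_temp`, `map`), the use of absolute differences, the natural log, and the `"; "` separator are all assumptions, not taken from the ForC scripts.

```python
import math

# Thresholds from the rules above; everything else here is hypothetical.
MAT_MAX_DIFF = 5.0        # mean annual temperature, degrees C
MONTH_MAX_DIFF = 7.5      # warmest- or coldest-month temperature, degrees C
LOG_MAP_MAX_DIFF = 1.0    # difference in log(precip); natural log assumed


def suspect_variables(site, ref):
    """Return the names of climate variables that deviate too much.

    `site` holds the values recorded for a site and `ref` the values for
    the same location from a reference database. Both are dicts with the
    hypothetical keys "mat", "min_temp", "max_temp" and "map". A variable
    that is missing (None) on either side is skipped.
    """
    flagged = []

    def both(key):
        return site.get(key) is not None and ref.get(key) is not None

    if both("mat") and abs(site["mat"] - ref["mat"]) > MAT_MAX_DIFF:
        flagged.append("mat")
    for key in ("min_temp", "max_temp"):
        if both(key) and abs(site[key] - ref[key]) > MONTH_MAX_DIFF:
            flagged.append(key)
    # Non-positive precipitation is skipped because its log is undefined.
    if both("map") and site["map"] > 0 and ref["map"] > 0:
        if abs(math.log(site["map"]) - math.log(ref["map"])) > LOG_MAP_MAX_DIFF:
            flagged.append("map")
    return flagged


def suspect_flag(site, ref):
    """Join the flagged variable names into one string; empty if none."""
    return "; ".join(suspect_variables(site, ref))
```

Returning the variable names rather than a bare "1" matches the "better yet" option above, since a reviewer can then see which value to check first.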

@teixeirak
Member Author

@beckybanbury , if you're able to complete the step above this week while @Troger4 is still with us, she could check the climate values that are way off.

beckybanbury added a commit that referenced this issue Apr 27, 2021
added column to flag suspect climate data issue #212
@beckybanbury
Collaborator

@teixeirak sorry - somehow I missed your previous comment! I've flagged with the name of the variable that is suspect. It doesn't look like there's too many.

@teixeirak
Member Author

@Troger4, please use the climate.data.suspect field in this file to identify the sites with suspicious climate data. It is coded to indicate which value is bad. When one value is flagged as bad, please double-check the others as well. If the original pub does not report climate data, please replace the bad value with "NI".

@teixeirak
Member Author

Also note: this file and the master ForC_sites DO NOT MATCH because sites missing coordinates are not included in the former.

Also, please create a new column in this file to note when you've reviewed the climate data.

@Troger4
Collaborator

Troger4 commented Apr 30, 2021

Okay, I see there are 284 climate.data.suspect entries with MAP, MAT, min temp, and max temp. What do MAP and MAT represent in columns R and O?
Thank you

@ValentineHerr
Member

Not sure what file you are working with exactly but it must be Mean Annual Precipitation and Mean Annual Temperature.

Metadata for the SITES table is here: https://github.com/forc-db/ForC/blob/master/metadata/sites_metadata.csv

@Troger4
Collaborator

Troger4 commented Apr 30, 2021

Hi Valentine, I'm looking in ForC_sites_climate_data within extracted_sites_data. Mean annual precip and mean annual temp make sense. Thanks very much!

@teixeirak
Member Author

Those correspond to columns in ForC_sites, and indicate which have large deviations from the value pulled from the global database (WorldClim). Be sure to put fixes in ForC_sites (the master), not in extracted_sites_data.

@Troger4
Collaborator

Troger4 commented Apr 30, 2021

> Also note: this file and the master ForC_sites DO NOT MATCH because sites missing coordinates are not included in the former.
>
> Also, please create a new column in this file to note when you've reviewed the climate data.

Which file is the "this file" you referred to? And which should I be looking in to find climate.data.suspect records?
Thank you!

@teixeirak
Member Author

Sorry, I guess the links were confusing. Here it is with the file names:

Also note: [this file](https://github.com/forc-db/ForC/blob/master/data/extracted_site_data/ForC_sites_climate_data.csv) and the master ForC_sites DO NOT MATCH because sites missing coordinates are not included in the former.

Also, please create a new column in [this file](https://github.com/forc-db/ForC/blob/master/data/extracted_site_data/ForC_sites_climate_data.csv) to note when you've reviewed the climate data.

@teixeirak
Member Author

@mawilliams99 , this is an issue that you can get started on as an intro to the ForC data work.

There's a lot of discussion above, but summarizing here--

I'll message you separately to make sure this makes sense.

@teixeirak
Member Author

@mawilliams99 or @ValentineHerr, this should be a quick task: could one of you please merge the climate.data.suspect field in this file (https://github.com/forc-db/ForC/blob/master/data/extracted_site_data/ForC_sites_climate_data.csv) into the corresponding field in the master sites file? (The difference between the two is that the master includes a few sites with no coordinates.) The motivation for this is that reviewing the climate data doesn't have to happen with high priority, but we want to be sure to review any suspect data on sites we may send to EFDB.
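
A rough sketch of what such a merge could look like, assuming both files are read into lists of dicts (e.g. with `csv.DictReader`) and share a site-name key. The function name, the default key `sites.sitename`, and the choice to leave unmatched master rows unchanged are assumptions for illustration, not the actual ForC workflow.

```python
def merge_suspect_flags(master_rows, extracted_rows,
                        key="sites.sitename", field="climate.data.suspect"):
    """Copy `field` from the extracted rows into the master rows.

    Rows are matched on `key`. Master rows with no match keep whatever
    value they already had (or get an empty string) and are reported,
    because the master file contains sites the extracted file omits.
    Returns (merged_rows, unmatched_keys).
    """
    flags = {row[key]: row.get(field, "") for row in extracted_rows}
    merged, unmatched = [], []
    for row in master_rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        if row[key] in flags:
            new_row[field] = flags[row[key]]
        else:
            new_row.setdefault(field, "")
            unmatched.append(row[key])
        merged.append(new_row)
    return merged, unmatched
```

Returning the unmatched keys makes it easy to see which master sites have no counterpart in the extracted file, so they can be checked by hand.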

@ValentineHerr
Member

I'll work on this now

@ValentineHerr
Member

Looks like Yenisei 2lu and Yenisei 26lh/lw are missing from the file (besides sites without latitudes).
I'll go ahead and merge anyways as I believe that you (@teixeirak) have been working with this site recently.

ValentineHerr added a commit that referenced this issue Sep 29, 2021
@teixeirak
Member Author

Thanks! This is very helpful.

Those two sites would be very similar to the other Yenisei sites, which don't have suspect data, so this is fine.

@mawilliams99, please be sure to check the climate.data.suspect field in sites for all the studies that you review. We should avoid sending any of those values to EFDB (better to replace with NA, unless there's some really good reason to believe that the current data are correct, e.g., steep topographic gradients that would make the site quite different from most of the surrounding areas).
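
As a hedged illustration of the "replace with NA" step, the sketch below blanks every variable named in a record's climate.data.suspect field before export. It assumes the flag holds column names separated by `;` and that those names match the record's keys exactly; both are assumptions, and the function name is hypothetical.

```python
def drop_suspect_climate(row, field="climate.data.suspect",
                         sep=";", missing="NA"):
    """Return a copy of `row` with each flagged variable set to `missing`.

    `row` is a dict for one site. The flag field is assumed to list the
    names of the suspect columns, separated by `sep`. Names that do not
    match any key in `row` are ignored.
    """
    cleaned = dict(row)  # copy so the original record is not modified
    names = [n.strip() for n in (cleaned.get(field) or "").split(sep)]
    for name in names:
        if name and name in cleaned:
            cleaned[name] = missing
    return cleaned
```

A record with a documented reason to trust its values (such as a steep topographic gradient) would need to bypass this step, for example by clearing its flag after review.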
