Minimal metadata for unlabelled images #4

Open
metazool opened this issue Jul 1, 2024 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments

@metazool
Collaborator

metazool commented Jul 1, 2024

The current catalogue record rewrites the CSV index of tagged images to add an absolute path to each image's location in S3, but for experimenting with unsupervised approaches we want an index of the whole set.

We don't have more to go on than filename and perhaps image dimensions, but this is one to revisit later, when there is an opportunity to get some spatio-temporal metadata out of the FlowCam instrument.
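As a rough illustration of what a whole-set index could look like, here is a minimal sketch in Python. It assumes the object keys have already been listed by some other means; the bucket name, key layout, and the set of image suffixes are hypothetical, not taken from the project. It records only the filename and an absolute S3 URI. Dimensions would need each image to be opened, so they are left out here.

```python
import csv
import io
from pathlib import PurePosixPath

# Assumption: which suffixes count as images is a guess, not a project setting.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def build_index(keys, bucket):
    """Return one record per image key: filename plus absolute S3 URI."""
    records = []
    for key in sorted(keys):
        path = PurePosixPath(key)
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image objects such as CSVs or logs
        records.append({"filename": path.name, "s3_uri": f"s3://{bucket}/{key}"})
    return records


def index_to_csv(records):
    """Serialise the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["filename", "s3_uri"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical keys and bucket name, for illustration only.
    example_keys = ["a/img_002.tif", "a/notes.csv", "a/img_001.TIF"]
    print(index_to_csv(build_index(example_keys, "example-bucket")))
```

Keeping the index independent of the tagged-image CSV means labelled and unlabelled images share one record shape, and extra columns (dimensions, later spatio-temporal fields) can be added without changing how the index is read.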

Edit: adding these notes, which were originally an issue on the internal project, for consultation with local experts.

For pipeline workflows it makes sense to have a standards-oriented interface to the image collections. There are different modern options:

  • STAC with appropriate extensions (given some spatio-temporal metadata)
  • Intake for ease of access to a whole collection
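To make the STAC option a little more concrete, here is a hedged sketch of a single image expressed as a STAC Item, built as a plain Python dict with no STAC library. The item id, asset href, and media type are hypothetical. The `datetime` value is a placeholder supplied by the caller, since the capture time is exactly the spatio-temporal metadata we don't yet have from the instrument. `geometry` is set to null, which the STAC Item specification permits when no location is known.

```python
import json


def minimal_stac_item(item_id, href, datetime_utc, media_type="image/tiff"):
    """Build a bare-bones STAC Item dict for one image.

    datetime_utc is an RFC 3339 UTC string; here it is a caller-supplied
    placeholder, not a value read from the instrument.
    """
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": None,  # no spatial metadata available yet
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {"image": {"href": href, "type": media_type}},
    }


if __name__ == "__main__":
    # Hypothetical id, href and timestamp, for illustration only.
    item = minimal_stac_item(
        "img_001",
        "s3://example-bucket/a/img_001.tif",
        "2024-07-01T00:00:00Z",
    )
    print(json.dumps(item, indent=2))
```

A catalogue of such items would still need a parent Collection and any extensions that turn out to be appropriate; this only shows how little is required per image to get started.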

Marine sampling standards background

There's lots of work on plankton sample data sharing in the marine world, and one ideal of the project originators is to establish an equivalent service for freshwater ecosystems. There's a well-worked-out body of standards, but a lot of it has a kind of pre-internet feel to it, with semi-manual workflows for data linkage and cleaning, e.g. https://manual.obis.org/name_matching.html#taxon-matching-workflow

https://hal.science/hal-03958791/document

"Establishing Plankton Imagery Dataflows Towards International Biodiversity Data Aggregators"

"We developed recommendations for plankton imagery data management, which can promote the ability to make these datasets as FAIR". This is a very high-level description of workflow without automation / implementation specifics.

The aggregation goes through here https://ipt.vliz.be/eurobis/ which is oriented to marine ecosystems.

https://github.com/EMODnet/EMODnetBiocheck - this R-based tool is used for quality control: "It helps users to Quality Control their (marine) biological datasets ... the analysis reaches its full potential using an IPT resource with OBIS-ENV data format". That format is a heavyweight-looking data model: https://manual.obis.org/formatting.html

References

@DylanCarbone

Hi Jo, @metazool

As I mentioned in my email, the following guides and documents may be of interest to your work:

The TDWG guide - An older document listing established metadata standards and, based on those standards, terminology important for the monitoring of insects.
Camtrap DP - A richer metadata standard tailored for monitoring methods that capture images. It can describe images captured in the lab under flow cytometry, but certain fields will not be relevant to flow cytometry methods.
A guide to camera trap surveying - If you are considering metadata standards for the purpose of publication to GBIF, this has some nice discussions of the event-occurrence structure and the limits of the Darwin Core star schema structure, in section 4.3.
