Ramona Walls edited this page Apr 7, 2016 · 8 revisions

#Creating and using identifiers in IDS (Identifier Services)

Using Agave, we will create UUIDs for every entity in IDS. Users will have the opportunity to register additional existing identifiers for any entity and to request permanent identifiers (DOI or ARK) for some entity types. For some entity types, we may create additional identifiers for research purposes.

#Identifier types in use in this project

##UUID - Universally Unique Identifier

Within Agave: Every object created with Agave gets a UUID. Does this also include systems that are registered with Agave? For some object types, UUIDs can be turned into URLs using a formula such as https://api.example.com/meta/v2/data/$UUID.
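The URL formula above can be sketched in Python. This is a minimal illustration, not the Agave client API: the base URL is the placeholder from the formula, and the function name `uuid_to_metadata_url` and the sample UUID are made up for this example.

```python
from urllib.parse import quote

# Assumption: placeholder base URL copied from the formula in the text;
# a real Agave tenant would use its own host name.
METADATA_BASE_URL = "https://api.example.com/meta/v2/data/"


def uuid_to_metadata_url(object_uuid: str) -> str:
    """Append the UUID to the base URL, percent-encoding unsafe characters."""
    return METADATA_BASE_URL + quote(object_uuid.strip(), safe="")


# Made-up UUID, for illustration only.
print(uuid_to_metadata_url("0001-example-uuid"))
```

The same pattern would apply to other object types only if they follow a comparable base-URL-plus-UUID scheme, which this page does not confirm.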

Within CyVerse: The UUID of an object is its identifier inside the data store. It is invariant under all operations on that object. This means, for example, that if you have a CSV file in the data store and you edit that file, it keeps the same UUID. This can be a problem, though: if you overwrite that CSV file with an MP3 file, it is assumed to be the same object with different contents, and it keeps its original UUID.
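The overwrite behavior can be illustrated with a toy model. This is only a sketch under the assumption that the UUID is tied to the object's location rather than its contents; `ToyDataStore` is a hypothetical in-memory class, not the CyVerse data store API.

```python
import uuid


class ToyDataStore:
    """In-memory stand-in for a data store (hypothetical, for illustration)."""

    def __init__(self):
        self._uuid_by_path = {}
        self._contents_by_uuid = {}

    def write(self, path: str, data: bytes) -> str:
        # A path seen for the first time gets a fresh UUID.
        if path not in self._uuid_by_path:
            self._uuid_by_path[path] = str(uuid.uuid4())
        object_uuid = self._uuid_by_path[path]
        # Writing to an existing path replaces the contents but keeps the UUID.
        self._contents_by_uuid[object_uuid] = data
        return object_uuid


store = ToyDataStore()
csv_uuid = store.write("/home/user/results.csv", b"a,b\n1,2\n")
# Overwrite the same object with completely different (audio-like) bytes.
overwritten_uuid = store.write("/home/user/results.csv", b"ID3\x03\x00")
print(csv_uuid == overwritten_uuid)
```

In this model the UUID says nothing about what the object currently contains, which is the problem described above.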

##ARK - Archival Resource Key

An ARK is an identifier originating from the library, archive and museum community. ARKs become persistent when the objects and identifier forwarding information are maintained. Unlike DOIs, ARKs can be deleted. They can be converted to DOIs. Users may request ARKs for any type of entity in IDS. These will be issued by CyVerse, and IDS will maintain a landing page (or use the EZID page). At the end of this project, any ARKs that have not been converted to DOIs will be deleted.

See [this page](http://ezid.cdlib.org/home/understanding) for more information on DOIs and ARKs.

##DOI - Digital Object Identifier

A DOI is an identifier originating from the publishing world and in widespread use for journal articles. DOIs are becoming more common for datasets. DOIs become persistent when the objects and identifier forwarding information are maintained. Users of IDS will be able to request DOIs for datasets. These will be issued and maintained by CyVerse. For the life of this project, a landing page will be maintained by IDS.

DOIs can be resolved via URLs by prepending http://doi.org/ to the DOI.
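As a minimal sketch of the prepend rule above, assuming the DOI may arrive with an optional `doi:` prefix (an assumption of this example, not something this page specifies), a helper could look like the following. The function name `doi_to_url` is hypothetical.

```python
from urllib.parse import quote

# Resolver prefix taken from the text above.
DOI_RESOLVER = "http://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI into a resolvable URL by prepending the resolver prefix."""
    doi = doi.strip()
    # Assumption: tolerate a leading "doi:" label, which is not part of the DOI.
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Keep "/" (the prefix/suffix separator) and percent-encode anything unsafe.
    return DOI_RESOLVER + quote(doi, safe="/")


# 10.1000/182 is used here purely as a sample DOI string.
print(doi_to_url("doi:10.1000/182"))
```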

##URL/URI

URLs can be used on their own as identifiers. They may be permanent and redirectable, like PURLs. URLs can also be used to resolve some identifier types by prepending a standard string (see the DOI section above and the NCBI sections below).

##NCBI SRA

Any sequence registered in NCBI's Sequence Read Archive will have an SRA identifier. SRA identifiers can begin with SRR, SRX, and possibly other strings. SRA identifiers do not resolve to the actual data files, but rather to a landing page, which has links to download instructions. Here are the instructions for [downloading using command line utilities](http://www.ncbi.nlm.nih.gov/books/NBK158899/#SRA_download.downloading_sra_data_using).

Before you can submit sequences to SRA, you must create a BioProject and a BioSample. CyVerse has tools to help with this process.

An SRA identifier can be converted to a URL for the landing page by prepending http://www.ncbi.nlm.nih.gov/sra/?term= to the ID.

##NCBI BioProject

The identifier for a project in NCBI.

For a bioproject URL, prepend http://www.ncbi.nlm.nih.gov/bioproject/ to the bioproject ID.

##NCBI BioSample

The identifier for a specimen registered with NCBI. There are standard metadata templates for BioSamples, which we will use for IDS.

For a biosample URL, prepend http://www.ncbi.nlm.nih.gov/biosample/ to the biosample ID.
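The three NCBI prepend rules on this page (SRA, BioProject, BioSample) can be collected into one lookup table. The sketch below is only an illustration: the function name `ncbi_url`, the type keys, and the sample identifier are made up for this example, and the prefixes are copied from the text.

```python
from urllib.parse import quote

# URL prefixes as given on this page, keyed by a hypothetical type name.
NCBI_PREFIXES = {
    "sra": "http://www.ncbi.nlm.nih.gov/sra/?term=",
    "bioproject": "http://www.ncbi.nlm.nih.gov/bioproject/",
    "biosample": "http://www.ncbi.nlm.nih.gov/biosample/",
}


def ncbi_url(id_type: str, identifier: str) -> str:
    """Build a landing-page URL by prepending the prefix for the given type."""
    key = id_type.lower()
    if key not in NCBI_PREFIXES:
        raise ValueError(f"unknown NCBI identifier type: {id_type!r}")
    # Percent-encode the identifier so stray characters cannot alter the URL.
    return NCBI_PREFIXES[key] + quote(identifier.strip(), safe="")


# Placeholder accession, for illustration only.
print(ncbi_url("sra", "SRR0000000"))
```

Keeping the prefixes in a single table means a new identifier type with the same prepend pattern could be supported by adding one entry.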

##Local identifier

This is any string supplied by a user that they have used to identify an entity in their own lab. Local identifiers are not globally unique and not resolvable. Often, they encode information about the organism or experiment related to the entity.

#Related information

Comment from @mbjones: The GeoLink and DataONE projects have been compiling a list of well-known identifier types, along with how to canonically represent those. This list may inform this project. Contributions to this list are welcome.