Really partial references when harvesting #8632

Open
tjouneau opened this issue Apr 21, 2022 · 1 comment

tjouneau commented Apr 21, 2022

What steps does it take to reproduce the issue?

What happens?
Results can be seen on our test instance here: https://tested-dataverse5.univ-lorraine.fr/dataverse/ortolang. In case the link does not work, a screenshot is attached below for reference.
The harvested references are badly incomplete and poorly rendered: most only have the title right. The date shown is the harvesting date, not the publication date.
The following request returns clean XML output that should not cause any problems:
https://repository.ortolang.fr/api/oai/?verb=ListRecords&set=producer:atilf&metadataPrefix=oai_dc
The output is attached here:
repository.ortolang.fr.xml.zip
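
For anyone who wants to reproduce the request from a script, here is a minimal sketch that rebuilds the URL above from its OAI-PMH parameters. The helper name `list_records_url` is hypothetical and only for illustration; the base URL, set name, and metadata prefix are the ones quoted in this report. It only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Base URL taken from the request quoted in this issue.
BASE_URL = "https://repository.ortolang.fr/api/oai/"


def list_records_url(base_url, set_spec, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords URL (hypothetical helper, for illustration)."""
    query = urlencode(
        {
            "verb": "ListRecords",
            "set": set_spec,
            "metadataPrefix": metadata_prefix,
        },
        safe=":",  # keep the colon in "producer:atilf" unescaped, as in the issue
    )
    return f"{base_url}?{query}"


url = list_records_url(BASE_URL, "producer:atilf")
print(url)
```

Passing `metadata_prefix="oai_ddi"` instead would request the other format mentioned later in this thread, assuming the remote server exposes it.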

Which version of Dataverse are you using?
5.10

Any related open or closed issues to this bug report?

Screenshots:
image

@mreekie added the NIH OTA: 1.4.1 label (4 | 1.4.1 | Resolve OAI-PMH harvesting issues | 5 prdOwn. This is an item synched from the product ...) and removed the NIH OTA: 1.4.1 label (4 | 1.4.1 | Resolve OAI-PMH harvesting issues | 5 prdOwn. This is an item synched from the product ...) on Oct 25, 2022
@mreekie removed the NIH OTA: 1.4.1 label (4 | 1.4.1 | Resolve OAI-PMH harvesting issues | 5 prdOwn. This is an item synched from the product ...) on Nov 2, 2022
@pdurbin added the Type: Bug (a defect) and User Role: API User (makes use of APIs) labels on Oct 9, 2023
@luddaniel
Contributor

Hello everyone, we are working with @tjouneau, and recent in-depth analysis on v6.2+ has allowed us to pinpoint why the rendering was poor and the harvesting partial.

  • First, there are many issues with controlled vocabularies; these can be worked around by enabling the allowHarvestingMissingCVV parameter.

  • Second, there are data quality anomalies on the Ortolang repository side, such as missing titles. In addition, dc:contributor tags are not handled, and the displayed date is taken from the datestamp in the OAI-PMH record header rather than from dc:date.
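
To make the second point easier to check against a harvested file, here is a rough sketch that lists, for each record in a ListRecords response, the header datestamp next to dc:date, whether a dc:title is present, and any dc:contributor values. The sample XML is made up for illustration and is not actual Ortolang data; the function name `summarize_records` is hypothetical. The namespaces are the standard OAI-PMH 2.0 and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up sample record, for illustration only (NOT actual Ortolang output).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:record-1</identifier>
        <datestamp>2022-04-20T10:00:00Z</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:date>2015-06-01</dc:date>
          <dc:contributor>Example Lab</dc:contributor>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def summarize_records(xml_text):
    """Return one dict per record with the fields discussed in this thread."""
    root = ET.fromstring(xml_text)
    summaries = []
    for record in root.iterfind(".//oai:record", NS):
        summaries.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            # Date of the record in the OAI-PMH header (not a Dublin Core field).
            "header_datestamp": record.findtext("oai:header/oai:datestamp", namespaces=NS),
            # Date carried in the Dublin Core metadata, if any.
            "dc_date": record.findtext(".//dc:date", namespaces=NS),
            "has_title": record.find(".//dc:title", NS) is not None,
            "contributors": [c.text for c in record.iterfind(".//dc:contributor", NS)],
        })
    return summaries


summaries = summarize_records(SAMPLE)
for summary in summaries:
    print(summary)
```

Running the same function on the unzipped XML attached to this issue (read the file and pass its text) should show which records lack a title and where the two dates differ.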

Broader work is planned for https://entrepot.recherche.data.gouv.fr/ harvesting, and we will contribute every improvement we can to Dataverse, in particular around the oai_dc and oai_ddi metadata formats (already started with #10772 and #10837).

@tjouneau I suggest you close this ticket; we will open specific tickets if necessary.

Projects
Status: ⚠️ Needed/Important