Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When someone uploads a spreadsheet what version of the schema do we assume it is? #35

Closed
odscjames opened this issue Jul 14, 2019 · 8 comments · Fixed by #54
Closed
Assignees

Comments

@odscjames
Copy link
Collaborator

In OCDS, it looks like we just assume it's 1.1

The problem is this bit in view.py:

   schema_bods = SchemaBODS(lib_cove_bods_config=lib_cove_bods_config)
    context.update(convert_spreadsheet(upload_dir, upload_url, file_name, file_type, lib_cove_bods_config,
                                       schema_url=schema_bods.release_pkg_schema_url))
    with open(context['converted_path'], encoding='utf-8') as fp:
        json_data = json.load(fp, parse_float=Decimal)

We need to pass the json_data to SchemaBODS so it can select the right version, but we don't have that yet ... and we need to pass the schema to convert_spreadsheet, but we don't know which version of the schema to use until we open the spreadsheet!

@kd-ods
Copy link
Collaborator

kd-ods commented Jul 15, 2020

There is an issue here to be fixed. I have a spreadsheet with BODS 0.2 data. The version info is included per statement (publicationDetails/bodsVersion), but the version is not detected and used for conversion and validation.

(So there are validation errors like 'interests/0/type contains an unrecognised value. Check the related codelist for allowed code values.', with 'other-influence-or-control' being the unrecognised value. And the new 0.2 fields are treated as additional fields.)

@kd-ods
Copy link
Collaborator

kd-ods commented Jul 15, 2020

When it comes to conversion from the spreadsheet to JSON, is the (correct version of the) schema used to interpret data formats correctly from the spreadsheet to JSON? If so, there are a number of possible issues with conversion of 0.2 data from spreadsheets which might disappear once this version detection issue is sorted out. Or they may be unrelated issues. For reference, they are:

Date format
Currently dates in the Excel sheet are being rendered in the following way in the JSON:

"statementDate": "2020-03-01T00:00:00+00:00"

(I'm currently using this regex to find and replace them: ([0-9]{4}-[0-9]{2}-[0-9]{2})T[^"]*)

Boolean values
These are being detected and rendered as strings:

"isComponent": "False"

componentStatementIDs field
This should be an array of strings, instead of a single string value:

"componentStatementIDs": "ameiti2020dec0000001entity0000003,ameiti2020dec0000001owncontrol0000005,ameiti2020dec0000001entity0000004,ameiti2020dec0000001owncontrol0000007,ameiti2020dec0000001owncontrol0000009,ameiti2020dec0000001entity0000005,ameiti2020dec0000001owncontrol0000010"

@Bjwebb
Copy link
Collaborator

Bjwebb commented Aug 18, 2020

I think the best solution to this is to run the conversion twice, once to pick up the schema version, and then a second time to do the conversion with the correct schema.

Here's a dev instance to try: https://dev.datareview.openownership.org/

This fixes everything except for:

  • componentStatementIDs: These should be ; separated, instead of ,
  • Dates: flatten-tool doesn't handle "format": "date" properly

@kd-ods
Copy link
Collaborator

kd-ods commented Aug 18, 2020

@Bjwebb That looks great to me. It even works on a .ods file that's saved in LibreOffice as .xlsx.

What would it take to have flatten-tool handle "format": "date" properly? Is there a related issue logged in the flatten-tool library?

@Bjwebb
Copy link
Collaborator

Bjwebb commented Aug 19, 2020

No, I think this is the first time we've come across it, all the other standards allow date-time in all date related fields.

I've made an issue OpenDataServices/flatten-tool#357 now, and I have a suggested fix already OpenDataServices/flatten-tool#358

@Bjwebb
Copy link
Collaborator

Bjwebb commented Aug 20, 2020

I've deployed the flatten-tool fix to the dev site https://dev.datareview.openownership.org/

Bjwebb added a commit to OpenDataServices/flatten-tool that referenced this issue Aug 20, 2020
Bjwebb added a commit to OpenDataServices/flatten-tool that referenced this issue Aug 20, 2020
Bjwebb added a commit to OpenDataServices/flatten-tool that referenced this issue Aug 20, 2020
Bjwebb added a commit to OpenDataServices/flatten-tool that referenced this issue Aug 20, 2020
@kd-ods
Copy link
Collaborator

kd-ods commented Aug 20, 2020

@Bjwebb - well that's working as expected now - at least with the ZCMC data. And I can round trip from Excel-> BODS JSON and back again. Lovely.

@Bjwebb
Copy link
Collaborator

Bjwebb commented Sep 8, 2020

This is now deployed to the live instance https://datareview.openownership.org/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants