Python Emitter not working pypy v0.14.1.3 throws unrecognized field found but not allowed #11679

Open
jakobhanna opened this issue Oct 21, 2024 · 5 comments
Labels
bug Bug report

Comments

@jakobhanna
Contributor

jakobhanna commented Oct 21, 2024

Describe the bug
We run DataHub Server on version 14.0.1 (and a test instance on 14.1) and have been hitting the following bug since today. It is probably a problem with the acryl-datahub pip package: on Friday it moved from v0.14.1.2 to v0.14.1.3.

When I emit dashboards as MCP events to the DataHub GMS server, I get this error:

ERROR:root:('Unable to emit metadata to DataHub GMS, likely because the server version is too old relative to the client: Failed to validate record with class com.linkedin.dashboard.DashboardInfo: ERROR :: /dashboards :: unrecognized field found but not allowed\n', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'Failed to validate record with class com.linkedin.dashboard.DashboardInfo: ERROR :: /dashboards :: unrecognized field found but not allowed\n', 'status': 422})
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/datahub/emitter/rest_emitter.py", line 332, in _emit_generic
    response.raise_for_status()
  File "/opt/conda/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: http://172.16.16.246:8080/aspects?action=ingestProposal

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/src/server.py", line 89, in emit
    emitter.emit(element)
  File "/opt/conda/lib/python3.10/site-packages/datahub/emitter/rest_emitter.py", line 228, in emit
    self.emit_mcp(item, async_flag=async_flag)
  File "/opt/conda/lib/python3.10/site-packages/datahub/emitter/rest_emitter.py", line 275, in emit_mcp
    self._emit_generic(url, payload)
  File "/opt/conda/lib/python3.10/site-packages/datahub/emitter/rest_emitter.py", line 349, in _emit_generic
    raise OperationalError(
datahub.configuration.common.OperationalError: ('Unable to emit metadata to DataHub GMS, likely because the server version is too old relative to the client: Failed to validate record with class com.linkedin.dashboard.DashboardInfo: ERROR :: /dashboards :: unrecognized field found but not allowed\n', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'Failed to validate record with class com.linkedin.dashboard.DashboardInfo: ERROR :: /dashboards :: unrecognized field found but not allowed\n', 'status': 422})
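
For triage, it can help to pull the rejected aspect class and field path out of these messages so that failures can be grouped. The following is a minimal sketch that assumes the message format quoted in the traceback above; the regular expression and the helper name `parse_validation_error` are illustrative and not part of the DataHub SDK.

```python
import re

# Pattern assumed from the error text quoted in this issue; other server
# versions may format the message differently.
_VALIDATION_RE = re.compile(
    r"Failed to validate record with class (?P<cls>[\w.]+): "
    r"ERROR :: (?P<path>/\S*) :: (?P<reason>[^\n]+)"
)


def parse_validation_error(message: str):
    """Return (aspect_class, field_path, reason), or None if nothing matches."""
    match = _VALIDATION_RE.search(message)
    if match is None:
        return None
    return match.group("cls"), match.group("path"), match.group("reason").strip()


sample = (
    "Failed to validate record with class com.linkedin.dashboard.DashboardInfo: "
    "ERROR :: /dashboards :: unrecognized field found but not allowed\n"
)
print(parse_validation_error(sample))
```

For the sample above, the helper reports the class `com.linkedin.dashboard.DashboardInfo` and the field path `/dashboards`, which is the field the server does not recognize.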

To Reproduce
Steps to reproduce the behavior:

  1. Use Python SDK API Emitter
  2. Emit Entity like:
MetadataChangeProposalWrapper(entityType='dashboard', changeType='UPSERT', entityUrn='urn:li:dashboard:(dash,XXX)', entityKeyAspect=None, auditHeader=None, aspectName='domains', aspect=DomainsClass({'domains': ['urn:li:domain:XXX']}), systemMetadata=SystemMetadataClass({'lastObserved': 1729512345052, 'runId': '__DEFAULT_RUN_ID', 'lastRunId': 'no-run-id-provided', 'pipelineName': None, 'registryName': None, 'registryVersion': None, 'properties': None, 'version': None}))
  3. See error

Expected behavior
Emitting dashboards with the SDK should work as it did before.

Desktop (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version: DataHub Server 14.0.1 (and a test instance on 14.1), SDK acryl-datahub 0.14.1.3
@jakobhanna jakobhanna added the bug Bug report label Oct 21, 2024
@jakobhanna jakobhanna changed the title A short description of the bug Python Emitter not working Oct 21, 2024
@jakobhanna
Contributor Author

jakobhanna commented Oct 21, 2024

Can confirm that the error does not occur when the pip packages are pinned to acryl-datahub[datahub-rest, json-schema, metabase] <= 0.14.1.2. Tested with DataHub Server versions 14.0.1 and 14.1.
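
The error message itself hints that the server may be too old relative to the client, so a pre-flight check that compares the two version strings can flag a risky combination before emitting. This is a rough sketch, assuming purely numeric dotted versions such as `0.14.1.3`; the function names are hypothetical, and how the server version is obtained is left out. Reports in this thread differ on which client versions work, so treat the result as a warning signal only.

```python
def parse_version(version: str) -> tuple:
    """Turn '0.14.1.3' or 'v0.14.1' into a tuple of ints."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def client_is_newer(client: str, server: str) -> bool:
    """True if the client version sorts after the server version."""
    c, s = parse_version(client), parse_version(server)
    # Pad the shorter tuple with zeros so 0.14.1 compares equal to 0.14.1.0.
    width = max(len(c), len(s))
    c += (0,) * (width - len(c))
    s += (0,) * (width - len(s))
    return c > s


if client_is_newer("0.14.1.3", "v0.14.1"):
    print("client is newer than the server; consider pinning the client")
```

Versions with suffixes (for example release candidates) would need a real version parser rather than this integer split.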

@jakobhanna jakobhanna changed the title Python Emitter not working Python Emitter not working pypy v0.14.1.3 throws unrecognized field found but not allowed Oct 22, 2024
@remisalmon
Contributor

Same issue here with datahub v0.14.1. acryl-datahub==0.14.1 is the only pip package that does not throw those errors. All the packages from acryl-datahub==0.14.1.1 to acryl-datahub==0.14.1.3 do, and the latest version produces a lot more errors...

MatMoore added a commit to ministryofjustice/data-catalogue that referenced this issue Oct 30, 2024
For some reason 0.14.1.5 is not compatible with 0.14.1 of the server.

Example errors:

    [2024-10-30 14:59:01,029] ERROR    {datahub.ingestion.run.pipeline:77} -  failed to write record with workunit urn:li:container:c56575847879a9c23df584253cd14d8f-containerProperties with ('Unable to emit metadata to DataHub GMS, likely because the server version is too old relative to the client: Failed to validate record with class com.linkedin.container.ContainerProperties: ERROR :: /env :: unrecognized field found but not allowed\n', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'Failed to validate record with class com.linkedin.container.ContainerProperties: ERROR :: /env :: unrecognized field found but not allowed\n', 'status': 422, 'urn': 'urn:li:container:c56575847879a9c23df584253cd14d8f', 'workunit_id': 'urn:li:container:c56575847879a9c23df584253cd14d8f-containerProperties'}) and info {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'Failed to validate record with class com.linkedin.container.ContainerProperties: ERROR :: /env :: unrecognized field found but not allowed\n', 'status': 422, 'urn': 'urn:li:container:c56575847879a9c23df584253cd14d8f', 'workunit_id': 'urn:li:container:c56575847879a9c23df584253cd14d8f-containerProperties'}

    [2024-10-30 15:07:06,765] ERROR    {datahub.ingestion.run.pipeline:77} -  failed to write record with workunit urn:li:tag:dc_display_in_catalogue-tagKey with ('Unable to emit metadata to DataHub GMS, likely because the server version is too old relative to the client: Failed to validate record with class com.linkedin.dashboard.DashboardInfo: ERROR :: /dashboards :: unrecognized field found but not allowed\n', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'Failed to validate record with class com.linkedin.dashboard.DashboardInfo: ERROR :: /dashboards :: unrecognized field found but not allowed\n', 'status': 422, 'urn': 'urn:li:tag:dc_display_in_catalogue', 'workunit_id': 'urn:li:tag:dc_display_in_catalogue-tagKey'}) and info {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'Failed to validate record with class com.linkedin.dashboard.DashboardInfo: ERROR :: /dashboards :: unrecognized field found but not allowed\n', 'status': 422, 'urn': 'urn:li:tag:dc_display_in_catalogue', 'workunit_id': 'urn:li:tag:dc_display_in_catalogue-tagKey'}

Similar issues are being reported here datahub-project/datahub#11679

0.14.1.2 seems to work.
MatMoore added a commit to ministryofjustice/data-catalogue that referenced this issue Oct 31, 2024
No official tag was published for the v0.14.1 release
datahub-project/datahub#11655

However, I suspect that leaving this at the previous version may have
been the wrong thing to do, as we are now experiencing compatibility
issues with the python package, similar to
datahub-project/datahub#11679

I'm going to try updating to the latest tag and see if it resolves the
compatibility issues on v0.14.1.5 of the python package.
@MatMoore

We are encountering a similar issue with a custom source we've written using the python package.

We create the container metadata using the mcp_builder like this

            yield from mcp_builder.gen_containers(
                container_key=database_container_key,
                name=database_name,
                sub_types=sub_types,
                domain_urn=domain_urn,
                external_url=None,
                description=database_description,
                created=None,
                last_modified=last_modified,
                tags=display_tag,
                owner_urn=owner_urn,
                qualified_name=None,
                extra_properties=db_meta_dict,
            )

On the most recent version of the python package, it logs the following, even though the server is updated to version v0.14.1.

[2024-10-31 09:29:24,753] ERROR    {datahub.ingestion.run.pipeline:77} -  failed to write record with workunit urn:li:container:47cc53f1073124f8906f48248324a4f6-containerProperties with ('Unable to emit metadata to DataHub GMS, likely because the server version is too old relative to the client: Failed to validate record with class com.linkedin.container.ContainerProperties: ERROR :: /env :: unrecognized field found but not allowed\n', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'Failed to validate record with class com.linkedin.container.ContainerProperties: ERROR :: /env :: unrecognized field found but not allowed\n', 'status': 422, 'urn': 'urn:li:container:47cc53f1073124f8906f48248324a4f6', 'workunit_id': 'urn:li:container:47cc53f1073124f8906f48248324a4f6-containerProperties'}) and info {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'Failed to validate record with class com.linkedin.container.ContainerProperties: ERROR :: /env :: unrecognized field found but not allowed\n', 'status': 422, 'urn': 'urn:li:container:47cc53f1073124f8906f48248324a4f6', 'workunit_id': 'urn:li:container:47cc53f1073124f8906f48248324a4f6-containerProperties'}

We have the same results as @remisalmon - v0.14.1 works as expected, but any later versions trigger these validation errors.

@david-leifker
Collaborator

The next server-side release will address this and will also add a way to ignore unknown fields from future consumers.
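
To illustrate what ignoring unknown fields means in general, here is a small sketch that drops top-level fields a schema does not know instead of rejecting the whole record. It is not DataHub's implementation; the function name, the field set, and the sample payload are assumptions made for illustration, with `dashboards` taken from the error reported in this issue.

```python
def drop_unknown_fields(record: dict, known_fields: set) -> tuple:
    """Split a record into (fields the schema knows, names of dropped fields)."""
    kept = {name: value for name, value in record.items() if name in known_fields}
    dropped = sorted(name for name in record if name not in known_fields)
    return kept, dropped


# Hypothetical older schema that predates the `dashboards` field.
OLD_DASHBOARD_INFO_FIELDS = {"title", "description", "charts"}

# Hypothetical payload from a newer client.
payload = {"title": "Sales", "description": "", "charts": [], "dashboards": []}

kept, dropped = drop_unknown_fields(payload, OLD_DASHBOARD_INFO_FIELDS)
print(kept)
print(dropped)
```

A strict validator fails the record as soon as `dropped` is non-empty, which matches the 422 responses above; a lenient one would accept `kept` and discard the rest. The sketch only handles top-level fields, not nested records.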

@zilnus

zilnus commented Nov 1, 2024

Same issue with dbt core ingestion: "Unable to emit metadata to DataHub GMS". As a workaround, use acryl-datahub==0.14.0 and acryl-datahub[dbt]==0.14.0 so that dbt core ingestion works.
