Describe the bug
When running ingestion for the `gcs` source with `stateful_ingestion: enabled: true`, I get the following error:
datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure the source (gcs): Checkpointing provider DatahubIngestionCheckpointingProvider already registered.
Even when stateful ingestion is disabled, the log still shows the following lines:
INFO {datahub.ingestion.source.state.stateful_ingestion_base:241} - Stateful ingestion will be automatically enabled, as datahub-rest sink is used or `datahub_api` is specified
[...]
INFO {datahub.ingestion.run.pipeline:571} - Processing commit request for DatahubIngestionCheckpointingProvider. Commit policy = CommitPolicy.ALWAYS, has_errors=False, has_warnings=False
WARNING {datahub.ingestion.source.state_provider.datahub_ingestion_checkpointing_provider:95} - No state available to commit for DatahubIngestionCheckpointingProvider
INFO {datahub.ingestion.run.pipeline:591} - Successfully committed changes for DatahubIngestionCheckpointingProvider.
To Reproduce
Prerequisites:

- a GCS bucket
- a service account with the Storage Object Viewer role on the bucket

Steps to reproduce the behavior:

datahub ingest run -c <my minimal recipe>.yml
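For reference, a minimal recipe of roughly the following shape should trigger the error. Every value below is a placeholder (bucket name, HMAC credentials, pipeline name, server address), and the field names follow my reading of the DataHub GCS source docs, so adjust to your setup:

```yaml
# Sketch of a minimal recipe; all values are placeholders.
pipeline_name: gcs_stateful_test # stateful ingestion requires a pipeline name

source:
  type: gcs
  config:
    path_specs:
      - include: "gs://<my-bucket>/*.parquet"
    credential:
      hmac_access_id: "<hmac access id>"
      hmac_access_secret: "<hmac secret>"
    stateful_ingestion:
      enabled: true

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```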
Expected behavior
The pipeline should run without errors and write the state correctly, removing any stale metadata if configured.
Additional context
I think I have already found a fix that I will submit in the near future. The root cause is that the `gcs` source creates an equivalent `s3` source internally, which leads to re-registering the checkpointing provider. In addition, the `platform` attribute is missing, which leads to the state being written under the platform "default".
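For illustration only, here is a simplified sketch of the failure mode. This is not DataHub's actual code; the registry class and its behavior are invented to mirror the error message:

```python
# Simplified sketch of the failure mode; not DataHub's actual code.
# A registry that rejects duplicate provider names, mirroring the
# "already registered" error raised during pipeline initialization.


class CommitterRegistry:
    def __init__(self) -> None:
        self._committers: dict[str, object] = {}

    def register(self, name: str, committer: object) -> None:
        if name in self._committers:
            raise ValueError(f"Checkpointing provider {name} already registered.")
        self._committers[name] = committer


registry = CommitterRegistry()

# The gcs source registers the checkpointing provider once for itself...
registry.register("DatahubIngestionCheckpointingProvider", object())

# ...then constructs an equivalent s3 source whose initialization walks the
# same registration path, so the second call fails.
try:
    registry.register("DatahubIngestionCheckpointingProvider", object())
except ValueError as err:
    print(err)  # Checkpointing provider DatahubIngestionCheckpointingProvider already registered.
```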