Support time travel in Hudi connector #15084

Closed

@albericgenius (Contributor) commented Nov 17, 2022

Description

Fix #15003

  • This PR only implements incremental queries for COW (copy-on-write) tables.
  • Support for MOR (merge-on-read) tables will be discussed and implemented after MOR reading support is merged.
  • This PR depends on the TPC-H modification, whose PR has not been merged yet, so the build will probably fail. :)
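For readers unfamiliar with Hudi incremental queries, the minimal sketch below illustrates the commit-time filtering described above: on a COW table every row carries a _hoodie_commit_time meta column, and an incremental read returns the rows whose latest write falls between two instants. It assumes an exclusive begin instant and an inclusive end instant (how Hudi's begin/end instant options are documented) and that all instants share one fixed-width format, so string order matches time order. HudiRow and IncrementalFilter are hypothetical names; this is not the PR's code.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical row type: the Hudi meta column used for filtering plus two data columns.
record HudiRow(String hoodieCommitTime, int id, String name) {}

final class IncrementalFilter {
    private IncrementalFilter() {}

    // Returns rows whose commit time is in (beginInstant, endInstant].
    // Assumes every instant uses the same fixed-width layout (for example
    // yyyyMMddHHmmssSSS), so String.compareTo orders instants chronologically.
    static List<HudiRow> incrementalRows(List<HudiRow> rows, String beginInstant, String endInstant) {
        return rows.stream()
                .filter(row -> row.hoodieCommitTime().compareTo(beginInstant) > 0)
                .filter(row -> row.hoodieCommitTime().compareTo(endInstant) <= 0)
                .collect(Collectors.toList());
    }
}
```

Because the filter runs over the current table, it only sees the latest version of each row; that limitation is what distinguishes it from a true snapshot read.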

Additional context and related issues

https://issues.apache.org/jira/browse/HUDI-2692
https://issues.apache.org/jira/browse/HUDI-2693

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`15003`)

@cla-bot added the cla-signed label Nov 17, 2022
@ebyhr (Member) commented Nov 18, 2022

> Fix #15003

This isn't a time travel feature. Please take a look at the Iceberg page linked in the issue.

Review comment on plugin/trino-hudi/pom.xml (outdated, resolved)
@albericgenius changed the title from "Support HUDI incremental query" to "Support HUDI incremental query for COW table" Nov 18, 2022
@codope (Contributor) commented Nov 18, 2022

> Fix #15003
>
> This isn't a time travel feature. Please take a look at the Iceberg page linked in the issue.

@ebyhr Thanks for pointing to the commit. Time travel is just a fancy name for a versioned snapshot query. While we're using _hoodie_commit_time as a filter in this PR, SQL AS OF semantics are much better suited for this feature. I couldn't find SELECT ... AS OF in the Trino SQL docs. Is it only supported for the Iceberg connector?
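The difference between a _hoodie_commit_time filter and AS OF semantics can be made concrete with a small in-memory model. In this hypothetical sketch a table is a sequence of upserts tagged with Hudi instants: snapshotAsOf replays every write at or before an instant (versioned snapshot semantics), while latestFilteredBy takes the current table and keeps only rows whose commit time is at or before that instant. All names are illustrative, and writes are assumed to be listed in commit order with fixed-width instant strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical upsert: one record key written at one Hudi instant.
record RecordWrite(String commitTime, int id, String name) {}

final class SnapshotVsCommitTimeFilter {
    private SnapshotVsCommitTimeFilter() {}

    // AS OF semantics: replay every write committed at or before asOf.
    // Assumes writes are in commit order and instants are fixed-width strings.
    static Map<Integer, String> snapshotAsOf(List<RecordWrite> writes, String asOf) {
        Map<Integer, String> table = new LinkedHashMap<>();
        for (RecordWrite write : writes) {
            if (write.commitTime().compareTo(asOf) <= 0) {
                table.put(write.id(), write.name());
            }
        }
        return table;
    }

    // Commit-time filter: build the latest table, then keep rows last written at or before asOf.
    static Map<Integer, String> latestFilteredBy(List<RecordWrite> writes, String asOf) {
        Map<Integer, RecordWrite> latest = new LinkedHashMap<>();
        for (RecordWrite write : writes) {
            latest.put(write.id(), write);
        }
        Map<Integer, String> result = new LinkedHashMap<>();
        for (RecordWrite write : latest.values()) {
            if (write.commitTime().compareTo(asOf) <= 0) {
                result.put(write.id(), write.name());
            }
        }
        return result;
    }
}
```

If row 1 is updated after the requested instant, snapshotAsOf still returns its old value, whereas latestFilteredBy drops the row entirely, because the latest table no longer holds the old version.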

@ebyhr (Member) commented Nov 18, 2022

> Is it only supported for the Iceberg connector?

@codope Yes.

@codope (Contributor) commented Nov 18, 2022

Got it. I think commit 1c23c69 is more relevant for us: it adds the connector metadata APIs needed to support versioned snapshot queries. If we implement those APIs in HudiMetadata, we can run SELECT ... AS OF queries through the Hudi connector. cc @albericgenius
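The Hudi side of those metadata APIs is not settled in this thread, so the following is only a hypothetical model of the resolution step HudiMetadata would need. FOR VERSION AS OF names an exact instant, which should exist on the timeline, while FOR TIMESTAMP AS OF names a point in time, which resolves to the newest completed instant at or before it. TableVersion, VersionId, PointInTime, and VersionResolver are invented for illustration and are not Trino SPI classes; the point-in-time value is assumed to be already converted to the timeline's fixed-width instant format.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of the two ways a query can name a table version.
interface TableVersion {}

// FOR VERSION AS OF: an exact Hudi instant, for example one read from $timeline.
record VersionId(String instant) implements TableVersion {}

// FOR TIMESTAMP AS OF: a point in time, assumed to be already converted
// to the timeline's fixed-width instant format.
record PointInTime(String instant) implements TableVersion {}

final class VersionResolver {
    private VersionResolver() {}

    // Maps a requested version to a completed instant on the table's timeline.
    static String resolve(List<String> completedInstants, TableVersion version) {
        TreeSet<String> timeline = new TreeSet<>(completedInstants);
        if (version instanceof VersionId id) {
            // An explicit version must match an existing completed instant exactly.
            if (!timeline.contains(id.instant())) {
                throw new IllegalArgumentException("Version does not exist: " + id.instant());
            }
            return id.instant();
        }
        if (version instanceof PointInTime pointInTime) {
            // A point in time resolves to the newest instant at or before it.
            String floor = timeline.floor(pointInTime.instant());
            if (floor == null) {
                throw new IllegalArgumentException("No version at or before: " + pointInTime.instant());
            }
            return floor;
        }
        throw new IllegalArgumentException("Unsupported version type: " + version);
    }
}
```

Rejecting an unknown explicit version, rather than rounding it down, keeps FOR VERSION AS OF strict; only the temporal form needs "at or before" semantics.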

@albericgenius force-pushed the incremental_query branch 2 times, most recently from a6c49d2 to 87553d8 on December 1, 2022 08:18
@ebyhr changed the title from "Support HUDI incremental query for COW table" to "Support time travel in Hudi connector" Dec 6, 2022
@maddy2u commented Jan 3, 2023

When do we expect this to be merged?

@ebyhr (Member) commented Jan 3, 2023

It depends on @albericgenius. I'm waiting for @albericgenius to address the review comments.

@albericgenius (Contributor, Author) commented

> It depends on @albericgenius. I'm waiting for @albericgenius to address the review comments.

Got it, I will continue working on this.

@maddy2u commented Jan 28, 2023

Hey @albericgenius, any update on this?

@albericgenius (Contributor, Author) commented

> Hey @albericgenius, any update on this?

I pushed a new revision that removes the Hudi-specific code from the planner.

@ebyhr @electrum @codope Thanks for your coaching and time.

@ebyhr (Member) left a comment

Could you add a "Time travel queries" section to hudi.rst, like the one in the Iceberg documentation? It would be nice to mention how to find the versions of existing Hudi tables.

@github-actions added the docs label Feb 16, 2023
@albericgenius (Contributor, Author) commented

@ebyhr I used the UI merge tool to resolve the conflicts, and since then CI fails with the error: PR requires a rebase. Found: 1 merge.commit. Also, the failing Cassandra CI test is unrelated to my changes.

@ebyhr (Member) commented Apr 6, 2023

Please rebase on master instead.

@albericgenius (Contributor, Author) commented

> Please rebase on master instead.

Got it.

@nikoshet commented

I guess this is close to getting merged 👀

@albericgenius (Contributor, Author) commented

@ebyhr I still need your help :)

@ebyhr (Member) left a comment

Please rebase on master to resolve the conflicts.

@@ -220,6 +220,46 @@ The output of the query has the following columns:
- ``varchar``
- Current state of the instant

Rolling back to a previous version
Review comment (Member):

@mosabua Could you please review these docs?

String after = formatter.format(new Date());

try {
assertThat(onTrino().executeQuery("SELECT id, name FROM hudi.default." + tableName + " FOR VERSION AS OF " + before)).hasNoRows();
Review comment (Member):

It would be better to get the version from the $timeline table.
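Following that suggestion, a test could first list completed instants from the $timeline table and then read the table as of one of them, instead of formatting the wall clock. The hypothetical helper below only builds the two SQL strings. The timestamp and state column names and the 'COMPLETED' state value are assumptions based on the Hudi connector's $timeline documentation, and the instant is left unquoted to mirror the FOR VERSION AS OF usage in the PR's test.

```java
final class TimeTravelQueries {
    private TimeTravelQueries() {}

    // Lists completed instants from the $timeline metadata table, oldest first.
    // The table name is double-quoted because it contains '$'; "timestamp" is quoted
    // defensively because TIMESTAMP is also a SQL keyword.
    static String completedInstantsQuery(String catalog, String schema, String table) {
        return "SELECT \"timestamp\" FROM " + catalog + "." + schema + ".\"" + table + "$timeline\""
                + " WHERE state = 'COMPLETED' ORDER BY \"timestamp\"";
    }

    // Reads the table as of an instant taken from $timeline; the instant is left
    // unquoted, mirroring the FOR VERSION AS OF usage in the PR's test.
    static String selectAsOfVersion(String catalog, String schema, String table, String instant) {
        return "SELECT id, name FROM " + catalog + "." + schema + "." + table
                + " FOR VERSION AS OF " + instant;
    }
}
```

Picking an instant other than the last one returned by the first query also makes it easy to assert against a past version rather than the latest.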

onHudi().executeQuery("ALTER TABLE default." + tableName + " DROP COLUMNS (new_col)");
onHudi().executeQuery("INSERT INTO default." + tableName + " VALUES (4, 'a4', 20, 1000)");
String afterDropColumn = formatter.format(new Date());
assertThat(onTrino().executeQuery("SELECT id, name FROM hudi.default." + tableName + " FOR VERSION AS OF " + afterDropColumn))
Review comment (Member):

The assertions using FOR VERSION AS OF specify the latest version, which is not a common scenario for time travel. I would recommend verifying with a past version. The test should also verify the table definition.

builder.addConnector(
"hive",
forHostPath(configDir.getPath("hive.properties")),
CONTAINER_TRINO_ETC + "/catalog/hive.properties");

Review comment (Member):

nit: Revert the unrelated change.

@Test(groups = {HUDI, PROFILE_SPECIFIC_TESTS})
public void testTimeTravelQuery()
{
String tableName = "test_hudi_cow_select_session_props" + randomNameSuffix();
Review comment (Member):

Why does the table name contain select_session_props?

internally used for providing the previous state of the table::

SELECT *
FROM example.testdb.customer_orders FOR TIMESTAMP AS OF TIMESTAMP '2022-03-23 09:59:29.803 Europe/Vienna'
Review comment (Member):

Does this PR contain a test case for FOR TIMESTAMP AS OF TIMESTAMP?
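A FOR TIMESTAMP AS OF test needs a zoned timestamp, like the one in the documentation example, to line up with the timeline's instants, while the test's formatter works in UTC. The sketch below assumes the timeline records instants in UTC with millisecond precision (yyyyMMddHHmmssSSS); depending on configuration, Hudi may instead record instants in local time, in which case the target zone would differ. InstantFormat is a hypothetical name.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

final class InstantFormat {
    private InstantFormat() {}

    // Assumed instant layout: millisecond precision, rendered in UTC.
    private static final DateTimeFormatter UTC_INSTANT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS").withZone(ZoneOffset.UTC);

    // Normalizes a zoned point in time to UTC and renders it as an instant string.
    static String toUtcInstant(ZonedDateTime pointInTime) {
        return UTC_INSTANT.format(pointInTime.toInstant());
    }
}
```

For the documentation's TIMESTAMP '2022-03-23 09:59:29.803 Europe/Vienna' (UTC+1 on that date), this yields 20220323085929803.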

formatter.setTimeZone(TimeZone.getTimeZone(ZoneOffset.UTC));
String before = formatter.format(new Date());
onHudi().executeQuery("SET hoodie.schema.on.read.enable=true");
createNonPartitionedTable(tableName, COW_TABLE_TYPE);
Review comment (Member):

Please add another test for the MOR table type.
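One way to cover MOR without duplicating the whole test is to run the same time travel assertions once per table type. The sketch below only generates Hudi Spark SQL DDL for both types using the type, primaryKey, and preCombineField table properties. The column list is a guess based on the INSERT in the test (id, name, and two numeric columns), and HudiTableTypes is hypothetical, not the test's existing createNonPartitionedTable helper.

```java
import java.util.List;
import java.util.stream.Collectors;

final class HudiTableTypes {
    private HudiTableTypes() {}

    // Hudi Spark SQL table types: copy-on-write and merge-on-read.
    static final List<String> TABLE_TYPES = List.of("cow", "mor");

    // Hypothetical non-partitioned DDL; the columns are guessed from the test's INSERT.
    static String createTableSql(String tableName, String tableType) {
        return "CREATE TABLE default." + tableName
                + " (id INT, name STRING, price DOUBLE, ts BIGINT) USING hudi"
                + " TBLPROPERTIES (type = '" + tableType + "', primaryKey = 'id', preCombineField = 'ts')";
    }

    // One statement per table type, suffixing the table name with the type.
    static List<String> createTableSqlForAllTypes(String baseName) {
        return TABLE_TYPES.stream()
                .map(type -> createTableSql(baseName + "_" + type, type))
                .collect(Collectors.toList());
    }
}
```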

@ravjot28 commented Jun 12, 2023

@ebyhr @albericgenius I have also implemented the time travel feature in my fork, on a branch cut from the 418 tag. Should I raise a PR to incorporate the feature?

Please let me know if that sounds good, and I will raise the PR with the desired changes.

@tooptoop4 (Contributor) commented

Needs a rebase.

@jlucking2023 commented

Any word on this getting released? We've built an EntityMatch feature into InsuranceLake, which stores data in an Entity Primary Table that uses Hudi. We'd like to be able to use Athena to do 'Time Travel'...

@aviral-nayya commented

Is this getting any love anytime soon?

@mosabua (Member) commented Jan 11, 2024

👋 @albericgenius - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

Also fyi @codope @brandyml

@github-actions (bot) commented Sep 4, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions added the stale label Sep 4, 2024
@mosabua (Member) commented Sep 6, 2024

Closing this PR due to inactivity. Anyone interested please feel free to pick it back up and continue work on it here or in a new PR.

@mosabua closed this Sep 6, 2024
Development: successfully merging this pull request may close the linked issue "Support time travel in Hudi connector".