-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support time travel in Hudi connector #15084
Conversation
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiConfig.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiConfig.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiConfig.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiConfig.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiErrorCode.java
Outdated
Show resolved
Hide resolved
...in/trino-hudi/src/main/java/io/trino/plugin/hudi/query/HudiReadOptimizedDirectoryLister.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/TestHudiCopyOnWriteConnectorTest.java
Outdated
Show resolved
Hide resolved
testing/trino-testing/src/main/java/io/trino/testing/AbstractTestQueryFramework.java
Outdated
Show resolved
Hide resolved
testing/trino-testing/src/main/java/io/trino/testing/H2QueryRunner.java
Outdated
Show resolved
Hide resolved
This isn't a time travel feature. Please take a look at Iceberg page in the issue. |
@ebyhr Thanks for pointing to the commit. Time travel is just a fancy name for a versioned snapshot query. While we're using |
@codope Yes. |
Got it. I think this commit is more relevant for us 1c23c69 which adds the necessary connector metadata APIs to support versioned snapshot queries. If we implement those APIs in |
a6c49d2
to
87553d8
Compare
...ain/src/main/java/io/trino/sql/planner/iterative/rule/HudiPushTableVersionIntoTableScan.java
Outdated
Show resolved
Hide resolved
...ain/src/main/java/io/trino/sql/planner/iterative/rule/HudiPushTableVersionIntoTableScan.java
Outdated
Show resolved
Hide resolved
...ain/src/main/java/io/trino/sql/planner/iterative/rule/HudiPushTableVersionIntoTableScan.java
Outdated
Show resolved
Hide resolved
When are we expecting this to be merged ? |
It depends on @albericgenius. I've waited that @albericgenius addresses comments. |
Got you, I will continue to work on this. |
Hey @albericgenius - Any update on this? |
87553d8
to
4c6c0d6
Compare
I updated a new revision, removed the Hudi specific code in the planner |
4c6c0d6
to
147db05
Compare
147db05
to
4b5412d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add "Time travel queries" section to hudi.rst
likes Iceberg documentation? It would be nice to mention the way to find versions from existing Hudi tables.
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiTableHandle.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/BaseHudiConnectorTest.java
Show resolved
Hide resolved
plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/TestHudiSmokeTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/TestHudiSmokeTest.java
Outdated
Show resolved
Hide resolved
4b5412d
to
719cd8a
Compare
719cd8a
to
7c7cac5
Compare
7c7cac5
to
56975af
Compare
56975af
to
e0a084c
Compare
e0a084c
to
da2b230
Compare
@ebyhr i used the UI merge tool to resolve conflict issue, after that there is CI error: PR requires a rebase. Found: 1 merge.commit. and the cassandra CI test is not relative to my modification. |
Please rebase on master instead. |
f13e1d7
to
980d11a
Compare
Got you |
I guess this is close to getting merged 👀 |
@ebyhr still need your help :) |
980d11a
to
350bf9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rebase on master to resolve confclits.
@@ -220,6 +220,46 @@ The output of the query has the following columns: | |||
- ``varchar`` | |||
- Current state of the instant | |||
|
|||
Rolling back to a previous version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mosabua Could you please review this docs?
String after = formatter.format(new Date()); | ||
|
||
try { | ||
assertThat(onTrino().executeQuery("SELECT id, name FROM hudi.default." + tableName + " FOR VERSION AS OF " + before)).hasNoRows(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to get the version from $timeline
table.
onHudi().executeQuery("ALTER TABLE default." + tableName + " DROP COLUMNS (new_col)"); | ||
onHudi().executeQuery("INSERT INTO default." + tableName + " VALUES (4, 'a4', 20, 1000)"); | ||
String afterDropColumn = formatter.format(new Date()); | ||
assertThat(onTrino().executeQuery("SELECT id, name FROM hudi.default." + tableName + " FOR VERSION AS OF " + afterDropColumn)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertions having FOR VERSION AS OF
specifies the latest version. That's not a common scenario when using time travel. I would recommend verifying with the past version. Also, it should verify the table definition.
builder.addConnector( | ||
"hive", | ||
forHostPath(configDir.getPath("hive.properties")), | ||
CONTAINER_TRINO_ETC + "/catalog/hive.properties"); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Revert unrelated change.
@Test(groups = {HUDI, PROFILE_SPECIFIC_TESTS}) | ||
public void testTimeTravelQuery() | ||
{ | ||
String tableName = "test_hudi_cow_select_session_props" + randomNameSuffix(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the table name contains select_session_props
?
internally used for providing the previous state of the table:: | ||
|
||
SELECT * | ||
FROM example.testdb.customer_orders FOR TIMESTAMP AS OF TIMESTAMP '2022-03-23 09:59:29.803 Europe/Vienna' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this PR contain a test case for FOR TIMESTAMP AS OF TIMESTAMP
?
formatter.setTimeZone(TimeZone.getTimeZone(ZoneOffset.UTC)); | ||
String before = formatter.format(new Date()); | ||
onHudi().executeQuery("SET hoodie.schema.on.read.enable=true"); | ||
createNonPartitionedTable(tableName, COW_TABLE_TYPE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add another test for MOR table type.
@ebyhr @albericgenius I have also implemented the time travel feature in my forked repo and the branch was cut from 418 tag, should i raise the PR to incorporate the feature? Please let me know if it sounds good i will raise the PR with the desired changes |
needs rebase |
any word on this getting released - we've built an EntityMatch feature into InsuranceLake which stores data in an an Entity Primary Table that uses Hudi. We'd like to be able to use Athena to do 'Time Travel'... |
Is this getting any love anytime soon? |
👋 @albericgenius - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that. We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks. |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
Closing this PR due to inactivity. Anyone interested please feel free to pick it back up and continue work on it here or in a new PR. |
Description
Fix #15003
Additional context and related issues
https://issues.apache.org/jira/browse/HUDI-2692
https://issues.apache.org/jira/browse/HUDI-2693
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: