Add support for setting column type in Clickhouse. #16702

Closed
wants to merge 30 commits into from

Member

@subkanthi subkanthi commented Mar 24, 2023

Added support for setting column type in ClickHouse.
Relates to #15515

Release notes

(x) Release notes are required, with the following suggested text:

# ClickHouse
* Add support for changing column types. ({issue}`15515`)

@cla-bot

cla-bot bot commented Mar 24, 2023

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Member

@ebyhr ebyhr left a comment

Could you submit the CLA if you haven't sent it yet?

.matches("VALUES bigint '123'");
}
String tableName = "test_set_column_type";
assertUpdate("CREATE TABLE " + tableName + " (a bigint, b double NOT NULL, c varchar(50)) WITH (order_by=ARRAY['b'], engine = 'MergeTree')");
Member

@ebyhr ebyhr Mar 26, 2023

Please don't put ClickHouse-specific code in BaseConnectorTest. You can add a new test method to BaseClickHouseConnectorTest.

Member Author

@subkanthi subkanthi Mar 27, 2023

Sorry, my mistake. I didn't realize the inheritance and will try to override. In general, most of these tests fail because MODIFY COLUMN is not supported by the default engine type, Log, so I'm trying to create a table with the MergeTree engine.

assertUpdate("ALTER TABLE " + tableName + " ALTER COLUMN a SET DATA TYPE varchar(50)");

assertEquals(getColumnType(tableName, "a"), "varchar");
assertThat((String) computeScalar("show create table " + tableName)).contains("CREATE TABLE clickhouse.tpch.test_set_column_type (\n" +
Member

@ebyhr ebyhr Mar 26, 2023

nit: Uppercase "show create table". I would recommend verifying the table definition on ClickHouse instead of Trino, because the connector doesn't support reading all column definitions.

String tableName = "test_set_column_type";
assertUpdate("CREATE TABLE " + tableName + " (a bigint, b double NOT NULL, c varchar(50)) WITH (order_by=ARRAY['b'], engine = 'MergeTree')");

assertUpdate("ALTER TABLE " + tableName + " ALTER COLUMN a SET DATA TYPE varchar(50)");
Member

Column a doesn't have any definition. The new test doesn't verify whether SET DATA TYPE preserves a column definition correctly, because it creates the table with only a table-level property. I would recommend creating the table on ClickHouse so that we can specify column definitions (default_expr, codec, TTL).

Member Author

I was testing whether NOT NULL is retained. I will check the default and the others as well.

@ebyhr
Member

ebyhr commented Mar 30, 2023

Please feel free to ping me in Slack https://trino.io/slack.html if you need help.

@cla-bot cla-bot bot added the cla-signed label Apr 8, 2023
@subkanthi
Member Author

Please feel free to ping me in Slack https://trino.io/slack.html if you need help.

Hi @ebyhr, the testSetColumnTypes tests are failing. I couldn't figure out the right syntax to do a

AS SELECT CAST() Engine=MergeTree

in ClickHouse, and I'm not even sure it's supported in ClickHouse; the parser fails because of the brackets. It works without Engine because the table then defaults to Log, but Log does not support MODIFY COLUMN.
Thoughts?

@ebyhr
Member

ebyhr commented Apr 9, 2023

I couldn't figure out the right syntax to do a
AS SELECT CAST() Engine=MergeTree

Why doesn't the cast contain a value and a type?

By the way, please set up the code style configuration in your IDE.
https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md#code-style

.add(new SetColumnTypeSetup("decimal(5,3)", "12.345", "decimal(10,3)")) // short decimal -> short decimal
.add(new SetColumnTypeSetup("decimal(28,3)", "12.345", "decimal(38,3)")) // long decimal -> long decimal
.add(new SetColumnTypeSetup("decimal(5,3)", "12.345", "decimal(38,3)")) // short decimal -> long decimal
.add(new SetColumnTypeSetup("decimal(5,3)", "12.340", "decimal(5,2)"))
Member Author

The time, timestamp, and addrow cases were removed because the translation to ClickHouse does not appear to be implemented.

@subkanthi subkanthi requested a review from ebyhr April 26, 2023 23:42
assertUpdate("ALTER TABLE " + tableName + " ALTER COLUMN col SET DATA TYPE integer");

assertEquals(getColumnType(tableName, "col"), "integer");
assertQuery("SELECT col FROM " + tableName, "VALUES -1");
Member

This looks like a bug. We may need to deny such operations. Could you find a ClickHouse GitHub issue about this?

Member Author

@subkanthi subkanthi Apr 29, 2023

int seems to map to Int32 in ClickHouse, and the maximum Int32 value in ClickHouse is 2147483647.
https://clickhouse.com/docs/en/sql-reference/data-types/int-uint
ClickHouse version: 21.11.10.1

┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE tpch.test_set_column_out_of_rangekgarmodrdz
(
    `col` Int32,
    `col2` Int32
)
ENGINE = MergeTree
ORDER BY col2
SETTINGS index_granularity = 8192 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘


Member

Let's deny changing such types.
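
For context on why the test above reads back -1: one plausible explanation, which this thread does not confirm, is that the out-of-range bigint value was truncated to its low 32 bits when the column became Int32. The Java sketch below reproduces that arithmetic with a narrowing cast and contrasts it with a checked conversion. It only illustrates the wraparound and is not a description of how ClickHouse implements MODIFY COLUMN. The class name, method names, and sample value are assumptions chosen for illustration.

```java
public class NarrowingCastDemo
{
    // Unchecked narrowing: keeps only the low 32 bits of the value.
    public static int truncate(long value)
    {
        return (int) value;
    }

    // Checked narrowing: reports whether the value fits in a 32-bit integer.
    public static boolean fitsInInt(long value)
    {
        try {
            Math.toIntExact(value);
            return true;
        }
        catch (ArithmeticException e) {
            return false;
        }
    }

    public static void main(String[] args)
    {
        long outOfRange = Long.MAX_VALUE; // sample value, an assumption
        System.out.println(truncate(outOfRange));    // prints -1
        System.out.println(fitsInInt(outOfRange));   // prints false
        System.out.println(fitsInInt(2147483647L));  // prints true
    }
}
```

A connector that denies such type changes avoids the silent result in the first line and behaves more like the checked conversion.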


@DataProvider
@Override
public Object[][] setColumnTypesDataProvider()
Member

Why do you need to override this method? Please override filterSetColumnTypesDataProvider instead.

Member Author

Overriding filterSetColumnTypesDataProvider doesn't seem to work, because setColumnTypeSetupData is called from setColumnTypesDataProvider:

    public Object[][] setColumnTypesDataProvider()
    {
        return setColumnTypeSetupData().stream()
                .map(this::filterSetColumnTypesDataProvider)
                .flatMap(Optional::stream)
                .collect(toDataProvider());
    }

Member

Please take a look at other connectors, e.g. TestIcebergParquetConnectorTest.

@subkanthi subkanthi requested a review from ebyhr May 1, 2023 22:22
Member

@ebyhr ebyhr left a comment

Please squash commits into one.


@Test
@Override
public void testSetColumnType()
Member

Reminder.

public void testSetColumnType()
{
String tableName = "test_set_column_type" + randomNameSuffix();
assertUpdate("CREATE TABLE " + tableName + " (a bigint, b double NOT NULL, c varchar(50)) WITH (order_by=ARRAY['b'], engine = 'MergeTree')");
Member

Please use TestTable instead. The same applies in other places.
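
A likely motivation for this request is cleanup: a table wrapped in try-with-resources is dropped even when an assertion in the test body fails. The sketch below shows that pattern only. It does not use Trino's TestTable; ScopedTable is a hypothetical stand-in that records SQL strings in a list instead of running them, and the table definition is copied from the test above.

```java
import java.util.ArrayList;
import java.util.List;

public class ScopedTableSketch
{
    // Hypothetical stand-in for a test-table helper: "creates" the table on
    // construction and "drops" it on close by recording the SQL statements.
    public static final class ScopedTable
            implements AutoCloseable
    {
        private final List<String> executedSql;
        private final String name;

        public ScopedTable(List<String> executedSql, String name, String definition)
        {
            this.executedSql = executedSql;
            this.name = name;
            executedSql.add("CREATE TABLE " + name + " " + definition);
        }

        public String getName()
        {
            return name;
        }

        @Override
        public void close()
        {
            executedSql.add("DROP TABLE " + name);
        }
    }

    // Runs a test body against a scoped table and returns the recorded SQL.
    public static List<String> run()
    {
        List<String> executedSql = new ArrayList<>();
        String definition = "(a bigint, b double NOT NULL, c varchar(50)) WITH (order_by=ARRAY['b'], engine = 'MergeTree')";
        try (ScopedTable table = new ScopedTable(executedSql, "test_set_column_type", definition)) {
            executedSql.add("ALTER TABLE " + table.getName() + " ALTER COLUMN a SET DATA TYPE varchar(50)");
        }
        // The DROP TABLE statement has been recorded by close() at this point.
        return executedSql;
    }

    public static void main(String[] args)
    {
        run().forEach(System.out::println);
    }
}
```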

assertUpdate("CREATE TABLE " + tableName + " (a bigint, b double NOT NULL, c varchar(50)) WITH (order_by=ARRAY['b'], engine = 'MergeTree')");
assertUpdate("ALTER TABLE " + tableName + " ALTER COLUMN a SET DATA TYPE varchar(50)");
assertEquals(getColumnType(tableName, "a"), "varchar");
assertThat((String) computeScalar("show create table " + tableName)).contains("CREATE TABLE " + "clickhouse.tpch." + tableName + " (\n" +
Member

I don't understand why this assertion, which doesn't exist in the base test method, was added. Please extract it into a separate test method.

Comment on lines +955 to +956
assertUpdate("CREATE TABLE " + tableName + " (col bigint, col2 int not null) WITH (order_by=ARRAY['col2'], engine = 'MergeTree')");
assertUpdate("ALTER TABLE " + tableName + " ALTER COLUMN col SET DATA TYPE integer");
Member

How does ClickHouse handle values that can't be cast to integer, e.g. a test string?
Probably, we shouldn't allow changing the type from varchar to numeric types in the connector.

@Override
protected Optional<SetColumnTypeSetup> filterSetColumnTypesDataProvider(SetColumnTypeSetup setup)
{
if (setup.sourceColumnType().equals("tinyint")) {
Member

Please use a switch and leave a comment in each block explaining the reason for the change.
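
A minimal sketch of what the requested shape could look like, together with the stream pipeline quoted earlier in this thread that consumes the filter. It is self-contained rather than a patch against the Trino test classes: the SetColumnTypeSetup record is a hypothetical stand-in (only sourceColumnType() appears in the PR), the type strings "time(3)" and "timestamp(3)" are assumed spellings, and the reason for skipping tinyint is not stated in the thread, so it is left as a placeholder.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class SetColumnTypeFilterSketch
{
    // Hypothetical stand-in for the setup object used by the base test.
    public record SetColumnTypeSetup(String sourceColumnType, String sourceValueLiteral, String newColumnType) {}

    // Returns Optional.empty() to skip a case, or the setup itself to keep it.
    public static Optional<SetColumnTypeSetup> filterSetColumnTypesDataProvider(SetColumnTypeSetup setup)
    {
        switch (setup.sourceColumnType()) {
            case "tinyint":
                // Placeholder: state the concrete reason for skipping here.
                return Optional.empty();
            case "time(3)":
            case "timestamp(3)":
                // Translation to ClickHouse does not appear to be implemented.
                return Optional.empty();
            default:
                return Optional.of(setup);
        }
    }

    // Mirrors the map/flatMap pipeline of the base data provider, so that
    // overriding only the filter is enough to drop unsupported cases.
    public static List<SetColumnTypeSetup> filterAll(List<SetColumnTypeSetup> setups)
    {
        return setups.stream()
                .map(SetColumnTypeFilterSketch::filterSetColumnTypesDataProvider)
                .flatMap(Optional::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<SetColumnTypeSetup> kept = filterAll(List.of(
                new SetColumnTypeSetup("tinyint", "1", "smallint"),
                new SetColumnTypeSetup("decimal(5,3)", "12.345", "decimal(10,3)")));
        kept.forEach(System.out::println);
    }
}
```

Grouping cases that share a reason under one comment keeps each exclusion traceable when the connector later gains support for a type.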

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jan 17, 2024
@mosabua
Member

mosabua commented Jan 17, 2024

👋 @subkanthi @ebyhr - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

@github-actions github-actions bot removed the stale label Jan 18, 2024

github-actions bot commented Feb 9, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Feb 9, 2024

github-actions bot commented Mar 1, 2024

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Mar 1, 2024