Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql-server Source #18

Merged
merged 17 commits into from
Aug 9, 2023
Merged

sql-server Source #18

merged 17 commits into from
Aug 9, 2023

Conversation

maksenius
Copy link
Contributor

@maksenius maksenius commented Dec 21, 2022

Description

add source connector

Fixes #38

Quick checks:

  • I have followed the Code Guidelines.
  • There is no other pull request for the same update/change.
  • I have written unit tests.
  • I have made sure that the PR is of reasonable size and can be easily reviewed.

* add destination

* add update operation

* add tests and coltypes conversion

* remove primary key field

* add destination readme

* add mock
* add destination

* add update operation

* add tests and coltypes conversion

* remove primary key field

* add destination readme

* add mock
* add destination

* add update operation

* add tests and coltypes conversion

* remove primary key field

* add destination readme

* add mock

* add combined iterator
* add configs

* Destination (#5)

* add destination

* add update operation

* add tests and coltypes conversion

* remove primary key field

* add destination readme

* add mock

* Destination (#7)

* add destination

* add update operation

* add tests and coltypes conversion

* remove primary key field

* add destination readme

* add mock

* Source (#8)

* add destination

* add update operation

* add tests and coltypes conversion

* remove primary key field

* add destination readme

* add mock

* add combined iterator

* add cdc iterator
* change key logic add tests

* change key logic, add tests
@maha-hajja maha-hajja mentioned this pull request Apr 12, 2023
@maha-hajja maha-hajja changed the title Source sql-server Source Apr 12, 2023
@AndryHTC
Copy link

AndryHTC commented Jul 5, 2023

@maksenius I'm really looking into this. What is missing?

@maha-hajja
Copy link
Contributor

Hello @AndryHTC
Our team didn't get to review this PR yet, since we had more important issues and features to work on.
We will revisit this in our sync meeting next week and update the PR accordingly.

However, if you'd like to test it out, you can checkout this branch, build the connector binary and use it in conduit. refer to our docs for more info about that.

@AndryHTC
Copy link

AndryHTC commented Jul 8, 2023

Great @maha-hajja

I'm going to play with it next week. I'm trying to track table wide changes, and query subscribed changes.

Thank you for your incredible contribution, keep up 🔥

@AndryHTC
Copy link

@maha-hajja

I attempted to test out the connector as you suggested, but encountered an error when adding it as a source. The error message is as follows:

/home/runner/work/conduit/conduit/pkg/connector/source.go:350 - create combined iterator: setup cdc: add id column: mssql: Per la tabella 'CONDUIT_TRACKING_STRUCTURE' sono state specificate più colonne Identity. È consentita una sola colonna Identity per tabella.: github.com/conduitio/conduit/pkg/plugin/standalone/v1.unwrapGRPCError /home/runner/work/conduit/conduit/pkg/plugin/standalone/v1/client.go:165

It appears to be an issue with multiple identity columns being specified for the 'CONDUIT_TRACKING_STRUCTURE' table. As far as I understand, only a single identity column is allowed per table in MS SQL.

Do you have any suggestions or workarounds for this issue?

Thank you in advance for your assistance.

@maha-hajja
Copy link
Contributor

@AndryHTC I'm not the one who developed this connector, but I did take a quick look.
the tracking table is created first using SELECT TOP 0 * INTO %s FROM %s query, which should copy the column names for the table that you provided to the connector, I did some research and this query shouldn't copy any restrictions like "identity", but maybe I'm wrong? so check your original table and maybe try without having any identity columns.
after creating the tracking table, some columns are added, and the failure is happening in line 170, file source/iterator/iterator.go

// add id column.
_, err = tx.ExecContext(ctx, fmt.Sprintf(queryAddIDColumn, c.trackingTable, columnTrackingID))
if err != nil {
return fmt.Errorf("add id column: %w", err)
}

which is trying to add the identity column using this query ALTER TABLE %s ADD %s BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY
hopefully this helps somehow, I'm aware this wouldn't solve the problem, but could help you debug if that's what you're aiming for.
this connector's code hasn't been reviewed by our team yet, but hopefully we'll get to it soon.
let me know how it goes!

@AndryHTC
Copy link

AndryHTC commented Jul 13, 2023

Hello @maha-hajja,

Thank you for the detailed explanation and the insights. I've done some more investigation based on your suggestions and here's what I found.

You're correct that the SELECT TOP 0 * INTO %s FROM %s query is used to create the tracking table. While this does create a table with the same structure, it actually does copy the identity property of the columns in SQL Server, which seems to be the root cause of the problem I'm experiencing.

The original table has an identity column, and the tracking table is being created with an identical structure, including the identity property on the same column. After that, the code tries to add another identity column to the tracking table, which is not allowed in SQL Server since a table can have only one identity column.

To resolve this issue, the creation of the tracking table must be adjusted to not copy the identity property from the original table's identity column. A potential solution could involve a series of SQL operations to "move" the identity property to another column in the table:

  1. Create a new non-identity column in the tracking table.
  2. Copy the data from the original identity column to the new column.
  3. Delete the original identity column.
  4. Rename the new column to the name of the original column.

Here's an example in SQL:

-- Step 1: Add new column
ALTER TABLE YourTable ADD NewColumn INT;

-- Step 2: Copy data
UPDATE YourTable SET NewColumn = IdentityColumn;

-- Step 3: Drop identity column
ALTER TABLE YourTable DROP COLUMN IdentityColumn;

-- Step 4: Rename new column
EXEC sp_rename 'YourTable.NewColumn', 'IdentityColumn', 'COLUMN';

I understand that this might involve some substantial changes to the code. I hope my findings and suggestions will be helpful for your team when you get to review this connector.

Thank you again for your assistance and I'm looking forward to future developments on this.

@maha-hajja
Copy link
Contributor

@AndryHTC
Thank you for the detailed description! now we know where the problem is.

as for the solution, I think the fact that not all tables have an identity column(which is probably the case the developers for this connector tested), makes us need to get to a new solution, because we can't assume that the identity column exists.

We could replace the logic for creating the tracking table with a method that copies only the column names and types, so a bit more lengthy but should work.

Thanks again for looking into this, and will let you know when any progress is made here. Also, any contributions to this repo or any other repos are welcome ;D

@maha-hajja maha-hajja self-assigned this Aug 7, 2023
@maha-hajja maha-hajja merged commit b8df26c into main Aug 9, 2023
2 checks passed
@maha-hajja maha-hajja deleted the source branch August 9, 2023 11:13
@maha-hajja
Copy link
Contributor

Hi @AndryHTC !
this PR was recently added to our board, reviewed and merged.
about the bug, we already have a PR to fix it #55 which will be merged soon.
you can try it out yourself! and let us know if any help is needed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sql-server source
3 participants