add source docs
maksenius committed Nov 22, 2022
1 parent 57ffb3d commit f3e0486
Showing 2 changed files with 85 additions and 2 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/lint.yml
@@ -16,4 +16,4 @@ jobs:
- name: golangci-lint
uses: golangci/golangci-lint-action@v3.1.0
with:
- version: v1.48.0
+ version: v1.50.1
85 changes: 84 additions & 1 deletion README.md
@@ -23,7 +23,7 @@ Run `make test` to run all the unit and integration tests.

## Destination

- The DB2 Destination takes a `sdk.Record` and parses it into a valid SQL query.
+ The SQL Server Destination takes a `sdk.Record` and parses it into a valid SQL query.

### Configuration Options

@@ -37,3 +37,86 @@ The DB2 Destination takes a `sdk.Record` and parses it into a valid SQL query.
If a record contains a `sqlserver.table` property in its metadata it will be inserted in that table, otherwise it will fall back
to use the table configured in the connector. Thus, a Destination can support multiple tables in a single connector,
as long as the user has proper access to those tables.
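
The fallback rule above can be sketched in Go as follows. This is a minimal illustration, not the connector's actual code: `resolveTable` and `metadataTableKey` are hypothetical names, and treating an empty metadata value the same as a missing one is an assumption.

```go
package main

import "fmt"

// metadataTableKey is the metadata property named in the README.
const metadataTableKey = "sqlserver.table"

// resolveTable returns the table named in the record metadata, or the
// configured default when the property is missing or empty.
// Hypothetical helper: the empty-value handling is an assumption.
func resolveTable(metadata map[string]string, defaultTable string) string {
	if table, ok := metadata[metadataTableKey]; ok && table != "" {
		return table
	}
	return defaultTable
}

func main() {
	// A record that names its own table.
	fmt.Println(resolveTable(map[string]string{metadataTableKey: "orders"}, "users"))
	// A record without the property falls back to the configured table.
	fmt.Println(resolveTable(map[string]string{}, "users"))
}
```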

## Source

The source connects to the database using the provided connection and starts creating records for each table row
and each detected change.

### Configuration options

| Name | Description | Required | Example |
|-------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|------------------------------------------------|
| `connection` | Connection string for SQL Server. See the [go-mssqldb connection string formats](https://github.com/denisenkom/go-mssqldb#the-connection-string-can-be-specified-in-one-of-three-formats) for details. | **true** | sqlserver://sa:password@0.0.0.0?database=mydb |
| `table` | The name of the table in the database that the connector should read from. | **true** | users |
| `primaryKey` | The name of the column that records should use for their `Key` fields. | **true** | id |
| `orderingColumn` | The name of the column that the connector uses for ordering rows. Its values must be unique and sortable; otherwise the snapshot won't work correctly. | **true** | id |
| `column` | Comma-separated list of column names to include in each record's payload. If set, it must include the columns named by `primaryKey` and `orderingColumn`. Default: all columns. | false | id,name,age |
| `batchSize` | Number of rows read per batch. Default: 1000. | false | 100 |
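
The constraint on `column` can be checked with a few lines of Go. The sketch below is illustrative only: `validateColumns` is a hypothetical helper, and it compares names exactly, whereas SQL Server column names are often case-insensitive depending on collation.

```go
package main

import (
	"fmt"
	"strings"
)

// validateColumns checks the rule from the options table: a non-empty
// column list must contain both the primary key and the ordering column.
// Hypothetical helper; names are compared exactly (a simplification).
func validateColumns(columns, primaryKey, orderingColumn string) error {
	if strings.TrimSpace(columns) == "" {
		return nil // empty list: nothing to check
	}
	seen := make(map[string]bool)
	for _, c := range strings.Split(columns, ",") {
		seen[strings.TrimSpace(c)] = true
	}
	for _, required := range []string{primaryKey, orderingColumn} {
		if !seen[required] {
			return fmt.Errorf("column list must include %q", required)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateColumns("id,name,age", "id", "id"))
	fmt.Println(validateColumns("name,age", "id", "id"))
}
```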

### Snapshot
When the snapshot iterator first starts, it reads the maximum value of `orderingColumn` and saves it in the position.
It then reads, in batches, all rows from the table whose `orderingColumn` value is less than or equal to that maximum.


Values in the ordering column must be unique and sortable; otherwise the snapshot won't work correctly.
The iterator saves the last processed `orderingColumn` value in the `SnapshotLastProcessedVal` field of the position.
If the snapshot is interrupted, the iterator parses the position from the last record and resumes with rows where `{{orderingColumn}} > {{position.SnapshotLastProcessedVal}}`.
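
A batch query with these bounds might look like the output of the Go sketch below. The README does not show the connector's real SQL, so everything here is an assumption: `buildSnapshotQuery` is a hypothetical helper, the parameter names `@batchSize`, `@lastProcessedVal`, and `@maxValue` are invented for illustration, and the bracket quoting does not escape `]` inside identifiers.

```go
package main

import "fmt"

// buildSnapshotQuery sketches one T-SQL batch query for the snapshot.
// When resuming is true, rows at or below the saved position are skipped.
// Hypothetical helper; the connector's actual query may differ.
func buildSnapshotQuery(table, orderingColumn string, resuming bool) string {
	// Upper bound: only rows that existed when the snapshot started.
	where := fmt.Sprintf("[%s] <= @maxValue", orderingColumn)
	if resuming {
		// Lower bound: continue after the last processed value.
		where = fmt.Sprintf("[%s] > @lastProcessedVal AND %s", orderingColumn, where)
	}
	return fmt.Sprintf(
		"SELECT TOP (@batchSize) * FROM [%s] WHERE %s ORDER BY [%s]",
		table, where, orderingColumn,
	)
}

func main() {
	fmt.Println(buildSnapshotQuery("users", "id", false))
	fmt.Println(buildSnapshotQuery("users", "id", true))
}
```

Ordering by the same column that bounds the query is what makes the saved position a safe resume point, which is why the column's values must be unique.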


When all records are returned, the connector switches to the CDC iterator.

### Change Data Capture (CDC)

This connector implements CDC for SQL Server by adding a tracking table and triggers that populate it. The tracking
table has the same name as the target table, prefixed with `CONDUIT_TRACKING_`. The tracking table has all the
same columns as the target table plus three additional columns:

| name | description |
|---------------------------------|------------------------------------------------------|
| `CONDUIT_TRACKING_ID` | Autoincrement index for the position. |
| `CONDUIT_OPERATION_TYPE` | Operation type: `insert`, `update`, or `delete`. |
| `CONDUIT_TRACKING_CREATED_DATE` | Date when the event was added to the tracking table. |


Trigger names follow the pattern `CONDUIT_TRIGGER_{{operation_type}}_{{table}}`.
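
The two naming patterns can be expressed as small Go helpers. This is only a sketch: `trackingTableName` and `triggerName` are hypothetical names, and the operation type is inserted as given because the README does not specify its casing inside trigger names.

```go
package main

import "fmt"

// trackingTableName applies the CONDUIT_TRACKING_ prefix to a table name.
func trackingTableName(table string) string {
	return "CONDUIT_TRACKING_" + table
}

// triggerName fills in the CONDUIT_TRIGGER_{{operation_type}}_{{table}}
// pattern. The operation is used verbatim (casing is an assumption).
func triggerName(operation, table string) string {
	return fmt.Sprintf("CONDUIT_TRIGGER_%s_%s", operation, table)
}

func main() {
	fmt.Println(trackingTableName("CLIENTS"))
	for _, op := range []string{"insert", "update", "delete"} {
		fmt.Println(triggerName(op, "CLIENTS"))
	}
}
```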


Queries that retrieve change data from the tracking table are very similar to the snapshot iterator's queries, but use
`CONDUIT_TRACKING_ID` as the ordering column.

The CDC iterator periodically deletes rows from the tracking table once they have been successfully applied.
It collects `CONDUIT_TRACKING_ID` values in the `Ack` method into a batch and clears them from the tracking table every 5 seconds.
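
One way to picture this batching is the Go sketch below. It is not the connector's implementation: `ackBatch`, `add`, `drain`, and `buildCleanupQuery` are hypothetical names, and deleting with an `IN (...)` list is an assumed query shape.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// ackBatch collects acknowledged CONDUIT_TRACKING_ID values between cleanups.
type ackBatch struct {
	mu  sync.Mutex
	ids []int64
}

// add records one acknowledged ID; an Ack method would call this.
func (b *ackBatch) add(id int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.ids = append(b.ids, id)
}

// drain returns the collected IDs and resets the batch.
func (b *ackBatch) drain() []int64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	ids := b.ids
	b.ids = nil
	return ids
}

// buildCleanupQuery returns a DELETE for the given IDs, or "" when the
// batch is empty. The query shape is an assumption for illustration.
func buildCleanupQuery(trackingTable string, ids []int64) string {
	if len(ids) == 0 {
		return ""
	}
	parts := make([]string, len(ids))
	for i, id := range ids {
		parts[i] = fmt.Sprint(id)
	}
	return fmt.Sprintf("DELETE FROM [%s] WHERE [CONDUIT_TRACKING_ID] IN (%s)",
		trackingTable, strings.Join(parts, ", "))
}

func main() {
	var batch ackBatch
	batch.add(1)
	batch.add(2)
	fmt.Println(buildCleanupQuery("CONDUIT_TRACKING_CLIENTS", batch.drain()))
}
```

In a running iterator, a `time.Ticker` with a 5-second interval could call `drain` and execute the resulting query; the mutex keeps `Ack` calls and the cleanup from racing on the slice.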

The iterator saves the `CONDUIT_TRACKING_ID` of the last successfully recorded row in the position.

If the connector stops, it parses the position from the last record and resumes
with rows where `{{CONDUIT_TRACKING_ID}} > {{position.CDCLastID}}`.


### CDC FAQ

#### Is it possible to add, remove, or rename a column in the table?

Yes. Stop the pipeline and apply the same change to the Conduit tracking table.
For example:
```sql
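-- Add the column to the source table (T-SQL uses ADD without the COLUMN keyword).
ALTER TABLE CLIENTS
ADD address VARCHAR(18);

-- Apply the same change to the tracking table.
ALTER TABLE CONDUIT_TRACKING_CLIENTS
ADD address VARCHAR(18);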
```

#### I accidentally removed the tracking table.

Restart the pipeline; the connector will recreate the tracking table.

#### I accidentally removed the table.

You have to stop the pipeline, remove the conduit tracking table, and then start the pipeline.

#### Is it possible to change the table name?

Yes. Stop the pipeline, change the value of `table` in the Source configuration,
and rename the tracking table to match the pattern `CONDUIT_TRACKING_{{TABLE}}`.
