Commit 78e8d51 (1 parent: 5570311). Showing 1 changed file with 41 additions and 0 deletions.
@@ -0,0 +1,41 @@
---
title: Deduplication
---

The `Deduplication` feature in Fluvio drops duplicate records based on their keys.

Deduplication can be applied directly to a topic, or it can be used with connectors. Duplicate records are dropped within a configured window.

This window is defined by the parameters passed to the filter SmartModule.

## Example topic config

Example configuration for a topic:

%copy%
```yaml
# topic.yaml
version: 0.1.0
meta:
  name: topic-with-dedup
deduplication:
  bounds:
    count: 5 # remember at least 5 last records
    age: 5s # remember records for at least 5 seconds
  filter:
    transform:
      uses: infinyon-labs/dedup-filter@0.0.1
```

A topic can be created using this config file like so:

%copy first-line%
```bash
$ fluvio topic create -c topic.yaml
```

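To make the `bounds` semantics more concrete, here is a minimal Python sketch of a key-based deduplication window. It is a conceptual model only, not the code of the `dedup-filter` SmartModule. The `DedupWindow` class, its `accept` method, and the rule that a key is forgotten only once both minimums (`count` and `age`) are satisfied are assumptions made for illustration.

```python
from collections import OrderedDict


class DedupWindow:
    """Hypothetical model of a key-based dedup window (not Fluvio's code)."""

    def __init__(self, count, age_seconds=None):
        self.count = count              # minimum number of keys to remember
        self.age_seconds = age_seconds  # minimum time to remember a key (optional)
        self._seen = OrderedDict()      # key -> time first seen, oldest first

    def _evict(self, now):
        # Assumption: the oldest key is forgotten only when the window holds
        # more than `count` keys AND that key has outlived `age_seconds`.
        while len(self._seen) > self.count:
            oldest_key, seen_at = next(iter(self._seen.items()))
            if self.age_seconds is not None and now - seen_at < self.age_seconds:
                break  # still too young to forget
            del self._seen[oldest_key]

    def accept(self, key, now):
        """Return True if the record passes, False if it is a duplicate."""
        self._evict(now)
        if key in self._seen:
            return False
        self._seen[key] = now
        return True
```

In this model, a record whose key is still inside the window is dropped, while a key that has fallen out of the window is accepted again. This matches the caveat that records older than the window are not guaranteed to be remembered.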
## Parameters

| Parameter | Default | Type | Optional | Description |
|:----------|:--------|:--------|:---------|:------------|
| count | - | Integer | false | Minimum number of records the filter will remember. Records that arrived more than `count` records ago are not guaranteed to be remembered. |
| age | - | Integer | true | Minimum amount of time the filter will remember a record for. It is specified in a format such as `15days 2min 2s`, `2min 5s`, or `15ms`. |
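
As a rough illustration of the `age` format, the following Python sketch converts such a string to milliseconds. The function name `parse_age_ms` is hypothetical, and the unit table covers only the units that appear in the examples above (`ms`, `s`, `min`, `days`); the parser Fluvio actually uses may accept more units or spellings.

```python
import re

# Units taken from the examples in this document; anything else is rejected.
_UNIT_MS = {"ms": 1, "s": 1_000, "min": 60_000, "days": 86_400_000}
_TOKEN = re.compile(r"(\d+)([a-z]+)")


def parse_age_ms(text):
    """Convert a string such as '2min 5s' to a number of milliseconds."""
    parts = text.split()
    if not parts:
        raise ValueError("empty duration")
    total = 0
    for part in parts:
        match = _TOKEN.fullmatch(part)
        if match is None or match.group(2) not in _UNIT_MS:
            raise ValueError(f"unrecognized duration component: {part!r}")
        total += int(match.group(1)) * _UNIT_MS[match.group(2)]
    return total
```

For example, `parse_age_ms("2min 5s")` adds 120000 ms and 5000 ms, and a component with an unknown unit raises `ValueError` instead of being silently ignored.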