Update documentation for new AD settings (#4835)
* Update documentation for new AD settings

Signed-off-by: Jonah Calvo <caljonah@amazon.com>

* update wording for verbose

Signed-off-by: Jonah Calvo <caljonah@amazon.com>

* Update _data-prepper/pipelines/configuration/processors/anomaly-detector.md

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: Jonah Calvo <jonah.calvo@gmail.com>

* Update _data-prepper/pipelines/configuration/processors/anomaly-detector.md

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: Jonah Calvo <jonah.calvo@gmail.com>

* Update _data-prepper/pipelines/configuration/processors/anomaly-detector.md

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: Jonah Calvo <jonah.calvo@gmail.com>

* Remove 'few' from description

Signed-off-by: Jonah Calvo <caljonah@amazon.com>

---------

Signed-off-by: Jonah Calvo <caljonah@amazon.com>
Signed-off-by: Jonah Calvo <jonah.calvo@gmail.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
JonahCalvo and vagimeli authored Sep 21, 2023
1 parent 2eb81a3 commit 7eceb2b
Showing 1 changed file with 5 additions and 1 deletion.
@@ -18,6 +18,10 @@ You can configure the anomaly detector processor by specifying a key and the opt
| Option | Required | Description |
| :--- | :--- | :--- |
| `keys` | Yes | An unordered `List<String>` of keys whose values are used as input to the ML algorithm to detect anomalies. At least one key is required. |
| `mode` | Yes | The ML algorithm (or model) used to detect anomalies. You must provide a mode. See [random_cut_forest mode](#random_cut_forest-mode). |
| `identification_keys` | No | If provided, anomalies are detected separately within each unique value of these keys. For example, if you provide the `ip` field, anomalies are detected separately for each unique IP address. |
| `cardinality_limit` | No | When `identification_keys` is set, a separate ML model is created for each unique value of those keys. Because every model consumes memory, high-cardinality keys can lead to large memory usage, so this option limits the number of models that can be created. Default is `5000`. |
| `verbose` | No | By default, RCF tries to learn from the data and reduce the number of anomalies it reports. For example, if latency is consistently between 50 and 100 and then suddenly jumps to around 1000, only the first one or two data points after the transition are detected (unless there are other spikes or anomalies). Similarly, for repeated spikes to the same level, RCF will likely suppress many of the spikes after the initial ones. Setting `verbose` to `true` causes RCF to consistently report these repeated cases, which can be useful for detecting anomalous behavior that persists over an extended period of time. Default is `false`. |
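The following pipeline is a sketch of how the new options might be combined. It assumes the `ad-pipeline` name used elsewhere in this documentation, an `http` source and `stdout` sink chosen only for illustration, and that `identification_keys` accepts a list in the same form as `keys`. The `cardinality_limit` value is an arbitrary example, not a recommendation.

```yaml
ad-pipeline:
  source:
    http:                          # illustrative source; any event source works
  processor:
    - anomaly_detector:
        keys: ["latency"]          # numeric field(s) fed to the RCF model
        identification_keys: ["ip"] # one model per unique IP address (assumed list form)
        cardinality_limit: 1000    # cap on the number of per-IP models (example value)
        verbose: true              # keep reporting repeated or sustained anomalies
        mode:
          random_cut_forest:
  sink:
    - stdout:                      # illustrative sink for inspecting results
```

With `verbose: true`, a sustained latency jump for a single IP address would keep being reported rather than being suppressed after the first one or two events.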


### Keys

@@ -69,4 +73,4 @@ ad-pipeline:
When you run the anomaly detector processor, the processor extracts the value of the `latency` key and then passes it through the RCF ML algorithm. You can configure any key whose values are integers or real numbers. In the following example event, either `bytes` or `latency` can be configured as the key for an anomaly detector.

`{"ip":"1.2.3.4", "bytes":234234, "latency":0.2}`
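For an event like the one above, switching detection from `latency` to `bytes` should only require changing the `keys` list. The fragment below is a minimal sketch of the processor section alone, assuming the rest of the pipeline matches the `ad-pipeline` example. The non-numeric `ip` field is not a valid key, but it could be used with `identification_keys` instead.

```yaml
processor:
  - anomaly_detector:
      keys: ["bytes"]   # numeric field to analyze instead of latency
      mode:
        random_cut_forest:
```

Because `keys` is a list, listing both numeric fields (for example, `keys: ["bytes", "latency"]`) would pass both values to the model as input.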
