[DOC] Add ingest processors documentation (#4299)
Created new documentation to close content gaps

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
vagimeli committed Dec 20, 2023
1 parent a100d92 commit a29159c
Showing 17 changed files with 1,629 additions and 268 deletions.
100 changes: 100 additions & 0 deletions _api-reference/ingest-apis/create-ingest.md
@@ -0,0 +1,100 @@
---
layout: default
title: Create pipeline
parent: Ingest pipelines
grand_parent: Ingest APIs
nav_order: 10
redirect_from:
- /opensearch/rest-api/ingest-apis/create-update-ingest/
---

# Create pipeline

Use the create pipeline API operation to create or update pipelines in OpenSearch. Each pipeline must define at least one processor that specifies how to change the documents.

## Path and HTTP method

Replace `<pipeline-id>` with your pipeline ID:

```json
PUT _ingest/pipeline/<pipeline-id>
```
#### Example request

The following example request creates an ingest pipeline with two `set` processors and an `uppercase` processor. The first `set` processor sets the `grad_year` field to `2023`, and the second `set` processor sets the `graduated` field to `true`. The `uppercase` processor converts the value of the `name` field to uppercase.

```json
PUT _ingest/pipeline/my-pipeline
{
  "description": "This pipeline processes student data",
  "processors": [
    {
      "set": {
        "description": "Sets the graduation year to 2023",
        "field": "grad_year",
        "value": 2023
      }
    },
    {
      "set": {
        "description": "Sets graduated to true",
        "field": "graduated",
        "value": true
      }
    },
    {
      "uppercase": {
        "field": "name"
      }
    }
  ]
}
```
{% include copy-curl.html %}
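
To make the effect of the example request above concrete, the following Python sketch mimics what the three processors would do to a single document. It is a local illustration only and does not call OpenSearch; the helper names (`set_field`, `uppercase`, `run_pipeline`) and the sample document are assumptions made for this sketch.

```python
from typing import Any, Callable

Doc = dict[str, Any]
Processor = Callable[[Doc], Doc]


def set_field(field: str, value: Any) -> Processor:
    """Hypothetical stand-in for the `set` processor: assign a fixed value."""
    def apply(doc: Doc) -> Doc:
        return {**doc, field: value}
    return apply


def uppercase(field: str) -> Processor:
    """Hypothetical stand-in for the `uppercase` processor.

    This sketch raises KeyError if the field is missing.
    """
    def apply(doc: Doc) -> Doc:
        return {**doc, field: str(doc[field]).upper()}
    return apply


# Same order as the processors array in the example request.
PROCESSORS: list[Processor] = [
    set_field("grad_year", 2023),
    set_field("graduated", True),
    uppercase("name"),
]


def run_pipeline(doc: Doc, processors: list[Processor]) -> Doc:
    """Apply each processor in turn, passing the result to the next one."""
    for processor in processors:
        doc = processor(doc)
    return doc


result = run_pipeline({"name": "John Doe"}, PROCESSORS)
print(result)
```

Because the processors run in order, each one sees the document as the previous processor left it.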

To learn more about error handling, see [Handling pipeline failures]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/pipeline-failures/).

## Request body fields

The following table lists the request body fields used to create or update a pipeline.

Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`processors` | Required | Array of processor objects | An array of processors, each of which transforms documents. Processors are run sequentially in the order specified.
`description` | Optional | String | A description of your ingest pipeline.

## Path parameters

Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`pipeline-id` | Required | String | The unique identifier, or pipeline ID, assigned to the ingest pipeline.

## Query parameters

Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`cluster_manager_timeout` | Optional | Time | Period to wait for a connection to the cluster manager node. Defaults to 30 seconds.
`timeout` | Optional | Time | Period to wait for a response. Defaults to 30 seconds.
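
As a rough illustration of how the path and query parameters combine into a request URL, the following Python sketch builds one with the standard library. The host `https://localhost:9200`, the `60s` values, and the `pipeline_url` helper are assumptions chosen for this example, not values taken from the documentation.

```python
from urllib.parse import quote, urlencode


def pipeline_url(base_url: str, pipeline_id: str, **query: str) -> str:
    """Build the create pipeline URL; query parameters are optional."""
    path = f"{base_url.rstrip('/')}/_ingest/pipeline/{quote(pipeline_id, safe='')}"
    return f"{path}?{urlencode(query)}" if query else path


# Hypothetical host and timeout values, for illustration only.
url = pipeline_url(
    "https://localhost:9200",
    "my-pipeline",
    cluster_manager_timeout="60s",
    timeout="60s",
)
print(url)
```

The resulting URL is the target of the `PUT` request, with the pipeline definition sent as the JSON request body.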

## Template snippets

Some processor parameters support [Mustache](https://mustache.github.io/) template snippets. To get the value of a field, surround the field name in three curly braces, for example, `{% raw %}{{{field-name}}}{% endraw %}`.

#### Example: `set` ingest processor using Mustache template snippet

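The following example uses template snippets for both parameters: the name of the field to set is taken from the value of the `role` field (`{% raw %}{{{role}}}{% endraw %}`), and the value assigned to it is taken from the `tenure` field (`{% raw %}{{{tenure}}}{% endraw %}`):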

```json
PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "set": {
        "field": "{% raw %}{{{role}}}{% endraw %}",
        "value": "{% raw %}{{{tenure}}}{% endraw %}"
      }
    }
  ]
}
```
{% include copy-curl.html %}
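
The substitution in the template example above can be approximated locally. The following Python sketch is a simplified stand-in for Mustache rendering, not the engine OpenSearch uses: it only handles plain triple-brace placeholders, and the `render` and `apply_set` helpers and the sample document are assumptions for illustration.

```python
import re
from typing import Any

# Matches a triple-brace placeholder and captures the field name inside it.
PLACEHOLDER = re.compile(r"\{\{\{\s*([\w.-]+)\s*\}\}\}")


def render(template: str, doc: dict[str, Any]) -> str:
    """Replace each placeholder with the named field's value (empty if absent)."""
    return PLACEHOLDER.sub(lambda m: str(doc.get(m.group(1), "")), template)


def apply_set(doc: dict[str, Any], field_template: str, value_template: str) -> dict[str, Any]:
    """Hypothetical stand-in for a `set` processor with templated parameters."""
    field = render(field_template, doc)
    value = render(value_template, doc)
    return {**doc, field: value}


# Sample document invented for this sketch.
doc = {"role": "teacher", "tenure": "10 years"}
updated = apply_set(doc, "{{{role}}}", "{{{tenure}}}")
print(updated)
```

In this sketch, the rendered field name is `teacher` and the rendered value is `10 years`, so the output document gains a `teacher` field holding that value.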
79 changes: 0 additions & 79 deletions _api-reference/ingest-apis/create-update-ingest.md

This file was deleted.

43 changes: 13 additions & 30 deletions _api-reference/ingest-apis/delete-ingest.md
@@ -1,44 +1,27 @@
---
layout: default
title: Delete a pipeline
parent: Ingest APIs
nav_order: 14
title: Delete pipeline
parent: Ingest pipelines
grand_parent: Ingest APIs
nav_order: 13
redirect_from:
- /opensearch/rest-api/ingest-apis/delete-ingest/
---

# Delete a pipeline
# Delete pipeline

If you no longer want to use an ingest pipeline, use the delete ingest pipeline API operation.
Use the following request to delete a pipeline.

## Example
To delete a specific pipeline, pass the pipeline ID as a parameter:

```
DELETE _ingest/pipeline/12345
```json
DELETE /_ingest/pipeline/<pipeline-id>
```
{% include copy-curl.html %}

## Path and HTTP methods

Delete an ingest pipeline based on that pipeline's ID.

```
DELETE _ingest/pipeline/
```

## URL parameters

All URL parameters are optional.

Parameter | Type | Description
:--- | :--- | :---
master_timeout | time | How long to wait for a connection to the master node.
timeout | time | How long to wait for the request to return.

## Response
To delete all pipelines in a cluster, use the wildcard character (`*`):

```json
{
"acknowledged" : true
}
```
DELETE /_ingest/pipeline/*
```
{% include copy-curl.html %}
71 changes: 37 additions & 34 deletions _api-reference/ingest-apis/get-ingest.md
@@ -1,59 +1,62 @@
---
layout: default
title: Get ingest pipeline
parent: Ingest APIs
nav_order: 10
title: Get pipeline
parent: Ingest pipelines
grand_parent: Ingest APIs
nav_order: 12
redirect_from:
- /opensearch/rest-api/ingest-apis/get-ingest/
---

## Get ingest pipeline
# Get pipeline

After you create a pipeline, use the get ingest pipeline API operation to return all the information about a specific ingest pipeline.
Use the get ingest pipeline API operation to retrieve all the information about the pipeline.

## Example
## Retrieving information about all pipelines

```
GET _ingest/pipeline/12345
The following example request returns information about all ingest pipelines:

```json
GET _ingest/pipeline/
```
{% include copy-curl.html %}

## Path and HTTP methods
## Retrieving information about a specific pipeline

Return all ingest pipelines.
The following example request returns information about a specific pipeline, which for this example is `my-pipeline`:

```json
GET _ingest/pipeline/my-pipeline
```
GET _ingest/pipeline
```

Returns a single ingest pipeline based on the pipeline's ID.

```
GET _ingest/pipeline/{id}
```

## URL parameters

All parameters are optional.

Parameter | Type | Description
:--- | :--- | :---
master_timeout | time | How long to wait for a connection to the master node.
{% include copy-curl.html %}

## Response
The response contains the pipeline information:

```json
{
"pipeline-id" : {
"description" : "A description for your pipeline",
"processors" : [
"my-pipeline": {
"description": "This pipeline processes student data",
"processors": [
{
"set" : {
"field" : "field-name",
"value" : "value"
"set": {
"description": "Sets the graduation year to 2023",
"field": "grad_year",
"value": 2023
}
},
{
"set": {
"description": "Sets graduated to true",
"field": "graduated",
"value": true
}
},
{
"uppercase": {
"field": "name"
}
}
]
}
}
```
```
11 changes: 9 additions & 2 deletions _api-reference/ingest-apis/index.md
@@ -9,6 +9,13 @@ redirect_from:

# Ingest APIs

Before you index your data, OpenSearch's ingest APIs help transform your data by creating and managing ingest pipelines. Pipelines consist of **processors**, customizable tasks that run in the order they appear in the request body. The transformed data appears in your index after each of the processor completes.
Ingest APIs are a valuable tool for loading data into a system. Ingest APIs work together with [ingest pipelines]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-pipelines/) and [ingest processors]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/ingest-processors/) to process or transform data from a variety of sources and in a variety of formats.

Ingest pipelines in OpenSearch can only be managed using ingest API operations. When using ingest in production environments, your cluster should contain at least one node with the node roles permission set to `ingest`. For more information on setting up node roles within a cluster, see [Cluster Formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/).
## Ingest pipeline APIs

Simplify, secure, and scale your OpenSearch data ingestion with the following APIs:

- [Create pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/create-ingest/): Use this API to create or update a pipeline configuration.
- [Get pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/get-ingest/): Use this API to retrieve a pipeline configuration.
- [Simulate pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/simulate-ingest/): Use this API to test a pipeline configuration.
- [Delete pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/delete-ingest/): Use this API to delete a pipeline configuration.