Preprocess data with Ingest Pipeline
Advanced Managed Search (OpenSearch) provides the Ingest Pipeline feature, which lets you process, filter, transform, and enrich data before documents are stored in an index.
This page explains the concept and purpose of Ingest Pipeline, as well as its basic usage.
What is Ingest Pipeline?
Ingest Pipeline is a preprocessing pipeline that runs before documents are stored in an OpenSearch index.
A single pipeline can contain multiple Processors, and each Processor performs a specific data processing task.
With processors, you can perform tasks such as the following.
- Convert and normalize field values
- Convert string case
- Add or delete fields, and set default values
- Organize and enrich data formats
Ingest Pipeline plays an important role in improving data quality and increasing search accuracy.
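As a mental model only (illustrative Python, not OpenSearch internals; the processor functions here are hypothetical), a pipeline can be pictured as an ordered list of processors, each applied in turn to an incoming document before it is indexed:

```python
# Mental model only: a pipeline is an ordered list of processor
# functions applied to each incoming document before indexing.
# These processors are hypothetical stand-ins for real ones.

def trim_processor(doc):
    # normalize a field value (here: strip surrounding whitespace)
    doc["name"] = doc["name"].strip()
    return doc

def lowercase_processor(doc):
    # convert string case
    doc["email"] = doc["email"].lower()
    return doc

def default_processor(doc):
    # add a field with a default value if it is missing
    doc.setdefault("status", "active")
    return doc

pipeline = [trim_processor, lowercase_processor, default_processor]

doc = {"name": "  Jane Doe ", "email": "Jane@Example.COM"}
for processor in pipeline:
    doc = processor(doc)
print(doc)
# {'name': 'Jane Doe', 'email': 'jane@example.com', 'status': 'active'}
```

The order of processors matters: each one sees the document as modified by the processors before it.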
Requirements for using Ingest Pipeline
To use Ingest Pipeline, the following conditions must be met.
- At least one node in the cluster must have the ingest node role.
- A security role with the cluster_manage_pipelines permission, which allows managing Ingest Pipelines, is required.
Create Ingest Pipeline
The following example shows how to create an Ingest Pipeline that processes student data.
PUT _ingest/pipeline/my-pipeline
{
"description": "This pipeline processes student data",
"processors": [
{
"set": {
"description": "Sets the graduation year to 2023",
"field": "grad_year",
"value": 2023
}
},
{
"set": {
"description": "Sets graduated to true",
"field": "graduated",
"value": true
}
},
{
"uppercase": {
"field": "name"
}
}
]
}
This pipeline performs the following processing.
- Set the grad_year value to 2023
- Set the graduated value to true
- Convert the name field to uppercase
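For illustration, the effect of these three processors can be emulated in plain Python. This is a sketch only; OpenSearch executes the real set and uppercase processors server-side, and the function names below are hypothetical:

```python
# Hypothetical emulation of what my-pipeline does to a document's
# _source. OpenSearch runs the real processors server-side; this
# only illustrates the transformations.

def set_field(doc, field, value):
    # "set" processor: writes a value into the field, overwriting any existing value
    doc[field] = value
    return doc

def uppercase_field(doc, field):
    # "uppercase" processor: converts a string field to uppercase
    doc[field] = doc[field].upper()
    return doc

def run_my_pipeline(doc):
    # Processors run in the order they are declared in the pipeline
    doc = set_field(doc, "grad_year", 2023)
    doc = set_field(doc, "graduated", True)
    doc = uppercase_field(doc, "name")
    return doc

source = {"name": "open-search user", "grad_year": 2025, "graduated": False}
print(run_my_pipeline(source))
# {'name': 'OPEN-SEARCH USER', 'grad_year': 2023, 'graduated': True}
```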
Simulate Ingest Pipeline
Before applying the pipeline to actual data, you can preview the result by using the _simulate API.
POST /_ingest/pipeline/my-pipeline/_simulate
{
"docs": [
{
"_index": "my-index",
"_id": "1",
"_source": {
"grad_year": 2024,
"graduated": false,
"name": "John Doe"
}
},
{
"_index": "my-index",
"_id": "2",
"_source": {
"grad_year": 2025,
"graduated": false,
"name": "Jane Doe"
}
}
]
}
The simulated response shows how each document changes after the pipeline is applied, without indexing anything.
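Conceptually, _simulate applies the pipeline to each _source in the docs array and returns the transformed documents. A rough Python sketch of that behavior (an illustration only, not the actual implementation):

```python
# Hypothetical sketch of what _simulate does conceptually: apply the
# pipeline to each document's _source without indexing anything.

def apply_pipeline(source):
    # The three processors of my-pipeline, in declaration order
    source = dict(source, grad_year=2023, graduated=True)
    source["name"] = source["name"].upper()
    return source

docs = [
    {"_index": "my-index", "_id": "1",
     "_source": {"grad_year": 2024, "graduated": False, "name": "John Doe"}},
    {"_index": "my-index", "_id": "2",
     "_source": {"grad_year": 2025, "graduated": False, "name": "Jane Doe"}},
]

results = [dict(doc, _source=apply_pipeline(doc["_source"])) for doc in docs]
for r in results:
    print(r["_id"], r["_source"])
# 1 {'grad_year': 2023, 'graduated': True, 'name': 'JOHN DOE'}
# 2 {'grad_year': 2023, 'graduated': True, 'name': 'JANE DOE'}
```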
Set default pipeline for index
If you set an Ingest Pipeline as the default pipeline for a specific index, preprocessing is automatically applied to all documents that enter that index.
PUT my-index/_settings
{
"settings": {
"index.default_pipeline": "my-pipeline"
}
}
If you want to skip the default pipeline for a specific indexing request, append the ?pipeline=_none query parameter to the request URL.
Index document and check result
Now, when you index a document, the Ingest Pipeline is applied automatically.
Index request
POST my-index/_doc/1
{
"name": "open-search user",
"grad_year": 2025,
"graduated": false
}
Check result
GET my-index/_search
Final data response
"hits": {
"hits": [
{
"_source": {
"graduated": true, // Fixed to true by the pipeline
"name": "OPEN-SEARCH USER", // Converted to uppercase by the pipeline
"grad_year": 2023 // Fixed to 2023 by the pipeline
}
}
]
}
You can confirm that the Ingest Pipeline has been applied correctly to the indexed document.
Use cases for Ingest Pipeline
Ingest Pipeline is useful in scenarios such as the following.
- Normalize log fields (unify case and format)
- Standardize user input data
- Generate search keywords and auxiliary fields
- Clean data and manage data quality
For the list of Processors available in Ingest Pipeline and their detailed features, see the OpenSearch official documentation.