Preprocess data with Ingest Pipeline
Advanced Managed Search (OpenSearch) provides the Ingest Pipeline feature, which lets you process, filter, transform, and enrich data before documents are stored in an index.
This page explains the concept and purpose of Ingest Pipeline, as well as its basic usage.
What is Ingest Pipeline?
Ingest Pipeline is a preprocessing pipeline that runs before documents are stored in an OpenSearch index.
A single pipeline can contain multiple Processors, and each Processor performs a specific data processing task.
With processors, you can perform tasks such as the following.
- Convert and normalize field values
- Convert string case
- Add or delete fields, and set default values
- Organize and enrich data formats
Ingest Pipeline plays an important role in improving data quality and increasing search accuracy.
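As a mental model only (illustrative Python, not OpenSearch internals; the processor functions here are hypothetical), a pipeline can be pictured as an ordered list of processors, each applied in turn to an incoming document before it is indexed:

```python
# Mental model only: a pipeline is an ordered list of processor
# functions applied to each incoming document before indexing.
# These processors are hypothetical stand-ins for real ones.

def trim_processor(doc):
    # normalize a field value (here: strip surrounding whitespace)
    doc["name"] = doc["name"].strip()
    return doc

def lowercase_processor(doc):
    # convert string case
    doc["email"] = doc["email"].lower()
    return doc

def default_processor(doc):
    # add a field with a default value if it is missing
    doc.setdefault("status", "active")
    return doc

pipeline = [trim_processor, lowercase_processor, default_processor]

doc = {"name": "  Jane Doe ", "email": "Jane@Example.COM"}
for processor in pipeline:
    doc = processor(doc)
print(doc)
# {'name': 'Jane Doe', 'email': 'jane@example.com', 'status': 'active'}
```

The order of processors matters: each one sees the document as modified by the processors before it.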
Requirements for using Ingest Pipeline
To use Ingest Pipeline, the following conditions must be met.
- At least one node in the cluster must have the ingest node role.
- A security role with the cluster_manage_pipelines permission, which allows managing Ingest Pipelines, is required.
Create Ingest Pipeline
The following example shows how to create an Ingest Pipeline that processes student data.
PUT _ingest/pipeline/my-pipeline
{
"description": "This pipeline processes student data",
"processors": [
{
"set": {
"description": "Sets the graduation year to 2023",
"field": "grad_year",
"value": 2023
}
},
{
"set": {
"description": "Sets graduated to true",
"field": "graduated",
"value": true
}
},
{
"uppercase": {
"field": "name"
}
}
]
}
This pipeline performs the following processing.
- Set the grad_year value to 2023
- Set the graduated value to true
- Convert the name field to uppercase
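For illustration, the effect of these three processors can be emulated in plain Python. This is a sketch only; OpenSearch executes the real set and uppercase processors server-side, and the function names below are hypothetical:

```python
# Hypothetical emulation of what my-pipeline does to a document's
# _source. OpenSearch runs the real processors server-side; this
# only illustrates the transformations.

def set_field(doc, field, value):
    # "set" processor: writes a value into the field, overwriting any existing value
    doc[field] = value
    return doc

def uppercase_field(doc, field):
    # "uppercase" processor: converts a string field to uppercase
    doc[field] = doc[field].upper()
    return doc

def run_my_pipeline(doc):
    # Processors run in the order they are declared in the pipeline
    doc = set_field(doc, "grad_year", 2023)
    doc = set_field(doc, "graduated", True)
    doc = uppercase_field(doc, "name")
    return doc

source = {"name": "open-search user", "grad_year": 2025, "graduated": False}
print(run_my_pipeline(source))
# {'name': 'OPEN-SEARCH USER', 'grad_year': 2023, 'graduated': True}
```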
Simulate Ingest Pipeline
Before applying the pipeline to actual data, you can preview the result by using the _simulate API.
POST /_ingest/pipeline/my-pipeline/_simulate
{
"docs": [
{
"_index": "my-index",
"_id": "1",
"_source": {
"grad_year": 2024,
"graduated": false,
"name": "John Doe"
}
},
{
"_index": "my-index",
"_id": "2",
"_source": {
"grad_year": 2025,
"graduated": false,
"name": "Jane Doe"
}
}
]
}
The simulated response shows how each document changes after the pipeline is applied, without indexing anything.
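Conceptually, _simulate applies the pipeline to each _source in the docs array and returns the transformed documents. A rough Python sketch of that behavior (an illustration only, not the actual implementation):

```python
# Hypothetical sketch of what _simulate does conceptually: apply the
# pipeline to each document's _source without indexing anything.

def apply_pipeline(source):
    # The three processors of my-pipeline, in declaration order
    source = dict(source, grad_year=2023, graduated=True)
    source["name"] = source["name"].upper()
    return source

docs = [
    {"_index": "my-index", "_id": "1",
     "_source": {"grad_year": 2024, "graduated": False, "name": "John Doe"}},
    {"_index": "my-index", "_id": "2",
     "_source": {"grad_year": 2025, "graduated": False, "name": "Jane Doe"}},
]

results = [dict(doc, _source=apply_pipeline(doc["_source"])) for doc in docs]
for r in results:
    print(r["_id"], r["_source"])
# 1 {'grad_year': 2023, 'graduated': True, 'name': 'JOHN DOE'}
# 2 {'grad_year': 2023, 'graduated': True, 'name': 'JANE DOE'}
```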
Set default pipeline for index
If you set an Ingest Pipeline as the default pipeline for a specific index, preprocessing is automatically applied to all documents that enter that index.
PUT my-index/_settings
{
"settings": {
"index.default_pipeline": "my-pipeline"
}
}
If you want to skip the default pipeline for a specific indexing request, append the ?pipeline=_none query parameter to the request URL.
Index document and check result
Now, when you index a document, the Ingest Pipeline is applied automatically.
Index request
POST my-index/_doc/1
{
"name": "open-search user",
"grad_year": 2025,
"graduated": false
}
Check result
GET my-index/_search
Final data response
"hits": {
"hits": [
{
"_source": {
"graduated": true, // Fixed to true by the pipeline
"name": "OPEN-SEARCH USER", // Converted to uppercase by the pipeline
"grad_year": 2023 // Fixed to 2023 by the pipeline
}
}
]
}
You can confirm that the Ingest Pipeline has been applied correctly to the indexed document.
Use cases for Ingest Pipeline
Ingest Pipeline is useful in scenarios such as the following.
- Normalize log fields (unify case and format)
- Standardize user input data
- Generate search keywords and auxiliary fields
- Clean data and manage data quality
For the list of Processors available in Ingest Pipeline and their detailed features, see the OpenSearch official documentation.