Monitor Kafka consumer lag using Prometheus
This tutorial demonstrates how to collect Kafka consumer lag metrics with Kafka Exporter and Prometheus, and how to trigger alerts through Alert Center when the lag exceeds a defined threshold.
- Estimated time: 60 minutes
- Recommended OS: macOS, Ubuntu
- Prerequisites:
About this scenario
In this scenario, Kafka Exporter exposes consumer lag metrics, a Prometheus Agent forwards them to a Managed Prometheus workspace, and Alert Center sends an alert when the lag exceeds a set threshold.
This scenario includes:
- Installing Kafka Exporter and Prometheus Agent
- Collecting and viewing Kafka lag metrics
- Setting threshold-based alerts with Alert Center
- Kafka Lag: Kafka lag is the number of messages that a consumer has not yet processed. It quantifies how far behind a consumer group is for a given topic, which helps identify system bottlenecks or failures. (Lag = Latest offset in the Kafka partition - Offset committed by the consumer group)
- Kafka Exporter: Kafka Exporter collects metrics (including lag) from Kafka and exposes them in a Prometheus-compatible format.
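As a rough illustration of the lag formula above, the following shell sketch computes the lag for a single partition and then sums it across partitions. The function names (partition_lag, group_lag) and all offsets are hypothetical sample values, not output from a real cluster.

```shell
# Minimal sketch with made-up offsets; function names are illustrative only.

# Lag for one partition: latest offset minus the offset committed by the group.
# $1 = latest offset, $2 = committed offset
partition_lag() {
  echo $(( $1 - $2 ))
}

# Total lag for a group: sum of the per-partition lag.
# Reads "partition latest_offset committed_offset" lines from stdin.
group_lag() {
  awk '{ total += $2 - $3 } END { print total + 0 }'
}

partition_lag 120 100
printf '%s\n' '0 120 100' '1 80 80' '2 45 40' | group_lag
```

With these sample values, the first call prints 20 and the second prints 25 (20 + 0 + 5).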
KakaoCloud's Advanced Managed Prometheus cannot access Kafka clusters directly. Therefore, you need to install a Prometheus Agent on the VM running Kafka (or in the same network) to collect metrics and forward them to the Managed Prometheus workspace.
Before you start
Before starting this tutorial, please follow the steps in Message processing through Kafka to set up a working Kafka producer-consumer environment.
Getting started
Step 1. Create consumer group
The kafka_consumergroup_lag metric collected by Kafka Exporter is measured per consumer group, based on how far each group has consumed messages from each partition.
Run the following command to create a consumer group in Kafka:
# Move to Kafka directory
cd ~/kafka
# Create consumer group
bin/kafka-console-consumer.sh \
--bootstrap-server ${BOOTSTRAP_SERVER} \
--topic ${TOPIC_NAME} \
--group ${GROUP_NAME} \
--from-beginning
Environment variable | Description |
---|---|
BOOTSTRAP_SERVER | Kafka cluster bootstrap server from KakaoCloud Console |
TOPIC_NAME | Pre-created Kafka topic name |
GROUP_NAME | Consumer group name to use (e.g. lag-group) |
Step 2. Install Kafka Exporter
- Install and run kafka_exporter to expose Kafka metrics for Prometheus to scrape.
# Move to install directory
cd ~/Downloads
# Download Kafka Exporter
wget https://github.com/danielqsj/kafka_exporter/releases/download/v1.9.0/kafka_exporter-1.9.0.linux-amd64.tar.gz
tar -xvf kafka_exporter-1.9.0.linux-amd64.tar.gz
cd kafka_exporter-1.9.0.linux-amd64
# Start Kafka Exporter
./kafka_exporter \
--kafka.server=${BROKER1} \
--kafka.server=${BROKER2} \
--log.level=info
Environment variable | Description |
---|---|
BROKER1 | Kafka broker IP and port from KakaoCloud Console (e.g. 10.0.x.x:9092) |
BROKER2 | IP and port of another broker |
- Check if metrics are exposed correctly:
curl http://localhost:9308/metrics | grep kafka_consumergroup_lag
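The plain grep above also matches the # HELP and # TYPE comment lines, as well as any other metric whose name starts with the same prefix. A slightly stricter filter, sketched below against hypothetical sample output, keeps only the kafka_consumergroup_lag sample lines. The helper names (lag_samples, sample_metrics), the label values, and the sample data are assumptions for illustration.

```shell
# Keep only kafka_consumergroup_lag sample lines from Prometheus text output.
# Reads the /metrics body on stdin; lag_samples is a hypothetical helper name.
lag_samples() {
  grep -E '^kafka_consumergroup_lag[{ ]'
}

# Hypothetical sample of exporter output (label values are made up).
sample_metrics() {
  cat <<'EOF'
# HELP kafka_consumergroup_lag Current Approximate Lag of a ConsumerGroup at Topic/Partition
# TYPE kafka_consumergroup_lag gauge
kafka_consumergroup_lag{consumergroup="lag-group",partition="0",topic="demo"} 7
kafka_consumergroup_lag{consumergroup="lag-group",partition="1",topic="demo"} 5
kafka_consumergroup_lag_sum{consumergroup="lag-group",topic="demo"} 12
EOF
}

# Prints only the two per-partition sample lines.
sample_metrics | lag_samples
```

Against a running exporter, the same filter could be applied as curl -s http://localhost:9308/metrics | lag_samples.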
Step 3. Install and configure local Prometheus Agent
To scrape metrics from Kafka Exporter, install Prometheus locally and configure it as an agent.
Before installation, make sure to create a Prometheus workspace in the KakaoCloud Console.
- Download and extract Prometheus:
cd ~/kafka
wget https://github.com/prometheus/prometheus/releases/download/v2.33.1/prometheus-2.33.1.linux-amd64.tar.gz
tar xvfz prometheus-2.33.1.linux-amd64.tar.gz
cd prometheus-2.33.1.linux-amd64
- Create and open a Prometheus Agent config file:
sudo mkdir -p /etc/prometheus
sudo vi /etc/prometheus/prometheus-agent.yaml
- Add the following configuration to the YAML file:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'kafka_exporter'
    static_configs:
      - targets: ['localhost:9308']
remote_write:
  - url: "${WRITE_ENDPOINT}"
    headers:
      Credential-ID: '${CREDENTIAL_ID}'
      Credential-Secret: '${CREDENTIAL_SECRET}'
Environment variable | Description |
---|---|
WRITE_ENDPOINT | Write endpoint of the workspace from KakaoCloud Console |
CREDENTIAL_ID | Access Key ID |
CREDENTIAL_SECRET | Secret Access Key |
- Start Prometheus with the config file:
cd ~/kafka/prometheus-2.33.1.linux-amd64
./prometheus --config.file=/etc/prometheus/prometheus-agent.yaml > prom.log 2>&1 &
- Verify Prometheus is running:
curl http://localhost:9090
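To our knowledge, Prometheus reads its configuration file literally and does not expand ${...} shell variables in it, so the WRITE_ENDPOINT, CREDENTIAL_ID, and CREDENTIAL_SECRET placeholders need to be replaced with real values before the agent starts. One possible way to do that is sketched below with sed. The helper names (render_config, template) and the example values are hypothetical, and the sketch assumes the values contain neither | nor &.

```shell
# Sketch: fill the ${...} placeholders in a config template with sed.
# render_config and the values used below are hypothetical; values must not
# contain '|' or '&', which are special in these sed expressions.
# $1 = write endpoint, $2 = credential ID, $3 = credential secret
render_config() {
  sed -e "s|[\$]{WRITE_ENDPOINT}|$1|" \
      -e "s|[\$]{CREDENTIAL_ID}|$2|" \
      -e "s|[\$]{CREDENTIAL_SECRET}|$3|"
}

# Only the remote_write part of the template is shown here.
template() {
  cat <<'EOF'
remote_write:
  - url: "${WRITE_ENDPOINT}"
    headers:
      Credential-ID: '${CREDENTIAL_ID}'
      Credential-Secret: '${CREDENTIAL_SECRET}'
EOF
}

# Print the rendered fragment with placeholder-free values.
template | render_config "https://example.invalid/write" "my-id" "my-secret"
```

The rendered output can then be reviewed and saved as /etc/prometheus/prometheus-agent.yaml, or the placeholders can simply be edited by hand.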
Step 4. Set alerts in Alert Center
Configure alert rules based on Kafka Lag metrics.
At least one notification channel must be registered before creating an alert policy. See Create and manage notification channels for details.
- Go to KakaoCloud Console > Management > Alert Center.
- Click the Alert policy (project) tab, then click Create alert policy.
- Choose Advanced Managed Prometheus for the condition type.
- Select the previously created workspace.
- Enter the following alert rule script:
groups:
  - name: kafkaConsumergroupAlert
    rules:
      - alert: HighConsumergroupLag
        expr: sum(kafka_consumergroup_lag) by (consumergroup, topic) >= 10
        for: 1m
        annotations:
          summary: "Kafka Consumergroup Lag >= 10"
          description: "consumer group: {{ $labels.consumergroup }} / topic: {{ $labels.topic }} / sum of lag: {{ $value }}"
- Click [Next], then select the notification channel.
- Click [Next] again and enter a name for the alert policy.
- Review the settings and click [Create] to complete the alert setup.
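The alert expression in the rule above sums the per-partition lag for each (consumergroup, topic) pair and keeps only the pairs whose sum is 10 or more. The sketch below emulates that aggregation with awk on hypothetical exporter output, which may help when checking the expected result by hand. The helper names (lag_over_threshold, sample_lag_metrics) and the sample data are assumptions, and this is not how Prometheus itself evaluates the rule.

```shell
# Sketch: emulate sum(kafka_consumergroup_lag) by (consumergroup, topic) >= N
# on Prometheus text output read from stdin. $1 is the threshold N.
lag_over_threshold() {
  awk -v threshold="$1" '
    /^kafka_consumergroup_lag[{]/ {
      group = ""; topic = ""
      # Extract the label values between the double quotes.
      if (match($0, /consumergroup="[^"]*"/))
        group = substr($0, RSTART + 15, RLENGTH - 16)
      if (match($0, /topic="[^"]*"/))
        topic = substr($0, RSTART + 7, RLENGTH - 8)
      sum[group "/" topic] += $NF    # the sample value is the last field
    }
    END {
      for (key in sum)
        if (sum[key] >= threshold) print key, sum[key]
    }
  '
}

# Hypothetical sample data: lag-group totals 12, other-group totals 3.
sample_lag_metrics() {
  cat <<'EOF'
kafka_consumergroup_lag{consumergroup="lag-group",partition="0",topic="demo"} 7
kafka_consumergroup_lag{consumergroup="lag-group",partition="1",topic="demo"} 5
kafka_consumergroup_lag{consumergroup="other-group",partition="0",topic="demo"} 3
EOF
}

sample_lag_metrics | lag_over_threshold 10
```

With this sample data, only lag-group/demo 12 is printed, because the other group's total of 3 stays below the threshold.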
Step 5. Trigger alert for testing
To verify the alert, simulate a lag by stopping the consumer while the producer continues sending messages.
- Run the consumer and ensure lag is 0:
cd ~/kafka
bin/kafka-console-consumer.sh \
--bootstrap-server ${BOOTSTRAP_SERVER} \
--topic ${TOPIC_NAME} \
--group ${GROUP_NAME} \
--from-beginning
Environment variable | Description |
---|---|
BOOTSTRAP_SERVER | Kafka cluster bootstrap server |
TOPIC_NAME | Kafka topic name |
GROUP_NAME | Consumer group name (e.g. lag-group) |
- Press Ctrl+C to stop the consumer, then send new messages:
cd ~/kafka
bin/kafka-console-producer.sh \
--bootstrap-server ${BOOTSTRAP_SERVER} \
--topic ${TOPIC_NAME}
Environment variable | Description |
---|---|
BOOTSTRAP_SERVER | Kafka cluster bootstrap server |
TOPIC_NAME | Kafka topic name |
> test-1
> test-2
> test-3
...
> test-12
- Alerts are triggered only if the lag stays at or above the threshold for at least 1 minute, as defined by for: 1m in the alert rule. Wait about a minute, then confirm that the alert is received.
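The effect of the for: 1m clause can be pictured with a simplified model: the alert becomes pending when the condition first holds, and fires once the condition has held without interruption for the full duration. The awk sketch below walks through a hypothetical series of rule evaluations. The helper name first_firing_time, the 15-second spacing, and the lag values are assumptions, and real timing also depends on the scrape and rule evaluation intervals of the workspace.

```shell
# Simplified model of the "for" clause. Reads "seconds lag" pairs on stdin
# (one rule evaluation per line) and prints the time at which the alert would
# move from pending to firing. $1 = threshold, $2 = hold duration in seconds.
first_firing_time() {
  awk -v threshold="$1" -v hold="$2" '
    {
      if ($2 >= threshold) {
        if (!pending) { pending = 1; since = $1 }   # condition became true
        if ($1 - since >= hold) { print $1; exit }  # held long enough: firing
      } else {
        pending = 0                                 # any dip resets the timer
      }
    }
  '
}

# Hypothetical evaluations every 15 seconds: lag reaches 12 at t=30 and stays.
printf '%s\n' '0 0' '15 4' '30 12' '45 12' '60 12' '75 12' '90 12' '105 12' \
  | first_firing_time 10 60
```

In this sample the condition first holds at t=30, so the sketch prints 90, one minute later. If the lag dropped below 10 at any evaluation in between, the one-minute wait would start over.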