Monitoring Troubleshooting
This document outlines common issues and solutions related to the Monitoring service.
Monitoring Agent Errors
Not found file in path. Please Check path and permission
After installing the monitoring agent, the following error may appear when checking the logs with the command sudo journalctl -u kic_monitor_agent -f:
Jul 18 09:15:35 host-172-16-2-147 kic_monitor_agent[10046]: 2024-07-18T09:15:35Z W! [inputs.tail] Not found file in path. Please Check path and permission: {specified log file}
This error occurs when the log file path is wrong or the agent lacks permission to read the file. The main causes are as follows:
Cause 1: Incorrect log file path or name
▶️ Solution: Enter the correct file path and name in the KIC_LOG_FILE_PATH field in /etc/default/kic_monitor_agent, then restart the monitoring agent.
- Update the log file path in /etc/default/kic_monitor_agent.
  # Set the path of the log file to make available in the Log Explorer
  KIC_LOG_FILE_PATH=""
- Restart the monitoring agent.
  sudo systemctl restart kic_monitor_agent
- Check that the agent is running correctly.
  sudo journalctl -u kic_monitor_agent -f
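Before restarting the agent, it can help to confirm that the configured path actually exists and is readable. The following is a minimal sketch, not part of the monitoring agent: check_log_path is a hypothetical helper name, and it assumes KIC_LOG_FILE_PATH holds a single literal path (no wildcards) and that the environment file uses plain KEY="value" lines that a shell can source.

```shell
#!/bin/sh
# Hypothetical helper (not agent tooling): report why a log file path
# could trigger "Not found file in path. Please Check path and permission".
check_log_path() {
    path="$1"
    if [ -z "$path" ]; then
        echo "empty: KIC_LOG_FILE_PATH is not set"
        return 1
    elif [ ! -e "$path" ]; then
        echo "missing: $path does not exist"
        return 1
    elif [ ! -r "$path" ]; then
        echo "unreadable: current user cannot read $path"
        return 1
    fi
    echo "ok: $path"
}

# Usage sketch (assumes the environment file can be sourced by a shell):
#   . /etc/default/kic_monitor_agent
#   check_log_path "$KIC_LOG_FILE_PATH"
```

Note that the readability check applies to the user running the script, which may differ from the agent's service user.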
Cause 2: Changing the monitoring agent service user after installation
If the monitoring agent's service user is changed from the default (root), the agent may no longer have permission to read the log file. There are two possible solutions.
▶️ Solution 1: Change the monitoring agent service user to root
[Service]
EnvironmentFile=-/etc/default/kic_monitor_agent
# Change the service user to root (systemd does not support trailing comments on the same line)
User=root
▶️ Solution 2: Grant read permission if the service user lacks read access to the log file
# Example: Add read permission for user, group, and others (existing write permissions are kept)
sudo chmod a+r {log file}
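To decide between the two solutions, it is useful to know which user the service actually runs as. The sketch below is a hypothetical helper, not agent tooling: unit_service_user reads unit file text on standard input and prints the last User= value, falling back to root when none is set (the systemd default for system services). The user name appuser in the example is purely illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print the effective User= value from systemd unit
# file text on stdin. The last User= line wins; root is assumed if absent.
unit_service_user() {
    user=$(sed -n 's/^[[:space:]]*User=//p' | tail -n 1)
    echo "${user:-root}"
}

# Example with inline unit text (appuser is an illustrative name):
printf '[Service]\nEnvironmentFile=-/etc/default/kic_monitor_agent\nUser=appuser\n' \
    | unit_service_user

# Usage sketch on a host where the agent is installed:
#   user=$(systemctl cat kic_monitor_agent | unit_service_user)
#   sudo -u "$user" test -r "{log file}" && echo "readable by $user"
```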
Collection took longer than expected; not complete after interval of 10s
If the following error occurs on an instance where the monitoring agent is installed, the agent could not finish collecting data within the collection interval because the interval is too short.
Apr 01 11:03:20 ${instance where the error occurred} kic_monitor_agent[52839]: 2024-04-01T02:03:20Z W! [inputs.disk] Collection took longer than expected; not complete after interval of 10s
▶️ Solution: Modify the collection interval setting to a value greater than the default (10s).
- In the /etc/kic_monitor_agent/kic_monitor_agent.conf file, change the interval value to something larger than the default (10s) and save.
  [agent]
  ## Default data collection interval for all inputs
  interval = "30s" # Enter a value larger than the previous setting (10s)
- Restart the monitoring agent.
  sudo systemctl restart kic_monitor_agent
- Verify that the monitoring agent is running correctly.
  sudo journalctl -u kic_monitor_agent -f
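A quick way to confirm the edit is to read the interval back from the configuration file. The following sketch uses hypothetical helper names (interval_seconds, is_interval_raised) that are not part of the agent. It assumes the value is written in whole seconds, such as "30s", and that the [agent] entry is the first uncommented interval line in the file.

```shell
#!/bin/sh
# Hypothetical helper: print the first uncommented `interval = "Ns"` value
# in seconds. Lines such as flush_interval are not matched.
interval_seconds() {
    sed -n 's/^[[:space:]]*interval[[:space:]]*=[[:space:]]*"\([0-9][0-9]*\)s".*/\1/p' "$1" \
        | head -n 1
}

# Succeeds only when the interval is larger than the 10s default.
is_interval_raised() {
    secs=$(interval_seconds "$1")
    [ -n "$secs" ] && [ "$secs" -gt 10 ]
}

# Usage sketch:
#   if is_interval_raised /etc/kic_monitor_agent/kic_monitor_agent.conf; then
#       echo "interval is above 10s"
#   else
#       echo "interval is still 10s or lower (or not in seconds)"
#   fi
```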
Errors When Using Metric Export API
Default Error Page: Page Not Found
The following error page may be returned when the Metric Export API is called from an environment or account that is not permitted to access it.
<html>
<head>
<meta charset="UTF-8">
<title>
Default Error Page
</title>
</head>
<body>
<div style="text-align:center">
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<h1 className="header">페이지를 찾을 수 없습니다.</h1>
<p>
<span style="color: gray;">
요청한 주소가 변경 또는 삭제되었거나, 잘못된 주소여서 페이지를 찾을 수 없습니다.
</span>
<br/>
<span style="color: gray;">입력한 URL이 올바른지 다시 확인해 주시기 바랍니다.</span>
</p>
</div>
</body>
</html>
The Korean text in this page translates to: "Page not found. The page cannot be found because the requested address has been changed or deleted, or is incorrect. Please check again that the URL you entered is correct."
Cause 1: The client is not a Public Virtual Machine in the kr-central-2 region
▶️ Solution: Verify that the client calling the Metric Export API is a VM with a public IP in kr-central-2 or a VM in a VPC with an internet gateway in kr-central-2.
Cause 2: User does not exist in the project
▶️ Solution: Invite the user as a project member within the project of the client calling the Metric Export API.
Cause 3: User access key error or expiration
▶️ Solution: Ensure that the access key ID and secret access key are entered correctly, and check if the key has expired.
User access keys can be viewed and issued from Access key in the profile menu at the top right of the KakaoCloud console. For detailed instructions, see Get access key.
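Because this error arrives as an HTML page rather than JSON, a script that calls the API can check for it before trying to parse the response. The sketch below is a hypothetical helper (is_default_error_page is an illustrative name) that simply looks for the page title shown above. It is an assumption that the title text is stable across all three causes.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a response body is the HTML
# "Default Error Page" instead of a JSON API response.
is_default_error_page() {
    printf '%s' "$1" | grep -q 'Default Error Page'
}

# Usage sketch: body holds the raw response saved from a curl call.
#   if is_default_error_page "$body"; then
#       echo "Check the client region, project membership, and access key."
#   fi
```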
not supported kakaocloud service-type header.
The following response may be returned when using the Metric Export API. This error is caused by an incorrect service-type header value.
{"status":"error","errorType":"bad_data","error":"not supported kakaocloud service-type header. current header value is {incorrect service-type}"}
▶️ Solution: Ensure that the service-type header is entered correctly according to the specified values.
curl -X GET 'https://monitoring.kr-central-2.kakaocloud.com/metric-export/grafana/{PROJECT_ID}/prometheus/api/' \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Credential-ID: {ACCESS_KEY_ID}" \
-H "Credential-Secret: {ACCESS_KEY_SECRET}" \
-H "service-type: server" # Header to verify
The available service-type values in KakaoCloud are as follows:
- Virtual Machine, GPU, Bare Metal Server (value: server)
- MySQL (value: mysql)
- MemStore (value: memstore)
- Load Balancing (value: lb)
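A script that builds requests can validate the header value locally before calling the API. The following is a minimal sketch using the four values listed above. The function name is hypothetical, and treating the match as exact and lowercase-only is an assumption, since the source does not say whether the API is case-sensitive.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for the service-type values listed in
# this document. Exact lowercase matching is an assumption.
is_supported_service_type() {
    case "$1" in
        server|mysql|memstore|lb) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage sketch:
#   service_type="lb"
#   is_supported_service_type "$service_type" \
#       || echo "unsupported service-type: $service_type"
```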
No result in response (Data not available)
When using the Metric Export API, you may receive the following response with an empty result. This can occur when the requested metric does not match the specified service-type, or when the monitoring agent is not installed on the instance.
{"status":"success","data":{"resultType":"vector","result":[]}}
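Since the status is still "success", an empty result is easy to miss in scripts. The sketch below is a hypothetical helper (has_empty_result is an illustrative name) that detects it with a plain text match after stripping whitespace. It assumes the response has the shape shown above; a JSON parser would be more robust where one is available.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a query response contains an empty
# result array, i.e. "result":[] once whitespace is removed.
has_empty_result() {
    printf '%s' "$1" | tr -d '[:space:]' | grep -q '"result":\[\]'
}

# Usage sketch: body holds the raw response saved from a curl call.
#   if has_empty_result "$body"; then
#       echo "No data: check the metric name, service-type, and agent installation."
#   fi
```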
Cause 1: The requested metric does not match the specified service-type
In cases of incorrect requests, you may receive a response with no data, as shown below:
# Incorrect request: service-type is server, but the queried metric belongs to Load Balancing
curl -vvv -X POST 'https://monitoring.kr-central-2.kakaocloud.com/metric-export/grafana/{PROJECT_ID}/prometheus/api/v1/query' \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Credential-ID: {ACCESS_KEY_ID}" \
-H "Credential-Secret: {ACCESS_KEY_SECRET}" \
-H "service-type: server" \
-d "query=lb_bytes_in_persec"
▶️ Solution: Request a metric that matches the specified service-type. For metrics provided by each service, refer to the Monitoring Metrics documentation.
curl -vvv -X POST 'https://monitoring.kr-central-2.kakaocloud.com/metric-export/grafana/{PROJECT_ID}/prometheus/api/v1/query' \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Credential-ID: {ACCESS_KEY_ID}" \
-H "Credential-Secret: {ACCESS_KEY_SECRET}" \
-H "service-type: server" \
-d "query=cpu_usage" # Changed to a metric that matches service-type: server
Cause 2: Monitoring agent is not installed on the instance
If you are unable to retrieve metrics for a specific instance in Grafana, the monitoring agent may not be installed on that instance, preventing access to metrics.
▶️ Solution: Refer to Install monitoring agent to install the monitoring agent, then check the metrics again.
Metrics for Load Balancing, MySQL, and MemStore can be viewed without installing the monitoring agent.