
Monitoring metrics

This document describes the metrics provided by the KakaoCloud Monitoring service.

Virtual Machine, GPU, and Bare Metal Server metrics

These are the primary system resource metrics common to Virtual Machines, GPUs, and Bare Metal Servers. They can be used in the following service areas:

  • Monitoring: Custom dashboards, Metric Explorer, Metric Export
  • Alert Center: Setting metric-based notification policies

| Metric name | Description | Unit | Recommended use |
| --- | --- | --- | --- |
| cpu_usage | Total CPU utilization | % | Key performance indicator |
| cpu_usage_user | CPU utilization (User processes) | % | Check user process load |
| cpu_usage_system | CPU utilization (System kernel) | % | Check kernel/system load |
| cpu_usage_iowait | CPU utilization (I/O wait) | % | Diagnose I/O bottlenecks |
| cpu_usage_per_core | CPU utilization per core | % | Check load imbalance across cores |
| mem_usage | Self-memory utilization | % | Major memory alerts |
| mem_used | Used memory size | bytes(IEC) | Check absolute usage |
| mem_buffered | Memory usage (Buffers) | bytes(IEC) | Linux only |
| mem_cached | Memory usage (Cache) | bytes(IEC) | Linux only |
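
Values reported in bytes(IEC) are easier to read when scaled with binary prefixes. The following is a minimal sketch, assuming the unit label refers to IEC binary prefixes (1 KiB = 1024 bytes); the function name `format_bytes_iec` is hypothetical and not part of any KakaoCloud API.

```python
def format_bytes_iec(num_bytes: float) -> str:
    """Format a byte count using IEC binary prefixes (1 KiB = 1024 bytes).

    Hypothetical helper for illustration only; it is not a KakaoCloud API.
    """
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    # Divide by 1024 until the value fits the current unit.
    for unit in units[:-1]:
        if abs(value) < 1024.0:
            return f"{value:.2f} {unit}"
        value /= 1024.0
    # Anything larger than the last step is expressed in the largest unit.
    return f"{value:.2f} {units[-1]}"


if __name__ == "__main__":
    # Example: a raw mem_used sample of 8 * 1024**3 bytes.
    print(format_bytes_iec(8 * 1024**3))
```

A helper like this could be applied to mem_used, mem_buffered, or mem_cached samples after exporting them.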

Virtual Machine metrics for burstable instances only

The following metrics are collected only for t1i family instances with the burstable option applied (excluding the t1i.medium.dns.default type).

| Metric name | Description | Unit |
| --- | --- | --- |
| cpu_credit_usage | Accumulated CPU credit usage; credits consumed when CPU usage exceeds baseline performance | count |
| cpu_credit_balance | Remaining CPU credit balance; earned when operating below baseline performance | count |
info
  • mem_buffered, mem_cached, and disk_inodes_usage metrics are only collected and provided for servers with Linux OS installed.
  • nvidia_smi metrics are only collected for servers equipped with GPUs.
caution
  • GPU instance library compatibility: When updating NVIDIA libraries for GPU instances, ensure you check CUDA version compatibility. If incompatible, the monitoring agent may fail to collect NVIDIA metrics.
  • Network notification policy: When setting an Alert Center notification policy using the network_rx_bytes_persec metric, the policy applies to all network interfaces. In multi-NIC instances, an alert will be sent if any connected interface exceeds the threshold.
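
The multi-NIC behavior described in the caution above can be sketched as a simple "any interface" check. This is an illustration of the stated semantics only, not the Alert Center implementation; the function names, interface names, and threshold are hypothetical.

```python
from typing import List, Mapping


def interfaces_over_threshold(
    rx_bytes_persec: Mapping[str, float], threshold: float
) -> List[str]:
    """Return the interfaces whose network_rx_bytes_persec exceeds the threshold."""
    return sorted(
        name for name, value in rx_bytes_persec.items() if value > threshold
    )


def should_alert(rx_bytes_persec: Mapping[str, float], threshold: float) -> bool:
    """An alert fires if ANY interface exceeds the threshold (hypothetical model)."""
    return len(interfaces_over_threshold(rx_bytes_persec, threshold)) > 0


if __name__ == "__main__":
    # Hypothetical samples for an instance with two network interfaces.
    samples = {"eth0": 4_000_000.0, "eth1": 12_000_000.0}
    print(should_alert(samples, threshold=10_000_000.0))
    print(interfaces_over_threshold(samples, threshold=10_000_000.0))
```

Under this model, a threshold tuned for a single busy interface also triggers on any secondary interface that crosses it, which is worth keeping in mind when choosing the value.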

Libvirt metrics

These are the primary resource metrics for virtualization-based servers collected in Libvirt environments. They can be used in:

  • Monitoring: Custom dashboards, Metric Explorer, Metric Export

| Metric name | Description | Unit |
| --- | --- | --- |
| libvirt_domain_info_cpu_time_seconds_total | Total CPU time used | count |
| libvirt_domain_info_virtual_cpus | Number of CPU cores | count |
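
A cumulative CPU time value is usually turned into a utilization percentage by comparing two samples. The sketch below assumes that libvirt_domain_info_cpu_time_seconds_total is a monotonically increasing counter of CPU seconds (suggested by its name, not confirmed by this document) and that the two samples were exported separately; the function name is hypothetical.

```python
def cpu_utilization_percent(
    cpu_seconds_start: float,
    cpu_seconds_end: float,
    interval_seconds: float,
    virtual_cpus: int,
) -> float:
    """Estimate average CPU utilization (%) between two counter samples.

    Assumes the CPU time metric is a cumulative counter in seconds and that
    virtual_cpus comes from libvirt_domain_info_virtual_cpus.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if virtual_cpus <= 0:
        raise ValueError("virtual_cpus must be positive")
    delta = cpu_seconds_end - cpu_seconds_start
    if delta < 0:
        # A decreasing counter usually means the domain was restarted.
        raise ValueError("counter decreased between samples")
    # CPU seconds consumed divided by CPU seconds available in the interval.
    return 100.0 * delta / (interval_seconds * virtual_cpus)


if __name__ == "__main__":
    # 60 CPU seconds consumed over a 60 second window on 2 vCPUs.
    print(cpu_utilization_percent(100.0, 160.0, 60.0, 2))
```

Dividing by the vCPU count keeps the result within 0 to 100 for a fully loaded domain; omit that division if a per-core-sum figure is preferred.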

Kubernetes Engine metrics

Primary cluster, node, and pod resource metrics collected in Kubernetes Engine environments. They can be used in:

  • Monitoring: Metric Export

| Metric name | Description | Unit |
| --- | --- | --- |
| cluster_autoscaler_node_group_min_count | Minimum node count for node group autoscaling | count |
| cluster_autoscaler_node_group_max_count | Maximum node count for node group autoscaling | count |
| cluster_autoscaler_node_group_target_count | Target node count for node group autoscaling | count |
| node_count | Current number of nodes | count |

Load Balancing metrics

Primary metrics for monitoring traffic and connection status of Load Balancer resources. Used in:

  • Monitoring: Custom dashboards, Metric Explorer, Metric Export
  • Alert Center: Setting metric-based notification policies

| Metric name | Description | Unit |
| --- | --- | --- |
| lb_bytes_in_persec | Inbound traffic per second (Received bytes) | bytes/s(IEC) |
| lb_bytes_out_persec | Outbound traffic per second (Sent bytes) | bytes/s(IEC) |
| lb_connections_persec | Connections created per second | count/s |
| lb_current_connections | Number of currently maintained connections | count |
| lb_healthy_host_count | Number of healthy hosts available for connection | count |
| lb_unhealthy_host_count | Number of unhealthy hosts unavailable for connection | count |
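
The two host-count metrics can be combined into a single health ratio after export. The following is a minimal sketch; the function name is hypothetical, and returning `None` when no hosts are registered is one possible design choice.

```python
from typing import Optional


def healthy_host_ratio(healthy: int, unhealthy: int) -> Optional[float]:
    """Fraction of hosts that are healthy, from lb_healthy_host_count and
    lb_unhealthy_host_count samples taken at the same time.

    Returns None when no hosts are registered, to avoid dividing by zero.
    """
    if healthy < 0 or unhealthy < 0:
        raise ValueError("host counts must be non-negative")
    total = healthy + unhealthy
    if total == 0:
        return None
    return healthy / total


if __name__ == "__main__":
    # Three healthy hosts and one unhealthy host.
    print(healthy_host_ratio(3, 1))
```

A ratio is often easier to threshold than raw counts because it stays meaningful when the number of registered hosts changes.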

MySQL metrics

Primary metrics for monitoring storage, network, query, and connection status of MySQL instances. Used in:

  • Monitoring: Custom dashboards, Metric Explorer, Metric Export
  • Alert Center: Setting metric-based notification policies

| Metric name | Description | Unit | Note |
| --- | --- | --- | --- |
| cpu_usage | Total CPU utilization | % | Key performance indicator |
| cpu_usage_user | CPU utilization (User processes) | % | |
| cpu_usage_system | CPU utilization (System kernel) | % | |
| cpu_usage_iowait | CPU utilization (I/O wait) | % | |
| cpu_usage_per_core | CPU utilization per core | % | |
| mem_swap_total | Total swap memory | bytes(IEC) | |
| mem_swap_cached | Cached swap memory | bytes(IEC) | |
| mem_swap_free | Available swap memory | bytes(IEC) | |
| mem_usage | Self-memory utilization | % | |
| mem_used | Used memory size | bytes(IEC) | |
| mem_buffered | Memory usage (Buffers) | bytes(IEC) | |
| mem_cached | Memory usage (Cache) | bytes(IEC) | |

PostgreSQL metrics

Primary metrics for monitoring the disk, network, connection, and transaction status of PostgreSQL instances. These can be used in the following service areas:

  • Monitoring: Custom dashboards, Metric Explorer, Metric Export
  • Alert Center: Setting metric-based notification policies

| Metric name | Description | Unit | Note |
| --- | --- | --- | --- |
| cpu_usage | Total CPU utilization | % | Key performance indicator |
| cpu_usage_user | CPU utilization (User processes) | % | |
| cpu_usage_system | CPU utilization (System kernel) | % | |
| cpu_usage_iowait | CPU utilization (I/O wait) | % | |
| cpu_usage_per_core | CPU utilization per core | % | |
| mem_swap_total | Total swap memory | bytes(IEC) | |
| mem_swap_cached | Cached swap memory | bytes(IEC) | |
| mem_swap_free | Available swap memory | bytes(IEC) | |
| mem_usage | Self-memory utilization | % | |
| mem_used | Used memory size | bytes(IEC) | |
| mem_buffered | Memory usage (Buffers) | bytes(IEC) | |
| mem_cached | Memory usage (Cache) | bytes(IEC) | |

MemStore metrics

Primary metrics for monitoring the memory, network, replication, and CPU usage status of MemStore instances. These can be used in the following service areas:

  • Monitoring: Custom dashboards, Metric Explorer, Metric Export
  • Alert Center: Setting metric-based notification policies

| Metric name | Description | Unit |
| --- | --- | --- |
| memstore_used_cpu_sys | Total system CPU usage | count |
| memstore_used_cpu_sys_main_thread | System CPU usage of the main thread | count |
| memstore_used_cpu_user | Total user CPU usage | count |
| memstore_used_cpu_user_main_thread | User CPU usage of the main thread | count |
| memstore_memory_usage | Total memory utilization | % |
| memstore_used_memory | Memory size currently in use by MemStore | bytes(IEC) |
| memstore_used_memory_peak | Peak memory used | bytes(IEC) |
| memstore_used_memory_peak_perc | Ratio of peak usage to total memory | % |
| memstore_used_memory_dataset | Memory used for actual data storage | bytes(IEC) |
| memstore_used_memory_dataset_perc | Ratio of memory used for actual data storage | % |
| memstore_used_memory_overhead | Overhead memory required for internal data structure management | bytes(IEC) |
| memstore_used_memory_lua | Memory used for Lua script execution | bytes(IEC) |
| memstore_allocator_allocated | Memory allocated to the allocator (including internal fragmentation) | bytes(IEC) |
| memstore_allocator_active | Active memory in the allocator (including external fragmentation) | bytes(IEC) |
| memstore_allocator_resident | Resident memory managed by the allocator | bytes(IEC) |
| memstore_allocator_rss_bytes | RSS memory size | bytes(IEC) |
| memstore_allocator_frag_bytes | Difference between active memory and allocated memory | bytes(IEC) |
| memstore_allocator_frag_ratio | Ratio of allocated memory to active memory | % |
| memstore_allocator_rss_ratio | Ratio of active memory to resident memory | % |
| memstore_mem_fragmentation_bytes | Difference between resident memory in use and allocated memory | bytes(IEC) |
| memstore_mem_fragmentation_ratio | Ratio of resident memory in use to allocated memory | % |
| memstore_rss_overhead_bytes | Difference between process RSS and allocator resident memory | bytes(IEC) |
| memstore_rss_overhead_ratio | Ratio of process RSS to allocator resident memory | % |
| memstore_total_system_memory | Total memory of the system where MemStore is running | bytes(IEC) |
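
The paired "difference" and "ratio" fragmentation metrics follow the same arithmetic pattern. The sketch below works through memstore_mem_fragmentation_bytes and memstore_mem_fragmentation_ratio as described in the table, using hypothetical input values. It returns the plain ratio; whether the service reports this value scaled as a percentage is not stated here, so treat the scaling as an assumption to verify.

```python
from typing import Tuple


def memory_fragmentation(
    resident_bytes: float, allocated_bytes: float
) -> Tuple[float, float]:
    """Return (fragmentation_bytes, fragmentation_ratio).

    fragmentation_bytes = resident memory in use - allocated memory
    fragmentation_ratio = resident memory in use / allocated memory
    Both definitions follow the table descriptions; inputs are hypothetical.
    """
    if allocated_bytes <= 0:
        raise ValueError("allocated_bytes must be positive")
    frag_bytes = resident_bytes - allocated_bytes
    frag_ratio = resident_bytes / allocated_bytes
    return frag_bytes, frag_ratio


if __name__ == "__main__":
    # 1500 bytes resident for 1000 bytes allocated.
    print(memory_fragmentation(1500, 1000))
```

A ratio well above 1 suggests memory held by the process beyond what the data requires, while a ratio below 1 can indicate that part of the memory has been swapped out.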

Metrics for burstable instances only

The following metrics are collected only for t1i family instances with the burstable option applied.

| Metric name | Description | Unit |
| --- | --- | --- |
| cpu_credit_usage | CPU credit usage | count |
| cpu_credit_balance | Remaining CPU credit balance | count |

Hadoop Eco metrics

Primary HBase, HDFS, Yarn, and Kafka related system metrics collected in a Hadoop Eco environment. These can be used in:

  • Monitoring: Metric Export

| Metric name | Description | Unit |
| --- | --- | --- |
| HBase_Master_JvmMetrics_MemHeapMaxM | Maximum JVM heap memory size for HBase Master | MB |
| HBase_Master_JvmMetrics_MemHeapUsedM | JVM heap memory usage for HBase Master | MB |
| HBase_Master_Server_numDeadRegionServers | Number of Region Servers in an abnormal (Dead) state | count |
| HBase_Master_Server_numRegionServers | Number of Region Servers operating normally | count |

Pub/Sub metrics

Primary metrics for monitoring the status of message publishing, subscription, and storage for the Pub/Sub service. These can be used in:

  • Monitoring: Custom dashboards, Metric Explorer
  • Alert Center: Setting metric-based notification policies

| Metric name | Description | Unit |
| --- | --- | --- |
| pubsub_published_message_count_persec | Number of published messages per second | count/s |
| pubsub_published_message_bytes_persec | Size of published messages per second | bytes/s(IEC) |
| pubsub_publish_request_count_persec | Number of publish requests per second | count/s |
| pubsub_topic_storage_used_bytes | Size of stored data in topic | bytes(IEC) |

Direct Connect metrics

Primary metrics for monitoring traffic and connection status of Direct Connect virtual interfaces. These can be used in:

  • Monitoring: Metric Export

| Metric name | Description | Unit |
| --- | --- | --- |
| dx_virtual_interface_input_bits_persec | Bits received per second on the virtual interface | bits/s(IEC) |
| dx_virtual_interface_output_bits_persec | Bits sent per second on the virtual interface | bits/s(IEC) |
| dx_virtual_interface_input_packets_persec | Packets received per second on the virtual interface | packets/s |
| dx_virtual_interface_output_packets_persec | Packets sent per second on the virtual interface | packets/s |
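
Because these metrics are reported in bits per second while most other traffic metrics on this page use bytes per second, a unit conversion is usually needed before comparing them. The sketch below converts bits to bytes and derives an average packet size from the bit and packet rates; the function names are hypothetical and the sample values are made up for illustration.

```python
from typing import Optional


def bits_to_bytes_persec(bits_persec: float) -> float:
    """Convert a rate in bits/s to bytes/s (8 bits per byte)."""
    return bits_persec / 8.0


def average_packet_size_bytes(
    bits_persec: float, packets_persec: float
) -> Optional[float]:
    """Average packet size in bytes over the sampling interval.

    Returns None when no packets were observed, to avoid dividing by zero.
    """
    if packets_persec <= 0:
        return None
    return bits_to_bytes_persec(bits_persec) / packets_persec


if __name__ == "__main__":
    # Hypothetical samples: 8,000,000 bits/s carried by 1,000 packets/s.
    print(bits_to_bytes_persec(8_000_000))
    print(average_packet_size_bytes(8_000_000, 1_000))
```

Pair the input metrics together and the output metrics together; mixing directions would give a meaningless packet size.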

Gateway Load Balancer metrics

Primary metrics for monitoring traffic, connections, and health status of Gateway Load Balancers and Endpoint Services. These can be used in:

  • Monitoring: Metric Export

| Metric name | Description | Unit |
| --- | --- | --- |
| gwlb_bytes_in_persec | Total bytes received by the Gateway Load Balancer per second | bytes/s(IEC) |
| gwlb_bytes_out_persec | Total bytes sent by the Gateway Load Balancer per second | bytes/s(IEC) |
| eps_bytes_in_persec | Total bytes received by the Endpoint Service per second | bytes/s(IEC) |
| eps_bytes_out_persec | Total bytes sent by the Endpoint Service per second | bytes/s(IEC) |
| ep_bytes_in_persec | Total bytes received by the Endpoint per second | bytes/s(IEC) |
| ep_bytes_out_persec | Total bytes sent by the Endpoint per second | bytes/s(IEC) |

Private Endpoint metrics

Primary metrics for monitoring traffic and connection status of Private Endpoints. These can be used in:

  • Monitoring: Metric Export

| Metric name | Description | Unit |
| --- | --- | --- |
| ep_bytes_in_persec | Total bytes received by the Endpoint per second | bytes/s(IEC) |
| ep_bytes_out_persec | Total bytes sent by the Endpoint per second | bytes/s(IEC) |