
Monitoring Metrics

This section describes the metrics provided by KakaoCloud's Monitoring service.

Virtual Machine, GPU, Bare Metal Server Metrics

These are the primary system resource metrics commonly collected from Virtual Machine, GPU, and Bare Metal Server instances, and they can be utilized in the following service areas:

  • Monitoring: Custom Dashboard, Metric Explorer, Metric Export
  • Alert Center: Metric-based Alert Policy Configuration

| Metric Name | Description | Unit |
|---|---|---|
| cpu_usage | Total CPU utilization | % |
| cpu_usage_user | CPU utilization (user process) | % |
| cpu_usage_system | CPU utilization (system kernel) | % |
| cpu_usage_iowait | CPU utilization (I/O wait) | % |
| cpu_usage_per_core | CPU utilization per core | % |
| mem_usage | Total memory utilization | % |
| mem_used | Amount of memory in use | bytes(IEC) |
| mem_buffered | Memory usage (buffer) | bytes(IEC) |
| mem_cached | Memory usage (cache) | bytes(IEC) |

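
Several units in these tables are marked (IEC). Reading that as binary units, where each prefix step is 1024 (KiB, MiB, GiB), a raw byte value such as mem_used can be rendered for display as in the minimal sketch below. The helper name format_iec is hypothetical and is not part of any KakaoCloud API.

```python
# Hypothetical helper: formats a raw value with IEC binary prefixes.
# Assumes "(IEC)" in the metric tables means powers of 1024 (KiB, MiB, GiB, ...).
IEC_PREFIXES = ["", "Ki", "Mi", "Gi", "Ti", "Pi"]


def format_iec(value: float, unit: str = "B") -> str:
    """Divide by 1024 until the value fits the largest suitable prefix."""
    magnitude = abs(value)
    index = 0
    while magnitude >= 1024 and index < len(IEC_PREFIXES) - 1:
        magnitude /= 1024
        index += 1
    sign = "-" if value < 0 else ""
    return f"{sign}{magnitude:.2f} {IEC_PREFIXES[index]}{unit}"


if __name__ == "__main__":
    print(format_iec(3 * 1024**3))         # a mem_used sample: "3.00 GiB"
    print(format_iec(1_250_000, "bit/s"))  # a bits/s(IEC) rate
```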
Info
  • The mem_buffered, mem_cached, and disk_inodes_usage metrics are collected and provided only on Linux.
  • The nvidia_smi metrics are collected only from servers equipped with a GPU.
Caution
  • GPU instance library compatibility: If you update the NVIDIA library on a GPU instance, verify that it is compatible with the installed CUDA version. If the two are incompatible, the monitoring agent may fail to collect NVIDIA metrics.
  • Network alert policy: An Alert Center policy that uses the network_rx_bytes_persec metric applies to all network interfaces on the instance. On a multi-NIC instance, the alert is triggered if any attached interface exceeds the configured threshold.
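
The "any interface" rule for network_rx_bytes_persec policies can be modeled as follows. This is a sketch for intuition only: the interface names and rates are made-up sample values, and treating "exceeds" as a strict greater-than comparison is an assumption.

```python
# Hypothetical model of a network_rx_bytes_persec alert on a multi-NIC instance.
# The policy is evaluated against every interface, so one interface over the
# threshold is enough to trigger the alert.


def nics_over_threshold(rx_bytes_persec: dict[str, float], threshold: float) -> list[str]:
    """Return the interfaces whose receive rate exceeds the threshold."""
    return sorted(nic for nic, rate in rx_bytes_persec.items() if rate > threshold)


def alert_triggered(rx_bytes_persec: dict[str, float], threshold: float) -> bool:
    """The alert fires if any attached interface exceeds the threshold."""
    return any(rate > threshold for rate in rx_bytes_persec.values())


if __name__ == "__main__":
    samples = {"eth0": 2.0e6, "eth1": 9.5e6}  # bytes/s per interface (sample data)
    print(alert_triggered(samples, threshold=8.0e6))      # True: eth1 is over
    print(nics_over_threshold(samples, threshold=8.0e6))  # ['eth1']
```

In practice this means a busy secondary NIC can trigger the policy even when the primary interface is quiet.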

Libvirt Metrics

These are the main resource metrics for virtualization-based servers collected in the Libvirt environment, and they can be utilized in the following service areas:

  • Monitoring: Metric Export
  • Alert Center: Metric-based Alert Policy Configuration

| Metric Name | Description | Unit |
|---|---|---|
| libvirt_domain_info_cpu_time_seconds_total | Total CPU time used | count |
| libvirt_domain_info_virtual_cpus | Number of CPU cores | count |
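
The two Libvirt metrics can be combined into an average CPU utilization for a guest. The sketch below assumes, as the metric name suggests, that libvirt_domain_info_cpu_time_seconds_total is a cumulative counter of CPU seconds; the function name is hypothetical.

```python
# Hypothetical sketch: average guest CPU utilization from two samples of
# libvirt_domain_info_cpu_time_seconds_total and libvirt_domain_info_virtual_cpus.
# Assumes the counter is cumulative CPU time in seconds.


def vm_cpu_utilization(cpu_time_start: float, cpu_time_end: float,
                       interval_seconds: float, vcpus: int) -> float:
    """Return average CPU utilization (%) across all vCPUs over the interval."""
    if interval_seconds <= 0 or vcpus <= 0:
        raise ValueError("interval_seconds and vcpus must be positive")
    cpu_seconds_used = cpu_time_end - cpu_time_start
    if cpu_seconds_used < 0:
        # A decreasing counter usually means the domain restarted; skip this window.
        raise ValueError("counter reset detected")
    capacity = interval_seconds * vcpus  # CPU-seconds available in the window
    return 100.0 * cpu_seconds_used / capacity


if __name__ == "__main__":
    # 60 CPU-seconds consumed in 60 s on 2 vCPUs: half of the capacity.
    print(vm_cpu_utilization(1200.0, 1260.0, 60.0, 2))
```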

Burstable Instance Exclusive Metrics

The following metrics are collected only from t1i family instances with the Burstable option applied (excluding the t1i.medium.dns.default type).

| Metric Name | Description | Unit |
|---|---|---|
| cpu_credit_usage | Accumulated CPU credit usage; the amount of credit consumed when CPU usage exceeds baseline performance | count |
| cpu_credit_balance | Remaining CPU credit balance for the instance, accrued when operating below baseline performance | count |
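
The relationship between usage above baseline, credit consumption, and balance accrual can be illustrated with a toy simulation. The accrual rate, the balance cap, and the definition of one credit (one vCPU at 100% for one minute) are all assumptions made for this sketch; KakaoCloud's actual credit rules are not described here.

```python
# Hypothetical credit model, for intuition only. Assumes one credit equals one
# vCPU running at 100% for one minute and that CPU usage is sampled once per minute.


def simulate_credits(usage_percent: list[float], baseline_percent: float,
                     vcpus: int, start_balance: float,
                     max_balance: float) -> tuple[float, float]:
    """Return (total credits used, final balance) for per-minute CPU samples."""
    balance = start_balance
    credit_usage = 0.0
    for usage in usage_percent:
        delta = (usage - baseline_percent) / 100.0 * vcpus
        if delta > 0:
            # Above baseline: spend credits, never going below zero.
            spent = min(delta, balance)
            balance -= spent
            credit_usage += spent
        else:
            # At or below baseline: earn credits up to the cap.
            balance = min(max_balance, balance - delta)
    return credit_usage, balance


if __name__ == "__main__":
    used, balance = simulate_credits([50, 50, 10], baseline_percent=20,
                                     vcpus=2, start_balance=1.0, max_balance=10.0)
    print(used, balance)
```

Under this model, cpu_credit_usage grows only while usage is above baseline, and cpu_credit_balance recovers only while usage is below it.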

Kubernetes Engine Metrics

These are the primary cluster, node, and pod resource metrics collected in the Kubernetes Engine environment, and they can be utilized in the following service areas:

  • Monitoring: Metric Export

| Metric Name | Description | Unit |
|---|---|---|
| cluster_autoscaler_node_group_min_count | Minimum number of nodes configured for node group autoscaling | count |
| cluster_autoscaler_node_group_max_count | Maximum number of nodes configured for node group autoscaling | count |
| cluster_autoscaler_node_group_target_count | Target number of nodes set by node group autoscaling | count |
| node_count | Current number of nodes | count |
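
After exporting these metrics, a simple consistency check can flag a node group that is pinned at its maximum or whose node count has not yet reached the target. This is a hypothetical sketch: it assumes node_count refers to the same node group as the autoscaler metrics, which the table does not state.

```python
# Hypothetical sanity check for one node group's autoscaling metrics.
# Assumes node_count describes the same node group as the autoscaler metrics.


def check_node_group(min_count: int, max_count: int,
                     target_count: int, node_count: int) -> list[str]:
    """Return human-readable warnings about the node group's scaling state."""
    warnings = []
    if not (min_count <= target_count <= max_count):
        warnings.append("target is outside the configured min/max range")
    if target_count == max_count:
        warnings.append("node group is at its maximum size")
    if node_count != target_count:
        warnings.append(f"current node count ({node_count}) differs from target ({target_count})")
    return warnings


if __name__ == "__main__":
    for warning in check_node_group(min_count=1, max_count=5, target_count=5, node_count=4):
        print(warning)
```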

Load Balancing Metrics

These are the main metrics for monitoring the traffic and connection status of Load Balancer resources, and they can be utilized in the following service areas:

  • Monitoring: Custom Dashboard, Metric Explorer, Metric Export
  • Alert Center: Metric-based Alert Policy Configuration

| Metric Name | Description | Unit |
|---|---|---|
| lb_bytes_in_persec | Inbound traffic per second (received bytes) | bytes/s(IEC) |
| lb_bytes_out_persec | Outbound traffic per second (sent bytes) | bytes/s(IEC) |
| lb_connections_persec | Number of connections created per second | count/s |
| lb_current_connections | Number of currently active connections | count |
| lb_healthy_host_count | Number of healthy (connectable) targets | count |
| lb_unhealthy_host_count | Number of unhealthy (unconnectable) targets | count |
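
Two derived values are often useful with these metrics: the share of healthy targets, and a rough average connection lifetime. The lifetime estimate applies Little's law (average in-flight count = arrival rate x average duration) to lb_current_connections and lb_connections_persec; it is only a steady-state approximation, and the function names are hypothetical.

```python
# Hypothetical helpers for derived Load Balancer indicators.


def healthy_target_percent(healthy: int, unhealthy: int) -> float:
    """Share of targets passing health checks, in percent."""
    total = healthy + unhealthy
    if total == 0:
        return 0.0  # No registered targets
    return 100.0 * healthy / total


def avg_connection_lifetime_seconds(current_connections: float,
                                    connections_persec: float) -> float:
    """Estimate average connection lifetime with Little's law (W = L / lambda).

    Only meaningful when traffic is roughly steady over the sampling window.
    """
    if connections_persec <= 0:
        raise ValueError("connections_persec must be positive")
    return current_connections / connections_persec


if __name__ == "__main__":
    print(healthy_target_percent(healthy=3, unhealthy=1))
    print(avg_connection_lifetime_seconds(current_connections=500, connections_persec=50))
```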

MySQL Metrics

These are the main metrics for monitoring the storage, network, query, and connection status of MySQL instances, and they can be utilized in the following service areas:

  • Monitoring: Custom Dashboard, Metric Explorer, Metric Export
  • Alert Center: Metric-based Alert Policy Configuration

| Metric Name | Description | Unit |
|---|---|---|
| mem_swap_total | Total swap memory | bytes(IEC) |
| mem_swap_cached | Cached swap memory | bytes(IEC) |
| mem_swap_free | Available swap memory | bytes(IEC) |
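
Used swap is not listed directly, but it can be derived from mem_swap_total and mem_swap_free. The sketch below assumes Linux /proc/meminfo semantics, where swap-cached pages still hold a copy in swap and are therefore already counted in the used space.

```python
# Hypothetical sketch: derive swap usage from mem_swap_total and mem_swap_free.
# mem_swap_cached is deliberately not subtracted: on Linux it counts pages that
# were read back into RAM but still keep a copy in swap, so they occupy swap space.


def swap_usage(total_bytes: int, free_bytes: int) -> tuple[int, float]:
    """Return (used bytes, used percent)."""
    if total_bytes <= 0:
        return 0, 0.0  # No swap configured on the instance
    used = total_bytes - free_bytes
    return used, 100.0 * used / total_bytes


if __name__ == "__main__":
    print(swap_usage(2 * 1024**3, 1536 * 1024**2))  # 2 GiB total, 1.5 GiB free
```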

PostgreSQL Metrics

These are the main metrics for monitoring the disk, network, connection, and transaction status of PostgreSQL instances, and they can be utilized in the following service areas:

  • Monitoring: Custom Dashboard, Metric Explorer, Metric Export
  • Alert Center: Metric-based Alert Policy Configuration

| Metric Name | Description | Unit |
|---|---|---|
| pg_defaultstorage_disk_read_bytes_persec | Bytes read per second on the default storage disk | bytes/s(IEC) |
| pg_defaultstorage_disk_write_bytes_persec | Bytes written per second on the default storage disk | bytes/s(IEC) |
| pg_defaultstorage_disk_read_iops | Read operations completed per second on the default storage disk | count/s |
| pg_defaultstorage_disk_write_iops | Write operations completed per second on the default storage disk | count/s |
| pg_defaultstorage_disk_used | Default storage disk usage | bytes(IEC) |
| pg_defaultstorage_disk_used_percent | Default storage disk utilization | % |
| pg_defaultstorage_disk_inodes_usage | Default storage inode utilization | % |
| pg_defaultstorage_disk_free | Available capacity on the default storage disk | bytes(IEC) |
| pg_defaultstorage_disk_total | Total capacity of the default storage disk | bytes(IEC) |
| pg_defaultstorage_disk_inodes_free | Number of available inodes on the default storage disk | count |
| pg_defaultstorage_disk_inodes_total | Total number of inodes on the default storage disk | count |
| pg_defaultstorage_disk_inodes_used | Number of used inodes on the default storage disk | count |
| pg_logstorage_disk_read_bytes_persec | Bytes read per second on the log storage disk | bytes/s(IEC) |
| pg_logstorage_disk_write_bytes_persec | Bytes written per second on the log storage disk | bytes/s(IEC) |
| pg_logstorage_disk_read_iops | Read operations completed per second on the log storage disk | count/s |
| pg_logstorage_disk_write_iops | Write operations completed per second on the log storage disk | count/s |
| pg_logstorage_disk_used | Log storage disk usage | bytes(IEC) |
| pg_logstorage_disk_used_percent | Log storage disk utilization | % |
| pg_logstorage_disk_inodes_usage | Log storage inode utilization | % |
| pg_logstorage_disk_free | Available capacity on the log storage disk | bytes(IEC) |
| pg_logstorage_disk_total | Total capacity of the log storage disk | bytes(IEC) |
| pg_logstorage_disk_inodes_free | Number of available inodes on the log storage disk | count |
| pg_logstorage_disk_inodes_total | Total number of inodes on the log storage disk | count |
| pg_logstorage_disk_inodes_used | Number of used inodes on the log storage disk | count |
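
Because both byte and inode metrics are provided, a disk can be checked for running out of either resource; a log disk holding many small files may exhaust inodes while plenty of space remains. The sketch below recomputes both percentages from the raw metrics. The class and function names are hypothetical, and the recomputed space percentage may differ slightly from pg_*_disk_used_percent if the service accounts for filesystem-reserved blocks differently.

```python
# Hypothetical check over the raw pg_<storage>_disk_* metrics for one disk.
from dataclasses import dataclass


@dataclass
class StorageSample:
    used_bytes: int     # pg_*_disk_used
    total_bytes: int    # pg_*_disk_total
    inodes_used: int    # pg_*_disk_inodes_used
    inodes_total: int   # pg_*_disk_inodes_total


def storage_warnings(name: str, s: StorageSample,
                     threshold_percent: float = 80.0) -> list[str]:
    """Flag the disk when either space or inode utilization crosses the threshold."""
    warnings = []
    space_pct = 100.0 * s.used_bytes / s.total_bytes
    inode_pct = 100.0 * s.inodes_used / s.inodes_total
    if space_pct >= threshold_percent:
        warnings.append(f"{name}: disk {space_pct:.1f}% used")
    if inode_pct >= threshold_percent:
        warnings.append(f"{name}: inodes {inode_pct:.1f}% used")
    return warnings


if __name__ == "__main__":
    print(storage_warnings("logstorage", StorageSample(10, 100, 90, 100)))
```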

MemStore Metrics

These are the main metrics for monitoring the memory, network, replication, and CPU usage status of MemStore instances, and they can be utilized in the following service areas:

  • Monitoring: Custom Dashboard, Metric Explorer, Metric Export
  • Alert Center: Metric-based Alert Policy Configuration

| Metric Name | Description | Unit |
|---|---|---|
| memstore_used_cpu_sys | Total system CPU usage | count |
| memstore_used_cpu_sys_main_thread | System CPU usage of the main thread | count |
| memstore_used_cpu_user | Total user CPU usage | count |
| memstore_used_cpu_user_main_thread | User CPU usage of the main thread | count |
| memstore_memory_usage | Total memory utilization | % |
| memstore_used_memory | Size of memory used by MemStore | bytes(IEC) |
| memstore_used_memory_peak | Peak memory usage | bytes(IEC) |
| memstore_used_memory_peak_perc | Peak usage ratio relative to total memory | % |
| memstore_used_memory_dataset | Memory used for actual data storage | bytes(IEC) |
| memstore_used_memory_dataset_perc | Percentage of memory used for actual data storage | % |
| memstore_used_memory_overhead | Overhead memory required to manage internal data structures | bytes(IEC) |
| memstore_used_memory_lua | Memory used for Lua script execution | bytes(IEC) |
| memstore_allocator_allocated | Memory allocated by the allocator (including internal fragmentation) | bytes(IEC) |
| memstore_allocator_active | Active memory in the allocator (including external fragmentation) | bytes(IEC) |
| memstore_allocator_resident | Resident memory managed by the allocator | bytes(IEC) |
| memstore_allocator_rss_bytes | RSS memory size | bytes(IEC) |
| memstore_allocator_frag_bytes | Difference between active memory and allocated memory | bytes(IEC) |
| memstore_allocator_frag_ratio | Ratio between active memory and allocated memory | % |
| memstore_allocator_rss_ratio | Ratio between resident memory and active memory | % |
| memstore_mem_fragmentation_bytes | Difference between used resident memory and allocated memory | bytes(IEC) |
| memstore_mem_fragmentation_ratio | Ratio of used resident memory to allocated memory | % |
| memstore_rss_overhead_bytes | Difference between process RSS and allocator resident memory | bytes(IEC) |
| memstore_rss_overhead_ratio | Ratio between process RSS and allocator resident memory | % |
| memstore_total_system_memory | Total memory of the system where MemStore is running | bytes(IEC) |
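
The allocator fragmentation figures are derived from the raw allocator byte counts. The sketch below assumes the Redis INFO convention, in which each ratio divides the larger, fragmentation-inclusive quantity by the smaller one, so 1.0 means no fragmentation. The table lists these ratios in %, so MemStore may present them scaled (for example, 1.2 shown as 120%); that scaling is also an assumption.

```python
# Hypothetical sketch: derive allocator fragmentation from raw byte counts
# (memstore_allocator_allocated, _active, _resident), using Redis-style ratios.


def allocator_fragmentation(allocated: int, active: int, resident: int) -> dict[str, float]:
    """Return fragmentation bytes and ratios; 1.0 ratios mean no fragmentation."""
    if allocated <= 0 or active <= 0:
        raise ValueError("allocated and active must be positive")
    return {
        "allocator_frag_bytes": float(active - allocated),  # external fragmentation
        "allocator_frag_ratio": active / allocated,
        "allocator_rss_ratio": resident / active,
    }


if __name__ == "__main__":
    mib = 1024**2
    print(allocator_fragmentation(100 * mib, 120 * mib, 150 * mib))
```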

Burstable Instance Exclusive Metrics

The following metrics are only collected from t1i family instances with the Burstable option applied.

| Metric Name | Description | Unit |
|---|---|---|
| cpu_credit_usage | CPU credit usage | count |
| cpu_credit_balance | CPU credit balance | count |

Hadoop Eco Metrics

These are the main HBase, HDFS, YARN, and Kafka system metrics collected in the Hadoop Eco environment, and they can be utilized in the following service areas:

  • Monitoring: Metric Export

| Metric Name | Description | Unit |
|---|---|---|
| HBase_Master_JvmMetrics_MemHeapMaxM | Maximum JVM heap memory size of HBase Master | MB |
| HBase_Master_JvmMetrics_MemHeapUsedM | JVM heap memory usage of HBase Master | MB |
| HBase_Master_Server_numDeadRegionServers | Number of dead (unhealthy) Region Servers | count |
| HBase_Master_Server_numRegionServers | Number of running (healthy) Region Servers | count |
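
Heap pressure on the HBase Master is easier to judge as a percentage than as two raw MB values, and dead Region Servers are easier to judge as a share of all known servers. A minimal sketch with hypothetical function names:

```python
# Hypothetical helpers for the HBase Master metrics.


def heap_usage_percent(mem_heap_used_m: float, mem_heap_max_m: float) -> float:
    """Percent of the HBase Master's maximum JVM heap currently in use."""
    if mem_heap_max_m <= 0:
        raise ValueError("MemHeapMaxM must be positive")
    return 100.0 * mem_heap_used_m / mem_heap_max_m


def dead_region_server_share(num_live: int, num_dead: int) -> float:
    """Fraction of known Region Servers reported dead (0.0 when none are known)."""
    total = num_live + num_dead
    return num_dead / total if total else 0.0


if __name__ == "__main__":
    print(heap_usage_percent(mem_heap_used_m=768, mem_heap_max_m=1024))
    print(dead_region_server_share(num_live=3, num_dead=1))
```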

Pub/Sub Metrics

These are the main metrics for monitoring the message publishing, subscription, and storage status of the Pub/Sub service, and they can be utilized in the following service areas:

  • Monitoring: Custom Dashboard, Metric Explorer
  • Alert Center: Metric-based Alert Policy Configuration

| Metric Name | Description | Unit |
|---|---|---|
| pubsub_published_message_count_persec | Number of published messages per second | count/s |
| pubsub_published_message_bytes_persec | Size of published messages per second | bytes/s(IEC) |
| pubsub_publish_request_count_persec | Number of publish requests per second | count/s |
| pubsub_topic_storage_used_bytes | Size of data retained in the topic | bytes(IEC) |
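
Dividing the per-second Pub/Sub rates by one another yields two handy indicators: average message size and messages per publish request. The second assumes that one publish request can carry several messages, which the separate request and message counters suggest but the table does not state. Function names are hypothetical.

```python
# Hypothetical ratios over the Pub/Sub per-second metrics (same sampling window).


def avg_message_size_bytes(bytes_persec: float, messages_persec: float) -> float:
    """pubsub_published_message_bytes_persec / pubsub_published_message_count_persec."""
    return bytes_persec / messages_persec if messages_persec > 0 else 0.0


def messages_per_request(messages_persec: float, requests_persec: float) -> float:
    """Values above 1.0 suggest publishers batch several messages per request."""
    return messages_persec / requests_persec if requests_persec > 0 else 0.0


if __name__ == "__main__":
    print(avg_message_size_bytes(bytes_persec=2048, messages_persec=4))
    print(messages_per_request(messages_persec=100, requests_persec=20))
```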

Direct Connect Metrics

These are the main metrics for monitoring the traffic and connection status of Direct Connect virtual interfaces, and they can be utilized in the following service areas:

  • Monitoring: Metric Export

| Metric Name | Description | Unit |
|---|---|---|
| dx_virtual_interface_input_bits_persec | Bits received per second on the virtual interface | bits/s(IEC) |
| dx_virtual_interface_output_bits_persec | Bits sent per second on the virtual interface | bits/s(IEC) |
| dx_virtual_interface_input_packets_persec | Packets received per second on the virtual interface | packets/s |
| dx_virtual_interface_output_packets_persec | Packets sent per second on the virtual interface | packets/s |
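
Pairing a bits-per-second metric with the matching packets-per-second metric for the same direction gives the average packet size on the virtual interface, which helps distinguish many small packets from fewer large ones. A minimal sketch with a hypothetical function name:

```python
# Hypothetical sketch: average packet size for one direction of a virtual interface,
# e.g. dx_virtual_interface_input_bits_persec with dx_virtual_interface_input_packets_persec.


def avg_packet_size_bytes(bits_persec: float, packets_persec: float) -> float:
    """Average packet size in bytes over the same sampling window."""
    if packets_persec <= 0:
        return 0.0  # No packets: the average is undefined, report 0
    return bits_persec / 8 / packets_persec


if __name__ == "__main__":
    print(avg_packet_size_bytes(bits_persec=8_000_000, packets_persec=1000))
```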

Gateway Load Balancer Metrics

These are the main metrics for monitoring the traffic, connection, and health status of Gateway Load Balancer and Endpoint Service, and they can be utilized in the following service areas:

  • Monitoring: Metric Export

| Metric Name | Description | Unit |
|---|---|---|
| gwlb_bytes_in_persec | Bytes received per second by the Gateway Load Balancer | bytes/s(IEC) |
| gwlb_bytes_out_persec | Bytes sent per second by the Gateway Load Balancer | bytes/s(IEC) |
| eps_bytes_in_persec | Bytes received per second by the Endpoint Service | bytes/s(IEC) |
| eps_bytes_out_persec | Bytes sent per second by the Endpoint Service | bytes/s(IEC) |
| ep_bytes_in_persec | Bytes received per second by the Endpoint | bytes/s(IEC) |
| ep_bytes_out_persec | Bytes sent per second by the Endpoint | bytes/s(IEC) |
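
These metrics are rates, so the total volume transferred over a period has to be reconstructed from exported samples. The sketch below assumes evenly spaced samples and treats each rate as constant for its interval; the function name is hypothetical.

```python
# Hypothetical sketch: approximate total bytes from per-second rate samples
# (for example gwlb_bytes_in_persec exported at a fixed interval).


def total_bytes_from_rates(rates_persec: list[float], interval_seconds: float) -> float:
    """Sum rate x interval, treating each sample as constant over its interval."""
    return sum(rate * interval_seconds for rate in rates_persec)


if __name__ == "__main__":
    # Three one-minute samples of a bytes/s metric.
    print(total_bytes_from_rates([100.0, 200.0, 300.0], interval_seconds=60.0))
```

Shorter sampling intervals reduce the error this approximation introduces when traffic is bursty.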

Private Endpoint Metrics

These are the main metrics for monitoring the traffic and connection status of Private Endpoint, and they can be utilized in the following service areas:

  • Monitoring: Metric Export

| Metric Name | Description | Unit |
|---|---|---|
| ep_bytes_in_persec | Bytes received per second by the Endpoint | bytes/s(IEC) |
| ep_bytes_out_persec | Bytes sent per second by the Endpoint | bytes/s(IEC) |