Monitoring metrics

This document describes the metrics provided by the KakaoCloud Monitoring service.
These metrics are available in Custom Dashboard, Metric Explorer, Metric Export, and the metric-based alert policies of Alert Center.

Virtual Machine, GPU, Bare Metal Server Metrics

Service Scope

Monitoring - Custom Dashboard, Metric Explorer, Metric Export, Alert Center - Alert Policies (Metric-based)

info

The metrics mem_buffered, mem_cached, and disk_inodes_usage are collected only on servers running Linux.
The nvidia_smi metrics are collected only on servers with a GPU.

info

When updating the NVIDIA library on a GPU instance, make sure the library version is compatible with the CUDA version.
If an update through apt upgrade or a similar method breaks version compatibility, the user-installed monitoring agent may fail to collect NVIDIA-related metrics.

info

If an alert policy is set for the network_rx_bytes_persec metric in Alert Center, it applies to all network interfaces of the instance. That is, if the instance has multiple network interfaces, an alarm is triggered when any one of them exceeds the set threshold.
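The any-interface behavior described above can be pictured with a small sketch. The function name, interface names, and sample values are hypothetical and only illustrate the evaluation rule; they are not part of the Alert Center API.

```python
from typing import Mapping


def alert_fires(rx_bytes_per_sec: Mapping[str, float], threshold: float) -> bool:
    """Return True if at least one interface's receive rate exceeds the threshold."""
    return any(rate > threshold for rate in rx_bytes_per_sec.values())


# Hypothetical network_rx_bytes_persec samples for an instance with two interfaces.
samples = {"eth0": 1_200_000.0, "eth1": 9_500_000.0}

# eth1 alone is above the threshold, which is enough to trigger the alarm.
print(alert_fires(samples, threshold=5_000_000.0))
```

If a per-interface alarm is needed, the threshold has to be chosen with the busiest interface in mind, since the rule above does not distinguish between interfaces.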

Metric Name	Description	Unit
cpu_usage	Measures the overall CPU usage	%
cpu_usage_iowait	CPU usage, CPU state: iowait	%
cpu_usage_system	CPU usage, CPU state: system	%
cpu_usage_user	CPU usage, CPU state: user	%
cpu_usage_per_core	Measures the CPU usage per core	%
mem_buffered	Memory usage, memory state: buffered	bytes(IEC)
mem_cached	Memory usage, memory state: cached	bytes(IEC)
mem_used	Memory usage	bytes(IEC)
mem_usage	Memory usage percentage	%
disk_used	Disk usage	bytes(IEC)
disk_used_percent	Disk usage percentage	%
disk_inodes_usage	Disk inode usage percentage	%
disk_read_bytes_persec	Bytes read per second from disk	bytes/s(IEC)
disk_write_bytes_persec	Bytes written per second to disk	bytes/s(IEC)
disk_read_iops	Disk I/O operations (reads) per second	count/s
disk_write_iops	Disk I/O operations (writes) per second	count/s
network_rx_bytes_persec	Bytes received per second by network interface	bytes/s(IEC)
network_tx_bytes_persec	Bytes transmitted per second by network interface	bytes/s(IEC)
network_rx_packets_persec	Packets received per second by network interface	packets/s
network_tx_packets_persec	Packets transmitted per second by network interface	packets/s
nvidia_smi_memory_free	Free memory per GPU core	MiB(IEC)
nvidia_smi_memory_total	Total memory per GPU core	MiB(IEC)
nvidia_smi_memory_used	Used memory per GPU core	MiB(IEC)
nvidia_smi_power_draw	Power draw per GPU core	watt
nvidia_smi_utilization_gpu	GPU usage percentage per core	%
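Units marked bytes(IEC) are read here as using binary prefixes (1 KiB = 1,024 bytes); that reading is an assumption based on the IEC label. Under that assumption, a raw value such as mem_used or disk_used could be made human-readable with a helper like the one below. The name format_iec is hypothetical.

```python
def format_iec(num_bytes: float) -> str:
    """Format a byte count with binary (IEC) prefixes, assuming 1 KiB = 1024 bytes."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if abs(value) < 1024.0 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024.0
    return f"{value:.2f} {units[-1]}"  # not reached; kept for completeness


# Hypothetical mem_used sample of 3 GiB expressed in bytes.
print(format_iec(3 * 1024**3))
```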

Libvirt Metrics

Service Scope

Monitoring - Metric Export

Metric Name	Description	Unit
libvirt_domain_info_cpu_time_seconds_total	Total CPU time used	count
libvirt_domain_info_virtual_cpus	Number of CPU cores	count
libvirt_domain_block_stats_read_bytes_total	Total bytes read from disk	bytes(IEC)
libvirt_domain_block_stats_write_bytes_total	Total bytes written to disk	bytes(IEC)
libvirt_domain_interface_stats_receive_bytes_total	Total bytes received by network interface	bytes(IEC)
libvirt_domain_interface_stats_transmit_bytes_total	Total bytes transmitted by network interface	bytes(IEC)
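The metrics ending in _total appear to be cumulative counters rather than per-second rates; this is an assumption based on the naming convention. After exporting them, a per-second rate could be derived from two consecutive samples, as in the sketch below. The function name and sample values are hypothetical.

```python
def rate_per_second(prev_value: float, prev_ts: float,
                    curr_value: float, curr_ts: float) -> float:
    """Derive a per-second rate from two samples of a cumulative counter."""
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        # Assumed handling of a counter reset (for example after a restart):
        # treat the current value as the increase since the reset.
        delta = curr_value
    return delta / elapsed


# Hypothetical libvirt_domain_block_stats_read_bytes_total samples taken 60 s apart.
print(rate_per_second(1_000_000, 0.0, 4_000_000, 60.0))  # bytes per second
```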

info

The metrics cpu_credit_usage and cpu_credit_balance are collected only on t1i servers with the Burstable option enabled (except t1i.medium.dns.default).

Metric Name	Description	Unit
cpu_credit_usage	CPU credit usage	count
cpu_credit_balance	Remaining CPU credits	count

Kubernetes Engine Metrics

Service Scope

Monitoring - Metric Export

Metric Name	Description	Unit
cluster_autoscaler_node_group_min_count	Minimum node count during node group autoscaling	count
cluster_autoscaler_node_group_max_count	Maximum node count during node group autoscaling	count
cluster_autoscaler_node_group_target_count	Target node count during node group autoscaling	count
kube_node_status_allocatable	Amount of resources allocatable to pods on the node	none
kube_node_status_capacity	Total available resources on the node	none
node_count	Current number of nodes	count
node_cpu_seconds_total	Time spent in each CPU mode on node	count
node_filesystem_avail_bytes	Available file system space for non-root users	bytes(IEC)
node_filesystem_size_bytes	File system size on node	bytes(IEC)
node_memory_Active_bytes	Active memory that is currently used or reusable	bytes(IEC)
node_memory_Buffers_bytes	Buffer memory used by the kernel for disk block I/O	bytes(IEC)
node_memory_Cached_bytes	Memory used for file system cache	bytes(IEC)
node_memory_MemAvailable_bytes	Estimated available memory that can be used by new processes	bytes(IEC)
node_memory_MemFree_bytes	Currently unused memory	bytes(IEC)
node_memory_MemTotal_bytes	Total memory on the node	bytes(IEC)
node_memory_SReclaimable_bytes	Reclaimable memory in slab cache	bytes(IEC)
node_network_receive_bytes_total	Total bytes received by node's network devices	bytes(IEC)
node_network_transmit_bytes_total	Total bytes transmitted by node's network devices	bytes(IEC)
kube_pod_container_info	Information about containers within a pod	none
kube_pod_container_resource_limits	Resource limits requested by the container	none
kube_pod_container_resource_requests	Resource values requested by the container	none
kube_pod_container_status_running	Whether the container is in Running state	count
kube_pod_container_status_terminated	Whether the container is in Terminated state	count
kube_pod_info	Information about the pod	none
container_cpu_usage_seconds_total	Total CPU time consumed by the container	count
container_memory_working_set_bytes	Memory used by the container that cannot be reclaimed by the OS	bytes(IEC)
container_network_receive_bytes_total	Total network bytes received by the container	bytes(IEC)
container_network_transmit_bytes_total	Total network bytes transmitted by the container	bytes(IEC)
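Because the node memory metrics are exported as raw byte values, a usage percentage has to be computed on the consumer side. The sketch below uses one common approach, based on node_memory_MemTotal_bytes and node_memory_MemAvailable_bytes; the function name is hypothetical, and other definitions of "used" memory are possible.

```python
def node_memory_usage_percent(mem_total_bytes: float,
                              mem_available_bytes: float) -> float:
    """Memory usage as a percentage: (total - available) / total * 100."""
    if mem_total_bytes <= 0:
        raise ValueError("mem_total_bytes must be positive")
    return (mem_total_bytes - mem_available_bytes) / mem_total_bytes * 100.0


# Hypothetical node with 8 GiB total memory and 2 GiB available.
total = 8 * 1024**3
available = 2 * 1024**3
print(node_memory_usage_percent(total, available))
```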

Load Balancing Metrics

Service Scope

Monitoring - Custom Dashboard, Metric Explorer, Metric Export, Alert Center - Alert Policies (Metric-based)

Metric Name	Description	Unit
lb_bytes_in_persec	Inbound traffic	bytes/s(IEC)
lb_bytes_out_persec	Outbound traffic	bytes/s(IEC)
lb_connections_persec	Number of connections per second	count/s
lb_current_connections	Number of current connections	count

MySQL Metrics

Service Scope

Monitoring - Custom Dashboard, Metric Explorer, Metric Export, Alert Center - Alert Policies (Metric-based)

Metric Name	Description	Unit
mem_swap_total	Total swap memory	bytes(IEC)
mem_swap_cached	Cached swap memory	bytes(IEC)
mem_swap_free	Free swap memory	bytes(IEC)
mysql_logstorage_disk_write_bytes_persec	Bytes written per second to the log storage disk	bytes/s(IEC)
mysql_defaultstorage_disk_write_bytes_persec	Bytes written per second to the default storage disk	bytes/s(IEC)
mysql_logstorage_disk_read_bytes_persec	Bytes read per second from the log storage disk	bytes/s(IEC)
mysql_defaultstorage_disk_read_bytes_persec	Bytes read per second from the default storage disk	bytes/s(IEC)
mysql_logstorage_disk_write_iops	Write operations completed per second on log storage disk	count/s
mysql_defaultstorage_disk_write_iops	Write operations completed per second on default storage disk	count/s
mysql_logstorage_disk_read_iops	Read operations completed per second on log storage disk	count/s
mysql_defaultstorage_disk_read_iops	Read operations completed per second on default storage disk	count/s
mysql_logstorage_disk_used	Log storage disk usage	bytes(IEC)
mysql_defaultstorage_disk_used	Default storage disk usage	bytes(IEC)
mysql_defaultstorage_disk_used_percent	Default storage disk usage percentage	%
mysql_logstorage_disk_used_percent	Log storage disk usage percentage	%
mysql_logstorage_disk_inodes_usage	Log storage inode usage percentage	%
mysql_defaultstorage_disk_inodes_usage	Default storage inode usage percentage	%
mysql_network_rx_bytes_persec	Bytes received per second by network interface	bytes/s(IEC)
mysql_network_tx_bytes_persec	Bytes transmitted per second by network interface	bytes/s(IEC)
mysql_network_rx_packets_persec	Packets received per second by network interface	packets/s
mysql_network_tx_packets_persec	Packets transmitted per second by network interface	packets/s
mysql_innodb_row_lock_current_waits	Number of current row lock waits	count
mysql_binary_size_bytes	Size of binary logs	bytes(IEC)
mysql_binary_files_count	Number of binary log files	count
mysql_variables_max_binlog_size	Maximum binary log size	bytes(IEC)
mysql_connections_count	Number of current connections	count
mysql_slow_query_count	Number of slow queries in the last 5 minutes	count
mysql_com_insert_count	Number of INSERT queries in the last 5 minutes	count
mysql_com_select_count	Number of SELECT queries in the last 5 minutes	count
mysql_com_delete_count	Number of DELETE queries in the last 5 minutes	count
mysql_com_commit_count	Number of COMMIT queries in the last 5 minutes	count
mysql_com_update_count	Number of UPDATE queries in the last 5 minutes	count
mysql_query_persec	Queries per second (QPS)	count/s
mysql_connection_usage_percent	Percentage of connections used compared to max connections	%
mysql_innodb_buffer_pool_read_requests	Total buffer pool read requests	count
mysql_innodb_row_lock_time	Row lock time	milliseconds
mysql_innodb_buffer_pool_reads	Number of buffer pool reads	count
mysql_innodb_buffer_cache_hit_ratio	MySQL InnoDB buffer pool cache hit ratio	%
mysql_uptime	Uptime duration	duration
mysql_instance_status	Instance status	count
mysql_instance_group_status	Instance group status	count
mysql_replication_lag	Binlog replication lag	seconds
mysql_max_connections_count	Maximum number of connections	count
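Several of the percentage metrics above relate to the raw counters in the same table. The sketch below shows how such ratios are commonly computed from mysql_connections_count, mysql_max_connections_count, mysql_innodb_buffer_pool_read_requests, and mysql_innodb_buffer_pool_reads. The function names are hypothetical, and the formulas are the conventional ones rather than a confirmed description of how the service calculates its values.

```python
def connection_usage_percent(connections: int, max_connections: int) -> float:
    """Share of max connections currently in use, as a percentage."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return 100.0 * connections / max_connections


def buffer_pool_hit_ratio_percent(read_requests: int, reads: int) -> float:
    """Conventional InnoDB hit ratio: requests served without a disk read."""
    if read_requests <= 0:
        return 0.0  # no requests yet, so no meaningful ratio
    return 100.0 * (read_requests - reads) / read_requests


# Hypothetical samples.
print(connection_usage_percent(connections=50, max_connections=200))
print(buffer_pool_hit_ratio_percent(read_requests=1000, reads=10))
```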

PostgreSQL Metrics

Service Scope

Monitoring - Custom Dashboard, Metric Explorer, Metric Export, Alert Center - Alert Policies (Metric-based)

Metric Name	Description	Unit
pg_active_connections	Number of active PostgreSQL connections	count
pg_active_transactions	Number of active PostgreSQL transactions	count
pg_buffer_hit_ratio	PostgreSQL buffer hit ratio	%
pg_defaultstorage_disk_inodes_usage	Default storage inode usage percentage	%
pg_defaultstorage_disk_read_bytes_persec	Bytes read per second from default storage disk	bytes/s(IEC)
pg_defaultstorage_disk_read_iops	Read operations per second on default storage disk	count/s
pg_defaultstorage_disk_used	Default storage disk usage	bytes(IEC)
pg_defaultstorage_disk_used_percent	Default storage disk usage percentage	%
pg_defaultstorage_disk_write_bytes_persec	Bytes written per second to default storage disk	bytes/s(IEC)
pg_defaultstorage_disk_write_iops	Write operations per second on default storage disk	count/s
pg_lock_sessions	Number of lock sessions in PostgreSQL	count
pg_logstorage_disk_inodes_usage	Log storage inode usage percentage	%
pg_logstorage_disk_read_bytes_persec	Bytes read per second from log storage disk	bytes/s(IEC)
pg_logstorage_disk_read_iops	Read operations per second from log storage disk	count/s
pg_logstorage_disk_used	Log storage disk usage	bytes(IEC)
pg_logstorage_disk_used_percent	Log storage disk usage percentage	%
pg_logstorage_disk_write_bytes_persec	Bytes written per second to log storage disk	bytes/s(IEC)
pg_logstorage_disk_write_iops	Write operations per second to log storage disk	count/s
pg_network_rx_bytes_persec	Bytes received per second by network interface	bytes/s(IEC)
pg_network_rx_packets_persec	Packets received per second by network interface	packets/s
pg_network_tx_bytes_persec	Bytes transmitted per second by network interface	bytes/s(IEC)
pg_network_tx_packets_persec	Packets transmitted per second by network interface	packets/s
pg_replication_lag	PostgreSQL replication lag	seconds
pg_temp_file_ratio_per_group	PostgreSQL temporary file usage per instance group	%
pg_total_connections	Total number of PostgreSQL connections	count
pg_total_deadlocks	Total number of deadlocks in PostgreSQL	count
pg_xid_age_per_group	PostgreSQL transaction ID (xid) age per instance group, as tracked for vacuum	count

MemStore Metrics

Service Scope

Monitoring - Custom Dashboard, Metric Explorer, Metric Export, Alert Center - Alert Policies (Metric-based)

Metric Name	Description	Unit
memstore_allocator_rss_bytes	RSS memory size	bytes(IEC)
memstore_clients	Number of connected clients	count
memstore_connected_slaves	Number of connected replicas	count
memstore_evicted_keys	Number of evicted keys due to maxmemory limit	count
memstore_expired_keys	Number of expired keys	count
memstore_instantaneous_ops_per_sec	Commands processed per second	count
memstore_client_ratio	Ratio of current clients to max clients	%
memstore_memory_usage	Memory usage percentage in MemStore instance	%
memstore_keyspace_hits	Number of keyspace hits	count
memstore_keyspace_misses	Number of keyspace misses	count
memstore_maxclients	Maximum number of clients allowed	count
memstore_maxmemory	Maximum available memory	bytes(IEC)
memstore_replication_lag	Replication lag time	s
memstore_uptime	Uptime duration	s
memstore_used_memory	Used memory in MemStore	bytes(IEC)
memstore_cmdstat_calls_persec	Number of command calls per second	count/s
memstore_keyspace_hitrate_percent	Keyspace hit rate percentage	%
memstore_lru_clock	Clock value used for LRU (Least Recently Used) management	count
memstore_blocked_clients	Number of clients waiting due to BLPOP, BRPOP, etc.	count
memstore_cluster_connections	Number of sockets used by the cluster bus	count
memstore_allocator_active	Active memory in allocator, including external fragmentation	bytes(IEC)
memstore_allocator_allocated	Allocated memory in allocator, including internal fragmentation	bytes(IEC)
memstore_allocator_resident	Resident memory managed by allocator, including OS-returnable memory	bytes(IEC)
memstore_allocator_frag_bytes	Difference between active memory and allocated memory	bytes(IEC)
memstore_allocator_frag_ratio	Ratio between active memory and allocated memory	%
memstore_allocator_rss_ratio	Ratio between resident memory and active memory	%
memstore_lazyfree_pending_objects	Number of objects waiting to be freed due to UNLINK calls or ASYNC option	count
memstore_lazyfreed_objects	Number of objects freed via Lazy Free process	count
memstore_mem_fragmentation_bytes	Difference between used resident memory and allocated memory	bytes(IEC)
memstore_mem_fragmentation_ratio	Ratio between used resident memory and allocated memory	%
memstore_mem_not_counted_for_evict	Memory not counted for eviction due to temporary replicas and AOF buffers	bytes(IEC)
memstore_rss_overhead_bytes	Difference between MemStore process resident memory and allocator's resident memory	bytes(IEC)
memstore_rss_overhead_ratio	Ratio between MemStore process resident memory and allocator's resident memory	%
memstore_total_system_memory	Total system memory where MemStore is running	bytes(IEC)
memstore_used_memory_dataset	Memory used for actual data storage, excluding overhead memory	bytes(IEC)
memstore_used_memory_dataset_perc	Percentage of memory used for actual data storage	%
memstore_used_memory_lua	Memory used by Lua engine for script execution	bytes(IEC)
memstore_used_memory_overhead	Memory needed for managing internal data structures	bytes(IEC)
memstore_used_memory_peak	Maximum memory used by MemStore	bytes(IEC)
memstore_used_memory_peak_perc	Percentage of maximum memory used relative to total memory	%
memstore_used_memory_rss	Resident set size memory assigned by the OS	bytes(IEC)
memstore_instantaneous_input_kbps	Data read from network per second	KiB/s(IEC)
memstore_instantaneous_output_kbps	Data written to network per second	KiB/s(IEC)
memstore_io_threaded_reads_processed	Total number of read events processed by main and I/O threads	count
memstore_io_threaded_writes_processed	Total number of write events processed by main and I/O threads	count
memstore_pubsub_channels	Number of pub/sub channels subscribed by clients	count
memstore_pubsub_patterns	Number of pub/sub patterns subscribed by clients	count
memstore_total_commands_processed	Total number of commands processed by the server	count
memstore_total_connections_received	Total number of connections accepted by the server	count
memstore_total_error_replies	Total number of error replies (sum of rejected

info

The metrics cpu_credit_usage and cpu_credit_balance are only collected on clusters where the flavor is set to t1i.

Metric Name	Description	Unit
cpu_credit_usage	CPU credit usage	count
cpu_credit_balance	Remaining CPU credits	count
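As an illustration of how the MemStore counters relate to the derived percentages, the sketch below computes a keyspace hit rate from memstore_keyspace_hits and memstore_keyspace_misses. The function name is hypothetical, and the formula is the conventional hits / (hits + misses) definition, assumed here rather than confirmed as the calculation behind memstore_keyspace_hitrate_percent.

```python
def keyspace_hit_rate_percent(hits: int, misses: int) -> float:
    """Conventional cache hit rate: hits / (hits + misses) as a percentage."""
    total = hits + misses
    if total == 0:
        return 0.0  # no lookups yet, so no meaningful rate
    return 100.0 * hits / total


# Hypothetical counter values.
print(keyspace_hit_rate_percent(hits=900, misses=100))
```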

Pub/Sub Metrics

Service Scope

Monitoring - Custom Dashboard, Metric Explorer, Alert Center - Alert Policies (Metric-based)

Metric Name	Description	Unit
pubsub_published_message_count_persec	Messages published per second	count/s
pubsub_published_message_bytes_persec	Bytes of messages published per second	bytes/s(IEC)
pubsub_publish_request_count_persec	Publish requests per second	count/s
pubsub_topic_storage_used_bytes	Topic storage used	bytes(IEC)
pubsub_seek_request_count_permin	Seek requests per 5 minutes	count
pubsub_ack_request_count_persec	Acknowledgment requests per second	count/s
pubsub_acked_message_count_persec	Acknowledged messages per second	count/s
pubsub_unprocessed_messages	Number of unprocessed messages	count
pubsub_pulled_message_count_persec	Messages pulled per second	count/s
pubsub_streaming_pull_response_count_persec	Streaming pull responses per second	count/s
pubsub_push_count_persec	Push requests per second	count/s
pubsub_pushed_message_count_persec	Pushed messages per second	count/s
pubsub_subscription_storage_used_bytes	Subscription storage used	bytes(IEC)
pubsub_exported_message_count_persec	Messages exported to Object Storage per second	count/s
pubsub_object_storage_api_call_count_permin	Object Storage API calls per minute	count/m

Direct Connect Metrics

Service Scope

Monitoring - Metric Export

Metric Name	Description	Unit
dx_virtual_interface_state	Direct Connect virtual interface state	count
dx_virtual_interface_output_packets_persec	Packets transmitted per second by Direct Connect virtual interface	packets/s
dx_virtual_interface_output_bits_persec	Bits transmitted per second by Direct Connect virtual interface	bits/s(IEC)
dx_virtual_interface_input_packets_persec	Packets received per second by Direct Connect virtual interface	packets/s
dx_virtual_interface_input_bits_persec	Bits received per second by Direct Connect virtual interface	bits/s(IEC)

Gateway Load Balancer Metrics

Service Scope

Monitoring - Metric Export

Metric Name	Description	Unit
gwlb_bytes_in_persec	Bytes received per second by the Gateway Load Balancer	bytes/s(IEC)
gwlb_bytes_out_persec	Bytes sent per second by the Gateway Load Balancer	bytes/s(IEC)
gwlb_current_connections	Number of currently active connections	count
gwlb_healthy_host_count	Number of healthy hosts	count
gwlb_unhealthy_host_count	Number of unhealthy hosts	count
eps_bytes_in_persec	Bytes received per second by the endpoint service	bytes/s(IEC)
eps_bytes_out_persec	Bytes sent per second by the endpoint service	bytes/s(IEC)
eps_current_connections	Number of currently active connections in the endpoint service	count
eps_endpoint_count	Number of connected endpoints	count
ep_bytes_in_persec	Bytes received per second by the endpoint	bytes/s(IEC)
ep_bytes_out_persec	Bytes sent per second by the endpoint	bytes/s(IEC)
ep_current_connections	Number of currently active connections at the endpoint	count

Private Endpoint Metrics

Service Scope

Monitoring - Metric Export

Metric Name	Description	Unit
ep_bytes_in_persec	Bytes received per second by the endpoint	bytes/s(IEC)
ep_bytes_out_persec	Bytes sent per second by the endpoint	bytes/s(IEC)
ep_current_connections	Number of currently active connections at the endpoint	count
