Key Concepts

KakaoCloud’s Monitoring service supports rapid issue detection and response by monitoring the status of computing resources and sending notifications when events occur. Users can monitor key resources in real time from a web dashboard and build a systematic monitoring setup by defining metric and log policies. This makes resource management flexible and efficient and reduces administrative overhead.

Monitoring service system architecture

The Monitoring service lets users configure multiple policies for resource operations and management, so that specific data can be collected as needed. When a failure occurs, users can review the monitoring history from the resulting notifications and identify the issue quickly.

Figure: Monitoring service architecture
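
The policy-and-notification flow described above can be sketched roughly as follows. This is only an illustration in Python, not the service's API: the `MetricPolicy` class, its fields, and the `should_notify` function are hypothetical names, and the "N consecutive samples over a threshold" rule is an assumed evaluation strategy that the source does not specify.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Sequence


# Hypothetical policy shape for illustration; not the KakaoCloud policy schema.
@dataclass(frozen=True)
class MetricPolicy:
    metric: str                 # metric name, e.g. "cpu_usage"
    threshold: float            # value each sample is compared against
    compare: Callable[[float, float], bool] = operator.gt
    consecutive: int = 3        # samples in a row that must breach the threshold


def should_notify(policy: MetricPolicy, samples: Sequence[float]) -> bool:
    """Return True when the most recent `consecutive` samples all breach the threshold."""
    if policy.consecutive <= 0 or len(samples) < policy.consecutive:
        return False
    recent = samples[-policy.consecutive:]
    return all(policy.compare(value, policy.threshold) for value in recent)


cpu_policy = MetricPolicy(metric="cpu_usage", threshold=90.0, consecutive=3)
print(should_notify(cpu_policy, [42.0, 91.5, 95.0, 97.2]))  # True
print(should_notify(cpu_policy, [91.5, 95.0, 60.0, 97.2]))  # False
```

Requiring several consecutive breaches is one common way to avoid notifications on short spikes; a real policy may use different conditions.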

Key Concepts

Dashboard

The Monitoring service dashboard provides real-time monitoring of key resources. The types of dashboards available are as follows:

| Type | Description |
| --- | --- |
| Default dashboard | The default dashboard provided by KakaoCloud, which lets users view metrics for in-use resources without additional configuration. Users cannot modify the default dashboard and can only view the provided metrics. |
| Custom dashboard | A user-created dashboard to which charts for the desired service metrics can be added. See Monitoring Metrics for supported metrics. |

Info: The KakaoCloud monitoring agent must be installed to view metrics. For installation instructions, refer to Install Agent.

Monitoring supported services

| Category | Supported services |
| --- | --- |
| Beyond Compute Service | Virtual Machine, Bare Metal Server, GPU |
| Beyond Networking Service | Load Balancing |
| Container Pack | Kubernetes Engine (default) |
| Data Store | MySQL, PostgreSQL, MemStore |
| Analytics | Pub/Sub |

Monitoring metrics

Key computing metrics

Info: The mem_buffered, mem_cached, and disk_inodes_usage metrics are collected and provided only for servers running a Linux OS. The nvidia_smi metrics are collected only for servers equipped with a GPU.

Info: When updating the NVIDIA library on a GPU instance, make sure the library version is compatible with the CUDA version. If the update is performed with apt upgrade without checking compatibility, the monitoring agent installed by the user may fail to collect NVIDIA-related metrics.
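
One way to check, after a driver or library update, that GPU readings are still available is to query `nvidia-smi` directly. The sketch below is an assumption-laden illustration: the source does not say how the agent collects the nvidia_smi metrics, and the mapping of the `nvidia-smi` query fields to the metric names in this document is a guess. The function names are hypothetical.

```python
import csv
import io
import subprocess
from typing import Optional

# nvidia-smi query fields; their mapping to the nvidia_smi_* metrics is an assumption.
FIELDS = ["memory.free", "memory.total", "memory.used", "power.draw", "utilization.gpu"]


def to_float(raw: str) -> Optional[float]:
    """Convert one CSV cell to float; non-numeric values such as "[N/A]" become None."""
    try:
        return float(raw.strip())
    except ValueError:
        return None


def parse_gpu_csv(text: str) -> list[dict[str, Optional[float]]]:
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = csv.reader(io.StringIO(text))
    return [dict(zip(FIELDS, map(to_float, row))) for row in rows if row]


def query_gpus() -> list[dict[str, Optional[float]]]:
    """Run nvidia-smi; raises if the binary is missing or the command fails."""
    cmd = ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}", "--format=csv,noheader,nounits"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=30)
    return parse_gpu_csv(result.stdout)


# Parsing a sample of the expected output shape (values are made up).
sample = "1024, 16384, 15360, 45.32, 12\n2048, 16384, 14336, [N/A], 0\n"
for gpu in parse_gpu_csv(sample):
    print(gpu)
```

If `query_gpus()` raises `FileNotFoundError` or `subprocess.CalledProcessError` after an update, that is a hint that the driver and library versions may no longer match.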

| Metric name | Description | Unit |
| --- | --- | --- |
| cpu_usage | Overall CPU usage | % |
| cpu_usage_iowait | CPU usage rate in the iowait state | % |
| cpu_usage_system | CPU usage rate in the system state | % |
| cpu_usage_user | CPU usage rate in the user state | % |
| cpu_usage_per_core | CPU usage per core | % |
| mem_buffered | Memory usage in the buffered state | bytes (IEC) |
| mem_cached | Memory usage in the cached state | bytes (IEC) |
| mem_used | Memory usage | bytes (IEC) |
| mem_usage | Memory usage rate | % |
| disk_used | Disk usage | bytes (IEC) |
| disk_used_percent | Disk usage rate | % |
| disk_inodes_usage | Disk inode usage rate | % |
| disk_read_bytes_persec | Bytes read per second from disk | bytes/s (IEC) |
| disk_write_bytes_persec | Bytes written per second to disk | bytes/s (IEC) |
| disk_read_iops | Read operations completed per second on disk | count/s |
| disk_write_iops | Write operations completed per second on disk | count/s |
| network_rx_bytes_persec | Bytes received per second on network interface | bytes/s (IEC) |
| network_tx_bytes_persec | Bytes sent per second on network interface | bytes/s (IEC) |
| network_rx_packets_persec | Packets received per second on network interface | packets/s |
| network_tx_packets_persec | Packets sent per second on network interface | packets/s |
| nvidia_smi_memory_free | Free memory per GPU core | MiB (IEC) |
| nvidia_smi_memory_total | Total memory per GPU core | MiB (IEC) |
| nvidia_smi_memory_used | Used memory per GPU core | MiB (IEC) |
| nvidia_smi_power_draw | Power consumption per GPU core | watt |
| nvidia_smi_utilization_gpu | GPU utilization per core | % |
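
Several metrics above are reported in IEC units, where prefixes are powers of 1024 (1 KiB = 1024 bytes). As a minimal sketch of how such raw values might be made readable, the helpers below convert byte counts to IEC strings. The function names are hypothetical and not part of the Monitoring service.

```python
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]


def format_iec(num_bytes: float, precision: int = 1) -> str:
    """Render a byte count with binary (IEC) prefixes, where 1 KiB = 1024 B."""
    value = float(num_bytes)
    for unit in IEC_UNITS[:-1]:
        if abs(value) < 1024.0:
            return f"{value:.{precision}f} {unit}"
        value /= 1024.0
    # Anything larger than the second-to-last unit is expressed in the last one.
    return f"{value:.{precision}f} {IEC_UNITS[-1]}"


def mib_to_bytes(mib: float) -> int:
    """Convert a MiB reading (the unit of the nvidia_smi_memory_* metrics) to bytes."""
    return int(mib * 1024 * 1024)


print(format_iec(1536))                 # 1.5 KiB
print(format_iec(8 * 1024**3))          # 8.0 GiB
print(format_iec(mib_to_bytes(16384)))  # 16.0 GiB
```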

Key Load Balancing metrics

| Metric name | Description | Unit |
| --- | --- | --- |
| lb_bytes_in_persec | Inbound traffic | bytes/s (IEC) |
| lb_bytes_out_persec | Outbound traffic | bytes/s (IEC) |
| lb_connections_persec | Connections per second | count/s |
| lb_current_connections | Active connections | count |

Key MySQL metrics

| Metric name | Description | Unit |
| --- | --- | --- |
| mem_swap_total | Total swap memory | bytes (IEC) |
| mem_swap_cached | Cached swap memory | bytes (IEC) |
| mem_swap_free | Free swap memory | bytes (IEC) |
| mysql_logstorage_disk_write_bytes_persec | Bytes written per second to log storage disk | bytes/s (IEC) |
| mysql_defaultstorage_disk_write_bytes_persec | Bytes written per second to default storage disk | bytes/s (IEC) |
| mysql_logstorage_disk_read_bytes_persec | Bytes read per second from log storage disk | bytes/s (IEC) |
| mysql_defaultstorage_disk_read_bytes_persec | Bytes read per second from default storage disk | bytes/s (IEC) |
| mysql_logstorage_disk_write_iops | Write operations completed per second on log storage disk | count/s |
| mysql_defaultstorage_disk_write_iops | Write operations completed per second on default storage disk | count/s |
| mysql_logstorage_disk_read_iops | Read operations completed per second on log storage disk | count/s |
| mysql_defaultstorage_disk_read_iops | Read operations completed per second on default storage disk | count/s |
| mysql_logstorage_disk_used | Log storage disk usage | bytes (IEC) |
| mysql_defaultstorage_disk_used | Default storage disk usage | bytes (IEC) |
| mysql_defaultstorage_disk_used_percent | Default storage disk usage rate | % |
| mysql_logstorage_disk_used_percent | Log storage disk usage rate | % |
| mysql_logstorage_disk_inodes_usage | Log storage inode usage rate | % |
| mysql_defaultstorage_disk_inodes_usage | Default storage inode usage rate | % |
| mysql_network_rx_bytes_persec | Bytes received per second on network interface | bytes/s (IEC) |
| mysql_network_tx_bytes_persec | Bytes sent per second on network interface | bytes/s (IEC) |
| mysql_network_rx_packets_persec | Packets received per second on network interface | packets/s |
| mysql_network_tx_packets_persec | Packets sent per second on network interface | packets/s |
| mysql_innodb_row_lock_current_waits | Number of row locks currently being waited for | count |
| mysql_binary_size_bytes | Binary log size | bytes (IEC) |
| mysql_binary_files_count | Binary log file count | count |
| mysql_variables_max_binlog_size | Maximum binary log size | bytes (IEC) |
| mysql_connections_count | Active connections count | count |
| mysql_slow_query_count | Number of slow queries executed in 5 minutes | count |
| mysql_com_insert_count | Number of INSERT queries executed in 5 minutes | count |
| mysql_com_select_count | Number of SELECT queries executed in 5 minutes | count |
| mysql_com_delete_count | Number of DELETE queries executed in 5 minutes | count |
| mysql_com_commit_count | Number of COMMIT queries executed in 5 minutes | count |
| mysql_com_update_count | Number of UPDATE queries executed in 5 minutes | count |
| mysql_query_persec | Queries per second (QPS) | count/s |
| mysql_connection_usage_percent | Ratio of active connections to max connections | % |
| mysql_innodb_buffer_pool_read_requests | Logical read requests to the buffer pool | count |
| mysql_innodb_row_lock_time | Row lock time | milliseconds |
| mysql_innodb_buffer_pool_reads | Logical reads that could not be served from the buffer pool and were read from disk | count |
| mysql_innodb_buffer_cache_hit_ratio | MySQL InnoDB buffer pool cache hit rate | % |
| mysql_uptime | Uptime | duration |
| mysql_instance_status | Instance status | count |
| mysql_instance_group_status | Instance group status | count |
| mysql_replication_lag | Binlog replication delay | seconds |
| mysql_max_connections_count | Maximum number of connections allowed | count |
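
Some of the MySQL metrics above are ratios of other metrics in the same table. The source does not state the exact formulas the service uses, so the sketch below relies on commonly used definitions and should be read as an assumption. The function names are hypothetical.

```python
def buffer_cache_hit_ratio(read_requests: int, disk_reads: int) -> float:
    """Percent of logical reads served from the buffer pool (assumed formula).

    read_requests ~ mysql_innodb_buffer_pool_read_requests
    disk_reads    ~ mysql_innodb_buffer_pool_reads
    """
    if read_requests <= 0:
        return 0.0
    return max(0.0, (read_requests - disk_reads) * 100.0 / read_requests)


def connection_usage_percent(active: int, max_connections: int) -> float:
    """Active connections as a percentage of the allowed maximum."""
    if max_connections <= 0:
        return 0.0
    return active * 100.0 / max_connections


def per_second(count_in_window: int, window_seconds: int = 300) -> float:
    """Turn a 5-minute count (e.g. mysql_com_select_count) into an average rate."""
    return count_in_window / window_seconds


print(buffer_cache_hit_ratio(1_000_000, 25_000))  # 97.5
print(connection_usage_percent(150, 600))         # 25.0
print(per_second(600))                            # 2.0
```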

Key PostgreSQL metrics

| Metric name | Description | Unit |
| --- | --- | --- |
| pg_active_connections | Number of active PostgreSQL connections | count |
| pg_active_transactions | Number of active PostgreSQL transactions | count |
| pg_buffer_hit_ratio | PostgreSQL buffer hit ratio | % |
| pg_defaultstorage_disk_inodes_usage | Default storage inode usage rate | % |
| pg_defaultstorage_disk_read_bytes_persec | Bytes read per second from default storage disk | bytes/s (IEC) |
| pg_defaultstorage_disk_read_iops | Read operations completed per second on default storage disk | count/s |
| pg_defaultstorage_disk_used | Default storage disk usage | bytes (IEC) |
| pg_defaultstorage_disk_used_percent | Default storage disk usage rate | % |
| pg_defaultstorage_disk_write_bytes_persec | Bytes written per second to default storage disk | bytes/s (IEC) |
| pg_defaultstorage_disk_write_iops | Write operations completed per second on default storage disk | count/s |
| pg_lock_sessions | Number of PostgreSQL lock sessions | count |
| pg_logstorage_disk_inodes_usage | Log storage inode usage rate | % |
| pg_logstorage_disk_read_bytes_persec | Bytes read per second from log storage disk | bytes/s (IEC) |
| pg_logstorage_disk_read_iops | Read operations completed per second on log storage disk | count/s |
| pg_logstorage_disk_used | Log storage disk usage | bytes (IEC) |
| pg_logstorage_disk_used_percent | Log storage disk usage rate | % |
| pg_logstorage_disk_write_bytes_persec | Bytes written per second to log storage disk | bytes/s (IEC) |
| pg_logstorage_disk_write_iops | Write operations completed per second on log storage disk | count/s |
| pg_network_rx_bytes_persec | Bytes received per second on network interface | bytes/s (IEC) |
| pg_network_rx_packets_persec | Packets received per second on network interface | packets/s |
| pg_network_tx_bytes_persec | Bytes sent per second on network interface | bytes/s (IEC) |
| pg_network_tx_packets_persec | Packets sent per second on network interface | packets/s |
| pg_replication_lag | PostgreSQL replication lag time | seconds |
| pg_temp_file_ratio_per_group | PostgreSQL temporary file usage ratio per instance group | % |
| pg_total_connections | Number of PostgreSQL connections | count |
| pg_total_deadlocks | Number of PostgreSQL deadlocks | count |
| pg_xid_age_per_group | PostgreSQL vacuum XID per instance group | count |

Key MemStore metrics

| Metric name | Description | Unit |
| --- | --- | --- |
| memstore_allocator_rss_bytes | RSS memory size | bytes (IEC) |
| memstore_clients | Number of connected clients | count |
| memstore_connected_slaves | Number of connected replicas | count |
| memstore_evicted_keys | Number of keys evicted due to maxmemory limit | count |
| memstore_expired_keys | Number of expired keys | count |
| memstore_instantaneous_ops_per_sec | Commands processed per second | count |
| memstore_client_ratio | Ratio of current clients to max clients | % |
| memstore_memory_usage | Memory usage rate of the MemStore instance | % |
| memstore_keyspace_hits | Number of key hits | count |
| memstore_keyspace_misses | Number of key misses | count |
| memstore_maxclients | Maximum number of connections allowed | count |
| memstore_maxmemory | Maximum usable memory | bytes (IEC) |
| memstore_replication_lag | Replication delay time | s |
| memstore_uptime | Uptime | s |
| memstore_used_memory | Used memory in MemStore | bytes (IEC) |
| memstore_cmdstat_calls_persec | Command calls per second | count/s |
| memstore_keyspace_hitrate_percent | Key hit rate | % |
| memstore_lru_clock | Incrementing clock value used for LRU (Least Recently Used) management | count |
| memstore_blocked_clients | Number of clients waiting on the blocking commands BLPOP, BRPOP, BRPOPLPUSH, BLMOVE, BZPOPMIN, and BZPOPMAX | count |
| memstore_cluster_connections | Estimated number of sockets used by the cluster bus | count |
| memstore_allocator_active | Active memory in the allocator, including external fragmentation | bytes (IEC) |
| memstore_allocator_allocated | Memory allocated in the allocator, including internal fragmentation | bytes (IEC) |
| memstore_allocator_resident | Resident memory managed by the allocator, including memory that can be returned to the OS | bytes (IEC) |
| memstore_allocator_frag_bytes | Difference between active memory and allocated memory in the allocator | bytes (IEC) |
| memstore_allocator_frag_ratio | Ratio of active memory to allocated memory in the allocator | % |
| memstore_allocator_rss_ratio | Ratio of resident memory to active memory in the allocator | % |
| memstore_lazyfree_pending_objects | Number of objects waiting to be freed due to UNLINK calls or ASYNC options in FLUSHDB and FLUSHALL | count |
| memstore_lazyfreed_objects | Number of objects freed through the Lazy Free process | count |
| memstore_mem_fragmentation_bytes | Difference between resident memory and allocated memory in MemStore | bytes (IEC) |
| memstore_mem_fragmentation_ratio | Ratio of resident memory to allocated memory in MemStore | % |
| memstore_mem_not_counted_for_evict | Memory excluded from eviction calculations, such as temporary replicas and AOF buffers | bytes (IEC) |
| memstore_rss_overhead_bytes | Difference between the resident memory of the MemStore process and the resident memory managed by the allocator | bytes (IEC) |
| memstore_rss_overhead_ratio | Ratio of resident memory of the MemStore process to the resident memory managed by the allocator | % |
| memstore_total_system_memory | Total memory of the system running MemStore | bytes (IEC) |
| memstore_used_memory_dataset | Memory used for actual data storage, considering overhead memory | bytes (IEC) |
| memstore_used_memory_dataset_perc | Ratio of memory used for actual data storage to total memory, considering overhead | % |
| memstore_used_memory_lua | Memory used by the Lua engine for executing scripts | bytes (IEC) |
| memstore_used_memory_overhead | All overhead memory required for managing internal data structures | bytes (IEC) |
| memstore_used_memory_peak | Maximum memory used by MemStore | bytes (IEC) |
| memstore_used_memory_peak_perc | Ratio of peak memory usage to total memory usage | % |
| memstore_used_memory_rss | Memory allocated by the OS (resident set size) | bytes (IEC) |
| memstore_instantaneous_input_kbps | Data read speed from the network per second | KiB/s (IEC) |
| memstore_instantaneous_output_kbps | Data write speed to the network per second | KiB/s (IEC) |
| memstore_io_threaded_reads_processed | Total number of read events processed by main and I/O threads | count |
| memstore_io_threaded_writes_processed | Total number of write events processed by main and I/O threads | count |
| memstore_pubsub_channels | Number of pub/sub channels subscribed by clients | count |
| memstore_pubsub_patterns | Number of pub/sub patterns subscribed by clients | count |
| memstore_total_commands_processed | Total number of commands processed by the server | count |
| memstore_total_connections_received | Total number of connections accepted by the server | count |
| memstore_total_error_replies | Total number of error responses (sum of rejected and failed commands) | count |
| memstore_total_net_input_bytes | Total network input bytes | bytes (IEC) |
| memstore_total_net_output_bytes | Total network output bytes | bytes (IEC) |
| memstore_total_reads_processed | Total number of read events processed | count |
| memstore_total_writes_processed | Total number of write events processed | count |
| memstore_used_cpu_sys | System CPU used by all threads (main and background) of the server process | count |
| memstore_used_cpu_sys_main_thread | System CPU used by the main thread | count |
| memstore_used_cpu_user | User CPU used by all threads (main and background) of the server process | count |
| memstore_used_cpu_user_main_thread | User CPU used by the main thread | count |
| memstore_cluster_enabled | Cluster activation status | count |
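
The hit-rate and fragmentation metrics above are described as ratios and differences of other values in the table. The following sketch works through those relationships as the descriptions word them. The function names are hypothetical, and whether the service reports the fragmentation ratio as a plain ratio or as a percentage is not stated in the source; a plain ratio is assumed here.

```python
def keyspace_hit_rate_percent(hits: int, misses: int) -> float:
    """Key hit rate: hits as a percentage of all key lookups (assumed formula)."""
    total = hits + misses
    if total == 0:
        return 0.0
    return hits * 100.0 / total


def mem_fragmentation(resident_bytes: int, allocated_bytes: int) -> tuple[int, float]:
    """Return (difference in bytes, ratio) of resident memory to allocated memory."""
    if allocated_bytes <= 0:
        return resident_bytes, 0.0
    return resident_bytes - allocated_bytes, resident_bytes / allocated_bytes


print(keyspace_hit_rate_percent(900, 100))      # 90.0
print(mem_fragmentation(1_310_720, 1_048_576))  # (262144, 1.25)
```

A ratio well above 1 means the process holds noticeably more resident memory than it has allocated for data, which is the usual reading of a fragmentation ratio.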

Key Pub/Sub metrics

| Metric name | Description | Unit |
| --- | --- | --- |
| pubsub_published_message_count_persec | Number of messages published per second | count/s |
| pubsub_published_message_bytes_persec | Size of messages published per second | bytes/s (IEC) |
| pubsub_publish_request_count_persec | Number of publish requests per second | count/s |
| pubsub_topic_storage_used_bytes | Topic storage size | bytes (IEC) |
| pubsub_seek_request_count_permin | Number of seek requests in 5 minutes | count |
| pubsub_ack_request_count_persec | Number of acknowledgment requests per second | count/s |
| pubsub_acked_message_count_persec | Number of acknowledged messages per second | count/s |
| pubsub_unprocessed_messages | Number of unprocessed messages | count |
| pubsub_pulled_message_count_persec | Number of pulled messages per second | count/s |
| pubsub_streaming_pull_response_count_persec | Number of streaming pull responses per second | count/s |
| pubsub_push_count_persec | Number of push requests per second | count/s |
| pubsub_pushed_message_count_persec | Number of pushed messages per second | count/s |
| pubsub_subscription_storage_used_bytes | Subscription storage size | bytes (IEC) |
| pubsub_exported_message_count_persec | Number of messages exported to Object Storage per second | count/s |