Key Concepts
KakaoCloud’s Monitoring service monitors the status of computing resources and sends notifications when events occur, supporting rapid issue detection and response. Users can monitor key resources in real time from a web-based dashboard and build a systematic monitoring setup by defining metric and log policies. This makes resource management flexible and efficient and reduces the effort needed for administration.
Monitoring service system architecture
The Monitoring service lets users configure the multiple policies needed for resource operations and management, so that specific data is collected as needed. When a failure occurs, users can review the monitoring history through notifications and identify the issue quickly.
Key Concepts
Dashboard
The Monitoring service dashboard provides real-time monitoring of key resources. The types of dashboards available are as follows:
Type | Description |
---|---|
Default dashboard | The dashboard provided by KakaoCloud. It shows metrics for in-use resources without additional configuration. Users cannot modify the default dashboard and can only view the provided metrics. |
Custom dashboard | A user-created dashboard to which charts for the desired service metrics can be added for management. See Monitoring Metrics for supported metrics. |
The KakaoCloud monitoring agent must be installed to view metrics.
For installation instructions, refer to Install Agent.
Monitoring supported services
Category | Service details |
---|---|
Monitoring supported services | Beyond Compute Service (Virtual Machine, Bare Metal Server, GPU); Kubernetes Engine (default); MySQL; MemStore; Load Balancing |
Monitoring metrics
Key BCS metrics
- kr-central-1 & 2
- kr-central-2
Metrics such as mem_buffered, mem_cached, and disk_inodes_usage are collected and available only on servers running a Linux OS.
The nvidia_smi metrics are collected only on servers with a GPU installed.
When updating NVIDIA libraries on a GPU instance, check that the library version is compatible with the CUDA version.
If an update (for example, through apt upgrade) leaves the versions incompatible, the monitoring agent installed by the user may not collect NVIDIA-related metrics.
Metric name | Description | Unit |
---|---|---|
cpu_usage | Measures total CPU usage | % |
cpu_usage_iowait | CPU usage percentage in iowait state | % |
cpu_usage_system | CPU usage percentage in system state | % |
cpu_usage_user | CPU usage percentage in user state | % |
cpu_usage_per_core | Measures CPU usage per core | % |
mem_buffered | Memory usage in buffered state | bytes(IEC) |
mem_cached | Memory usage in cached state | bytes(IEC) |
mem_used | Memory usage | bytes(IEC) |
mem_usage | Memory usage percentage | % |
disk_used | Disk usage | bytes(IEC) |
disk_used_percent | Disk usage percentage | % |
disk_inodes_usage | Disk inode usage percentage | % |
disk_read_bytes_persec | Bytes read per second from disk | bytes/s(IEC) |
disk_write_bytes_persec | Bytes written per second to disk | bytes/s(IEC) |
disk_read_iops | Completed read operations per second on disk | count/s |
disk_write_iops | Completed write operations per second on disk | count/s |
network_rx_bytes_persec | Bytes received per second on network interface | bytes/s(IEC) |
network_tx_bytes_persec | Bytes sent per second on network interface | bytes/s(IEC) |
network_rx_packets_persec | Packets received per second on network interface | packets/s |
network_tx_packets_persec | Packets sent per second on network interface | packets/s |
nvidia_smi_memory_free | Free memory per GPU core | MiB(IEC) |
nvidia_smi_memory_total | Total memory per GPU core | MiB(IEC) |
nvidia_smi_memory_used | Used memory per GPU core | MiB(IEC) |
nvidia_smi_power_draw | Power consumption per GPU core | watt |
nvidia_smi_utilization_gpu | GPU core utilization rate | % |
The cpu_credit_usage and cpu_credit_balance metrics are collected only for t1i servers with the Burstable option (excluding t1i.medium.dns.default).
Metric name | Description | Unit |
---|---|---|
cpu_credit_usage | CPU credit usage | count |
cpu_credit_balance | Remaining CPU credits | count |
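Several units in the tables above are marked bytes(IEC), meaning 1024-based prefixes (KiB, MiB, GiB). As a minimal sketch, the helper below shows how a raw value in that unit could be rendered for display. The function name `format_iec` and the sample values are illustrative assumptions, not part of the Monitoring service.

```python
# Hypothetical helper for displaying values reported in bytes(IEC).
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]


def format_iec(num_bytes: float) -> str:
    """Render a byte count using 1024-based (IEC) prefixes."""
    value = float(num_bytes)
    for unit in IEC_UNITS:
        # Stop at the first unit where the value fits, or at the largest unit.
        if abs(value) < 1024 or unit == IEC_UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


if __name__ == "__main__":
    # Made-up sample readings for mem_used and disk_used.
    print("mem_used :", format_iec(3 * 1024**3))
    print("disk_used:", format_iec(1536))
```

The same conversion applies to rate units such as bytes/s(IEC): format the number, then append "/s".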
Key MemStore metrics
- kr-central-1 & 2
Metric name | Description | Unit |
---|---|---|
memstore_allocator_rss_bytes | RSS memory size | bytes(IEC) |
memstore_clients | Number of connected clients | count |
memstore_connected_slaves | Number of connected replicas | count |
memstore_evicted_keys | Number of keys removed due to maxmemory limit | count |
memstore_expired_keys | Number of expired keys | count |
memstore_instantaneous_ops_per_sec | Commands processed per second | count |
memstore_client_ratio | Ratio of current clients to max clients | % |
memstore_memory_usage | Memory usage by MemStore instance | % |
memstore_keyspace_hits | Number of key hits | count |
memstore_keyspace_misses | Number of key misses | count |
memstore_maxclients | Maximum number of connections allowed | count |
memstore_maxmemory | Maximum memory available | bytes(IEC) |
memstore_replication_lag | Replication lag time | s |
memstore_uptime | Uptime | s |
memstore_used_memory | Memory used by MemStore | bytes(IEC) |
memstore_cmdstat_calls_persec | Command calls per second | count/s |
memstore_keyspace_hitrate_percent | Key hit rate | % |
memstore_lru_clock | LRU (Least Recently Used) clock for tracking elapsed time | count |
memstore_blocked_clients | Number of clients waiting on BLPOP, BRPOP, BRPOPLPUSH, BLMOVE, BZPOPMIN, BZPOPMAX commands | count |
memstore_cluster_connections | Estimated number of sockets used by cluster bus | count |
memstore_allocator_active | Active memory managed by allocator, including external fragmentation | bytes(IEC) |
memstore_allocator_allocated | Allocated memory in allocator, including internal fragmentation | bytes(IEC) |
memstore_allocator_resident | Resident memory managed by allocator, including reclaimable memory | bytes(IEC) |
memstore_allocator_frag_bytes | Difference between active and allocated memory in allocator | bytes(IEC) |
memstore_allocator_frag_ratio | Ratio of active to allocated memory in allocator | % |
memstore_allocator_rss_ratio | Ratio of resident to active memory in allocator | % |
memstore_lazyfree_pending_objects | Number of objects waiting to be freed by UNLINK or ASYNC options | count |
memstore_lazyfreed_objects | Number of objects freed by lazy free process | count |
memstore_mem_fragmentation_bytes | Difference between resident and allocated memory in MemStore | bytes(IEC) |
memstore_mem_fragmentation_ratio | Ratio of resident to allocated memory in MemStore | % |
memstore_mem_not_counted_for_evict | Memory excluded from eviction calculations | bytes(IEC) |
memstore_rss_overhead_bytes | Difference between resident memory of MemStore and allocator | bytes(IEC) |
memstore_rss_overhead_ratio | Ratio of resident memory of MemStore to allocator | % |
memstore_total_system_memory | System memory available for MemStore | bytes(IEC) |
memstore_used_memory_dataset | Memory used for actual data storage, excluding overhead | bytes(IEC) |
memstore_used_memory_dataset_perc | Percentage of memory used for data storage, excluding overhead | % |
memstore_used_memory_lua | Memory used by Lua engine for script execution | bytes(IEC) |
memstore_used_memory_overhead | All overhead memory required to manage internal data structures | bytes(IEC) |
memstore_used_memory_peak | Peak memory used by MemStore | bytes(IEC) |
memstore_used_memory_peak_perc | Peak memory usage percentage | % |
memstore_used_memory_rss | Resident memory allocated by OS | bytes(IEC) |
memstore_instantaneous_input_kbps | Network input rate in KiB/s | KiB/s(IEC) |
memstore_instantaneous_output_kbps | Network output rate in KiB/s | KiB/s(IEC) |
memstore_io_threaded_reads_processed | Total read events processed by main and I/O threads | count |
memstore_io_threaded_writes_processed | Total write events processed by main and I/O threads | count |
memstore_pubsub_channels | Number of pub/sub channels with subscriptions | count |
memstore_pubsub_patterns | Number of pub/sub patterns with subscriptions | count |
memstore_total_commands_processed | Total number of commands processed by server | count |
memstore_total_connections_received | Total number of connections accepted by server | count |
memstore_total_error_replies | Total number of error responses | count |
memstore_total_net_input_bytes | Total network input bytes | bytes(IEC) |
memstore_total_net_output_bytes | Total network output bytes | bytes(IEC) |
memstore_total_reads_processed | Total number of read events processed | count |
memstore_total_writes_processed | Total number of write events processed | count |
memstore_used_cpu_sys | System CPU used by all threads in server process | count |
memstore_used_cpu_sys_main_thread | System CPU used by main thread | count |
memstore_used_cpu_user | User CPU used by all threads in server process | count |
memstore_used_cpu_user_main_thread | User CPU used by main thread | count |
memstore_cluster_enabled | Cluster enabled status | count |
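Some of the MemStore metrics above are described as ratios or differences of other metrics in the same table. The sketch below shows how those relationships could be checked from raw values, following the table descriptions (hit rate from hits and misses, client ratio from clients and maxclients, fragmentation bytes as active minus allocated). The exact formulas the service uses are not stated here, so treat these as assumptions; the function names and sample numbers are hypothetical.

```python
def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def keyspace_hitrate_percent(hits: int, misses: int) -> float:
    # Assumed definition: hits / (hits + misses).
    return percent(hits, hits + misses)


def client_ratio(clients: int, maxclients: int) -> float:
    # Per the table: ratio of current clients to max clients.
    return percent(clients, maxclients)


def allocator_frag_bytes(active: int, allocated: int) -> int:
    # Per the table: difference between active and allocated memory.
    return active - allocated


if __name__ == "__main__":
    # Made-up sample readings keyed by metric name.
    sample = {
        "memstore_keyspace_hits": 900,
        "memstore_keyspace_misses": 100,
        "memstore_clients": 50,
        "memstore_maxclients": 10000,
    }
    print(keyspace_hitrate_percent(sample["memstore_keyspace_hits"],
                                   sample["memstore_keyspace_misses"]))
    print(client_ratio(sample["memstore_clients"],
                       sample["memstore_maxclients"]))
```

Guarding against a zero denominator matters for a freshly started instance, where hits and misses are both zero.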
Key MySQL metrics
Metric name | Description | Unit |
---|---|---|
mem_swap_total | Total swap memory | bytes(IEC) |
mem_swap_cached | Cached swap memory | bytes(IEC) |
mem_swap_free | Free swap memory | bytes(IEC) |
mysql_logstorage_disk_write_bytes_persec | Bytes written per second to log storage disk | bytes/s(IEC) |
mysql_defaultstorage_disk_write_bytes_persec | Bytes written per second to default storage disk | bytes/s(IEC) |
mysql_logstorage_disk_read_bytes_persec | Bytes read per second from log storage disk | bytes/s(IEC) |
mysql_defaultstorage_disk_read_bytes_persec | Bytes read per second from default storage disk | bytes/s(IEC) |
mysql_logstorage_disk_write_iops | Write operations per second on log storage disk | count/s |
mysql_defaultstorage_disk_write_iops | Write operations per second on default storage disk | count/s |
mysql_logstorage_disk_read_iops | Read operations per second on log storage disk | count/s |
mysql_defaultstorage_disk_read_iops | Read operations per second on default storage disk | count/s |
mysql_logstorage_disk_used | Log storage disk usage | bytes(IEC) |
mysql_defaultstorage_disk_used | Default storage disk usage | bytes(IEC) |
mysql_defaultstorage_disk_used_percent | Default storage disk usage percentage | % |
mysql_logstorage_disk_used_percent | Log storage disk usage percentage | % |
mysql_logstorage_disk_inodes_usage | Log storage inode usage percentage | % |
mysql_defaultstorage_disk_inodes_usage | Default storage inode usage percentage | % |
mysql_network_rx_bytes_persec | Bytes received per second on network interface | bytes/s(IEC) |
mysql_network_tx_bytes_persec | Bytes sent per second on network interface | bytes/s(IEC) |
mysql_innodb_row_lock_current_waits | Number of row locks currently being waited for | count |
mysql_binary_size_bytes | Binary log size | bytes(IEC) |
mysql_binary_files_count | Number of binary log files | count |
mysql_variables_max_binlog_size | Maximum binary log size | bytes(IEC) |
mysql_connections_count | Number of connections | count |
mysql_slow_query_count | Number of slow queries in last 5 minutes | count |
mysql_com_insert_count | Number of INSERT queries in last 5 minutes | count |
mysql_com_select_count | Number of SELECT queries in last 5 minutes | count |
mysql_com_delete_count | Number of DELETE queries in last 5 minutes | count |
mysql_com_commit_count | Number of COMMIT queries in last 5 minutes | count |
mysql_com_update_count | Number of UPDATE queries in last 5 minutes | count |
mysql_query_persec | Queries per second (QPS) | count/s |
mysql_connection_usage_percent | Ratio of current connections to the maximum allowed connections | % |
mysql_innodb_buffer_pool_read_requests | Total buffer pool read requests | count |
mysql_innodb_row_lock_time | Row lock time | milliseconds |
mysql_innodb_buffer_pool_reads | Buffer pool read requests that could not be served from the buffer pool and were read from disk | count |
mysql_innodb_buffer_cache_hit_ratio | InnoDB buffer pool cache hit ratio | % |
mysql_uptime | Uptime | duration |
mysql_instance_status | Instance status | count |
mysql_instance_group_status | Instance group status | count |
mysql_replication_lag | Binlog replication lag | seconds |
mysql_max_connections_count | Maximum number of connections allowed | count |
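Two of the percentage metrics above can be related to the raw counters in the same table. The sketch below uses the conventional InnoDB definition of buffer pool hit ratio (the share of read requests that did not require a disk read) and a simple connection usage ratio. It assumes the mysql_innodb_buffer_pool_* metrics mirror MySQL's status variables of the same name; how the service actually computes these values is not stated here, and the function names and sample numbers are hypothetical.

```python
def buffer_cache_hit_ratio(read_requests: int, disk_reads: int) -> float:
    """Percentage of buffer pool read requests served without a disk read.

    Assumed formula: (read_requests - disk_reads) / read_requests * 100.
    """
    if read_requests == 0:
        return 0.0
    return 100.0 * (read_requests - disk_reads) / read_requests


def connection_usage_percent(connections: int, max_connections: int) -> float:
    """Current connections as a percentage of the allowed maximum."""
    if max_connections == 0:
        return 0.0
    return 100.0 * connections / max_connections


if __name__ == "__main__":
    # Made-up readings for mysql_innodb_buffer_pool_read_requests,
    # mysql_innodb_buffer_pool_reads, mysql_connections_count and
    # mysql_max_connections_count.
    print(buffer_cache_hit_ratio(10_000, 50))
    print(connection_usage_percent(75, 150))
```

A hit ratio that drops while mysql_defaultstorage_disk_read_iops rises would be consistent with the buffer pool being too small for the working set.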
Key Load Balancing metrics
Metric name | Description | Unit |
---|---|---|
lb_bytes_in_persec | Inbound traffic | bytes/s(IEC) |
lb_bytes_out_persec | Outbound traffic | bytes/s(IEC) |
lb_connections_persec | Connections per second | count/s |
lb_current_connections | Current connections | count |