Key Concepts
KakaoCloud’s Monitoring service supports rapid issue detection and response by monitoring the status of computing resources and sending notifications when events occur. Users can monitor key resources in real time from a web-based dashboard and build a systematic monitoring setup by defining metric and log policies. This makes resource management more flexible and efficient and reduces the effort needed for administration.
Monitoring service system architecture
The Monitoring service lets users configure the multiple policies needed for resource operation and management, so that specific data is collected as needed. When a failure occurs, the monitoring history can be checked through notifications, allowing the issue to be identified quickly.
Monitoring service architecture
Key Concepts
Dashboard
The Monitoring service dashboard provides real-time monitoring of key resources. The types of dashboards available are as follows:
Type | Description |
---|---|
Default dashboard | The default dashboard provided by KakaoCloud, which shows metrics for in-use resources without additional configuration. Users cannot modify the default dashboard and can only view the provided metrics. |
Custom dashboard | A user-created dashboard to which charts for the desired service metrics can be added. See Monitoring Metrics for supported metrics. |
The KakaoCloud monitoring agent must be installed to view metrics.
For installation instructions, refer to Install Agent.
Monitoring supported services
Category | Service details |
---|---|
Beyond Compute Service | Virtual Machine, Bare Metal Server, GPU |
Beyond Networking Service | Load Balancing |
Container Pack | Kubernetes Engine (default) |
Data Store | MySQL, PostgreSQL, MemStore |
Analytics | Pub/Sub |
Monitoring metrics
Key computing metrics
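The metrics below are listed in two groups: metrics common to kr-central-1 and kr-central-2 first, then metrics provided only in kr-central-2.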
The mem_buffered, mem_cached, and disk_inodes_usage metrics are collected and provided only for servers with Linux OS installed.
The nvidia_smi metric is collected only for servers equipped with a GPU.
When updating the NVIDIA library on a GPU instance, ensure compatibility between the library version and the CUDA version.
If an update is performed via apt upgrade without checking compatibility, the monitoring agent installed by the user may fail to collect NVIDIA-related metrics.
Metric name | Description | Unit |
---|---|---|
cpu_usage | Measures overall CPU usage | % |
cpu_usage_iowait | CPU usage rate, CPU state: iowait | % |
cpu_usage_system | CPU usage rate, CPU state: system | % |
cpu_usage_user | CPU usage rate, CPU state: user | % |
cpu_usage_per_core | Measures CPU usage per core | % |
mem_buffered | Memory usage, memory state: buffered | bytes(IEC) |
mem_cached | Memory usage, memory state: cached | bytes(IEC) |
mem_used | Memory usage | bytes(IEC) |
mem_usage | Memory usage rate | % |
disk_used | Disk usage | bytes(IEC) |
disk_used_percent | Disk usage rate | % |
disk_inodes_usage | Disk inode usage rate | % |
disk_read_bytes_persec | Bytes read per second from disk | bytes/s(IEC) |
disk_write_bytes_persec | Bytes written per second to disk | bytes/s(IEC) |
disk_read_iops | Read operations completed per second on disk | count/s |
disk_write_iops | Write operations completed per second on disk | count/s |
network_rx_bytes_persec | Bytes received per second on network interface | bytes/s(IEC) |
network_tx_bytes_persec | Bytes sent per second on network interface | bytes/s(IEC) |
network_rx_packets_persec | Packets received per second on network interface | packets/s |
network_tx_packets_persec | Packets sent per second on network interface | packets/s |
nvidia_smi_memory_free | Free memory per GPU core | MiB(IEC) |
nvidia_smi_memory_total | Total memory per GPU core | MiB(IEC) |
nvidia_smi_memory_used | Used memory per GPU core | MiB(IEC) |
nvidia_smi_power_draw | Power consumption per GPU core | watt |
nvidia_smi_utilization_gpu | GPU utilization per core | % |
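
Several units in the table above carry an (IEC) suffix. As a minimal sketch, assuming this suffix means that values are scaled with binary prefixes (1 KiB = 1024 bytes), the following helper formats a raw byte count for display. The function name `format_iec` and the sample values are hypothetical and are not part of the Monitoring service.

```python
# Illustrative helper (not part of the Monitoring service): format a raw
# byte count using IEC binary prefixes, where each step is a factor of 1024.
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]


def format_iec(num_bytes: float) -> str:
    """Return num_bytes as a string with an IEC binary prefix."""
    value = float(num_bytes)
    for unit in IEC_UNITS:
        # Stop at the first unit where the value fits, or at the largest unit.
        if abs(value) < 1024 or unit == IEC_UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


# Hypothetical sample readings for mem_used and disk_read_bytes_persec.
print(format_iec(3_221_225_472))   # 3.00 GiB
print(format_iec(1536) + "/s")     # 1.50 KiB/s
```

The same helper applies to any metric in this document whose unit is bytes(IEC) or bytes/s(IEC).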
The cpu_credit_usage and cpu_credit_balance metrics are collected only for t1i servers with the Burstable option applied (excluding t1i.medium.dns.default).
Metric name | Description | Unit |
---|---|---|
cpu_credit_usage | CPU credit usage | count |
cpu_credit_balance | Remaining CPU credits | count |
Key Load Balancing metrics
Metric name | Description | Unit |
---|---|---|
lb_bytes_in_persec | Inbound traffic | bytes/s(IEC) |
lb_bytes_out_persec | Outbound traffic | bytes/s(IEC) |
lb_connections_persec | Connections per second | count/s |
lb_current_connections | Active connections | count |
Key MySQL metrics
Metric name | Description | Unit |
---|---|---|
mem_swap_total | Total swap memory | bytes(IEC) |
mem_swap_cached | Cached swap memory | bytes(IEC) |
mem_swap_free | Free swap memory | bytes(IEC) |
mysql_logstorage_disk_write_bytes_persec | Bytes written per second to log storage disk | bytes/s(IEC) |
mysql_defaultstorage_disk_write_bytes_persec | Bytes written per second to default storage disk | bytes/s(IEC) |
mysql_logstorage_disk_read_bytes_persec | Bytes read per second from log storage disk | bytes/s(IEC) |
mysql_defaultstorage_disk_read_bytes_persec | Bytes read per second from default storage disk | bytes/s(IEC) |
mysql_logstorage_disk_write_iops | Write operations completed per second on log storage disk | count/s |
mysql_defaultstorage_disk_write_iops | Write operations completed per second on default storage disk | count/s |
mysql_logstorage_disk_read_iops | Read operations completed per second on log storage disk | count/s |
mysql_defaultstorage_disk_read_iops | Read operations completed per second on default storage disk | count/s |
mysql_logstorage_disk_used | Log storage disk usage | bytes(IEC) |
mysql_defaultstorage_disk_used | Default storage disk usage | bytes(IEC) |
mysql_defaultstorage_disk_used_percent | Default storage disk usage rate | % |
mysql_logstorage_disk_used_percent | Log storage disk usage rate | % |
mysql_logstorage_disk_inodes_usage | Log storage inode usage rate | % |
mysql_defaultstorage_disk_inodes_usage | Default storage inode usage rate | % |
mysql_network_rx_bytes_persec | Bytes received per second on network interface | bytes/s(IEC) |
mysql_network_tx_bytes_persec | Bytes sent per second on network interface | bytes/s(IEC) |
mysql_network_rx_packets_persec | Packets received per second on network interface | packets/s |
mysql_network_tx_packets_persec | Packets sent per second on network interface | packets/s |
mysql_innodb_row_lock_current_waits | Number of row locks currently being waited for | count |
mysql_binary_size_bytes | Binary log size | bytes(IEC) |
mysql_binary_files_count | Binary log file count | count |
mysql_variables_max_binlog_size | Maximum binary log size | bytes(IEC) |
mysql_connections_count | Active connections count | count |
mysql_slow_query_count | Number of slow queries executed in 5 minutes | count |
mysql_com_insert_count | Number of INSERT queries executed in 5 minutes | count |
mysql_com_select_count | Number of SELECT queries executed in 5 minutes | count |
mysql_com_delete_count | Number of DELETE queries executed in 5 minutes | count |
mysql_com_commit_count | Number of COMMIT queries executed in 5 minutes | count |
mysql_com_update_count | Number of UPDATE queries executed in 5 minutes | count |
mysql_query_persec | Queries per second (QPS) | count/s |
mysql_connection_usage_percent | Ratio of active connections to max connections | % |
mysql_innodb_buffer_pool_read_requests | Logical read requests to the InnoDB buffer pool | count |
mysql_innodb_row_lock_time | Row lock time | milliseconds |
mysql_innodb_buffer_pool_reads | Reads that could not be served from the InnoDB buffer pool and were read from disk | count |
mysql_innodb_buffer_cache_hit_ratio | MySQL InnoDB buffer pool cache hit rate | % |
mysql_uptime | Uptime | duration |
mysql_instance_status | Instance status | count |
mysql_instance_group_status | Instance group status | count |
mysql_replication_lag | Binlog replication delay | seconds |
mysql_max_connections_count | Maximum number of connections allowed | count |
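
Some of the percentage metrics above can be related to the raw counters in the same table. The sketch below assumes the commonly used InnoDB formula for the buffer pool hit ratio and a simple quotient for connection usage; the source does not state how the Monitoring service computes mysql_innodb_buffer_cache_hit_ratio or mysql_connection_usage_percent, so the function names and formulas here are illustrative assumptions.

```python
# Illustrative sketch: derive percentage values from raw MySQL counters.
# The formulas are assumptions and may differ from what the service uses.

def buffer_cache_hit_ratio(read_requests: int, disk_reads: int) -> float:
    """Share of logical read requests served without reading from disk (%).

    read_requests ~ mysql_innodb_buffer_pool_read_requests
    disk_reads    ~ mysql_innodb_buffer_pool_reads
    """
    if read_requests == 0:
        return 0.0
    return 100 * (read_requests - disk_reads) / read_requests


def connection_usage_percent(connections: int, max_connections: int) -> float:
    """Active connections as a percentage of the allowed maximum.

    connections     ~ mysql_connections_count
    max_connections ~ mysql_max_connections_count
    """
    if max_connections == 0:
        return 0.0
    return 100 * connections / max_connections


# Hypothetical sample readings.
print(buffer_cache_hit_ratio(10_000, 250))   # 97.5
print(connection_usage_percent(45, 150))     # 30.0
```

A hit ratio that stays low or a connection usage that approaches 100 are typical candidates for a metric policy threshold.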
Key PostgreSQL metrics
Metric name | Description | Unit |
---|---|---|
pg_active_connections | Number of active PostgreSQL connections | count |
pg_active_transactions | Number of active PostgreSQL transactions | count |
pg_buffer_hit_ratio | PostgreSQL buffer hit ratio | % |
pg_defaultstorage_disk_inodes_usage | Default storage inode usage rate | % |
pg_defaultstorage_disk_read_bytes_persec | Bytes read per second from default storage disk | bytes/s(IEC) |
pg_defaultstorage_disk_read_iops | Read operations completed per second on default storage disk | count/s |
pg_defaultstorage_disk_used | Default storage disk usage | bytes(IEC) |
pg_defaultstorage_disk_used_percent | Default storage disk usage rate | % |
pg_defaultstorage_disk_write_bytes_persec | Bytes written per second to default storage disk | bytes/s(IEC) |
pg_defaultstorage_disk_write_iops | Write operations completed per second on default storage disk | count/s |
pg_lock_sessions | Number of PostgreSQL lock sessions | count |
pg_logstorage_disk_inodes_usage | Log storage inode usage rate | % |
pg_logstorage_disk_read_bytes_persec | Bytes read per second from log storage disk | bytes/s(IEC) |
pg_logstorage_disk_read_iops | Read operations completed per second on log storage disk | count/s |
pg_logstorage_disk_used | Log storage disk usage | bytes(IEC) |
pg_logstorage_disk_used_percent | Log storage disk usage rate | % |
pg_logstorage_disk_write_bytes_persec | Bytes written per second to log storage disk | bytes/s(IEC) |
pg_logstorage_disk_write_iops | Write operations completed per second on log storage disk | count/s |
pg_network_rx_bytes_persec | Bytes received per second on network interface | bytes/s(IEC) |
pg_network_rx_packets_persec | Packets received per second on network interface | packets/s |
pg_network_tx_bytes_persec | Bytes sent per second on network interface | bytes/s(IEC) |
pg_network_tx_packets_persec | Packets sent per second on network interface | packets/s |
pg_replication_lag | PostgreSQL replication lag time | seconds |
pg_temp_file_ratio_per_group | PostgreSQL temporary file usage ratio per instance group | % |
pg_total_connections | Number of PostgreSQL connections | count |
pg_total_deadlocks | Number of PostgreSQL deadlocks | count |
pg_xid_age_per_group | PostgreSQL vacuum XID per instance group | count |
Key MemStore metrics
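The metrics below are listed in two groups: metrics common to kr-central-1 and kr-central-2 first, then metrics provided only in kr-central-2.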
Metric name | Description | Unit |
---|---|---|
memstore_allocator_rss_bytes | RSS memory size | bytes(IEC) |
memstore_clients | Number of connected clients | count |
memstore_connected_slaves | Number of connected replicas | count |
memstore_evicted_keys | Number of keys evicted due to maxmemory limit | count |
memstore_expired_keys | Number of expired keys | count |
memstore_instantaneous_ops_per_sec | Commands processed per second | count |
memstore_client_ratio | Ratio of current clients to max clients | % |
memstore_memory_usage | Memory usage rate of the MemStore instance | % |
memstore_keyspace_hits | Number of key hits | count |
memstore_keyspace_misses | Number of key misses | count |
memstore_maxclients | Maximum number of connections allowed | count |
memstore_maxmemory | Maximum usable memory | bytes(IEC) |
memstore_replication_lag | Replication delay time | s |
memstore_uptime | Uptime | s |
memstore_used_memory | Used memory in MemStore | bytes(IEC) |
memstore_cmdstat_calls_persec | Command calls per second | count/s |
memstore_keyspace_hitrate_percent | Key hit rate | % |
memstore_lru_clock | Clock value that increments over time, used for LRU (Least Recently Used) management | count |
memstore_blocked_clients | Number of clients waiting due to BLPOP, BRPOP, BRPOPLPUSH, BLMOVE, BZPOPMIN, BZPOPMAX commands | count |
memstore_cluster_connections | Estimated number of sockets used by the cluster bus | count |
memstore_allocator_active | Active memory in the allocator, including external fragmentation | bytes(IEC) |
memstore_allocator_allocated | Memory allocated in the allocator, including internal fragmentation | bytes(IEC) |
memstore_allocator_resident | Resident memory managed by the allocator, including memory that can be returned to the OS | bytes(IEC) |
memstore_allocator_frag_bytes | Difference between active memory and allocated memory in the allocator | bytes(IEC) |
memstore_allocator_frag_ratio | Ratio of active memory to allocated memory in the allocator | % |
memstore_allocator_rss_ratio | Ratio of resident memory to active memory in the allocator | % |
memstore_lazyfree_pending_objects | Number of objects waiting to be freed due to UNLINK calls or ASYNC options in FLUSHDB and FLUSHALL | count |
memstore_lazyfreed_objects | Number of objects freed through the Lazy Free process | count |
memstore_mem_fragmentation_bytes | Difference between resident memory and allocated memory in MemStore | bytes(IEC) |
memstore_mem_fragmentation_ratio | Ratio of resident memory to allocated memory in MemStore | % |
memstore_mem_not_counted_for_evict | Memory excluded from eviction calculations, such as temporary replicas and AOF buffers | bytes(IEC) |
memstore_rss_overhead_bytes | Difference between the resident memory of the MemStore process and the resident memory managed by the allocator | bytes(IEC) |
memstore_rss_overhead_ratio | Ratio of resident memory of the MemStore process to the resident memory managed by the allocator | % |
memstore_total_system_memory | Total memory of the system running MemStore | bytes(IEC) |
memstore_used_memory_dataset | Memory used for actual data storage, considering overhead memory | bytes(IEC) |
memstore_used_memory_dataset_perc | Ratio of memory used for actual data storage to total memory, considering overhead | % |
memstore_used_memory_lua | Memory used by the Lua engine for executing scripts | bytes(IEC) |
memstore_used_memory_overhead | All overhead memory required for managing internal data structures | bytes(IEC) |
memstore_used_memory_peak | Maximum memory used by MemStore | bytes(IEC) |
memstore_used_memory_peak_perc | Ratio of peak memory usage to total memory usage | % |
memstore_used_memory_rss | Memory allocated by the OS (resident set size) | bytes(IEC) |
memstore_instantaneous_input_kbps | Data read from the network per second | KiB/s(IEC) |
memstore_instantaneous_output_kbps | Data written to the network per second | KiB/s(IEC) |
memstore_io_threaded_reads_processed | Total number of read events processed by main and I/O threads | count |
memstore_io_threaded_writes_processed | Total number of write events processed by main and I/O threads | count |
memstore_pubsub_channels | Number of pub/sub channels subscribed by clients | count |
memstore_pubsub_patterns | Number of pub/sub patterns subscribed by clients | count |
memstore_total_commands_processed | Total number of commands processed by the server | count |
memstore_total_connections_received | Total number of connections accepted by the server | count |
memstore_total_error_replies | Total number of error responses (sum of rejected and failed commands) | count |
memstore_total_net_input_bytes | Total network input bytes | bytes(IEC) |
memstore_total_net_output_bytes | Total network output bytes | bytes(IEC) |
memstore_total_reads_processed | Total number of read events processed | count |
memstore_total_writes_processed | Total number of write events processed | count |
memstore_used_cpu_sys | System CPU used by all threads (main and background) of the server process | count |
memstore_used_cpu_sys_main_thread | System CPU used by the main thread | count |
memstore_used_cpu_user | User CPU used by all threads (main and background) of the server process | count |
memstore_used_cpu_user_main_thread | User CPU used by the main thread | count |
memstore_cluster_enabled | Cluster activation status | count |
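
Several MemStore metrics above are described as ratios or differences of other metrics in the same table. The sketch below shows how those relationships could be computed from raw values. The hit rate formula (hits divided by hits plus misses) is an assumption, since the table describes memstore_keyspace_hitrate_percent only as "Key hit rate"; all function names and sample values are hypothetical.

```python
# Illustrative sketch: relationships between MemStore metrics as described
# in the table. Function names and sample values are hypothetical.

def keyspace_hit_rate(hits: int, misses: int) -> float:
    """Key hit rate in percent (assumed formula: hits / (hits + misses))."""
    total = hits + misses
    return 100 * hits / total if total else 0.0


def client_ratio(clients: int, maxclients: int) -> float:
    """Current clients as a percentage of the maximum allowed clients."""
    return 100 * clients / maxclients if maxclients else 0.0


def allocator_fragmentation(active: int, allocated: int) -> tuple[int, float]:
    """Return (frag_bytes, frag_ratio) for the allocator.

    frag_bytes is active minus allocated; frag_ratio is active divided by
    allocated, returned here as a plain ratio rather than a percentage.
    """
    frag_bytes = active - allocated
    frag_ratio = active / allocated if allocated else 0.0
    return frag_bytes, frag_ratio


# Hypothetical sample readings.
print(keyspace_hit_rate(900, 100))                    # 90.0
print(client_ratio(50, 10_000))                       # 0.5
print(allocator_fragmentation(1_200_000, 1_000_000))  # (200000, 1.2)
```

The table lists the ratio metrics with the unit %, so the service may report the fragmentation ratio on a different scale than the plain quotient returned here.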
The cpu_credit_usage and cpu_credit_balance metrics are collected only for clusters with the t1i flavor.
Metric name | Description | Unit |
---|---|---|
cpu_credit_usage | CPU credit usage | count |
cpu_credit_balance | Remaining CPU credits | count |
Key Pub/Sub metrics
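The metrics below are listed in two groups: metrics common to kr-central-1 and kr-central-2 first, then metrics provided only in kr-central-2.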
Metric name | Description | Unit |
---|---|---|
pubsub_published_message_count_persec | Number of messages published per second | count/s |
pubsub_published_message_bytes_persec | Size of messages published per second | bytes/s(IEC) |
pubsub_publish_request_count_persec | Number of publish requests per second | count/s |
pubsub_topic_storage_used_bytes | Topic storage size | bytes(IEC) |
pubsub_seek_request_count_permin | Number of seek requests in 5 minutes | count |
pubsub_ack_request_count_persec | Number of acknowledgment requests per second | count/s |
pubsub_acked_message_count_persec | Number of acknowledged messages per second | count/s |
pubsub_unprocessed_messages | Number of unprocessed messages | count |
pubsub_pulled_message_count_persec | Number of pulled messages per second | count/s |
pubsub_streaming_pull_response_count_persec | Number of streaming pull responses per second | count/s |
pubsub_push_count_persec | Number of push requests per second | count/s |
pubsub_pushed_message_count_persec | Number of pushed messages per second | count/s |
pubsub_subscription_storage_used_bytes | Subscription storage size | bytes(IEC) |
pubsub_exported_message_count_persec | Number of messages exported to Object Storage per second | count/s |
Metric name | Description | Unit |
---|---|---|
pubsub_object_storage_api_call_count_permin | Number of Object Storage API calls per minute | count/m |