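Metric Reference

Prometheus metrics

The Prometheus server exposes its built-in metrics at the /metrics path, and the Grafana dashboards visualize them. You can view the raw metrics directly at http://localhost:9090/metrics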
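
The /metrics endpoint returns plain text with one sample per line. As a rough illustration, the sketch below splits such text into (name, labels, value) tuples. It assumes samples carry no trailing timestamp and keeps the label set as a raw string; `parse_metrics` is a hypothetical helper name, not part of Prometheus.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into (name, labels, value) tuples.

    Assumption: each sample line is "<series> <value>" with no timestamp.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is everything after the last space on the line.
        series, _, value = line.rpartition(" ")
        # Labels, when present, sit between "{" and "}" after the name.
        name, _, label_part = series.partition("{")
        samples.append((name, label_part.rstrip("}"), float(value)))
    return samples


sample_text = """# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 42
prometheus_http_requests_total{code="200",handler="/metrics"} 7
"""

for name, labels, value in parse_metrics(sample_text):
    print(name, labels, value)
```

In practice the text could be fetched from http://localhost:9090/metrics with any HTTP client before being passed to the function.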

Process metrics

Metric name Description
process_cpu_seconds_total Total user and system CPU time spent in seconds.
process_max_fds Maximum number of open file descriptors.
process_open_fds Number of open file descriptors.
process_resident_memory_bytes Resident memory size in bytes.
process_start_time_seconds Start time of the process since unix epoch in seconds.
process_virtual_memory_bytes Virtual memory size in bytes.
process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
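
One common use of the process metrics is to relate process_open_fds to process_max_fds. The sketch below computes that ratio; the function name and the sample values are illustrative assumptions.

```python
def fd_utilization(open_fds, max_fds):
    """Return the fraction of the file descriptor limit currently in use."""
    if max_fds <= 0:
        raise ValueError("max_fds must be positive")
    return open_fds / max_fds


# Illustrative values for process_open_fds and process_max_fds.
print(fd_utilization(256, 1024))
```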

Network metrics

Metric name Description
net_conntrack_dialer_conn_attempted_total Total number of connections attempted by the dialer of a given name.
net_conntrack_dialer_conn_closed_total Total number of connections closed which originated from the dialer of a given name.
net_conntrack_dialer_conn_established_total Total number of connections successfully established by the dialer of a given name.
net_conntrack_dialer_conn_failed_total Total number of connections that failed to dial by the dialer of a given name.
net_conntrack_listener_conn_accepted_total Total number of connections opened to the listener of a given name.
net_conntrack_listener_conn_closed_total Total number of connections closed that were made to the listener of a given name.

General metrics

Metric name Description
prometheus_api_remote_read_queries The current number of remote read queries being executed or waiting.
prometheus_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which prometheus was built.
prometheus_config_last_reload_success_timestamp_seconds Timestamp of the last successful configuration reload.
prometheus_config_last_reload_successful Whether the last configuration reload attempt was successful.
prometheus_engine_queries_concurrent_max The max number of concurrent queries.
prometheus_engine_queries The current number of queries being executed or waiting.
prometheus_engine_query_duration_seconds Query timings
prometheus_engine_query_log_enabled State of the query log.
prometheus_engine_query_log_failures_total The number of query log failures.
prometheus_http_request_duration_seconds Histogram of latencies for HTTP requests.
prometheus_http_requests_total Counter of HTTP requests.
prometheus_http_response_size_bytes Histogram of response size for HTTP requests.
prometheus_notifications_alertmanagers_discovered The number of alertmanagers discovered and active.
prometheus_notifications_dropped_total Total number of alerts dropped due to errors when sending to Alertmanager.
prometheus_notifications_queue_capacity The capacity of the alert notifications queue.
prometheus_notifications_queue_length The number of alert notifications in the queue.
prometheus_remote_storage_highest_timestamp_in_seconds Highest timestamp that has come into the remote storage via the Appender interface, in seconds since epoch.
prometheus_remote_storage_samples_in_total Samples in to remote storage, compare to samples out for queue managers.
prometheus_remote_storage_string_interner_zero_reference_releases_total The number of times release has been called for strings that are not interned.
prometheus_rule_evaluation_duration_seconds The duration for a rule to execute.
prometheus_rule_evaluation_failures_total The total number of rule evaluation failures.
prometheus_rule_evaluations_total The total number of rule evaluations.
prometheus_rule_group_duration_seconds The duration of rule group evaluations.
prometheus_rule_group_iterations_missed_total The total number of rule group evaluations missed due to slow rule group evaluation.
prometheus_rule_group_iterations_total The total number of scheduled rule group evaluations, whether executed or missed.
prometheus_template_text_expansion_failures_total The total number of template text expansion failures.
prometheus_template_text_expansions_total The total number of template text expansions.
prometheus_treecache_watcher_goroutines The current number of watcher goroutines.
prometheus_treecache_zookeeper_failures_total The total number of ZooKeeper failures.
promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.

HTTP metrics

Metric name Description
prometheus_http_request_duration_seconds Histogram of latencies for HTTP requests.
prometheus_http_requests_total Counter of HTTP requests.
prometheus_http_response_size_bytes Histogram of response size for HTTP requests.

Service discovery metrics

Metric name Description
prometheus_sd_consul_rpc_duration_seconds The duration of a Consul RPC call in seconds.
prometheus_sd_consul_rpc_failures_total The number of Consul RPC call failures.
prometheus_sd_discovered_targets Current number of discovered targets.
prometheus_sd_dns_lookup_failures_total The number of DNS-SD lookup failures.
prometheus_sd_dns_lookups_total The number of DNS-SD lookups.
prometheus_sd_failed_configs Current number of service discovery configurations that failed to load.
prometheus_sd_file_read_errors_total The number of File-SD read errors.
prometheus_sd_file_scan_duration_seconds The duration of the File-SD scan in seconds.
prometheus_sd_kubernetes_events_total The number of Kubernetes events handled.
prometheus_sd_received_updates_total Total number of update events received from the SD providers.
prometheus_sd_updates_total Total number of update events sent to the SD consumers.

Target metrics

Metric name Description
prometheus_target_metadata_cache_bytes The number of bytes that are currently used for storing metric metadata in the cache
prometheus_target_metadata_cache_entries Total number of metric metadata entries in the cache
prometheus_target_scrape_pool_reloads_failed_total Total number of failed scrape loop reloads.
prometheus_target_scrape_pool_reloads_total Total number of scrape loop reloads.
prometheus_target_scrape_pool_sync_total Total number of syncs that were executed on a scrape pool.
prometheus_target_scrape_pools_failed_total Total number of scrape pool creations that failed.
prometheus_target_scrape_pools_total Total number of scrape pool creation attempts.
prometheus_target_scrapes_cache_flush_forced_total How many times a scrape cache was flushed due to getting big while scrapes are failing.
prometheus_target_scrapes_exceeded_sample_limit_total Total number of scrapes that hit the sample limit and were rejected.
prometheus_target_scrapes_sample_duplicate_timestamp_total Total number of samples rejected due to duplicate timestamps but different values
prometheus_target_scrapes_sample_out_of_bounds_total Total number of samples rejected due to timestamp falling outside of the time bounds
prometheus_target_scrapes_sample_out_of_order_total Total number of samples rejected due to being out of the expected order
prometheus_target_sync_length_seconds Actual interval to sync the scrape pool.

TSDB metrics

Metric name Description
prometheus_tsdb_blocks_loaded Number of currently loaded data blocks
prometheus_tsdb_checkpoint_creations_failed_total Total number of checkpoint creations that failed.
prometheus_tsdb_checkpoint_creations_total Total number of checkpoint creations attempted.
prometheus_tsdb_checkpoint_deletions_failed_total Total number of checkpoint deletions that failed.
prometheus_tsdb_checkpoint_deletions_total Total number of checkpoint deletions attempted.
prometheus_tsdb_compaction_chunk_range_seconds Final time range of chunks on their first compaction
prometheus_tsdb_compaction_chunk_samples Final number of samples on their first compaction
prometheus_tsdb_compaction_chunk_size_bytes Final size of chunks on their first compaction
prometheus_tsdb_compaction_duration_seconds Duration of compaction runs
prometheus_tsdb_compaction_populating_block Set to 1 when a block is currently being written to the disk.
prometheus_tsdb_compactions_failed_total Total number of compactions that failed for the partition.
prometheus_tsdb_compactions_skipped_total Total number of skipped compactions due to disabled auto compaction.
prometheus_tsdb_compactions_total Total number of compactions that were executed for the partition.
prometheus_tsdb_compactions_triggered_total Total number of triggered compactions for the partition.
prometheus_tsdb_head_active_appenders Number of currently active appender transactions
prometheus_tsdb_head_chunks Total number of chunks in the head block.
prometheus_tsdb_head_chunks_created_total Total number of chunks created in the head
prometheus_tsdb_head_chunks_removed_total Total number of chunks removed in the head
prometheus_tsdb_head_gc_duration_seconds Runtime of garbage collection in the head block.
prometheus_tsdb_head_max_time Maximum timestamp of the head block. The unit is decided by the library consumer.
prometheus_tsdb_head_max_time_seconds Maximum timestamp of the head block.
prometheus_tsdb_head_min_time Minimum time bound of the head block. The unit is decided by the library consumer.
prometheus_tsdb_head_min_time_seconds Minimum time bound of the head block.
prometheus_tsdb_head_samples_appended_total Total number of appended samples.
prometheus_tsdb_head_series Total number of series in the head block.
prometheus_tsdb_head_series_created_total Total number of series created in the head
prometheus_tsdb_head_series_not_found_total Total number of requests for series that were not found.
prometheus_tsdb_head_series_removed_total Total number of series removed in the head
prometheus_tsdb_head_truncations_failed_total Total number of head truncations that failed.
prometheus_tsdb_head_truncations_total Total number of head truncations attempted.
prometheus_tsdb_lowest_timestamp Lowest timestamp value stored in the database. The unit is decided by the library consumer.
prometheus_tsdb_lowest_timestamp_seconds Lowest timestamp value stored in the database.
prometheus_tsdb_reloads_failures_total Number of times the database failed to reload block data from disk.
prometheus_tsdb_reloads_total Number of times the database reloaded block data from disk.
prometheus_tsdb_retention_limit_bytes Max number of bytes to be retained in the tsdb blocks, configured 0 means disabled
prometheus_tsdb_size_retentions_total The number of times that blocks were deleted because the maximum number of bytes was exceeded.
prometheus_tsdb_storage_blocks_bytes The number of bytes that are currently used for local storage by all blocks.
prometheus_tsdb_symbol_table_size_bytes Size of symbol table on disk (in bytes)
prometheus_tsdb_time_retentions_total The number of times that blocks were deleted because the maximum time limit was exceeded.
prometheus_tsdb_tombstone_cleanup_seconds The time taken to recompact blocks to remove tombstones.
prometheus_tsdb_vertical_compactions_total Total number of compactions done on overlapping blocks.
prometheus_tsdb_wal_completed_pages_total Total number of completed pages.
prometheus_tsdb_wal_corruptions_total Total number of WAL corruptions.
prometheus_tsdb_wal_fsync_duration_seconds Duration of WAL fsync.
prometheus_tsdb_wal_page_flushes_total Total number of page flushes.
prometheus_tsdb_wal_segment_current WAL segment index that TSDB is currently writing to.
prometheus_tsdb_wal_truncate_duration_seconds Duration of WAL truncation.
prometheus_tsdb_wal_truncations_failed_total Total number of WAL truncations that failed.
prometheus_tsdb_wal_truncations_total Total number of WAL truncations attempted.
prometheus_tsdb_wal_writes_failed_total Total number of WAL writes that failed.
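
Most metrics ending in _total are counters, so their raw value matters less than how fast it grows. The sketch below derives a per-second rate from two samples of a counter such as prometheus_tsdb_head_samples_appended_total. It is a simplified stand-in for what a query-side rate calculation does, and `counter_rate` is a hypothetical helper.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Per-second increase between two samples of a monotonically increasing counter."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # A counter restarts from zero when the process restarts, so a drop
    # is treated as a reset: the current value is the increase since then.
    if curr_value >= prev_value:
        increase = curr_value - prev_value
    else:
        increase = curr_value
    return increase / elapsed


# Two illustrative samples taken 60 seconds apart.
print(counter_rate(1000, 0, 1600, 60))
```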

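Grafana metrics

When the corresponding parameters are configured, Grafana exposes its built-in metrics. The Prometheus server scrapes them, and the Grafana dashboards then display them. You can view the raw metrics directly at http://localhost:3000/metrics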

Active instance metrics

Metric name Description
grafana_instance_start_total Number of instances started

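Dashboard, user, and playlist metrics

Metric name Description
grafana_stat_active_users Number of active users
grafana_stat_total_orgs Total number of organizations
grafana_stat_total_playlists Total number of playlists
grafana_stat_total_users Total number of users
grafana_stat_totals_active_admins Total number of active admins
grafana_stat_totals_active_editors Total number of active editors
grafana_stat_totals_active_viewers Total number of active viewers
grafana_stat_totals_admins Total number of admins
grafana_stat_totals_annotations Total number of annotations
grafana_stat_totals_dashboard Total number of dashboards
grafana_stat_totals_dashboard_versions Total number of dashboard versions
grafana_stat_totals_datasource Total number of data sources
grafana_stat_totals_editors Total number of editors
grafana_stat_totals_viewers Total number of viewers
grafana_build_info Grafana build information
grafana_plugin_build_info Grafana plugin build information
grafana_rendering_queue_size Length of the image rendering queue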

HTTP metrics

Metric name Labels Description
http_request_duration_milliseconds handler, method, statuscode, quantile Request duration at the given quantile, e.g. handler="/", method="get", statuscode="200", quantile="0.5"
http_request_duration_milliseconds_count handler, method, statuscode Number of requests, e.g. handler="/*", method="get", statuscode="200"
http_request_duration_milliseconds_sum handler, method, statuscode Total request duration
http_request_total handler, method, statuscode Total number of requests
http_request_in_flight - Number of requests currently in progress
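
The _sum and _count series of http_request_duration_milliseconds can be combined into a mean request duration. The sketch below does this over an interval bounded by two scrapes; the function name and sample numbers are assumptions for illustration.

```python
def mean_over_interval(sum_start, count_start, sum_end, count_end):
    """Mean duration of the requests observed between two scrapes.

    Returns None when no requests were observed in the interval.
    """
    requests = count_end - count_start
    if requests <= 0:
        return None
    return (sum_end - sum_start) / requests


# Illustrative values: 30 new requests that took 1800 ms in total.
print(mean_over_interval(1200.0, 10, 3000.0, 40))
```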

Requests by routing group

Metric name Labels Description
grafana_database_queries_duration_seconds_bucket le Cumulative number of database queries whose duration is at or below the bucket bound, e.g. {le="0.0001"}
grafana_database_queries_duration_seconds_count - Number of database queries
grafana_database_queries_duration_seconds_sum - Total database query duration
grafana_datasource_request_duration_seconds code, datasource, method, quantile Data source request duration at the given quantile, e.g. {code="200",datasource="easyops",method="get",quantile="0.5"}
grafana_datasource_request_duration_seconds_count code, datasource, method Number of requests to the data source, e.g. {code="200",datasource="easyops",method="get"}
grafana_datasource_request_duration_seconds_sum code, datasource, method Total duration of requests to the data source
grafana_datasource_request_in_flight datasource Number of in-flight requests to the data source, e.g. {datasource="easyops"}
grafana_datasource_request_total code, datasource, method Total number of data source requests, e.g. {code="404",datasource="easyops",method="get"}
grafana_datasource_response_size_bytes datasource, quantile Data source response size in bytes at the given quantile, e.g. {datasource="easyops",quantile="0.5"}
grafana_datasource_response_size_bytes_count datasource Number of data source responses, e.g. {datasource="easyops"}
grafana_datasource_response_size_bytes_sum datasource Total bytes of data source responses
grafana_db_datasource_query_by_id_total - Total number of queries that fetch a data source by ID
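
Series with an le label, such as grafana_database_queries_duration_seconds_bucket, hold cumulative counts per upper bound. A quantile can be estimated from them by linear interpolation inside the bucket that contains the target rank. The sketch below shows the idea under the assumptions that buckets are sorted by bound, the last bound is +Inf, and the lowest bucket starts at 0; it is an approximation, not the exact server-side algorithm.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound,
    with math.inf as the last upper bound.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            in_bucket = count - lower_count
            # The +Inf bucket has no width, and an empty bucket has no
            # observations to interpolate over: fall back to the lower bound.
            if math.isinf(upper_bound) or in_bucket == 0:
                return lower_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Illustrative bucket counts for bounds 0.1 s, 0.5 s, 1 s and +Inf.
example = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(estimate_quantile(0.9, example))
```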

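Active alert metrics

Metric name Labels Description
grafana_alerting_active_alerts - Number of active alerts
grafana_alerting_execution_time_milliseconds quantile Alert execution time at the given quantile, e.g. {quantile="0.5"}
grafana_alerting_execution_time_milliseconds_count - Number of alert executions
grafana_alerting_execution_time_milliseconds_sum - Total alert execution time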

Performance metrics

Metric name Labels Description
grafana_api_admin_user_created_total - Number of admin users created
grafana_api_dashboard_get_milliseconds quantile Dashboard fetch time at the given quantile, e.g. {quantile="0.5"}
grafana_api_dashboard_get_milliseconds_count - Number of dashboard fetches
grafana_api_dashboard_get_milliseconds_sum - Total dashboard fetch time
grafana_api_dashboard_save_milliseconds quantile Dashboard save time at the given quantile, e.g. {quantile="0.5"}
grafana_api_dashboard_save_milliseconds_count - Number of dashboard saves
grafana_api_dashboard_save_milliseconds_sum - Total dashboard save time
grafana_api_dashboard_search_milliseconds quantile Dashboard search time at the given quantile, e.g. {quantile="0.99"}
grafana_api_dashboard_search_milliseconds_count - Number of dashboard searches
grafana_api_dashboard_search_milliseconds_sum - Total dashboard search time
grafana_api_dashboard_snapshot_create_total - Total number of dashboard snapshots created
grafana_api_dashboard_snapshot_external_total - Total number of external dashboard snapshots
grafana_api_dashboard_snapshot_get_total - Total number of dashboard snapshot fetches
grafana_api_dataproxy_request_all_milliseconds quantile Data proxy request time at the given quantile, e.g. {quantile="0.9"}
grafana_api_dataproxy_request_all_milliseconds_count - Number of data proxy requests
grafana_api_dataproxy_request_all_milliseconds_sum - Total data proxy request time
grafana_api_login_oauth_total - Number of OAuth logins
grafana_api_login_post_total - Number of logins via POST
grafana_api_login_saml_total - Number of SAML logins
grafana_api_models_dashboard_insert_total - Number of dashboard inserts
grafana_api_org_create_total - Number of organizations created
grafana_api_response_status_total code Number of API responses by status code, e.g. {code="200"}
grafana_page_response_status_total code Number of page responses by status code, e.g. {code="500"}
grafana_proxy_response_status_total code Number of proxy responses by status code, e.g. {code="404"}
grafana_api_user_signup_completed_total - Number of users who completed signup
grafana_api_user_signup_invite_total - Number of users invited to sign up
grafana_api_user_signup_started_total - Number of users who started the signup flow
grafana_ldap_users_sync_execution_time quantile LDAP user sync execution time at the given quantile, e.g. {quantile="0.9"}
grafana_ldap_users_sync_execution_time_count - Number of LDAP user syncs
grafana_ldap_users_sync_execution_time_sum - Total LDAP user sync execution time

HDFS Metrics

Metric name Meaning Unit Notes
namenode_rpc_rpc_RpcProcessingTimeAvgTime Average request processing time on port 8020 ms
namenode_rpc_rpc_RpcProcessingTimeNumOps Number of RPC requests on port 8020
namenode_rpc_rpc_ReceivedBytes Bytes received on port 8020 byte
namenode_rpc_rpc_SentBytes Bytes sent on port 8020 byte
namenode_rpc_rpc_RpcQueueTimeAvgTime Average queue time on port 8020 ms
namenode_rpc_rpc_RpcQueueTimeNumOps Number of RPC requests on port 8020
namenode_rpc_rpc_CallQueueLength Call queue length on port 8020
namenode_rpcdetailed_rpcdetailed_GetContentSummaryAvgTime Average call time of the GetContentSummary method ms
namenode_rpcdetailed_rpcdetailed_MkdirsAvgTime Average call time of the Mkdirs method ms
namenode_rpcdetailed_rpcdetailed_DeleteAvgTime Average call time of the Delete method ms
namenode_rpcdetailed_rpcdetailed_CreateAvgTime Average call time of the Create method ms
namenode_rpcdetailed_rpcdetailed_CompleteAvgTime Average call time of the Complete method ms
namenode_dfs_namenode_GetBlockLocationsAvgTime Average time of GetBlockLocations operations ms
namenode_rpcdetailed_rpcdetailed_GetFileInfoAvgTime Average call time of the GetFileInfo method ms
namenode_rpcdetailed_rpcdetailed_GetListingAvgTime Average call time of the GetListing method ms
namenode_rpcdetailed_rpcdetailed_AddBlockAvgTime Average call time of the AddBlock method ms
namenode_rpcdetailed_rpcdetailed_BlockReportAvgTime Average call time of the BlockReport method ms
namenode_rpcdetailed_rpcdetailed_MonitorHealthAvgTime Average call time of the MonitorHealth method ms
namenode_rpcdetailed_rpcdetailed_BlockReceivedAndDeletedAvgTime Average call time of the BlockReceivedAndDeleted method ms
namenode_rpcdetailed_rpcdetailed_AddBlockNumOps Number of AddBlock method calls
namenode_rpcdetailed_rpcdetailed_MkdirsNumOps Number of Mkdirs method calls
namenode_rpcdetailed_rpcdetailed_DeleteNumOps Number of Delete method calls
namenode_rpcdetailed_rpcdetailed_CreateNumOps Number of Create method calls
namenode_rpcdetailed_rpcdetailed_CompleteNumOps Number of Complete method calls
namenode_dfs_namenode_GetBlockLocationsNumOps Total number of GetBlockLocations operations
namenode_rpcdetailed_rpcdetailed_GetFileInfoNumOps Number of GetFileInfo method calls
namenode_rpcdetailed_rpcdetailed_GetListingNumOps Number of GetListing method calls
namenode_rpcdetailed_rpcdetailed_GetContentSummaryNumOps Number of GetContentSummary method calls
namenode_rpcdetailed_rpcdetailed_BlockReceivedAndDeletedNumOps Number of BlockReceivedAndDeleted method calls
namenode_rpcdetailed_rpcdetailed_BlockReportNumOps Number of BlockReport method calls
namenode_rpcdetailed_rpcdetailed_MonitorHealthNumOps Number of MonitorHealth method calls
namenode_jvm_JvmMetrics_ThreadsBlocked Number of threads in the BLOCKED state
namenode_jvm_JvmMetrics_ThreadsNew Number of threads in the NEW state
namenode_jvm_JvmMetrics_ThreadsRunnable Number of threads in the RUNNABLE state
namenode_jvm_JvmMetrics_ThreadsTerminated Number of threads in the TERMINATED state
namenode_jvm_JvmMetrics_ThreadsTimedWaiting Number of threads in the TIMED_WAITING state
namenode_jvm_JvmMetrics_ThreadsWaiting Number of threads in the WAITING state
namenode_jvm_JvmMetrics_LogError Number of error-level log events
namenode_jvm_JvmMetrics_LogFatal Number of fatal-level log events
namenode_jvm_JvmMetrics_LogInfo Number of info-level log events
namenode_jvm_JvmMetrics_LogWarn Number of warn-level log events
namenode_jvm_JvmMetrics_MemHeapUsedM Heap memory used MB
namenode_jvm_JvmMetrics_MemHeapCommittedM Heap memory committed MB
namenode_jvm_JvmMetrics_MemHeapMaxM Maximum heap memory MB
namenode_jvm_JvmMetrics_MemNonHeapUsedM Non-heap memory used MB
namenode_jvm_JvmMetrics_MemNonHeapCommittedM Non-heap memory committed MB
namenode_jvm_JvmMetrics_MemNonHeapMaxM Maximum non-heap memory MB
namenode_jvm_JvmMetrics_GcCountConcurrentMarkSweep Number of CMS GCs
namenode_jvm_JvmMetrics_GcCountParNew Number of ParNew GCs Used together with CMS; mainly collects the young generation
namenode_jvm_JvmMetrics_GcTimeMillisConcurrentMarkSweep CMS GC time
namenode_jvm_JvmMetrics_GcTimeMillisParNew ParNew GC time
namenode_rpc_RetryCache_NameNodeRetryCache_CacheHit Number of retry cache hits
namenode_rpc_RetryCache_NameNodeRetryCache_CacheUpdated Number of retry cache updates
namenode_rpc_RetryCache_NameNodeRetryCache_CacheCleared Number of times the retry cache was cleared
namenode_dfs_FSNamesystem_LastCheckpointTime Time of the last checkpoint
namenode_dfs_FSNamesystem_CapacityTotal Current total capacity
namenode_dfs_FSNamesystem_NumLiveDataNodes Number of live DataNodes
namenode_dfs_FSNamesystem_NumDeadDataNodes Number of dead DataNodes
namenode_dfs_FSNamesystem_VolumeFailuresTotal Number of failed volumes
namenode_dfs_FSNamesystem_CapacityUsedNonDFS Capacity currently used by non-DFS data
namenode_dfs_FSNamesystem_BlocksTotal Current number of blocks
namenode_dfs_FSNamesystem_MissingBlocks Current number of missing blocks
namenode_dfs_FSNamesystem_ExpiredHeartbeats Number of expired heartbeats
namenode_dfs_FSNamesystem_TransactionsSinceLastLogRoll Number of transactions since the last edit log roll
namenode_dfs_FSNamesystem_TransactionsSinceLastCheckpoint Total number of transactions since the last checkpoint
namenode_dfs_FSNamesystem_SnapshottableDirectories Number of snapshottable directories
namenode_dfs_FSNamesystem_TotalLoad Current total number of connections
namenode_dfs_FSNamesystem_FilesTotal Current total number of files and directories
namenode_dfs_FSNamesystem_StaleDataNodes Number of DataNodes marked stale because of delayed heartbeats
namenode_dfs_namenode_CreateFileOps Number of file creation operations
namenode_dfs_namenode_FilesCreated Number of files and directories created by create or mkdir operations
namenode_dfs_namenode_FilesAppended Number of file append operations
namenode_dfs_namenode_GetBlockLocations Number of GetBlockLocations operations
namenode_dfs_namenode_GetListingOps Number of directory listing operations
namenode_dfs_namenode_DeleteFileOps Number of file deletion operations
namenode_dfs_namenode_FilesDeleted Number of files and directories deleted by delete or rename operations
namenode_dfs_namenode_FileInfoOps Number of getFileInfo and getLinkFileInfo operations
namenode_dfs_namenode_FilesRenamed Number of file rename operations
namenode_dfs_namenode_GetAdditionalDatanodeOps Number of GetAdditionalDatanode operations
namenode_dfs_namenode_AddBlockOps Number of block additions on the HDFS NameNode
datanode_rpc_rpc_NumOpenConnections Number of open connections on the DataNode
datanode_rpc_rpc_NumDroppedConnections Number of connections dropped by the DataNode
datanode_rpc_rpc_RpcProcessingTimeAvgTime Average DataNode request processing time ms
datanode_rpc_rpc_RpcProcessingTimeNumOps Number of DataNode requests
datanode_rpc_rpc_CallQueueLength DataNode call queue length
datanode_rpc_rpc_ReceivedBytes Bytes received by the DataNode byte
datanode_rpc_rpc_SentBytes Bytes sent by the DataNode byte
datanode_rpc_rpc_DeferredRpcProcessingTimeAvgTime ms
datanode_rpc_rpc_RpcQueueTimeAvgTime Average DataNode queue time ms
datanode_rpc_rpc_DeferredRpcProcessingTimeNumOps
datanode_jvm_JvmMetrics_MemHeapCommittedM DataNode heap memory committed MB
datanode_jvm_JvmMetrics_MemHeapMaxM DataNode maximum heap memory MB
datanode_jvm_JvmMetrics_MemHeapUsedM DataNode heap memory used MB
datanode_jvm_JvmMetrics_MemNonHeapCommittedM DataNode non-heap memory committed MB
datanode_jvm_JvmMetrics_MemNonHeapMaxM DataNode maximum non-heap memory MB
datanode_jvm_JvmMetrics_MemNonHeapUsedM DataNode non-heap memory used MB
datanode_jvm_JvmMetrics_ThreadsBlocked Number of threads in the BLOCKED state
datanode_jvm_JvmMetrics_ThreadsNew Number of threads in the NEW state
datanode_jvm_JvmMetrics_ThreadsRunnable Number of threads in the RUNNABLE state
datanode_jvm_JvmMetrics_ThreadsTerminated Number of threads in the TERMINATED state
datanode_jvm_JvmMetrics_ThreadsTimedWaiting Number of threads in the TIMED_WAITING state
datanode_jvm_JvmMetrics_ThreadsWaiting Number of threads in the WAITING state
datanode_jvm_JvmMetrics_LogError Number of error-level log events
datanode_jvm_JvmMetrics_LogFatal Number of fatal-level log events
datanode_jvm_JvmMetrics_LogInfo Number of info-level log events
datanode_jvm_JvmMetrics_LogWarn Number of warn-level log events
datanode_dfs_datanode_BlockChecksumOpNumOps Number of BlockChecksum operations
datanode_dfs_datanode_BlockReportsNumOps Number of BlockReport operations
datanode_dfs_datanode_CopyBlockOpNumOps Number of block copy operations
datanode_dfs_datanode_IncrementalBlockReportsNumOps Number of incremental block report operations
datanode_dfs_datanode_ReadBlockOpNumOps Number of read operations
datanode_dfs_datanode_ReplaceBlockOpNumOps Number of block replace operations
datanode_dfs_datanode_WriteBlockOpNumOps Number of write operations
datanode_dfs_datanode_BlockChecksumOpAvgTime Average time of BlockChecksum operations ms
datanode_dfs_datanode_BlockReportsAvgTime Average time of BlockReport operations ms
datanode_dfs_datanode_CopyBlockOpAvgTime Average time of block copy operations ms
datanode_dfs_datanode_WriteBlockOpAvgTime Average time of write operations ms
datanode_dfs_datanode_ReadBlockOpAvgTime Average time of read operations ms
datanode_dfs_datanode_ReplaceBlockOpAvgTime Average time of block replace operations ms
datanode_dfs_datanode_IncrementalBlockReportsAvgTime Average time of incremental block report operations ms
datanode_dfs_FsVolume_DataFileIoRateNumOps Number of data file I/O operations in the interval
datanode_dfs_FsVolume_FileIoErrorRateNumOps Number of file I/O errors in the interval
datanode_dfs_FsVolume_FlushIoRateNumOps Number of file flush I/O operations in the interval
datanode_dfs_FsVolume_MetadataOperationRateNumOps Number of metadata operations in the interval
datanode_dfs_FsVolume_ReadIoRateNumOps Number of file read operations in the interval
datanode_dfs_FsVolume_SyncIoRateNumOps Number of file sync operations in the interval
datanode_dfs_FsVolume_WriteIoRateNumOps Number of file write operations in the interval
datanode_dfs_FsVolume_DataFileIoRateAvgTime Average time of data file I/O operations ms
datanode_dfs_FsVolume_FileIoErrorRateAvgTime Average time from the start of an operation to its failure ms
datanode_dfs_FsVolume_FlushIoRateAvgTime Average time of file flush I/O operations ms
datanode_dfs_FsVolume_MetadataOperationRateAvgTime Average time of metadata operations ms
datanode_dfs_FsVolume_ReadIoRateAvgTime Average time of file read operations ms
datanode_dfs_FsVolume_SyncIoRateAvgTime Average time of file sync operations ms
datanode_dfs_FsVolume_WriteIoRateAvgTime Average time of file write operations ms
datanode_ugi_UgiMetrics_LoginSuccessNumOps Total number of successful Kerberos logins
datanode_ugi_UgiMetrics_LoginFailureNumOps Total number of failed Kerberos logins
datanode_ugi_UgiMetrics_LoginSuccessAvgTime Average time of successful Kerberos logins ms
datanode_ugi_UgiMetrics_LoginFailureAvgTime Average time of failed Kerberos logins ms
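
Several of the NameNode gauges above are natural health signals. The sketch below turns a dictionary of sampled values into a list of warnings. The choice of metrics and the "anything above zero is a warning" rule are assumptions for illustration, not recommended thresholds.

```python
def namenode_warnings(samples):
    """Return human-readable warnings for a dict of NameNode metric values."""
    # Each entry pairs a metric name with the label used in the warning text.
    checks = [
        ("namenode_dfs_FSNamesystem_MissingBlocks", "missing blocks"),
        ("namenode_dfs_FSNamesystem_NumDeadDataNodes", "dead DataNodes"),
        ("namenode_dfs_FSNamesystem_VolumeFailuresTotal", "failed volumes"),
    ]
    warnings = []
    for name, label in checks:
        value = samples.get(name, 0)
        # Assumption: any non-zero value is worth reporting.
        if value > 0:
            warnings.append(f"{int(value)} {label}")
    return warnings


# Illustrative sample values.
print(namenode_warnings({
    "namenode_dfs_FSNamesystem_MissingBlocks": 0,
    "namenode_dfs_FSNamesystem_NumDeadDataNodes": 2,
    "namenode_dfs_FSNamesystem_VolumeFailuresTotal": 1,
}))
```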

YARN Metrics

https://hadoop.apache.org/docs/r2.9.2/hadoop-project-dist/hadoop-common/Metrics.html

Metric name Meaning Unit Notes
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumActiveNMs Number of NodeManagers in the active state
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumUnhealthyNMs Number of NodeManagers in the unhealthy state
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumShutdownNMs Number of NodeManagers in the shutdown state
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumRebootedNMs Number of NodeManagers in the rebooted state
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumLostNMs Number of NodeManagers in the lost state
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumDecommissioningNMs Number of NodeManagers in the decommissioning state
resourcemanager_yarn_QueueMetrics_AvailableVCores Number of vcores currently available in the queue
resourcemanager_yarn_QueueMetrics_AvailableMB Memory currently available in the queue
resourcemanager_yarn_QueueMetrics_AbsoluteUsedCapacity Resource utilization of the queue
resourcemanager_yarn_QueueMetrics_AllocatedVCores Number of vcores allocated in the queue
resourcemanager_yarn_QueueMetrics_PendingVCores Number of vcores pending in the queue
resourcemanager_yarn_QueueMetrics_AllocatedMB Memory allocated in the queue
resourcemanager_yarn_QueueMetrics_PendingMB Memory pending in the queue
resourcemanager_yarn_QueueMetrics_AppsRunning Number of running applications in the queue
resourcemanager_yarn_QueueMetrics_AppsFailed Number of failed applications in the queue
resourcemanager_yarn_QueueMetrics_AppsPending Number of pending applications in the queue
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_AMLaunchDelayNumOps Total number of AM launches
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_AMRegisterDelayNumOps Total number of AM registrations
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_AMLaunchDelayAvgTime Average AM launch time ms
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_AMRegisterDelayAvgTime Average AM registration time ms
resourcemanager_jvm_JvmMetrics_GcCount Total number of GCs on the ResourceManager
resourcemanager_jvm_JvmMetrics_GcNumInfoThresholdExceeded Number of times the GC info threshold was exceeded on the ResourceManager
resourcemanager_jvm_JvmMetrics_GcNumWarnThresholdExceeded Number of times the GC warn threshold was exceeded on the ResourceManager
resourcemanager_jvm_JvmMetrics_GcTotalExtraSleepTime Extra sleep time caused by GC
resourcemanager_jvm_JvmMetrics_GcTimeMillis GC time
resourcemanager_jvm_JvmMetrics_LogError Number of error-level log events
resourcemanager_jvm_JvmMetrics_LogFatal Number of fatal-level log events
resourcemanager_jvm_JvmMetrics_LogWarn Number of warn-level log events
resourcemanager_jvm_JvmMetrics_ThreadsWaiting Number of threads in the WAITING state
resourcemanager_jvm_JvmMetrics_ThreadsRunnable Number of threads in the RUNNABLE state
resourcemanager_jvm_JvmMetrics_ThreadsTimedWaiting Number of threads in the TIMED_WAITING state
resourcemanager_jvm_JvmMetrics_ThreadsBlocked Number of threads in the BLOCKED state
resourcemanager_jvm_JvmMetrics_MemHeapCommittedM Heap memory committed
resourcemanager_jvm_JvmMetrics_MemHeapMaxM Maximum heap size
resourcemanager_jvm_JvmMetrics_MemHeapUsedM Heap memory used
resourcemanager_rpc_rpc_CallQueueLength RPC call queue length
resourcemanager_rpc_rpc_NumOpenConnections Current number of open connections
resourcemanager_rpc_rpc_NumDroppedConnections Current number of dropped connections
resourcemanager_rpc_rpc_DeferredRpcProcessingTimeNumOps Total number of deferred RPC calls
resourcemanager_rpc_rpc_RpcProcessingTimeNumOps Total number of RPC calls
resourcemanager_rpc_rpc_RpcQueueTimeNumOps Total number of RPC calls
resourcemanager_rpc_rpc_DeferredRpcProcessingTimeAvgTime Average processing time of deferred RPC calls
resourcemanager_rpc_rpc_RpcProcessingTimeAvgTime Average RPC processing time
resourcemanager_rpc_rpc_RpcQueueTimeAvgTime Average RPC queue time
resourcemanager_rpcdetailed_rpcdetailed_GetApplicationReportNumOps Number of GetApplicationReport method calls
resourcemanager_rpcdetailed_rpcdetailed_GetServiceStatusNumOps Number of GetServiceStatus method calls
resourcemanager_rpcdetailed_rpcdetailed_MonitorHealthNumOps Number of MonitorHealth method calls
resourcemanager_rpcdetailed_rpcdetailed_NodeHeartbeatNumOps Number of NodeHeartbeat method calls
resourcemanager_rpcdetailed_rpcdetailed_RegisterNodeManagerNumOps Number of RegisterNodeManager method calls
resourcemanager_rpcdetailed_rpcdetailed_TransitionToActiveNumOps Number of TransitionToActive method calls
resourcemanager_rpcdetailed_rpcdetailed_TransitionToStandbyNumOps Number of TransitionToStandby method calls
resourcemanager_rpcdetailed_rpcdetailed_GetApplicationReportAvgTime Average call time of the GetApplicationReport method
resourcemanager_rpcdetailed_rpcdetailed_GetServiceStatusAvgTime Average call time of the GetServiceStatus method
resourcemanager_rpcdetailed_rpcdetailed_MonitorHealthAvgTime Average call time of the MonitorHealth method
resourcemanager_rpcdetailed_rpcdetailed_NodeHeartbeatAvgTime Average call time of the NodeHeartbeat method
resourcemanager_rpcdetailed_rpcdetailed_RegisterNodeManagerAvgTime Average call time of the RegisterNodeManager method
resourcemanager_rpcdetailed_rpcdetailed_TransitionToActiveAvgTime Average call time of the TransitionToActive method
resourcemanager_rpcdetailed_rpcdetailed_TransitionToStandbyAvgTime Average call time of the TransitionToStandby method
resourcemanager_ugi_UgiMetrics_LoginSuccessNumOps Total number of successful Kerberos logins
resourcemanager_ugi_UgiMetrics_LoginFailureNumOps Total number of failed Kerberos logins
resourcemanager_ugi_UgiMetrics_LoginSuccessAvgTime Average time of successful Kerberos logins
resourcemanager_ugi_UgiMetrics_LoginFailureAvgTime Average time of failed Kerberos logins
nodemanager_yarn_NodeManagerMetrics_ContainersRunning Current number of running containers
nodemanager_yarn_NodeManagerMetrics_ContainersFailed Total number of failed containers
nodemanager_yarn_NodeManagerMetrics_ContainersKilled Total number of killed containers
nodemanager_yarn_NodeManagerMetrics_ContainersLaunched Total number of launched containers
nodemanager_yarn_NodeManagerMetrics_ContainersCompleted Total number of completed containers
nodemanager_yarn_NodeManagerMetrics_AllocatedGB Memory currently allocated GB
nodemanager_yarn_NodeManagerMetrics_AllocatedContainers Number of containers currently allocated
nodemanager_yarn_NodeManagerMetrics_AllocatedVCores Number of vcores currently allocated
nodemanager_yarn_NodeManagerMetrics_AvailableVCores Number of vcores currently available
nodemanager_yarn_NodeManagerMetrics_AvailableGB Memory currently available GB
nodemanager_jvm_JvmMetrics_GcCount Number of GCs
nodemanager_jvm_JvmMetrics_GcNumInfoThresholdExceeded Number of times the GC info threshold was exceeded
nodemanager_jvm_JvmMetrics_GcNumWarnThresholdExceeded Number of times the GC warn threshold was exceeded
nodemanager_jvm_JvmMetrics_GcTotalExtraSleepTime Extra sleep time caused by GC
nodemanager_jvm_JvmMetrics_GcTimeMillis GC time
nodemanager_jvm_JvmMetrics_LogError Number of error-level log events
nodemanager_jvm_JvmMetrics_LogFatal Number of fatal-level log events
nodemanager_jvm_JvmMetrics_LogWarn Number of warn-level log events
nodemanager_jvm_JvmMetrics_ThreadsBlocked Number of threads in the BLOCKED state
nodemanager_jvm_JvmMetrics_ThreadsWaiting Number of threads in the WAITING state
nodemanager_jvm_JvmMetrics_ThreadsRunnable Number of threads in the RUNNABLE state
nodemanager_jvm_JvmMetrics_ThreadsTimedWaiting Number of threads in the TIMED_WAITING state
nodemanager_jvm_JvmMetrics_MemHeapCommittedM Heap memory committed
nodemanager_jvm_JvmMetrics_MemHeapMaxM Maximum heap size
nodemanager_jvm_JvmMetrics_MemHeapUsedM Heap memory used
nodemanager_rpc_rpc_NumOpenConnections Current number of open connections
nodemanager_rpc_rpc_NumDroppedConnections Current number of dropped connections
nodemanager_rpc_rpc_CallQueueLength RPC call queue length
nodemanager_rpc_rpc_DeferredRpcProcessingTimeNumOps Total number of deferred RPC calls
nodemanager_rpc_rpc_RpcProcessingTimeNumOps Total number of RPC calls
nodemanager_rpc_rpc_RpcQueueTimeNumOps Total number of RPC calls
nodemanager_rpc_rpc_DeferredRpcProcessingTimeAvgTime Average processing time of deferred RPC calls
nodemanager_rpc_rpc_RpcProcessingTimeAvgTime Average RPC processing time
nodemanager_rpc_rpc_RpcQueueTimeAvgTime Average RPC queue time
nodemanager_ugi_UgiMetrics_LoginFailureNumOps Total number of failed Kerberos logins
nodemanager_ugi_UgiMetrics_LoginSuccessNumOps Total number of successful Kerberos logins
nodemanager_ugi_UgiMetrics_LoginFailureAvgTime Average time of failed Kerberos logins
nodemanager_ugi_UgiMetrics_LoginSuccessAvgTime Average time of successful Kerberos logins
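
The NodeManager gauges AllocatedGB and AvailableGB can be combined into a memory utilization figure. The sketch below aggregates it across nodes, assuming that allocated plus available memory approximates each node's total; the function name and the sample values are illustrative.

```python
def memory_utilization(nodes):
    """Fraction of NodeManager memory that is allocated.

    nodes: list of (allocated_gb, available_gb) pairs, one per NodeManager.
    Assumption: allocated + available approximates each node's total memory.
    """
    allocated = sum(alloc for alloc, _ in nodes)
    total = sum(alloc + avail for alloc, avail in nodes)
    return allocated / total if total else 0.0


# Two illustrative nodes: 8 of 32 GB and 16 of 32 GB allocated.
print(memory_utilization([(8, 24), (16, 16)]))
```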

Hive Metrics

Metric name Meaning Unit Notes
hivemetastore_open_connections_count Current number of open connections
hivemetastore_threads_blocked_count_value Number of threads in the BLOCKED state
hivemetastore_threads_runnable_count_value Number of threads in the RUNNABLE state
hivemetastore_threads_waiting_count_value Number of threads in the WAITING state
hivemetastore_threads_timed_waiting_count_value Number of threads in the TIMED_WAITING state
hivemetastore_threads_deadlock_count_value Number of deadlocked threads
hivemetastore_memory_heap_max_value Maximum heap size
hivemetastore_memory_heap_used_value Heap memory used
hivemetastore_memory_heap_init_value Initial heap size
hivemetastore_memory_pools_CMS_Old_Gen_usage_value Old generation memory usage ratio
hivemetastore_memory_pools_Par_Eden_Space_usage_value Young generation memory usage ratio
hivemetastore_gc_ParNew_count_value Total number of young generation GCs
hivemetastore_gc_ConcurrentMarkSweep_count_value Total number of old generation GCs
hivemetastore_gc_ParNew_time_value Total young generation GC time
hivemetastore_gc_ConcurrentMarkSweep_time_value Total old generation GC time
hivemetastore_buffers_direct_count_value Number of direct memory buffers Hive NIO-related metric
hivemetastore_buffers_mapped_count_value
hivemetastore_buffers_direct_capacity_value
hivemetastore_buffers_direct_used_value
hivemetastore_buffers_mapped_capacity_value
hivemetastore_buffers_mapped_used_value
hivemetastoreapi.*_names_max 调用对应api最大耗时 .*为具体api名称
hivemetastoreapi.*_names_min 调用对应api最小耗时
hivemetastoreapi.*_names_mean 调用对应api平均耗时
hivemetastoreapi.*_names_stddev 调用对应api耗时标准差
hivemetastoreapi.*_names_p95 调用对应api耗时p95
hivemetastoreapi.*_names_p98 调用对应api耗时p98
hivemetastoreapi.*_names_p99 调用对应api耗时p99
hivemetastoreapi.*_names_p999 调用对应api耗时p999
hivemetastoreapi.*_names_count 调用对应api总次数
hivemetastoreapi.*_names_m15_rate 调用对应api平均每秒次数(15分钟内)
hivemetastoreapi.*_names_m1_rate 调用对应api平均每秒次数(1分钟内)
hivemetastoreapi.*_names_m5_rate 调用对应api平均每秒次数(5分钟内)
hivemetastoreapi.*_names_mean_rate 调用对应api平均每秒次数
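hiveserver2_cumulative_connection_count_count Cumulative number of connections
hiveserver2_jvm_pause_extraSleepTime_count Extra sleep time caused by GC
hiveserver2_open_connections_count Current number of open connections
hiveserver2_open_operations_count Current number of open operations
hiveserver2_gc_ConcurrentMarkSweep_count_value Total number of old generation GCs
hiveserver2_gc_ParNew_count_value Total number of young generation GCs
hiveserver2_gc_ConcurrentMarkSweep_time_value Total old generation GC time
hiveserver2_gc_ParNew_time_value Total young generation GC time
hiveserver2_buffers_direct_count_value
hiveserver2_buffers_mapped_count_value
hiveserver2_buffers_direct_used_value
hiveserver2_buffers_direct_capacity_value
hiveserver2_buffers_mapped_used_value
hiveserver2_buffers_mapped_capacity_value
hiveserver2_memory_heap_max_value Maximum heap size
hiveserver2_memory_heap_used_value Used heap size
hiveserver2_memory_heap_init_value Initial heap size
hiveserver2_memory_pools_Par_Eden_Space_usage_value Eden space memory usage ratio
hiveserver2_memory_pools_Par_Survivor_Space_usage_value Survivor space memory usage ratio
hiveserver2_memory_pools_CMS_Old_Gen_usage_value Old generation memory usage ratio
hiveserver2_threads_waiting_count_value Number of threads in the WAITING state
hiveserver2_threads_timed_waiting_count_value Number of threads in the TIMED_WAITING state
hiveserver2_threads_blocked_count_value Number of threads in the BLOCKED state
hiveserver2_threads_runnable_count_value Number of threads in the RUNNABLE state
hiveserver2api.*_max Maximum latency of calls to the corresponding API (.* stands for the specific API name)
hiveserver2api.*_min Minimum latency of calls to the corresponding API
hiveserver2api.*_mean Mean latency of calls to the corresponding API
hiveserver2api.*_stddev Standard deviation of call latency for the corresponding API
hiveserver2api.*_p95 95th percentile call latency for the corresponding API
hiveserver2api.*_p98 98th percentile call latency for the corresponding API
hiveserver2api.*_p99 99th percentile call latency for the corresponding API
hiveserver2api.*_p999 99.9th percentile call latency for the corresponding API
hiveserver2api.*_m15_rate Average calls per second to the corresponding API (last 15 minutes)
hiveserver2api.*_m1_rate Average calls per second to the corresponding API (last 1 minute)
hiveserver2api.*_m5_rate Average calls per second to the corresponding API (last 5 minutes)
hiveserver2api.*_mean_rate Average calls per second to the corresponding API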

Kerberos Metrics

Metric Description Unit Notes
procstat_cpu_usage CPU usage
procstat_memory_usage Memory usage

Kyuubi Metrics

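Metric Description Unit Notes
error_queries_count Number of queries in the error state
running_queries_count Number of queries in the running state
open_connections_count Current number of open connections
open_operations_count Current number of open operations
direct_used_value NIO-related metric
direct_capacity_value
direct_count_value
mapped_used_value
mapped_capacity_value
mapped_count_value
heap_used_value Used heap size
heap_committed_value Committed heap memory size
heap_max_value Maximum heap size
ParNew_count_value Total number of young generation GCs
ConcurrentMarkSweep_count_value Total number of old generation GCs
ConcurrentMarkSweep_time_value Total old generation GC time
ParNew_time_value Total young generation GC time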

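LDAP Metrics

Metric Description Unit Notes
openldapoperations.*_completed Total number of completed operations (.* stands for the specific operation)
openldap_statistics_bytes Bytes sent by the server
openldap_statistics_entries Entries sent by the server
openldap_statistics_pdu PDUs sent by the server
openldap_statistics_referrals Referrals sent by the server
openldap_threads_active Number of threads in the active state
openldap_threads_backload Number of threads in the backload state
openldap_threads_max Maximum number of threads
openldap_threads_max_pending Maximum number of threads in the pending state
openldap_threads_open Number of threads in the open state
openldap_threads_pending Number of threads in the pending state
openldap_threads_starting Number of threads in the starting state
openldap_waiters_read Current number of read waiters
openldap_waiters_write Current number of write waiters
openldap_connections_max_file_descriptors Maximum number of connections (max file descriptors)
openldap_connections_current Current number of connections
openldap_connections_total Total number of connections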

Nginx Metrics

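Metric Description Unit Notes
nginx_accepts Total number of accepted client connections
nginx_handled Total number of handled connections
nginx_requests Total number of client requests
nginx_active Current number of active client connections
nginx_reading Number of connections where nginx is reading the request
nginx_writing Number of connections where nginx is writing the response
nginx_waiting Number of idle connections waiting for a request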
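
The cumulative counters above can be combined into two derived values: the number of accepted connections that were never handled, and the average number of requests per handled connection. The following is a minimal sketch under stated assumptions; the function name `nginx_derived` and the sample values are hypothetical and not part of this monitoring setup.

```python
def nginx_derived(accepts: int, handled: int, requests: int) -> dict:
    """Derive two values from the cumulative nginx counters.

    accepts, handled and requests correspond to nginx_accepts,
    nginx_handled and nginx_requests in the table above.
    """
    # Connections that were accepted but not handled.
    dropped = accepts - handled
    # Average number of requests served per handled connection.
    per_connection = requests / handled if handled else 0.0
    return {
        "dropped_connections": dropped,
        "requests_per_connection": per_connection,
    }


# Hypothetical sample values, not taken from a real server.
result = nginx_derived(accepts=1000, handled=998, requests=4990)
print(result)
```

A non-zero `dropped_connections` value usually deserves a closer look, since accepted and handled counts are normally equal.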

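MySQL Metrics

Metric Description Unit Notes
mysql_global_status_queries Total number of statements executed (including those inside stored procedures)
mysql_global_variables_innodb_buffer_pool_size Size of the InnoDB buffer pool
mysql_global_status_threads_connected Current number of connections
mysql_global_status_max_used_connections Historical peak number of connections
mysql_global_variables_max_connections Maximum number of connections allowed
mysql_global_status_questions Total number of statements sent by clients (excludes statements executed inside stored procedures)
mysql_global_variables_thread_cache_size Thread cache size
mysql_global_status_threads_cached Number of threads in the thread cache
mysql_global_status_threads_created Number of threads created to handle connections
mysql_global_status_created_tmp_tables Number of temporary tables created
mysql_global_status_created_tmp_disk_tables Number of temporary tables created (on-disk)
mysql_global_status_created_tmp_files Number of temporary files created
mysql_global_status_select_full_join Number of joins that scan tables because they use no index
mysql_global_status_select_full_range_join Number of joins that use a range search on the target table
mysql_global_status_select_range Number of joins that use a range search on the first table
mysql_global_status_select_range_check
mysql_global_status_select_scan Number of joins that do a full scan of the first table
mysql_global_status_sort_rows Number of sorted rows
mysql_global_status_sort_range Number of sorts done using ranges
mysql_global_status_sort_merge_passes Number of merge passes performed by the sort algorithm
mysql_global_status_sort_scan Number of sorts done by scanning the table
mysql_global_status_slow_queries Number of slow queries (threshold: long_query_time)
mysql_global_status_aborted_connects Number of failed attempts to connect to MySQL
mysql_global_status_aborted_clients Number of connections aborted because the client did not close the connection properly
mysql_global_status_table_locks_immediate Number of times a table lock was acquired immediately
mysql_global_status_table_locks_waited Number of times a table lock could not be acquired immediately
mysql_global_status_bytes_received Number of bytes received from all clients
mysql_global_status_bytes_sent Number of bytes sent to all clients
mysql_global_status_innodb_page_size InnoDB page size
mysql_global_status_buffer_pool_pages Total number of pages in the buffer pool
mysql_global_variables_innodb_additional_mem_pool_size Memory pool size (used to store the data dictionary and other data structures)
mysql_global_status_innodb_mem_dictionary Data dictionary size
mysql_global_variables_key_buffer_size Index (key) buffer size
mysql_global_variables_query_cache_size Query cache size
mysql_global_status_commands_total Total number of commands
mysql_global_status_handlers_total Total number of handler operations
mysql_global_status_qcache_free_memory Free memory available for the query cache
mysql_global_status_qcache_hits Number of query cache hits
mysql_global_status_qcache_inserts Number of queries added to the query cache
mysql_global_status_qcache_not_cached Number of non-cached queries
mysql_global_status_qcache_lowmem_prunes Number of queries removed from the query cache because of low memory
mysql_global_status_qcache_queries_in_cache Number of queries registered in the query cache
mysql_global_status_opened_files Number of files that have been opened
mysql_global_status_open_files Number of files currently open
mysql_global_variables_open_files_limit Limit on the number of open files
mysql_global_status_innodb_num_open_files Number of files InnoDB currently holds open
mysql_global_status_opened_tables Number of tables that have been opened
mysql_global_status_table_open_cache_hits Number of table lookups served from the table cache
mysql_global_status_table_open_cache_misses Number of table opens not served from the table cache
mysql_global_status_table_open_cache_overflows Number of table instances evicted after table_open_cache was exceeded
mysql_global_status_open_tables Number of tables currently open
mysql_global_variables_table_open_cache Size of the open table cache
mysql_global_status_open_table_definitions Number of cached table definitions
mysql_global_variables_table_definition_cache Size of the table definition cache
mysql_global_status_opened_table_definitions Number of table definitions that have been cached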
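
Several of the MySQL metrics above are most useful as ratios, for example current connections against the allowed maximum, or on-disk temporary tables against all temporary tables. The sketch below illustrates that arithmetic only; the helper names `safe_ratio` and `mysql_ratios` and all sample values are assumptions made for this example.

```python
def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def mysql_ratios(metrics: dict) -> dict:
    """Compute two ratios from a dict of metric name -> latest value."""
    return {
        # Share of the allowed connections that are currently in use.
        "connection_utilization": safe_ratio(
            metrics["mysql_global_status_threads_connected"],
            metrics["mysql_global_variables_max_connections"],
        ),
        # Share of temporary tables that were created on disk.
        "tmp_tables_on_disk": safe_ratio(
            metrics["mysql_global_status_created_tmp_disk_tables"],
            metrics["mysql_global_status_created_tmp_tables"],
        ),
    }


# Hypothetical sample values, not taken from a real server.
sample = {
    "mysql_global_status_threads_connected": 40,
    "mysql_global_variables_max_connections": 200,
    "mysql_global_status_created_tmp_disk_tables": 25,
    "mysql_global_status_created_tmp_tables": 500,
}
print(mysql_ratios(sample))
```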

Redis Metrics

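Metric Description Unit Notes
redis_connected_clients Number of connected clients
redis_commands_processed_total Total number of commands processed
redis_keyspace_hits_total Number of key hits
redis_keyspace_misses_total Number of key misses
redis_memory_used_bytes Memory used by Redis
redis_memory_max_bytes Maximum memory for Redis
redis_net_input_bytes_total Total input traffic
redis_net_output_bytes_total Total output traffic
redis_db_keys Total number of keys
redis_db_keys_expiring Number of keys with an expiration set
redis_evicted_keys_total Number of evicted keys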
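
The hit and miss counters above are typically read together as a cache hit ratio, and used memory is typically compared against the configured maximum. This is a minimal sketch of both calculations; the function names are hypothetical, and treating a maximum of 0 as "no limit configured" is an assumption of this example.

```python
from typing import Optional


def redis_hit_ratio(hits: float, misses: float) -> float:
    """Fraction of key lookups that were hits (0.0 when there were none)."""
    total = hits + misses
    return hits / total if total else 0.0


def redis_memory_utilization(used_bytes: float, max_bytes: float) -> Optional[float]:
    """Fraction of the memory limit in use.

    Assumption: a max_bytes value of 0 means no limit is configured,
    in which case None is returned instead of a ratio.
    """
    if max_bytes <= 0:
        return None
    return used_bytes / max_bytes


# Hypothetical sample values.
print(redis_hit_ratio(hits=900, misses=100))
print(redis_memory_utilization(used_bytes=512 * 1024**2, max_bytes=2 * 1024**3))
```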

Node Metrics

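Metric Description Unit Notes
node_filesystem_size_bytes Size of the mounted filesystem
node_filesystem_avail_bytes Space available on the mounted filesystem (usable by non-root users)
node_filesystem_free_bytes Free space on the mounted filesystem
node_disk_reads_merged_total Number of merged reads completed per disk partition
node_disk_writes_merged_total Number of merged writes completed per disk partition
node_disk_reads_completed_total Number of reads completed per disk partition
node_disk_writes_completed_total Number of writes completed per disk partition
node_disk_read_bytes_total Number of bytes read per disk partition
node_disk_written_bytes_total Number of bytes written per disk partition
node_disk_io_time_seconds_total Total time spent on I/O per disk partition
node_cpu_seconds_total Time used by each individual CPU
node_memory_MemAvailable_bytes Available memory on the node
node_memory_MemTotal_bytes Total memory on the node
node_memory_SwapFree_bytes Free swap space on the node
node_memory_SwapTotal_bytes Total swap space on the node
node_load1 1-minute load average
node_load5 5-minute load average
node_load15 15-minute load average
node_filefd_allocated Number of file descriptors currently in use on the node
node_context_switches_total Total number of context switches
node_network_receive_bytes_total Inbound traffic on the network interface
node_network_transmit_bytes_total Outbound traffic on the network interface
node_disk_io_now Number of I/O operations currently in progress per disk partition
node_netstat_Tcp_CurrEstab Number of TCP connections currently in the ESTABLISHED state
node_sockstat_TCP_tw Number of TCP connections waiting to close (TIME_WAIT)
node_sockstat_sockets_used Total number of sockets in use across all protocols
node_sockstat_UDP_inuse Number of UDP sockets in use
node_sockstat_TCP_alloc Number of allocated TCP sockets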
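
Metrics ending in `_total` above are cumulative counters, so they are normally read as a per-second rate between two samples, while the memory gauges can be combined into a usage fraction. The sketch below shows both calculations in simplified form; `counter_rate` and `memory_used_fraction` are hypothetical helpers, and the counter-reset handling is a simplification rather than the exact behavior of any particular monitoring system.

```python
def counter_rate(v1: float, t1: float, v2: float, t2: float) -> float:
    """Per-second increase of a cumulative counter between two samples.

    (v1, t1) is the earlier sample and (v2, t2) the later one; timestamps
    are in seconds.
    """
    if t2 <= t1:
        raise ValueError("the second sample must be later than the first")
    delta = v2 - v1
    if delta < 0:
        # Simplified reset handling: the counter restarted from zero,
        # so the new value is taken as the increase.
        delta = v2
    return delta / (t2 - t1)


def memory_used_fraction(mem_available: float, mem_total: float) -> float:
    """Used memory fraction from node_memory_MemAvailable_bytes and
    node_memory_MemTotal_bytes."""
    return 1.0 - mem_available / mem_total


# Hypothetical samples of node_network_receive_bytes_total taken 30 s apart.
print(counter_rate(v1=1000, t1=0, v2=4000, t2=30))
print(memory_used_fraction(mem_available=4 * 1024**3, mem_total=16 * 1024**3))
```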

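Middle Platform Component Metrics

Metric Description Unit Notes
jvm_memory_bytes_used Used heap memory size
jvm_memory_used_bytes Used heap memory size
jvm_memory_bytes_committed Committed heap memory size
jvm_memory_committed_bytes Committed heap memory size
jvm_memory_bytes_max Maximum heap memory size
jvm_memory_max_bytes Maximum heap memory size
jvm_threads_current Current number of threads
jvm_threads_live Number of live threads
jvm_threads_live_threads Number of live threads
jvm_threads_daemon Number of daemon threads
jvm_threads_daemon_threads Number of daemon threads
jvm_threads_peak Peak number of threads
jvm_threads_peak_threads Peak number of threads
jvm_threads_deadlocked Number of deadlocked threads
jvm_threads_deadlocked_threads Number of deadlocked threads
jvm_memory_pool_bytes_used Used size of the JVM memory pool
jvm_memory_pool_bytes_committed Committed size of the JVM memory pool
jvm_memory_pool_bytes_max Maximum size of the JVM memory pool
jvm_gc_collection_seconds_count Total number of GCs
jvm_gc_pause_seconds_count Total number of GCs
jvm_gc_collection_seconds_sum Total GC time
jvm_gc_pause_seconds_sum Total GC time
jvm_gc_memory_allocated_bytes_total Increase in the size of the young generation memory pool from after one GC to before the next
jvm_gc_memory_promoted_bytes_total Count of positive increases in the size of the old generation from before a GC to after it
jvm_buffer_pool_used_bytes Used buffer pool size
jvm_buffer_pool_capacity_bytes Buffer pool capacity
jvm_buffer_pool_used_buffers Number of buffers in use in the buffer pool
jvm_buffer_memory_used_bytes Used buffer pool size
jvm_buffer_total_capacity_bytes Buffer pool capacity
jvm_buffer_count_buffers Number of buffers in the buffer pool
tomcat_global_error_total Total number of Tomcat global errors
tomcat_threads_config_max_threads Configured maximum number of Tomcat threads
tomcat_sessions_active_current Current number of active Tomcat sessions
tomcat_sessions_active_current_sessions Current number of active Tomcat sessions
tomcat_global_sent_bytes_total Total bytes sent by Tomcat globally
tomcat_global_received_bytes_total Total bytes received by Tomcat globally
http_server_requests_seconds_count Number of HTTP requests
tomcat_threads_current Current number of Tomcat threads
tomcat_threads_current_threads Current number of Tomcat threads
tomcat_threads_busy Current number of busy Tomcat threads
tomcat_threads_busy_threads Current number of busy Tomcat threads
http_server_requests_seconds_sum Total time spent on HTTP requests
jetty_threads_busy Current number of busy Jetty threads
jetty_threads_current Current number of Jetty threads
jetty_threads_config_max Configured maximum number of Jetty threads
system_cpu_usage System CPU usage
process_cpu_usage CPU usage of the current process
system_load_average_1m 1-minute system load average
jvm_threads_states_threads Number of threads in each state
logback_events_total Number of log events at each level
process_files_open_files Number of file descriptors opened by the current process
process_files_max_files Maximum number of file descriptors the current process can open
hikaricp_connections_max Maximum number of connections in the HikariCP pool
hikaricp_connections_min Minimum number of connections in the HikariCP pool
hikaricp_connections_timeout_total Total number of connection timeouts in the HikariCP pool
hikaricp_connections_active Number of active connections in the HikariCP pool
hikaricp_connections_idle Number of idle connections in the HikariCP pool
hikaricp_connections_pending Number of pending connections in the HikariCP pool
hikaricp_connections_usage_seconds_sum Total time connections from the HikariCP pool were in use
hikaricp_connections_usage_seconds_count Total number of times connections from the HikariCP pool were used
hikaricp_connections_creation_seconds_sum Total time spent creating connections in the HikariCP pool
hikaricp_connections_creation_seconds_count Total number of connections created in the HikariCP pool
hikaricp_connections_acquire_seconds_sum Total time spent acquiring connections from the HikariCP pool
hikaricp_connections_acquire_seconds_count Total number of connection acquisitions from the HikariCP pool
exception_count_total Total number of exceptions of each type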
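
Metrics that come in `_seconds_sum` and `_seconds_count` pairs, such as `http_server_requests_seconds` and the `hikaricp_connections_*_seconds` families above, can be turned into an average duration over an interval by dividing the increase of the sum by the increase of the count. The following is a minimal sketch of that arithmetic; the function name `average_duration` and the sample values are hypothetical.

```python
def average_duration(
    sum_before: float,
    count_before: float,
    sum_after: float,
    count_after: float,
) -> float:
    """Average duration in seconds of the events recorded between two samples.

    Each sample is the (_seconds_sum, _seconds_count) pair of one metric.
    Returns 0.0 when no new events were recorded in the interval.
    """
    delta_count = count_after - count_before
    if delta_count <= 0:
        return 0.0
    return (sum_after - sum_before) / delta_count


# Hypothetical samples of http_server_requests_seconds_sum / _count:
# 30 new requests that took 6 seconds in total.
print(average_duration(sum_before=10.0, count_before=100,
                       sum_after=16.0, count_after=130))
```

Using the increase between two samples rather than the raw totals keeps the result representative of recent traffic instead of the whole process lifetime.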