5.3 Metric Reference
Updated: 2023-02-03 18:39:42
Prometheus Metrics
The Prometheus server exposes its built-in metrics on the /metrics path, and the Grafana dashboards display them. You can view the raw metrics directly at http://localhost:9090/metrics
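As a rough illustration of reading that endpoint programmatically, the sketch below fetches the page and parses it into a dictionary. The function names `fetch_metrics` and `parse_metrics` are hypothetical helpers, and the parser assumes samples carry no trailing timestamp (a simplification of the text exposition format).

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse text exposition lines into {series: value}.

    Assumption: each sample line is "<name>{labels} <value>" with no
    trailing timestamp, so the value is the last space-separated field.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url="http://localhost:9090/metrics"):
    """Download the metrics page and parse it (URL taken from the text above)."""
    with urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running server would return entries such as `process_open_fds`; series with labels keep their full `name{...}` string as the key.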
Process metrics
Metric | Description |
---|---|
process_cpu_seconds_total | Total user and system CPU time spent in seconds. |
process_max_fds | Maximum number of open file descriptors. |
process_open_fds | Number of open file descriptors. |
process_resident_memory_bytes | Resident memory size in bytes. |
process_start_time_seconds | Start time of the process since unix epoch in seconds. |
process_virtual_memory_bytes | Virtual memory size in bytes. |
process_virtual_memory_max_bytes | Maximum amount of virtual memory available in bytes. |
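The process metrics above are usually read as derived values rather than raw numbers. A minimal sketch, assuming two successive samples of `process_cpu_seconds_total` taken a known interval apart (the helper names are hypothetical):

```python
def cpu_utilization(prev_seconds, curr_seconds, interval_seconds):
    """Average CPU cores used between two samples of process_cpu_seconds_total."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_seconds - prev_seconds
    if delta < 0:
        # The counter went backwards, so the process restarted and the
        # counter began again from zero.
        delta = curr_seconds
    return delta / interval_seconds


def fd_usage_ratio(open_fds, max_fds):
    """Fraction of the file descriptor limit in use (process_open_fds / process_max_fds)."""
    return open_fds / max_fds
```

For example, a counter that moves from 100 to 103 CPU-seconds over a 15-second scrape interval corresponds to 0.2 cores on average.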
Network metrics
Metric | Description |
---|---|
net_conntrack_dialer_conn_attempted_total | Total number of connections attempted by the given dialer a given name. |
net_conntrack_dialer_conn_closed_total | Total number of connections closed which originated from the dialer of a given name. |
net_conntrack_dialer_conn_established_total | Total number of connections successfully established by the given dialer a given name. |
net_conntrack_dialer_conn_failed_total | Total number of connections failed to dial by the dialer a given name. |
net_conntrack_listener_conn_accepted_total | Total number of connections opened to the listener of a given name. |
net_conntrack_listener_conn_closed_total | Total number of connections closed that were made to the listener of a given name. |
General metrics
Metric | Description |
---|---|
prometheus_api_remote_read_queries | The current number of remote read queries being executed or waiting. |
prometheus_build_info | A metric with a constant '1' value labeled by version, revision, branch, and goversion from which prometheus was built. |
prometheus_config_last_reload_success_timestamp_seconds | Timestamp of the last successful configuration reload. |
prometheus_config_last_reload_successful | Whether the last configuration reload attempt was successful. |
prometheus_engine_queries_concurrent_max | The max number of concurrent queries. |
prometheus_engine_queries | The current number of queries being executed or waiting. |
prometheus_engine_query_duration_seconds | Query timings |
prometheus_engine_query_log_enabled | State of the query log. |
prometheus_engine_query_log_failures_total | The number of query log failures. |
prometheus_http_request_duration_seconds | Histogram of latencies for HTTP requests. |
prometheus_http_requests_total | Counter of HTTP requests. |
prometheus_http_response_size_bytes | Histogram of response size for HTTP requests. |
prometheus_notifications_alertmanagers_discovered | The number of alertmanagers discovered and active. |
prometheus_notifications_dropped_total | Total number of alerts dropped due to errors when sending to Alertmanager. |
prometheus_notifications_queue_capacity | The capacity of the alert notifications queue. |
prometheus_notifications_queue_length | The number of alert notifications in the queue. |
prometheus_remote_storage_highest_timestamp_in_seconds | Highest timestamp that has come into the remote storage via the Appender interface, in seconds since epoch. |
prometheus_remote_storage_samples_in_total | Samples in to remote storage, compare to samples out for queue managers. |
prometheus_remote_storage_string_interner_zero_reference_releases_total | The number of times release has been called for strings that are not interned. |
prometheus_rule_evaluation_duration_seconds | The duration for a rule to execute. |
prometheus_rule_evaluation_failures_total | The total number of rule evaluation failures. |
prometheus_rule_evaluations_total | The total number of rule evaluations. |
prometheus_rule_group_duration_seconds | The duration of rule group evaluations. |
prometheus_rule_group_iterations_missed_total | The total number of rule group evaluations missed due to slow rule group evaluation. |
prometheus_rule_group_iterations_total | The total number of scheduled rule group evaluations, whether executed or missed. |
prometheus_template_text_expansion_failures_total | The total number of template text expansion failures. |
prometheus_template_text_expansions_total | The total number of template text expansions. |
prometheus_treecache_watcher_goroutines | The current number of watcher goroutines. |
prometheus_treecache_zookeeper_failures_total | The total number of ZooKeeper failures. |
promhttp_metric_handler_requests_in_flight | Current number of scrapes being served. |
promhttp_metric_handler_requests_total | Total number of scrapes by HTTP status code. |
HTTP metrics
Metric | Description |
---|---|
prometheus_http_request_duration_seconds | Histogram of latencies for HTTP requests. |
prometheus_http_requests_total | Counter of HTTP requests. |
prometheus_http_response_size_bytes | Histogram of response size for HTTP requests. |
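Histogram metrics such as `prometheus_http_request_duration_seconds` are exposed as cumulative `_bucket` counts keyed by an upper bound `le`. The sketch below estimates a quantile from such buckets by linear interpolation, in the spirit of PromQL's `histogram_quantile`; it is an illustrative approximation, not the server's own implementation, and the bucket values in the usage note are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (le, cumulative_count) pairs.

    The list must include the +Inf bucket, whose count is the total.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le) or count == prev_count:
                # Either the quantile falls in the +Inf bucket or the
                # bucket is empty: report the last known bound.
                return prev_le
            # Interpolate linearly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_le + (le - prev_le) * fraction
        prev_le, prev_count = le, count
    return prev_le
```

With buckets `[(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]`, the 0.95 quantile falls halfway through the 0.5 to 1.0 bucket.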
Service discovery metrics
Metric | Description |
---|---|
prometheus_sd_consul_rpc_duration_seconds | The duration of a Consul RPC call in seconds. |
prometheus_sd_consul_rpc_failures_total | The number of Consul RPC call failures. |
prometheus_sd_discovered_targets | Current number of discovered targets. |
prometheus_sd_dns_lookup_failures_total | The number of DNS-SD lookup failures. |
prometheus_sd_dns_lookups_total | The number of DNS-SD lookups. |
prometheus_sd_failed_configs | Current number of service discovery configurations that failed to load. |
prometheus_sd_file_read_errors_total | The number of File-SD read errors. |
prometheus_sd_file_scan_duration_seconds | The duration of the File-SD scan in seconds. |
prometheus_sd_kubernetes_events_total | The number of Kubernetes events handled. |
prometheus_sd_received_updates_total | Total number of update events received from the SD providers. |
prometheus_sd_updates_total | Total number of update events sent to the SD consumers. |
Target metrics
Metric | Description |
---|---|
prometheus_target_metadata_cache_bytes | The number of bytes that are currently used for storing metric metadata in the cache |
prometheus_target_metadata_cache_entries | Total number of metric metadata entries in the cache |
prometheus_target_scrape_pool_reloads_failed_total | Total number of failed scrape loop reloads. |
prometheus_target_scrape_pool_reloads_total | Total number of scrape loop reloads. |
prometheus_target_scrape_pool_sync_total | Total number of syncs that were executed on a scrape pool. |
prometheus_target_scrape_pools_failed_total | Total number of scrape pool creations that failed. |
prometheus_target_scrape_pools_total | Total number of scrape pool creation attempts. |
prometheus_target_scrapes_cache_flush_forced_total | How many times a scrape cache was flushed due to getting big while scrapes are failing. |
prometheus_target_scrapes_exceeded_sample_limit_total | Total number of scrapes that hit the sample limit and were rejected. |
prometheus_target_scrapes_sample_duplicate_timestamp_total | Total number of samples rejected due to duplicate timestamps but different values |
prometheus_target_scrapes_sample_out_of_bounds_total | Total number of samples rejected due to timestamp falling outside of the time bounds |
prometheus_target_scrapes_sample_out_of_order_total | Total number of samples rejected due to being out of the expected order |
prometheus_target_sync_length_seconds | Actual interval to sync the scrape pool. |
TSDB metrics
Metric | Description |
---|---|
prometheus_tsdb_blocks_loaded | Number of currently loaded data blocks |
prometheus_tsdb_checkpoint_creations_failed_total | Total number of checkpoint creations that failed. |
prometheus_tsdb_checkpoint_creations_total | Total number of checkpoint creations attempted. |
prometheus_tsdb_checkpoint_deletions_failed_total | Total number of checkpoint deletions that failed. |
prometheus_tsdb_checkpoint_deletions_total | Total number of checkpoint deletions attempted. |
prometheus_tsdb_compaction_chunk_range_seconds | Final time range of chunks on their first compaction |
prometheus_tsdb_compaction_chunk_samples | Final number of samples on their first compaction |
prometheus_tsdb_compaction_chunk_size_bytes | Final size of chunks on their first compaction |
prometheus_tsdb_compaction_duration_seconds | Duration of compaction runs |
prometheus_tsdb_compaction_populating_block | Set to 1 when a block is currently being written to the disk. |
prometheus_tsdb_compactions_failed_total | Total number of compactions that failed for the partition. |
prometheus_tsdb_compactions_skipped_total | Total number of skipped compactions due to disabled auto compaction. |
prometheus_tsdb_compactions_total | Total number of compactions that were executed for the partition. |
prometheus_tsdb_compactions_triggered_total | Total number of triggered compactions for the partition. |
prometheus_tsdb_head_active_appenders | Number of currently active appender transactions |
prometheus_tsdb_head_chunks | Total number of chunks in the head block. |
prometheus_tsdb_head_chunks_created_total | Total number of chunks created in the head |
prometheus_tsdb_head_chunks_removed_total | Total number of chunks removed in the head |
prometheus_tsdb_head_gc_duration_seconds | Runtime of garbage collection in the head block. |
prometheus_tsdb_head_max_time | Maximum timestamp of the head block. The unit is decided by the library consumer. |
prometheus_tsdb_head_max_time_seconds | Maximum timestamp of the head block. |
prometheus_tsdb_head_min_time | Minimum time bound of the head block. The unit is decided by the library consumer. |
prometheus_tsdb_head_min_time_seconds | Minimum time bound of the head block. |
prometheus_tsdb_head_samples_appended_total | Total number of appended samples. |
prometheus_tsdb_head_series | Total number of series in the head block. |
prometheus_tsdb_head_series_created_total | Total number of series created in the head |
prometheus_tsdb_head_series_not_found_total | Total number of requests for series that were not found. |
prometheus_tsdb_head_series_removed_total | Total number of series removed in the head |
prometheus_tsdb_head_truncations_failed_total | Total number of head truncations that failed. |
prometheus_tsdb_head_truncations_total | Total number of head truncations attempted. |
prometheus_tsdb_lowest_timestamp | Lowest timestamp value stored in the database. The unit is decided by the library consumer. |
prometheus_tsdb_lowest_timestamp_seconds | Lowest timestamp value stored in the database. |
prometheus_tsdb_reloads_failures_total | Number of times the database failed to reload block data from disk. |
prometheus_tsdb_reloads_total | Number of times the database reloaded block data from disk. |
prometheus_tsdb_retention_limit_bytes | Max number of bytes to be retained in the tsdb blocks, configured 0 means disabled |
prometheus_tsdb_size_retentions_total | The number of times that blocks were deleted because the maximum number of bytes was exceeded. |
prometheus_tsdb_storage_blocks_bytes | The number of bytes that are currently used for local storage by all blocks. |
prometheus_tsdb_symbol_table_size_bytes | Size of symbol table on disk (in bytes) |
prometheus_tsdb_time_retentions_total | The number of times that blocks were deleted because the maximum time limit was exceeded. |
prometheus_tsdb_tombstone_cleanup_seconds | The time taken to recompact blocks to remove tombstones. |
prometheus_tsdb_vertical_compactions_total | Total number of compactions done on overlapping blocks. |
prometheus_tsdb_wal_completed_pages_total | Total number of completed pages. |
prometheus_tsdb_wal_corruptions_total | Total number of WAL corruptions. |
prometheus_tsdb_wal_fsync_duration_seconds | Duration of WAL fsync. |
prometheus_tsdb_wal_page_flushes_total | Total number of page flushes. |
prometheus_tsdb_wal_segment_current | WAL segment index that TSDB is currently writing to. |
prometheus_tsdb_wal_truncate_duration_seconds | Duration of WAL truncation. |
prometheus_tsdb_wal_truncations_failed_total | Total number of WAL truncations that failed. |
prometheus_tsdb_wal_truncations_total | Total number of WAL truncations attempted. |
prometheus_tsdb_wal_writes_failed_total | Total number of WAL writes that failed. |
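One way to combine two of the TSDB metrics above is to compare `prometheus_tsdb_storage_blocks_bytes` with `prometheus_tsdb_retention_limit_bytes`. A minimal sketch, with a hypothetical helper name, that respects the "0 means disabled" convention stated in the table:

```python
def retention_usage(storage_blocks_bytes, retention_limit_bytes):
    """Fraction of the size-based retention limit currently used.

    Returns None when the limit is 0, which the metric description
    defines as size-based retention being disabled.
    """
    if retention_limit_bytes == 0:
        return None
    return storage_blocks_bytes / retention_limit_bytes
```

A value approaching 1.0 suggests that size-based block deletion (counted by `prometheus_tsdb_size_retentions_total`) is about to start.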
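Grafana Metrics
Grafana can be configured to expose its built-in metrics, which the Prometheus server scrapes and the Grafana dashboards then display. You can view the raw metrics directly at http://localhost:3000/metrics
Active instance metrics
Metric | Description |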
---|---|
grafana_instance_start_total | Number of instances started |
Dashboard, user, and playlist metrics
Metric | Description |
---|---|
grafana_stat_active_users | Number of active users |
grafana_stat_total_orgs | Total number of organizations |
grafana_stat_total_playlists | Total number of playlists |
grafana_stat_total_users | Total number of users |
grafana_stat_totals_active_admins | Total number of active admins |
grafana_stat_totals_active_editors | Total number of active editors |
grafana_stat_totals_active_viewers | Total number of active viewers |
grafana_stat_totals_admins | Total number of admins |
grafana_stat_totals_annotations | Total number of annotations |
grafana_stat_totals_dashboard | Total number of dashboards |
grafana_stat_totals_dashboard_versions | Total number of dashboard versions |
grafana_stat_totals_datasource | Total number of data sources |
grafana_stat_totals_editors | Total number of editors |
grafana_stat_totals_viewers | Total number of viewers |
grafana_build_info | Grafana build information |
grafana_plugin_build_info | Build information for Grafana plugins |
grafana_rendering_queue_size | Length of the image rendering queue |
HTTP metrics
Metric | Labels | Description |
---|---|---|
http_request_duration_milliseconds | handler method statuscode quantile | Request duration by quantile, e.g. handler="/",method="get",statuscode="200",quantile="0.5" |
http_request_duration_milliseconds_count | handler method statuscode | Number of requests, e.g. handler="/*",method="get",statuscode="200" |
http_request_duration_milliseconds_sum | handler method statuscode | Total request duration |
http_request_total | handler method statuscode | Total number of requests |
http_request_in_flight | - | Number of requests currently being served |
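The `_sum` and `_count` series of a summary such as `http_request_duration_milliseconds` can be combined to get a mean duration over a time window. The following is a small sketch with a hypothetical helper; it assumes two samples of each series taken at the start and end of the window.

```python
def mean_duration_ms(sum_prev, count_prev, sum_curr, count_curr):
    """Mean request duration between two samples of the _sum and _count series.

    Returns None when no requests completed in the window (or when the
    counters were reset), since the mean is undefined in that case.
    """
    delta_count = count_curr - count_prev
    if delta_count <= 0:
        return None
    return (sum_curr - sum_prev) / delta_count
```

For instance, if `_sum` grows from 1000 to 1600 ms while `_count` grows from 10 to 14, the four requests in the window averaged 150 ms.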
Requests by routing group
Metric | Labels | Description |
---|---|---|
grafana_database_queries_duration_seconds_bucket | le | Cumulative number of database queries whose duration is at most le, e.g. {le="0.0001"} |
grafana_database_queries_duration_seconds_count | - | Number of database queries |
grafana_database_queries_duration_seconds_sum | - | Total database query duration |
grafana_datasource_request_duration_seconds | code datasource method quantile | Data source request duration by quantile, e.g. {code="200",datasource="easyops",method="get",quantile="0.5"} |
grafana_datasource_request_duration_seconds_count | code datasource method | Number of requests to the data source, e.g. {code="200",datasource="easyops",method="get"} |
grafana_datasource_request_duration_seconds_sum | code datasource method | Total request duration for the data source |
grafana_datasource_request_in_flight | datasource | Number of in-flight requests to the data source, e.g. {datasource="easyops"} |
grafana_datasource_request_total | code datasource method | Total number of data source requests, e.g. {code="404",datasource="easyops",method="get"} |
grafana_datasource_response_size_bytes | datasource quantile | Data source response size in bytes by quantile, e.g. {datasource="easyops",quantile="0.5"} |
grafana_datasource_response_size_bytes_count | datasource | Number of data source responses, e.g. {datasource="easyops"} |
grafana_datasource_response_size_bytes_sum | datasource | Total bytes in data source responses |
grafana_db_datasource_query_by_id_total | - | Total number of data source lookups by ID |
Active alert metrics
Metric | Labels | Description |
---|---|---|
grafana_alerting_active_alerts | - | Number of active alerts |
grafana_alerting_execution_time_milliseconds | quantile | Alert execution time by quantile, e.g. {quantile="0.5"} |
grafana_alerting_execution_time_milliseconds_count | - | Number of alert executions |
grafana_alerting_execution_time_milliseconds_sum | - | Total alert execution time |
Performance metrics
Metric | Labels | Description |
---|---|---|
grafana_api_admin_user_created_total | - | Number of admin accounts created |
grafana_api_dashboard_get_milliseconds | quantile | Dashboard retrieval time by quantile, e.g. {quantile="0.5"} |
grafana_api_dashboard_get_milliseconds_count | - | Number of dashboard retrievals |
grafana_api_dashboard_get_milliseconds_sum | - | Total dashboard retrieval time |
grafana_api_dashboard_save_milliseconds | quantile | Dashboard save time by quantile, e.g. {quantile="0.5"} |
grafana_api_dashboard_save_milliseconds_count | - | Number of dashboard saves |
grafana_api_dashboard_save_milliseconds_sum | - | Total dashboard save time |
grafana_api_dashboard_search_milliseconds | quantile | Dashboard search time by quantile, e.g. {quantile="0.99"} |
grafana_api_dashboard_search_milliseconds_count | - | Number of dashboard searches |
grafana_api_dashboard_search_milliseconds_sum | - | Total dashboard search time |
grafana_api_dashboard_snapshot_create_total | - | Total number of dashboard snapshots created |
grafana_api_dashboard_snapshot_external_total | - | Total number of external dashboard snapshots |
grafana_api_dashboard_snapshot_get_total | - | Total number of dashboard snapshot retrievals |
grafana_api_dataproxy_request_all_milliseconds | quantile | Data proxy request time by quantile, e.g. {quantile="0.9"} |
grafana_api_dataproxy_request_all_milliseconds_count | - | Number of data proxy requests |
grafana_api_dataproxy_request_all_milliseconds_sum | - | Total data proxy request time |
grafana_api_login_oauth_total | - | Number of OAuth logins |
grafana_api_login_post_total | - | Number of logins via POST |
grafana_api_login_saml_total | - | Number of SAML logins |
grafana_api_models_dashboard_insert_total | - | Number of dashboard inserts |
grafana_api_org_create_total | - | Number of organizations created |
grafana_api_response_status_total | code | Number of API responses by status code, e.g. {code="200"} |
grafana_page_response_status_total | code | Number of page responses by status code, e.g. {code="500"} |
grafana_proxy_response_status_total | code | Number of proxy responses by status code, e.g. {code="404"} |
grafana_api_user_signup_completed_total | - | Number of users who completed sign-up |
grafana_api_user_signup_invite_total | - | Number of users invited to sign up |
grafana_api_user_signup_started_total | - | Number of users who started the sign-up flow |
grafana_ldap_users_sync_execution_time | quantile | LDAP user sync execution time by quantile, e.g. {quantile="0.9"} |
grafana_ldap_users_sync_execution_time_count | - | Number of LDAP user syncs |
grafana_ldap_users_sync_execution_time_sum | - | Total LDAP user sync execution time |
HDFS Metrics
Metric | Description | Unit | Notes |
---|---|---|---|
namenode_rpc_rpc_RpcProcessingTimeAvgTime | Average RPC processing time on port 8020 | ms | |
namenode_rpc_rpc_RpcProcessingTimeNumOps | Number of RPC requests on port 8020 | | |
namenode_rpc_rpc_ReceivedBytes | Bytes received on port 8020 | byte | |
namenode_rpc_rpc_SentBytes | Bytes sent on port 8020 | byte | |
namenode_rpc_rpc_RpcQueueTimeAvgTime | Average RPC queue time on port 8020 | ms | |
namenode_rpc_rpc_RpcQueueTimeNumOps | Number of RPC requests on port 8020 | | |
namenode_rpc_rpc_CallQueueLength | Call queue length on port 8020 | | |
namenode_rpcdetailed_rpcdetailed_GetContentSummaryAvgTime | Average GetContentSummary call time | ms | |
namenode_rpcdetailed_rpcdetailed_MkdirsAvgTime | Average Mkdirs call time | ms | |
namenode_rpcdetailed_rpcdetailed_DeleteAvgTime | Average Delete call time | ms | |
namenode_rpcdetailed_rpcdetailed_CreateAvgTime | Average Create call time | ms | |
namenode_rpcdetailed_rpcdetailed_CompleteAvgTime | Average Complete call time | ms | |
namenode_dfs_namenode_GetBlockLocationsAvgTime | Average GetBlockLocations call time | ms | |
namenode_rpcdetailed_rpcdetailed_GetFileInfoAvgTime | Average GetFileInfo call time | ms | |
namenode_rpcdetailed_rpcdetailed_GetListingAvgTime | Average GetListing call time | ms | |
namenode_rpcdetailed_rpcdetailed_AddBlockAvgTime | Average AddBlock call time | ms | |
namenode_rpcdetailed_rpcdetailed_BlockReportAvgTime | Average BlockReport call time | ms | |
namenode_rpcdetailed_rpcdetailed_MonitorHealthAvgTime | Average MonitorHealth call time | ms | |
namenode_rpcdetailed_rpcdetailed_BlockReceivedAndDeletedAvgTime | Average BlockReceivedAndDeleted call time | ms | |
namenode_rpcdetailed_rpcdetailed_AddBlockNumOps | Number of AddBlock calls | | |
namenode_rpcdetailed_rpcdetailed_MkdirsNumOps | Number of Mkdirs calls | | |
namenode_rpcdetailed_rpcdetailed_DeleteNumOps | Number of Delete calls | | |
namenode_rpcdetailed_rpcdetailed_CreateNumOps | Number of Create calls | | |
namenode_rpcdetailed_rpcdetailed_CompleteNumOps | Number of Complete calls | | |
namenode_dfs_namenode_GetBlockLocationsNumOps | Total number of GetBlockLocations operations | | |
namenode_rpcdetailed_rpcdetailed_GetFileInfoNumOps | Number of GetFileInfo calls | | |
namenode_rpcdetailed_rpcdetailed_GetListingNumOps | Number of GetListing calls | | |
namenode_rpcdetailed_rpcdetailed_GetContentSummaryNumOps | Number of GetContentSummary calls | | |
namenode_rpcdetailed_rpcdetailed_BlockReceivedAndDeletedNumOps | Number of BlockReceivedAndDeleted calls | | |
namenode_rpcdetailed_rpcdetailed_BlockReportNumOps | Number of BlockReport calls | | |
namenode_rpcdetailed_rpcdetailed_MonitorHealthNumOps | Number of MonitorHealth calls | | |
namenode_jvm_JvmMetrics_ThreadsBlocked | Number of threads in the BLOCKED state | | |
namenode_jvm_JvmMetrics_ThreadsNew | Number of threads in the NEW state | | |
namenode_jvm_JvmMetrics_ThreadsRunnable | Number of threads in the RUNNABLE state | | |
namenode_jvm_JvmMetrics_ThreadsTerminated | Number of threads in the TERMINATED state | | |
namenode_jvm_JvmMetrics_ThreadsTimedWaiting | Number of threads in the TIMED_WAITING state | | |
namenode_jvm_JvmMetrics_ThreadsWaiting | Number of threads in the WAITING state | | |
namenode_jvm_JvmMetrics_LogError | Number of ERROR log events | | |
namenode_jvm_JvmMetrics_LogFatal | Number of FATAL log events | | |
namenode_jvm_JvmMetrics_LogInfo | Number of INFO log events | | |
namenode_jvm_JvmMetrics_LogWarn | Number of WARN log events | | |
namenode_jvm_JvmMetrics_MemHeapUsedM | Heap memory used | MB | |
namenode_jvm_JvmMetrics_MemHeapCommittedM | Heap memory committed | MB | |
namenode_jvm_JvmMetrics_MemHeapMaxM | Maximum heap memory | MB | |
namenode_jvm_JvmMetrics_MemNonHeapUsedM | Non-heap memory used | MB | |
namenode_jvm_JvmMetrics_MemNonHeapCommittedM | Non-heap memory committed | MB | |
namenode_jvm_JvmMetrics_MemNonHeapMaxM | Maximum non-heap memory | MB | |
namenode_jvm_JvmMetrics_GcCountConcurrentMarkSweep | Number of CMS GCs | | |
namenode_jvm_JvmMetrics_GcCountParNew | Number of ParNew GCs | | Used together with CMS; mainly collects the young generation |
namenode_jvm_JvmMetrics_GcTimeMillisConcurrentMarkSweep | CMS GC time | ms | |
namenode_jvm_JvmMetrics_GcTimeMillisParNew | ParNew GC time | ms | |
namenode_rpc_RetryCache_NameNodeRetryCache_CacheHit | Number of retry cache hits | | |
namenode_rpc_RetryCache_NameNodeRetryCache_CacheUpdated | Number of retry cache updates | | |
namenode_rpc_RetryCache_NameNodeRetryCache_CacheCleared | Number of retry cache clears | | |
namenode_dfs_FSNamesystem_LastCheckpointTime | Time of the last checkpoint | | |
namenode_dfs_FSNamesystem_CapacityTotal | Current total capacity | | |
namenode_dfs_FSNamesystem_NumLiveDataNodes | Number of live datanodes | | |
namenode_dfs_FSNamesystem_NumDeadDataNodes | Number of dead datanodes | | |
namenode_dfs_FSNamesystem_VolumeFailuresTotal | Number of failed volumes | | |
namenode_dfs_FSNamesystem_CapacityUsedNonDFS | Capacity currently used by non-DFS data | | |
namenode_dfs_FSNamesystem_BlocksTotal | Current number of blocks | | |
namenode_dfs_FSNamesystem_MissingBlocks | Current number of missing blocks | | |
namenode_dfs_FSNamesystem_ExpiredHeartbeats | Number of expired heartbeats | | |
namenode_dfs_FSNamesystem_TransactionsSinceLastLogRoll | Number of transactions since the last edit log roll | | |
namenode_dfs_FSNamesystem_TransactionsSinceLastCheckpoint | Total number of transactions since the last checkpoint | | |
namenode_dfs_FSNamesystem_SnapshottableDirectories | Number of snapshottable directories | | |
namenode_dfs_FSNamesystem_TotalLoad | Current total number of connections | | |
namenode_dfs_FSNamesystem_FilesTotal | Current total number of files and directories | | |
namenode_dfs_FSNamesystem_StaleDataNodes | Number of datanodes marked stale because of delayed heartbeats | | |
namenode_dfs_namenode_CreateFileOps | Number of file creation operations | | |
namenode_dfs_namenode_FilesCreated | Number of files and directories created by create or mkdir operations | | |
namenode_dfs_namenode_FilesAppended | Number of file append operations | | |
namenode_dfs_namenode_GetBlockLocations | Number of GetBlockLocations operations | | |
namenode_dfs_namenode_GetListingOps | Number of directory listing operations | | |
namenode_dfs_namenode_DeleteFileOps | Number of file deletion operations | | |
namenode_dfs_namenode_FilesDeleted | Number of files and directories deleted by delete or rename operations | | |
namenode_dfs_namenode_FileInfoOps | Number of getFileInfo and getLinkFileInfo operations | | |
namenode_dfs_namenode_FilesRenamed | Number of file rename operations | | |
namenode_dfs_namenode_GetAdditionalDatanodeOps | Number of GetAdditionalDatanode operations | | |
namenode_dfs_namenode_AddBlockOps | Number of addBlock operations on the HDFS NameNode | | |
datanode_rpc_rpc_NumOpenConnections | Number of open connections on the datanode | | |
datanode_rpc_rpc_NumDroppedConnections | Number of connections dropped by the datanode | | |
datanode_rpc_rpc_RpcProcessingTimeAvgTime | Average RPC processing time on the datanode | ms | |
datanode_rpc_rpc_RpcProcessingTimeNumOps | Number of RPC requests on the datanode | | |
datanode_rpc_rpc_CallQueueLength | Call queue length on the datanode | | |
datanode_rpc_rpc_ReceivedBytes | Bytes received by the datanode | byte | |
datanode_rpc_rpc_SentBytes | Bytes sent by the datanode | byte | |
datanode_rpc_rpc_DeferredRpcProcessingTimeAvgTime | Average deferred RPC processing time on the datanode | ms | |
datanode_rpc_rpc_RpcQueueTimeAvgTime | Average RPC queue time on the datanode | ms | |
datanode_rpc_rpc_DeferredRpcProcessingTimeNumOps | Number of deferred RPC calls on the datanode | | |
datanode_jvm_JvmMetrics_MemHeapCommittedM | Datanode heap memory committed | MB | |
datanode_jvm_JvmMetrics_MemHeapMaxM | Datanode maximum heap memory | MB | |
datanode_jvm_JvmMetrics_MemHeapUsedM | Datanode heap memory used | MB | |
datanode_jvm_JvmMetrics_MemNonHeapCommittedM | Datanode non-heap memory committed | MB | |
datanode_jvm_JvmMetrics_MemNonHeapMaxM | Datanode maximum non-heap memory | MB | |
datanode_jvm_JvmMetrics_MemNonHeapUsedM | Datanode non-heap memory used | MB | |
datanode_jvm_JvmMetrics_ThreadsBlocked | Number of threads in the BLOCKED state | | |
datanode_jvm_JvmMetrics_ThreadsNew | Number of threads in the NEW state | | |
datanode_jvm_JvmMetrics_ThreadsRunnable | Number of threads in the RUNNABLE state | | |
datanode_jvm_JvmMetrics_ThreadsTerminated | Number of threads in the TERMINATED state | | |
datanode_jvm_JvmMetrics_ThreadsTimedWaiting | Number of threads in the TIMED_WAITING state | | |
datanode_jvm_JvmMetrics_ThreadsWaiting | Number of threads in the WAITING state | | |
datanode_jvm_JvmMetrics_LogError | Number of ERROR log events | | |
datanode_jvm_JvmMetrics_LogFatal | Number of FATAL log events | | |
datanode_jvm_JvmMetrics_LogInfo | Number of INFO log events | | |
datanode_jvm_JvmMetrics_LogWarn | Number of WARN log events | | |
datanode_dfs_datanode_BlockChecksumOpNumOps | Number of block checksum operations | | |
datanode_dfs_datanode_BlockReportsNumOps | Number of block report operations | | |
datanode_dfs_datanode_CopyBlockOpNumOps | Number of block copy operations | | |
datanode_dfs_datanode_IncrementalBlockReportsNumOps | Number of incremental block report operations | | |
datanode_dfs_datanode_ReadBlockOpNumOps | Number of read operations | | |
datanode_dfs_datanode_ReplaceBlockOpNumOps | Number of block replace operations | | |
datanode_dfs_datanode_WriteBlockOpNumOps | Number of write operations | | |
datanode_dfs_datanode_BlockChecksumOpAvgTime | Average time of block checksum operations | ms | |
datanode_dfs_datanode_BlockReportsAvgTime | Average time of block report operations | ms | |
datanode_dfs_datanode_CopyBlockOpAvgTime | Average time of block copy operations | ms | |
datanode_dfs_datanode_WriteBlockOpAvgTime | Average time of write operations | ms | |
datanode_dfs_datanode_ReadBlockOpAvgTime | Average time of read operations | ms | |
datanode_dfs_datanode_ReplaceBlockOpAvgTime | Average time of block replace operations | ms | |
datanode_dfs_datanode_IncrementalBlockReportsAvgTime | Average time of incremental block report operations | ms | |
datanode_dfs_FsVolume_DataFileIoRateNumOps | Number of data file I/O operations within the interval | | |
datanode_dfs_FsVolume_FileIoErrorRateNumOps | Number of file I/O errors within the interval | | |
datanode_dfs_FsVolume_FlushIoRateNumOps | Number of file flush I/O operations within the interval | | |
datanode_dfs_FsVolume_MetadataOperationRateNumOps | Number of metadata operations within the interval | | |
datanode_dfs_FsVolume_ReadIoRateNumOps | Number of file read operations within the interval | | |
datanode_dfs_FsVolume_SyncIoRateNumOps | Number of file sync operations within the interval | | |
datanode_dfs_FsVolume_WriteIoRateNumOps | Number of file write operations within the interval | | |
datanode_dfs_FsVolume_DataFileIoRateAvgTime | Average time of data file operations | ms | |
datanode_dfs_FsVolume_FileIoErrorRateAvgTime | Average time from the start of an operation to its failure | ms | |
datanode_dfs_FsVolume_FlushIoRateAvgTime | Average time of file flush I/O operations | ms | |
datanode_dfs_FsVolume_MetadataOperationRateAvgTime | Average time of metadata operations | ms | |
datanode_dfs_FsVolume_ReadIoRateAvgTime | Average time of file read operations | ms | |
datanode_dfs_FsVolume_SyncIoRateAvgTime | Average time of file sync operations | ms | |
datanode_dfs_FsVolume_WriteIoRateAvgTime | Average time of file write operations | ms | |
datanode_ugi_UgiMetrics_LoginSuccessNumOps | Total number of successful Kerberos logins | | |
datanode_ugi_UgiMetrics_LoginFailureNumOps | Total number of failed Kerberos logins | | |
datanode_ugi_UgiMetrics_LoginSuccessAvgTime | Average time of successful Kerberos logins | ms | |
datanode_ugi_UgiMetrics_LoginFailureAvgTime | Average time of failed Kerberos logins | ms | |
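Several of the HDFS metrics above are most useful as ratios. The sketch below derives a few NameNode health indicators from a dictionary of already-collected values; the function name and the shape of the input dictionary are assumptions made for illustration.

```python
def namenode_health(metrics):
    """Derive simple health indicators from NameNode metric values.

    `metrics` maps metric names from the table above to numeric values.
    """
    heap_used = metrics["namenode_jvm_JvmMetrics_MemHeapUsedM"]
    heap_max = metrics["namenode_jvm_JvmMetrics_MemHeapMaxM"]
    live = metrics["namenode_dfs_FSNamesystem_NumLiveDataNodes"]
    dead = metrics["namenode_dfs_FSNamesystem_NumDeadDataNodes"]
    total_nodes = live + dead
    return {
        # Share of the maximum heap currently in use.
        "heap_usage": heap_used / heap_max if heap_max > 0 else None,
        # Share of known datanodes that are reported dead.
        "dead_datanode_ratio": dead / total_nodes if total_nodes > 0 else None,
        # Any missing block usually deserves immediate attention.
        "has_missing_blocks": metrics["namenode_dfs_FSNamesystem_MissingBlocks"] > 0,
    }
```

Which thresholds to alert on (for example, heap usage above some fraction) depends on the cluster and is left to the operator.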
YARN Metrics
https://hadoop.apache.org/docs/r2.9.2/hadoop-project-dist/hadoop-common/Metrics.html
Metric | Description | Unit | Notes |
---|---|---|---|
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumActiveNMs | Number of NodeManagers in the active state | | |
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumUnhealthyNMs | Number of NodeManagers in the unhealthy state | | |
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumShutdownNMs | Number of NodeManagers in the shutdown state | | |
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumRebootedNMs | Number of NodeManagers in the rebooted state | | |
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumLostNMs | Number of NodeManagers in the lost state | | |
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumDecommissioningNMs | Number of NodeManagers in the decommissioning state | | |
resourcemanager_yarn_QueueMetrics_AvailableVCores | VCores currently available to the queue | | |
resourcemanager_yarn_QueueMetrics_AvailableMB | Memory currently available to the queue | | |
resourcemanager_yarn_QueueMetrics_AbsoluteUsedCapacity | Resource utilization of the queue | | |
resourcemanager_yarn_QueueMetrics_AllocatedVCores | VCores allocated in the queue | | |
resourcemanager_yarn_QueueMetrics_PendingVCores | VCores pending in the queue | | |
resourcemanager_yarn_QueueMetrics_AllocatedMB | Memory allocated in the queue | | |
resourcemanager_yarn_QueueMetrics_PendingMB | Memory pending in the queue | | |
resourcemanager_yarn_QueueMetrics_AppsRunning | Number of running applications in the queue | | |
resourcemanager_yarn_QueueMetrics_AppsFailed | Number of failed applications in the queue | | |
resourcemanager_yarn_QueueMetrics_AppsPending | Number of pending applications in the queue | | |
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_AMLaunchDelayNumOps | Total number of AM launches | | |
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_AMRegisterDelayNumOps | Total number of AM registrations | | |
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_AMLaunchDelayAvgTime | Average AM launch delay | ms | |
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_AMRegisterDelayAvgTime | Average AM registration delay | ms | |
resourcemanager_jvm_JvmMetrics_GcCount | Total number of GCs in the RM | | |
resourcemanager_jvm_JvmMetrics_GcNumInfoThresholdExceeded | Number of times the RM exceeded the GC info threshold | | |
resourcemanager_jvm_JvmMetrics_GcNumWarnThresholdExceeded | Number of times the RM exceeded the GC warn threshold | | |
resourcemanager_jvm_JvmMetrics_GcTotalExtraSleepTime | Extra sleep time attributed to GC | | |
resourcemanager_jvm_JvmMetrics_GcTimeMillis | GC time | | |
resourcemanager_jvm_JvmMetrics_LogError | Number of ERROR log events | | |
resourcemanager_jvm_JvmMetrics_LogFatal | Number of FATAL log events | | |
resourcemanager_jvm_JvmMetrics_LogWarn | Number of WARN log events | | |
resourcemanager_jvm_JvmMetrics_ThreadsWaiting | Number of threads in the WAITING state | | |
resourcemanager_jvm_JvmMetrics_ThreadsRunnable | Number of threads in the RUNNABLE state | | |
resourcemanager_jvm_JvmMetrics_ThreadsTimedWaiting | Number of threads in the TIMED_WAITING state | | |
resourcemanager_jvm_JvmMetrics_ThreadsBlocked | Number of threads in the BLOCKED state | | |
resourcemanager_jvm_JvmMetrics_MemHeapCommittedM | Heap memory committed | | |
resourcemanager_jvm_JvmMetrics_MemHeapMaxM | Maximum heap size | | |
resourcemanager_jvm_JvmMetrics_MemHeapUsedM | Heap memory used | | |
resourcemanager_rpc_rpc_CallQueueLength | RPC call queue length | | |
resourcemanager_rpc_rpc_NumOpenConnections | Number of currently open connections | | |
resourcemanager_rpc_rpc_NumDroppedConnections | Number of dropped connections | | |
resourcemanager_rpc_rpc_DeferredRpcProcessingTimeNumOps | Total number of deferred RPC calls | | |
resourcemanager_rpc_rpc_RpcProcessingTimeNumOps | Total number of RPC calls | | |
resourcemanager_rpc_rpc_RpcQueueTimeNumOps | Total number of RPC calls (same as above) | | |
resourcemanager_rpc_rpc_DeferredRpcProcessingTimeAvgTime | Average deferred RPC processing time | | |
resourcemanager_rpc_rpc_RpcProcessingTimeAvgTime | Average RPC processing time | | |
resourcemanager_rpc_rpc_RpcQueueTimeAvgTime | Average RPC queue time | | |
resourcemanager_rpcdetailed_rpcdetailed_GetApplicationReportNumOps | Number of GetApplicationReport calls | | |
resourcemanager_rpcdetailed_rpcdetailed_GetServiceStatusNumOps | Number of GetServiceStatus calls | | |
resourcemanager_rpcdetailed_rpcdetailed_MonitorHealthNumOps | Number of MonitorHealth calls | | |
resourcemanager_rpcdetailed_rpcdetailed_NodeHeartbeatNumOps | Number of NodeHeartbeat calls | | |
resourcemanager_rpcdetailed_rpcdetailed_RegisterNodeManagerNumOps | Number of RegisterNodeManager calls | | |
resourcemanager_rpcdetailed_rpcdetailed_TransitionToActiveNumOps | Number of TransitionToActive calls | | |
resourcemanager_rpcdetailed_rpcdetailed_TransitionToStandbyNumOps | Number of TransitionToStandby calls | | |
resourcemanager_rpcdetailed_rpcdetailed_GetApplicationReportAvgTime | Average GetApplicationReport call time | | |
resourcemanager_rpcdetailed_rpcdetailed_GetServiceStatusAvgTime | Average GetServiceStatus call time | | |
resourcemanager_rpcdetailed_rpcdetailed_MonitorHealthAvgTime | Average MonitorHealth call time | | |
resourcemanager_rpcdetailed_rpcdetailed_NodeHeartbeatAvgTime | Average NodeHeartbeat call time | | |
resourcemanager_rpcdetailed_rpcdetailed_RegisterNodeManagerAvgTime | Average RegisterNodeManager call time | | |
resourcemanager_rpcdetailed_rpcdetailed_TransitionToActiveAvgTime | Average TransitionToActive call time | | |
resourcemanager_rpcdetailed_rpcdetailed_TransitionToStandbyAvgTime | Average TransitionToStandby call time | | |
resourcemanager_ugi_UgiMetrics_LoginSuccessNumOps | Total number of successful Kerberos logins | | |
resourcemanager_ugi_UgiMetrics_LoginFailureNumOps | Total number of failed Kerberos logins | | |
resourcemanager_ugi_UgiMetrics_LoginSuccessAvgTime | Average time of successful Kerberos logins | | |
resourcemanager_ugi_UgiMetrics_LoginFailureAvgTime | Average time of failed Kerberos logins | | |
nodemanager_yarn_NodeManagerMetrics_ContainersRunning | Number of containers currently running | | |
nodemanager_yarn_NodeManagerMetrics_ContainersFailed | Total number of failed containers | | |
nodemanager_yarn_NodeManagerMetrics_ContainersKilled | Total number of killed containers | | |
nodemanager_yarn_NodeManagerMetrics_ContainersLaunched | Total number of containers launched | | |
nodemanager_yarn_NodeManagerMetrics_ContainersCompleted | Total number of completed containers | | |
nodemanager_yarn_NodeManagerMetrics_AllocatedGB | Memory currently allocated | GB | |
nodemanager_yarn_NodeManagerMetrics_AllocatedContainers | Number of containers currently allocated | | |
nodemanager_yarn_NodeManagerMetrics_AllocatedVCores | Number of VCores currently allocated | | |
nodemanager_yarn_NodeManagerMetrics_AvailableVCores | Number of VCores currently available | | |
nodemanager_yarn_NodeManagerMetrics_AvailableGB | Memory currently available | GB | |
nodemanager_jvm_JvmMetrics_GcCount | Number of GCs | | |
nodemanager_jvm_JvmMetrics_GcNumInfoThresholdExceeded | Number of times the GC info threshold was exceeded | | |
nodemanager_jvm_JvmMetrics_GcNumWarnThresholdExceeded | Number of times the GC warn threshold was exceeded | | |
nodemanager_jvm_JvmMetrics_GcTotalExtraSleepTime | Extra sleep time attributed to GC | | |
nodemanager_jvm_JvmMetrics_GcTimeMillis | GC time | | |
nodemanager_jvm_JvmMetrics_LogError | Number of ERROR log events | | |
nodemanager_jvm_JvmMetrics_LogFatal | Number of FATAL log events | | |
nodemanager_jvm_JvmMetrics_LogWarn | Number of WARN log events | | |
nodemanager_jvm_JvmMetrics_ThreadsBlocked | Number of threads in the BLOCKED state | | |
nodemanager_jvm_JvmMetrics_ThreadsWaiting | Number of threads in the WAITING state | | |
nodemanager_jvm_JvmMetrics_ThreadsRunnable | Number of threads in the RUNNABLE state | | |
nodemanager_jvm_JvmMetrics_ThreadsTimedWaiting | Number of threads in the TIMED_WAITING state | | |
nodemanager_jvm_JvmMetrics_MemHeapCommittedM | Heap memory committed | | |
nodemanager_jvm_JvmMetrics_MemHeapMaxM | Maximum heap size | | |
nodemanager_jvm_JvmMetrics_MemHeapUsedM | Heap memory used | | |
nodemanager_rpc_rpc_NumOpenConnections | Number of currently open connections | | |
nodemanager_rpc_rpc_NumDroppedConnections | Number of dropped connections | | |
nodemanager_rpc_rpc_CallQueueLength | RPC call queue length | | |
nodemanager_rpc_rpc_DeferredRpcProcessingTimeNumOps | Total number of deferred RPC calls | | |
nodemanager_rpc_rpc_RpcProcessingTimeNumOps | Total number of RPC calls | | |
nodemanager_rpc_rpc_RpcQueueTimeNumOps | Total number of RPC calls (same as above) | | |
nodemanager_rpc_rpc_DeferredRpcProcessingTimeAvgTime | Average deferred RPC processing time | | |
nodemanager_rpc_rpc_RpcProcessingTimeAvgTime | Average RPC processing time | | |
nodemanager_rpc_rpc_RpcQueueTimeAvgTime | Average RPC queue time | | |
nodemanager_ugi_UgiMetrics_LoginFailureNumOps | Total number of failed Kerberos logins | | |
nodemanager_ugi_UgiMetrics_LoginSuccessNumOps | Total number of successful Kerberos logins | | |
nodemanager_ugi_UgiMetrics_LoginFailureAvgTime | Average time of failed Kerberos logins | | |
nodemanager_ugi_UgiMetrics_LoginSuccessAvgTime | Average time of successful Kerberos logins | | |
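Hive Metrics
Metric name | Description | Unit | Notes |
---|---|---|---|
hivemetastore_open_connections_count | Number of currently open connections | ||
hivemetastore_threads_blocked_count_value | Number of threads in the blocked state | ||
hivemetastore_threads_runnable_count_value | Number of threads in the runnable state | ||
hivemetastore_threads_waiting_count_value | Number of threads in the waiting state | ||
hivemetastore_threads_timed_waiting_count_value | Number of threads in the timed-waiting state | ||
hivemetastore_threads_deadlock_count_value | Number of deadlocked threads | ||
hivemetastore_memory_heap_max_value | Maximum heap size | ||
hivemetastore_memory_heap_used_value | Used heap size | ||
hivemetastore_memory_heap_init_value | Initial heap size | ||
hivemetastore_memory_pools_CMS_Old_Gen_usage_value | Old generation memory usage ratio | ||
hivemetastore_memory_pools_Par_Eden_Space_usage_value | Eden space (young generation) memory usage ratio | ||
hivemetastore_gc_ParNew_count_value | Total number of young generation GCs | ||
hivemetastore_gc_ConcurrentMarkSweep_count_value | Total number of old generation GCs | ||
hivemetastore_gc_ParNew_time_value | Total young generation GC time | ||
hivemetastore_gc_ConcurrentMarkSweep_time_value | Total old generation GC time | ||
hivemetastore_buffers_direct_count_value | Number of direct buffers | | Hive NIO-related metric |
hivemetastore_buffers_mapped_count_value | Number of mapped buffers | | Hive NIO-related metric |
hivemetastore_buffers_direct_capacity_value | Capacity of direct buffers | | Hive NIO-related metric |
hivemetastore_buffers_direct_used_value | Memory used by direct buffers | | Hive NIO-related metric |
hivemetastore_buffers_mapped_capacity_value | Capacity of mapped buffers | | Hive NIO-related metric |
hivemetastore_buffers_mapped_used_value | Memory used by mapped buffers | | Hive NIO-related metric |
hivemetastoreapi.*_names_max | Maximum time taken by calls to the API | | .* stands for the specific API name |
hivemetastoreapi.*_names_min | Minimum time taken by calls to the API | ||
hivemetastoreapi.*_names_mean | Mean time taken by calls to the API | ||
hivemetastoreapi.*_names_stddev | Standard deviation of the time taken by calls to the API | ||
hivemetastoreapi.*_names_p95 | 95th percentile of the time taken by calls to the API | ||
hivemetastoreapi.*_names_p98 | 98th percentile of the time taken by calls to the API | ||
hivemetastoreapi.*_names_p99 | 99th percentile of the time taken by calls to the API | ||
hivemetastoreapi.*_names_p999 | 99.9th percentile of the time taken by calls to the API | ||
hivemetastoreapi.*_names_count | Total number of calls to the API | ||
hivemetastoreapi.*_names_m15_rate | Calls per second to the API (15-minute average) | ||
hivemetastoreapi.*_names_m1_rate | Calls per second to the API (1-minute average) | ||
hivemetastoreapi.*_names_m5_rate | Calls per second to the API (5-minute average) | ||
hivemetastoreapi.*_names_mean_rate | Mean calls per second to the API | ||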
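hiveserver2_cumulative_connection_count_count | Cumulative number of connections | ||
hiveserver2_jvm_pause_extraSleepTime_count | Extra sleep time caused by GC | ||
hiveserver2_open_connections_count | Number of currently open connections | ||
hiveserver2_open_operations_count | Number of currently open operations | ||
hiveserver2_gc_ConcurrentMarkSweep_count_value | Total number of old generation GCs | ||
hiveserver2_gc_ParNew_count_value | Total number of young generation GCs | ||
hiveserver2_gc_ConcurrentMarkSweep_time_value | Total old generation GC time | ||
hiveserver2_gc_ParNew_time_value | Total young generation GC time | ||
hiveserver2_buffers_direct_count_value | Number of direct buffers | ||
hiveserver2_buffers_mapped_count_value | Number of mapped buffers | ||
hiveserver2_buffers_direct_used_value | Memory used by direct buffers | ||
hiveserver2_buffers_direct_capacity_value | Capacity of direct buffers | ||
hiveserver2_buffers_mapped_used_value | Memory used by mapped buffers | ||
hiveserver2_buffers_mapped_capacity_value | Capacity of mapped buffers | ||
hiveserver2_memory_heap_max_value | Maximum heap size | ||
hiveserver2_memory_heap_used_value | Used heap size | ||
hiveserver2_memory_heap_init_value | Initial heap size | ||
hiveserver2_memory_pools_Par_Eden_Space_usage_value | Eden space memory usage ratio | ||
hiveserver2_memory_pools_Par_Survivor_Space_usage_value | Survivor space memory usage ratio | ||
hiveserver2_memory_pools_CMS_Old_Gen_usage_value | Old generation memory usage ratio | ||
hiveserver2_threads_waiting_count_value | Number of threads in the waiting state | ||
hiveserver2_threads_timed_waiting_count_value | Number of threads in the timed-waiting state | ||
hiveserver2_threads_blocked_count_value | Number of threads in the blocked state | ||
hiveserver2_threads_runnable_count_value | Number of threads in the runnable state | ||
hiveserver2api.*_max | Maximum time taken by calls to the API | | .* stands for the specific API name |
hiveserver2api.*_min | Minimum time taken by calls to the API | ||
hiveserver2api.*_mean | Mean time taken by calls to the API | ||
hiveserver2api.*_stddev | Standard deviation of the time taken by calls to the API | ||
hiveserver2api.*_p95 | 95th percentile of the time taken by calls to the API | ||
hiveserver2api.*_p98 | 98th percentile of the time taken by calls to the API | ||
hiveserver2api.*_p99 | 99th percentile of the time taken by calls to the API | ||
hiveserver2api.*_p999 | 99.9th percentile of the time taken by calls to the API | ||
hiveserver2api.*_m15_rate | Calls per second to the API (15-minute average) | ||
hiveserver2api.*_m1_rate | Calls per second to the API (1-minute average) | ||
hiveserver2api.*_m5_rate | Calls per second to the API (5-minute average) | ||
hiveserver2api.*_mean_rate | Mean calls per second to the API | ||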
Kerberos Metrics
Metric name | Description | Unit | Notes |
---|---|---|---|
procstat_cpu_usage | CPU usage | ||
procstat_memory_usage | Memory usage | ||
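Kyuubi Metrics
Metric name | Description | Unit | Notes |
---|---|---|---|
error_queries_count | Number of queries in the error state | ||
running_queries_count | Number of queries in the running state | ||
open_connections_count | Number of currently open connections | ||
open_operations_count | Number of currently open operations | ||
direct_used_value | Memory used by direct buffers | | NIO-related |
direct_capacity_value | Capacity of direct buffers | | NIO-related |
direct_count_value | Number of direct buffers | | NIO-related |
mapped_used_value | Memory used by mapped buffers | | NIO-related |
mapped_capacity_value | Capacity of mapped buffers | | NIO-related |
mapped_count_value | Number of mapped buffers | | NIO-related |
heap_used_value | Used heap size | ||
heap_committed_value | Committed heap memory | ||
heap_max_value | Maximum heap size | ||
ParNew_count_value | Total number of young generation GCs | ||
ConcurrentMarkSweep_count_value | Total number of old generation GCs | ||
ConcurrentMarkSweep_time_value | Total old generation GC time | ||
ParNew_time_value | Total young generation GC time | ||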
LDAP Metrics
Metric name | Description | Unit | Notes |
---|---|---|---|
openldapoperations.*_completed | Total number of completed operations | | .* stands for the specific operation |
openldap_statistics_bytes | Bytes sent by the server | ||
openldap_statistics_entries | Entries sent by the server | ||
openldap_statistics_pdu | PDUs sent by the server | ||
openldap_statistics_referrals | Referrals sent by the server | ||
openldap_threads_active | Number of threads in the active state | ||
openldap_threads_backload | Number of threads in the backload state | ||
openldap_threads_max | Maximum number of threads | ||
openldap_threads_max_pending | Maximum number of threads in the pending state | ||
openldap_threads_open | Number of threads in the open state | ||
openldap_threads_pending | Number of threads in the pending state | ||
openldap_threads_starting | Number of threads in the starting state | ||
openldap_waiters_read | Current number of read waiters | ||
openldap_waiters_write | Current number of write waiters | ||
openldap_connections_max_file_descriptors | Maximum number of connections (file descriptor limit) | ||
openldap_connections_current | Current number of connections | ||
openldap_connections_total | Total number of connections | ||
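Nginx Metrics
Metric name | Description | Unit | Notes |
---|---|---|---|
nginx_accepts | Total number of accepted client connections | ||
nginx_handled | Total number of handled connections | ||
nginx_requests | Total number of client requests | ||
nginx_active | Number of currently active connections | ||
nginx_reading | Number of connections in which the request is being read | ||
nginx_writing | Number of connections in which the response is being written | ||
nginx_waiting | Number of idle connections waiting for a request | ||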
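
The cumulative Nginx counters in this table are usually more useful when combined. The following is a minimal Python sketch, assuming the values of `nginx_accepts`, `nginx_handled`, and `nginx_requests` have already been read from a scrape; the function names are hypothetical and not part of any exporter.

```python
def dropped_connections(accepts: int, handled: int) -> int:
    """Connections that were accepted but never handled."""
    return accepts - handled


def requests_per_connection(requests: int, handled: int) -> float:
    """Average number of requests served per handled connection."""
    # Guard against division by zero right after Nginx starts.
    return requests / handled if handled else 0.0


if __name__ == "__main__":
    # Illustrative sample values only.
    print(dropped_connections(accepts=1000, handled=990))
    print(requests_per_connection(requests=3000, handled=1000))
```

A non-zero difference between accepts and handled generally indicates that a resource limit was reached.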
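MySQL Metrics
Metric name | Description | Unit | Notes |
---|---|---|---|
mysql_global_status_queries | Total number of statements executed | | Includes statements executed within stored programs |
mysql_global_variables_innodb_buffer_pool_size | Size of the InnoDB buffer pool | ||
mysql_global_status_threads_connected | Current number of connections | ||
mysql_global_status_max_used_connections | Highest number of connections in use at the same time since startup | ||
mysql_global_variables_max_connections | Maximum number of connections allowed | ||
mysql_global_status_questions | Total number of statements sent by clients | | Excludes statements executed within stored procedures |
mysql_global_variables_thread_cache_size | Size of the thread cache | ||
mysql_global_status_threads_cached | Number of threads in the thread cache | ||
mysql_global_status_threads_created | Number of threads created to handle connections | ||
mysql_global_status_created_tmp_tables | Number of temporary tables created | ||
mysql_global_status_created_tmp_disk_tables | Number of on-disk temporary tables created | ||
mysql_global_status_created_tmp_files | Number of temporary files created | ||
mysql_global_status_select_full_join | Number of joins that perform table scans because they do not use indexes | ||
mysql_global_status_select_full_range_join | Number of joins that use a range search on a reference table | ||
mysql_global_status_select_range | Number of joins that use ranges on the first table | ||
mysql_global_status_select_range_check | Number of joins without keys that check for key usage after each row | ||
mysql_global_status_select_scan | Number of joins that do a full scan of the first table | ||
mysql_global_status_sort_rows | Number of sorted rows | ||
mysql_global_status_sort_range | Number of sorts done using ranges | ||
mysql_global_status_sort_merge_passes | Number of merge passes performed by the sort algorithm | ||
mysql_global_status_sort_scan | Number of sorts done by scanning the table | ||
mysql_global_status_slow_queries | Number of slow queries | | Uses long_query_time as the threshold |
mysql_global_status_aborted_connects | Number of failed attempts to connect to MySQL | ||
mysql_global_status_aborted_clients | Number of connections aborted because the client did not close the connection properly | ||
mysql_global_status_table_locks_immediate | Number of times a table lock was granted immediately | ||
mysql_global_status_table_locks_waited | Number of times a table lock could not be granted immediately | ||
mysql_global_status_bytes_received | Number of bytes received from all clients | ||
mysql_global_status_bytes_sent | Number of bytes sent to all clients | ||
mysql_global_status_innodb_page_size | InnoDB page size | ||
mysql_global_status_buffer_pool_pages | Total number of data pages in the buffer pool | ||
mysql_global_variables_innodb_additional_mem_pool_size | Size of the additional memory pool | | Stores the data dictionary and other data structures |
mysql_global_status_innodb_mem_dictionary | Size of the data dictionary | ||
mysql_global_variables_key_buffer_size | Size of the index (key) buffer | ||
mysql_global_variables_query_cache_size | Size of the query cache | ||
mysql_global_status_commands_total | Total number of commands | ||
mysql_global_status_handlers_total | Total number of handler operations | ||
mysql_global_status_qcache_free_memory | Free memory available to the query cache | ||
mysql_global_status_qcache_hits | Number of query cache hits | ||
mysql_global_status_qcache_inserts | Number of queries added to the query cache | ||
mysql_global_status_qcache_not_cached | Number of non-cached queries | ||
mysql_global_status_qcache_lowmem_prunes | Number of queries deleted from the query cache because of low memory | ||
mysql_global_status_qcache_queries_in_cache | Number of queries registered in the query cache | ||
mysql_global_status_opened_files | Number of files that have been opened | ||
mysql_global_status_open_files | Number of files currently open | ||
mysql_global_variables_open_files_limit | Limit on the number of open files | ||
mysql_global_status_innodb_num_open_files | Number of files InnoDB currently holds open | ||
mysql_global_status_opened_tables | Number of tables that have been opened | ||
mysql_global_status_table_open_cache_hits | Number of hits for table cache lookups | ||
mysql_global_status_table_open_cache_misses | Number of misses for table cache lookups | ||
mysql_global_status_table_open_cache_overflows | Number of table instances evicted after table_open_cache was exceeded | ||
mysql_global_status_open_tables | Number of tables currently open | ||
mysql_global_variables_table_open_cache | Size of the open table cache | ||
mysql_global_status_open_table_definitions | Number of cached table definitions | ||
mysql_global_variables_table_definition_cache | Size of the table definition cache | ||
mysql_global_status_opened_table_definitions | Number of table definitions that have been cached | ||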
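
Several MySQL metrics in this table are only meaningful relative to one another, for example the current connection count against the configured limit. The following is a minimal Python sketch of two such ratios, assuming the sample values have already been fetched; the function names and the idea of alerting on these ratios are illustrative assumptions, not part of the exporter.

```python
def connection_utilization(threads_connected: float, max_connections: float) -> float:
    """Fraction of the allowed connections currently in use.

    Inputs correspond to mysql_global_status_threads_connected and
    mysql_global_variables_max_connections.
    """
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return threads_connected / max_connections


def table_cache_hit_ratio(hits: float, misses: float) -> float:
    """Share of table lookups served from the table cache.

    Inputs correspond to mysql_global_status_table_open_cache_hits and
    mysql_global_status_table_open_cache_misses.
    """
    total = hits + misses
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Illustrative sample values only.
    print(connection_utilization(50, 200))
    print(table_cache_hit_ratio(90, 10))
```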
Redis Metrics
Metric name | Description | Unit | Notes |
---|---|---|---|
redis_connected_clients | Number of connected clients | ||
redis_commands_processed_total | Total number of commands processed | ||
redis_keyspace_hits_total | Number of successful key lookups | ||
redis_keyspace_misses_total | Number of failed key lookups | ||
redis_memory_used_bytes | Memory used by Redis | ||
redis_memory_max_bytes | Maximum memory available to Redis | ||
redis_net_input_bytes_total | Total inbound traffic | ||
redis_net_output_bytes_total | Total outbound traffic | ||
redis_db_keys | Total number of keys | ||
redis_db_keys_expiring | Number of keys with an expiration set | ||
redis_evicted_keys_total | Number of evicted keys | ||
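
The hit and miss counters in this table are typically combined into a hit ratio, and used memory is compared against the configured maximum. The following is a minimal Python sketch under the assumption that `redis_memory_max_bytes` reports 0 when no memory limit is configured; the function names are hypothetical.

```python
from typing import Optional


def keyspace_hit_ratio(hits: float, misses: float) -> float:
    """Fraction of key lookups that found a key.

    Inputs correspond to redis_keyspace_hits_total and
    redis_keyspace_misses_total.
    """
    total = hits + misses
    return hits / total if total else 0.0


def memory_usage_ratio(used_bytes: float, max_bytes: float) -> Optional[float]:
    """Return used / max, or None when no limit is set.

    Assumption: a max_bytes value of 0 means no memory limit is configured.
    """
    if max_bytes <= 0:
        return None
    return used_bytes / max_bytes


if __name__ == "__main__":
    # Illustrative sample values only.
    print(keyspace_hit_ratio(750, 250))
    print(memory_usage_ratio(512, 2048))
```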
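Node Metrics
Metric name | Description | Unit | Notes |
---|---|---|---|
node_filesystem_size_bytes | Size of the mounted filesystem | ||
node_filesystem_avail_bytes | Space on the mounted filesystem available to non-root users | ||
node_filesystem_free_bytes | Free space on the mounted filesystem | ||
node_disk_reads_merged_total | Number of merged reads per disk partition | ||
node_disk_writes_merged_total | Number of merged writes per disk partition | ||
node_disk_reads_completed_total | Number of completed reads per disk partition | ||
node_disk_writes_completed_total | Number of completed writes per disk partition | ||
node_disk_read_bytes_total | Number of bytes read per disk partition | ||
node_disk_written_bytes_total | Number of bytes written per disk partition | ||
node_disk_io_time_seconds_total | Total time spent on I/O per disk partition | ||
node_cpu_seconds_total | Time each CPU has spent in use | ||
node_memory_MemAvailable_bytes | Memory available on the node | ||
node_memory_MemTotal_bytes | Total memory of the node | ||
node_memory_SwapFree_bytes | Free swap space on the node | ||
node_memory_SwapTotal_bytes | Total swap space on the node | ||
node_load1 | Average CPU load over 1 minute | ||
node_load5 | Average CPU load over 5 minutes | ||
node_load15 | Average CPU load over 15 minutes | ||
node_filefd_allocated | Number of file descriptors currently in use on the node | ||
node_context_switches_total | Total number of context switches | ||
node_network_receive_bytes_total | Inbound traffic on the network interface | ||
node_network_transmit_bytes_total | Outbound traffic on the network interface | ||
node_disk_io_now | Number of I/O operations currently in progress per disk partition | ||
node_netstat_Tcp_CurrEstab | Number of TCP connections currently in the established state | ||
node_sockstat_TCP_tw | Number of TCP connections waiting to close (TIME_WAIT) | ||
node_sockstat_sockets_used | Total number of sockets in use across all protocols | ||
node_sockstat_UDP_inuse | Number of UDP sockets in use | ||
node_sockstat_TCP_alloc | Number of allocated TCP sockets | ||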
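
Disk and memory usage are not exposed directly; they are derived from the size and available-space metrics in this table. The following is a minimal Python sketch of that arithmetic, assuming the raw byte values have already been read from a scrape; the function names are hypothetical.

```python
def filesystem_used_ratio(size_bytes: float, avail_bytes: float) -> float:
    """Fraction of the filesystem that is not available.

    Inputs correspond to node_filesystem_size_bytes and
    node_filesystem_avail_bytes.
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return 1.0 - avail_bytes / size_bytes


def memory_used_ratio(total_bytes: float, available_bytes: float) -> float:
    """Fraction of node memory that is not available.

    Inputs correspond to node_memory_MemTotal_bytes and
    node_memory_MemAvailable_bytes.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 1.0 - available_bytes / total_bytes


if __name__ == "__main__":
    # Illustrative sample values only (100 GiB disk, 16 GiB memory).
    gib = 1024 ** 3
    print(filesystem_used_ratio(100 * gib, 25 * gib))
    print(memory_used_ratio(16 * gib, 4 * gib))
```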
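Middle-Platform Component Metrics
Metric name | Description | Unit | Notes |
---|---|---|---|
jvm_memory_bytes_used | Used heap memory | ||
jvm_memory_used_bytes | Used heap memory | ||
jvm_memory_bytes_committed | Committed heap memory | ||
jvm_memory_committed_bytes | Committed heap memory | ||
jvm_memory_bytes_max | Maximum heap memory | ||
jvm_memory_max_bytes | Maximum heap memory | ||
jvm_threads_current | Current number of threads | ||
jvm_threads_live | Number of live threads | ||
jvm_threads_live_threads | Number of live threads | ||
jvm_threads_daemon | Number of daemon threads | ||
jvm_threads_daemon_threads | Number of daemon threads | ||
jvm_threads_peak | Peak number of threads | ||
jvm_threads_peak_threads | Peak number of threads | ||
jvm_threads_deadlocked | Number of deadlocked threads | ||
jvm_threads_deadlocked_threads | Number of deadlocked threads | ||
jvm_memory_pool_bytes_used | Used size of the JVM memory pool | ||
jvm_memory_pool_bytes_committed | Committed size of the JVM memory pool | ||
jvm_memory_pool_bytes_max | Maximum size of the JVM memory pool | ||
jvm_gc_collection_seconds_count | Total number of GCs | ||
jvm_gc_pause_seconds_count | Total number of GCs | ||
jvm_gc_collection_seconds_sum | Total GC time | ||
jvm_gc_pause_seconds_sum | Total GC time | ||
jvm_gc_memory_allocated_bytes_total | Increase in the size of the young generation memory pool from after one GC to before the next | ||
jvm_gc_memory_promoted_bytes_total | Count of positive increases in the size of the old generation from before a GC to after it | ||
jvm_buffer_pool_used_bytes | Used size of the buffer pool | ||
jvm_buffer_pool_capacity_bytes | Capacity of the buffer pool | ||
jvm_buffer_pool_used_buffers | Number of buffers in use in the buffer pool | ||
jvm_buffer_memory_used_bytes | Used size of the buffer pool | ||
jvm_buffer_total_capacity_bytes | Capacity of the buffer pool | ||
jvm_buffer_count_buffers | Number of buffers in the buffer pool | ||
tomcat_global_error_total | Total number of Tomcat global errors | ||
tomcat_threads_config_max_threads | Configured maximum number of Tomcat threads | ||
tomcat_sessions_active_current | Number of currently active Tomcat sessions | ||
tomcat_sessions_active_current_sessions | Number of currently active Tomcat sessions | ||
tomcat_global_sent_bytes_total | Total bytes sent by Tomcat globally | ||
tomcat_global_received_bytes_total | Total bytes received by Tomcat globally | ||
http_server_requests_seconds_count | Number of HTTP requests | ||
tomcat_threads_current | Current number of Tomcat threads | ||
tomcat_threads_current_threads | Current number of Tomcat threads | ||
tomcat_threads_busy | Number of Tomcat threads currently busy | ||
tomcat_threads_busy_threads | Number of Tomcat threads currently busy | ||
http_server_requests_seconds_sum | Total time spent on HTTP requests | ||
jetty_threads_busy | Number of Jetty threads currently busy | ||
jetty_threads_current | Current number of Jetty threads | ||
jetty_threads_config_max | Configured maximum number of Jetty threads | ||
system_cpu_usage | System CPU usage | ||
process_cpu_usage | CPU usage of the current process | ||
system_load_average_1m | System average CPU load over 1 minute | ||
jvm_threads_states_threads | Number of threads in each state | ||
logback_events_total | Number of log events at each level | ||
process_files_open_files | Number of file descriptors opened by the current process | ||
process_files_max_files | Maximum number of file descriptors the current process can open | ||
hikaricp_connections_max | Maximum number of connections in the HikariCP pool | ||
hikaricp_connections_min | Minimum number of connections in the HikariCP pool | ||
hikaricp_connections_timeout_total | Total number of connection timeouts in the HikariCP pool | ||
hikaricp_connections_active | Number of active connections in the HikariCP pool | ||
hikaricp_connections_idle | Number of idle connections in the HikariCP pool | ||
hikaricp_connections_pending | Number of pending connections in the HikariCP pool | ||
hikaricp_connections_usage_seconds_sum | Total time connections from the HikariCP pool were in use | ||
hikaricp_connections_usage_seconds_count | Total number of times connections from the HikariCP pool were used | ||
hikaricp_connections_creation_seconds_sum | Total time spent creating connections in the HikariCP pool | ||
hikaricp_connections_creation_seconds_count | Total number of connections created in the HikariCP pool | ||
hikaricp_connections_acquire_seconds_sum | Total time spent acquiring connections from the HikariCP pool | ||
hikaricp_connections_acquire_seconds_count | Total number of connection acquisitions from the HikariCP pool | ||
exception_count_total | Total number of exceptions of each type | ||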
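
Many metrics in this table come in `_sum` and `_count` pairs, such as `http_server_requests_seconds_sum` and `http_server_requests_seconds_count`. Dividing the increase of the sum by the increase of the count between two scrapes gives the average duration over that interval. The following is a minimal Python sketch of that calculation; the function name and the choice to return 0.0 when nothing was recorded are assumptions made for illustration.

```python
def average_duration(sum_start: float, count_start: float,
                     sum_end: float, count_end: float) -> float:
    """Average seconds per event between two scrapes of a _sum/_count pair."""
    delta_count = count_end - count_start
    # No new events, or the counter was reset between scrapes.
    if delta_count <= 0:
        return 0.0
    return (sum_end - sum_start) / delta_count


if __name__ == "__main__":
    # Illustrative values: 30 new requests took 6 seconds in total.
    print(average_duration(10.0, 100, 16.0, 130))
```

The same function applies to the GC and HikariCP pairs, for example `jvm_gc_pause_seconds_sum` with `jvm_gc_pause_seconds_count`.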