YARN Metrics Reference
Last updated: 2023-02-03 18:44:02
Monitoring Metrics
Metric | Description | Unit | Notes |
---|---|---|---|
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumActiveNMs | Number of NodeManagers in the active state | ||
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumUnhealthyNMs | Number of NodeManagers in the unhealthy state | ||
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumShutdownNMs | Number of NodeManagers in the shutdown state | ||
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumRebootedNMs | Number of NodeManagers in the rebooted state | ||
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumLostNMs | Number of NodeManagers in the lost state | ||
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_NumDecommissioningNMs | Number of NodeManagers in the decommissioning state | ||
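resourcemanager_yarn_QueueMetrics_AvailableVCores | Number of VCores currently available in the queue | ||
resourcemanager_yarn_QueueMetrics_AvailableMB | Memory currently available in the queue | MB | |
resourcemanager_yarn_QueueMetrics_AbsoluteUsedCapacity | Absolute resource utilization of the queue | ||
resourcemanager_yarn_QueueMetrics_AllocatedVCores | Number of VCores currently allocated in the queue | ||
resourcemanager_yarn_QueueMetrics_AllocatedMB | Memory currently allocated in the queue | MB | |
resourcemanager_yarn_QueueMetrics_PendingMB | Memory requested in the queue but not yet allocated (pending) | MB | |
resourcemanager_yarn_QueueMetrics_AppsRunning | Number of applications in the running state in the queue | ||
resourcemanager_yarn_QueueMetrics_AppsFailed | Number of applications in the failed state in the queue | ||
resourcemanager_yarn_QueueMetrics_AppsPending | Number of applications in the pending state in the queue | ||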
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_AMLaunchDelayNumOps | Total number of ApplicationMaster (AM) launches | ||
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_AMRegisterDelayNumOps | Total number of AM registrations | ||
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_AMLaunchDelayAvgTime | Average AM launch time | ms | |
resourcemanager_yarn_ClusterMetrics_ClusterMetrics_ResourceManager_AMRegisterDelayAvgTime | Average AM registration time | ms | |
resourcemanager_jvm_JvmMetrics_GcCount | Total number of GCs in the ResourceManager (RM) | ||
resourcemanager_jvm_JvmMetrics_GcNumInfoThresholdExceeded | Number of times the RM exceeded the GC info threshold | ||
resourcemanager_jvm_JvmMetrics_GcNumWarnThresholdExceeded | Number of times the RM exceeded the GC warn threshold | ||
resourcemanager_jvm_JvmMetrics_GcTotalExtraSleepTime | Total extra sleep time caused by GC | ms | |
resourcemanager_jvm_JvmMetrics_GcTimeMillis | Total GC time | ms | |
resourcemanager_jvm_JvmMetrics_LogError | Number of ERROR-level log entries | ||
resourcemanager_jvm_JvmMetrics_LogFatal | Number of FATAL-level log entries | ||
resourcemanager_jvm_JvmMetrics_LogWarn | Number of WARN-level log entries | ||
resourcemanager_jvm_JvmMetrics_ThreadsWaiting | Number of threads in the WAITING state | ||
resourcemanager_jvm_JvmMetrics_ThreadsRunnable | Number of threads in the RUNNABLE state | ||
resourcemanager_jvm_JvmMetrics_ThreadsTimedWaiting | Number of threads in the TIMED_WAITING state | ||
resourcemanager_jvm_JvmMetrics_ThreadsBlocked | Number of threads in the BLOCKED state | ||
resourcemanager_jvm_JvmMetrics_MemHeapCommittedM | Committed heap memory | MB | |
resourcemanager_jvm_JvmMetrics_MemHeapMaxM | Maximum heap size | MB | |
resourcemanager_jvm_JvmMetrics_MemHeapUsedM | Used heap memory | MB | |
resourcemanager_rpc_rpc_CallQueueLength | Current length of the RPC call queue | ||
resourcemanager_rpc_rpc_NumOpenConnections | Current number of open connections | ||
resourcemanager_rpc_rpc_NumDroppedConnections | Number of dropped connections | ||
resourcemanager_rpc_rpc_DeferredRpcProcessingTimeNumOps | Total number of deferred RPC calls | ||
resourcemanager_rpc_rpc_RpcProcessingTimeNumOps | Total number of RPC calls (processing-time sample count) | ||
resourcemanager_rpc_rpc_RpcQueueTimeNumOps | Total number of RPC calls (queue-time sample count) | ||
resourcemanager_rpc_rpc_DeferredRpcProcessingTimeAvgTime | Average processing time of deferred RPC calls | ms | |
resourcemanager_rpc_rpc_RpcProcessingTimeAvgTime | Average RPC processing time | ms | |
resourcemanager_rpc_rpc_RpcQueueTimeAvgTime | Average time an RPC call spends in the queue | ms | |
resourcemanager_rpcdetailed_rpcdetailed_GetApplicationReportNumOps | Number of calls to the GetApplicationReport method | ||
resourcemanager_rpcdetailed_rpcdetailed_GetServiceStatusNumOps | Number of calls to the GetServiceStatus method | ||
resourcemanager_rpcdetailed_rpcdetailed_MonitorHealthNumOps | Number of calls to the MonitorHealth method | ||
resourcemanager_rpcdetailed_rpcdetailed_NodeHeartbeatNumOps | Number of calls to the NodeHeartbeat method | ||
resourcemanager_rpcdetailed_rpcdetailed_RegisterNodeManagerNumOps | Number of calls to the RegisterNodeManager method | ||
resourcemanager_rpcdetailed_rpcdetailed_TransitionToActiveNumOps | Number of calls to the TransitionToActive method | ||
resourcemanager_rpcdetailed_rpcdetailed_TransitionToStandbyNumOps | Number of calls to the TransitionToStandby method | ||
resourcemanager_rpcdetailed_rpcdetailed_GetApplicationReportAvgTime | Average turnaround time of the GetApplicationReport method | ms | |
resourcemanager_rpcdetailed_rpcdetailed_GetServiceStatusAvgTime | Average turnaround time of the GetServiceStatus method | ms | |
resourcemanager_rpcdetailed_rpcdetailed_MonitorHealthAvgTime | Average turnaround time of the MonitorHealth method | ms | |
resourcemanager_rpcdetailed_rpcdetailed_NodeHeartbeatAvgTime | Average turnaround time of the NodeHeartbeat method | ms | |
resourcemanager_rpcdetailed_rpcdetailed_RegisterNodeManagerAvgTime | Average turnaround time of the RegisterNodeManager method | ms | |
resourcemanager_rpcdetailed_rpcdetailed_TransitionToActiveAvgTime | Average turnaround time of the TransitionToActive method | ms | |
resourcemanager_rpcdetailed_rpcdetailed_TransitionToStandbyAvgTime | Average turnaround time of the TransitionToStandby method | ms | |
resourcemanager_ugi_UgiMetrics_LoginSuccessNumOps | Total number of successful Kerberos logins | ||
resourcemanager_ugi_UgiMetrics_LoginFailureNumOps | Total number of failed Kerberos logins | ||
resourcemanager_ugi_UgiMetrics_LoginSuccessAvgTime | Average time of successful Kerberos logins | ms | |
resourcemanager_ugi_UgiMetrics_LoginFailureAvgTime | Average time of failed Kerberos logins | ms | |
nodemanager_yarn_NodeManagerMetrics_ContainersRunning | Number of containers currently in the running state | ||
nodemanager_yarn_NodeManagerMetrics_ContainersFailed | Total number of failed containers | ||
nodemanager_yarn_NodeManagerMetrics_ContainersKilled | Total number of killed containers | ||
nodemanager_yarn_NodeManagerMetrics_ContainersLaunched | Total number of launched containers | ||
nodemanager_yarn_NodeManagerMetrics_ContainersCompleted | Total number of completed containers | ||
nodemanager_yarn_NodeManagerMetrics_AllocatedGB | Memory currently allocated | GB | |
nodemanager_yarn_NodeManagerMetrics_AllocatedContainers | Number of containers currently allocated | ||
nodemanager_yarn_NodeManagerMetrics_AllocatedVCores | Number of VCores currently allocated | ||
nodemanager_yarn_NodeManagerMetrics_AvailableVCores | Number of VCores currently available | ||
nodemanager_yarn_NodeManagerMetrics_AvailableGB | Memory currently available | GB | |
nodemanager_jvm_JvmMetrics_GcCount | Total number of GCs in the NodeManager (NM) | ||
nodemanager_jvm_JvmMetrics_GcNumInfoThresholdExceeded | Number of times the GC info threshold was exceeded | ||
nodemanager_jvm_JvmMetrics_GcNumWarnThresholdExceeded | Number of times the GC warn threshold was exceeded | ||
nodemanager_jvm_JvmMetrics_GcTotalExtraSleepTime | Total extra sleep time caused by GC | ms | |
nodemanager_jvm_JvmMetrics_GcTimeMillis | Total GC time | ms | |
nodemanager_jvm_JvmMetrics_LogError | Number of ERROR-level log entries | ||
nodemanager_jvm_JvmMetrics_LogFatal | Number of FATAL-level log entries | ||
nodemanager_jvm_JvmMetrics_LogWarn | Number of WARN-level log entries | ||
nodemanager_jvm_JvmMetrics_ThreadsBlocked | Number of threads in the BLOCKED state | ||
nodemanager_jvm_JvmMetrics_ThreadsWaiting | Number of threads in the WAITING state | ||
nodemanager_jvm_JvmMetrics_ThreadsRunnable | Number of threads in the RUNNABLE state | ||
nodemanager_jvm_JvmMetrics_ThreadsTimedWaiting | Number of threads in the TIMED_WAITING state | ||
nodemanager_jvm_JvmMetrics_MemHeapCommittedM | Committed heap memory | MB | |
nodemanager_jvm_JvmMetrics_MemHeapMaxM | Maximum heap size | MB | |
nodemanager_jvm_JvmMetrics_MemHeapUsedM | Used heap memory | MB | |
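nodemanager_rpc_rpc_NumOpenConnections | Current number of open connections | ||
nodemanager_rpc_rpc_NumDroppedConnections | Number of dropped connections | ||
nodemanager_rpc_rpc_CallQueueLength | Current length of the RPC call queue | ||
nodemanager_rpc_rpc_DeferredRpcProcessingTimeNumOps | Total number of deferred RPC calls | ||
nodemanager_rpc_rpc_RpcProcessingTimeNumOps | Total number of RPC calls (processing-time sample count) | ||
nodemanager_rpc_rpc_RpcQueueTimeNumOps | Total number of RPC calls (queue-time sample count) | ||
nodemanager_rpc_rpc_DeferredRpcProcessingTimeAvgTime | Average processing time of deferred RPC calls | ms | |
nodemanager_rpc_rpc_RpcProcessingTimeAvgTime | Average RPC processing time | ms | |
nodemanager_rpc_rpc_RpcQueueTimeAvgTime | Average time an RPC call spends in the queue | ms | |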
nodemanager_ugi_UgiMetrics_LoginFailureNumOps | Total number of failed Kerberos logins | ||
nodemanager_ugi_UgiMetrics_LoginSuccessNumOps | Total number of successful Kerberos logins | ||
nodemanager_ugi_UgiMetrics_LoginFailureAvgTime | Average time of failed Kerberos logins | ms | |
nodemanager_ugi_UgiMetrics_LoginSuccessAvgTime | Average time of successful Kerberos logins | ms | |
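
Several of the metrics in the table are cumulative totals (for example `GcCount`), while others are point-in-time values (for example `MemHeapUsedM`), so they are often combined or differenced before being used in an alert. The following Python sketch illustrates two such derived values. It assumes that metric samples are already available as plain dictionaries keyed by the metric names in the table; how the samples are collected is not covered here. The helper names `heap_usage_ratio` and `counter_delta`, the restart handling, and all sample values are hypothetical and used only for illustration.

```python
from typing import Mapping, Optional


def heap_usage_ratio(sample: Mapping[str, float], component: str) -> Optional[float]:
    """Return used heap / maximum heap for 'resourcemanager' or 'nodemanager'.

    Returns None when the maximum is not usable.
    """
    used = sample[f"{component}_jvm_JvmMetrics_MemHeapUsedM"]
    maximum = sample[f"{component}_jvm_JvmMetrics_MemHeapMaxM"]
    # Assumption: a non-positive maximum means the value was not reported.
    if maximum <= 0:
        return None
    return used / maximum


def counter_delta(previous: Mapping[str, float],
                  current: Mapping[str, float],
                  name: str) -> float:
    """Return the increase of a cumulative metric between two samples."""
    delta = current[name] - previous[name]
    # Assumption: a negative delta means the process restarted and the
    # counter began again from zero, so the current value is the increase.
    return delta if delta >= 0 else current[name]


# Hypothetical sample values, for illustration only.
earlier = {"resourcemanager_jvm_JvmMetrics_GcCount": 120.0}
later = {
    "resourcemanager_jvm_JvmMetrics_GcCount": 135.0,
    "resourcemanager_jvm_JvmMetrics_MemHeapUsedM": 512.0,
    "resourcemanager_jvm_JvmMetrics_MemHeapMaxM": 2048.0,
}

print(heap_usage_ratio(later, "resourcemanager"))  # 0.25
print(counter_delta(earlier, later, "resourcemanager_jvm_JvmMetrics_GcCount"))  # 15.0
```

The same helpers work for the `nodemanager_` metrics by passing `"nodemanager"` as the component, since both components expose the same JVM metric suffixes in the table.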