数据字典 - INFO-中台常用统计sql - 《EasyData数据开发治理平台FAQ》

INFO-中台常用统计sql
更新时间: 2025-07-07 18:29:54 | 阅读 5936
扫码
复制
导出
中台常用统计类sql
中台常用统计类sql

适用模块

中台子产品
具体说明

平台相关业务sql 查询汇总
使用示例

1.dqc任务数：（cluster_id代表集群名字）    
库：easydqc
select cluster_id, count(*) num from dq_task group by cluster_id;
2.原子、派生、符合指标数：（0:原子指标,1:派生指标,2:复合指标',）    
库：easyindex    
select type,count(*) num from ei_index where type in (0,1,2) group by type;
3.历史总计线上调度数：（online代表提交上线的任务）    
库：mammut
select type,count(*) num from pf_az_config group by type;
4.近两周用户自助分析查询量：    
库：mammut
select count(*) num from pf_noteblock_exec where updated_at >date_sub(curdate(),interval 14 day);
5.已发布api的数：（1代表发布）
库：easydataservice
select api_status,count(*) num from api where api_status=1;
6.所有主题域数量：（根据集群划分）
库：easydesign
所有主题域数量：
select cluster_id,count(*) num from ed_subject group by cluster_id;
7.一级主题域数量：
库：easydesign
select cluster_id,count(*) num from ed_subject where level =1 group by cluster_id;
8.每个分层下表数量
库：easydesign 
SELECT m.name,m.abbr,n.index_name,sum(n.index_value) FROM ed_warehouse_layer m JOIN ed_self_index n ON m.pg_id = n.pg_id WHERE (CONCAT(m.abbr,'_subjectSetTableCnt') =n.index_name or CONCAT(m.abbr,'_subjectUnsetTableCnt') =n.index_name) AND date_id =DATE_FORMAT(curdate(),'%Y-%m-%d %H:%i:%s') GROUP BY n.index_name;   
9.每个主题域下维度数量    
库：easydesign 
注：easydesign1.2.6之前可以这样统计，1.2.6后由于字段变化，需要待拿到环境后补充
SELECT m.id,m.name,n.subject_id,n.sub_subject_id,count(*) FROM ed_subject m JOIN ed_dimension n ON m.id=n.sub_subject_id WHERE (n.sub_subject_id is not null) and  m.level =2 GROUP BY m.id union all SELECT m.id,m.name,n.subject_id,n.sub_subject_id,count(*) FROM ed_subject m JOIN ed_dimension n ON m.id=n.subject_id WHERE (n.sub_subject_id is  null) and  m.level=1 GROUP BY m.id;
10.每个主题域下度量数量    
库：easydesign    
逻辑：度量可以挂在主题域或二级主题下面 
SELECT m.id,m.name,n.subject_id,count(*) FROM ed_subject m JOIN ed_metric n ON m.id=n.subject_id WHERE m.level =1 or m.level =2 GROUP BY m.id;
11.平台活跃用户数(近14天登录过)    
库：mammut
select email from pf_audit_trail where eventtype=3 and  updated_at>=date_sub(curdate(),interval 14 day) group by email;
12.总调度任务数（sched_status枚举：开启调度：1；未调度：0）
库：easytaskops
select count(*) from eto_az_flow where cluster = 'dev4' and product='intern_a' and sched_status = 1;
13.基线任务数（sched_status枚举：开启调度：1；未调度：0）
库：easytaskops
select count(*) from eto_az_flow where cluster = 'dev4' and product='intern_a' and sched_status = 1 and baseline_id > 0;
14.当天执行的实例数（不包括未到时间点未生成的实例）
exec_type枚举：
• 调度实例：SCHEDULE；
• 调度重跑实例：RERUN；
• 补数据重跑实例：LINKED_BACKFILL_RERUN；
• 补数据实例：LINKED_BACKFILL_EXECUTE
sched_time：调度时间，毫秒
库：easytaskops
select count(*) from eto_az_exec_flow where cluster = 'dev4' and product='intern_a' and exec_type='SCHEDULE' and sched_time>=1645718400000 and sched_time < 1645804800000
15.当天失败实例数
exec_status枚举：
• 就绪中：PREPARING；
• 运行中：RUNNING；
• 暂停：PAUSED；
• 成功：SUCCEEDED；
• 失败：FAILED；
• 终止：KILLED
• 终止中：KILLING
• 运行中异常：FAILED_FINISHING
库：easytaskops
select count(*) from eto_az_exec_flow where cluster = 'dev4' and product='intern_a' and exec_type='SCHEDULE' and sched_time>=1645718400000 and sched_time < 1645804800000 and exec_status in('FAILED','KILLED','FAILED_FINISHING')
16、统计自主分析某类数据源的使用情况 以clickhouse为例
select count(*) from mammut.pf_noteblock_exec where engine='clickhouse' and updated_at > date_sub(curdate(),interval 14 day);
17、统计线上调度任务各种类型节点数
库：easytaskops
数据同步:ndi
sql:sql
VerticaSQL:vertica_sql
MySQL:mysql
OracleSQL:oracle_sql
GPSQL:gp_sql
Cube:scriptCube
MR:hadoopJava
Spark:spark
Script:script
AI2.0:ai_innerflow
AI:ai
kafka入库:hadoopJavaKafkaImport
kafka归档:hadoopJavaKafkaArchive
数据库传输:hadoopJavaSqoop
选择节点:decision
嵌套流:innerflow
虚拟节点:noop
select job_type,count(*) num from eto_az_job job join eto_az_flow flow on flow.id = job.eto_flow_id where flow.sched_status = 1 group by job_type;

调度任务个数
select count(*) from eto_az_flow where sched_status = 1;


单天实例数、单天job数、单天vertica job数
select count(*) from eto_az_exec_flow where submit_time > unix_timestamp('2022-06-16') * 1000 and  end_time< unix_timestamp('2022-06-17') * 1000;


select count(*) from eto_az_exec_job where exec_type ='SCHEDULE' and start_time > unix_timestamp('2022-06-16') * 1000 and  end_time< unix_timestamp('2022-06-17') * 1000 ;


select job_type, count(*) from eto_az_exec_job where exec_type ='SCHEDULE' and start_time > unix_timestamp('2022-06-16') * 1000 and job_type = 'vertica_sql' and end_time< unix_timestamp('2022-06-17') * 1000 group by job_type;


月初job数、月初vertica job数

select count(*) from eto_az_exec_job where exec_type ='SCHEDULE' and start_time > unix_timestamp('2022-06-01') * 1000 and  end_time< unix_timestamp('2022-06-02') * 1000 ;


select job_type, count(*) from eto_az_exec_job where exec_type ='SCHEDULE' and start_time > unix_timestamp('2022-06-01') * 1000 and job_type = 'vertica_sql' and  end_time< unix_timestamp('2022-06-02') * 1000 group by job_type;



敏感类型数量（不包括内置敏感类型）
select count(*) from access_sensitive_type where product != 'all';

识别任务数
select count(*) from access_identify_task

识别出的敏感字段数
select count(*) from access_identify_result

脱敏规则数
select count(*) from access_resource_masking where object <> "public"

配置了脱敏的传输任务数量
SELECT COUNT(DISTINCT task_id) FROM datamask_task

脱敏生效字段数
select count(*) from (select * from access_resource_masking where object <> "public" group by resource) t1

实例报警查询：
rule_type枚举，1：任务失败；2：任务运行超时；3：任务完成超时；11：节点失败；12：节点超时；21：dqc校验不通过；22：dqc执行异常；
inst.create_time表示什么时候发的报警


节点、dqc报警查询 
SELECT inst.cluster_id as '集群id', inst.product as '项目', exec_flow.exec_id as '实例id', exec_flow.flow_name as '任务名', exec_flow.sched_time as '计划执行时间', exec_flow.start_time as '开始时间', exec_flow.end_time as '结束时间', exec_flow.exec_status as '实例状态', CASE inst.rule_type WHEN 1 THEN '实例失败' WHEN 2 THEN '实例超时' END AS '规则类型', CASE inst.rule_type WHEN 1 THEN null WHEN 2 THEN rule.timeout END as '规则超时时间', send.alert_id as '报警id', send.alert_time as '报警时间' FROM alert_rule_instance inst JOIN eto_az_exec_flow exec_flow ON exec_flow.cluster = inst.cluster_id AND exec_flow.exec_id = inst.biz_instance_id JOIN alert_rule_alarm alarm ON alarm.rule_instance_id = inst.id JOIN alert_rule_alarm_send send ON send.alarm_id = alarm.id JOIN alert_rule rule ON rule.id = inst.rule_id WHERE inst.rule_type IN (1, 2) AND inst.create_time >= '2022-05-23' AND inst.create_time < '2022-05-24' and exec_job.job_id = rule.object_name;

报警规则查询：
数据库表
规则表：alert_rule
rule_type，1：任务失败；2：任务运行超时；3：任务完成超时；11：节点失败；12：节点超时；21：dqc校验不通过；22：dqc执行异常；
object_type：1-flow;2-job;3-数据质量报警,4:基线
alert_channel：二进制，1表示开启，0表示未开启。比如11110表示“开启stone|开启popo|开启邮件|开启短信|关闭电话”
内部邮件|内部短信|内部通信|飞书|企业微信|钉钉|定制微信|易信|stone|popo|邮件|短信|电话
规则接收人关联表：alert_rule_user
rule_id即alert_rule表的主键。
规则接收报警组关联表：alert_rule_user_group
rule_id即alert_rule表的主键。
group_type：0：报警组，1：值班组
azkaban任务表alert_azkaban、eto_az_flow
任务元数据表。查询sql如下
SELECT 
    f.flow_name as 任务名, rule.rule_type as 报警类型, rule.object_name as 报警对象, rule.object_type as 报警对象类型,rule.recycle as 循环报警,rule.recycle_period,rule.recycle_times,rule.timeout as 超时时间,send_to_owner as '是否发给负责人',BIN(alert_channel) as 报警通道
FROM
    eto.alert_rule rule
        JOIN
    alert_azkaban az ON az.id = rule.biz_id
        JOIN
    eto_az_flow f ON f.cluster = az.cluster_id
        AND f.project_id = az.project_id
        AND f.flow_id = az.flow_name
WHERE
    rule.biz_type = 1 and rule.is_delete = 0

项目调度任务数查询：
select *  from eto_az_flow where product = xxx and sched_status = 1

实例列表查询：
实例表：eto_az_exec_flow
常用字段：
字段描述cluster集群product项目名exec_id实例idexec_status实例状态：'FAILED','FAILED_FINISHING','KILLED','KILLING','RUNNING','PREPARING','SUCCEEDED','PAUSED'flow_name任务名sched_time计划执行时间submit_time就绪时间，大于0start_time开始时间，未开始为-1end_time结束时间，未结束为-1exec_type周期调度：SCHEDULE；周期实例重跑：RERUN补数据：LINKED_BACKFILL_EXECUTE补数据重跑：LINKED_BACKFILL_RERUNbaseline_id基线owner负责人submitter提交人queue队列名

实例节点表：eto_az_exec_job
常用字段：
字段描述cluster集群product项目名exec_id实例idstatus节点状态：'FAILED','KILLED','RUNNING','READY','QUEUED','SUCCEEDED','CANCELLED'job_id节点名attempt当前重试次数max_attempt最大重试次数start_time开始时间，未开始为-1end_time结束时间，未结束为-1
查询当天实例节点
SELECT 
    flow.flow_name,
    flow.owner,
    job.id,
    job.start_time,
    job.end_time,
    job.status
FROM
    eto_az_exec_flow flow
        JOIN
    eto_az_exec_job job
WHERE
    job.cluster = flow.cluster
        AND job.exec_id = flow.exec_id
        AND flow.submit_time >= 1662048000000
        AND flow.submit_time < 1662134400000
LIMIT 10;
查询SLA提交实例数
queue_priority，1表示SLA、0表示非SLA。
SELECT 
    f.product, s.queue, s.queue_priority, f.flow_name, count(*)
FROM
    eto_az_exec_flow f
        JOIN
    eto_az_exec_job job ON job.cluster = f.cluster
        AND job.exec_id = f.exec_id
        JOIN
    eto_az_exec_flow_runtime_setup s ON s.eto_exec_flow_id = f.id
WHERE
    f.`cluster` = 'dev4'
        AND f.sched_time >= UNIX_TIMESTAMP('2022-11-01 00:00:00') * 1000
        AND f.sched_time < UNIX_TIMESTAMP('2022-11-02 00:00:00') * 1000
        AND f.exec_type = 'SCHEDULE'
        AND job.job_type = 'ndi'
        AND queue_priority = 1
    GROUP BY f.product , s.queue , s.queue_priority, f.flow_name;


调度任务失败设置查询
select failure_action from eto_az_sched_runtime_setup rs join eto_az_schedule s on s.id = rs.eto_sched_id where s.flow_id = 'xx'

dqc任务设置查询：
select job_id ,JSON_EXTRACT(detail.detail,'$."azkaban.post.action.dataquality.1.failedInterrupted"') from eto_az_job_detail detail join eto_az_flow flow on detail.eto_flow_id = flow.id and detail.version = flow.version wher
e flow_id =

大盘统计查询：
调度实例运行耗时统计
    SELECT 
    *
FROM
    eto_az_exec_flow eaef
WHERE
    eaef.`cluster` = 'dev4'
        AND eaef.`product` = 'intern'
        AND eaef.`start_time` BETWEEN 1665244800000 AND 1665331200000
        AND eaef.`exec_type` IN ('SCHEDULE')
        AND eaef.`re_run_flag` = 0
ORDER BY CASE
    WHEN eaef.end_time = - 1 THEN ROUND(UNIX_TIMESTAMP(CURTIME(4)) * 1000) - eaef.start_time
    ELSE eaef.end_time - eaef.start_time
END DESC
LIMIT 20;
调度实例延迟统计
SELECT 
    eaef.*
FROM
    eto_az_exec_flow eaef
WHERE
    eaef.`cluster` = 'dev4'
        AND eaef.`product` = 'intern'
        AND eaef.`sched_time` BETWEEN 1665244800000 AND 1665331200000
        AND eaef.`exec_type` IN ('SCHEDULE')
        AND eaef.`re_run_flag` = 0
ORDER BY CASE
    WHEN eaef.start_time = - 1 THEN ROUND(UNIX_TIMESTAMP(CURTIME(4)) * 1000) - eaef.sched_time
    ELSE eaef.start_time - eaef.sched_time
END DESC
LIMIT 20

运维值班表查询：
select * from duty_group_member

报警发送用户记录表:
select * from alert_rule_alarm_user

任务依赖查询：
跨流依赖表：eto_az_schedule_dep
字段描述cluster集群source_project_id下游projectIdsource_flow_id下游任务flowIdsource_job_id下游节点id，如果是FLOW_DEP_JOB类型、FLOW_SELF类型时改值为空。target_project_id上游projectIdtarget_flow_id上游任务flowIdtarget_job_id上游节点id，如果是FLOW_SELF类型时改值为空。dep_type依赖类型。FLOW_DEP_JOB：任务依赖；JOB_DEP_JOB：节点依赖；FLOW_SELF：自依赖eto_sched_ideto_az_schedule主键
SELECT DISTINCT
    flow.flow_name
FROM
    eto_az_schedule_dep dep
        JOIN
    eto_az_flow flow ON flow.cluster = dep.cluster and flow.project_id = dep.source_project_id and flow.flow_id = dep.source_flow_id
WHERE
    target_flow_id = '任务下游名'
同周期查询：
select * from eto_az_schedule

节点选择数据源查询：（detail字段里的datasource字段 ）
select detail from eto_az_job_detail d join eto_az_flow flow on flow.id = d.eto_flow_id and flow.version = d.version

查询某个节点所在的项目
select * from easytaskops.eto_az_job_detail join eto_az_flow on eto_az_job_detail d join eto_az_flow flow on flow.id = d.eto_flow_id and flow.version = d.version 

响应状态查询：
select response_status from alert_rule_instance

质量监控任务查询：
select * from dq_task 监控任务信息
select * from dq_rule_custom_sql a join dq_task b on a.task_id=b.task_id 自定义SQL规则信息
select * from dq_rule_instant_field a join dq_task b on a.task_id=b.task_id 自定义字段规则信息
select * from dq_rule_template_field a join dq_task b on a.task_id=b.task_id 模板规则信息

数据服务api列表查询：
select * from api

实时任务状态查询：
 *     READY,  任务就绪，未运行过
 *     STARTING,  启动中
 *     RUNNING,  运行中
 *     STOPPING,  停止中
 *     STOPPED,   任务已经停止
 *     FAILED,    任务运行失败
 *     START_FAILED, 任务启动失败
 *     FINISHED,    任务运行结束
select job_status from rts_ops.job_info
作者：林帅
以上内容对您是否有帮助？

鼠标选中文档，精准反馈问题
选中存在疑惑的内容，即可快速反馈问题，我们会跟进处理
不再提示我知道了
文档反馈