Hive常见故障排查
更新时间: 2021-01-20 17:18:57
阅读 5020
平台上对hive故障的排除方法,以及表修复
HiveServer2服务崩溃
如果HS2服务频繁崩溃,请通过检查HS2实例来确认问题是否与HS2堆耗尽有关 标准输出 日志。
我们可以通过易数运维平台来观察错误,并配置有报警功能。如下图状态所示

并能通过观察历史的线程数,内存状态,gc等进行问题的排查。


还有一些监控信息如gc,buffers 等
使用MSCK
MSCK REPAIR TABLE命令
MSCK REPAIR TABLE命令主要是用来解决通过hdfs dfs -put或者hdfs api写入hive分区表的数据在hive中无法被查询到的问题。
我们知道hive有个服务叫metastore,这个服务主要是存储一些元数据信息,比如数据库名,表名或者表的分区等等信息。如果不是通过hive的insert等插入语句,很多分区信息在metastore中是没有的,如果插入分区数据量很多的话,你用 ALTER TABLE table_name ADD PARTITION 一个个分区添加十分麻烦。这时候MSCK REPAIR TABLE就派上用场了。只需要运行MSCK REPAIR TABLE命令,hive就会去检测这个表在hdfs上的文件,把没有写入metastore的分区信息写入metastore。
例子
我们先创建一个分区表,然后往其中的一个分区插入一条数据,在查看分区信息
CREATE TABLE repair_test (col_a STRING) PARTITIONED BY (par STRING);INSERT INTO TABLE repair_test PARTITION(par="partition_1") VALUES ("test");SHOW PARTITIONS repair_test;
查看分区信息的结果如下
jdbc:hive2://localhost:10000> show partitions repair_test;INFO : Compiling command(queryId=hive_20180810175151_5260f52e-10bb-4589-ad48-31ba72a81c21): show partitions repair_testINFO : Semantic Analysis CompletedINFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:partition, type:string, comment:from deserializer)], properties:null)INFO : Completed compiling command(queryId=hive_20180810175151_5260f52e-10bb-4589-ad48-31ba72a81c21); Time taken: 0.029 secondsINFO : Executing command(queryId=hive_20180810175151_5260f52e-10bb-4589-ad48-31ba72a81c21): show partitions repair_testINFO : Starting task [Stage-0:DDL] in serial modeINFO : Completed executing command(queryId=hive_20180810175151_5260f52e-10bb-4589-ad48-31ba72a81c21); Time taken: 0.017 secondsINFO : OK+------------------+--+| partition |+------------------+--+| par=partition_1 |+------------------+--+1 row selected (0.073 seconds)0: jdbc:hive2://localhost:10000>
然后我们通过hdfs的put命令手动创建一个数据
[ericsson@h3cnamenode1 pcc]$ echo "123123" > test.txt[ericsson@h3cnamenode1 pcc]$ hdfs dfs -mkdir -p /user/hive/warehouse/test.db/repair_test/par=partition_2/[ericsson@h3cnamenode1 pcc]$ hdfs dfs -put -f test.txt /user/hive/warehouse/test.db/repair_test/par=partition_2/[ericsson@h3cnamenode1 pcc]$ hdfs dfs -ls -R /user/hive/warehouse/test.db/repair_testdrwxrwxrwt - ericsson hive 0 2018-08-10 17:46 /user/hive/warehouse/test.db/repair_test/par=partition_1drwxrwxrwt - ericsson hive 0 2018-08-10 17:46 /user/hive/warehouse/test.db/repair_test/par=partition_1/.hive-staging_hive_2018-08-10_17-45-59_029_1594310228554990949-1drwxrwxrwt - ericsson hive 0 2018-08-10 17:46 /user/hive/warehouse/test.db/repair_test/par=partition_1/.hive-staging_hive_2018-08-10_17-45-59_029_1594310228554990949-1/-ext-10000-rwxrwxrwt 3 ericsson hive 5 2018-08-10 17:46 /user/hive/warehouse/test.db/repair_test/par=partition_1/000000_0drwxr-xr-x - ericsson hive 0 2018-08-10 17:57 /user/hive/warehouse/test.db/repair_test/par=partition_2-rw-r--r-- 3 ericsson hive 7 2018-08-10 17:57 /user/hive/warehouse/test.db/repair_test/par=partition_2/test.txt[ericsson@h3cnamenode1 pcc]$
这时候我们查询分区信息,发现partition_2这个分区并没有加入到hive中
0: jdbc:hive2://localhost:10000> show partitions repair_test;INFO : Compiling command(queryId=hive_20180810175959_e7cefe8c-57b5-486c-8e03-b1201dac4d79): show partitions repair_testINFO : Semantic Analysis CompletedINFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:partition, type:string, comment:from deserializer)], properties:null)INFO : Completed compiling command(queryId=hive_20180810175959_e7cefe8c-57b5-486c-8e03-b1201dac4d79); Time taken: 0.029 secondsINFO : Executing command(queryId=hive_20180810175959_e7cefe8c-57b5-486c-8e03-b1201dac4d79): show partitions repair_testINFO : Starting task [Stage-0:DDL] in serial modeINFO : Completed executing command(queryId=hive_20180810175959_e7cefe8c-57b5-486c-8e03-b1201dac4d79); Time taken: 0.02 secondsINFO : OK+------------------+--+| partition |+------------------+--+| par=partition_1 |+------------------+--+1 row selected (0.079 seconds)0: jdbc:hive2://localhost:10000>
运行MSCK REPAIR TABLE 命令后再查询分区信息,可以看到通过put命令放入的分区已经可以查询了
0: jdbc:hive2://localhost:10000> MSCK REPAIR TABLE repair_test;INFO : Compiling command(queryId=hive_20180810180000_7099daf2-6fde-44dd-8938-d2a02589358f): MSCK REPAIR TABLE repair_testINFO : Semantic Analysis CompletedINFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)INFO : Completed compiling command(queryId=hive_20180810180000_7099daf2-6fde-44dd-8938-d2a02589358f); Time taken: 0.004 secondsINFO : Executing command(queryId=hive_20180810180000_7099daf2-6fde-44dd-8938-d2a02589358f): MSCK REPAIR TABLE repair_testINFO : Starting task [Stage-0:DDL] in serial modeINFO : Completed executing command(queryId=hive_20180810180000_7099daf2-6fde-44dd-8938-d2a02589358f); Time taken: 0.138 secondsINFO : OKNo rows affected (0.154 seconds)0: jdbc:hive2://localhost:10000> show partitions repair_test;INFO : Compiling command(queryId=hive_20180810180000_ff711820-6f41-4d5d-9fee-b6e1cdbe1e25): show partitions repair_testINFO : Semantic Analysis CompletedINFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:partition, type:string, comment:from deserializer)], properties:null)INFO : Completed compiling command(queryId=hive_20180810180000_ff711820-6f41-4d5d-9fee-b6e1cdbe1e25); Time taken: 0.045 secondsINFO : Executing command(queryId=hive_20180810180000_ff711820-6f41-4d5d-9fee-b6e1cdbe1e25): show partitions repair_testINFO : Starting task [Stage-0:DDL] in serial modeINFO : Completed executing command(queryId=hive_20180810180000_ff711820-6f41-4d5d-9fee-b6e1cdbe1e25); Time taken: 0.016 secondsINFO : OK+------------------+--+| partition |+------------------+--+| par=partition_1 || par=partition_2 |+------------------+--+2 rows selected (0.088 seconds)0: jdbc:hive2://localhost:10000> select * from repair_test;INFO : Compiling command(queryId=hive_20180810180101_1225075e-43c8-4a49-b8ef-a12f72544a38): select * from repair_testINFO : Semantic Analysis CompletedINFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:repair_test.col_a, type:string, comment:null), FieldSchema(name:repair_test.par, type:string, comment:null)], properties:null)INFO : Completed compiling command(queryId=hive_20180810180101_1225075e-43c8-4a49-b8ef-a12f72544a38); Time taken: 0.059 secondsINFO : Executing command(queryId=hive_20180810180101_1225075e-43c8-4a49-b8ef-a12f72544a38): select * from repair_testINFO : Completed executing command(queryId=hive_20180810180101_1225075e-43c8-4a49-b8ef-a12f72544a38); Time taken: 0.001 secondsINFO : OK+--------------------+------------------+--+| repair_test.col_a | repair_test.par |+--------------------+------------------+--+| test | partition_1 || 123123 | partition_2 |+--------------------+------------------+--+2 rows selected (0.121 seconds)0: jdbc:hive2://localhost:10000>
文档反馈
以上内容对您是否有帮助?