FAQ: Spark SQL task fails with a NullPointerException
Last updated: 2024-03-11 02:51:48
Task creation failed: java.lang.NullPointerException
Problem description / exception stack
A Spark SQL task fails with the following error:
CST DAGScheduler INFO - ShuffleMapStage 21 (sql at AzkabanSparkSQLDriver.java:67) failed in Unknown s due to Job aborted due to stage failure: Task creation failed: java.lang.NullPointerException
java.lang.NullPointerException
at scala.collection.immutable.StringLike$class.stripPrefix(StringLike.scala:155)
at scala.collection.immutable.StringOps.stripPrefix(StringOps.scala:29)
at org.apache.spark.scheduler.TaskLocation$.apply(TaskLocation.scala:71)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$1.apply(DAGScheduler.scala:1769)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$1.apply(DAGScheduler.scala:1769)
at scala.collection.immutable.List.map(List.scala:277)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1769)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$1.apply$mcVI$sp(DAGScheduler.scala:1778)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$1.apply(DAGScheduler.scala:1777)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$1.apply(DAGScheduler.scala:1777)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1777)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1775)
at scala.collection.immutable.List.foreach(List.scala:381)
Solution
Add the following setting at the beginning of the SQL task:
set spark.sql.hive.convertMetastoreParquet=true
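For example, a SQL task that reads a Hive Parquet table would begin like this (a minimal sketch; the table and column names below are hypothetical, for illustration only):

set spark.sql.hive.convertMetastoreParquet=true;
-- subsequent statements now read Parquet with Spark's built-in reader
select dt, count(*)
from dw.ods_events
group by dt;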
Root cause
Setting spark.sql.hive.convertMetastoreParquet=true makes Spark read Parquet tables registered in the Hive metastore with its own built-in Parquet reader (Spark's native API) instead of the Hive SerDe. In the stack trace above, the NullPointerException is thrown from TaskLocation.apply while the DAGScheduler computes preferred task locations, which suggests a null location string was produced on the Hive read path; switching to the native reader avoids that code path.
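If the job is submitted from code rather than as a pure SQL task, the same setting can be applied when building the SparkSession. This is a minimal Scala sketch, not part of the original FAQ; the app name and the table name some_parquet_table are placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("parquet-read-fix")
  .enableHiveSupport()
  // use Spark's built-in Parquet reader instead of the Hive SerDe
  .config("spark.sql.hive.convertMetastoreParquet", "true")
  .getOrCreate()

spark.sql("select count(*) from some_parquet_table").show()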
Author: 稚远