Task creation failed: java.lang.NullPointerException

Problem description / exception stack trace

A Spark SQL job fails with the following error:
CST DAGScheduler INFO - ShuffleMapStage 21 (sql at AzkabanSparkSQLDriver.java:67) failed in Unknown s due to Job aborted due to stage failure: Task creation failed: java.lang.NullPointerException
java.lang.NullPointerException
    at scala.collection.immutable.StringLike$class.stripPrefix(StringLike.scala:155)
    at scala.collection.immutable.StringOps.stripPrefix(StringOps.scala:29)
    at org.apache.spark.scheduler.TaskLocation$.apply(TaskLocation.scala:71)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$1.apply(DAGScheduler.scala:1769)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$1.apply(DAGScheduler.scala:1769)
    at scala.collection.immutable.List.map(List.scala:277)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1769)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$1.apply$mcVI$sp(DAGScheduler.scala:1778)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$1.apply(DAGScheduler.scala:1777)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$1.apply(DAGScheduler.scala:1777)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1777)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1775)
    at scala.collection.immutable.List.foreach(List.scala:381)

Solution

Add the following setting at the top of the SQL job:
set spark.sql.hive.convertMetastoreParquet=true;
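
If the job is driven from Scala rather than a SQL script, the same setting can be applied when the session is built. A minimal sketch, assuming a Hive-enabled SparkSession; the application name and table name are illustrative, not from the original job:

    import org.apache.spark.sql.SparkSession

    object ParquetReadJob extends App {
      // Equivalent to the SET statement above, applied at session construction.
      val spark = SparkSession.builder()
        .appName("parquet-read-job")  // hypothetical app name
        .enableHiveSupport()
        .config("spark.sql.hive.convertMetastoreParquet", "true")
        .getOrCreate()

      // Hypothetical table; with the flag on, the scan goes through
      // Spark's built-in Parquet data source instead of the Hive SerDe.
      spark.sql("SELECT count(*) FROM db.some_parquet_table").show()
    }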

Root cause

With spark.sql.hive.convertMetastoreParquet=true, Spark SQL reads Hive metastore Parquet tables through its own built-in Parquet data source (the Spark API) instead of the Hive SerDe. This avoids the failing code path: per the stack trace, a null preferred-location (host) string reached TaskLocation.apply during task creation, and calling stripPrefix on that null string threw the NullPointerException.
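
For illustration, the top frame of the stack can be reproduced directly: stripPrefix is an extension method supplied by StringOps, so calling it on a null String reference fails inside StringLike.stripPrefix rather than at the call site, exactly as in the trace above. A minimal sketch; the prefix value is our assumption about what TaskLocation strips from in-memory location strings:

    object NpeRepro extends App {
      // Illustrative only: a null location string reaching TaskLocation.apply
      // fails the same way, inside StringLike.stripPrefix.
      val host: String = null
      host.stripPrefix("hdfs_cache_")  // throws java.lang.NullPointerException
    }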

Author: 稚远