FAQ - Case-sensitivity read error on the source side of a transfer task

Problem description / exception stack

2023-02-27 16:08:11 CST Diagnostics: User class threw exception: org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the data schema: `invitecommentsmethodstr`, `invitecommentsmethod`;
    at org.apache.spark.sql.util.SchemaUtils$.checkColumnNameDuplication(SchemaUtils.scala:85)
    at org.apache.spark.sql.util.SchemaUtils$.checkColumnNameDuplication(SchemaUtils.scala:67)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:431)

Version where found

LTS 650 update2.1, data transfer 3.10

Solution

Set the parameter ndi.spark.spark-conf.spark.sql.caseSensitive=true

[Figure 1] [Figure 2]

Root cause

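Elasticsearch (ES) field names are case-sensitive by default, whereas Spark SQL is case-insensitive by default. When the source contains fields whose names differ only in letter case, Spark SQL treats them as the same column and rejects the schema as containing duplicate columns. With spark.sql.caseSensitive=true, Spark SQL distinguishes such names and the schema check passes.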
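
The effect of case folding on the duplicate-column check can be illustrated with a minimal Python sketch. It is a simplified model of the behavior, not Spark's actual implementation. The helper find_duplicate_columns and the mixed-case field names are hypothetical; the original casing of the ES fields is not shown in the exception, which prints only the lower-cased names.

```python
from collections import Counter


def find_duplicate_columns(column_names, case_sensitive=False):
    """Return the sorted names that occur more than once under the given case rule.

    Simplified model only: a case-insensitive comparison folds every name
    to lower case before counting, so names differing only in case collide.
    """
    normalized = [n if case_sensitive else n.lower() for n in column_names]
    counts = Counter(normalized)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical ES field names (assumption): pairs that differ only in letter case.
es_fields = [
    "inviteCommentsMethod",
    "invitecommentsmethod",
    "inviteCommentsMethodStr",
    "invitecommentsmethodstr",
]

# Models the default, case-insensitive behavior: both pairs collide.
insensitive_duplicates = find_duplicate_columns(es_fields, case_sensitive=False)

# Models spark.sql.caseSensitive=true: all four names are distinct.
sensitive_duplicates = find_duplicate_columns(es_fields, case_sensitive=True)

print(insensitive_duplicates)  # ['invitecommentsmethod', 'invitecommentsmethodstr']
print(sensitive_duplicates)    # []
```

Under the case-insensitive rule the sketch reports the same two lower-cased names that appear in the exception message; under the case-sensitive rule it reports none.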

Author: Lin Shuai (林帅)