Spark Jar Application Development

Import the dependencies into your Maven project; the versions below are for reference. Note that the Spark dependencies carry a commented-out provided scope: uncomment it when the cluster already supplies Spark at runtime, so the assembled jar stays slim.

<properties>
    <scala.version>2.11.8</scala.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.9.2</hadoop.version>
    <spark.version>2.3.2</spark.version>
    <hive.version>2.1.1</hive.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
        <!--<scope>provided</scope>-->
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
        <!--<scope>provided</scope>-->
    </dependency>
</dependencies>

Create a new Spark job, demo.SparkDemo:

package demo

import org.apache.spark.sql.SparkSession

object SparkDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkDemo")
      .enableHiveSupport()
      .getOrCreate()

    val table = "demo.ods_acct_acc_transaction"
    val df = spark.sql(s"select count(*) from $table")

    df.show()

    spark.stop()
  }
}
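
Before uploading to the platform, it can help to smoke-test the job locally. Below is a minimal sketch, assuming no local Hive metastore is available, so it counts a temporary view instead of the Hive table (SparkDemoLocal and the view name are hypothetical, used only for illustration):

package demo

import org.apache.spark.sql.SparkSession

object SparkDemoLocal {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkDemoLocal")
      .master("local[*]") // local debugging only; on the platform the master is set at submission time
      .getOrCreate()

    import spark.implicits._

    // No Hive metastore locally, so register an in-memory view and count that instead.
    Seq(1, 2, 3).toDF("id").createOrReplaceTempView("demo_local")
    spark.sql("select count(*) from demo_local").show()

    spark.stop()
  }
}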

Package the project (for example, with mvn clean package).

Once the jar is built, upload it to the offline development task flow.

After it is uploaded, the jar appears in the resource information panel.

In that task flow, create a new Spark node.

Edit the node configuration: fill in the main class to execute and the name of the jar that contains it. Other parameters are optional and can also be set here; see the sketch below for how a job can consume them. Save the configuration and the node is ready to run.
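
Those optional parameters typically include program arguments passed to the main class. A minimal sketch of reading the table name from args instead of hardcoding it (the argument convention here is an assumption; check what the platform actually passes):

package demo

import org.apache.spark.sql.SparkSession

object SparkDemoArgs {
  def main(args: Array[String]): Unit = {
    // Hypothetical convention: the first argument names the table to count;
    // fall back to the demo table when no argument is supplied.
    val table = if (args.nonEmpty) args(0) else "demo.ods_acct_acc_transaction"

    val spark = SparkSession.builder()
      .appName("SparkDemoArgs")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql(s"select count(*) from $table").show()

    spark.stop()
  }
}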

With that, a Spark job you wrote yourself can run on the platform.

Because the job calls df.show() and the execution mode is set to client, the count result is printed directly in the log.
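
If the job needs the count value on the driver rather than just console output, the df.show() call in SparkDemo above can be swapped for an explicit fetch. A minimal sketch:

// Drop-in replacement for df.show() inside SparkDemo.main above.
// count(*) comes back as a bigint, so read it with getLong(0).
val count = df.first().getLong(0)
println(s"row count of $table = $count")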
