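Demo: Integrating DataX with the Youshu data middle platform

Applicable module

Data middle platform

Details

Integrating DataX with the Youshu data middle platform

Usage example

MySQL 8.0.18 is used as the example.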

# 1. Install Maven
Download Maven:
https://mirrors.tuna.tsinghua.edu.cn/apache/maven/maven-3/3.8.1/binaries/apache-maven-3.8.1-bin.tar.gz

Unpack Maven, then edit ./conf/settings.xml.
Switch the mirror to the Aliyun repository:
```xml
<mirror>
        <id>nexus-aliyun</id>
        <mirrorOf>central</mirrorOf>
        <name>Nexus aliyun</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
```

Set localRepository to a path on a disk with plenty of free space:
```xml
<localRepository>/mnt/dfs/0/apache-maven-3.8.1/repo</localRepository>
```


# 2. Install JDK 8
Install JDK 8 and configure its environment variables.
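
The environment setup is not spelled out here. A minimal sketch, assuming the JDK was unpacked to `/usr/local/jdk1.8.0` (a hypothetical path, substitute your own), could be appended to `~/.bashrc` or `/etc/profile`:

```shell
# Assumed install location; replace with the directory that holds your JDK 8.
export JAVA_HOME=/usr/local/jdk1.8.0
# Put the JDK's binaries ahead of any other Java already on the PATH.
export PATH="$JAVA_HOME/bin:$PATH"
```

After reloading the shell, `java -version` should report a 1.8 release if the path is correct.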

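# 3. Download DataX, patch and build the source, and configure the DataX environment (deploy on every Azkaban node)
https://github.com/alibaba/DataX
Unpack the DataX source archive.

The prebuilt package does not yet support MySQL 8.0.18, so DataX has to be compiled from source.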
## (1) Modify DataX-master/pom.xml
Change `<mysql.driver.version>5.1.47</mysql.driver.version>` to the version you need:
```xml
<mysql.driver.version>8.0.18</mysql.driver.version>
```
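
The same change can be scripted rather than made by hand. The following is a sketch only: the function name `set_mysql_driver_version` is made up for illustration, and it assumes the property appears on a single line in the pom.

```shell
# Sketch: set the MySQL driver version property in a pom.xml in place.
set_mysql_driver_version() {
  # $1: path to pom.xml, $2: target driver version (for example 8.0.18)
  # A .bak copy of the original file is kept next to it.
  sed -i.bak \
    "s#<mysql.driver.version>[^<]*</mysql.driver.version>#<mysql.driver.version>$2</mysql.driver.version>#" \
    "$1"
}

# Only patch when the source tree is present in the current directory.
if [ -f DataX-master/pom.xml ]; then
  set_mysql_driver_version DataX-master/pom.xml 8.0.18
fi
```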

## (2) Patch the Java sources for the MySQL 8 driver
```bash
# In DataBaseType.java, run both substitutions in vim's command mode
vim DataX-master/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DataBaseType.java
:%s/convertToNull/CONVERT_TO_NULL/g
:%s/com.mysql.jdbc.Driver/com.mysql.cj.jdbc.Driver/g

# In AdsHelper.java, replace the driver class name the same way
vi DataX-master/adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/load/AdsHelper.java
:%s/com.mysql.jdbc.Driver/com.mysql.cj.jdbc.Driver/g
```
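
For repeatable builds, the interactive vim substitutions can be replaced with `sed`. This is a sketch, with `patch_mysql8` as a made-up helper name. It replaces every occurrence on each line and escapes the dots in the class name so they match literally.

```shell
# Sketch: apply the MySQL 8 substitutions non-interactively.
patch_mysql8() {
  # $1: path to a Java source file; a .bak copy of the original is kept.
  sed -i.bak \
    -e 's/convertToNull/CONVERT_TO_NULL/g' \
    -e 's/com\.mysql\.jdbc\.Driver/com.mysql.cj.jdbc.Driver/g' \
    "$1"
}

# Patch the two files named in this step, skipping any that are missing.
for f in \
  DataX-master/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DataBaseType.java \
  DataX-master/adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/load/AdsHelper.java
do
  if [ -f "$f" ]; then
    patch_mysql8 "$f"
  else
    echo "skip (not found): $f" >&2
  fi
done
```

Running the function twice is harmless, because neither replacement text matches its own search pattern.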


## (3) Run the build from the DataX-master directory
`mvn -U clean package assembly:assembly -Dmaven.test.skip=true`

During the build, the writer module for the Shenzhou database (oscarwriter) fails:

![](/documents/uploads/projects/service_support/1677c2b91f8dd3e5.png)

Download the JDBC jar that oscarwriter needs, place it under that module's directory, and update the module's pom to reference it:

![](/documents/uploads/projects/service_support/1677c2bd49446e6f.png)

Rebuild. A successful build looks like this:

![](/documents/uploads/projects/service_support/1677c2bf8d924e02.png)

The built package is in DataX-master/target/datax/datax/
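
The deployment step itself is not shown. A minimal sketch, assuming the package is installed to `/mnt/dfs/0/datax` (the path that dx.sh uses later in this demo) and that the build ships the sample job `job/job.json` described in the DataX quick start, might look like the following. `deploy_datax` is a made-up helper name, and the same copy would need to be repeated on every Azkaban node.

```shell
# Sketch: install the built package and run DataX's bundled sample job.
DATAX_BUILD=DataX-master/target/datax/datax   # build output named above
DATAX_HOME=/mnt/dfs/0/datax                   # install path assumed from dx.sh

deploy_datax() {
  # $1: build output directory, $2: install directory
  mkdir -p "$2" && cp -r "$1"/. "$2"/
}

# Only deploy when the build output exists in the current directory.
if [ -d "$DATAX_BUILD" ]; then
  deploy_datax "$DATAX_BUILD" "$DATAX_HOME"
  # Self-check with the sample job (assumed to be present in the package).
  python "$DATAX_HOME/bin/datax.py" "$DATAX_HOME/job/job.json"
fi
```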

# 4. Test on Mammut (猛犸)
(1) Create a Hive table that uses a comma as the field delimiter
```sql
CREATE TABLE `doi.stdatax`(
  `sno` string COMMENT '',
  `sname` string COMMENT '',
  `ssex` string COMMENT '',
  `sbirthday` timestamp COMMENT '',
  `sclass` string COMMENT '')
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```

(2) Student data in MySQL 8, shown in the figure below

[Figure 1: student data in MySQL 8]

(3) Configure the Mammut schedule

[Figures 2 to 7: Mammut scheduling configuration]
### Contents of dx.sh
```shell
python /mnt/dfs/0/datax/bin/datax.py ./dx.json
```
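
A scheduler usually decides success or failure from the script's exit code, so a slightly more defensive dx.sh may be useful. This is a sketch, not the script used in the demo. `run_job` is a made-up name, and it assumes `datax.py` exits with a non-zero status when the job fails.

```shell
#!/usr/bin/env bash
# Sketch: dx.sh variant that fails fast and propagates the exit code.
DATAX_PY=${DATAX_PY:-/mnt/dfs/0/datax/bin/datax.py}   # path from the original dx.sh

run_job() {
  # $1: path to the DataX job file
  if [ ! -f "$1" ]; then
    echo "job file not found: $1" >&2
    return 2
  fi
  python "$DATAX_PY" "$1"
}

# Run only when a job file sits next to the script, as in this demo.
if [ -f ./dx.json ]; then
  run_job ./dx.json || exit $?
fi
```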

### Contents of dx.json
```json
{
    "job":{
        "setting":{
            "speed":{
                "channel":3
            },
            "errorLimit":{
                "record":0,
                "percentage":0.02
            }
        },
        "content":[
            {
                "reader":{
                    "name":"mysqlreader",
                    "parameter":{
                        "username":"datax",
                        "password":"Ab@123456",
                        "column":[
                            "sno",
                            "sname",
                            "ssex",
                            "sbirthday",
                            "sclass"
                        ],
                        "connection":[
                            {
                                "table":[
                                    "student1"
                                ],
                                "jdbcUrl":[
                                    "jdbc:mysql://59.111.211.47:3306/datax?characterEncoding=utf8"
                                ]
                            }
                        ]
                    }
                },
                "writer":{
                    "name":"hdfswriter",
                    "parameter":{
                        "column":[
                            {
                                "name":"sno",
                                "type":"string"
                            },
                            {
                                "name":"sname",
                                "type":"string"
                            },
                            {
                                "name":"ssex",
                                "type":"string"
                            },
                            {
                                "name":"sbirthday",
                                "type":"timestamp"
                            },
                            {
                                "name":"sclass",
                                "type":"string"
                            }
                        ],
                        "defaultFS":"hdfs://easyops-cluster:8020",
                        "hadoopConfig":{
                            "dfs.nameservices":"easyops-cluster",
                            "dfs.ha.namenodes.easyops-cluster":"nn1,nn2",
                            "dfs.namenode.rpc-address.easyops-cluster.nn1":"bigdata-demo1.jdlt.163.org:8020",
                            "dfs.namenode.rpc-address.easyops-cluster.nn2":"bigdata-demo2.jdlt.163.org:8020",
                            "dfs.client.failover.proxy.provider.easyops-cluster":"org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
                        },
                        "haveKerberos":"true",
                        "kerberosKeytabFilePath":"/mnt/dfs/0/datax/qianzhaoyuan.keytab",
                        "kerberosPrincipal":"bdms_qianzhaoyuan/dev@BDMS.163.COM",
                        "encoding":"UTF-8",
                        "fileType":"text",
                        "fileName":"test1",
                        "path":"/user/hangkong/hive_db/doi.db/stdatax",
                        "writeMode":"append",
                        "fieldDelimiter":","
                    }
                }
            }
        ]
    }
}
```
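
Because the writer emits comma-delimited text into a five-column Hive table, a quick sanity check after the job is to count the rows whose field count is not 5. The sketch below makes several assumptions: `count_bad_rows` is a made-up helper, the `hdfs` client is on the PATH with a valid Kerberos ticket (for example obtained with `kinit -kt` using the keytab and principal from dx.json), and the output files start with the configured `fileName` of `test1`.

```shell
# Sketch: count rows whose field count differs from the 5 Hive columns.
count_bad_rows() {
  # Reads comma-delimited text on stdin; prints the number of malformed rows.
  awk -F',' 'NF != 5 { bad++ } END { print bad + 0 }'
}

# Only attempt the HDFS read when the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -cat '/user/hangkong/hive_db/doi.db/stdatax/test1*' | count_bad_rows
fi
```

A non-zero count usually means that some source value contains a comma. In that case a different `fieldDelimiter`, changed in both the Hive DDL and dx.json, would be the safer choice.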



Note: for details on DataX configuration, see the official repository:
https://github.com/alibaba/DataX

Author: qianzhaoyuan