Demo - Integrating DataX with the Youshu Data Platform
Last updated: 2024-03-11 02:52:36
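Applicable module
Data platform (中台)
Description
Integrating DataX with the Youshu data platform
Usage example
The example below uses MySQL 8.0.18.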
# 1. Install Maven
Download Maven:
https://mirrors.tuna.tsinghua.edu.cn/apache/maven/maven-3/3.8.1/binaries/apache-maven-3.8.1-bin.tar.gz
Extract the archive, then edit ./conf/settings.xml.
Switch to the Aliyun mirror (the `<mirror>` element belongs inside `<mirrors>`):
```xml
<mirror>
  <id>nexus-aliyun</id>
  <mirrorOf>central</mirrorOf>
  <name>Nexus aliyun</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
```
Set localRepository to a path on a disk with plenty of free space:
```xml
<localRepository>/mnt/dfs/0/apache-maven-3.8.1/repo</localRepository>
```
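Before starting a long build, it can help to confirm that both settings were actually picked up. The following is a minimal sketch, not part of the original procedure: `summarize_settings` and `SAMPLE_SETTINGS` are hypothetical names, and the sample wraps the two fragments above in the `<settings>` and `<mirrors>` elements that a real settings.xml uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the two fragments from this section placed in a
# minimal settings.xml skeleton.
SAMPLE_SETTINGS = """<settings>
  <localRepository>/mnt/dfs/0/apache-maven-3.8.1/repo</localRepository>
  <mirrors>
    <mirror>
      <id>nexus-aliyun</id>
      <mirrorOf>central</mirrorOf>
      <name>Nexus aliyun</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </mirror>
  </mirrors>
</settings>"""


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}mirror'."""
    return tag.rsplit("}", 1)[-1]


def summarize_settings(xml_text):
    """Return (localRepository, {mirror id: url}) found in a settings.xml string."""
    root = ET.fromstring(xml_text)
    local_repo = None
    mirrors = {}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "localRepository":
            local_repo = (elem.text or "").strip()
        elif name == "mirror":
            fields = {local_name(c.tag): (c.text or "").strip() for c in elem}
            mirrors[fields.get("id")] = fields.get("url")
    return local_repo, mirrors


if __name__ == "__main__":
    print(summarize_settings(SAMPLE_SETTINGS))
```

To check a real installation, read ./conf/settings.xml from the Maven directory and pass its text to `summarize_settings`; commented-out sample mirrors in the stock file are ignored by the parser.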
# 2. Install JDK 8
Install JDK 8 and configure the related environment variables.
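# 3. Download DataX, patch and build the source, and configure the DataX environment variables (DataX must be deployed on every Azkaban host)
https://github.com/alibaba/DataX
Extract the DataX source package.
The downloaded package does not yet support MySQL 8.0.18, so DataX has to be compiled from source.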
## (1) Edit DataX-master/pom.xml
Change `<mysql.driver.version>5.1.47</mysql.driver.version>` to the driver version you need, in this example:
```xml
<mysql.driver.version>8.0.18</mysql.driver.version>
```
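If the version bump has to be repeated on several machines, it can be scripted. This is a minimal sketch under the assumption that the property appears once in the pom as a plain `<mysql.driver.version>` element; `set_pom_property` is a hypothetical helper, not something DataX or Maven provides.

```python
import re


def set_pom_property(pom_text, name, new_value):
    """Replace the text of the first <name>...</name> element in pom_text.

    Raises ValueError if the property is missing, so a silent no-op
    cannot be mistaken for a successful edit.
    """
    pattern = re.compile(r"(<{0}>)[^<]*(</{0}>)".format(re.escape(name)))
    new_text, count = pattern.subn(
        lambda m: m.group(1) + new_value + m.group(2), pom_text, count=1
    )
    if count == 0:
        raise ValueError("property not found: " + name)
    return new_text


if __name__ == "__main__":
    sample = "<properties>\n  <mysql.driver.version>5.1.47</mysql.driver.version>\n</properties>"
    print(set_pom_property(sample, "mysql.driver.version", "8.0.18"))
```

For the real file, read DataX-master/pom.xml, pass its text through `set_pom_property`, and write the result back.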
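## (2) Patch the source files
Open each file, then run the `:%s` substitution in vim command mode to replace the text throughout the file.
```BASH
# DataBaseType.java: Connector/J 8 spells this value CONVERT_TO_NULL
vim DataX-master/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DataBaseType.java
:%s/convertToNull/CONVERT_TO_NULL/g
# DataBaseType.java: use the Connector/J 8 driver class
vi DataX-master/plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DataBaseType.java
:%s/com.mysql.jdbc.Driver/com.mysql.cj.jdbc.Driver/g
# AdsHelper.java: same driver class change
vi DataX-master/adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/load/AdsHelper.java
:%s/com.mysql.jdbc.Driver/com.mysql.cj.jdbc.Driver/g
```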
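The same three substitutions can be applied without opening an editor. The sketch below is an assumption-laden convenience, not the author's method: `PATCHES` and `patch_tree` are hypothetical names, the replacements are treated as literal strings, and the relative paths are the ones listed in this step.

```python
from pathlib import Path

DRIVER = ("com.mysql.jdbc.Driver", "com.mysql.cj.jdbc.Driver")
ZERO_DATE = ("convertToNull", "CONVERT_TO_NULL")

# Relative path (inside DataX-master) -> literal replacements for that file.
PATCHES = {
    "plugin-rdbms-util/src/main/java/com/alibaba/datax/plugin/rdbms/util/DataBaseType.java": [ZERO_DATE, DRIVER],
    "adswriter/src/main/java/com/alibaba/datax/plugin/writer/adswriter/load/AdsHelper.java": [DRIVER],
}


def patch_tree(root):
    """Apply PATCHES under root and return the relative paths that changed."""
    changed = []
    for rel_path, replacements in PATCHES.items():
        path = Path(root) / rel_path
        original = path.read_text(encoding="utf-8")
        patched = original
        for old, new in replacements:
            patched = patched.replace(old, new)  # replaces every occurrence
        if patched != original:
            path.write_text(patched, encoding="utf-8")
            changed.append(rel_path)
    return changed


if __name__ == "__main__":
    import sys
    # Usage: python patch_datax.py /path/to/DataX-master
    print(patch_tree(sys.argv[1]))
```

Because neither replacement text contains its own search string, running the script a second time changes nothing.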
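## (3) Run the build from the DataX-master directory
`mvn -U clean package assembly:assembly -Dmaven.test.skip=true`
During the build, the writer module for the Shenzhou (Oscar) database fails:
![](/documents/uploads/projects/service_support/1677c2b91f8dd3e5.png)
Download the oscarwriter JDBC package, update that module's pom, and place the package under the module's path:
![](/documents/uploads/projects/service_support/1677c2bd49446e6f.png)
After rebuilding, the build succeeds as shown below:
![](/documents/uploads/projects/service_support/1677c2bf8d924e02.png)
The built package is in DataX-master/target/datax/datax/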
# 4. Test on Mammut
(1) Create a Hive table that uses a comma as the field delimiter
```sql
CREATE TABLE `doi.stdatax`(
  `sno` string COMMENT '',
  `sname` string COMMENT '',
  `ssex` string COMMENT '',
  `sbirthday` timestamp COMMENT '',
  `sclass` string COMMENT '')
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```
(2) Student data in MySQL 8, as shown in the figure below
(3) Configure the Mammut scheduling job
### dx.sh contents
```shell
python /mnt/dfs/0/datax/bin/datax.py ./dx.json
```
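Where the scheduler node prefers a Python entry point over a shell one-liner, the call can be wrapped as follows. This is only a sketch: `build_command` and `run_job` are hypothetical helpers, the install path is the one used in dx.sh, and it assumes datax.py exits with a non-zero code when the job fails.

```python
import subprocess
from pathlib import Path

DATAX_HOME = Path("/mnt/dfs/0/datax")  # install path taken from dx.sh


def build_command(job_file, datax_home=DATAX_HOME, python_exe="python"):
    """Return the argv list equivalent to the dx.sh one-liner."""
    return [python_exe, str(Path(datax_home) / "bin" / "datax.py"), str(job_file)]


def run_job(job_file):
    """Run DataX on job_file; raise if the file is missing or the run fails."""
    if not Path(job_file).is_file():
        raise FileNotFoundError(job_file)
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(build_command(job_file), check=True)


if __name__ == "__main__":
    run_job("./dx.json")
```

Checking for the job file first gives a clearer error than the one DataX prints when the path is wrong.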
### dx.json contents
```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 3
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "datax",
            "password": "Ab@123456",
            "column": [
              "sno",
              "sname",
              "ssex",
              "sbirthday",
              "sclass"
            ],
            "connection": [
              {
                "table": [
                  "student1"
                ],
                "jdbcUrl": [
                  "jdbc:mysql://59.111.211.47:3306/datax?characterEncoding=utf8"
                ]
              }
            ]
          }
        },
        "writer": {
          "name": "hdfswriter",
          "parameter": {
            "column": [
              {
                "name": "sno",
                "type": "string"
              },
              {
                "name": "sname",
                "type": "string"
              },
              {
                "name": "ssex",
                "type": "string"
              },
              {
                "name": "sbirthday",
                "type": "timestamp"
              },
              {
                "name": "sclass",
                "type": "string"
              }
            ],
            "defaultFS": "hdfs://easyops-cluster:8020",
            "hadoopConfig": {
              "dfs.nameservices": "easyops-cluster",
              "dfs.ha.namenodes.easyops-cluster": "nn1,nn2",
              "dfs.namenode.rpc-address.easyops-cluster.nn1": "bigdata-demo1.jdlt.163.org:8020",
              "dfs.namenode.rpc-address.easyops-cluster.nn2": "bigdata-demo2.jdlt.163.org:8020",
              "dfs.client.failover.proxy.provider.easyops-cluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
            },
            "haveKerberos": "true",
            "kerberosKeytabFilePath": "/mnt/dfs/0/datax/qianzhaoyuan.keytab",
            "kerberosPrincipal": "bdms_qianzhaoyuan/dev@BDMS.163.COM",
            "encoding": "UTF-8",
            "fileType": "text",
            "fileName": "test1",
            "path": "/user/hangkong/hive_db/doi.db/stdatax",
            "writeMode": "append",
            "fieldDelimiter": ","
          }
        }
      }
    ]
  }
}
```
Note: for details of the DataX configuration options, see the official repository:
https://github.com/alibaba/DataX
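A job file like dx.json is easy to get subtly wrong, because the reader and writer column lists are matched by position and the writer's delimiter has to agree with the Hive table definition. The following sketch checks those points before the job is submitted. It is an illustration only: `check_job` and `SAMPLE_JOB` are hypothetical, the sample is a trimmed-down copy of dx.json, and requiring identical column names is a convention of this example rather than a DataX rule.

```python
import json

# Trimmed-down copy of dx.json with only the fields the check reads.
SAMPLE_JOB = """
{"job": {"content": [{
  "reader": {"name": "mysqlreader", "parameter": {
    "column": ["sno", "sname", "ssex", "sbirthday", "sclass"]}},
  "writer": {"name": "hdfswriter", "parameter": {
    "column": [{"name": "sno", "type": "string"},
               {"name": "sname", "type": "string"},
               {"name": "ssex", "type": "string"},
               {"name": "sbirthday", "type": "timestamp"},
               {"name": "sclass", "type": "string"}],
    "fieldDelimiter": ","}}
}]}}
"""


def check_job(job, hive_delimiter=","):
    """Return a list of human-readable problems found in a DataX job dict."""
    problems = []
    for index, item in enumerate(job["job"]["content"]):
        reader_cols = item["reader"]["parameter"]["column"]
        writer_param = item["writer"]["parameter"]
        writer_names = [col["name"] for col in writer_param["column"]]
        if len(reader_cols) != len(writer_names):
            problems.append("content[%d]: %d reader columns vs %d writer columns"
                            % (index, len(reader_cols), len(writer_names)))
        elif reader_cols != writer_names:
            problems.append("content[%d]: column names or order differ" % index)
        if writer_param.get("fieldDelimiter") != hive_delimiter:
            problems.append("content[%d]: fieldDelimiter does not match the Hive table"
                            % index)
    return problems


if __name__ == "__main__":
    print(check_job(json.loads(SAMPLE_JOB)) or "no problems found")
```

To check the real file, load it with `json.load(open("dx.json"))` and pass the result to `check_job`; an empty list means the three checks passed.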
Author: qianzhaoyuan