INFO - Cross-cluster distcp
Updated: 2024-03-11 02:47:50
Applicable modules
Offline Development, Shell
Description
A usage example of cross-cluster HDFS-to-HDFS file transfer
I. Both clusters require Kerberos authentication
1. Environment preparation:
a. The source cluster's Hadoop client is already deployed on the execution node
b. If both clusters have Kerberos enabled, cross-realm authentication must be completed first; see: https://study.sf.163.com/documents/read/service_support/date20221215101116.md
c. Make sure the execution node on the destination side can reach the source HDFS (a quick connectivity check is sketched after this list)
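As a quick sanity check for item c, the sketch below (assuming the same source client path as the script that follows, and that Kerberos authentication has already been completed) simply lists the root of the source HDFS; <source-active-nn> is a placeholder for the source cluster's active NameNode hostname.
# Hedged connectivity check: run after Kerberos authentication (see the script below for the kinit step)
source /usr/easyops/hdfs/dev_hdfs_client/config/hadoop-env.sh
# Replace <source-active-nn> with the source cluster's active NameNode hostname
/usr/easyops/hdfs/dev_hdfs_client/current/bin/hdfs dfs -ls hdfs://<source-active-nn>:8020/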
2. Shell example
#!/bin/bash
#Get the principal from the keytab; it varies with the hostname of the node the task is submitted from
Principal=`klist -kt /etc/security/keytabs/hdfs/hdfs.keytab | awk 'NR==4 {print $NF}'`
#Authenticate as hdfs; only the hdfs user is allowed to run "hdfs haadmin -getAllServiceState"
kinit -kt /etc/security/keytabs/hdfs/hdfs.keytab ${Principal}
#Initialize the source-cluster environment and find its active NameNode
source /usr/easyops/hdfs/dev_hdfs_client/config/hadoop-env.sh
source_nn=`/usr/easyops/hdfs/dev_hdfs_client/current/bin/hdfs haadmin -getAllServiceState | awk '/active/{print $1}' | cut -d: -f1`
#Initialize the destination-cluster environment and find its active NameNode
source /usr/easyops/hdfs/default_hdfs_client/config/hadoop-env.sh
target_nn=`/usr/easyops/hdfs/default_hdfs_client/current/bin/hdfs haadmin -getAllServiceState | awk '/active/{print $1}' | cut -d: -f1`
#Print the active NameNode hostname of each cluster
echo "Source active NameNode: ${source_nn}"
echo "Destination active NameNode: ${target_nn}"
#Copy files across clusters (overwrite existing files at the destination)
/usr/easyops/yarn/default_yarn_client/current/bin/hadoop --config /usr/easyops/yarn/default_yarn_client/config/ jar /usr/easyops/yarn/package_shared/hadoop-2.9.2-1.4.0/share/hadoop/tools/lib/hadoop-distcp-2.9.2.jar -Dmapreduce.task.timeout=1200000 -overwrite -bandwidth 20 -m 30 hdfs://${source_nn}:8020/user/dsc_support/easysubmit_102.db hdfs://${target_nn}:8020/user/mammut_user/distcp
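If the same copy is scheduled repeatedly, a hedged variant of the command above (same jar, paths, and variables assumed) can use -update instead of -overwrite so that only new or changed files are transferred:
# Incremental variant (sketch): -update skips files already present at the destination with the same size/checksum
/usr/easyops/yarn/default_yarn_client/current/bin/hadoop --config /usr/easyops/yarn/default_yarn_client/config/ jar /usr/easyops/yarn/package_shared/hadoop-2.9.2-1.4.0/share/hadoop/tools/lib/hadoop-distcp-2.9.2.jar -Dmapreduce.task.timeout=1200000 -update -bandwidth 20 -m 30 hdfs://${source_nn}:8020/user/dsc_support/easysubmit_102.db hdfs://${target_nn}:8020/user/mammut_user/distcp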
II. Only one cluster has Kerberos authentication
One cluster has Kerberos authentication enabled; the other cluster has none.
First copy /etc/krb5.conf into /etc on the NameNode server of the non-Kerberos cluster.
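A minimal sketch of that copy (the hostname below is a placeholder for the non-Kerberos cluster's NameNode server):
# Hedged sketch: push the Kerberos client config to the non-Kerberos cluster's NameNode
scp /etc/krb5.conf root@<insecure-cluster-nn>:/etc/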
The steps below are performed on the Kerberos-enabled cluster.
1. Switch to the hdfs user on the NameNode of the Kerberos-enabled cluster
su - hdfs
2. Load the environment variables
source /usr/ndp/bdms/hdfs/20200929091617654cac872f/namenode/202009290921296616412075/current/conf/hadoop-env.sh
3. Copy yarn-site.xml into the conf directory of the HDFS deployment
4. Kerberos authentication
kinit -kt /usr/ndp/bdms/hdfs/20200929091617654cac872f/namenode/202009290921296616412075/keytab/hdfs.keytab hdfs/jointown2@BDMS.163.COM
5. Transfer the file
/usr/ndp/bdms/hdfs/20200929091617654cac872f/namenode/202009290921296616412075/current/bin/hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://10.4.14.28:8020/user/hive/foo_1119/foo_1119.txt /testfile/
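Before launching the transfer, it can help to confirm that the Kerberos-enabled cluster can actually read the non-Kerberos one. The sketch below (using the same hadoop binary and source path as step 5) only lists the source directory:
# Hedged pre-check: fallback-to-simple lets the Kerberos client talk to the cluster that has no Kerberos enabled
/usr/ndp/bdms/hdfs/20200929091617654cac872f/namenode/202009290921296616412075/current/bin/hadoop fs -D ipc.client.fallback-to-simple-auth-allowed=true -ls hdfs://10.4.14.28:8020/user/hive/foo_1119/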
III. Cross-cluster on the same platform (same Kerberos realm)
##### Code example
#!/bin/bash
# If this script runs as a script node on the 有数中台 platform, Kerberos authentication can be skipped; otherwise complete Kerberos authentication with kinit as below
# poc.keytab must be in the same directory as this script; replace it with your own project's keytab file
# Replace poc/dev with the principal that corresponds to your project's keytab
# Kerberos authentication
kinit -kt poc.keytab poc/dev
# Hostnames of the target cluster's NameNodes
namenodes='hadoop1232.hz.163.org hadoop1233.hz.163.org'
for nn in ${namenodes}
do
    # "hadoop dfs -test -e" checks whether a path exists; if the node queried is the standby NameNode,
    # it returns "test: Operation category READ is not supported in state standby"
    # If "hadoop dfs -test" fails with "/usr/ndp/current/mapreduce_client/libexec/hdfs-config.sh: No such file or directory"
    # (a client configuration issue), use "hdfs dfs -test" instead
    hadoop dfs -test -e hdfs://${nn}:8020/
    if [ $? -eq 0 ];then
        activenamenode=${nn}
        break
    fi
done
if [ -z "${activenamenode}" ];then
    echo "Can not find current active namenode, please check logs."
    exit -1
else
    echo "Active namenode is ${activenamenode}, distcp start."
    # Distcp command
    # hdfs://hz-cluster10 is the HDFS nameservice (logical name) of the current cluster; replace it with your own cluster's nameservice
    hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://hz-cluster10/user/mammut/wangsong/dataroot/partition_demo/seq=a hdfs://${activenamenode}:8020/user/mammut/wangsong
    if [ $? -eq 0 ];then
        echo "Job success"
    else
        echo "Job failed"
        exit -1
    fi
fi
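As an optional follow-up, a hedged sketch (reusing ${activenamenode} from the script above, and assuming the destination directory already existed so the copy lands under .../wangsong/seq=a) compares object counts and bytes on both sides; these lines could be appended after a successful distcp:
# Post-copy spot check (sketch): "hadoop fs -count" prints DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATH;
# the numbers should match between source and destination
hadoop fs -count hdfs://hz-cluster10/user/mammut/wangsong/dataroot/partition_demo/seq=a
hadoop fs -count hdfs://${activenamenode}:8020/user/mammut/wangsong/seq=a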
Author: 林帅