FAQ - Sqoop: duplicate rows when using --query with multiple mappers

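When Sqoop imports data in query mode (--query), the imported data can contain duplicate rows if the query's filter condition contains OR and more than one mapper is configured.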

Assume:

The user's SQL is:

SELECT * FROM DEMO T WHERE ID = 1 OR ID < 100;

Split key (--split-by): ID

Number of mappers (-m): N

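Sqoop first computes the bounds of the split key over the result of the user's query:

MAX = SELECT MAX(ID) FROM (SELECT * FROM DEMO T WHERE ID = 1 OR ID < 100) AS t1

MIN = SELECT MIN(ID) FROM (SELECT * FROM DEMO T WHERE ID = 1 OR ID < 100) AS t1

Step size: S = (MAX - MIN) / N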
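The step-size arithmetic can be sketched in Python. This is a simplified illustration, not Sqoop's actual splitter: the helper name split_predicates is hypothetical, and it floors each boundary MIN + i*S to an integer so that exactly N ranges are produced, with the last range closed at MAX.

```python
def split_predicates(col: str, min_val: int, max_val: int, n: int) -> list[str]:
    """Hypothetical helper: divide [min_val, max_val] into n ranges of width
    about S = (max_val - min_val) / n and return one WHERE fragment per range."""
    preds = []
    for i in range(n):
        # Boundaries min_val + i*S, floored so the bounds stay integral.
        lo = min_val + (max_val - min_val) * i // n
        hi = min_val + (max_val - min_val) * (i + 1) // n
        if i < n - 1:
            preds.append(f"{col} >= {lo} AND {col} < {hi}")
        else:
            # The last range is closed so that MAX itself is included.
            preds.append(f"{col} >= {lo} AND {col} <= {hi}")
    return preds


# MIN = 1 and MAX = 99 for the example query on a table with IDs 1..200, N = 4.
for p in split_predicates("ID", 1, 99, 4):
    print(p)
# ID >= 1 AND ID < 25
# ID >= 25 AND ID < 50
# ID >= 50 AND ID < 74
# ID >= 74 AND ID <= 99
```

The ranges are disjoint and together cover [MIN, MAX], so on their own they can never produce duplicates; any duplication has to come from how each predicate is combined with the user's condition.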

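Sqoop substitutes each mapper's split-range predicate for the $CONDITIONS token that follows the user's condition. The query finally generated for the first mapper is:

SELECT * FROM DEMO T WHERE ID = 1 OR ID < 100 AND ID >= MIN AND ID < MIN + S

Because AND binds more tightly than OR, this is evaluated as ID = 1 OR (ID < 100 AND ID >= MIN AND ID < MIN + S). The row with ID = 1 therefore matches every mapper's query regardless of its split range, so it is imported N times.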
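The duplication can be reproduced with a small Python simulation against an in-memory SQLite table. This is a hypothetical sketch, not Sqoop itself: simulate_import computes MIN/MAX with a bounding query, splits the range into N pieces, and splices each split predicate into the query in place of $CONDITIONS by plain string replacement (assumed here to match Sqoop's behaviour closely enough). The DEMO table holds IDs 1 to 200 and four mappers are used. The second query shows the usual fix: wrap the user's condition in parentheses so that the split predicate constrains all of it.

```python
import sqlite3


def simulate_import(conn, query: str, split_col: str, n: int) -> list[tuple]:
    """Hypothetical simulation of a split-by import: compute MIN/MAX over the
    query, then run it once per split with $CONDITIONS replaced."""
    # Bounding query: the user query with $CONDITIONS neutralised to (1 = 1).
    bound_sql = (f"SELECT MIN({split_col}), MAX({split_col}) FROM ("
                 + query.replace("$CONDITIONS", "(1 = 1)") + ") AS t1")
    lo, hi = conn.execute(bound_sql).fetchone()
    rows = []
    for i in range(n):
        a = lo + (hi - lo) * i // n
        b = lo + (hi - lo) * (i + 1) // n
        upper = "<" if i < n - 1 else "<="  # last split includes MAX
        cond = f"{split_col} >= {a} AND {split_col} {upper} {b}"
        # Textual substitution: nothing groups the user's OR condition.
        rows += conn.execute(query.replace("$CONDITIONS", cond)).fetchall()
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DEMO (ID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO DEMO (ID) VALUES (?)", [(i,) for i in range(1, 201)])

bad = "SELECT * FROM DEMO T WHERE ID = 1 OR ID < 100 AND $CONDITIONS"
good = "SELECT * FROM DEMO T WHERE (ID = 1 OR ID < 100) AND $CONDITIONS"

bad_rows = simulate_import(conn, bad, "ID", 4)
good_rows = simulate_import(conn, good, "ID", 4)
print(len(bad_rows), len(set(bad_rows)))    # 102 99
print(len(good_rows), len(set(good_rows)))  # 99 99
```

With the unparenthesized query the row ID = 1 comes back once per mapper (4 times here), giving 102 rows for 99 distinct IDs; with the parentheses each row is returned exactly once.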