HIVE插入数据数据少时可以数据大时失败

tez为hive引擎,执行sql(7G表到orc动态分组表 30多个分区):

insert overwrite table dmtest.business_licence_SOFT_INFOE_s partition(AREA_CODE) select enterprise_id,revoke_flag,REGISTER_TIME,update_time, AREA_CODE from dmtest.business_licence_SOFT_INFOE_H ;

动态分区插入

Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"enterprise_id":"34000004046964","area_code":"340000","register_time":"2019-10-06 00:00:00","revoke_flag":"1","update_time":"2020-07-03"} at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:95) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"enterprise_id":"34000004046964","area_code":"340000","register_time":"2019-10-06 00:00:00","revoke_flag":"1","update_time":"2020-07-03"} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86) ... 17 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hive/staging/.hive-staging_hive_2020-07-03_15-55-00_467_6569156417071306755-1/_task_tmp.-ext-10000/area_code=340000/_tmp.000009_3 could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and no node(s) are excluded in this operation.

在执行

insert overwrite table dmtest.business_licence_SOFT_INFOE_s partition(AREA_CODE) select enterprise_id,revoke_flag,REGISTER_TIME,update_time, AREA_CODE from dmtest.business_licence_SOFT_INFOE_H limit 1000;

时无错

原因,由于map数量多(58个)导入orc文件时 由于orc进行压缩,压缩过后小文件太多(30分区*58),崩溃了

解决办法:

tez引擎: SET tez.grouping.min-size=1024000000; -- 最小1024M,合并map端小文件 SET tez.grouping.max-size=2048000000; -- 最大2048M,可增加处理Map数量 可有可无 mr引擎设置缩小map数量即可

设置之后由 58个map到 9个map

执行成功

之后再设置点 合并小文件

经验分享 程序员 微信小程序 职场和发展