如何在hive分区表插入数据
手动在hive表插入数据时,需特别注意两点: 1、hive表不可以指定字段插入 2、如果hive表是分区表,插入数据时,需指定分区
下面以分区表为例,插入数据: 查看表结构
hive> show create table acct_wt_user_avg_stock_hold_days; OK CREATE EXTERNAL TABLE `acct_wt_user_avg_stock_hold_days`( `init_date` string COMMENT ????, `fund_account` string COMMENT ????, `avg_hold_days` decimal(19,0) COMMENT ????????) PARTITIONED BY ( `part_init_date` string, `interval_type` string) CLUSTERED BY ( fund_account) SORTED BY ( fund_account ASC) INTO 100 BUCKETS ROW FORMAT SERDE org.apache.hadoop.hive.ql.io.orc.OrcSerde WITH SERDEPROPERTIES ( colelction.delim=u0002, field.delim=u0001, line.delim= , mapkey.delim=u0003, serialization.format=u0001) STORED AS INPUTFORMAT org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat LOCATION hdfs://xxx.xxx.xxx.xxx:8020/user/hive/warehouse/hs_wt/acct_wt_user_avg_stock_hold_days TBLPROPERTIES ( transient_lastDdlTime=1533279332) Time taken: 0.036 seconds, Fetched: 28 row(s)
插入数据
insert into acct_wt_user_avg_stock_hold_days partition (part_init_date=20190425,interval_type=1) select 20190425,4567890,30 from acct_wt_user_hold_stock_earnging_ratio limit 1;
查看结果:
hive> select * from acct_wt_user_avg_stock_hold_days where part_init_date = 20190425 and interval_type = 1; OK Time taken: 0.156 seconds
没有数据,考虑了一下,可能是上述表acct_wt_user_hold_stock_earnging_ratio中无数据。 查看表acct_wt_user_hold_stock_earnging_ratio是否有数据
hive> select * from acct_wt_user_hold_stock_earnging_ratio limit 1; OK Time taken: 0.069 seconds
确实无数据。 找一张存在数据的表,重新执行上述命令:
hive> select 1 from ods_origindb.hs_ods_secumain limit 1; OK 1 Time taken: 0.04 seconds, Fetched: 1 row(s)
替换表名,重新执行:
insert into acct_wt_user_avg_stock_hold_days partition (part_init_date=20190425,interval_type=1) select 20190425,4567890,30 from ods_origindb.hs_ods_secumain limit 1;
查验数据是否插入:
hive> select * from ztx1.acct_wt_user_avg_stock_hold_days where part_init_date = 20190425 and interval_type = 1; OK 20190425 4567890 30 20190425 1 Time taken: 0.074 seconds, Fetched: 1 row(s)
数据插入成功;
注意:插入语句中,采用了select … from table…limit … 的语句,后面的 limit 只是为了限定插入条数;如果 table 表里有 100w数据,没有什么冲突的话,又不加限制,可能会插入 10w条数据; 如果 table 表中 没有数据,那么 select 结果为空,数据插入为空;