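Suppose the data is already stored on HDFS and is structured. The most common case is structured logs or computed results that grow by one increment per day. Data like this needs almost no maintenance afterwards: the backend job only has to keep running normally every day. In that case, you can point the table at the data with `location` when you create it: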
```sql
create external table rpt_search_flow_experiment (
    date_num     string,
    flowId       string,
    pvlist       int,
    uvlist       int,
    pvdetail     int,
    uvdetail     int,
    uvall        int,
    ord          int,
    pvctr1       string comment "pvdetail/pvlist",
    pvctr2       string comment "ord/pvdetail",
    pvctr        string comment "ord/pvlist",
    uvctr1       string comment "uvdetail/uvlist",
    uvctr2       string comment "ord/uvdetail",
    uvctr        string comment "ord/uvlist",
    gross        int    comment "gross profit",
    net          int    comment "net profit",
    gross_div_uv float  comment "gross profit / total uv",
    net_div_uv   float  comment "net profit / total uv"
)
partitioned by (day string)
row format delimited fields terminated by '\t'
location '/xxx/xxx/rpt_search_flow_experiment';
```
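The table above is partitioned by `day`, so pointing `location` at the parent directory does not make the daily data visible by itself. Each partition also has to be registered in the metastore. The sketch below shows two ways to do that. It assumes the daily files sit in `day=YYYYMMDD` subdirectories under the table location. That layout is hypothetical, so adjust the paths to match where the backend job actually writes.

```sql
-- Hypothetical layout: /xxx/xxx/rpt_search_flow_experiment/day=20160101/...
-- Register one existing directory as partition day=20160101.
alter table rpt_search_flow_experiment add if not exists
partition (day = '20160101')
location '/xxx/xxx/rpt_search_flow_experiment/day=20160101';

-- Or let Hive scan the table location and add every day=... directory
-- it finds that is not yet recorded in the metastore.
msck repair table rpt_search_flow_experiment;
```

`msck repair table` only discovers directories named in the `key=value` form. The explicit `alter table ... add partition` works for any path.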
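In this situation it is generally better to create an external table, because dropping an internal (managed) table deletes its data as well. That matters less for computed result sets, which are usually not very large. Log files, however, tend to be enormous these days. To be on the safe side, create an external table.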
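The following is a small sketch for checking which kind of table you actually ended up with, and for fixing it if a managed table was created by mistake. It uses the table from the example above.

```sql
-- The "Table Type" row reads EXTERNAL_TABLE for an external table
-- and MANAGED_TABLE for an internal (managed) one.
describe formatted rpt_search_flow_experiment;

-- Turn a managed table into an external one, so that a later drop
-- removes only the metadata and leaves the files on HDFS.
-- Older Hive versions treat the key and value case-sensitively, so keep them uppercase.
alter table rpt_search_flow_experiment set tblproperties ('EXTERNAL' = 'TRUE');
```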
To load a file from the local file system, use `load data local inpath`:

```
hive> load data local inpath '/home/webopa/lei.wang/incubator/new_pv_uv/files/zzz'
    > into table rpt_search_flow_experiment partition(day = '20160101');
Copying data from file:/home/webopa/lei.wang/incubator/new_pv_uv/files/zzz
Copying file: file:/home/webopa/lei.wang/incubator/new_pv_uv/files/zzz
Loading data to table rpt.rpt_search_flow_experiment partition (day=20160101)
OK
Time taken: 0.663 seconds
```
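`load data ... into table` adds the file to whatever the partition already contains. The sketch below shows the `overwrite` form, which replaces the partition's existing contents instead, followed by a quick check of which partitions Hive now knows about. The path and partition are the ones from the session above.

```sql
-- Replace the current contents of partition day=20160101 instead of
-- adding another copy of the file next to the existing one.
load data local inpath '/home/webopa/lei.wang/incubator/new_pv_uv/files/zzz'
overwrite into table rpt.rpt_search_flow_experiment partition (day = '20160101');

-- List the partitions registered for the table.
show partitions rpt.rpt_search_flow_experiment;
```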
Loading from HDFS works the same way: just drop the `local` keyword.
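For completeness, here is a sketch of the HDFS variant. The source path `/tmp/new_pv_uv/zzz` and partition `day = '20160103'` are hypothetical. One behavioural difference is worth knowing. Without `local`, Hive moves the file into the partition directory rather than copying it, so the file disappears from its original location.

```sql
-- Hypothetical HDFS source path; substitute the directory your job writes to.
-- Without LOCAL, the file is moved (not copied) into the partition directory,
-- so /tmp/new_pv_uv/zzz no longer exists after this statement succeeds.
load data inpath '/tmp/new_pv_uv/zzz'
into table rpt.rpt_search_flow_experiment partition (day = '20160103');
```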
To fill a partition from the result of a query, use `insert overwrite table ... select`. Here partition day=20160101 is copied into partition day=20160102:

```
hive> insert overwrite table rpt.rpt_search_flow_experiment partition (day="20160102")
    > select date_num, flowid, pvlist, uvlist, pvdetail, uvdetail, uvall, ord,
    >        pvctr1, pvctr2, pvctr, uvctr1, uvctr2, uvctr, gross, net, gross_div_uv, net_div_uv
    > from rpt.rpt_search_flow_experiment where day = "20160101";
...
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to: hdfs://mycluster/tmp/hive-webopa/hive_2016-04-27_14-43-59_493_1688413412217862660-1/-ext-10000
Loading data to table rpt.rpt_search_flow_experiment partition (day=20160102)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 2.69 sec   HDFS Read: 2606 HDFS Write: 2361 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 690 msec
OK
Time taken: 26.5 seconds
```
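Note that because the table is partitioned, you cannot simply write `select *` here. It also returns the partition column `day`, so the query yields one more column than the target partition expects, and Hive reports a column mismatch error.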
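The sketch below contrasts the failing form with one way to avoid typing out all 18 data columns. It relies on Hive's regex column specification. On Hive 0.13 and later, that feature only takes effect once `hive.support.quoted.identifiers` is set to `none`. Verify that setting against your Hive version before relying on it.

```sql
-- Fails: select * returns the 18 data columns plus the partition column day,
-- but partition (day = '20160102') expects exactly the 18 data columns.
insert overwrite table rpt.rpt_search_flow_experiment partition (day = '20160102')
select * from rpt.rpt_search_flow_experiment where day = '20160101';

-- Works once quoted identifiers are disabled: the backquoted regex
-- matches every column name except day.
set hive.support.quoted.identifiers=none;
insert overwrite table rpt.rpt_search_flow_experiment partition (day = '20160102')
select `(day)?+.+` from rpt.rpt_search_flow_experiment where day = '20160101';
```

Listing the columns explicitly, as in the session above, remains the most portable option. The regex form saves typing on wide tables, but it silently picks up any column added later.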
Reposted from: http://sublf.baihongyu.com/