Different Ways to Load Data into Hive Tables - 白红宇



Published: 2019-04-28



1. Specifying location when creating the table

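If the data is already stored on HDFS and is structured, you can point the table at it directly. The most common case is structured logs or computed results that are added incrementally each day. This kind of data needs almost no later maintenance; the backend job only has to run normally every day. In that case, specify the data directory with location when you create the table.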

create external table rpt_search_flow_experiment(
    date_num string,
    flowId string,
    pvlist int,
    uvlist int,
    pvdetail int,
    uvdetail int,
    uvall int,
    ord int,
    pvctr1 string comment "pvdetail/pvlist",
    pvctr2 string comment "ord/pvdetail",
    pvctr string comment "ord/pvlist",
    uvctr1 string comment "uvdetail/uvlist",
    uvctr2 string comment "ord/uvdetail",
    uvctr string comment "ord/uvlist",
    gross int comment "gross profit",
    net int comment "net profit",
    gross_div_uv float comment "gross profit / total uv",
    net_div_uv float comment "net profit / total uv"
)
partitioned by (day string)
row format delimited fields terminated by '\t'
location '/xxx/xxx/rpt_search_flow_experiment';
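Because the table is partitioned by day, creating it with a location does not by itself make each day's files queryable: every partition also has to be registered in the metastore. The statements below are a minimal sketch of two ways to do that. They assume the daily job writes each day's files into a subdirectory named day=<value> under the table location; that directory layout is an assumption, not something the original post states.

```sql
-- Register a single day's directory as a partition.
-- The day=20160101 subdirectory name is assumed for illustration.
alter table rpt_search_flow_experiment add if not exists
partition (day = '20160101')
location '/xxx/xxx/rpt_search_flow_experiment/day=20160101';

-- Alternatively, if every directory follows the day=<value> convention,
-- ask Hive to discover and register all missing partitions at once.
msck repair table rpt_search_flow_experiment;

-- Check which partitions the metastore now knows about.
show partitions rpt_search_flow_experiment;
```

The add partition form suits a daily job that knows exactly which day it just wrote, while msck repair table is convenient for backfilling many days in one step.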

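In this situation it is generally better to create an external table. With a managed (internal) table, dropping the table also deletes the data. For computed result sets that is tolerable, since they are usually not very large. For log files at today's data volumes, you can imagine the cost. So to be safe, create an external table.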
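To check which kind of table you actually have before running a drop, the following sketch may help. It uses only standard Hive statements; whether you want to convert an existing managed table is a judgment call that depends on your setup.

```sql
-- Inspect the table metadata and look for the "Table Type" row:
-- MANAGED_TABLE means a drop deletes the files,
-- EXTERNAL_TABLE means a drop removes only the metastore entry.
describe formatted rpt_search_flow_experiment;

-- If a table was created as managed by mistake, it can be switched
-- to external so that a later drop leaves the HDFS files in place.
alter table rpt_search_flow_experiment set tblproperties ('EXTERNAL' = 'TRUE');
```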

2. Loading data from the local filesystem or HDFS

Loading from the local filesystem:

hive> load data local inpath '/home/webopa/lei.wang/incubator/new_pv_uv/files/zzz'
    > into table rpt_search_flow_experiment partition(day = '20160101');
Copying data from file:/home/webopa/lei.wang/incubator/new_pv_uv/files/zzz
Copying file: file:/home/webopa/lei.wang/incubator/new_pv_uv/files/zzz
Loading data to table rpt.rpt_search_flow_experiment partition (day=20160101)
OK
Time taken: 0.663 seconds

Loading from HDFS:

This works the same way as loading from the local filesystem; just drop the local keyword.
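For completeness, here is a minimal sketch of the HDFS variant. The source path and the partition value are hypothetical and chosen only for illustration. One difference worth knowing: without local, Hive moves the file from its HDFS path into the table's directory rather than copying it, so the file no longer exists at the source path afterwards.

```sql
-- Load a file that already sits on HDFS (hypothetical path).
-- Without "local", the file is moved into the partition directory.
-- Write "overwrite into table" instead of "into table" to replace
-- the partition's existing files rather than add to them.
load data inpath '/tmp/new_pv_uv/zzz'
into table rpt_search_flow_experiment partition (day = '20160103');
```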

3. Loading data from a subquery

hive> insert overwrite table rpt.rpt_search_flow_experiment partition (day="20160102")
    > select date_num, flowid, pvlist, uvlist, pvdetail, uvdetail, uvall, ord, pvctr1, pvctr2, pvctr, uvctr1, uvctr2, uvctr, gross, net, gross_div_uv, net_div_uv from rpt.rpt_search_flow_experiment where day = "20160101";
...
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to: hdfs://mycluster/tmp/hive-webopa/hive_2016-04-27_14-43-59_493_1688413412217862660-1/-ext-10000
Loading data to table rpt.rpt_search_flow_experiment partition (day=20160102)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 2.69 sec   HDFS Read: 2606 HDFS Write: 2361 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 690 msec
OK
Time taken: 26.5 seconds

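Note that because the table is partitioned, you cannot simply use select * here. select * also returns the partition column day, so the query produces one more column than the 18 non-partition columns the target partition expects, and Hive reports a column mismatch error.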
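If listing every column is inconvenient, one alternative that the original post does not cover is a dynamic-partition insert, where the partition value is taken from the last column of the select instead of being fixed in the partition clause. The sketch below assumes a hypothetical source table rpt.rpt_search_flow_experiment_staging with the same columns and the same day partition column as the target; that table name is made up for illustration.

```sql
-- Allow partition values to come from the query result.
set hive.exec.dynamic.partition = true;
set hive.exec.dynamic.partition.mode = nonstrict;

-- "partition (day)" has no fixed value, so Hive reads it from the
-- last column of the select. select * works here because the
-- partition column day comes last in the staging table as well.
insert overwrite table rpt.rpt_search_flow_experiment partition (day)
select * from rpt.rpt_search_flow_experiment_staging
where day = '20160101';
```

With a static partition such as partition (day="20160102"), the explicit column list shown earlier is still required.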

Reposted from: http://sublf.baihongyu.com/
