在此之前,已经完成了Ubuntu16.04环境下安装配置Hadoop2.8.1集群、Ubuntu16.04环境下安装配置HBase2.1.0集群和Ubuntu16.04环境下安装配置Hive2.3.3。
- 系统:Ubuntu
- 版本:16.04
- 处理器:Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
- 内存:2.0GB
- 类型:64位操作系统 64位处理器
- 版本:Hadoop-2.8.1、HBase-2.1.0
- Master:192.168.73.130
- Slave1:192.168.73.132
- Slave2:192.168.73.133
查看版本信息
root@Master:/usr/local/hive/bin# ./hive --version
Hive 2.3.3
Git git://daijymacpro-2.local/Users/daijy/commit/hive -r 8a511e3f79b43d4be41cd231cf5c99e43b248383
Compiled by daijy on Wed Mar 28 16:58:33 PDT 2018
From source with checksum 8873bba6c55a058614e74c0e628ab022输入命令 /usr/local/hive/bin/hive 进入Hive的shell控制台。
root@Master:/usr/local/hive/bin# ./hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-2.3.3.jar!/hive-log4j2.properties Async: true显示数据库:
hive> show databases;
OK
default
Time taken: 3.863 seconds, Fetched: 1 row(s)创建数据库:
hive> create database test;
OK
Time taken: 0.172 seconds查看数据库描述
hive> desc database test;
OK
test hdfs://Master:9000/user/hive/warehouse/test.db root USER
Time taken: 0.015 seconds, Fetched: 1 row(s)切换当前数据库
hive> use test;
OK
Time taken: 0.015 seconds在test数据库下创建表tb1:
hive> create table test.tb1(
> id int,
> name string,
> sex tinyint);
OK
Time taken: 0.52 seconds查看当前数据库中的表:
hive> show tables;
OK
tb1
Time taken: 0.033 seconds, Fetched: 1 row(s)查看指定数据库中的表:
hive> show tables in default;
OK
Time taken: 0.027 seconds查看表结构:
hive> desc test.tb1;
OK
id int
name string
sex tinyint
Time taken: 0.228 seconds, Fetched: 3 row(s)显示插入数据:
hive> insert into tb1(id, name, sex) values(1, 'hmj', 1);
Query ID = root_20180824192203_e0fda5a8-43e0-4220-8638-1f87cebff71f
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1535097816443_0001, Tracking URL = http://Master:8088/proxy/application_1535097816443_0001/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1535097816443_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-08-24 19:22:14,230 Stage-1 map = 0%, reduce = 0%
2018-08-24 19:22:21,797 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.02 sec
MapReduce Total cumulative CPU time: 1 seconds 20 msec
Ended Job = job_1535097816443_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://Master:9000/user/hive/warehouse/test.db/tb1/.hive-staging_hive_2018-08-24_19-22-03_954_561109500998164964-1/-ext-10000
Loading data to table test.tb1
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.02 sec HDFS Read: 4476 HDFS Write: 72 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 20 msec
OK
Time taken: 20.546 seconds隐式插入数据:
hive> insert into tb1 values(2, 'Leo', 2);
Query ID = root_20180824192339_fa27b70d-4869-4698-8ba3-cb7e4e5c2840
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1535097816443_0002, Tracking URL = http://Master:8088/proxy/application_1535097816443_0002/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1535097816443_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-08-24 19:23:48,843 Stage-1 map = 0%, reduce = 0%
2018-08-24 19:23:54,250 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.97 sec
MapReduce Total cumulative CPU time: 970 msec
Ended Job = job_1535097816443_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://Master:9000/user/hive/warehouse/test.db/tb1/.hive-staging_hive_2018-08-24_19-23-39_116_2386772226148603446-1/-ext-10000
Loading data to table test.tb1
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 0.97 sec HDFS Read: 4481 HDFS Write: 72 SUCCESS
Total MapReduce CPU Time Spent: 970 msec
OK
Time taken: 17.759 seconds插入多条数据
hive> insert into tb1 values(3, 'ccc', 1),(4, 'ddd', 1),(5, 'eee', 2);
Query ID = root_20180824192530_2ba35a13-89f7-4e04-8a5a-513316bb458e
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1535097816443_0003, Tracking URL = http://Master:8088/proxy/application_1535097816443_0003/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1535097816443_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-08-24 19:25:39,603 Stage-1 map = 0%, reduce = 0%
2018-08-24 19:25:44,989 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.06 sec
MapReduce Total cumulative CPU time: 1 seconds 60 msec
Ended Job = job_1535097816443_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://Master:9000/user/hive/warehouse/test.db/tb1/.hive-staging_hive_2018-08-24_19-25-30_045_8743069087074347498-1/-ext-10000
Loading data to table test.tb1
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.06 sec HDFS Read: 4499 HDFS Write: 88 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 60 msec
OK
Time taken: 17.548 seconds可见,Hive每次向表中插入一条记录,都会转换为MapReduce程序,效率低下。
字段查询:
hive> select id, name from tb1;
OK
1 hmj
2 Leo
3 ccc
4 ddd
5 eee
Time taken: 0.166 seconds, Fetched: 5 row(s)条件查询:
hive> select * from tb1 where sex=2;
OK
2 Leo 2
5 eee 2
Time taken: 0.369 seconds, Fetched: 2 row(s)统计查询:
hive> select max(id) from tb1;
Query ID = root_20180824193115_37f65de2-b102-4d16-b38a-f62c2f8dff65
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1535097816443_0004, Tracking URL = http://Master:8088/proxy/application_1535097816443_0004/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1535097816443_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-08-24 19:31:23,079 Stage-1 map = 0%, reduce = 0%
2018-08-24 19:31:28,269 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.67 sec
2018-08-24 19:31:34,593 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.56 sec
MapReduce Total cumulative CPU time: 1 seconds 560 msec
Ended Job = job_1535097816443_0004
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 1.56 sec HDFS Read: 8195 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 560 msec
OK
5
Time taken: 20.314 seconds, Fetched: 1 row(s)说明
Hive简单查询不启用MapReduce Job 而启用Fetch task,在YARN Applications看不到ApplicationID。
启用MapReduce Job是会消耗系统开销的。对于这个问题,从Hive0.10.0版本开始,对于简单的不需要聚合的类似SELECT from
LIMIT n语句,不需要启动MapReduce job,直接通过Fetch task获取数据。