hive创建一个简单外联表

hive 创建一个简单外联表

hdfs 创建目录

首先需要在hdfs上创建一个目录,用于存放hive表得数据。

1
hdfs dfs -mkdir /tmp/yarn-count-application

hive 创建表

这时候可以再hive上创建外联表了。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
create external table op_count_yarn(
dt string,
num int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' location '/tmp/yarn-count-application';

or

create external table op_yarn_job(
jobid string,
jobuser string,
queue string,
submittime string,
starttime string,
finishtime string,
elapsedtime bigint,
totalmaps bigint,
totalreduces bigint,
finishmaps bigint,
finishreduces bigint,
failedmaps bigint,
failedreducess bigint,
hdfsread bigint,
hdfswrite bigint,
vcoremaps bigint,
vcorereduces bigint,
cpuseconds bigint,
physicalmem bigint,
virtualmem bigint,
sql_urlencode string
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' location '/home/hadoop/yarn-job-application';

location 指向 hdfs 上创建的目录就可以了

产生数据

把你需要的数据放在hdfs得目录下

1
hdfs dfs -put hour_all_count.txt /tmp/yarn-count-application/

查询数据

这时候就可以用hive还查询数据量

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
hive> select * from op_count_yarn;
OK
2018-12-19-19 707
2018-12-19-20 818
2018-12-19-21 801
2018-12-19-22 1027
2018-12-19-23 806
2018-12-20-00 859
2018-12-20-01 629
2018-12-20-02 1058
2018-12-20-03 1190
2018-12-20-04 878
2018-12-20-05 889
2018-12-20-06 961
2018-12-20-07 945
2018-12-20-08 927
2018-12-20-09 919
Time taken: 1.063 seconds, Fetched: 15 row(s)
1
2
3
4
5
6
7
8
9
10
11
12
13
hive> select sum(num) from op_count_yarn where dt like "2018-12-20-%";
......
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-12-20 10:42:31,362 Stage-1 map = 0%, reduce = 0%
2018-12-20 10:42:32,432 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.75 sec
MapReduce Total cumulative CPU time: 4 seconds 750 msec
Ended Job = job_1534758460905_2857417
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.75 sec HDFS Read: 13354 HDFS Write: 594837 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 750 msec
OK
9255
Time taken: 13.764 seconds, Fetched: 1 row(s)

查看分区

1
show partitions  op_yarn_job;

添加分区

1
alter table default.op_yarn_job ADD if NOT exists partition(dt='$YY-$MM-$DD');

删除分区

1
2
3
4
5
ALTER TABLE  table_name DROP PARTITION (day='20140722');

or

alter table op_yarn_job drop if exists partition(dt='2018-12-23');

查看是内部表还是外部表

1
2
3
4
5
describe extended tablename;

or

desc formatted tablename;
感谢您的支持!