Hive _偏门常用查询函数(二)附带实例(列转行、窗口函数)
接上篇博客:
Hive _偏門(mén)常用查詢(xún)函數(shù)(一)附帶實(shí)例
https://blog.csdn.net/qq_41946557/article/details/102904642
列轉(zhuǎn)行
1.函數(shù)說(shuō)明
EXPLODE(col):將hive一列中復(fù)雜的array或者map結(jié)構(gòu)拆分成多行。
LATERAL VIEW
用法:LATERAL VIEW udtf(expression) tableAlias AS columnAlias
解釋:用于和split, explode等UDTF一起使用,它能夠?qū)⒁涣袛?shù)據(jù)拆成多行數(shù)據(jù),在此基礎(chǔ)上可以對(duì)拆分后的數(shù)據(jù)進(jìn)行聚合。
2.數(shù)據(jù)準(zhǔn)備
《疑犯追蹤》 懸疑,動(dòng)作,科幻,劇情 《Lie to me》 懸疑,警匪,動(dòng)作,心理,劇情 《戰(zhàn)狼2》 戰(zhàn)爭(zhēng),動(dòng)作,災(zāi)難建表:
create table movie_info(movie string, category array<string>) row format delimited fields terminated by "\t" collection items terminated by ","; load data local inpath "/root/movie" into table movie_info;?需求:將電影分類(lèi)中的數(shù)組數(shù)據(jù)展開(kāi)。
selectmovie,category_name from movie_info lateral view explode(category) table_tmp as category_name;結(jié)果展示:
窗口函數(shù)
1.相關(guān)函數(shù)說(shuō)明
OVER():指定分析函數(shù)工作的數(shù)據(jù)窗口大小,這個(gè)數(shù)據(jù)窗口大小可能會(huì)隨著行的變而變化
CURRENT ROW:當(dāng)前行
n?PRECEDING:往前n行數(shù)據(jù)
n?FOLLOWING:往后n行數(shù)據(jù)
UNBOUNDED:起點(diǎn),UNBOUNDED PRECEDING 表示從前面的起點(diǎn), UNBOUNDED?FOLLOWING表示到后面的終點(diǎn)
LAG(col,n):往前第n行數(shù)據(jù)
LEAD(col,n):往后第n行數(shù)據(jù)
NTILE(n):把有序分區(qū)中的行分發(fā)到指定數(shù)據(jù)的組中,各個(gè)組有編號(hào),編號(hào)從1開(kāi)始,對(duì)于每一行,NTILE返回此行所屬的組的編號(hào)。注意:n必須為int類(lèi)型。
2.數(shù)據(jù)準(zhǔn)備:name,orderdate,cost
jack,2017-01-01,10 tony,2017-01-02,15 jack,2017-02-03,23 tony,2017-01-04,29 jack,2017-01-05,46 jack,2017-04-06,42 tony,2017-01-07,50 jack,2017-01-08,55 mart,2017-04-08,62 mart,2017-04-09,68 neil,2017-05-10,12 mart,2017-04-11,75 neil,2017-06-12,80 mart,2017-04-13,94建表:
create table business( name string, orderdate string, cost int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; load data local inpath "/root/business" into table business;需求:
- 查詢(xún)?cè)?017年4月份購(gòu)買(mǎi)過(guò)的顧客及總?cè)藬?shù)
結(jié)果展示:
- 查詢(xún)顧客的購(gòu)買(mǎi)明細(xì)及月購(gòu)買(mǎi)總額
結(jié)果展示:
+-------+-------------+-------+---------------+--+ | name | orderdate | cost | sum_window_0 | +-------+-------------+-------+---------------+--+ | jack | 2017-01-01 | 10 | 205 | | jack | 2017-01-08 | 55 | 205 | | tony | 2017-01-07 | 50 | 205 | | jack | 2017-01-05 | 46 | 205 | | tony | 2017-01-04 | 29 | 205 | | tony | 2017-01-02 | 15 | 205 | | jack | 2017-02-03 | 23 | 23 | | mart | 2017-04-13 | 94 | 341 | | jack | 2017-04-06 | 42 | 341 | | mart | 2017-04-11 | 75 | 341 | | mart | 2017-04-09 | 68 | 341 | | mart | 2017-04-08 | 62 | 341 | | neil | 2017-05-10 | 12 | 12 | | neil | 2017-06-12 | 80 | 80 | +-------+-------------+-------+---------------+--+?
- 上述的場(chǎng)景,要將cost按照日期進(jìn)行累加
按部分析:
0: jdbc:hive2://henu2:10000> select * from business order by orderdate;日期 排序查詢(xún)后的結(jié)果:
+----------------+---------------------+----------------+--+ | business.name | business.orderdate | business.cost | +----------------+---------------------+----------------+--+ | jack | 2017-01-01 | 10 | | tony | 2017-01-02 | 15 | | tony | 2017-01-04 | 29 | | jack | 2017-01-05 | 46 | | tony | 2017-01-07 | 50 | | jack | 2017-01-08 | 55 | | jack | 2017-02-03 | 23 | | jack | 2017-04-06 | 42 | | mart | 2017-04-08 | 62 | | mart | 2017-04-09 | 68 | | mart | 2017-04-11 | 75 | | mart | 2017-04-13 | 94 | | neil | 2017-05-10 | 12 | | neil | 2017-06-12 | 80 | +----------------+---------------------+----------------+--+?cost按照日期進(jìn)行累加
0: jdbc:hive2://henu2:10000> select *,sum(cost) over(sort by orderdate rows between UN BOUNDED PRECEDING and CURRENT ROW) from business;結(jié)果展示:
?
三行一算:
0: jdbc:hive2://henu2:10000> select *,sum(cost) over(sort by orderdate rows between 1 preceding and 1 following) from business;?結(jié)果展示:
另外補(bǔ)充,自行演示:?
select name,orderdate,cost, sum(cost) over() as sample1,--所有行相加 sum(cost) over(partition by name) as sample2,--按name分組,組內(nèi)數(shù)據(jù)相加 sum(cost) over(partition by name order by orderdate) as sample3,--按name分組,組內(nèi)數(shù)據(jù)累加 sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row ) as sample4 ,--和sample3一樣,由起點(diǎn)到當(dāng)前行的聚合 sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and current row) as sample5, --當(dāng)前行和前面一行做聚合 sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING ) as sample6,--當(dāng)前行和前邊一行及后面一行 sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING ) as sample7 --當(dāng)前行及后面所有行 from business;?結(jié)果展示:
- 查詢(xún)顧客上次的購(gòu)買(mǎi)時(shí)間
結(jié)果展示:
補(bǔ)充:
select *, lag(orderdate,1) over(distribute by name sort by orderdate), lead(orderdate,1) over(distribute by name sort by orderdate) from business;結(jié)果展示:?
?
- 查詢(xún)前20%時(shí)間的訂單信息
首先:分為5個(gè)組
select *,ntile(5) over(sort by orderdate) from business;結(jié)果展示:
最終語(yǔ)句:
select name,orderdate,cost from (select name,orderdate,cost,ntile(5) over(sort by orderdate) gid from business) t where t.gid = 1;結(jié)果展示:
?
總結(jié)
以上是生活随笔為你收集整理的Hive _偏门常用查询函数(二)附带实例(列转行、窗口函数)的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。
- 上一篇: Hive belline提交命令Erro
- 下一篇: no.3_九球称重