當前位置：首頁 > 运维知识 > 数据库 >内容正文

数据库

HiveSQL常用数据处理语句

發布時間：2025/3/21 数据库 35 豆豆

生活随笔收集整理的這篇文章主要介紹了 HiveSQL常用数据处理语句小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

HiveSQL運行優化參數配置

HiveSQL常用數據處理語句

HiveSQL中復雜數據類型操作

? ? ? ? Hive?采用了類似SQL?的查詢語言?HQL(Hive Query Language)，因此很容易將?Hive?理解為數據庫。其實從結構上來看，Hive?和數據庫除了擁有類似的查詢語言，再無類似之處。

Hive中的庫表基礎信息查看

Hive中增刪改查操作

Hive中分區操作

正則表達式替換函數 Regexp_replace

正則表達式解析函數 Regexp_extract

Hive中日期轉換函數

Hive中行轉列/列轉行函數

其他

Hive中的庫表基礎信息查看

--查看有哪些數據庫
show databases;?

--查看當前數據庫下有哪些表
show tables; ?

--正則表達式顯示表
show tables like '*cc*';?

--查看Hive支持函數
show functions;

--查詢表字段及注釋
desc table_name;

--查詢建表語句
show create table_name;

--查詢表分區
show partitions table_name;

--查看表的結構，字段，分區等情況
desc?table_name; ?

Hive中增刪改查操作

--給表重命名
alter table old_table_name rename to new_old_name;

--清除表數據
truncate table table_name;

--刪除表
drop table table_name;

--創建員工1信息表，并指定行的格式
create table table_name(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)?
PARTITIONED by(inc_day string)
STORED AS ORC
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';?

--加載數據入員工表，local 表示數據從本地獲取，未指定則是指hdfs，這是DML語法
LOAD DATA LOCAL INPATH '/home/hadoop/data/table_name_data.txt' OVERWRITE INTO TABLE table_name ;
?
--拷貝表結構
create table?table_name_01?like table_name;
?
--拷貝表結構以及表數據，需要跑MR
create table table_name_01 as select * from table_name;
?
--創建外部表，用external修飾，其它語法與內部表一致
create external table table_name_01 like table_name;
?
--增加列字段（最后）
alter table table_name add columns (
name string comment '名字',
age int comment '年齡',
height float comment '身高');

--改變列字段名稱
alter table table_name change column old_column_name new_column_name column_type;

--修改表字段注釋
alter table table_name change [cloumn] col_old_name col_new_name column_type [conmment col_conmment] [first|after column_name];
alter table table_name change cloumn col_old_name col_new_name column_type conmment col_conmment;

--增加一個字段并添加注釋
alter table table_name add columns (col_name data_type comment 'col_comment');

--刪除列
alter table table_name replace columns (col_name data_type, col_name_2 data_type_2);

--插入數據
insert into table_1 select * from table_2;? ? ? ? ? ? ? ? ?? --在table_1后追加數據
insert overwrite table table_1 select * from table_2; ?--先將table_1中數據清空，然后添加數據

Hive中分區操作

--增加分區
alter table table_name add if not exists partition (inc_day = '20190601');

--刪除分區
alter table table_name add if not exists partition (inc_day = '20190601');

--分區插入數據
insert into table_name partition (inc_day = '20190601')?
select * from table_2;

insert overwrite table table_name partition (inc_day = '20190601')?
select * from table_2;

正則表達式替換函數 Regexp_replace

正則替換是常用的字符串替換函數

語法：regexp_replace(string subject, string pattern, string str)
參數說明：subject為被替換的字符串，pattern為正則表達式，str為替換正則表達式(匹配到的字符串)的字符串
描述：將字符串subject中，符合正則表達式pattern的字符串，替換為字符串str，然后將替換后的整個字符串返回

select regexp_replace('abcdefg','abc','ABA') as res;
--ABAdefg

select regexp_replace('h234ney', '\\d+', 'o');
--honey

正則表達式解析函數 Regexp_extract

語法：regexp_extract(string subject, string pattern, int index)
參數說明：subject為被解析的字符串，pattern為正則表達式，index為正則表達式中捕獲分組的序號，取值范圍是0~n(n為捕獲分組個數)。
其中，index取值為 0：返回pattern匹配到的整個結果；
index取值為1：返回pattern中第1個捕獲分組匹配到的結果；
index取值為n：返回pattern中第n個捕獲分組匹配到的結果；
index取值小于0或者大于n：報錯。
描述：將字符串subject，按照pattern正則表達式的規則拆分，返回index指定的字符。

select regexp_extract('abcdefg','a(b)(.*?)(e)',0) as res;
--abcde

select regexp_extract('abcdefg','a(b)(.*?)(e)',1) as res;
--b

select regexp_extract('abcdefg','a(b)(.*?)(e)',2) as res;
--cd

select regexp_extract('abcdefg','a(b)(.*?)(e)',3) as res;
--e

select regexp_extract('honeymoon', 'hon(.*?)(moon)', 0);
--honeymoon

select regexp_extract('honeymoon', 'hon(.*?)(moon)', 1);
--ey

select regexp_extract('honeymoon', 'hon(.*?)(moon)', 2);
--moon

select regexp_extract('x=a3&x=18abc&x=2&y=3&x=4','x=([0-9]+)([a-z]+)',0), ?
--x=18abc

select regexp_extract('x=a3&x=18abc&x=2&y=3&x=4','^x=([a-z]+)([0-9]+)',0),?
--x=a3&x

正則表達式由標準的元字符（metacharacters）所構成：

Hive中日期轉換函數

unix_timestamp:返回當前或指定時間的時間戳
from_unixtime：將時間戳轉為日期格式
current_date：當前日期
current_timestamp：當前的日期加時間
to_date：抽取日期部分
year：獲取年
month：獲取月
day：獲取日
hour：獲取時
minute：獲取分
second：獲取秒
weekofyear：當前時間是一年中的第幾周
dayofmonth：當前時間是一個月中的第幾天
months_between：兩個日期間的月份，前-后
add_months：日期加減月
datediff：兩個日期相差的天數，前-后
date_add：日期加天數
date_sub：日期減天數
last_day：日期的當月的最后一天

--時間戳函數
--日期轉時間戳：從1970-01-01 00:00:00 UTC到指定時間的秒數
select unix_timestamp(); --獲得當前時區的UNIX時間戳
select unix_timestamp('2017-09-15 14:23:00');
select unix_timestamp('2017-09-15 14:23:00','yyyy-MM-dd HH:mm:ss');
select unix_timestamp('20170915 14:23:00','yyyyMMdd HH:mm:ss');

--時間戳轉日期
select from_unixtime(1505456567);
select from_unixtime(1505456567,'yyyyMMdd');
select from_unixtime(1505456567,'yyyy-MM-dd HH:mm:ss');
select from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss'); --獲取系統當前時間

--獲取當前日期: current_date
select current_date from dual
2017-09-15

--日期時間轉日期：to_date(string timestamp)
select to_date('2017-09-15 11:12:00') from dual;
2017-09-15

--計算兩個日期之間的天數: datediff
select datediff('2017-09-15','2017-09-01') from dual;
14

--日期增加和減少: date_add/date_sub(string startdate,int days)
select date_add('2017-09-15',1) from dual;
2017-09-16
select date_sub('2017-09-15',1) from dual;
2017-09-14

--to_date：日期時間轉日期函數
select to_date('2015-04-02 13:34:12');
輸出：2015-04-02

--year：返回日期中的年
select year('2015-04-02?11:32:12');
輸出：2015

--month：返回日期中的月份
select month('2015-12-02 11:32:12');
輸出：12

--day：返回日期中的天
select day('2015-04-13 11:32:12');
輸出：13

--hour：返回日期中的小時
select hour('2015-04-13 11:32:12');
輸出：11

--minute：返回日期中的分鐘
select minute('2015-04-13 11:32:12');
輸出：32

--second：返回日期中的秒
select second('2015-04-13 11:32:56');
輸出：56

--weekofyear：返回日期在當前周數
select weekofyear('2015-05-05 12:11:1');
輸出：19

--datediff：返回開始日期減去結束日期的天數
select datediff('2015-04-09','2015-04-01');
輸出：8

--date_sub：返回日期前n天的日期
select date_sub('2015-04-09',4);
輸出：2015-04-05

--date_add：返回日期后n天的日期
select date_add('2015-04-09',4);
輸出：2015-04-13

Hive中行轉列/列轉行函數

列轉行操作

concat(string s1, string s2, string s3)
這個函數能夠把字符串類型的數據連接起來，連接的某個元素可以是列值。
如 concat( aa, ':', bb) 就相當于把aa列和bb列用冒號連接起來了，aa:bb。

cast?
用法：cast(value as type)
功能：將某個列的值顯示的轉化為某個類型
例子：cast(age as string ) 將int類型的數據轉化為了String類型

concat_ws(seperator, string s1, string s2...)
功能：制定分隔符將多個字符串連接起來，實現“列轉行”
例子：常常結合group by與collect_set使用

collect_set()：縱向轉橫向，將多行數寫入一行，此方法會對該列去重
collect_list():作用同上，但是該方法不會對該列去重
此時，默認的數據之間的分割符是逗號（,）,也可以使用concat_ws()方法來指定分隔符

user_id ? order_id
104399 ? ?1715131
104399 ? ?2105395
104399 ? ?1758844
104399 ? ?981085
104399 ? ?2444143
104399 ? ?1458638
104399 ? ?968412
104400 ? ?1609001
104400 ? ?2986088
104400 ? ?1795054

select user_id,
concat_ws(',',collect_list(order_id)) as order_value?
from table_name
group by user_id
;

--結果(簡寫)
user_id ? ?order_value
104399 ? ?1715131,2105395,1758844,981085,2444143

使用函數：concat_ws(',',collect_set(column)) ?
說明：collect_list 不去重，collect_set 去重。 column的數據類型要求是string

split 切割函數
語法: split(string str, string pat)
返回值: array
說明: 按照pat字符串分割str，會返回分割后的字符串數組
select split(order_value, ',') from table_name;
--1715131 2105395 1758844 981085 2444143

select split(order_value, ',')[0] from table_name;
--1715131

行轉列操作

? ? ? ?lateral view用于和split、explode等UDTF一起使用的，能將一行數據拆分成多行數據，在此基礎上可以對拆分的數據進行聚合，lateral view首先為原始表的每行調用UDTF，UDTF會把一行拆分成一行或者多行，lateral view在把結果組合，產生一個支持別名表的虛擬表。其中explode還可以用于將hive一列中復雜的array或者map結構拆分成多行。

user_id ? order_value
104408 ? ?2909888,2662805,2922438,674972,2877863,190237
104407 ? ?2982655,814964,1484250,2323912,2689723,2034331,1692373,677498,156562,2862492,338128
104406 ? ?1463273,2351480,1958037,2606570,3226561,3239512,990271,1436056,2262338,2858678
104405 ? ?153023,2076625,1734614,2796812,1633995,2298856,2833641,3286778,2402946,2944051,181577,464232
104404 ? ?1815641,108556,3110738,2536910,1977293,424564
104403 ? ?253936,2917434,2345879,235401,2268252,2149562,2910478,375109,932923,1989353
104402 ? ?3373196,1908678,291757,1603657,1807247,573497,1050134,3402420
104401 ? ?814760,213922,2008045,3305934,2130994,1602245,419609,2502539,3040058,2828163,3063469
104400 ? ?1609001,2986088,1795054,429550,1812893
104399 ? ?1715131,2105395,1758844,981085,2444143,1458638,968412

select user_id,order_value,order_id
from table_name
lateral view explode(split(order_value,',')) num as order_id
;

--結果
user_id ? ?order_value ? ?? ??? ??? ??? ??? ??? ??? ??? ? order_id
104408 ? ?2909888,2662805,2922438,674972,2877863,190237 ? ?2909888
104408 ? ?2909888,2662805,2922438,674972,2877863,190237 ? ?2662805
104408 ? ?2909888,2662805,2922438,674972,2877863,190237 ? ?2922438
104408 ? ?2909888,2662805,2922438,674972,2877863,190237 ? ?674972
104408 ? ?2909888,2662805,2922438,674972,2877863,190237 ? ?2877863
104408 ? ?2909888,2662805,2922438,674972,2877863,190237 ? ?190237
104407 ? ?2982655,814964,1484250,2323912,2689723,2034331,1692373,677498,156562,2862492,338128 ? ?2982655
104407 ? ?2982655,814964,1484250,2323912,2689723,2034331,1692373,677498,156562,2862492,338128 ? ?814964
104407 ? ?2982655,814964,1484250,2323912,2689723,2034331,1692373,677498,156562,2862492,338128 ? ?1484250
104407 ? ?2982655,814964,1484250,2323912,2689723,2034331,1692373,677498,156562,2862492,338128 ? ?2323912

其他

case when函數
case?
when os = 'android' then 'android'
when os = 'ios' then 'iPhone'
else 'PC'
end as os,

row_number()函數
語法：row_number() over (partition by 字段a order by 計算項b desc ) rank?

partition by：類似hive的建表，分區的意思；
order by ：排序，默認是升序，加desc降序；
這里按字段a分區，對計算項b進行降序排序

總結

以上是生活随笔為你收集整理的HiveSQL常用数据处理语句的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇： HiveSQL运行优化参数配置
下一篇： HiveSQL中复杂数据类型操作