hivesql修改字段类型_Hive SQL语法总结
Hive是一個(gè)數(shù)據(jù)倉(cāng)庫(kù)基礎(chǔ)的應(yīng)用工具,在Hadoop中用來(lái)處理結(jié)構(gòu)化數(shù)據(jù),它架構(gòu)在Hadoop之上,通過(guò)SQL來(lái)對(duì)數(shù)據(jù)進(jìn)行操作,了解SQL的人,學(xué)起來(lái)毫不費(fèi)力。
Hive 查詢操作過(guò)程嚴(yán)格遵守Hadoop MapReduce 的作業(yè)執(zhí)行模型,Hive 將用戶的Hive SQL 語(yǔ)句通過(guò)解釋器轉(zhuǎn)換為MapReduce 作業(yè)提交到Hadoop 集群上,Hadoop 監(jiān)控作業(yè)執(zhí)行過(guò)程,然后返回作業(yè)執(zhí)行結(jié)果給用戶。Hive 并非為聯(lián)機(jī)事務(wù)處理而設(shè)計(jì),Hive 并不提供實(shí)時(shí)的查詢和基于行級(jí)的數(shù)據(jù)更新操作。Hive 的最佳使用場(chǎng)合是大數(shù)據(jù)集的批處理作業(yè),例如,網(wǎng)絡(luò)日志分析。
下面我們就為大家總結(jié)了一些Hive的常用 SQL語(yǔ)法:
"[ ]"括起來(lái)的代表我們可以寫(xiě)也可以不寫(xiě)的語(yǔ)句。
創(chuàng)建數(shù)據(jù)庫(kù):
CREATE DATABASE name;顯示命令:
show tables;show databases;show partitions ;show?functions;describe extended table_name dot col_name;DDL(Data Defination Language):數(shù)據(jù)庫(kù)定義語(yǔ)言
建表:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name?[(col_name data_type [COMMENT col_comment], ...)]?[COMMENT table_comment]?[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]?[CLUSTERED BY (col_name, col_name, ...)?[SORTED?BY?(col_name?[ASC|DESC],?...)]?INTO?num_buckets?BUCKETS]?[ROW?FORMAT?row_format]?[STORED?AS?file_format]?[LOCATION?hdfs_path]CREATE TABLE 創(chuàng)建一個(gè)指定名字的表。如果相同名字的表已經(jīng)存在,則拋出異常;用戶可以用 IF NOT EXIST 選項(xiàng)來(lái)忽略這個(gè)異常
EXTERNAL 關(guān)鍵字可以讓用戶創(chuàng)建一個(gè)外部表,在建表的同時(shí)指定一個(gè)指向?qū)嶋H數(shù)據(jù)的路徑(LOCATION)
LIKE 允許用戶復(fù)制現(xiàn)有的表結(jié)構(gòu),但是不復(fù)制數(shù)據(jù)
COMMENT可以為表與字段增加描述
ROW FORMAT
STORED AS
創(chuàng)建簡(jiǎn)單表:
CREATE?TABLE?person(name?STRING,age?INT);?創(chuàng)建外部表:
CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,???? page_url STRING, referrer_url STRING,???? ip STRING COMMENT 'IP Address of the User',???? country STRING COMMENT 'country of origination')?COMMENT '這里寫(xiě)表的描述信息'?ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'?STORED AS TEXTFILE?LOCATION?'';創(chuàng)建分區(qū)表:
CREATE TABLE par_table(viewTime INT, userid BIGINT,???? page_url STRING, referrer_url STRING,???? ip STRING COMMENT 'IP Address of the User')?COMMENT 'This is the page view table'?PARTITIONED BY(date STRING, pos STRING)ROW FORMAT DELIMITED ‘\t’?? FIELDS TERMINATED BY '\n'STORED?AS?SEQUENCEFILE;創(chuàng)建分桶表:
CREATE TABLE par_table(viewTime INT, userid BIGINT,???? page_url STRING, referrer_url STRING,???? ip STRING COMMENT 'IP Address of the User')?COMMENT 'This is the page view table'?PARTITIONED BY(date STRING, pos STRING)?CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS?ROW FORMAT DELIMITED ‘\t’?? FIELDS TERMINATED BY '\n'STORED?AS?SEQUENCEFILE;創(chuàng)建帶索引字段的表:
CREATE?TABLE?invites?(foo?INT,?bar?STRING)?PARTITIONED?BY?(dindex?STRING);?復(fù)制一個(gè)空表:
CREATE TABLE empty_key_value_storeLIKE?key_value_store;顯示所有表:
SHOW?TABLES;按正則表達(dá)式顯示表:
hive>?SHOW?TABLES?'.*s';表中添加一個(gè)字段:
ALTER?TABLE?pokes?ADD?COLUMNS?(new_col?INT);添加一個(gè)字段并為其添加注釋:
hive>?ALTER?TABLE?invites?ADD?COLUMNS?(new_col2?INT?COMMENT?'a?comment');刪除列:
hive>?ALTER?TABLE?test?REPLACE?COLUMNS(id?BIGINT,?name?STRING);更改表名:
hive>?ALTER?TABLE?events?RENAME?TO?3koobecaf;增加、刪除分區(qū):
#增加:ALTER TABLE table_name ADD [IF NOT EXISTS] partition_spec [ LOCATION 'location1' ] partition_spec [ LOCATION 'location2' ] ...????? partition_spec:? : PARTITION (partition_col = partition_col_value, partition_col = partiton_col_value, ...)#刪除:ALTER?TABLE?table_name?DROP?partition_spec,?partition_spec,...改變表的文件格式與組織:
ALTER TABLE table_name SET FILEFORMAT file_formatALTER TABLE table_name CLUSTERED BY(userid) SORTED BY(viewTime) INTO num_buckets BUCKETS#這個(gè)命令修改了表的物理存儲(chǔ)屬性創(chuàng)建和刪除視圖:
#創(chuàng)建視圖:CREATE VIEW [IF NOT EXISTS] view_name [ (column_name [COMMENT column_comment], ...) ][COMMENT view_comment][TBLPROPERTIES (property_name = property_value, ...)] AS SELECT;#刪除視圖:DROP VIEW view_name;DML(Data manipulation language):數(shù)據(jù)操作語(yǔ)言,主要是數(shù)據(jù)庫(kù)增刪改三種操作,DML包括:INSERT插入、UPDATE更新、DELETE刪除。
向數(shù)據(jù)表內(nèi)加載文件:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]#load操作只是單純的復(fù)制/移動(dòng)操作,將數(shù)據(jù)文件移動(dòng)到Hive表對(duì)應(yīng)的位置。#加載本地LOAD?DATA?LOCAL?INPATH?'./examples/files/kv1.txt'?OVERWRITE?INTO?TABLE?pokes;#加載HDFS數(shù)據(jù),同時(shí)給定分區(qū)信息hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');將查詢結(jié)果插入到Hive表:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;#多插入模式:FROM from_statementINSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1[INSERT OVERWRITE TABLE tablename2 [PARTITION ...] select_statement2] ...#自動(dòng)分區(qū)模式INSERT?OVERWRITE?TABLE?tablename?PARTITION?(partcol1[=val1],?partcol2[=val2]?...)?select_statement?FROM?from_statement;將查詢結(jié)果插入到HDFS文件系統(tǒng)中:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 SELECT ... FROM ...??????? FROM from_statement??????? INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1?????[INSERT?OVERWRITE?[LOCAL]?DIRECTORY?directory2?select_statement2]INSERT INTO
INSERT?INTO??TABLE?tablename1?[PARTITION?(partcol1=val1,?partcol2=val2?...)]?select_statement1?FROM?from_statement;insert overwrite和insert into的區(qū)別:
insert overwrite 會(huì)覆蓋已經(jīng)存在的數(shù)據(jù),假如原始表使用overwrite 上述的數(shù)據(jù),先現(xiàn)將原始表的數(shù)據(jù)remove,再插入新數(shù)據(jù)。
insert into 只是簡(jiǎn)單的插入,不考慮原始表的數(shù)據(jù),直接追加到表中。最后表的數(shù)據(jù)是原始數(shù)據(jù)和新插入數(shù)據(jù)。
DQL(data query language)數(shù)據(jù)查詢語(yǔ)言 select操作
SELECT查詢結(jié)構(gòu):
SELECT [ALL | DISTINCT] select_expr, select_expr, ...FROM table_reference[WHERE where_condition][GROUP BY col_list [HAVING condition]][?? CLUSTER BY col_list? | [DISTRIBUTE BY col_list] [SORT BY| ORDER BY col_list]][LIMIT?number]使用ALL和DISTINCT選項(xiàng)區(qū)分對(duì)重復(fù)記錄的處理。默認(rèn)是ALL,表示查詢所有記錄DISTINCT表示去掉重復(fù)的記錄
Where 條件 類(lèi)似我們傳統(tǒng)SQL的where 條件
ORDER BY 全局排序,只有一個(gè)Reduce任務(wù)
SORT BY 只在本機(jī)做排序
LIMIT限制輸出的個(gè)數(shù)和輸出起始位置
將查詢數(shù)據(jù)輸出至目錄:
hive>?INSERT?OVERWRITE?DIRECTORY?'/tmp/hdfs_out'?SELECT?a.*?FROM?invites?a?WHERE?a.ds='';將查詢結(jié)果輸出至本地目錄:
hive>?INSERT?OVERWRITE?LOCAL?DIRECTORY?'/tmp/local_out'?SELECT?a.*?FROM?pokes?a;將一個(gè)表的結(jié)果插入到另一個(gè)表:
FROM invites a INSERT OVERWRITE TABLE events SELECT a.bar, count(1) WHERE a.foo > 0 GROUP BY a.bar;INSERT OVERWRITE TABLE events SELECT a.bar, count(1) FROM invites a WHERE a.foo > 0 GROUP BY a.bar;JOINFROM?pokes?t1?JOIN?invites?t2?ON?(t1.bar?=?t2.bar)?INSERT?OVERWRITE?TABLE?events?SELECT?t1.bar,?t1.foo,?t2.foo;將多表數(shù)據(jù)插入到同一表中
FROM srcINSERT OVERWRITE TABLE dest1 SELECT src.* WHERE src.key < 100INSERT OVERWRITE TABLE dest2 SELECT src.key, src.value WHERE src.key >= 100 and src.key < 200INSERT OVERWRITE TABLE dest3 PARTITION(ds='2008-04-08', hr='12') SELECT src.key WHERE src.key >= 200 and src.key < 300INSERT?OVERWRITE?LOCAL?DIRECTORY?'/tmp/dest4.out'?SELECT?src.value?WHERE?src.key?>=?300;?
Hive 只支持等值連接(equality joins)、外連接(outer joins)和(left semi joins)。Hive 不支持所有非等值的連接,因?yàn)榉堑戎颠B接非常難轉(zhuǎn)化到 map/reduce 任務(wù)。
LEFT,RIGHT和FULL OUTER關(guān)鍵字用于處理join中空記錄的情況
LEFT SEMI JOIN 是 IN/EXISTS 子查詢的一種更高效的實(shí)現(xiàn)
join 時(shí),每次 map/reduce 任務(wù)的邏輯是這樣的:reducer 會(huì)緩存 join 序列中除了最后一個(gè)表的所有表的記錄,再通過(guò)最后一個(gè)表將結(jié)果序列化到文件系統(tǒng)
實(shí)際應(yīng)用過(guò)程中應(yīng)盡量使用小表join大表
join查詢時(shí)應(yīng)注意的點(diǎn):
#只支持等值連接SELECT a.* FROM a JOIN b ON (a.id = b.id)SELECT a.* FROM a JOIN b???? ON (a.id = b.id AND a.department = b.department)#可以 join 多個(gè)表SELECT a.val, b.val, c.val FROM a JOIN b???? ON (a.key = b.key1) JOIN c ON (c.key = b.key2)#如果join中多個(gè)表的?join?key?是同一個(gè),則?join?會(huì)被轉(zhuǎn)化為單個(gè)?map/reduce?任務(wù)LEFT,RIGHT和FULL OUTER關(guān)鍵字
#左外連接SELECT a.val, b.val FROM a LEFT OUTER JOIN b ON (a.key=b.key)#右外鏈接SELECT a.val, b.val FROM a RIGHT OUTER JOIN b ON (a.key=b.key)#滿外連接SELECT?a.val,?b.val?FROM?a?FULL?OUTER?JOIN?b?ON?(a.key=b.key)LEFT SEMI JOIN關(guān)鍵字
#LEFT SEMI JOIN 的限制是, JOIN 子句中右邊的表只能在 ON 子句中設(shè)置過(guò)濾條件,在 WHERE 子句、SELECT 子句或其他地方過(guò)濾都不行SELECT a.key, a.value?? FROM a?? WHERE a.key in??? (SELECT b.key???? FROM B);#可以被寫(xiě)為:SELECT a.key, a.val????FROM?a?LEFT?SEMI?JOIN?b?on?(a.key?=?b.key)UNION 與 UNION ALL
#用來(lái)合并多個(gè)select的查詢結(jié)果,需要保證select中字段須一致select_statement UNION ALL select_statement UNION ALL select_statement ...#UNION 和 UNION ALL的區(qū)別#UNION只會(huì)查詢到兩個(gè)表中不同的數(shù)據(jù),相同的部分不會(huì)被查出#UNION ALL會(huì)把兩個(gè)表的所有數(shù)據(jù)都查詢出總結(jié)
以上是生活随笔為你收集整理的hivesql修改字段类型_Hive SQL语法总结的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。
- 上一篇: 上海定妆喷雾能带上地铁吗?
- 下一篇: 乐高为什么这么贵?