Hive总结 --hive表的创建,删除和修改
一、創(chuàng)建表
官網(wǎng)創(chuàng)建HIVE表定義如下:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)[(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])][COMMENT table_comment][PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)][CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS][SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)[STORED AS DIRECTORIES][[ROW FORMAT row_format] [STORED AS file_format]| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)][LOCATION hdfs_path][TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)`以下官網(wǎng)中的實例也可進行參考:
CREATE TABLE new_key_value_storeROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"STORED AS RCFileAS SELECT (key % 1024) new_key, concat(key, value) key_value_pair FROM key_value_store SORT BY new_key, key_value_pair; CREATE TABLE page_view(viewTime INT, userid BIGINT,page_url STRING, referrer_url STRING,ip STRING COMMENT 'IP Address of the User')COMMENT 'This is the page view table'PARTITIONED BY(dt STRING, country STRING)CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETSROW FORMAT DELIMITEDFIELDS TERMINATED BY '\001'COLLECTION ITEMS TERMINATED BY '\002'MAP KEYS TERMINATED BY '\003'STORED AS SEQUENCEFILE;二、刪除表
DROP TABLE刪除該表的元數(shù)據(jù)和數(shù)據(jù)。如果配置了Trash(并且沒有指定PURGE),數(shù)據(jù)實際上會移動到.Trash/Current目錄。元數(shù)據(jù)完全丟失。
需要注意的是:在hive 4.X以后,設置 external.table.purge=true,也可以刪除外部表數(shù)據(jù)了。
使用該參數(shù)刪除表數(shù)據(jù),不會刪除表結(jié)構(gòu),還可以通過設置分區(qū)值,刪除指定分區(qū)內(nèi)的數(shù)據(jù)。
三、修改表
修改表的語法如下:
--修改表名 ALTER TABLE table_name RENAME TO new_table_name;--修改表特性 ALTER TABLE table_name SET TBLPROPERTIES table_properties;table_properties:: (property_name = property_value, property_name = property_value, ... )--例如: ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);--修改分桶 ALTER TABLE table_name CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name, ...)]INTO num_buckets BUCKETS; --增加分區(qū) hive0.8+ ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];partition_spec:: (partition_column = partition_col_value, partition_column = partition_col_value, ...)--Example: ALTER TABLE page_view ADD PARTITION (dt='2008-08-08', country='us') location '/part080808'PARTITION (dt='2008-08-09', country='us') location '/part080809';--修改分區(qū) ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;--可以使用如下語句進行添加,刪除,更新分區(qū),一般用于使用hdfs命令修改數(shù)據(jù),沒同步元數(shù)據(jù)情況 MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS];--刪除分區(qū) ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...][IGNORE PROTECTION] [PURGE]; -- (Note: PURGE available in Hive 1.2.0 and later, IGNORE PROTECTION not available 2.0.0 and later)--ExampleALTER TABLE page_view DROP PARTITION (dt='2008-08-08', country='us');--修改列 ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type[COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT]; Example: CREATE TABLE test_change (a int, b int, c int);// First change column a's name to a1. ALTER TABLE test_change CHANGE a a1 INT;// Next change column a1's name to a2, its data type to string, and put it after column b. ALTER TABLE test_change CHANGE a1 a2 STRING AFTER b; // The new table's structure is: b int, a2 string, c int.// Then change column c's name to c1, and put it as the first column. ALTER TABLE test_change CHANGE c c1 INT FIRST; // The new table's structure is: c1 int, b int, a2 string.// Add a comment to column a1 ALTER TABLE test_change CHANGE a1 a1 INT COMMENT 'this is column a1';--增加刪除列 ALTER TABLE table_name [PARTITION partition_spec] -- (Note: Hive 0.14.0 and later)ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)[CASCADE|RESTRICT] -- (Note: Hive 1.1.0 and later) --ADD COLUMNS 允許將新列添加到現(xiàn)有列的末尾,但在分區(qū)列之前。這也支持Avro備份表,Hive 0.14和更高版本參考文檔:hive官方文檔–DDL
四、查詢數(shù)據(jù)
查詢語句的基本規(guī)則如下:
[WITH CommonTableExpression (, CommonTableExpression)*] (Note: Only available starting with Hive 0.13.0) SELECT [ALL | DISTINCT] select_expr, select_expr, ...FROM table_reference[WHERE where_condition][GROUP BY col_list][ORDER BY col_list][CLUSTER BY col_list| [DISTRIBUTE BY col_list] [SORT BY col_list]][LIMIT [offset,] rows]having語句一般和group by 語句一起使用,和where的區(qū)別主要是having可以使用聚合函數(shù)作為條件,如下所示:
SELECT col1 FROM t1 GROUP BY col1 HAVING SUM(col2) > 10;limit語句可以使用limit n展示n條數(shù)據(jù),也可以使用 limit start,n 表示從start開始,到start+n條數(shù)據(jù),結(jié)果默認從第0行開始。
SELECT * FROM customers ORDER BY create_date LIMIT 5; SELECT * FROM customers ORDER BY create_date LIMIT 2,5; --展示第3到7條Order, Sort, Cluster, and Distribute By的區(qū)別
order by:是對查詢出的結(jié)果進行全部排序,在嚴格模式下(hive.mapred.mode=strict),必須和limit一起使用,使用order by的字句最終肯定只有一個reducer,默認是升序(asc);
在Hive 2.1.0及以后版本中,支持在“order by”子句中為null值排序。ASC默認是null在最前,而DESC默認排在最后。
sort by:sort by在reducer之前就會排序,默認的排序方式是按照數(shù)字大小和字典編撰順序進行排序,在hive3.0.0及以后,在子查詢和view中,使用沒有l(wèi)imit的sort by會被優(yōu)化器移除,要想使用可通過設置參數(shù)hive.remove.orderby.in.subquery=false進行使用。
order by和sort by的區(qū)別:Hive支持SORT BY,按照每個reducer對數(shù)據(jù)進行排序。order by和sort by之間的區(qū)別是,前者保證輸出中的總順序,而后者只保證reducer中的行順序。如果有多個reducer,sort by排序可能會給出部分有序的最終結(jié)果。
distribute by:distribute by 的分區(qū)規(guī)則是根據(jù)分區(qū)字段的 hash 碼與 reduce 的個數(shù)進行模除后,
余數(shù)相同的分到一個區(qū),Hive 要求 DISTRIBUTE BY 語句要寫在 SORT BY 語句之前。
cluster by:當 distribute by 和 sorts by 字段相同時,可以使用 cluster by 方式。cluster by 除了具有 distribute by 的功能外還兼具 sort by 的功能。但是排序只能是升序排序,不能指定排序規(guī)則為 ASC 或者 DESC。
參考文檔:hive官方文檔–Sort by
============================= 未完,待續(xù) ===================================
總結(jié)
以上是生活随笔為你收集整理的Hive总结 --hive表的创建,删除和修改的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 求一份siki学院的python a计划
- 下一篇: C++完成淄博烧烤节管理系统