
Importing specified tables with the imp command: configuring and using Sqoop shell command parameters

Published: 2023/12/9


Author: Sheep Sun
Original link: https://www.cnblogs.com/yangxusun9/p/12558683.html


1. Introduction to Sqoop

Sqoop translates the Sqoop commands a user writes into MapReduce (MR) programs. The MR program either reads data from a relational database and writes it to HDFS, or reads data from HDFS and writes it to a relational database.

An MR program that reads from a relational database must use DBInputFormat as its input format. An MR program that writes to a relational database must use DBOutputFormat as its output format.

The MR programs that Sqoop runs have only a Map phase and no Reduce phase. The job only transfers data, so there is nothing to merge or sort.

2. Importing data with Sqoop (from a relational database into HDFS)

2.1 Importing directly into HDFS

2.1.1 Full-table import (or partial import)

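    bin/sqoop import \
      --connect jdbc:mysql://hadoop102:3306/test \
      --username root \
      --password 123 \
      --table t_emp \
      --target-dir /sqoopTest \
      --delete-target-dir \
      --fields-terminated-by "\t" \
      --num-mappers 2 \
      --split-by id \
      --columns id,name,age

What each option does:

--connect, --username, --password: the JDBC URL, user name, and password of the relational database.
--table: the table to read from.
--target-dir: the HDFS path where the imported data is stored.
--delete-target-dir: delete the target path first if it already exists.
--fields-terminated-by: the delimiter placed between fields in the files written to HDFS.
--num-mappers: the number of MapTasks to start; the default is 4.
--split-by: the column used to split the data set; each MapTask handles one slice.
--columns: add this for a partial import to name the columns to import. Separate multiple columns with commas and do not add spaces.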
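A comment placed between backslash-continued lines cuts the command short: the shell joins the comment onto the previous line, and the options that follow are then run as a separate command. One way to keep a comment next to each option is to build the argument list step by step. The sketch below does this in POSIX sh; the function name import_t_emp and the DRY_RUN switch are assumptions introduced for illustration and are not part of Sqoop.

```shell
#!/bin/sh
# Sketch: assemble the import options one step at a time so that each
# option can carry its own comment. import_t_emp and DRY_RUN are
# hypothetical names introduced for this example.
import_t_emp() {
    set -- import
    # JDBC URL, user name, and password of the source database
    set -- "$@" --connect jdbc:mysql://hadoop102:3306/test
    set -- "$@" --username root --password 123
    # Source table and HDFS target path (deleted first if it exists)
    set -- "$@" --table t_emp --target-dir /sqoopTest --delete-target-dir
    # Field delimiter for the files written to HDFS
    set -- "$@" --fields-terminated-by '\t'
    # Two MapTasks, with the data split on the id column
    set -- "$@" --num-mappers 2 --split-by id
    # Partial import: comma-separated column list, no spaces
    set -- "$@" --columns id,name,age

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Print the command, one argument per line, instead of running it
        printf '%s\n' bin/sqoop "$@"
    else
        bin/sqoop "$@"
    fi
}

import_t_emp
```

With DRY_RUN unset or set to 1 the function only prints the arguments; setting DRY_RUN=0 runs bin/sqoop with exactly the same list.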

2.1.2 Filtering the imported data with the --where option

    bin/sqoop import \
      --connect jdbc:mysql://hadoop102:3306/test \
      --username root \
      --password 123 \
      --table t_emp \
      --where 'id>6' \
      --target-dir /sqoopTest \
      --delete-target-dir \
      --fields-terminated-by "\t" \
      --num-mappers 1 \
      --split-by id

--where gives the filter condition. Wrap the condition in quotes so the shell passes it through unchanged.
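The reason for quoting the condition can be seen without Sqoop at all. In the sketch below, show_args is a hypothetical helper that prints the arguments it receives; it stands in for any program, including sqoop. The script runs in a scratch directory because the unquoted form creates a file.

```shell
#!/bin/sh
# show_args is a hypothetical helper that prints each argument it receives.
show_args() {
    printf '[%s]\n' "$@"
}

# Work in a scratch directory, because the unquoted call creates a file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Quoted: the filter reaches the program as a single argument.
quoted=$(show_args --where 'id>6')

# Unquoted: the shell treats >6 as a redirection, so the program only
# receives "id", and its output lands in a file named 6.
show_args --where id>6
unquoted=$(cat 6)

printf 'quoted:\n%s\nunquoted:\n%s\n' "$quoted" "$unquoted"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

In the unquoted case the condition never reaches the program: it sees only id, and a stray file named 6 is left in the working directory.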

2.1.3 Importing with a query

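    bin/sqoop import \
      --connect jdbc:mysql://hadoop102:3306/test \
      --username root \
      --password 123 \
      --query 'select * from t_emp where id>3 and $CONDITIONS' \
      --target-dir /sqoopTest \
      --delete-target-dir \
      --fields-terminated-by "\t" \
      --num-mappers 1 \
      --split-by id

Prefer single quotes around the query. If the query is wrapped in double quotes, $CONDITIONS must be preceded by an escape character (\$CONDITIONS) so that the shell does not expand it as one of its own variables.

Notes:

1. When --query is used, do not also specify --table, --columns, or --where.
   --query and --table must never appear together.
   When --where and --query both appear, --where is ignored.
   When --columns and --query both appear, --columns still takes effect.
2. --query must be accompanied by --target-dir.

2.2 Importing into Hive

    bin/sqoop import \
      --connect jdbc:mysql://hadoop102:3306/test \
      --username root \
      --password 123 \
      --query 'select * from t_emp where id>3 and $CONDITIONS' \
      --target-dir /sqoopTest \
      --fields-terminated-by "\t" \
      --delete-target-dir \
      --hive-import \
      --hive-overwrite \
      --hive-table t_emp \
      --num-mappers 1 \
      --split-by id

--fields-terminated-by: if no delimiter is set, the data is stored with Hive's default delimiter (the non-printing \001 character), which is awkward to work with later, so setting one explicitly is recommended.
--hive-import: import into Hive.
--hive-overwrite: overwrite existing data; without this option the data is appended.
--hive-table: the name of the Hive table to import into.

The import still happens in two steps: the data is first moved from the relational database into HDFS, and then from HDFS into Hive, at which point the files in HDFS are removed.

Note: if the table does not exist in Hive it is created automatically, but its column types are generated automatically as well, so creating the table by hand is still recommended.

You can also carry out the two steps yourself. First import into HDFS:

    #!/bin/bash
    # --compress and --compression-codec enable compression and select lzop.
    # --null-string and --null-non-string replace null values in string and
    # non-string columns with \N so that Hive can read them.
    # $1 is the table name and $2 is the query; $sqoop and $do_date are not
    # defined in this excerpt and are presumably set elsewhere in the script.
    import_data(){
    $sqoop import \
    --connect jdbc:mysql://hadoop102:3306/gmall \
    --username root \
    --password 123 \
    --target-dir /origin_data/gmall/db/$1/$do_date \
    --delete-target-dir \
    --query "$2 and \$CONDITIONS" \
    --num-mappers 1 \
    --fields-terminated-by '\t' \
    --compress \
    --compression-codec lzop \
    --null-string '\\N' \
    --null-non-string '\\N'
    }

Then load the files into Hive with the load data command.

Note: this script uses null handling. Hive stores NULL as "\N" in its underlying files, whereas MySQL stores NULL as NULL. To keep the data consistent on both ends, use --input-null-string and --input-null-non-string when exporting, and --null-string and --null-non-string when importing.

2.3 Importing into HBase

    bin/sqoop import \
      --connect jdbc:mysql://hadoop102:3306/test \
      --username root \
      --password 123 \
      --query 'select * from t_emp where id>3 and $CONDITIONS' \
      --target-dir /sqoopTest \
      --delete-target-dir \
      --hbase-create-table \
      --hbase-table "t_emp" \
      --hbase-row-key "id" \
      --column-family "info" \
      --num-mappers 2 \
      --split-by id

--hbase-create-table: create the table if it does not exist.
--hbase-table: the table name in HBase.
--hbase-row-key: the imported column to use as the rowkey.
--column-family: the column family to import into.

1. When automatic table creation is enabled, an incompatible version causes this error:

    20/03/24 13:51:24 INFO mapreduce.HBaseImportJob: Creating missing HBase table t_emp
    Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V

In that case, either create the table by hand or recompile the Sqoop source.

2. To import into several column families, run the command several times, one column family per run.

3. Export

Export moves data from HDFS into a relational database.

3.1 When the SQL table is empty

    bin/sqoop export \
      --connect 'jdbc:mysql://hadoop102:3306/test?useUnicode=true&characterEncoding=utf-8' \
      --username root \
      --password 123 \
      --table t_emp2 \
      --num-mappers 1 \
      --export-dir /user/hive/warehouse/t_emp \
      --input-fields-terminated-by "\t"

--table: the table to export into; it must be created in advance.
--export-dir: the HDFS path of the data to export.
--input-fields-terminated-by: the field delimiter of the data in HDFS.

3.2 When the table is not empty

If the primary key of an inserted row conflicts with the primary key of a row already in the table, the export fails with:

    Duplicate entry '5' for key 'PRIMARY'

In SQL you can write:

    INSERT INTO t_emp2 VALUE(5,'jack',30,3,1111) ON DUPLICATE KEY UPDATE NAME=VALUES(NAME),deptid=VALUES(deptid),empno=VALUES(empno);

This means that when an inserted row has a duplicate primary key, the existing row is updated and nothing is inserted. With Sqoop, you can enable one of the following two modes.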

3.2.1 updateonly mode

    bin/sqoop export \
      --connect 'jdbc:mysql://hadoop103:3306/mydb?useUnicode=true&characterEncoding=utf-8' \
      --username root \
      --password 123456 \
      --table t_emp2 \
      --num-mappers 1 \
      --export-dir /hive/t_emp \
      --input-fields-terminated-by "\t" \
      --update-key id

With --update-key, rows whose key already exists in the table are updated, but rows whose key does not exist are not inserted.
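One way to see why unmatched rows are skipped: as far as I understand the Sqoop user guide, in this mode each input record is turned into an UPDATE statement keyed on the --update-key column, and an UPDATE that matches no row changes nothing. The sketch below prints the rough shape of that statement for a tab-separated record. The helper to_update_sql and the column names (id, name, age, deptid, empno) are assumptions for illustration; the statement Sqoop actually generates may differ in detail.

```shell
#!/bin/sh
# to_update_sql is a hypothetical helper: it reads tab-separated records
# (id, name, age, deptid, empno; the column names are assumed) and prints
# the approximate UPDATE statement each record corresponds to when
# --update-key id is used.
tab=$(printf '\t')

to_update_sql() {
    while IFS="$tab" read -r id name age deptid empno; do
        printf "UPDATE t_emp2 SET name='%s', age=%s, deptid=%s, empno=%s WHERE id=%s;\n" \
            "$name" "$age" "$deptid" "$empno" "$id"
    done
}

# One record shaped like the row used in the INSERT example.
printf '5\tjack\t30\t3\t1111\n' | to_update_sql
```

If no row with id=5 exists, a statement of this shape affects zero rows, which matches the behavior described for this mode.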

3.2.2 allowinsert mode

    bin/sqoop export \
      --connect 'jdbc:mysql://hadoop103:3306/mydb?useUnicode=true&characterEncoding=utf-8' \
      --username root \
      --password 123456 \
      --table t_emp2 \
      --num-mappers 1 \
      --export-dir /hive/t_emp \
      --input-fields-terminated-by "\t" \
      --update-key id \
      --update-mode allowinsert

With --update-mode allowinsert, rows whose key already exists are updated, and rows whose key does not exist are inserted as well.

3.3 How to see what the export command actually executes

3.3.1 Configure /etc/my.cnf
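The settings to add are not listed here. Since the later step reads a file named mysql-bin.000001 with mysqlbinlog, the goal is presumably to turn on MySQL binary logging. The sketch below writes a candidate [mysqld] fragment to a scratch file for review rather than editing /etc/my.cnf directly; all three values are assumptions, not settings taken from this article.

```shell
#!/bin/sh
# Sketch: write a candidate [mysqld] fragment to a scratch file for review.
# The values below are assumptions, not settings taken from the article.
fragment=$(mktemp)

cat > "$fragment" <<'EOF'
[mysqld]
# Unique server ID; MySQL 5.7 requires one when binary logging is enabled
server-id=1
# Base name of the binary log files (yields mysql-bin.000001, ...)
log-bin=mysql-bin
# Log the SQL text of each statement rather than row images
binlog_format=STATEMENT
EOF

# Show the fragment so it can be reviewed before being merged by hand
cat "$fragment"
```

After review, the lines would be merged into the [mysqld] section of /etc/my.cnf. If the server uses binlog_format=ROW instead, mysqlbinlog needs the -v option to print the row changes as readable pseudo-SQL.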

3.3.2 Restart the mysql service

3.3.3 Change to /var/lib/mysql and run

    sudo mysqlbinlog mysql-bin.000001
