3. Importing data from MySQL into HDFS with Sqoop
Note: the MySQL JDBC driver jar must be placed in Sqoop's lib directory.
In the conf directory, run:
cp sqoop-env-template.sh sqoop-env.sh
# Edit ~/.bashrc and add:
export HADOOP_CLASSPATH=/opt/hadoop-2.6.0-cdh5.13.0/lib:$HIVE_HOME/lib/*
source ~/.bashrc
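The copied sqoop-env.sh usually also needs the Hadoop and Hive install locations filled in. A minimal sketch is below; the paths follow the CDH 5.13 layout used above, and the Hive path in particular is an assumption to adjust for your environment:

```shell
# sqoop-env.sh -- example values for the CDH 5.13.0 install used in this article
export HADOOP_COMMON_HOME=/opt/hadoop-2.6.0-cdh5.13.0
export HADOOP_MAPRED_HOME=/opt/hadoop-2.6.0-cdh5.13.0
export HIVE_HOME=/opt/hive   # assumption: set to wherever Hive is actually installed
```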
# Run the import:
# employee_sql is the MySQL table to sync.
# name,type are the columns to sync.
# -m sets the number of map tasks; it is a parallelism knob that can improve throughput. When the table has no primary key it must be set to 1; when the table has a primary key it can be omitted.
# test_emp is the Hive table. It can be created in advance or afterwards.
./sqoop import --connect jdbc:mysql://localhost/test --username root --password 1234567 --table employee_sql --columns "name,type" --hive-import -m 1 --hive-table test_emp
# When the command finishes, the data files have been written to HDFS under Hive's default warehouse directory.
# The imported rows can now be queried through the Hive table test_emp (with --hive-import, Sqoop creates the table if it does not already exist).
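If you prefer to create the Hive table yourself before the import, its columns and field delimiter must match what Sqoop writes. A hedged sketch, taking the table and column names from the import command above, and relying on Hive's default `\001` field delimiter for text files:

```sql
-- Matches the two imported columns (name, type);
-- Sqoop's Hive text output separates fields with the \001 byte.
CREATE TABLE IF NOT EXISTS test_emp (
  name STRING,
  type STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE;
```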
Appendix
Data in MySQL:
CREATE TABLE `employee_sql` (
`name` varchar(100) DEFAULT NULL,
`dept_num` int(11) DEFAULT NULL,
`employee_id` int(11) DEFAULT NULL,
`salary` int(11) DEFAULT NULL,
`type` varchar(100) DEFAULT NULL,
`start_date` date DEFAULT NULL
)
;
mysql> select * from employee_sql;
+---------+----------+-------------+--------+------+------------+
| name | dept_num | employee_id | salary | type | start_date |
+---------+----------+-------------+--------+------+------------+
| Michael | 1000 | 100 | 5000 | full | 2014-01-29 |
| Will | 1000 | 101 | 4000 | full | 2013-10-02 |
| Will | 1000 | 101 | 4000 | part | 2014-10-02 |
| Steven | 1000 | 102 | 6400 | part | 2012-11-03 |
| Lucy | 1000 | 103 | 5500 | full | 2010-01-03 |
| Lily | 1001 | 104 | 5000 | part | 2014-11-29 |
| Jess | 1001 | 105 | 6000 | part | 2014-12-02 |
| Mike | 1001 | 106 | 6400 | part | 2013-11-03 |
| Wei | 1002 | 107 | 7000 | part | 2010-04-03 |
| Yun | 1002 | 108 | 5500 | full | 2014-01-29 |
| Richard | 1002 | 109 | 8000 | full | 2013-09-01 |
+---------+----------+-------------+--------+------+------------+
11 rows in set (0.01 sec)
Data in HDFS:
hadoop@server01:~$ hdfs dfs -ls /user/hive/warehouse/test_emp
Found 1 items
-rwxr-xr-x 1 hadoop supergroup 116 2018-11-08 16:01 /user/hive/warehouse/test_emp/part-m-00000
hadoop@server01:~$ hdfs dfs -cat /user/hive/warehouse/test_emp/part-m-00000
Michaelfull
Willfull
Willpart
Stevenpart
Lucyfull
Lilypart
Jesspart
Mikepart
Weipart
Yunfull
Richardfull
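The name and type look fused in the `cat` output ("Michaelfull" and so on) because Hive's default field delimiter, the non-printing ^A (`\001`) byte, sits between them. A small local sketch makes the delimiter visible; the sample line is a fabricated stand-in for the real HDFS file:

```shell
# Write one line in the same format Sqoop produces for Hive:
# two fields joined by the \001 byte.
printf 'Michael\001full\n' > /tmp/part_sample.txt

# cat shows the fields fused together, because \001 does not print
cat /tmp/part_sample.txt

# Translating \001 to a tab reveals the two separate fields
tr '\001' '\t' < /tmp/part_sample.txt
```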
# View the table in Hive
0: jdbc:hive2://localhost:10000> select * from test_emp;
INFO : Compiling command(queryId=hadoop_20181108170505_ed2ee34d-2ee9-444b-b316-427a91e2e552): select * from test_emp
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:test_emp.name, type:string, comment:null), FieldSchema(name:test_emp.type, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hadoop_20181108170505_ed2ee34d-2ee9-444b-b316-427a91e2e552); Time taken: 0.098 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hadoop_20181108170505_ed2ee34d-2ee9-444b-b316-427a91e2e552): select * from test_emp
INFO : Completed executing command(queryId=hadoop_20181108170505_ed2ee34d-2ee9-444b-b316-427a91e2e552); Time taken: 0.0 seconds
INFO : OK
+----------------+----------------+--+
| test_emp.name | test_emp.type |
+----------------+----------------+--+
| Michael | full |
| Will | full |
| Will | part |
| Steven | part |
| Lucy | full |
| Lily | part |
| Jess | part |
| Mike | part |
| Wei | part |
| Yun | full |
| Richard | full |
+----------------+----------------+--+
11 rows selected (0.158 seconds)