Installing and Using the GPU Database PG-Strom
1 PostgreSQL installation
1.1 Get the PostgreSQL source
Download postgresql-9.5.10.tar.gz (https://www.postgresql.org/ftp/source/v9.5.10/)
Extract it: tar -xzf postgresql-9.5.10.tar.gz
Enter the postgresql-9.5.10 directory.
1.2 Build the PostgreSQL source
$ ./configure --prefix=/usr/pgsql-9.5 --enable-debug --enable-cassert
Here /usr/pgsql-9.5 is the install prefix; it can be customized.
Possible error: configure fails on readline.
rpm -qa | grep readline shows the system ships the readline runtime package by default,
but yum search readline reveals a separate readline-devel package that provides the development headers.
Install readline-devel:
$ yum -y install readline-devel
# Count the logical CPUs
$ cat /proc/cpuinfo | grep "processor" | wc -l
Set the build parallelism according to the number of logical CPUs:
$ make -j 32
$ make install
Here 32 is the parallelism.
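Instead of hard-coding 32, the parallelism can be derived from the same logical-CPU count used above. A small sketch (the make invocation is commented out because it must run inside the extracted source tree):

```shell
# Count logical CPUs exactly as above and feed the result to make -j.
NPROC=$(grep -c '^processor' /proc/cpuinfo)
echo "building with -j${NPROC}"
# make -j"${NPROC}" && make install   # run inside postgresql-9.5.10/
```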
$ vi /etc/profile
export PATH=$PATH:/usr/pgsql-9.5/bin

Reload the profile:
$ source /etc/profile

This step is important, because PG-Strom locates the PostgreSQL build configuration through pg_config:
$ which pg_config
$ pg_config --pgxs
If these print nothing or the wrong paths, it is most likely an environment variable problem.
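A quick way to diagnose the PATH problem just described is to check where pg_config actually resolves. A sketch, assuming the /usr/pgsql-9.5 prefix chosen in the configure step:

```shell
# Report whether pg_config resolves under the expected install prefix.
PREFIX=/usr/pgsql-9.5
FOUND=$(command -v pg_config || true)
case "$FOUND" in
  "$PREFIX"/*) echo "OK: pg_config is $FOUND" ;;
  "")          echo "MISSING: pg_config is not on PATH" ;;
  *)           echo "WRONG: pg_config is $FOUND, not under $PREFIX" ;;
esac
```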
1.3 Create a Linux user
$ useradd postgres
1.4 Initialize the database
$ mkdir /usr/pgsql-9.5/data
$ mkdir -p /usr/pgsql-9.5/log
$ touch /usr/pgsql-9.5/log/pgsql.log
$ chown -R postgres:postgres /usr/pgsql-9.5/data
$ chown -R postgres:postgres /usr/pgsql-9.5/log
$ su - postgres
$ initdb -D /usr/pgsql-9.5/data
Afterwards the /usr/pgsql-9.5/data directory is populated with the cluster files.
1.5 Edit the configuration files
$ vi /usr/pgsql-9.5/data/postgresql.conf
Change #listen_addresses = 'localhost' to listen_addresses = '*'
$ vi /usr/pgsql-9.5/data/pg_hba.conf
Add the following entry to trust connections from any host (passwordless access; suitable only for a test environment):
# IPv4 local connections:
host    all             all             0.0.0.0/0               trust
1.6 Start PostgreSQL
Add a symlink (e.g. into /usr/bin, so psql is found without the profile change):
$ ln -s /usr/pgsql-9.5/bin/psql /usr/bin/psql
Start:
$ pg_ctl start -D /usr/pgsql-9.5/data
Check that it is running; seeing postgres processes means the start succeeded:
$ ps -ef | grep postgres
Stop:
$ pg_ctl stop -D /usr/pgsql-9.5/data
2 Using PostgreSQL
2.1 PostgreSQL table operations reference
-- Create a new table
CREATE TABLE user_tbl(name VARCHAR(20), signup_date DATE);
-- Insert data
INSERT INTO user_tbl(name, signup_date) VALUES('張三', '2013-12-22');
-- Select records
SELECT * FROM user_tbl;
-- Update data
UPDATE user_tbl SET name = '李四' WHERE name = '張三';
-- Delete records
DELETE FROM user_tbl WHERE name = '李四';
-- Add a column
ALTER TABLE user_tbl ADD email VARCHAR(40);
-- Alter a column's constraints
ALTER TABLE user_tbl ALTER COLUMN signup_date SET NOT NULL;
-- Rename a column
ALTER TABLE user_tbl RENAME COLUMN signup_date TO signup;
-- Drop a column
ALTER TABLE user_tbl DROP COLUMN email;
-- Rename the table
ALTER TABLE user_tbl RENAME TO backup_tbl;
-- Drop the table
DROP TABLE IF EXISTS backup_tbl;
-- List tables (psql meta-command)
\dt+
-- List databases (psql meta-command)
\l
2.2 Generate table data
The script /home/users/chenzhuo/create.py (on 192.168.6.175) generates
data1.csv (100 million rows), data2.csv (1 million rows), and data3.csv (1 million rows); the files are placed under /opt on 192.168.6.174.
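The actual create.py is not reproduced here. As a hypothetical stand-in, rows matching the mytable/youtable schema below (name, time, phone) can be generated like this, with the row count reduced to 1000 for illustration:

```shell
# Generate a small CSV with a header row and name,time,phone columns.
OUT=/tmp/data_sample.csv
echo 'name,time,phone' > "$OUT"
for i in $(seq 1 1000); do
  # Synthetic values: userNNNNN, a date string, and a 7-digit phone number.
  printf 'user%05d,2020-01-%02d,%07d\n' "$i" $(( i % 28 + 1 )) "$i"
done >> "$OUT"
wc -l "$OUT"
```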
2.3 Using PostgreSQL
1) Switch to the postgres user:
$ su - postgres
2) Connect to the postgres database:
$ psql
3) Create tables and load the data:
CREATE TABLE mytable(name VARCHAR(20), time VARCHAR(20), phone VARCHAR(10));
COPY mytable FROM '/opt/data2.csv' WITH CSV HEADER;
CREATE TABLE youtable(name VARCHAR(20), time VARCHAR(20), phone VARCHAR(10));
COPY youtable FROM '/opt/data2.csv' WITH CSV HEADER;
4) Queries:
1. Join
\timing on
select count(*) from mytable my inner join youtable you on my.name=you.name and my.time=you.time and my.phone=you.phone;
2. GROUP BY
3. Subquery
Possible error:
If Chinese text comes back garbled, set the client encoding:
\encoding GBK
3 Installing PG-Strom
3.1 Download the PG-Strom source
Download from: https://github.com/heterodb/pg-strom
Version note: the latest release is 2.0, but it is still unstable and has bugs; starting with it caused some puzzling runtime errors, so a 1.x release is recommended.
Upload the archive to the CentOS system, extract it, and enter the pg-strom directory.
3.2 Build the PG-Strom source
$ make
$ make install
Possible errors:
1. pg_config was not found: check that the PostgreSQL PATH environment variable is configured correctly.
2. CUDA related files were not found: check that CUDA is installed correctly.
3.3 Edit the configuration file
$ vi /usr/pgsql-9.5/data/postgresql.conf
shared_preload_libraries = '$libdir/pg_strom'
shared_buffers = 10GB
work_mem = 10GB
max_wal_size = 32GB
4 Using PG-Strom
1) Start PG-Strom:
$ su - postgres
$ pg_ctl -D /usr/pgsql-9.5/data start
If the startup log reports PG-Strom initializing, the installation succeeded.
Possible startup error: the loader cannot find the CUDA libraries. Register the CUDA library directory with the dynamic linker:
$ sudo sh -c "echo /usr/local/cuda/lib64 > /etc/ld.so.conf.d/cuda-x86_64.conf"
$ sudo ldconfig
$ bash
Other possible errors:
NVIDIA MPS is not running
Start mps-control:
export CUDA_VISIBLE_DEVICES=0
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
nvidia-cuda-mps-control -d
Stop mps-control:
echo quit | nvidia-cuda-mps-control
Afterwards, the log files under CUDA_MPS_LOG_DIRECTORY can be inspected.
2) Add the extension:
$ psql
postgres=# CREATE EXTENSION pg_strom;
Table operations are the same as in plain PostgreSQL.
5 Installing Citus
5.1 Get the Citus source
Download Citus 5.2.2 and upload it to any directory on the server.

$ tar -xzvf citus-5.2.2.tar.gz
5.2 Build and install
$ cd citus-5.2.2
$ ./configure
$ make
$ sudo make install
5.3 Edit the configuration
$ cd /usr/pgsql-9.5/data
$ vi postgresql.conf
# Add 'citus' to shared_preload_libraries as the first entry
shared_preload_libraries = 'citus'        # e.g. 'citus,$libdir/pg_strom' when PG-Strom is also loaded
5.4 Verify the extension installs
After restarting PostgreSQL:
$ su - postgres
$ psql
postgres=# CREATE EXTENSION citus;
5.5 Set up the Citus cluster
Run installation steps 5.1-5.4 on every server, pick one as the master, and use the rest as workers.
Run the following on the master (hadoop3):
$ echo "172.22.13.222 5432" | sudo -u postgres tee -a /usr/pgsql-9.5/data/pg_worker_list.conf
$ echo "172.22.13.223 5432" | sudo -u postgres tee -a /usr/pgsql-9.5/data/pg_worker_list.conf
$ pg_ctl reload -D /usr/pgsql-9.5/data
$ sudo -i -u postgres psql -c "SELECT * FROM master_get_active_worker_nodes();"
6 Using Citus
Run all of the following on the master:
$ su - postgres
$ psql
\timing on
create table t01(id int, id2 int, t text);
create table t02(id int, id2 int, t text);
SELECT master_create_distributed_table('t01', 'id2', 'hash');
SELECT master_create_distributed_table('t02', 'id2', 'hash');
Create 16 shards with 1 replica for each table:
SELECT master_create_worker_shards('t01', 16, 1);
SELECT master_create_worker_shards('t02', 16, 1);
Citus does not support INSERT ... SELECT from another table; the following COPY-based approach works instead:
copy (select id, id, lpad(id::text, 5, id::text) from generate_series(1,100000000) as t(id)) to '/tmp/t01.txt';
copy t01 from '/tmp/t01.txt';
copy t02 from '/tmp/t01.txt';
explain select a.t, b.t from t01 a, t02 b where a.id2=b.id2 and a.id2=5;
SET citus.task_executor_type TO 'task-tracker';
explain select a.t, b.t from t01 a, t02 b where a.id=b.id2 and a.id2=5;
SELECT * from pg_dist_shard;
SELECT * from pg_dist_shard_placement order by shardid;
To watch GPU utilization while queries run (in a separate terminal):
$ watch -n 1 nvidia-smi
7 Simple performance tests
7.1 PostgreSQL
set pg_strom.enabled=off;
1) Create test tables
create unlogged table test1 (id int, c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, info text, crt_time timestamp);

create unlogged table test2 (id int, c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, info text, crt_time timestamp);
2) Write test data
copy (select
    id,
    (random()*1000)::integer,
    (random()*1000)::integer,
    (random()*1000)::integer,
    (random()*1000)::integer,
    (random()*1000)::integer,
    (random()*1000)::integer,
    md5(random()::text),
    clock_timestamp()
from generate_series(1,10000000) t(id)) to '/tmp/test1.txt';

copy (select
    id,
    (random()*1000)::integer,
    (random()*1000)::integer,
    (random()*1000)::integer,
    (random()*1000)::integer,
    (random()*1000)::integer,
    (random()*1000)::integer,
    md5(random()::text),
    clock_timestamp()
from generate_series(1,100000000) t(id)) to '/tmp/test2.txt';

copy test1 from '/tmp/test1.txt';
copy test2 from '/tmp/test2.txt';
3) Aggregate
explain (analyze,verbose,timing,costs,buffers) select count(*) from test2;
13869 ms

4) Join
explain (analyze,verbose,timing,costs,buffers) select count(*) from test1 join test2 on test1.id = test2.id;
37693 ms

5) GROUP BY
explain (analyze,verbose,timing,costs,buffers) select test1.c1,count(*) from test1 join test2 using (id) group by 1;
41698 ms

6) Subqueries
explain (analyze,verbose,timing,costs,buffers) select * from test1 where c1 = (select c1 from test2 where c1>500 order by c1 desc limit 1);
18902 ms

explain (analyze,verbose,timing,costs,buffers) select * from (select id,c1 from test2 order by c1 desc) a limit 30;
23319 ms
7.2 PG-Strom single node
1) Aggregate
explain (analyze,verbose,timing,costs,buffers) select count(*) from test2;
9401 ms

2) Join
explain (analyze,verbose,timing,costs,buffers) select count(*) from test1 join test2 on test1.id = test2.id;
45232 ms

3) GROUP BY
explain (analyze,verbose,timing,costs,buffers) select test1.c1,count(*) from test1 join test2 using (id) group by 1;
37968 ms

4) Subqueries
explain (analyze,verbose,timing,costs,buffers) select * from test1 where c1 = (select c1 from test2 where c1>500 order by c1 desc limit 1);
20260 ms

explain (analyze,verbose,timing,costs,buffers) select * from (select id,c1 from test2 order by c1 desc) a limit 30;
21944 ms
7.3 PG-Strom distributed
1) Create test tables
create unlogged table test1 (id int, c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, info text, crt_time timestamp);

create unlogged table test2 (id int, c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, info text, crt_time timestamp);

2) Mark them as distributed tables
SELECT master_create_distributed_table('test1', 'id', 'hash');
SELECT master_create_distributed_table('test2', 'id', 'hash');

3) Create shards
Create 16 shards with 1 replica for each table:
SELECT master_create_worker_shards('test1', 16, 1);
SELECT master_create_worker_shards('test2', 16, 1);

4) Generate data (INSERT INTO ... SELECT is not supported)
copy (select
    id,
    (random()*1000)::integer,
    (random()*1000)::integer,
    (random()*1000)::integer,
    (random()*1000)::integer,
    (random()*1000)::integer,
    (random()*1000)::integer,
    md5(random()::text),
    clock_timestamp()
from generate_series(1,1000) t(id)) to '/tmp/test1.txt';

copy (select
    id,
    (random()*1000)::integer,
    (random()*1000)::integer,
    (random()*1000)::integer,
    (random()*1000)::integer,
    (random()*1000)::integer,
    (random()*1000)::integer,
    md5(random()::text),
    clock_timestamp()
from generate_series(1,1000) t(id)) to '/tmp/test2.txt';

copy test1 from '/tmp/test1.txt';
copy test2 from '/tmp/test2.txt';
5) Aggregate
explain (analyze,verbose,timing,costs,buffers) select count(*) from test2;
5486 ms

6) Join
explain (analyze,verbose,timing,costs,buffers) select count(*) from test1 join test2 on test1.id = test2.id;
8542 ms

7) GROUP BY
explain (analyze,verbose,timing,costs,buffers) select test1.c1,count(*) from test1 join test2 using (id) group by 1;
9781 ms

8) Subqueries
explain (analyze,verbose,timing,costs,buffers) select * from test1 where c1 = (select c1 from test2 where c1>500 order by c1 desc limit 1);
Not supported: scalar (=) subqueries.

explain (analyze,verbose,timing,costs,buffers) select * from (select id,c1 from test2 order by c1 desc) a limit 30;
Not supported: LIMIT and ORDER BY subqueries.
7.4 Comparison
| Query | PostgreSQL (ms) | PG-Strom single node (ms) | PG-Strom distributed (ms) |
| --- | --- | --- | --- |
| Aggregate | 13869 | 9401 | 5486 |
| Join | 37693 | 45232 | 8542 |
| GROUP BY | 41698 | 37968 | 9781 |
| Subquery (scalar =) | 18902 | 20260 | not supported |
| Subquery (ORDER BY + LIMIT) | 23319 | 21944 | not supported |
Performance analysis: plain PostgreSQL and single-node PG-Strom run at roughly the same speed, while distributed PG-Strom is clearly faster; the distributed setup currently does not support the subquery forms above.
Why PostgreSQL and PG-Strom are so close: examining the query plans shows that only the scan stage (Custom Scan (GpuPreAgg)) runs on the GPU, while the join still executes as a plain CPU Hash Join rather than a GPU Hash Join.
The cause is the data volume: at this scale the processing capacity of this GPU model becomes the bottleneck. Shrinking the join from (10M x 100M rows) to (10M x 1M rows) makes the planner choose the GPU Hash Join again. A more capable GPU model is recommended.
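The aggregate-query row of the table above translates into the following speedups; the numbers are the measured times in ms, and the ratios are just arithmetic on those values:

```shell
# Speedup of single-node and distributed PG-Strom over plain PostgreSQL
# for the aggregate query (count(*) over test2).
pg=13869; strom=9401; dist=5486
awk -v a="$pg" -v b="$strom" -v c="$dist" 'BEGIN {
  printf "single-node GPU speedup: %.2fx\n", a / b
  printf "distributed speedup:     %.2fx\n", a / c
}'
```

This prints roughly a 1.5x gain for single-node PG-Strom and a 2.5x gain for the distributed setup on this query.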
7.5 Errors
More complex statements (e.g. the 10M x 100M row join) fail with an error.