Using USB drives to mask CEPH's poor IO performance
Background
Gemfield recently built a CEPH cluster out of hardware the team had retired; the key details of that hardware are as follows:
Predictably, this shabby hardware delivered exactly the result you would expect: the cluster's performance is, shall we say, deeply moving. The numbers look like this:
[root@rook-ceph-tools-7gemfield-syszux /]# ceph osd pool create gemfield 100 100
pool 'gemfield' created

# write test
[root@rook-ceph-tools-7bb5797c8-ns4bw /]# rados bench -p gemfield 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_rook-ceph-tools-7bb5797c8-ns4_268170
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0      16        16         0         0         0            -           0
    1      16        19         3   11.9373        12       1.0033     0.88516
    2      16        35        19   37.8984        64     0.959745     1.16765
    3      16        50        34   45.2513        60      1.75533     1.14427
    4      16        66        50   49.9308        64       1.7261     1.08274
    5      16        79        63   50.3425        52      1.82505     1.04525
    6      16        92        76   50.6176        52     0.301995     1.04329
    7      16       103        87   49.6723        44     0.268341     1.09093
    8      16       111        95   47.4643        32      1.75813     1.15787
    9      16       117       101   44.8582        24     0.357698     1.14634
   10      16       127       111   44.3721        40     0.558962     1.17723
   11      16       128       112   40.7035         4     0.888134     1.17465
Total time run:         11.8353
Total writes made:      128
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     43.2606
Stddev Bandwidth:       20.616
Max bandwidth (MB/sec): 64
Min bandwidth (MB/sec): 4
Average IOPS:           10
Stddev IOPS:            5.15399
Max IOPS:               16
Min IOPS:               1
Average Latency(s):     1.46968
Stddev Latency(s):      1.15616
Max latency(s):         6.57493
Min latency(s):         0.268341

# sequential read test
[root@rook-ceph-tools-7bb5797c8-ns4bw /]# rados bench -p gemfield 10 seq
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      16        42        26   103.982       104     0.303402    0.280297
    2      16        77        61   121.986       140     0.324191    0.363301
    3      16       102        86   114.655       100     0.129616    0.335518
    4      16       109        93   92.9908        28     0.046699    0.401311
    5      16       124       108   86.3917        60     0.019722    0.394067
    6      16       128       112   74.6592        16    0.0841534     0.45322
    7      16       128       112   63.9929         0            -     0.45322
    8      16       128       112   55.9936         0            -     0.45322
    9      16       128       112    49.772         0            -     0.45322
   10      16       128       112   44.7944         0            -     0.45322
Total time run:       10.5818
Total reads made:     128
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   48.3848
Average IOPS:         12
Stddev IOPS:          13.1724
Max IOPS:             35
Min IOPS:             0
Average Latency(s):   1.31546
Max latency(s):       9.35307
Min latency(s):       0.0112003

# random read test
[root@rook-ceph-tools-7bb5797c8-ns4bw /]# rados bench -p gemfield 10 rand
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      16        44        28   111.984       112     0.345287    0.325227
    2      16        77        61   121.985       132     0.502314    0.354814
    3      16       105        89   118.654       112     0.144197    0.373102
    4      16       134       118   117.988       116     0.111743    0.445691
    5      16       168       152   121.588       136     0.235599    0.445785
    6      16       199       183   121.989       124     0.176444    0.450244
    7      16       229       213   121.703       120     0.111384    0.458033
    8      16       261       245   122.488       128     0.189708    0.474083
    9      16       292       276   122.655       124     0.206655    0.460007
   10      16       327       311   124.389       140     0.888494     0.46772
   11       7       327       320   116.353        36     0.628508    0.499252
Total time run:       11.8967
Total reads made:     327
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   109.946
Average IOPS:         27
Stddev IOPS:          7.04918
Max IOPS:             35
Min IOPS:             9
Average Latency(s):   0.54098
Max latency(s):       4.07673
Min latency(s):       0.0020629

With numbers like these, the model trainers on the team were unanimous: loading datasets was anywhere from 2x to 10x slower than loading the same data straight from a local SSD. Gemfield agonized over how to fix the poor performance of reading large numbers of small files from a CEPHFS-backed PVC inside a K8s Pod. First, this is retired hardware, so it makes no sense to buy new hardware just to prop up this pile; second, there is no money for it anyway. So what to do?
In the end, Gemfield noticed that the motherboards of these retired machines do support USB 3.0 and USB 3.1, which led to two quite different ideas:
1. Attach a USB device and point metadataDevice in the ROOK cluster.yaml at it (a minimal cluster.yaml sketch follows after this list). Gemfield shelved this idea for now, for two reasons: first, the metadataDevice setting cannot be changed on the fly; the node has to be purged and the OSD reinstalled, which is a lot of work for the current cluster. Second, USB is a comparatively unreliable medium, and putting the OSDs' metadata on it feels like building a skyscraper on sand.
2. Use the USB device to back a local persistent volume! The downside is that it cannot be shared across nodes, but given our usage pattern, Gemfield leans toward this option.
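For reference, the metadataDevice setting from idea 1 lives under spec.storage.config in Rook's CephCluster manifest (cluster.yaml). The sketch below is only illustrative: the device and node names are placeholders, and the exact field layout can differ between Rook releases.

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  storage:
    useAllNodes: false
    useAllDevices: false
    config:
      metadataDevice: "sdX"   # placeholder: the USB device that would hold the OSDs' BlueStore metadata
    nodes:
      - name: "node1"         # placeholder node name
        devices:
          - name: "sdb"       # placeholder data device for the OSD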
Ah, MLab2.0 is simply too old; here's hoping MLab3.0 arrives soon! Until then, the three rounds of experiments below put the second idea to the test.
Round 1: an ordinary Samsung USB 3.0 flash drive
1. Flash drive performance
Gemfield happened to have an idle Samsung flash drive on hand, so the first step was to benchmark its sequential-read and random-read IO performance:
##################### read with 1 thread
(base) gemfield@ThinkPad-X1C:~$ sudo fio -filename=/dev/sda1 -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=20G -numjobs=1 -runtime=60 -group_reporting -name=gemfieldtest
gemfieldtest: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.16
Starting 1 thread
Jobs: 1 (f=1): [R(1)][100.0%][r=97.9MiB/s][r=6267 IOPS][eta 00m:00s]
gemfieldtest: (groupid=0, jobs=1): err= 0: pid=198876: Thu May 21 15:33:54 2020
  read: IOPS=6550, BW=102MiB/s (107MB/s)(6141MiB/60001msec)
  ......
  bw (  KiB/s): min=88384, max=113312, per=100.00%, avg=104850.99, stdev=7011.68, samples=119
  iops        : min= 5524, max= 7082, avg=6553.18, stdev=438.23, samples=119
  ......
Run status group 0 (all jobs):
   READ: bw=102MiB/s (107MB/s), 102MiB/s-102MiB/s (107MB/s-107MB/s), io=6141MiB (6440MB), run=60001-60001msec
Disk stats (read/write):
  sda: ios=392373/1, merge=0/0, ticks=56234/0, in_queue=84, util=99.75%

###################### read with 3 threads
(base) gemfield@ThinkPad-X1C:~$ sudo fio -filename=/dev/sda1 -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=20G -numjobs=3 -runtime=60 -group_reporting -name=gemfieldtest
gemfieldtest: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
...
fio-3.16
Starting 3 threads
Jobs: 3 (f=3): [R(3)][100.0%][r=15.0MiB/s][r=1022 IOPS][eta 00m:00s]
gemfieldtest: (groupid=0, jobs=3): err= 0: pid=199170: Thu May 21 15:36:36 2020
  read: IOPS=1121, BW=17.5MiB/s (18.4MB/s)(1052MiB/60003msec)
  ......
  bw (  KiB/s): min=14336, max=22624, per=100.00%, avg=17946.16, stdev=555.87, samples=360
  iops        : min=  896, max= 1414, avg=1121.62, stdev=34.74, samples=360
  ......
Run status group 0 (all jobs):
   READ: bw=17.5MiB/s (18.4MB/s), 17.5MiB/s-17.5MiB/s (18.4MB/s-18.4MB/s), io=1052MiB (1103MB), run=60003-60003msec
Disk stats (read/write):
  sda: ios=65448/3, merge=1745/0, ticks=116019/2, in_queue=480, util=99.80%

################## random read, using -rw=randread
(base) gemfield@ThinkPad-X1C:~$ sudo fio -filename=/dev/sda1 -direct=1 -iodepth 1 -thread -rw=randread -ioengine=psync -bs=16k -size=20G -numjobs=3 -runtime=60 -group_reporting -name=gemfieldtest

2. Preparing the drive space
Before using the drive, first create an ext4 filesystem on it:
gemfield@ThinkPad-X1C:~$ sudo mkfs.ext4 /dev/sda

Then mount it at /gemfield/u64 on the worker node:
mount /dev/sda /gemfield/u64

This spare drive is a 64 GB Samsung USB 3.0 stick, so there is plenty of room for the dataset. The inode count still needs checking, though; otherwise, even with free space left, writes will fail with "No space left on device":
(base) gemfield@ThinkPad-X1C:~$ df -i
Filesystem        Inodes  IUsed    IFree  IUse%  Mounted on
......
/dev/sda         3842048     11  3842037     1%  /media/gemfield/e34efb11-76f2-48e3-9f0e-4617bf276159

3842048 inodes. Hmm, probably not enough: you know how it is, dataset files are tiny and there are a lot of them. So the filesystem was recreated, using -N to double the inode count:
gemfield@ai01:~$ sudo mkfs.ext4 -N 7847936 /dev/sda

This comes at a price, of course: the extra inodes (roughly 4 million of them) eat up about 8 GB of the drive's capacity. With that done, the space can be exposed as either a HostPath or a local persistent volume; Gemfield went with HostPath.
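As an aside, one way to pick a sensible value for -N is to count how many files the dataset actually contains before formatting, since each file consumes one inode. A one-line sketch, where /path/to/dataset stands in for wherever the dataset currently lives:

# count regular files, then leave headroom for directories and future growth
find /path/to/dataset -type f | wc -l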
3. Creating HostPath Persistent Volumes
The YAML for the PV:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gemfieldu64-hostpath-pv
  namespace: mlab2
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 54Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/gemfield/u64"

After applying it, the result looks like this:
gemfield@ThinkPad-X1C:~$ kubectl -n mlab2 get pv
NAME                      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                            STORAGECLASS   REASON   AGE
gemfieldu64-hostpath-pv   54Gi       RWX            Retain           Bound    mlab2/gemfieldu64-hostpath-pvc   manual                  6s

The YAML for the PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gemfieldu64-hostpath-pvc
  namespace: mlab2
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 54Gi

After applying it:
gemfield@ThinkPad-X1C:~$ kubectl -n mlab2 get pvc
NAME                       STATUS   VOLUME                    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
gemfieldu64-hostpath-pvc   Bound    gemfieldu64-hostpath-pv   54Gi       RWX            manual         2m

After that, create a Pod that mounts the PVC as a volume. Nothing special there, but a minimal sketch is shown below for completeness.
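A minimal Pod sketch that consumes the PVC above; the image and the /dataset mount path are placeholders rather than anything from the actual MLab2.0 setup:

apiVersion: v1
kind: Pod
metadata:
  name: gemfield-train            # placeholder Pod name
  namespace: mlab2
spec:
  containers:
    - name: trainer
      image: ubuntu:20.04         # placeholder; in practice this would be the training image
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: dataset
          mountPath: /dataset     # placeholder path the training code would read from
  volumes:
    - name: dataset
      persistentVolumeClaim:
        claimName: gemfieldu64-hostpath-pvc

Because the PV is HostPath-backed, the Pod also has to be scheduled onto the node where /gemfield/u64 is mounted (for example via nodeName or a nodeSelector); that is exactly the "no distributed sharing" limitation mentioned earlier.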
4. Summary
With the USB 3.0 flash drive in place, the trainers reported that data loading barely got any faster. Hmph, it must be USB 3.0's fault. I'll pick another day to try a USB 3.1 drive.
Round 2: USB 3.1 (Gen 2) to NVMe
Gemfield picked a UGREEN CM238 NVMe SSD enclosure, paired with a Western Digital SN550 (500 GB).
With the drive assembled, it was connected to the machine's USB 3.1 Type-C port using the enclosure's bundled Type-C cable. Check it with lsusb:
gemfield@ai01:~$ lsusb
......
Bus 006 Device 007: ID 174c:2362 ASMedia Technology Inc. Ugreen Storage Device
......

So the USB drive shows up on Bus 6 as Device 7. It is a UGREEN (綠聯) enclosure built on an ASMedia (祥碩科技) controller. Next, check lsusb -t:
gemfield@ai01:~$ lsusb -t
/:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
    |__ Port 1: Dev 7, If 0, Class=Mass Storage, Driver=uas, 10000M
    |__ Port 2: Dev 6, If 0, Class=Mass Storage, Driver=usb-storage, 5000M

Dev 7 on Bus 06 (the UGREEN enclosure) is driven by uas, meaning the kernel will use the UASP (USB Attached SCSI Protocol) accelerated transfer protocol; Dev 6 (the Samsung flash drive from round 1) uses usb-storage, so no UASP for it. Also, Dev 7 and Dev 6 report 10000M and 5000M respectively, which means the two devices negotiate USB 3.1 Gen 2 and plain old USB 3.0.
Next, check the usable capacity of this DIY portable SSD; 465.78 GiB are available:
gemfield@ai01:~$ sudo fdisk -l /dev/sdd
[sudo] password for gemfield:
Disk /dev/sdd: 465.78 GiB, 500107862016 bytes, 976773168 sectors
Disk model: 00G2B0C-00PXH0
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 33553920 bytes

Then create an ext4 filesystem on the SSD and mount it:
gemfield@ai01:~$ sudo mkfs.ext4 /dev/sdd
gemfield@ai01:~$ sudo mount /dev/sdd /gemfield/hostpv/
gemfield@ai01:~$ df -h | grep gemfield
/dev/sdd        458G   73M  435G   1% /gemfield/hostpv

1. USB SSD performance
First, confirm its raw IO performance with fio:
# read with 1 thread
gemfield@ai01:~$ sudo fio -filename=/dev/sdd -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=20G -numjobs=1 -runtime=60 -group_reporting -name=gemfieldtest
gemfieldtest: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.16
Starting 1 thread
Jobs: 1 (f=1): [R(1)][100.0%][r=177MiB/s][r=11.3k IOPS][eta 00m:00s]
gemfieldtest: (groupid=0, jobs=1): err= 0: pid=2207076: Thu Jun 11 11:12:09 2020
  read: IOPS=11.5k, BW=180MiB/s (188MB/s)(10.5GiB/60001msec)
    clat (usec): min=60, max=12994, avg=85.68, stdev=52.21
     lat (usec): min=60, max=12994, avg=85.81, stdev=52.23
  ......
  bw (  KiB/s): min=148576, max=190432, per=99.97%, avg=183818.45, stdev=6638.34, samples=119
  iops        : min= 9286, max=11902, avg=11488.65, stdev=414.89, samples=119
  ......
Run status group 0 (all jobs):
   READ: bw=180MiB/s (188MB/s), 180MiB/s-180MiB/s (188MB/s-188MB/s), io=10.5GiB (11.3GB), run=60001-60001msec
Disk stats (read/write):
  sdd: ios=688288/2, merge=0/0, ticks=53392/0, in_queue=68, util=99.74%

# read with 3 threads
gemfield@ai01:~$ sudo fio -filename=/dev/sdd -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=20G -numjobs=3 -runtime=60 -group_reporting -name=gemfieldtest
gemfieldtest: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
...
fio-3.16
Starting 3 threads
Jobs: 3 (f=3): [R(3)][100.0%][r=388MiB/s][r=24.9k IOPS][eta 00m:00s]
gemfieldtest: (groupid=0, jobs=3): err= 0: pid=2217813: Thu Jun 11 11:17:25 2020
  read: IOPS=24.6k, BW=385MiB/s (403MB/s)(22.5GiB/60001msec)
    clat (usec): min=67, max=14744, avg=120.41, stdev=69.41
     lat (usec): min=67, max=14744, avg=120.58, stdev=69.42
  ......
  bw (  KiB/s): min=219200, max=413216, per=99.99%, avg=393774.71, stdev=10054.19, samples=357
  iops        : min=13700, max=25826, avg=24610.92, stdev=628.39, samples=357
  lat (usec)  : 100=15.98%, 250=83.34%, 500=0.61%, 750=0.04%, 1000=0.01%
  lat (msec)  : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
  ......
Run status group 0 (all jobs):
   READ: bw=385MiB/s (403MB/s), 385MiB/s-385MiB/s (403MB/s-403MB/s), io=22.5GiB (24.2GB), run=60001-60001msec
Disk stats (read/write):
  sdd: ios=1472203/3, merge=1674/0, ticks=163786/0, in_queue=172, util=99.82%

For single-threaded reads, the improvement over the Samsung flash drive tracks what USB 3.1 offers over USB 3.0: close to 2x in both IOPS and throughput. For multi-threaded reads it is a total rout; there is simply no comparison.
Let's also run the test ordinary users care about most, copying a file (swap.img is an 8 GB file):
gemfield@ai01:/gemfield/hostpv$ sudo time cp /swap.img .
0.05user 9.71system 0:09.80elapsed
gemfield@ai01:~$ sudo time cp /gemfield/hostpv/swap.img .
0.06user 8.59system 0:08.82elapsed

Under 10 seconds to write an 8 GB file, and under 9 seconds to read one back. That's about it.
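One caveat worth keeping in mind (this variant was not part of the test above): a plain time cp can return before the page cache is flushed, so the write figure may flatter the device a little. A sketch that also counts the final flush:

# copy and then force dirty pages to disk, timing both steps together
sudo bash -c 'time (cp /swap.img /gemfield/hostpv/ && sync)'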
2. Preparing the drive space
Same as before; not repeated here.
3. Creating HostPath Persistent Volumes
Same as before; not repeated here.
4. Summary
This time training speed improved noticeably.
Round 3: using a file as a virtual block device
As mentioned earlier, MLab2.0's meager hardware limits what can be done. For example, each machine has only one NVMe slot and no spare PCIe slots, so there is no room to expand. However, that single NVMe SSD is 512 GB, and 200-odd GB is more than enough for the Linux OS and base software, so Gemfield can carve another 200 GB out of that disk to back a virtual block device.
First, use dd to create a 200 GB file:
gemfield@ai02:~# dd if=/dev/zero of=/root/hostpv.img bs=1G count=200
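An alternative worth noting: dd physically writes 200 GB of zeros, which takes a while. On ext4 the same space can be reserved almost instantly with fallocate (a sketch, not what was used above):

# allocate a 200 GB backing file without writing zeros
fallocate -l 200G /root/hostpv.img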
Now create the loop block device; the -f flag asks losetup to pick the first available loop device (and -P makes the kernel scan the file for partitions):

gemfield@ai02:~# losetup -fP /root/hostpv.img

Listing the loop devices shows that losetup picked loop1 for us:
gemfield@ai02:~# losetup -a
/dev/loop1: [2082]:17170440 (/root/hostpv.img)
/dev/loop6: [2082]:20582286 (/var/lib/snapd/snaps/lxd_15457.snap)
/dev/loop4: [2082]:20588815 (/var/lib/snapd/snaps/lxd_15359.snap)
/dev/loop2: [2082]:20584910 (/var/lib/snapd/snaps/core18_1754.snap)
/dev/loop0: [2082]:20581035 (/var/lib/snapd/snaps/core18_1705.snap)
/dev/loop5: [2082]:20578743 (/var/lib/snapd/snaps/snapd_7777.snap)
/dev/loop3: [2082]:20581037 (/var/lib/snapd/snaps/snapd_7264.snap)

Create the ext4 filesystem:
gemfield@ai02:~# mkfs.ext4 /root/hostpv.img

Mount the loop1 device:
gemfield@ai02:~# mkdir -p /gemfield/hostpv
gemfield@ai02:~# mount -o loop /dev/loop1 /gemfield/hostpv/

Finally, make the setup survive reboots by adding the following line to /etc/rc.local:

losetup /dev/loop1 /root/hostpv.img
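Note that this line only re-creates the loop device at boot; the mount itself also has to be re-established. A fuller /etc/rc.local sketch, assuming rc.local is enabled and executable on the node:

#!/bin/bash
# re-attach the backing file to /dev/loop1 and remount it after a reboot
losetup /dev/loop1 /root/hostpv.img
mount /dev/loop1 /gemfield/hostpv
exit 0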
1. Loop device performance
Bring out fio! Gemfield tested a loop device backed by a SATA SSD and one backed by an NVMe SSD:
# loop device backed by a SATA SSD
gemfield@ai02:~# fio -filename=/dev/loop1 -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=20G -numjobs=1 -runtime=60 -group_reporting -name=gemfieldtest
gemfieldtest: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.16
Starting 1 thread
Jobs: 1 (f=1): [R(1)][100.0%][r=532MiB/s][r=34.0k IOPS][eta 00m:00s]
gemfieldtest: (groupid=0, jobs=1): err= 0: pid=3577222: Thu Jun 11 16:11:42 2020
  read: IOPS=36.5k, BW=570MiB/s (597MB/s)(20.0GiB/35947msec)
    clat (usec): min=15, max=7162, avg=26.80, stdev=22.59
     lat (usec): min=15, max=7162, avg=26.86, stdev=22.59
  ......
  bw (  KiB/s): min=520320, max=676352, per=100.00%, avg=583979.85, stdev=46053.87, samples=71
  iops        : min=32520, max=42272, avg=36498.73, stdev=2878.36, samples=71
  ......
Run status group 0 (all jobs):
   READ: bw=570MiB/s (597MB/s), 570MiB/s-570MiB/s (597MB/s-597MB/s), io=20.0GiB (21.5GB), run=35947-35947msec
Disk stats (read/write):
  loop1: ios=1309220/0, merge=0/0, ticks=26849/0, in_queue=16, util=99.76%

# loop device backed by an NVMe SSD
root@ai03:~# fio -filename=/dev/loop5 -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=20G -numjobs=1 -runtime=60 -group_reporting -name=gemfieldtest
gemfieldtest: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.16
Starting 1 thread
Jobs: 1 (f=1): [R(1)][100.0%][r=1100MiB/s][r=70.4k IOPS][eta 00m:00s]
gemfieldtest: (groupid=0, jobs=1): err= 0: pid=2969320: Thu Jun 11 16:20:08 2020
  read: IOPS=66.6k, BW=1041MiB/s (1092MB/s)(20.0GiB/19668msec)
    clat (usec): min=7, max=12144, avg=14.75, stdev=31.35
     lat (usec): min=7, max=12144, avg=14.79, stdev=31.36
  ......
  bw (  MiB/s): min=  837, max= 1218, per=100.00%, avg=1042.72, stdev=89.33, samples=39
  iops        : min=53592, max=77982, avg=66734.46, stdev=5717.25, samples=39
  ......
Run status group 0 (all jobs):
   READ: bw=1041MiB/s (1092MB/s), 1041MiB/s-1041MiB/s (1092MB/s-1092MB/s), io=20.0GiB (21.5GB), run=19668-19668msec
Disk stats (read/write):
  loop5: ios=1308941/0, merge=0/0, ticks=15695/0, in_queue=88, util=99.27%

The SATA-backed SSD is a clear step up from the USB 3.1 Gen 2 SSD, and the NVMe SSD is the undisputed king!!!
2. Preparing the drive space
Same as before; not repeated here.
3. Creating HostPath Persistent Volumes
Same as before; not repeated here.
4. Summary
Once again, training speed improved noticeably.
Summary
NVMe SSD > SATA SSD > USB 3.1 Gen 2 SSD > USB 3.0 flash drive.