A Dialectical Look at iostat

Published: 2023/11/27


An old blog post, moved to CSDN.
Original: http://rebootcat.com/2018/01/16/using-iostat-dialectically/

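Preface

System analysis work brings you into contact with many useful tools. One of them is iostat, a mainstay for analyzing disk performance and system I/O.

This post walks through how to use the iostat command and then examines several of its metrics that are easy to misread.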

iostat

iostat - Report Central Processing Unit (CPU) statistics and input/output statistics for devices and partitions.

That is how the man page introduces iostat: short and clear. It is the tool we routinely use to look at CPU load and disk I/O.

Basic usage of iostat

The command I usually run (personal habit):

iostat -xk 2 10

The man page explains the options:

OPTIONS
       -c     Display the CPU utilization report.

       -d     Display the device utilization report.

       -g group_name { device [...] | ALL }
              Display statistics for a group of devices. The iostat command reports statistics for each individual device in the list then a line of global statistics for the group displayed as group_name and made up of all the devices in the list. The ALL keyword means that all the block devices defined by the system shall be included in the group.

       -h     Make the Device Utilization Report easier to read by a human.

       -j { ID | LABEL | PATH | UUID | ... } [ device [...] | ALL ]
              Display persistent device names. Options ID, LABEL, etc. specify the type of the persistent name. These options are not limited, only prerequisite is that directory with required persistent names is present in /dev/disk. Optionally, multiple devices can be specified in the chosen persistent name type. Because persistent device names are usually long, option -h is enabled implicitly with this option.

       -k     Display statistics in kilobytes per second.

       -m     Display statistics in megabytes per second.

       -N     Display the registered device mapper names for any device mapper devices. Useful for viewing LVM2 statistics.

       -p [ { device [,...] | ALL } ]
              The -p option displays statistics for block devices and all their partitions that are used by the system. If a device name is entered on the command line, then statistics for it and all its partitions are displayed. Last, the ALL keyword indicates that statistics have to be displayed for all the block devices and partitions defined by the system, including those that have never been used. If option -j is defined before this option, devices entered on the command line can be specified with the chosen persistent name type.

       -T     This option must be used with option -g and indicates that only global statistics for the group are to be displayed, and not statistics for individual devices in the group.

       -t     Print the time for each report displayed. The timestamp format may depend on the value of the S_TIME_FORMAT environment variable (see below).

       -V     Print version number then exit.

       -x     Display extended statistics.

       -y     Omit first report with statistics since system boot, if displaying multiple records at given interval.

       -z     Tell iostat to omit output for any devices for which there was no activity during the sample period.

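In short, -x prints the extended statistics, -k reports throughput in kilobytes per second, 2 is the sampling interval in seconds, and 10 is the number of reports to print.

The command produces output like this: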

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.40    0.00    0.49    0.42    0.00   98.69

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00   253.00    0.02   10.26     0.66  2081.56   405.05     0.65   62.78    6.01   62.92   4.55   4.68
sdb               0.00     0.00    0.00    0.00     0.00     0.00     8.19     0.00    0.23    0.23    0.00   0.23   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     8.19     0.00    0.32    0.32    0.00   0.32   0.00
sdd               0.00     0.00    0.00    0.00     0.00     0.00     8.19     0.00    0.34    0.34    0.00   0.34   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     8.19     0.00    0.34    0.34    0.00   0.34   0.00

The fields are explained as follows (again from the man page):

Device Utilization Report

       rrqm/s
              The number of read requests merged per second that were queued to the device.

       wrqm/s
              The number of write requests merged per second that were queued to the device.

       r/s
              The number (after merges) of read requests completed per second for the device.

       w/s
              The number (after merges) of write requests completed per second for the device.

       rsec/s (rkB/s, rMB/s)
              The number of sectors (kilobytes, megabytes) read from the device per second.

       wsec/s (wkB/s, wMB/s)
              The number of sectors (kilobytes, megabytes) written to the device per second.

       avgrq-sz
              The average size (in sectors) of the requests that were issued to the device.

       avgqu-sz
              The average queue length of the requests that were issued to the device.

       await
              The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.

       r_await
              The average time (in milliseconds) for read requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.

       w_await
              The average time (in milliseconds) for write requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.

       svctm
              The average service time (in milliseconds) for I/O requests that were issued to the device. Warning! Do not trust this field any more. This field will be removed in a future sysstat version.

       %util
              Percentage of elapsed time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.

The English above is fairly self-explanatory. The metrics that deserve the most attention are:

avgrq-sz: the average request size, i.e. the average number of sectors per I/O, in units of 512-byte sectors
avgqu-sz: the average length of the request queue
await: the average time each I/O takes, including both the time spent waiting in the queue and the time the disk actually spends servicing it
svctm: the service time per I/O. Note the warning quoted above, "Do not trust this field any more": the per-I/O service time reported by iostat is unreliable
util: how busy the disk is, as a percentage
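As a sanity check on what avgrq-sz means, the value can be rebuilt from the other columns of the same row. The sketch below is only an illustration: the function name is made up for this post, and it assumes that with -k one kilobyte equals two 512-byte sectors.

```python
SECTOR_BYTES = 512  # sector size assumed for the avgrq-sz unit


def avg_request_sectors(r_per_s, w_per_s, rkb_per_s, wkb_per_s):
    """Average request size in 512-byte sectors, from iostat's printed rates."""
    ios_per_s = r_per_s + w_per_s
    if ios_per_s == 0:
        return 0.0
    sectors_per_s = (rkb_per_s + wkb_per_s) * 1024 / SECTOR_BYTES
    return sectors_per_s / ios_per_s


# Values taken from the sda row of the sample output in this post.
size = avg_request_sectors(0.02, 10.26, 0.66, 2081.56)
print(f"{size:.2f} sectors, roughly {size * SECTOR_BYTES / 1024:.0f} KiB per request")
```

The result lands close to the 405.05 that iostat printed. The small difference comes from working with rates that were already rounded to two decimals.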

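Analysis tips:
When system performance degrades, these five metrics are usually where to look first. For example:

Is the I/O request queue too long?
Are the I/O sizes too large or too small?
Are I/O waits too long as a result?
Is the service time per I/O too high?
Is the disk under too much pressure?

Taken together, these metrics support some performance conclusions, but there are pitfalls to watch for.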

Watch out for the pitfalls

Recall the iostat output shown earlier:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00   253.00    0.02   10.26     0.66  2081.56   405.05     0.65   62.78    6.01   62.92   4.55   4.68

svctm is 4.55 ms, meaning each I/O supposedly takes 4.55 ms to service, which is already a little slow. Yet await is as high as 62.78 ms. Why?

The total I/O rate is reads plus writes = 0.02 + 10.26 = 10.28 per second, which we round up to 11 I/Os. Suppose these 11 I/Os are issued at the same moment and the disk services them strictly one after another. The average waiting time is then:

average wait = service time per I/O * ( 1 + 2 + 3 + ... + (number of requests - 1) ) / number of requests = 4.55 * ( 1 + 2 + 3 + ... + 10 ) / 11 = 22.75 ms

The reasoning:

Think of the disk as a supermarket checkout. Eleven customers are queuing to pay and a single cashier serves them, taking 4.55 ms per customer. The first customer does not wait at all, the second waits for the first to be served, the third waits for the first two, and so on. The total waiting time is therefore service time per I/O * ( 1 + 2 + 3 + ... + (number of requests - 1) ).

The average wait works out to 22.75 ms. Adding the 4.55 ms service time of the I/O itself gives 27.3 ms:

22.75 + 4.55 = 27.3 ms
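The checkout model is easy to reproduce in a few lines. This is a minimal sketch of the arithmetic only, under the same assumptions as the text: all requests arrive together and are served one at a time. The function name is invented for illustration.

```python
def sequential_await_ms(service_ms, n_requests):
    """Mean wait and mean time in system for n requests that arrive
    together and are served strictly one after another."""
    # Request k (0-based) waits for the k requests ahead of it.
    total_wait = service_ms * sum(range(n_requests))  # 0 + 1 + ... + (n - 1)
    mean_wait = total_wait / n_requests
    return mean_wait, mean_wait + service_ms


mean_wait, modelled_await = sequential_await_ms(4.55, 11)
print(f"mean wait {mean_wait:.2f} ms, modelled await {modelled_await:.2f} ms")
```

With svctm = 4.55 ms and 11 requests, this gives the 22.75 ms and 27.3 ms used in the text.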

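27.3 ms is a model of what iostat's await should be, because await includes both the waiting time and the actual service time. But iostat reports an await of 62.78 ms. Why is the modelled value so much smaller than the one iostat reports?

27.3 ms <  62.78 ms

Checking the calculation again, the steps and the reasoning hold up. The one input that may be inaccurate is the per-I/O service time, svctm. The other weak point is the assumption that the disk services I/Os strictly in order.

So is svctm inaccurate? Or does the disk not service I/O requests sequentially?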

Discard svctm

What we have wanted all along is a metric that measures the disk's performance, namely the service time of a single I/O. But iostat cannot tell us the service time: none of its fields provides that information. People tend to expect too much of iostat!

Warning! Do not trust this field any more. This field will be removed in a future sysstat version.

The man page gives this rather vague warning without stating a reason. So what is the reason? And how is svctm obtained in the first place?

The iostat command comes from the sysstat package. In the source, rd_stats.c contains the svctm calculation, and it turns out that svctm is derived from other metrics:

/*
 ***************************************************************************
 * Compute "extended" device statistics (service time, etc.).
 *
 * IN:
 * @sdc     Structure with current device statistics.
 * @sdp     Structure with previous device statistics.
 * @itv     Interval of time in 1/100th of a second.
 *
 * OUT:
 * @xds     Structure with extended statistics.
 ***************************************************************************
 */
void compute_ext_disk_stats(struct stats_disk *sdc, struct stats_disk *sdp,
                            unsigned long long itv, struct ext_disk_stats *xds)
{
    double tput = ((double) (sdc->nr_ios - sdp->nr_ios)) * 100 / itv;

    xds->util  = S_VALUE(sdp->tot_ticks, sdc->tot_ticks, itv);
    xds->svctm = tput ? xds->util / tput : 0.0;
    /*
     * Kernel gives ticks already in milliseconds for all platforms
     * => no need for further scaling.
     */
    xds->await = (sdc->nr_ios - sdp->nr_ios) ?
        ((sdc->rd_ticks - sdp->rd_ticks) + (sdc->wr_ticks - sdp->wr_ticks)) /
        ((double) (sdc->nr_ios - sdp->nr_ios)) : 0.0;
    xds->arqsz = (sdc->nr_ios - sdp->nr_ios) ?
        ((sdc->rd_sect - sdp->rd_sect) + (sdc->wr_sect - sdp->wr_sect)) /
        ((double) (sdc->nr_ios - sdp->nr_ios)) : 0.0;
}

The line to focus on is:

xds->svctm = tput ? xds->util / tput : 0.0;

Anyone who has learned C will recognize the ternary operator:

A ? B : C
If A is true, the expression evaluates to B; otherwise it evaluates to C.

tput can be read as IOPS (explained further below): when IOPS is non-zero, svctm equals util / tput; otherwise it is 0.

So the value svctm depends on is util. Is the man page telling us to abandon svctm because util itself is computed inaccurately?

util: disk utilization

As argued above, svctm should be discarded: it is computed in a way that cannot serve as a measure of disk performance. From the formula, the only input whose meaning is in doubt is util.

util is the metric meant to express disk utilization. How is it computed? Look at compute_ext_disk_stats again:

void compute_ext_disk_stats(struct stats_disk *sdc, struct stats_disk *sdp,
                            unsigned long long itv, struct ext_disk_stats *xds)
{
    double tput = ((double) (sdc->nr_ios - sdp->nr_ios)) * 100 / itv;

    xds->util  = S_VALUE(sdp->tot_ticks, sdc->tot_ticks, itv);
    ...
}

Reading further in the source, S_VALUE is defined as:

#define S_VALUE(m,n,p)      (((double) ((n) - (m))) / (p) * 100)

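And the comment above the function says:

 * @sdc        Structure with current device statistics.
 * @sdp        Structure with previous device statistics.
 * @itv        Interval of time in 1/100th of a second.

So util works out to:

util = ( current_tot_ticks - previous_tot_ticks ) / sampling period * 100

Here tot_ticks and the sampling period must be expressed in the same unit of time for util to come out as a percentage.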
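A minimal sketch of this formula, with the interval already converted to milliseconds. Because itv is in hundredths of a second, the S_VALUE macro itself yields busy milliseconds per second of wall time; the sketch skips that intermediate unit. The function name and the second sample are assumptions made up for illustration. The first sample reuses the sda value of field 13 from the /proc/diskstats listing in this post.

```python
def util_percent(prev_tot_ticks_ms, cur_tot_ticks_ms, interval_ms):
    """Share of the sampling interval during which the device had I/O in flight."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return (cur_tot_ticks_ms - prev_tot_ticks_ms) / interval_ms * 100.0


prev_ticks = 57897509        # tot_ticks at the first sample (ms)
cur_ticks = prev_ticks + 94  # hypothetical: 94 ms of busy time, 2 s later
print(f"{util_percent(prev_ticks, cur_ticks, 2000):.2f} %")
```

Nothing in this calculation looks at how many requests were in flight during the busy time, only at whether there were any.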

So what is tot_ticks? For that we need the stats_disk structure, defined in rd_stats.h:

/* rd_stats.h */
/* Structure for block devices statistics */
struct stats_disk {
    unsigned long long nr_ios;
    unsigned long      rd_sect  __attribute__ ((aligned (8)));
    unsigned long      wr_sect  __attribute__ ((aligned (8)));
    unsigned int       rd_ticks __attribute__ ((aligned (8)));
    unsigned int       wr_ticks;
    unsigned int       tot_ticks;
    unsigned int       rq_ticks;
    unsigned int       major;
    unsigned int       minor;
};

The structure alone does not say what each field means, and the source has no comments on them. The next step is to see how rd_stats.c fills the structure in:

/*
 ***************************************************************************
 * Read block devices statistics from /proc/diskstats.
 *
 */
__nr_t read_diskstats_disk(struct stats_disk *st_disk, __nr_t nr_alloc,
                           int read_part)
{
    ...
    if ((fp = fopen(DISKSTATS, "r")) == NULL)
        return 0;

    while (fgets(line, sizeof(line), fp) != NULL) {

        if (sscanf(line, "%u %u %s %lu %*u %lu %u %lu %*u %lu"
                   " %u %*u %u %u",
                   &major, &minor, dev_name,
                   &rd_ios, &rd_sec, &rd_ticks, &wr_ios, &wr_sec, &wr_ticks,
                   &tot_ticks, &rq_ticks) == 11) { ... }
    ...
}

That is the core of it. iostat relies on /proc/diskstats: it reads the values from that file and processes them further. A brief detour to look at /proc/diskstats itself:

[root@localhost ~]# cat /proc/diskstats
   1       0 ram0 0 0 0 0 0 0 0 0 0 0 0
   1       1 ram1 0 0 0 0 0 0 0 0 0 0 0
   1       2 ram2 0 0 0 0 0 0 0 0 0 0 0
   1       3 ram3 0 0 0 0 0 0 0 0 0 0 0
   1       4 ram4 0 0 0 0 0 0 0 0 0 0 0
   1       5 ram5 0 0 0 0 0 0 0 0 0 0 0
   1       6 ram6 0 0 0 0 0 0 0 0 0 0 0
   1       7 ram7 0 0 0 0 0 0 0 0 0 0 0
   1       8 ram8 0 0 0 0 0 0 0 0 0 0 0
   8       0 sda 82044583 3148 10966722840 222442157 24658460 2499170 2700969385 105371088 0 57897509 328196252
   8       1 sda1 4144 0 339790 2859 93359 82770 4180584 671453 0 534023 674311
   8       2 sda2 487 0 4114 28 0 0 0 0 0 28 28
   8       3 sda3 8450 0 206387 3489 598140 1719768 413807296 6739177 0 1204240 6742537
   8       4 sda4 82031488 3148 10966172437 222435779 23966958 696632 2282981505 97960444 0 57538914 321035535
   8      16 sdb 6696805 672 1028622736 99268437 3479149 1095853 385460280 4357778 0 80933531 103624000
   8      32 sdc 6535697 706 1003357408 101660311 3409287 1048913 370227528 4329287 0 82570947 105987603
   8      48 sdd 6555170 652 1005848496 98046714 3392381 1044610 369149464 4407316 0 80348361 102451899
   8      64 sde 6532011 671 1002703024 134576408 3406505 1054721 372497720 5792380 0 103162428 140366630

Each field means the following:

The /proc/diskstats file displays the I/O statistics
of block devices. Each line contains the following 14
fields:

 1 - major number
 2 - minor number
 3 - device name
 4 - reads completed successfully
 5 - reads merged
 6 - sectors read
 7 - time spent reading (ms)
 8 - writes completed
 9 - writes merged
10 - sectors written
11 - time spent writing (ms)
12 - I/Os currently in progress
13 - time spent doing I/Os (ms)
14 - weighted time spent doing I/Os (ms)

The English descriptions may not be entirely clear, especially for fields 7, 11 and 13, so here is a fuller explanation:

Field  Value   Kernel description                   Explanation
F1     8       major number                         major device number of this block device
F2     0       minor number                         minor device number of this block device
F3     sda     device name                          name of this block device
F4     8567    reads completed successfully         number of read requests completed successfully
F5     1560    reads merged                         number of read requests merged
F6     140762  sectors read                         total number of sectors read
F7     3460    time spent reading (ms)              total time spent on read requests
F8     0       writes completed                     number of write requests completed successfully
F9     0       writes merged                        number of write requests merged
F10    0       sectors written                      total number of sectors written
F11    0       time spent writing (ms)              total time spent on write requests
F12    0       I/Os currently in progress           number of I/O requests currently in this block device's queue
F13    2090    time spent doing I/Os (ms)           total time during which the device queue was non-empty
F14    3440    weighted time spent doing I/Os (ms)  weighted total of the time during which the device queue was non-empty

Fields 7, 11 and 13 need a closer look. Field 7 is the sum of the time taken by every read request: each read I/O is counted individually. Field 11 is the same for writes. Why is there a field 13 as well? Field 13 does not care how many I/Os are being processed; it only records whether the device is doing I/O at all. As a result, field 7 plus field 11 is generally at least as large as field 13, and larger whenever requests overlap.
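The sda line from the /proc/diskstats listing earlier in this post makes the difference concrete. The snippet below is a small sketch that splits that line and compares the per-request time (fields 7 + 11) with the wall-clock busy time (field 13). The variable names are chosen for illustration only.

```python
# sda line copied from the /proc/diskstats listing in this post
SDA_LINE = ("8 0 sda 82044583 3148 10966722840 222442157 24658460 "
            "2499170 2700969385 105371088 0 57897509 328196252")

fields = SDA_LINE.split()
read_ms = int(fields[6])    # field 7: time spent reading (ms)
write_ms = int(fields[10])  # field 11: time spent writing (ms)
busy_ms = int(fields[12])   # field 13: time spent doing I/Os (ms)

per_request_ms = read_ms + write_ms
overlap_ratio = per_request_ms / busy_ms
print(per_request_ms, busy_ms, round(overlap_ratio, 2))
```

On this machine the per-request total is several times the busy time, which is only possible if many requests were outstanding at the same moment.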

Back in rd_stats.c, how is the stats_disk structure filled in?

...
while (fgets(line, sizeof(line), fp) != NULL)
...
sscanf(line, "%u %u %s %lu %*u %lu %u %lu %*u %lu"
       " %u %*u %u %u",
       &major, &minor, dev_name,
       &rd_ios, &rd_sec, &rd_ticks, &wr_ios, &wr_sec, &wr_ticks,
       &tot_ticks, &rq_ticks) == 11)
...

fgets reads one line from /proc/diskstats, and sscanf then parses that line into variables that are used to fill the members of stats_disk. Look closely: the format string has 14 conversion specifications but there are only 11 receiving variables. The explanation lies in how sscanf works:

In sscanf, * means the converted value is discarded. A conversion specification containing * does not correspond to any argument in the variable argument list.

The three %*u conversions skip fields 5, 9 and 12, so the tot_ticks we are looking for is field 13, that is:

13 - time spent doing I/Os (ms), i.e. the time spent on I/O
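To make the field mapping explicit, here is a rough Python counterpart of that sscanf call. It is a sketch, not sysstat's code: the DiskSample type and the parse_diskstats_line function are names invented here, and the skipped columns are dropped by position instead of with %*u.

```python
from typing import NamedTuple, Optional


class DiskSample(NamedTuple):
    major: int
    minor: int
    name: str
    rd_ios: int
    rd_sec: int
    rd_ticks: int
    wr_ios: int
    wr_sec: int
    wr_ticks: int
    tot_ticks: int
    rq_ticks: int


# 0-based positions of the columns the sscanf call discards with %*u:
# reads merged (field 5), writes merged (field 9), I/Os in progress (field 12).
SKIPPED = {4, 8, 11}


def parse_diskstats_line(line: str) -> Optional[DiskSample]:
    parts = line.split()
    if len(parts) < 14:
        return None  # not a full 14-field device line
    kept = [p for i, p in enumerate(parts[:14]) if i not in SKIPPED]
    major, minor, name, *counters = kept
    return DiskSample(int(major), int(minor), name, *map(int, counters))


sample = parse_diskstats_line(
    "8 0 sda 82044583 3148 10966722840 222442157 24658460 "
    "2499170 2700969385 105371088 0 57897509 328196252"
)
print(sample.name, sample.tot_ticks)
```

Eleven values survive out of fourteen columns, matching the `== 11` check in the C code, and tot_ticks picks up the thirteenth column.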

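Now back to the util calculation:

util = ( current_tot_ticks - previous_tot_ticks ) / sampling period * 100

In words: count how much wall-clock time (in ms) within one sampling period the disk spent doing I/O, and express it as a percentage of the period. That percentage is reported as the disk utilization.

The svctm discussion above said that the variable tput stands for IOPS. Here is the justification: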

/* in read_diskstats_disk(), rd_stats.c */
/* number of read I/Os + number of write I/Os */
st_disk_i->nr_ios  = (unsigned long long) rd_ios + (unsigned long long) wr_ios;
...
/* in compute_ext_disk_stats(), rd_stats.c */
/* (I/O count now - I/O count at the previous sample), scaled to one second */
double tput = ((double) (sdc->nr_ios - sdp->nr_ios)) * 100 / itv;
...

Given the meaning of the /proc/diskstats fields, the nr_ios member of stats_disk is the cumulative number of reads and writes completed successfully. tput is the growth of nr_ios over the sampling period divided by the length of the period (itv is in 1/100 s, hence the factor of 100), in other words IOPS.
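The same scaling can be written out as a short sketch. The function name and the second sample are hypothetical. The first sample adds fields 4 and 8 of the sda line from the listing in this post.

```python
def iops(prev_nr_ios, cur_nr_ios, itv_hundredths):
    """I/Os completed per second; itv is in 1/100 s, as in compute_ext_disk_stats."""
    if itv_hundredths <= 0:
        raise ValueError("interval must be positive")
    return (cur_nr_ios - prev_nr_ios) * 100 / itv_hundredths


# nr_ios = reads completed + writes completed
nr_ios_then = 82044583 + 24658460
nr_ios_now = nr_ios_then + 21  # hypothetical: 21 more I/Os completed 2 s later
rate = iops(nr_ios_then, nr_ios_now, 200)
print(rate)
```

The cumulative counter by itself says nothing about load; only its growth per unit of time does.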

So, is util computed accurately? Is tot_ticks accurate?

As shown above, tot_ticks is field 13 of /proc/diskstats: the wall-clock time during which the disk was handling I/O, with no regard for parallelism. A util derived from it no longer means what its name suggests.

A simple example. Suppose the disk needs 10 ms (0.01 s) to service one I/O and 200 requests are submitted one after another. Servicing them all takes 2 s, so with a sampling period of 1 s, util is 100% in each period. Now suppose the same 200 requests arrive in concurrent batches, two at a time, and the disk can service both at once. All the requests complete in 1 s, and with a 1 s sampling period util is again 100%.

util is 100% in both scenarios. In which one is the disk under more pressure? The second, of course, but util alone cannot tell you that.
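The two scenarios can be simulated by treating each request as a (start, end) interval and measuring how much wall-clock time is covered by at least one interval, which is how a busy-time counter like field 13 behaves. This is an illustrative sketch with invented helper names. It assumes a service time of 10 ms per request, the value that makes 200 back-to-back requests take 2 s.

```python
def busy_ms(intervals):
    """Wall-clock time covered by at least one (start, end) interval, in ms."""
    covered, current_end = 0, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            covered += end - start
            current_end = end
        elif end > current_end:
            covered += end - current_end
            current_end = end
    return covered


SERVICE_MS = 10

# Scenario 1: 200 requests served strictly one after another (2 s in total).
serial = [(i * SERVICE_MS, (i + 1) * SERVICE_MS) for i in range(200)]
# Scenario 2: the same 200 requests served two at a time (1 s in total).
paired = [((i // 2) * SERVICE_MS, (i // 2 + 1) * SERVICE_MS) for i in range(200)]


def util_in_first_second(intervals):
    clipped = [(s, min(e, 1000)) for s, e in intervals if s < 1000]
    return busy_ms(clipped) / 1000 * 100


def completed_in_first_second(intervals):
    return sum(1 for _, end in intervals if end <= 1000)


for label, scenario in (("serial", serial), ("paired", paired)):
    print(label, util_in_first_second(scenario), completed_in_first_second(scenario))
```

Both scenarios report the same utilization for the first second, while the number of requests completed in that second differs by a factor of two.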

Back to the svctm calculation:

double tput  = ((double) (sdc->nr_ios - sdp->nr_ios)) * 100 / itv;
xds->util  = S_VALUE(sdp->tot_ticks, sdc->tot_ticks, itv);
xds->svctm = tput ? xds->util / tput : 0.0;

Combining these expressions gives:

svctm = ( current_tot_ticks - previous_tot_ticks ) / ( current_ios - previous_ios ) = wall-clock time the device spent doing I/O in the sampling period / number of read and write I/Os completed in the sampling period
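A quick check of this expression against the sda row of the sample output, as a sketch with an invented function name. A %util of 4.68 corresponds to 46.8 ms of busy time per second, and r/s + w/s gives 10.28 completed I/Os per second.

```python
def svctm_ms(delta_tot_ticks_ms, delta_nr_ios):
    """Busy wall-clock time divided by completed I/Os; 0.0 when nothing completed."""
    return delta_tot_ticks_ms / delta_nr_ios if delta_nr_ios else 0.0


busy_ms_per_s = 4.68 / 100 * 1000  # from %util in the sda row
ios_per_s = 0.02 + 10.26           # r/s + w/s in the sda row
reported = round(svctm_ms(busy_ms_per_s, ios_per_s), 2)
print(reported)

# Two requests that each take 10 ms but overlap completely
# occupy only 10 ms of wall-clock time.
print(svctm_ms(10, 2))
```

The first value reproduces the svctm column of that row. The second shows the distortion: two overlapping 10 ms requests are reported as 5 ms each.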

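The svctm obtained from this expression therefore cannot measure the service time of a single I/O accurately. If the disk cannot service requests in parallel, fewer I/Os complete in the sampling period and the computed svctm comes out larger. Conversely, when the disk does service several requests at once, more I/Os complete within the same busy time and svctm understates the real service time of each request.

Now back to the question raised at the start: the average of 27.3 ms obtained under the sequential assumption is smaller than the await of 62.78 ms that iostat reports:

27.3 ms <  62.78 ms

This can now be explained: the 27.3 ms figure was computed from a svctm that is too small, so the result falls well short of 62.78 ms.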

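Looking at iostat dialectically

With the analysis this far, the mechanism is clear: util does not measure disk utilization, and the value of svctm has lost its meaning. Hoping to gauge disk performance from these two metrics is not going to work!

For day-to-day analysis, though, iostat's output is still a useful reference. Combine it with other tools and look at performance from several angles, and you can get reasonably close to the truth.

Going further

This post analyzed several iostat metrics that are easy to misread. When using iostat, its results need to be viewed dialectically.

What we usually want is a metric that measures the disk's performance directly, and iostat cannot help much there. Other tools are needed, for example blktrace, which is the real workhorse for I/O analysis!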

