Notes on analyzing low CPU usage with a high loadavg on Linux
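First, let's pin down the two concepts, CPU usage and loadavg:
CPU usage
How busy the CPU currently is executing instructions. A higher value indicates that the CPU is executing many instructions, i.e. some process keeps running on the CPU.
Loadavg
The CPU load level: a count of the tasks that are currently running plus those waiting to run, reflecting a trend over time. More precisely, it is a statistic of the number of processes in the R and D states.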
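To make the "trend" aspect concrete, the sketch below models loadavg as an exponentially damped average of the number of R and D tasks, sampled every 5 seconds over 1, 5, and 15 minute windows. It is a simplified floating-point model rather than the kernel's fixed-point code, and the names `update_load` and `simulate` are made up for illustration.

```python
import math

# Assumption: the task count is sampled roughly every 5 seconds.
SAMPLE_INTERVAL_S = 5.0


def update_load(load, active_tasks, window_s, interval_s=SAMPLE_INTERVAL_S):
    """Apply one step of an exponentially damped moving average."""
    decay = math.exp(-interval_s / window_s)
    return load * decay + active_tasks * (1.0 - decay)


def simulate(active_tasks, seconds, window_s):
    """Load average after `seconds` with a constant number of R+D tasks."""
    load = 0.0
    for _ in range(int(seconds / SAMPLE_INTERVAL_S)):
        load = update_load(load, active_tasks, window_s)
    return load


if __name__ == "__main__":
    # One task permanently in R or D state, four minutes after boot.
    for window in (60, 300, 900):
        print(window, round(simulate(1, 240, window), 2))
```

Under these assumptions, a single task that is permanently in R or D pushes the three averages to roughly 0.98, 0.55, and 0.23 four minutes after boot, without using any CPU time at all.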
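What prompted this analysis was a development board I bought a while ago. After bringing it up with the latest SDK, with almost no tasks running, the CPU was 100% idle, yet loadavg stayed above 1 (the CPU is a dual-core A7). Compared with the single-core MIPS routers I had used before, this looked very abnormal.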
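Before starting the analysis, a few tools need introducing:
top: shows process states, CPU usage, and loadavg
ps -aux: shows per-process CPU usage and run state
vmstat: shows user-space and kernel-space CPU usage, plus system interrupts and context switches
pidstat: shows CPU usage and context switches for a specific process
iostat: shows the system I/O load
First, look at the system state: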
root@wireless-tag:/# vmstat -w 1
procs -----------------------memory---------------------- ---swap-- -----io---- -system-- --------cpu--------
 r  b         swpd         free         buff        cache   si   so    bi    bo   in   cs  us  sy  id  wa  st
 0  0            0        75660         3948        11740    0    0     0     0   57  108   2   6  91   0   0
 0  0            0        75664         3948        11740    0    0     0     0   58   51   0   0 100   0   0
 0  0            0        75664         3948        11740    0    0     0     0   43   49   0   0 100   0   0
 0  0            0        75664         3948        11740    0    0     0     0   45   56   0   0 100   0   0
 0  0            0        75664         3948        11740    0    0     0     0   46   53   0   0 100   0   0
 0  0            0        75664         3948        11740    0    0     0     0   46   57   0   0 100   0   0
 0  0            0        75664         3948        11740    0    0     0     0   43   52   0   0 100   0   0
 0  0            0        75664         3948        11740    0    0     0     0   42   49   0   0 100   0   0
cpu 100% idle: almost no process is using the CPU, and the interrupt and context-switch rates are low as well
The r and b queue counts are both 0
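The r and b columns can be cross-checked against the raw kernel counters. The sketch below assumes, as far as I know, that vmstat derives them from the `procs_running` and `procs_blocked` lines of /proc/stat; `parse_proc_stat` is a hypothetical helper name. Since `procs_blocked` is documented as the number of processes blocked waiting for I/O, a task that sleeps uninterruptibly for some other reason may not appear in b at all.

```python
def parse_proc_stat(text):
    """Extract procs_running and procs_blocked from /proc/stat content."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in ("procs_running", "procs_blocked"):
            counters[fields[0]] = int(fields[1])
    return counters


if __name__ == "__main__":
    try:
        with open("/proc/stat") as f:
            print(parse_proc_stat(f.read()))
    except OSError:
        # Not a Linux system, or /proc is not mounted.
        print("/proc/stat is not available here")
```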
root@wireless-tag:/# iostat
Linux 4.9.84 (wireless-tag)     01/01/70        _armv7l_        (2 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.57    0.00    1.73    0.02    0.00   97.68
Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mtdblock0         0.03         0.41         0.00        168          0
mtdblock1         0.03         0.41         0.00        168          0
mtdblock2         0.03         0.41         0.00        168          0
mtdblock3         0.03         0.41         0.00        168          0
mtdblock4         0.03         0.41         0.00        168          0
mtdblock5         0.03         0.41         0.00        168          0
mtdblock6         0.03         0.41         0.00        168          0
mtdblock7         0.03         0.41         0.00        168          0
mtdblock8         0.03         0.41         0.00        168          0
mtdblock9         0.03         0.41         0.00        168          0
mtdblock10        0.02         0.38         0.00        156          0
mtdblock11        0.02         0.38         0.00        156          0
mtdblock12        0.02         0.38         0.00        156          0
ubiblock0_1       0.15         9.66         0.00       3948          0
zram0             0.00         0.01         0.01          4          4
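I/O is almost idle as well: device throughput is close to zero, %iowait is 0.02, and %idle is 97.68
root@wireless-tag:/# uptime
 23:03:15 up 4 min,  0 users,  load average: 1.10, 0.67, 0.29
Yet the 1-minute loadavg reads 1.10. Combined with the r and b queues both being 0, this indicates that some task is being counted as active the whole time even though it consumes no CPU.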
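The same numbers can be read programmatically from /proc/loadavg, which also makes it easy to relate the load to the number of cores. This is a minimal sketch: `LoadAvg`, `parse_loadavg`, and `load_per_cpu` are hypothetical names, and in the sample line only the three averages come from the uptime output above, while the remaining fields are made up for illustration.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    load1: float
    load5: float
    load15: float
    runnable: int   # currently runnable scheduling entities
    total: int      # scheduling entities that exist on the system
    last_pid: int   # PID of the most recently created process


def parse_loadavg(text):
    """Parse the single line found in /proc/loadavg."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return LoadAvg(float(fields[0]), float(fields[1]), float(fields[2]),
                   int(runnable), int(total), int(fields[4]))


def load_per_cpu(load, cpus):
    """Normalize a load figure by the number of CPUs."""
    return load / cpus


if __name__ == "__main__":
    # Illustrative sample; fields after the three averages are assumed.
    sample = "1.10 0.67 0.29 1/52 1888\n"
    la = parse_loadavg(sample)
    print(la, load_per_cpu(la.load1, 2))
```

On the dual-core board, a 1-minute load of 1.10 works out to about 0.55 per core, which is still hard to explain on a system that is 100% idle.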
Next, look at the process list:
root@wireless-tag:/# ps -aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.3  1.6   2596  1736 ?        Ss   22:58   0:02 /sbin/procd
root         2  0.0  0.0      0     0 ?        S    22:58   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    22:58   0:00 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S<   22:58   0:00 [kworker/0:0H]
root         7  0.0  0.0      0     0 ?        S    22:58   0:00 [rcu_preempt]
root         8  0.0  0.0      0     0 ?        S    22:58   0:00 [rcu_sched]
root         9  0.0  0.0      0     0 ?        S    22:58   0:00 [rcu_bh]
root        10  0.0  0.0      0     0 ?        S    22:58   0:00 [migration/0]
root        11  0.0  0.0      0     0 ?        S<   22:58   0:00 [lru-add-drain]
root        12  0.0  0.0      0     0 ?        S    22:58   0:00 [watchdog/0]
root        13  0.0  0.0      0     0 ?        S    22:58   0:00 [cpuhp/0]
root        14  0.0  0.0      0     0 ?        S    22:58   0:00 [cpuhp/1]
root        15  0.0  0.0      0     0 ?        S    22:58   0:00 [watchdog/1]
root        16  0.0  0.0      0     0 ?        S    22:58   0:00 [migration/1]
root        17  0.0  0.0      0     0 ?        S    22:58   0:00 [ksoftirqd/1]
root        18  0.0  0.0      0     0 ?        S    22:58   0:00 [kworker/1:0]
root        19  0.0  0.0      0     0 ?        S<   22:58   0:00 [kworker/1:0H]
root        20  0.1  0.0      0     0 ?        S    22:58   0:00 [kdevtmpfs]
root        21  0.0  0.0      0     0 ?        S<   22:58   0:00 [netns]
root        22  0.0  0.0      0     0 ?        S    22:58   0:00 [kworker/u4:1]
root       173  0.0  0.0      0     0 ?        S    22:58   0:00 [oom_reaper]
root       174  0.0  0.0      0     0 ?        S<   22:58   0:00 [writeback]
root       176  0.0  0.0      0     0 ?        S    22:58   0:00 [kcompactd0]
root       177  0.0  0.0      0     0 ?        S<   22:58   0:00 [crypto]
root       178  0.0  0.0      0     0 ?        S<   22:58   0:00 [bioset]
root       180  0.0  0.0      0     0 ?        S<   22:58   0:00 [kblockd]
root       201  0.0  0.0      0     0 ?        S<   22:58   0:00 [watchdogd]
root       282  0.0  0.0      0     0 ?        S    22:58   0:00 [kworker/0:1]
root       296  0.0  0.0      0     0 ?        S    22:58   0:00 [kswapd0]
root       297  0.0  0.0      0     0 ?        S<   22:58   0:00 [vmstat]
root       384  0.0  0.0      0     0 ?        D    22:58   0:00 [ehci_monitor]   -----> note the D state
root       400  0.0  0.0      0     0 ?        S    22:58   0:00 [urdma_tx_thread
root       414  0.0  0.0      0     0 ?        S    22:58   0:00 [kworker/1:1]
root       419  0.0  0.0      0     0 ?        S    22:58   0:00 [kworker/0:2]
root       502  0.1  0.0      0     0 ?        S    22:58   0:00 [ubi_bgt0d]
root       503  0.0  0.0      0     0 ?        S<   22:58   0:00 [bioset]
root       508  0.0  0.0      0     0 ?        S<   22:58   0:00 [kworker/0:1H]
root       538  0.0  0.0      0     0 ?        S<   22:59   0:00 [kworker/1:1H]
root       551  0.0  0.0      0     0 ?        S    22:59   0:00 [ubifs_bgt0_2]
root       809  0.0  1.4   2144  1540 ?        S    22:59   0:00 /sbin/ubusd
root       812  0.0  1.8   3144  1924 ttyS0    Ss   22:59   0:00 /bin/ash --login
root       859  0.0  0.0      0     0 ?        S    22:59   0:00 [ubi_bgt1d]
root       885  0.0  0.0      0     0 ?        S    22:59   0:00 [ubifs_bgt1_0]
root       986  0.0  0.0      0     0 ?        S<   22:59   0:00 [bioset]
root       996  0.0  0.0      0     0 ?        S<   22:59   0:00 [cfg80211]
root      1003  0.0  0.0      0     0 ?        S    22:59   0:00 [kworker/u4:2]
root      1226  0.0  1.3   2320  1376 ?        S    22:59   0:00 /sbin/logd -S 64
root      1247  0.0  1.4   2780  1560 ?        S    22:59   0:00 /sbin/rpcd -s /v
root      1395  0.0  1.5   2396  1664 ?        S    22:59   0:00 /sbin/netifd
root      1888  0.0  1.3   2708  1416 ttyS0    R+   23:10   0:00 ps -aux
A kernel thread in the D state turns up: ehci_monitor
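Spotting the D entry by eye works for a short list, but the same check can be scripted. The sketch below is one way to do it, assuming the usual /proc/<pid>/stat layout of `pid (comm) state ...`; `parse_stat_line` and `tasks_in_state` are hypothetical helper names.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    left = line.index("(")
    # comm may itself contain spaces or parentheses, so use the last ")".
    right = line.rindex(")")
    pid = int(line[:left])
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def tasks_in_state(wanted="D", proc_root="/proc"):
    """List (pid, comm) for every task currently in the given state."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited while scanning, or unexpected format
        if state == wanted:
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid, comm in tasks_in_state("D"):
            print(pid, comm)
```

Because a task may only be in D briefly, running the scan several times and looking for names that show up every time is more telling than a single pass.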
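The process states are as follows:
O: the process is running on a processor
S: sleeping
R: running or runnable (on the run queue)
I: idle
Z: zombie
T: stopped or being traced
B: the process is waiting for more memory pages
D: uninterruptible deep sleep
The D state means the thread cannot be interrupted, and since it is also a kernel thread that handles no signals, nothing can wake it early. Linux counts D-state tasks in loadavg alongside runnable ones, yet this thread uses no CPU (the CPU was 100% idle above), which means it is not doing any real work. At this point we can say the thread's implementation is somewhat "problematic": it does no work but is always counted as active, perhaps because it is waiting on some condition, polling some state, or doing something else entirely. Either way, the way this thread manages its state could be improved.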
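Now that the kernel thread's name is known, nothing is more direct than reading the code. It shows that the thread does just one job, which boils down to checking a condition every 100 ms, as follows:
while(1) {
    if(check_status)
        do....
    msleep(100);
}
The thread handles no signals and never changes its own task state. The only call that triggers scheduling is msleep, which sleeps in the uninterruptible state, so the thread ends up in the D state nearly all the time.
That being so, let's improve the state handling with two changes:
1. Set the thread state to TASK_INTERRUPTIBLE
2. Use schedule_timeout to schedule explicitly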
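while(1)
{
    if(check_status)
        do....
    /* Set the state right before sleeping, so that work done in the
     * branch above cannot reset it to TASK_RUNNING. */
    set_current_state(TASK_INTERRUPTIBLE);
    /* HZ jiffies is one second; msecs_to_jiffies(100) would keep the
     * original 100 ms polling period. */
    schedule_timeout(HZ);
}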
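Why this change lowers loadavg can be sketched in a few lines. The model below assumes, in simplified form, that the load figure counts tasks whose state is R or D and ignores everything else; `load_contributors` is a hypothetical name, and the STAT lists are shortened, illustrative samples rather than the full ps output.

```python
def load_contributors(stat_codes):
    """Count tasks whose STAT code starts with R (runnable) or D (uninterruptible)."""
    return sum(1 for code in stat_codes if code[:1] in ("R", "D"))


# Shortened STAT columns: the polling thread shows D before the change, S after.
before = ["Ss", "S", "S<", "D", "S"]
after = ["Ss", "S", "S<", "S", "S"]

if __name__ == "__main__":
    print(load_contributors(before), load_contributors(after))
```

A thread that sleeps in S between polls drops out of the count entirely, while the same thread sleeping in D keeps adding one to every sample.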
Result after the change:
root@wireless-tag:/# uptime
 23:51:52 up 0 min,  0 users,  load average: 0.16, 0.05, 0.01
root@wireless-tag:/# ps -aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  5.9  1.5   2596  1652 ?        Ss   23:51   0:02 /sbin/procd
root         2  0.0  0.0      0     0 ?        S    23:51   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    23:51   0:00 [ksoftirqd/0]
root         4  0.0  0.0      0     0 ?        S    23:51   0:00 [kworker/0:0]
root         5  0.0  0.0      0     0 ?        S<   23:51   0:00 [kworker/0:0H]
root         6  0.0  0.0      0     0 ?        S    23:51   0:00 [kworker/u4:0]
root         7  0.0  0.0      0     0 ?        S    23:51   0:00 [rcu_preempt]
root         8  0.0  0.0      0     0 ?        S    23:51   0:00 [rcu_sched]
root         9  0.0  0.0      0     0 ?        S    23:51   0:00 [rcu_bh]
root        10  0.0  0.0      0     0 ?        S    23:51   0:00 [migration/0]
root        11  0.0  0.0      0     0 ?        S<   23:51   0:00 [lru-add-drain]
root        12  0.0  0.0      0     0 ?        S    23:51   0:00 [watchdog/0]
root        13  0.0  0.0      0     0 ?        S    23:51   0:00 [cpuhp/0]
root        14  0.0  0.0      0     0 ?        S    23:51   0:00 [cpuhp/1]
root        15  0.0  0.0      0     0 ?        S    23:51   0:00 [watchdog/1]
root        16  0.0  0.0      0     0 ?        S    23:51   0:00 [migration/1]
root        17  0.0  0.0      0     0 ?        S    23:51   0:00 [ksoftirqd/1]
root        18  0.0  0.0      0     0 ?        S    23:51   0:00 [kworker/1:0]
root        19  0.0  0.0      0     0 ?        S<   23:51   0:00 [kworker/1:0H]
root        20  2.0  0.0      0     0 ?        S    23:51   0:00 [kdevtmpfs]
root        21  0.0  0.0      0     0 ?        S<   23:51   0:00 [netns]
root        22  0.0  0.0      0     0 ?        S    23:51   0:00 [kworker/u4:1]
root       173  0.0  0.0      0     0 ?        S    23:51   0:00 [oom_reaper]
root       174  0.0  0.0      0     0 ?        S<   23:51   0:00 [writeback]
root       176  0.0  0.0      0     0 ?        S    23:51   0:00 [kcompactd0]
root       177  0.0  0.0      0     0 ?        S<   23:51   0:00 [crypto]
root       178  0.0  0.0      0     0 ?        S<   23:51   0:00 [bioset]
root       180  0.0  0.0      0     0 ?        S<   23:51   0:00 [kblockd]
root       201  0.0  0.0      0     0 ?        S<   23:51   0:00 [watchdogd]
root       282  1.3  0.0      0     0 ?        S    23:51   0:00 [kworker/0:1]
root       296  0.0  0.0      0     0 ?        S    23:51   0:00 [kswapd0]
root       297  0.0  0.0      0     0 ?        S<   23:51   0:00 [vmstat]
root       384  0.0  0.0      0     0 ?        S    23:51   0:00 [ehci_monitor]   ----> S state
root       400  0.0  0.0      0     0 ?        S    23:51   0:00 [urdma_tx_thread
root       414  0.5  0.0      0     0 ?        S    23:51   0:00 [kworker/1:1]
root       419  0.0  0.0      0     0 ?        S    23:51   0:00 [kworker/0:2]
root       422  0.0  0.0      0     0 ?        S<   23:51   0:00 [bioset]
root@wireless-tag:/# vmstat -w 1
procs -----------------------memory---------------------- ---swap-- -----io---- -system-- --------cpu--------
 r  b         swpd         free         buff        cache   si   so    bi    bo   in   cs  us  sy  id  wa  st
 0  0            0        76568         3600        11308    0    0     0     0   51  103   4   6  90   0   0
 0  0            0        76568         3600        11308    0    0     0     0   56   57   0   0 100   0   0
 0  0            0        76584         3600        11308    0    0     0     0   43   56   0   0 100   0   0
 0  0            0        76584         3600        11308    0    0     0     0   43   53   0   0 100   0   0
 0  0            0        76584         3600        11308    0    0     0     0   44   59   0   0 100   0   0
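The result looks "perfect".
When I have time, I will write a follow-up post on using kernel threads and on switching and maintaining thread states, to illustrate the meaning of CPU usage and loadavg more convincingly.
We can also draw a conclusion: whether the CPU is sufficient and how heavily the system is loaded cannot be judged from CPU usage or loadavg alone; the two need to be analyzed together.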