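Powerful Tools for Analyzing Deadlocks: valgrind's DRD and Helgrind
In the article "Analysis of Deadlocks Caused by Improper Operations in DllMain: An Introduction to Deadlocks", we covered why deadlocks occur. Generally speaking, if we do not have a firm grasp of thread synchronization techniques, or if the synchronization scheme is disorganized, deadlocks arise very easily. This article shows how to use valgrind to track down a deadlock. (When reposting, please credit breaksoftware's CSDN blog.)
Constructing a scenario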
#include <pthread.h>

pthread_mutex_t s_mutex_a;
pthread_mutex_t s_mutex_b;
pthread_barrier_t s_barrier;

void lock(void) {
    pthread_mutex_lock(&s_mutex_b);
    {
        pthread_barrier_wait(&s_barrier);   /* wait until the child holds s_mutex_a */

        pthread_mutex_lock(&s_mutex_a);     /* blocks forever: the child owns it */
        pthread_mutex_unlock(&s_mutex_a);
    }
    pthread_mutex_unlock(&s_mutex_b);
}

static void* thread_routine(void* arg) {
    pthread_mutex_lock(&s_mutex_a);
    {
        pthread_barrier_wait(&s_barrier);   /* wait until main holds s_mutex_b */

        pthread_mutex_lock(&s_mutex_b);     /* blocks forever: main owns it */
        pthread_mutex_unlock(&s_mutex_b);
    }
    pthread_mutex_unlock(&s_mutex_a);
    return 0;
}

int main(int argc, char** argv) {
    pthread_t tid;
    pthread_mutex_init(&s_mutex_a, 0);
    pthread_mutex_init(&s_mutex_b, 0);
    pthread_barrier_init(&s_barrier, 0, 2);

    pthread_create(&tid, 0, &thread_routine, 0);

    lock();

    pthread_join(tid, 0);

    pthread_barrier_destroy(&s_barrier);
    pthread_mutex_destroy(&s_mutex_a);
    pthread_mutex_destroy(&s_mutex_b);
    return 0;
}
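In this code we only need to look at two functions, lock and thread_routine.
lock runs on the main thread. It first locks s_mutex_b, then waits at the barrier s_barrier until the child thread has also reached the barrier (the pthread_barrier_wait call in thread_routine, line 21 of dead_lock.c).
thread_routine is the thread function. It first locks s_mutex_a, then waits at the barrier s_barrier until the main thread has also reached the barrier (the pthread_barrier_wait call in lock, line 10 of dead_lock.c).
Once both the main thread and the child thread have reached the barrier, it opens and both continue: the main thread reaches line 12 and tries to acquire s_mutex_a, while the child thread reaches line 23 and tries to acquire s_mutex_b. Each of these mutexes is already held by the other thread, so the program deadlocks.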
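The reasoning above can be checked without hanging the process. The following is only a sketch, not the author's program: it swaps pthread_mutex_lock for pthread_mutex_timedlock on the second lock, and adds a second barrier so that neither thread releases its first mutex before the other has given up. The helper names (lock_with_timeout, hold_then_try, would_deadlock), the second barrier, and the 200 ms timeout are assumptions introduced for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t s_mutex_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t s_mutex_b = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t s_both_hold;   /* both threads hold their first lock */
static pthread_barrier_t s_both_tried;  /* both threads finished their attempt */

/* Hypothetical helper: try to lock m, giving up after ms milliseconds. */
static int lock_with_timeout(pthread_mutex_t* m, long ms) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    return pthread_mutex_timedlock(m, &deadline);
}

/* Hold first, then attempt second while the peer thread still holds it. */
static int hold_then_try(pthread_mutex_t* first, pthread_mutex_t* second) {
    pthread_mutex_lock(first);
    pthread_barrier_wait(&s_both_hold);
    int rc = lock_with_timeout(second, 200);
    if (rc == 0) pthread_mutex_unlock(second);
    pthread_barrier_wait(&s_both_tried);  /* keep first held until the peer gave up */
    pthread_mutex_unlock(first);
    return rc;
}

static void* thread_routine(void* arg) {
    int* rc = arg;
    *rc = hold_then_try(&s_mutex_a, &s_mutex_b);  /* child: a, then b */
    return 0;
}

/* Returns 1 if both threads timed out, i.e. plain locks would never return. */
int would_deadlock(void) {
    pthread_t tid;
    int child_rc = 0;
    pthread_barrier_init(&s_both_hold, 0, 2);
    pthread_barrier_init(&s_both_tried, 0, 2);
    pthread_create(&tid, 0, &thread_routine, &child_rc);
    int main_rc = hold_then_try(&s_mutex_b, &s_mutex_a);  /* main: b, then a */
    pthread_join(tid, 0);
    pthread_barrier_destroy(&s_both_hold);
    pthread_barrier_destroy(&s_both_tried);
    return main_rc == ETIMEDOUT && child_rc == ETIMEDOUT;
}
```

Calling would_deadlock() from a main function (built with -pthread on Linux) should report that both attempts timed out, which is the same circular wait described above, just bounded in time.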
We found this deadlock by reading the code, but in a larger project we need tools to do the analysis. Below we use valgrind.
valgrind --tool=drd --trace-mutex=yes ./dead_lock
The command above makes valgrind print every mutex-related event:
==4749== [1] mutex_init mutex 0x30a040
==4749== [1] mutex_init mutex 0x30a0a0
==4749== [1] mutex_init mutex 0x1ffefffe10
==4749== [1] mutex_ignore_ordering mutex 0x1ffefffe10
==4749== [1] mutex_trylock mutex 0x1ffefffe10 rc 0 owner 0
==4749== [1] post_mutex_lock mutex 0x1ffefffe10 rc 0 owner 0
==4749== [1] mutex_unlock mutex 0x1ffefffe10 rc 1
==4749== [2] mutex_trylock mutex 0x1ffefffe10 rc 0 owner 1
==4749== [2] post_mutex_lock mutex 0x1ffefffe10 rc 0 owner 1
==4749== [2] mutex_unlock mutex 0x1ffefffe10 rc 1
==4749== [2] mutex_trylock mutex 0x30a040 rc 0 owner 0
==4749== [2] post_mutex_lock mutex 0x30a040 rc 0 owner 0
==4749== [1] cond_post_wait mutex 0x1ffefffe10 rc 0 owner 2
==4749== [1] mutex_unlock mutex 0x1ffefffe10 rc 1
==4749== [1] mutex_destroy mutex 0x1ffefffe10 rc 0 owner 1
==4749== [1] mutex_trylock mutex 0x30a0a0 rc 0 owner 0
==4749== [1] post_mutex_lock mutex 0x30a0a0 rc 0 owner 0
==4749== [1] mutex_trylock mutex 0x30a040 rc 1 owner 2
==4749== [2] mutex_trylock mutex 0x30a0a0 rc 1 owner 1
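Line 18 of the output (the second-to-last line) shows thread 1 trying to lock mutex 0x30a040, whose owner is thread 2.
Line 19 (the last line) shows thread 2 trying to lock mutex 0x30a0a0, whose owner is thread 1.
Each thread is waiting for a mutex the other holds, so we can conclude that the program is stuck because of a deadlock.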
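For a longer trace, the same check can be automated. The sketch below is an illustration only: it assumes the line format seen in the listing above ("[tid] mutex_trylock mutex ADDR rc N owner M" and "[tid] post_mutex_lock ..."), which may differ between valgrind versions, and all function names are hypothetical. It records which thread each blocked thread is waiting for and looks for a cycle.

```c
#include <stdio.h>
#include <string.h>

#define MAX_THREADS 64

/* waits_for[t] is the owner of the mutex thread t is blocked on, or 0 if none. */
static int waits_for[MAX_THREADS];

/* Consume one trace line (format assumed from the DRD listing above). */
static void feed_line(const char* line) {
    int tid = 0, rc = 0, owner = 0, end = 0;
    if (sscanf(line, "==%*d== [%d] mutex_trylock mutex %*s rc %d owner %d",
               &tid, &rc, &owner) == 3) {
        /* rc > 0 with a different owner: the mutex is held by someone else. */
        if (tid > 0 && tid < MAX_THREADS)
            waits_for[tid] = (rc > 0 && owner != tid) ? owner : 0;
    } else if (sscanf(line, "==%*d== [%d] post_mutex_lock%n", &tid, &end) == 1
               && end > 0) {
        /* The lock was acquired after all, so the thread no longer waits. */
        if (tid > 0 && tid < MAX_THREADS)
            waits_for[tid] = 0;
    }
}

/* Returns 1 if some thread reaches itself by following waits_for edges. */
static int has_wait_cycle(void) {
    for (int start = 1; start < MAX_THREADS; ++start) {
        int t = waits_for[start];
        for (int steps = 0; t > 0 && t < MAX_THREADS && steps < MAX_THREADS; ++steps) {
            if (t == start) return 1;
            t = waits_for[t];
        }
    }
    return 0;
}

/* Returns 1 if the trace ends with threads waiting on each other in a cycle. */
int trace_has_deadlock(const char* lines[], int count) {
    memset(waits_for, 0, sizeof waits_for);
    for (int i = 0; i < count; ++i)
        feed_line(lines[i]);
    return has_wait_cycle();
}
```

A cycle at the end of a trace is only a strong hint: a thread that is merely slow to acquire a contended mutex looks the same until its post_mutex_lock line appears.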
DRD has one shortcoming here, though: this trace does not tell us where in the code the deadlock occurs. This is where Helgrind comes in.
valgrind --tool=helgrind ./dead_lock
When the program deadlocks under helgrind, we have to press Ctrl+C to terminate it, after which we get the following output:
==5373== Process terminating with default action of signal 2 (SIGINT)
==5373== at 0x4E5310D: __lll_lock_wait (lowlevellock.S:135)
==5373== by 0x4E4C022: pthread_mutex_lock (pthread_mutex_lock.c:78)
==5373== by 0x4C33FD6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5373== by 0x108A11: lock (dead_lock.c:12)
==5373== by 0x108AF4: main (dead_lock.c:38)
==5373== ---Thread-Announcement------------------------------------------
==5373==
==5373== Thread #2 was created
==5373== at 0x518287E: clone (clone.S:71)
==5373== by 0x4E49EC4: create_thread (createthread.c:100)
==5373== by 0x4E49EC4: pthread_create@@GLIBC_2.2.5 (pthread_create.c:797)
==5373== by 0x4C36A27: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5373== by 0x108AEA: main (dead_lock.c:36)
==5373==
==5373== ----------------------------------------------------------------
==5373==
==5373== Thread #2: Exiting thread still holds 1 lock
==5373== at 0x4E5310D: __lll_lock_wait (lowlevellock.S:135)
==5373== by 0x4E4C022: pthread_mutex_lock (pthread_mutex_lock.c:78)
==5373== by 0x4C33FD6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5373== by 0x108A5C: thread_routine (dead_lock.c:23)
==5373== by 0x4C36C26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5373== by 0x4E496DA: start_thread (pthread_create.c:463)
==5373== by 0x518288E: clone (clone.S:95)
==5373==
==5373== ---Thread-Announcement------------------------------------------
==5373==
==5373== Thread #1 is the program's root thread
==5373==
==5373== ----------------------------------------------------------------
==5373==
==5373== Thread #1: Exiting thread still holds 1 lock
==5373== at 0x4E5310D: __lll_lock_wait (lowlevellock.S:135)
==5373== by 0x4E4C022: pthread_mutex_lock (pthread_mutex_lock.c:78)
==5373== by 0x4C33FD6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5373== by 0x108A11: lock (dead_lock.c:12)
==5373== by 0x108AF4: main (dead_lock.c:38)
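Lines 22 and 37 of the output, the frames thread_routine (dead_lock.c:23) and lock (dead_lock.c:12), show the line on which the child thread and the main thread were each blocked when the process was interrupted, which makes the problem much easier to locate.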
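Once the two blocked lock calls are known, one common remedy is to make every thread take the mutexes in the same order. The sketch below is not from the original program: the helpers lock_both and unlock_both, the counter, and run_ordered are hypothetical names used for illustration, and the barrier is dropped because waiting at a barrier while holding the first mutex would itself block the other thread.

```c
#include <pthread.h>

static pthread_mutex_t s_mutex_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t s_mutex_b = PTHREAD_MUTEX_INITIALIZER;
static int s_counter;  /* shared state protected by both mutexes */

/* Every caller acquires s_mutex_a first and s_mutex_b second. */
static void lock_both(void) {
    pthread_mutex_lock(&s_mutex_a);
    pthread_mutex_lock(&s_mutex_b);
}

/* Release in the reverse order of acquisition. */
static void unlock_both(void) {
    pthread_mutex_unlock(&s_mutex_b);
    pthread_mutex_unlock(&s_mutex_a);
}

static void* thread_routine(void* arg) {
    (void)arg;
    for (int i = 0; i < 1000; ++i) {
        lock_both();
        ++s_counter;
        unlock_both();
    }
    return 0;
}

/* Runs the same routine on two threads and returns the final counter. */
int run_ordered(void) {
    pthread_t tid;
    s_counter = 0;
    pthread_create(&tid, 0, &thread_routine, 0);
    thread_routine(0);  /* the calling thread does the same work */
    pthread_join(tid, 0);
    return s_counter;
}
```

With a single global order, a thread holding s_mutex_b always already holds s_mutex_a, so the circular wait seen in the traces above cannot form.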