Thoughts on Go panic and recover
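A panic is Go's form of exception, comparable to an Exception in other languages.
Common panics include indexing an array or slice out of range and integer division by zero; panics of this kind can be caught with recover. Note, however, that once a panic fires, none of the code after the panic site in the current goroutine runs. If outer code depends on a variable the goroutine was supposed to set, it must handle the case where that variable was never set.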
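As a minimal sketch of the recoverable case, the program below wraps a call in a helper that turns a panic into an ordinary error. The helper name safeCall is an assumption made for illustration; it is not part of the standard library.

```go
package main

import "fmt"

// safeCall (hypothetical helper) runs fn and converts a recoverable
// panic into an error instead of letting it unwind further.
func safeCall(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	items := []int{1, 2, 3}
	idx, zero := 5, 0

	// Index out of range: raised by the runtime, caught by recover.
	fmt.Println(safeCall(func() { fmt.Println(items[idx]) }))

	// Integer division by zero: also recoverable.
	fmt.Println(safeCall(func() { fmt.Println(10 / zero) }))

	// No panic: the returned error is nil.
	fmt.Println(safeCall(func() {}))
}
```

The caller receives an error value it can check, which is one way to "handle safely" a result that a panicking goroutine never produced.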
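In the example below, the goroutine panics before it assigns to items, so the append never happens. main then reads items[0] directly, and that read panics as well, this time taking down the whole process.

package main

import "fmt"

func main() {
	items := make([]int, 0, 1)
	go func() {
		defer func() { recover() }()
		panic(fmt.Errorf("error"))
		items = append(items, 1) // never reached: the panic above aborts the goroutine
	}()
	fmt.Println(items[0]) // items is still empty, so this index panics in main
}

There is a further class of failure that recover cannot catch at all. The runtime raises these as fatal errors rather than ordinary panics, and when one fires the process exits unconditionally. The most common case is a concurrent map read and write.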
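The Go source quoted in the following sections comes from: go version go1.16.6 darwin/amd64
Concurrent map reads and writes
A concurrent map read and write brings down the whole process.
In production today the parent process is usually a supervisor: when it detects that the Go process has exited abnormally, it starts a new Go process to keep serving traffic. A Go process starts very quickly, so unless monitoring specifically watches for these abnormal restarts, this kind of crash can easily go unnoticed.
Looking at throw("concurrent map read and map write"), we can see that it calls fatalthrow underneath, which ends by calling exit to terminate the process.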

// fatalthrow implements an unrecoverable runtime throw. It freezes the
// system, prints stack traces starting from its caller, and terminates the
// process.
//
//go:nosplit
func fatalthrow() {
	pc := getcallerpc()
	sp := getcallersp()
	gp := getg()
	// Switch to the system stack to avoid any stack growth, which
	// may make things worse if the runtime is in a bad state.
	systemstack(func() {
		startpanic_m()

		if dopanic_m(gp, pc, sp) {
			// crash uses a decent amount of nosplit stack and we're already
			// low on stack in throw, so crash on the system stack (unlike
			// fatalpanic).
			crash()
		}

		exit(2)
	})

	*(*int)(nil) = 0 // not reached
}

Concurrent map reads mixed with writes trigger this fatal error; concurrent reads alone do not.
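One common way to avoid this fatal error is to guard the map with a sync.RWMutex, so that writers are exclusive while readers can still run in parallel. The sketch below assumes a hypothetical SafeCounter type; sync.Map or a map owned by a single goroutine are alternatives with different trade-offs.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeCounter (hypothetical type) wraps a map so that every access
// goes through a read/write lock.
type SafeCounter struct {
	mu sync.RWMutex
	m  map[string]int
}

func NewSafeCounter() *SafeCounter {
	return &SafeCounter{m: make(map[string]int)}
}

// Inc takes the write lock: no reader or other writer runs concurrently.
func (c *SafeCounter) Inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key]++
}

// Get takes the read lock: many readers may hold it at once.
func (c *SafeCounter) Get(key string) int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.m[key]
}

func main() {
	c := NewSafeCounter()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); c.Inc("hits") }()
		go func() { defer wg.Done(); _ = c.Get("hits") }()
	}
	wg.Wait()
	fmt.Println(c.Get("hits")) // 100
}
```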
recover
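When a function panics, its execution stops immediately. Its deferred functions then run, after which the panic is passed to the function's caller. This repeats until every function in the current goroutine has returned. The program then prints the value passed to panic, followed by the stack trace, and exits: an unrecovered panic in any goroutine terminates the whole process, not just that goroutine.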
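The unwinding order can be observed with a small program. This is only a sketch: the function names and the trace slice are made up for illustration, and a recover at the top keeps the program alive so the recorded order can be printed.

```go
package main

import "fmt"

// trace records the order in which code runs while the panic unwinds.
var trace []string

func inner() {
	defer func() { trace = append(trace, "inner defer") }()
	panic("boom")
}

func outer() {
	defer func() { trace = append(trace, "outer defer") }()
	inner()
	trace = append(trace, "after inner") // skipped: inner panicked
}

// run stops the panic at the top so the trace can be returned.
func run() (out []string) {
	trace = nil
	defer func() {
		r := recover()
		trace = append(trace, fmt.Sprintf("recovered: %v", r))
		out = trace
	}()
	outer()
	return nil
}

func main() {
	for _, step := range run() {
		fmt.Println(step)
	}
	// Prints:
	// inner defer
	// outer defer
	// recovered: boom
}
```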
recover stops a panic and restores normal execution, but it comes with caveats, which the rest of this article works through.
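Printing the stack trace
Printing the stack trace after a recover is standard practice. The trace pinpoints the file and line that triggered the panic.
In day-to-day code you will often see the call below, which uses a ready-made function from the debug package to print the stack trace. Let's look at how that function is implemented: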

fmt.Println(string(debug.Stack()))

Stack returns the full stack trace of the calling goroutine. The initial buf is 1024 bytes long, and the for loop exits once buf holds the entire trace.
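Each pass through the loop doubles the length of buf and calls runtime.Stack again. runtime.Stack never grows the slice it is given: it writes at most len(buf) bytes, drops whatever does not fit, and returns the number of bytes written.
The key is the < comparison. The return value n can never exceed len(buf), and when n equals len(buf) the buffer was filled completely, so the trace may have been cut off. Only n < len(buf) guarantees that buf holds the whole trace.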

func Stack() []byte {
	buf := make([]byte, 1024)
	for {
		n := runtime.Stack(buf, false)
		if n < len(buf) {
			return buf[:n]
		}
		buf = make([]byte, 2*len(buf))
	}
}

The only drawback of calling debug.Stack directly is that a very deep call stack produces a very large trace, so the loop may run several times and allocate a larger buf each time.
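The role of the buffer size is easy to observe. The sketch below uses a hypothetical helper, captureStack, which calls runtime.Stack once with a buffer of the given size and reports whether the buffer was filled completely, which is exactly the condition under which debug.Stack retries with a larger buffer.

```go
package main

import (
	"fmt"
	"runtime"
)

// captureStack (hypothetical helper) formats the calling goroutine's
// stack into a buffer of the given size. filled is true when the
// buffer was used up entirely, meaning the trace may be truncated.
func captureStack(size int) (trace []byte, filled bool) {
	buf := make([]byte, size)
	n := runtime.Stack(buf, false)
	return buf[:n], n == len(buf)
}

func main() {
	// 16 bytes cannot even hold the "goroutine N [running]:" header.
	_, filled := captureStack(16)
	fmt.Println(filled) // true

	// 64 KiB is ample for a shallow stack like this one.
	trace, filled := captureStack(64 << 10)
	fmt.Println(filled)
	fmt.Printf("%s", trace)
}
```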
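In many cases we do not need the complete trace and can cap its length, as in the example below.
The only difference from debug.Stack is that runtime.Stack is called exactly once, so the trace is limited to 2048 bytes. Frankly, this optimization buys very little.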
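
// Using the runtime package directly (this fragment belongs inside a deferred function).
if p := recover(); p != nil {
	// Print the call stack, capped at 2048 bytes.
	buf := make([]byte, 2048)
	n := runtime.Stack(buf, false)
	stackInfo := string(buf[:n])
	// logs is the logging package used by the original code.
	logs.Error("panic stack info %s", stackInfo)
}

Recovery in gin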
Let's look at how the gin framework builds the stack trace it prints:

// -- github.com/gin-gonic/gin/recovery.go:88
// stack returns a nicely formatted stack frame, skipping skip frames.
func stack(skip int) []byte {
	buf := new(bytes.Buffer) // the returned data
	// As we loop, we open files and read them. These variables record the currently
	// loaded file.
	var lines [][]byte
	var lastFile string
	for i := skip; ; i++ { // Skip the expected number of frames
		pc, file, line, ok := runtime.Caller(i)
		if !ok {
			break
		}
		// Print this much at least. If we can't find the source, it won't show.
		fmt.Fprintf(buf, "%s:%d (0x%x)\n", file, line, pc)
		if file != lastFile {
			data, err := ioutil.ReadFile(file)
			if err != nil {
				continue
			}
			lines = bytes.Split(data, []byte{'\n'})
			lastFile = file
		}
		fmt.Fprintf(buf, "\t%s: %s\n", function(pc), source(lines, line))
	}
	return buf.Bytes()
}

How panic and recover are connected
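The argument passed to panic becomes the return value of recover.
In the example below we declare a struct type named inner. We call panic with an inner value whose Msg field is Thank, then apply a type assertion to the value returned by recover (its static type is interface{}), asserting that it is an inner value.
The slice index out of range panic we meet so often at work is raised by the runtime itself; the value it passes is a boundsError from the runtime package (A boundsError represents an indexing or slicing operation gone wrong.).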
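Because recover returns an interface value, a type switch is a convenient way to tell runtime-raised panics apart from values the program passed to panic itself. The sketch below assumes a hypothetical helper named classify; runtime.Error is the standard interface implemented by errors the runtime raises, including the bounds error mentioned above.

```go
package main

import (
	"fmt"
	"runtime"
)

// classify (hypothetical helper) runs fn and reports what kind of
// value, if any, it panicked with.
func classify(fn func()) (kind string) {
	defer func() {
		switch recover().(type) {
		case nil:
			kind = "none" // fn did not panic
		case runtime.Error:
			kind = "runtime.Error" // raised by the Go runtime
		case error:
			kind = "error" // an error value passed to panic
		default:
			kind = "other" // any other value passed to panic
		}
	}()
	fn()
	return
}

func main() {
	items := []int{1, 2, 3}
	idx := 5

	fmt.Println(classify(func() { fmt.Println(items[idx]) }))        // runtime.Error
	fmt.Println(classify(func() { panic(fmt.Errorf("bad input")) })) // error
	fmt.Println(classify(func() { panic("plain string") }))          // other
	fmt.Println(classify(func() {}))                                 // none
}
```

The runtime.Error case has to come before the error case, since every runtime.Error is also an error.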

package main

import "fmt"

type inner struct {
	Msg string
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			// recover returns interface{}; assert it back to inner.
			fmt.Print(r.(inner))
		}
	}()
	panic(inner{Msg: "Thank"})
}

Nested panics
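Suppose a program panics and then panics again while running its deferred functions. The error output lists the message from every one of the panics, three in this case.
We first run without any recover and look at the panic output. The comment at the end of the code shows that all three panics fired and that the output carries the message from each of them.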

func main() {
	go func() {
		// defer 1
		defer func() {
			// defer 2
			defer func() {
				panic("call panic 3")
			}()
			panic("call panic 2")
		}()
		panic("call panic 1")
	}()
	for {
	}
}

//output:
//panic: call panic 1
//	panic: call panic 2
//	panic: call panic 3
//
//goroutine 18 [running]:
//main.main.func1.1.1()
//	/Users/fuhui/Desktop/panic/main.go:10 +0x39

Next we add a recover and watch what the program prints. In the example above the program raised panic 1, 2 and 3 in turn. If we change the code to catch panic 3, will the program still panic?
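We nest a third defer in the code to catch panic 3. The output shows that the program still panics.
We have not looked at the implementation yet, but one thing is already clear: every panic in a Go program has to be handled by a recover for the program to keep running. Handling only the last panic in the chain still ends in abnormal termination.
What if we tweak this and call recover three times inside defer 3? That does not work either: each panic needs its own defer and recover, and the three belong together as a unit. You can verify this yourself.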
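Both points can be checked with a short program. In this sketch (the function names are illustrative), recoverTwice calls recover twice in one deferred function and records both results, while recoverEach gives every panic in the chain its own deferred recover.

```go
package main

import "fmt"

// recoverTwice shows that a second recover in the same deferred
// function has nothing left to catch.
func recoverTwice() (first, second interface{}) {
	defer func() {
		first = recover()  // stops the panic and returns its value
		second = recover() // no active panic any more: returns nil
	}()
	panic("only once")
}

// recoverEach pairs each panic with its own deferred recover, so the
// function returns normally.
func recoverEach() (seen []interface{}) {
	defer func() { // defer 1
		defer func() { // defer 2: handles panic 2
			seen = append(seen, recover())
		}()
		seen = append(seen, recover()) // handles panic 1
		panic("call panic 2")
	}()
	panic("call panic 1")
}

func main() {
	first, second := recoverTwice()
	fmt.Println(first, second) // only once <nil>
	fmt.Println(recoverEach()) // [call panic 1 call panic 2]
}
```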

func main() {
	go func() {
		// defer 1
		defer func() {
			// defer 2
			defer func() {
				// defer 3
				defer func() {
					if r := recover(); r != nil {
						fmt.Println("recover", r)
					}
				}()
				panic("call panic 3")
			}()
			panic("call panic 2")
		}()
		panic("call panic 1")
	}()
	for {
	}
}

//output:
//recover call panic 3
//panic: call panic 1
//	panic: call panic 2
//
//goroutine 18 [running]:

Source code
Go source version
Check which Go version the source comes from:

➜ server go version
go version go1.15.1 darwin/amd64

gopanic
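Let's look at the structure of the panic type:
arg is the argument to panic, that is, the value we pass when calling the panic function. It is the value a later recover returns.
link is a pointer to another _panic, which tells us that the _panic values inside a goroutine are stored as a linked list. Several panics can occur within one goroutine, and the information for every one of them is kept.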

// A _panic holds information about an active panic.
//
// This is marked go:notinheap because _panic values must only ever
// live on the stack.
//
// The argp and link fields are stack pointers, but don't need special
// handling during stack growth: because they are pointer-typed and
// _panic values only live on the stack, regular stack pointer
// adjustment takes care of them.
//
//go:notinheap
type _panic struct {
	argp      unsafe.Pointer // pointer to arguments of deferred call run during panic; cannot move - known to liblink
	arg       interface{}    // argument to panic
	link      *_panic        // link to earlier panic
	pc        uintptr        // where to return to in runtime if this panic is bypassed
	sp        unsafe.Pointer // where to return to in runtime if this panic is bypassed
	recovered bool           // whether this panic is over
	aborted   bool           // the panic was aborted
	goexit    bool
}

The body of gopanic is fairly long, so we annotate and analyze it directly in the comments.

// The implementation of the predeclared function panic.
func gopanic(e interface{}) {
	gp := getg()
	if gp.m.curg != gp {
		print("panic: ")
		printany(e)
		print("\n")
		throw("panic on system stack")
	}

	if gp.m.mallocing != 0 {
		print("panic: ")
		printany(e)
		print("\n")
		throw("panic during malloc")
	}
	if gp.m.preemptoff != "" {
		print("panic: ")
		printany(e)
		print("\n")
		print("preempt off reason: ")
		print(gp.m.preemptoff)
		print("\n")
		throw("panic during preemptoff")
	}
	if gp.m.locks != 0 {
		print("panic: ")
		printany(e)
		print("\n")
		throw("panic holding locks")
	}

	// Create the panic object and point its link field at the current goroutine's _panic list.
	// Put simply, this is a linked-list insert: the new panic goes to the head of the goroutine's panic list.
	var p _panic
	p.arg = e
	p.link = gp._panic
	gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

	atomic.Xadd(&runningPanicDefers, 1)

	// By calculating getcallerpc/getcallersp here, we avoid scanning the
	// gopanic frame (stack scanning is slow...)
	addOneOpenDeferFrame(gp, getcallerpc(), unsafe.Pointer(getcallersp()))

	for {
		// Fetch gp's deferred calls one by one. Not expanded here, but _defer is stored as a linked list, just like _panic.
		d := gp._defer
		if d == nil {
			break
		}

		// If defer was started by earlier panic or Goexit (and, since we're back here, that triggered a new panic),
		// take defer off list. An earlier panic will not continue running, but we will make sure below that an
		// earlier Goexit does continue running.
		if d.started {
			if d._panic != nil {
				d._panic.aborted = true
			}
			d._panic = nil
			if !d.openDefer {
				// For open-coded defers, we need to process the
				// defer again, in case there are any other defers
				// to call in the frame (not including the defer
				// call that caused the panic).
				d.fn = nil
				gp._defer = d.link
				freedefer(d)
				continue
			}
		}

		// Mark defer as started, but keep on list, so that traceback
		// can find and update the defer's argument frame if stack growth
		// or a garbage collection happens before reflectcall starts executing d.fn.
		d.started = true

		// Record the panic that is running the defer.
		// If there is a new panic during the deferred call, that panic
		// will find d in the list and will mark d._panic (this panic) aborted.
		d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

		done := true
		if d.openDefer {
			done = runOpenDeferFrame(gp, d)
			if done && !d._panic.recovered {
				addOneOpenDeferFrame(gp, 0, nil)
			}
		} else {
			p.argp = unsafe.Pointer(getargp(0))
			reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
		}
		p.argp = nil

		// reflectcall did not panic. Remove d.
		if gp._defer != d {
			throw("bad defer entry in panic")
		}
		d._panic = nil

		// trigger shrinkage to test stack copy. See stack_test.go:TestStackPanic
		//GC()

		pc := d.pc
		sp := unsafe.Pointer(d.sp) // must be pointer so it gets adjusted during stack copy
		if done {
			d.fn = nil
			gp._defer = d.link
			freedefer(d)
		}
		if p.recovered {
			gp._panic = p.link
			if gp._panic != nil && gp._panic.goexit && gp._panic.aborted {
				// A normal recover would bypass/abort the Goexit. Instead,
				// we return to the processing loop of the Goexit.
				gp.sigcode0 = uintptr(gp._panic.sp)
				gp.sigcode1 = uintptr(gp._panic.pc)
				mcall(recovery)
				throw("bypassed recovery failed") // mcall should not return
			}
			atomic.Xadd(&runningPanicDefers, -1)

			if done {
				// Remove any remaining non-started, open-coded
				// defer entries after a recover, since the
				// corresponding defers will be executed normally
				// (inline). Any such entry will become stale once
				// we run the corresponding defers inline and exit
				// the associated stack frame.
				d := gp._defer
				var prev *_defer
				for d != nil {
					if d.openDefer {
						if d.started {
							// This defer is started but we
							// are in the middle of a
							// defer-panic-recover inside of
							// it, so don't remove it or any
							// further defer entries
							break
						}
						if prev == nil {
							gp._defer = d.link
						} else {
							prev.link = d.link
						}
						newd := d.link
						freedefer(d)
						d = newd
					} else {
						prev = d
						d = d.link
					}
				}
			}

			gp._panic = p.link
			// Aborted panics are marked but remain on the g.panic list.
			// Remove them from the list.
			for gp._panic != nil && gp._panic.aborted {
				gp._panic = gp._panic.link
			}
			if gp._panic == nil { // must be done with signal
				gp.sig = 0
			}
			// Pass information about recovering frame to recovery.
			gp.sigcode0 = uintptr(sp)
			gp.sigcode1 = pc
			mcall(recovery)
			throw("recovery failed") // mcall should not return
		}
	}

	// ran out of deferred calls - old-school panic now
	// Because it is unsafe to call arbitrary user code after freezing
	// the world, we call preprintpanics to invoke all necessary Error
	// and String methods to prepare the panic strings before startpanic.
	preprintpanics(gp._panic)

	fatalpanic(gp._panic) // should not return
	*(*int)(nil) = 0      // not reached
}

gorecover
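In the source, getg() returns the current goroutine, and the next line fetches that goroutine's current panic. Then comes the if check: when its condition holds, the recovered field of the panic is set to true, marking the panic as handled, and the panic's argument is returned. When the condition does not hold, no panic was caught and nil is returned.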

// The implementation of the predeclared function recover.
// Cannot split the stack because it needs to reliably
// find the stack segment of its caller.
//
// TODO(rsc): Once we commit to CopyStackAlways,
// this doesn't need to be nosplit.
//go:nosplit
func gorecover(argp uintptr) interface{} {
	// Must be in a function running as part of a deferred call during the panic.
	// Must be called from the topmost function of the call
	// (the function used in the defer statement).
	// p.argp is the argument pointer of that topmost deferred function call.
	// Compare against argp reported by caller.
	// If they match, the caller is the one who can recover.
	gp := getg()
	p := gp._panic
	if p != nil && !p.goexit && !p.recovered && argp == uintptr(p.argp) {
		p.recovered = true
		return p.arg
	}
	return nil
}

Note: recover catches the panic of the frame two levels up the call stack. Exactly one frame, the deferred function itself, must sit between the panicking frame and the call to recover; only then does recover catch the panic. In practice this means recover has to be called directly by the deferred function.
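The rule can be seen in a small experiment. In the sketch below, which uses hypothetical function names, one deferred function calls recover directly while another calls it through a helper, adding one extra frame.

```go
package main

import "fmt"

// nestedRecover calls recover one frame too deep: it is invoked by a
// deferred function rather than being the deferred function itself.
func nestedRecover() interface{} {
	return recover()
}

// demo panics and records what each style of recover returned.
func demo() (direct, nested interface{}) {
	// Registered first, so it runs last: recover is called directly
	// by the deferred function and stops the panic.
	defer func() { direct = recover() }()

	// Registered last, so it runs first: the extra frame makes
	// recover return nil, and the panic keeps unwinding.
	defer func() { nested = nestedRecover() }()

	panic("boom")
}

func main() {
	direct, nested := demo()
	fmt.Println("direct:", direct) // direct: boom
	fmt.Println("nested:", nested) // nested: <nil>
}
```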