Causes of and Fixes for the CLOSE_WAIT State (repost)
Reposted from: http://blog.chinaunix.net/uid-20357359-id-1963662.html
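I had never paid much attention to this problem until it came up in recent job interviews. I interviewed at two companies and both asked about it, so I had to start taking it seriously. If you are not familiar with the CLOSE_WAIT state, have a look at the TCP state transition diagram first.
Closing a socket falls into two cases: active closure and passive closure. Active closure is a close initiated by the local host. Passive closure means the local host detects that the remote host has initiated a close and responds to it, which closes the whole connection. Extracting the close-related part of the state transitions gives the diagram below: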
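Causes
Using the diagram, let us work out when a connection is in the CLOSE_WAIT state.
During a passive close, the connection is in CLOSE_WAIT from the moment the peer's FIN has been received until our own FIN has been sent.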
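Normally the CLOSE_WAIT state should be very short-lived, just like SYN_RCVD. In some special cases, however, connections stay in CLOSE_WAIT for a long time.
A large number of close_wait connections usually means that the peer has closed its socket while our side was busy reading or writing and never closed the connection. The code has to check the socket: as soon as read returns 0, close the connection; if read returns a negative value, check errno, and if it is not EAGAIN, close the connection as well.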
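Reference 4 describes producing CLOSE_WAIT connections by sending SYN-FIN packets. I have not tested this myself, but I believe the protocol stack will discard such illegal packets. If you are interested, try it and tell me the result ;-)
To make the problem clearer, let us write a test program. Note that this test program is deliberately defective.
All we need is a situation where the peer has closed its socket while we are still reading, or where we simply never close the socket at all.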
server.c:
    #include <stdio.h>
    #include <string.h>
    #include <strings.h>     /* bzero */
    #include <ctype.h>       /* toupper */
    #include <unistd.h>      /* read, write, close, sleep */
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>   /* inet_ntop */

    #define MAXLINE 80
    #define SERV_PORT 8000

    int main(void)
    {
        struct sockaddr_in servaddr, cliaddr;
        socklen_t cliaddr_len;
        int listenfd, connfd;
        char buf[MAXLINE];
        char str[INET_ADDRSTRLEN];
        int i, n;

        listenfd = socket(AF_INET, SOCK_STREAM, 0);

        int opt = 1;
        setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

        bzero(&servaddr, sizeof(servaddr));
        servaddr.sin_family = AF_INET;
        servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
        servaddr.sin_port = htons(SERV_PORT);

        bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr));
        listen(listenfd, 20);

        printf("Accepting connections ...\n");
        while (1) {
            cliaddr_len = sizeof(cliaddr);
            connfd = accept(listenfd,
                    (struct sockaddr *)&cliaddr, &cliaddr_len);
            //while (1)
            {
                n = read(connfd, buf, MAXLINE);
                if (n == 0) {
                    printf("the other side has been closed.\n");
                    /* The inner loop is commented out, so this leaves the accept loop. */
                    break;
                }
                if (n < 0) {
                    perror("read");
                    break;
                }
                printf("received from %s at PORT %d\n",
                       inet_ntop(AF_INET, &cliaddr.sin_addr, str, sizeof(str)),
                       ntohs(cliaddr.sin_port));

                for (i = 0; i < n; i++)
                    buf[i] = toupper((unsigned char)buf[i]);
                write(connfd, buf, n);
            }
            // Deliberately do not close the socket here; adding a sleep before close() works too.
            //sleep(5);
            //close(connfd);
        }
        return 0;
    }
client.c:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <strings.h>     /* bzero */
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>   /* inet_pton */

    #define MAXLINE 80
    #define SERV_PORT 8000

    int main(int argc, char *argv[])
    {
        struct sockaddr_in servaddr;
        char buf[MAXLINE];
        int sockfd, n;
        char *str;

        if (argc != 2) {
            fputs("usage: ./client message\n", stderr);
            exit(1);
        }
        str = argv[1];

        sockfd = socket(AF_INET, SOCK_STREAM, 0);

        bzero(&servaddr, sizeof(servaddr));
        servaddr.sin_family = AF_INET;
        inet_pton(AF_INET, "127.0.0.1", &servaddr.sin_addr);
        servaddr.sin_port = htons(SERV_PORT);

        connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));
        write(sockfd, str, strlen(str));

        n = read(sockfd, buf, MAXLINE);
        printf("Response from server:\n");
        fflush(stdout);          /* keep stdio output ahead of the raw write() below */
        if (n > 0)
            write(STDOUT_FILENO, buf, n);
        write(STDOUT_FILENO, "\n", 1);

        /* The client closes first, so the server side is the passive closer. */
        close(sockfd);
        return 0;
    }
The result is as follows:

    debian-wangyao:~$ ./client a
    Response from server:
    A
    debian-wangyao:~$ ./client b
    Response from server:
    B
    debian-wangyao:~$ ./client c
    Response from server:
    C
    debian-wangyao:~$ netstat -antp | grep CLOSE_WAIT
    (Not all processes could be identified, non-owned process info
     will not be shown, you would have to be root to see it all.)
    tcp        1      0 127.0.0.1:8000          127.0.0.1:58309         CLOSE_WAIT  6979/server
    tcp        1      0 127.0.0.1:8000          127.0.0.1:58308         CLOSE_WAIT  6979/server
    tcp        1      0 127.0.0.1:8000          127.0.0.1:58307         CLOSE_WAIT  6979/server
Solutions
The basic idea is to detect sockets that the peer has already closed, and then close them.
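1. The code has to check the socket: as soon as read returns 0, close the connection; if read returns a negative value, check errno, and if it is not EAGAIN, close the connection as well. (Note: Figure 7.6 in Section 7.5 of UNP shows that select can detect that the peer has sent a FIN; combined with this rule, CLOSE_WAIT connections can then be handled.)
2. Give every socket a timestamp, last_update, and refresh it with the current time whenever data is received or sent successfully. Check all timestamps periodically, and close any socket whose timestamp is older than the current time by more than a chosen threshold.
3. Use a heart-beat thread that periodically sends heartbeat packets in an agreed format on the socket. If an RST comes back from the peer, the peer has already closed its socket, so we close ours too.
4. Set the SO_KEEPALIVE option and tune the kernel parameters.
The prerequisite is to enable the KEEPALIVE mechanism on the socket: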
// Enable KEEPALIVE on the socket connection
int iKeepAlive = 1;
setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, (void *)&iKeepAlive, sizeof(iKeepAlive));
tcp_keepalive_intvl (integer; default: 75; since Linux 2.4)
       The number of seconds between TCP keep-alive probes.
tcp_keepalive_probes (integer; default: 9; since Linux 2.2)
       The maximum number of TCP keep-alive probes to send before giving up and killing the connection if no response is obtained from the other end.
tcp_keepalive_time (integer; default: 7200; since Linux 2.2)
       The number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes. Keep-alives are only sent when the SO_KEEPALIVE socket option is enabled. The default value is 7200 seconds (2 hours). An idle connection is terminated after approximately an additional 11 minutes (9 probes an interval of 75 seconds apart) when keep-alive is enabled.
echo 120 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 2 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 1 > /proc/sys/net/ipv4/tcp_keepalive_probes
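Besides changing the kernel parameters, you can also set these values per socket with setsockopt; see man 7 socket (SO_KEEPALIVE) and man 7 tcp (TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL).

    /* Requires <sys/socket.h>, <netinet/in.h> and <netinet/tcp.h>. */
    int KeepAliveProbes = 1;    /* probes sent before giving up */
    int KeepAliveIntvl = 2;     /* seconds between probes */
    int KeepAliveTime = 120;    /* idle seconds before the first probe */
    setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, (void *)&KeepAliveProbes, sizeof(KeepAliveProbes));
    setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, (void *)&KeepAliveTime, sizeof(KeepAliveTime));
    setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, (void *)&KeepAliveIntvl, sizeof(KeepAliveIntvl));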
References:
http://blog.chinaunix.net/u/20146/showart_1217433.html
http://blog.csdn.net/eroswang/archive/2008/03/10/2162986.aspx
http://haka.sharera.com/blog/BlogTopic/32309.htm
http://learn.akae.cn/media/ch37s02.html
http://faq.csdn.net/read/208036.html
http://www.cndw.com/tech/server/2006040430203.asp
http://davidripple.bokee.com/1741575.html
http://doserver.net/post/keepalive-linux-1.php
man 7 tcp
Reposted at: https://www.cnblogs.com/davidwang456/p/3717640.html