Why a server has large numbers of long-lived FIN_WAIT1 sockets
A server was showing a very large number of sockets stuck in the FIN_WAIT1 state.
Environment:
[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.32-358.el6.x86_64
The connection statistics look like this:
ss -s
Total: 2753 (kernel 3336)
TCP: 6046 (estab 730, closed 5001, orphaned 0, synrecv 0, timewait 5000/0), ports 564
Transport Total IP IPv6
* 3336 - -
RAW 1 1 0
UDP 599 599 0
TCP 1045 1037 8
INET 1645 1637 8
FRAG 0 0 0
[root@localhost ~]# ss -t state fin-wait-1
Recv-Q Send-Q Local Address:Port Peer Address:Port
0      384259     server_ip:serverport     10.231.18.150:44581
0      763099     server_ip:serverport     10.231.17.55:45202
0      543095     server_ip:serverport     10.231.22.76:35348
0      2379216    server_ip:serverport     10.231.22.37:56283
0      1237680    server_ip:serverport     10.231.17.161:48815
0      1720677    server_ip:serverport     10.231.16.73:51550
0      619159     server_ip:serverport     10.231.138.28:58986
0      474399     server_ip:serverport     10.231.18.82:45256
0      928420     server_ip:serverport     10.231.20.171:53326
0      27771      server_ip:serverport     10.231.138.38:58963
0      614664     server_ip:serverport     10.231.26.94:51083
0      152535     server_ip:serverport     10.231.19.184:43375
Only part of the output is shown here.
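To get a quick count of how many connections are in this state, a one-liner against /proc/net/tcp also works (a minimal sketch, IPv4 only; field 4 there is the socket state, and the value 04 means FIN_WAIT1):

awk '$4 == "04"' /proc/net/tcp | wc -l

ss -tn state fin-wait-1 | grep -v Recv-Q | wc -l gives the same number and is usually more convenient.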
Looking closely, the send queues of these FIN-WAIT-1 sockets still contain data that has not reached the peer. That is allowed when user space calls close(): what stays buffered is simply data that has not yet been acknowledged.
Next, check how much memory the FIN-WAIT-1 sockets are holding:
ss -tn state fin-wait-1|grep -v Recv-Q|awk 'BEGIN{sum=0}{sum+=$2}END{printf "%.2f\n",sum}'
221797627.00
That is still quite a lot, and this 200-odd MB was measured only after I had already shrunk tcp_max_orphans to a very small value; before I intervened, the figure had reached 16 GB.
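For reference, the orphan limit can be inspected and lowered like this (a sketch; 65536 is only an illustrative number, not a recommendation):

sysctl net.ipv4.tcp_max_orphans                 # show the current limit
sysctl -w net.ipv4.tcp_max_orphans=65536        # example: lower the limit at runtime

Orphaned sockets beyond the limit are reset and destroyed right away, which is why lowering the limit brought the usage down from the 16 GB mentioned above.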
ss also has a -m option for looking at socket memory, though I do not use it much. What it mainly prints is:
if (tb[INET_DIAG_MEMINFO]) {
        const struct inet_diag_meminfo *minfo
                = RTA_DATA(tb[INET_DIAG_MEMINFO]);
        printf(" mem:(r%u,w%u,f%u,t%u)",
               minfo->idiag_rmem,
               minfo->idiag_wmem,
               minfo->idiag_fmem,
               minfo->idiag_tmem);
}
For example:
[root@ZSL-VS3000-3 ~]# ss -miten -o state last-ack|head -8
Recv-Q Send-Q Local Address:Port Peer Address:Port
1      7       49.115.46.140:60422     218.31.255.143:21     timer:(persist,097ms,1) ino:0 sk:ffff880d91cdd080
	 mem:(r0,w558,f3538,t0) sack cubic wscale:9,9 rto:202 rtt:2.625/1.75 ato:40 cwnd:1 ssthresh:7 send 4.4Mbps rcv_space:14600
0      51      49.115.46.140:21        10.157.1.10:54611     timer:(persist,147ms,1) ino:0 sk:ffff88140f9b0340
	 mem:(r0,w602,f3494,t0) sack cubic wscale:9,9 rto:213 rtt:13.625/10.25 ato:40 cwnd:1 ssthresh:7 send 857.2Kbps rcv_space:14600
0      51      49.115.46.140:21        10.157.2.9:59688      timer:(persist,174ms,1) ino:0 sk:ffff880560f88a40
	 mem:(r0,w602,f3494,t0) sack cubic wscale:9,9 rto:219 rtt:19.875/11.75 ato:40 cwnd:1 ssthresh:3 send 587.7Kbps rcv_space:14600
0      51      49.115.46.140:21        10.157.1.9:51252      timer:(persist,003ms,1) ino:0 sk:ffff88012d400d80
Searching online, most of the advice is to tune tcp_fin_timeout, but that parameter is actually about FIN-WAIT-2:
tcp_fin_timeout - INTEGER
	Time to hold socket in state FIN-WAIT-2, if it was closed
	by our side. Peer can be broken and never close its side,
	or even died unexpectedly. Default value is 60sec.
	Usual value used in 2.2 was 180 seconds, you may restore
	it, but remember that if your machine is even underloaded WEB server,
	you risk to overflow memory with kilotons of dead sockets,
	FIN-WAIT-2 sockets are less dangerous than FIN-WAIT-1,
	because they eat maximum 1.5K of memory, but they tend
	to live longer. Cf. tcp_max_orphans.
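In case it is useful, the knob is read and written like any other sysctl (30 below is only an example value; as the text above says, it shortens FIN-WAIT-2 and does nothing for the FIN-WAIT-1 sockets discussed here):

sysctl net.ipv4.tcp_fin_timeout             # default is 60
sysctl -w net.ipv4.tcp_fin_timeout=30       # example: shorten FIN-WAIT-2 lifetime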
The kernel code that puts a socket into TCP_FIN_WAIT1 (in tcp_close()) is:
	} else if (tcp_close_state(sk)) {
		/* We FIN if the application ate all the data before
		 * zapping the connection.
		 */

		/* RED-PEN. Formally speaking, we have broken TCP state
		 * machine. State transitions:
		 *
		 * TCP_ESTABLISHED -> TCP_FIN_WAIT1
		 * TCP_SYN_RECV -> TCP_FIN_WAIT1 (forget it, it's impossible)
		 * TCP_CLOSE_WAIT -> TCP_LAST_ACK
		 *
		 * are legal only when FIN has been sent (i.e. in window),
		 * rather than queued out of window. Purists blame.
		 *
		 * F.e. "RFC state" is ESTABLISHED,
		 * if Linux state is FIN-WAIT-1, but FIN is still not sent.
		 *
		 * The visible declinations are that sometimes
		 * we enter time-wait state, when it is not required really
		 * (harmless), do not send active resets, when they are
		 * required by specs (TCP_ESTABLISHED, TCP_CLOSE_WAIT, when
		 * they look as CLOSING or LAST_ACK for Linux)
		 * Probably, I missed some more holelets.
		 * --ANK
		 */
		tcp_send_fin(sk);
In other words, the socket is first switched to FIN-WAIT-1 (inside tcp_close_state()) and only then is the FIN segment sent.
In theory, FIN-WAIT-1 should be hard to catch: as soon as the peer ACKs our FIN, the socket moves on to FIN-WAIT-2, and if the peer's own FIN arrives it moves to CLOSING. Yet this server has plenty of them. Could it be that neither the ACK nor the FIN is arriving? Time to capture packets and check.
So I captured FIN packets: tcpdump -i eth0 "tcp[tcpflags] & (tcp-fin) != 0" -s 90 -p -n
In the captures I did see FIN retransmissions, and sometimes the peer happened to be advertising a zero window at that moment, so the socket ends up sitting in FIN-WAIT-1 for a long time.
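The zero-window advertisements themselves can also be filtered for directly. A sketch, reusing the interface and snap length from the command above (the raw window field sits at bytes 14-15 of the TCP header, and an on-wire value of 0 means a zero window regardless of window scaling):

tcpdump -i eth0 -s 90 -p -n 'tcp[14:2] = 0 and (tcp[tcpflags] & tcp-ack != 0)'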
For example, once I spot a connection in FIN-WAIT-1 I can capture the exchange that follows it:
[root@localhost ~]# ss -to state Fin-wait-1 |head -200 |grep -i 10.96.152.219
0      387543     server_ip:port     client_ip:port     timer:(persist,1.140ms,0)
Here we can see the persist timer, i.e. the zero-window probe timer. Using a script I collected the FIN-WAIT-1 connections; on some of them the timer had backed off all the way to 120 s (the kernel's TCP_RTO_MAX, the ceiling of the exponential backoff) and then kept probing at 120 s intervals.
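To watch the backoff happen on a single connection, sampling its timer periodically is enough (10.231.18.150 below is just one of the peer addresses from the earlier listing):

# print the connection's timer every 5 seconds; the interval shown in the
# persist timer field grows with each unanswered zero-window probe
watch -n 5 'ss -tno state fin-wait-1 dst 10.231.18.150'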
The idea behind the script is as follows:
#!/bin/bash
while [ 1 ]
do
ss -tn state Fin-wait-1 src server_ip |awk '{print $4}'|grep -v server_port1|grep -v server_port2|sort -n >caq_old
ss -tn state Fin-wait-1 src server_ip |awk '{print $4}'|grep -v server_port1|grep -v server_port2|sort -n >caq_new
diff caq_old caq_new|grep '>'|awk '{print $2}' >diff_caq
sleep 1
### 1 second later, grab the FIN-WAIT-1 connections again; if a newly appeared connection is still there,
### it has lived for at least 1 s -- the point is to catch the ones whose persist timer has grown long
ss -tn state Fin-wait-1 src server_ip |awk '{print $4}'|grep -v server_port1|grep -v server_port2|sort -n >caq_new
while read line
do
grep -q $line caq_new
if [ $? -eq 0 ]
then
### lived for at least 1 second
echo $line
exit
else
continue
fi
done < diff_caq
done
Then capture packets for it:
./get_fin.sh |awk -F '[:]' '{print $2}'|xargs -t tcpdump -i eth1 -s 90 -c 200 -p -n port
Part of the capture (the client keeps advertising win 0, so the server's zero-window probes keep going and the socket stays in FIN-WAIT-1):
20:34:26.355115 IP server_ip.port > client_ip.port: Flags [.], ack 785942062, win 123, length 0
20:34:26.358210 IP client_ip.port > server_ip.port: Flags [.], ack 1, win 0, length 0
20:34:28.159117 IP server_ip.port > client_ip.port: Flags [.], ack 1, win 123, length 0
20:34:28.162169 IP client_ip.port > server_ip.port: Flags [.], ack 1, win 0, length 0
20:34:31.761502 IP server_ip.port > client_ip.port: Flags [.], ack 1, win 123, length 0
20:34:31.765416 IP client_ip.port > server_ip.port: Flags [.], ack 1, win 0, length 0