iptables or jbd2: Which Is the Culprit?

1. Incident Overview

On January 6th, monitoring raised an alert that the slave process on one system's MySQL standby had died, so I logged into the server to diagnose it.

2. Checking the Slave Status

# ps -ef | grep mysqld
root      1819     1  0  2016 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysqldata --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql     2349  1819  1  2016 ?        3-14:55:20 /usr/sbin/mysqld --basedir=/usr --datadir=/data/mysqldata --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/mysql-slave.err --open-files-limit=65535 --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306

On the server, the mysqld process was still alive and I could log into the database with the mysql client, but checking the slave status hung:

# mysql -uroot -pxxxxxx
mysql> show slave status\G
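When a status check hangs like this, it can take the diagnostic session down with it. A small precaution for later attempts, as a sketch assuming GNU coreutils timeout is installed:

# timeout 5 mysql -uroot -pxxxxxx -e 'show slave status\G'   # give up after 5 seconds instead of hanging the shell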

3. Checking the MySQL Error Log

The MySQL error log, /var/lib/mysql/mysql-slave.err, contained the following:

--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=2349, Main thread ID=140679337576192, state: flushing log
Number of rows inserted 54516268, updated 15115416, deleted 1939641, read 101480991083
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
......
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 1; buffer pool: 1
22963 OS file reads, 84948224 OS file writes, 59117846 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
......

The output above shows that MySQL's main thread is flushing the log but is waiting for aio requests to complete; note the pending fsync on both the log and the buffer pool. This pointed to an I/O problem, yet the system log /var/log/messages contained no I/O errors at all.
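A quick way to confirm that processes really are stuck on I/O is to list tasks in uninterruptible (D) sleep together with the kernel function they are waiting in; a sketch assuming a procps-style ps:

# ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'   # D state = uninterruptible sleep, usually blocked on I/O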

4. Tracing System Calls with strace

Since MySQL offers no other way to see what its background threads are doing at this level, I used strace to attach to the mysqld process and watch its system calls.

# strace -p 2349 2>&1|grep -v time
Process 2349 attached – interrupt to quit
^C

strace captured no system calls at all before I interrupted it; the mysqld process appeared to be completely stuck.
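One caveat: strace -p attaches only to the thread whose ID you give it, and mysqld does its real work in other threads, so -f gives a fuller picture. When even that shows nothing, the kernel-side stack of each thread can still be read from procfs, assuming the kernel exposes /proc/<pid>/stack (RHEL 6 kernels generally do):

# strace -f -p 2349                                                  # -f follows all threads, not just the main one
# for t in /proc/2349/task/*; do echo "== $t"; cat $t/stack; done    # kernel stack of every thread (run as root)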

5. Checking the Disks

This pattern usually indicates a storage failure that has left the process hung on I/O. Again, the system log /var/log/messages showed no storage I/O errors, and a write test on the file system looked normal:

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-LogVol00
                      5.0G  381M  4.3G   8% /
tmpfs                  32G   72K   32G   1% /dev/shm
/dev/vda1             194M   33M  152M  18% /boot
/dev/mapper/VolGroup-LogVol01
                      4.0G  2.8G  1.1G  73% /home
/dev/mapper/VolGroup-LogVol02
                      7.9G  235M  7.3G   4% /opt
/dev/mapper/VolGroup-LogVol03
                       12G  7.4G  3.9G  66% /usr
/dev/mapper/VolGroup-LogVol04
                      7.9G  2.5G  5.1G  33% /var
/dev/mapper/VolGroup-lv_data
                       99G   28G   66G  30% /data

# time cp MySQL-test-5.6.22-1.el6.x86_64.rpm /data
real    0m0.502s
user    0m0.001s
sys     0m0.344s
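A caveat about this test: cp returns once the data is in the page cache, so it can succeed even while fsync is wedged. A test closer to what mysqld was actually blocked on, as a sketch using GNU dd (the file name /data/fsync_test is just an example):

# dd if=/dev/zero of=/data/fsync_test bs=4k count=1 conv=fsync   # conv=fsync forces an fsync(2) after the write
# rm -f /data/fsync_test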

# ls -l /dev/mapper/*
crw-rw---- 1 root root 10, 58 Apr  4 2016 /dev/mapper/control
lrwxrwxrwx 1 root root      7 Apr  4 2016 /dev/mapper/VolGroup-LogVol00 -> ../dm-1
lrwxrwxrwx 1 root root      7 Apr  4 2016 /dev/mapper/VolGroup-LogVol01 -> ../dm-3
lrwxrwxrwx 1 root root      7 Apr  4 2016 /dev/mapper/VolGroup-LogVol02 -> ../dm-5
lrwxrwxrwx 1 root root      7 Apr  8 2016 /dev/mapper/VolGroup-LogVol03 -> ../dm-4
lrwxrwxrwx 1 root root      7 Apr  4 2016 /dev/mapper/VolGroup-LogVol04 -> ../dm-2
lrwxrwxrwx 1 root root      7 Apr  4 2016 /dev/mapper/VolGroup-LogVol05 -> ../dm-0
lrwxrwxrwx 1 root root      7 Apr  4 2016 /dev/mapper/VolGroup-lv_data -> ../dm-6

# iostat -dmx 1
Linux 2.6.32-358.el6.x86_64 (bssyyjldb01)       01/06/2017      _x86_64_        (8 CPU)
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
vda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     6.00    0.00   0.00 100.00
vdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.10
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.10
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.10
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.10
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.10

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
vda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     5.94    0.00   0.00  99.01
vdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.99    0.00   0.00  99.01
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.99    0.00   0.00  99.01
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.99    0.00   0.00  99.01
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.99    0.00   0.00  99.01
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.99    0.00   0.00  99.01

The dm devices' statistics look odd: there is essentially no read or write activity, yet %util is pinned at 100% and avgqu-sz is non-zero, which means requests are sitting in the device queue and never completing.
At this point the platform engineers live-migrated the VM (fully online, with no downtime), but the situation did not improve after the migration.
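One way to confirm that requests really are stuck inside the device rather than merely queued by the scheduler is to look at the in-flight counters, assuming the kernel exposes them in sysfs:

# cat /sys/block/vda/inflight   # "reads writes" issued to the device and not yet completed; nonzero with zero r/s and w/s means I/O is stuck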

6. Checking dmesg

/var/log/messages contained nothing, but dmesg had the following:

# dmesg
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-358.el6.x86_64 (mockbuild@x86-022.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Tue Jan 29 11:47:41 EST 2013
Command line: ro root=/dev/mapper/VolGroup-LogVol00 intel_iommu=on rd_NO_LUKS rd_LVM_LV=VolGroup/LogVol05 rd_NO_MD rd_LVM_LV=VolGroup/LogVol00 LANG=zh_CN.UTF-8  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
......
INFO: task mysqld:2371 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld        D 0000000000000000     0  2371   1819 0x00000000
 ffff881004e63cc8 0000000000000082 0000000004e63c28 ffff881002e24ee8
 ffff881002e24ee8 ffff881002e24ee8 0000000000000cc1 0000000000000005
 ffff880ffc3a3af8 ffff881004e63fd8 000000000000fb88 ffff880ffc3a3af8
Call Trace:
 [<ffffffff8125e6f8>] ? blk_queue_bio+0x168/0x5a0
 [<ffffffff8150e555>] schedule_timeout+0x215/0x2e0
 [<ffffffff8125ccce>] ? generic_make_request+0x25e/0x530
 [<ffffffff8111c495>] ? mempool_alloc_slab+0x15/0x20
 [<ffffffff8150e1d3>] wait_for_common+0x123/0x180
 [<ffffffff81063310>] ? default_wake_function+0x0/0x20
 [<ffffffff8125d02d>] ? submit_bio+0x8d/0x120
 [<ffffffff8150e2ed>] wait_for_completion+0x1d/0x20
 [<ffffffff812606ad>] __blkdev_issue_flush+0xad/0xe0
 [<ffffffff812606f6>] blkdev_issue_flush+0x16/0x20
 [<ffffffffa00962c1>] ext4_sync_file+0x191/0x260 [ext4]
 [<ffffffff811b1a31>] vfs_fsync_range+0xa1/0xe0
 [<ffffffff811b1add>] vfs_fsync+0x1d/0x20
 [<ffffffff811b1b1e>] do_fsync+0x3e/0x60
 [<ffffffff811b1b70>] sys_fsync+0x10/0x20
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
.....
INFO: task mysqld:2373 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld        D 0000000000000000     0  2373   1819 0x00000000
 ffff881004e67df8 0000000000000082 ffff881004e67da8 ffff881003c0be80
 0000000000000000 ffff881003e290d0 0000000000000000 0000000000000000
 ffff880ffc0b9af8 ffff881004e67fd8 000000000000fb88 ffff880ffc0b9af8
Call Trace:
 [<ffffffff81096f6e>] ? prepare_to_wait+0x4e/0x80
 [<ffffffffa007bb45>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
 [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa009628e>] ext4_sync_file+0x15e/0x260 [ext4]
 [<ffffffff811b1a31>] vfs_fsync_range+0xa1/0xe0
 [<ffffffff811b1add>] vfs_fsync+0x1d/0x20
 [<ffffffff811b1b1e>] do_fsync+0x3e/0x60
 [<ffffffff811b1b70>] sys_fsync+0x10/0x20
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task mysqld:10922 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld        D 0000000000000000     0 10922   1819 0x00000000
 ffff880c63427df8 0000000000000082 0000000000000000 ffff880ffc5001e8
 ffffea003798d960 ffff880c63427de8 0000000000000000 00000000fffffffb
 ffff880c635e9098 ffff880c63427fd8 000000000000fb88 ffff880c635e9098
Call Trace:
 [<ffffffffa007bb45>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
 [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa009628e>] ext4_sync_file+0x15e/0x260 [ext4]
 [<ffffffff811b1a31>] vfs_fsync_range+0xa1/0xe0
 [<ffffffff811b1add>] vfs_fsync+0x1d/0x20
 [<ffffffff811b1b1e>] do_fsync+0x3e/0x60
 [<ffffffff811b1b70>] sys_fsync+0x10/0x20
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task jbd2/dm-2-8:1226 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/dm-2-8   D 0000000000000002     0  1226      2 0x00000000
 ffff881005311c20 0000000000000046 ffff8810ffffffff 0000e6f2a8edaca7
 ffff881005311b90 ffff881004b39280 000000004c565096 ffffffffa8fe2692
 ffff881003edc5f8 ffff881005311fd8 000000000000fb88 ffff881003edc5f8
Call Trace:
 [<ffffffff810a1aa9>] ? ktime_get_ts+0xa9/0xe0
 [<ffffffff811b5f60>] ? sync_buffer+0x0/0x50
 [<ffffffff8150de73>] io_schedule+0x73/0xc0
 [<ffffffff811b5fa0>] sync_buffer+0x40/0x50
 [<ffffffff8150e82f>] __wait_on_bit+0x5f/0x90
 [<ffffffff811b5f60>] ? sync_buffer+0x0/0x50
 [<ffffffff8150e8d8>] out_of_line_wait_on_bit+0x78/0x90
 [<ffffffff81096cc0>] ? wake_bit_function+0x0/0x50
 [<ffffffff811b5f56>] __wait_on_buffer+0x26/0x30
 [<ffffffffa0076761>] jbd2_journal_commit_transaction+0x1131/0x14b0 [jbd2]
 [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
 [<ffffffff81081a3b>] ? try_to_del_timer_sync+0x7b/0xe0
 [<ffffffffa007bf38>] kjournald2+0xb8/0x220 [jbd2]
 [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa007be80>] ? kjournald2+0x0/0x220 [jbd2]
 [<ffffffff81096916>] kthread+0x96/0xa0
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffff81096880>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
ip_tables: (C) 2000-2006 Netfilter Core Team
nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
ip_tables: (C) 2000-2006 Netfilter Core Team
nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
ip_tables: (C) 2000-2006 Netfilter Core Team
nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
NOHZ: local_softirq_pending 100
Clocksource tsc unstable (delta = 132855152 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.

The mysqld stack traces in dmesg show its threads blocked inside fsync, which matches the state in the MySQL error log, where mysqld was reported as flushing the log.
dmesg also shows some jbd2/dm-x-x kernel threads blocked. Reading the traces together, mysqld's fsync is waiting in jbd2_log_wait_commit for the journal commit to finish, while the jbd2 thread itself is stuck in __wait_on_buffer waiting for the block layer to complete its writes; everything is queued behind I/O that never finishes.
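Incidentally, these reports come from the kernel's hung-task detector, which flags tasks stuck in D state for longer than the configured window; the same blocked-task dump can be requested on demand via sysrq, assuming sysrq is enabled (kernel.sysrq=1):

# cat /proc/sys/kernel/hung_task_timeout_secs   # the 120-second window the messages refer to
# echo w > /proc/sysrq-trigger                  # dump all blocked (D state) tasks to the kernel log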

7. What Is the jbd2 Process?

jbd2 is the kernel thread that writes the file system journal. It has been involved in a known kernel bug on ext4; a Red Hat engineer we consulted gave the following explanation:
This is a known bug. By default Linux uses up to 40% of the available memory for file system caching. After this mark has been reached, the file system flushes all outstanding data to disk, causing all following IOs to go synchronous. For flushing this data out to disk there is a time limit of 120 seconds by default. In the case here the IO subsystem is not fast enough to flush the data within 120 seconds. This especially happens on systems with a lot of memory.
The problem is solved in later kernels and there is no "fix" from Oracle. I fixed this by lowering the mark for flushing the cache from 40% to 10% by setting "vm.dirty_ratio=10" in /etc/sysctl.conf. This setting does not influence overall database performance since you hopefully use Direct IO and bypass the file system cache completely.
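As a sketch of the workaround described above (the value 10 is the engineer's suggestion, not something tuned for this system):

# sysctl vm.dirty_ratio                          # check the current value first
# echo 'vm.dirty_ratio = 10' >> /etc/sysctl.conf # persist the lower flush threshold
# sysctl -p                                      # apply it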

For details on this jbd2 bug, see: Fixing the ext4 journal (jbd2) bug.

8. iptables

The dmesg output also contained iptables and nf_conntrack module-load messages, repeated several times, which suggests the service had been restarted more than once. A check confirmed the iptables service was running, so after clearing it with the platform engineers we stopped it temporarily. As soon as service iptables stop completed, the MySQL error log showed the background process crashing and restarting itself, and the earlier show slave status\G check returned with an error. The log reads as follows:

Process ID=2349, Main thread ID=140679337576192, state: flushing log
Number of rows inserted 54516268, updated 15115416, deleted 1939641, read 101480991101
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
InnoDB: ###### Diagnostic info printed to the standard error stream
2017-01-06T06:53:43.268138Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2017-01-06 14:53:43 0x7ff27731a700  InnoDB: Assertion failure in thread 140679358555904 in file ut0ut.cc line 920
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
06:53:43 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=134217728
read_buffer_size=33554432
max_used_connections=25
max_threads=100
thread_count=11
connection_count=9
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6686004 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0xf18335]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x7aca44]
/lib64/libpthread.so.0[0x364c80f500]
/lib64/libc.so.6(gsignal+0x35)[0x364c4328a5]
/lib64/libc.so.6(abort+0x175)[0x364c434085]
/usr/sbin/mysqld(_Z18ut_print_timestampP8_IO_FILE+0x0)[0x79b907]
/usr/sbin/mysqld(_ZN2ib5fatalD1Ev+0xb3)[0x111c8c3]
/usr/sbin/mysqld(srv_error_monitor_thread+0x98e)[0x10ccc1e]
/lib64/libpthread.so.0[0x364c807851]
/lib64/libc.so.6(clone+0x6d)[0x364c4e890d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
2017-01-06T06:59:35.821617Z mysqld_safe Number of processes running now: 0
2017-01-06T06:59:35.829608Z mysqld_safe mysqld restarted
2017-01-06T06:59:36.539336Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details)
......

What followed was a normal restart.
Talking with the platform engineers, we learned that iptables had only recently been enabled as part of security hardening.
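For reference, the RHEL 6 service commands involved; note that on RHEL 6, service iptables stop not only flushes the rules but by default also unloads the iptables/nf_conntrack kernel modules:

# service iptables status   # confirm the firewall is active and list its rules
# service iptables stop     # flush rules and unload the netfilter modules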

9. Tentative Suggestions

Per uptime, the server had been running for 278 days without a reboot, which fits the jbd2 bug's precondition of a long accumulation period. On the other hand, judging from how the incident played out, the mysqld process only reacted once iptables was stopped, so iptables looks like the more probable cause.
For now iptables has been turned back on and we will keep observing; if the problem recurs, the first step will be to stop iptables and confirm whether it is indeed the cause.
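If iptables does turn out to be implicated, one additional thing worth checking next time (assuming RHEL 6's procfs paths) is whether the connection-tracking table is approaching the 65536-entry limit that dmesg reported, since a full table causes the kernel to drop new connections:

# cat /proc/sys/net/netfilter/nf_conntrack_count   # entries currently tracked
# cat /proc/sys/net/netfilter/nf_conntrack_max     # the limit (65536 per the dmesg line above)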
