binlog文件系统满导致MySQL故障处理

1. 故障概述

短信告警,有套MySQL主库无法连接,

2. 故障诊断

登录到服务器检查主库,iptables没有启动。

# service iptables status
iptables:未运行防火墙。

登录数据库报告超过最大连接数

# mysql -uroot -pxxxxxx
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1040 (HY000): Too many connections

数据库的最大连接数是200,这套MySQL库平时的连接数只有30左右,难道是连接数暴涨了?

# cat /etc/my.cnf | grep connections
max_connections = 200

检查MySQL的告警日志,日志为:/var/lib/mysql/mysql-master.err

# ps -ef | grep mysql
root     22992     1  0  2016 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysqldata --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql    23532 22992 11  2016 ?        14-14:36:29 /usr/sbin/mysqld --basedir=/usr --datadir=/data/mysqldata --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/mysql-master.err --open-files-limit=65535 --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306
root     24550 24493  0 08:43 pts/6    00:00:00 tail -1000f /var/lib/mysql/mysql-master.err
xxxxxxx  26601 26413  0 08:54 pts/7    00:00:00 mysql -uroot -px xxxxxxxxxxx

检查/var/lib/mysql/mysql-master.err日志发现如下内容:

2017-02-13T11:21:29.943544Z 7944633 [Note] Aborted connection 7944633 to db: 'error_handling' user: 'xxxxxxxx' host: 'xxx.xxx.25.66' (Got timeout reading communication packets)
2017-02-13T17:01:00.743559Z 7946502 [ERROR] Disk is full writing '/var/lib/mysql/mysql-master-bin.000027' (Errcode: 15821648 - No space left on device). Waiting for someone to free space...
2017-02-13T17:01:00.743609Z 7946502 [ERROR] Retry in 60 secs. Message reprinted in 600 secs
2017-02-13T17:11:00.805313Z 7946502 [ERROR] Disk is full writing '/var/lib/mysql/mysql-master-bin.000027' (Errcode: 15821648 - No space left on device). Waiting for someone to free space...
2017-02-13T17:11:00.805398Z 7946502 [ERROR] Retry in 60 secs. Message reprinted in 600 secs
2017-02-13T17:21:00.829410Z 7946502 [ERROR] Disk is full writing '/var/lib/mysql/mysql-master-bin.000027' (Errcode: 15821648 - No space left on device). Waiting for someone to free space...
2017-02-13T17:21:00.829475Z 7946502 [ERROR] Retry in 60 secs. Message reprinted in 600 secs
2017-02-13T17:31:00.832859Z 7946502 [ERROR]2017-02-14T00:45:00.979481Z 7946707 [Note] Aborted connection 7946707 to db: 'unconnected' user: 'root' host: 'localhost' (Got an error writing communication packets)

检查文件系统发现/var满了

# df -h
文件系统              容量  已用  可用 已用%% 挂载点
/dev/mapper/VolGroup-LogVol00
                      5.0G  381M  4.3G   8% /
tmpfs                  32G   72K   32G   1% /dev/shm
/dev/vda1             194M   33M  152M  18% /boot
/dev/mapper/VolGroup-LogVol01
                      4.0G  2.1G  1.8G  55% /home
/dev/mapper/VolGroup-LogVol02
                      7.9G  482M  7.1G   7% /opt
/dev/mapper/VolGroup-LogVol03
                       12G  7.4G  4.0G  66% /usr
/dev/mapper/VolGroup-LogVol04
                      7.9G  7.5G     0 100% /var
/dev/mapper/VolGroup-lv_data
                       99G   49G   46G  52% /data

# cd /var/lib/mysql/
# ls -l
总用量 7474104
-rw-rw---- 1 mysql mysql   12582912 3月  30 2016 ibdata1
-rw-rw---- 1 mysql mysql   50331648 3月  30 2016 ib_logfile0
-rw-rw---- 1 mysql mysql   50331648 3月  30 2016 ib_logfile1
drwx--x--x 2 mysql mysql       4096 3月  30 2016 mysql
-rw-r----- 1 mysql mysql 1073743209 11月 29 16:24 mysql-master-bin.000021
-rw-r----- 1 mysql mysql 1073777330 12月 10 01:08 mysql-master-bin.000022
-rw-r----- 1 mysql mysql 1073772911 12月 23 01:01 mysql-master-bin.000023
-rw-r----- 1 mysql mysql 1087921538 1月   3 01:00 mysql-master-bin.000024
-rw-r----- 1 mysql mysql 1073741920 1月  16 17:02 mysql-master-bin.000025
-rw-r----- 1 mysql mysql 1073742990 2月   3 17:43 mysql-master-bin.000026
-rw-r----- 1 mysql mysql  832425984 2月  14 08:43 mysql-master-bin.000027
-rw-r----- 1 mysql mysql        273 2月   3 17:43 mysql-master-bin.index
-rw-r----- 1 mysql mysql   34705408 2月  14 08:41 mysql-master.err
-rw-r----- 1 mysql mysql  136130560 2月  14 08:31 mysql-slow.log
srwxrwxrwx 1 mysql mysql          0 10月 10 10:07 mysql.sock
-rw------- 1 mysql mysql          6 10月 10 10:07 mysql.sock.lock
drwx------ 2 mysql mysql       4096 3月  30 2016 performance_schema
-rw-r----- 1 root  root    80192864 1月  13 15:08 queryslow.log
-rw-r--r-- 1 root  root         125 3月  30 2016 RPM_UPGRADE_HISTORY
-rw-r--r-- 1 mysql mysql        125 3月  30 2016 RPM_UPGRADE_MARKER-LAST
drwxr-xr-x 2 mysql mysql       4096 3月  30 2016 test

手工移动几个bin日志到其它目录后,把/var/文件系统释放部分空间后,MySQL服务器恢复可用。

# mv mysql-master-bin.000021 /data/
# mv mysql-master-bin.000022 /data/

3. 其它

检查MySQL的数据库日志,发现MySQL自已检查文件时缺binlog文件,这两个文件已经被我移走了。

2017-02-14T00:48:23.323514Z 7948258 [Note] Access denied for user 'xxxxxxx'@'localhost' (using password: NO)
2017-02-14T00:56:46.440115Z 7949468 [Note] Access denied for user 'xxxxxxxx'@'localhost' (using password: YES)
2017-02-14T00:59:22.341924Z 7949750 [ERROR] Failed to open log (file '/var/lib/mysql/mysql-master-bin.000021', errno 2)
2017-02-14T00:59:22.341991Z 7949750 [ERROR] Could not open log file
2017-02-14T00:59:22.342052Z 7949750 [ERROR] Failed to open log (file '/var/lib/mysql/mysql-master-bin.000022', errno 2)
2017-02-14T00:59:22.342064Z 7949750 [ERROR] Could not open log file
2017-02-14T01:00:35.387418Z 7949750 [Note] Failed to execute mysql_file_stat on file '/var/lib/mysql/mysql-master-bin.000021'
2017-02-14T01:00:35.387518Z 7949750 [Note] Failed to execute mysql_file_stat on file '/var/lib/mysql/mysql-master-bin.000022'

查看MySQL的binlog的索引文件,发现MySQL自已已经移除


# cat mysql-master-bin.index
/var/lib/mysql/mysql-master-bin.000024
/var/lib/mysql/mysql-master-bin.000025
/var/lib/mysql/mysql-master-bin.000026
/var/lib/mysql/mysql-master-bin.000027

如果binlog.index文件里的日志已经不存在,可能MySQL重启时会碰到问题,参考:[ERROR] Failed to open log

4. 总结

MySQL的log-bin配置在比较小的/var文件系统下,并且MySQL没有配置bin日志的过期时间,expire_logs_days = x
导致bin日志把/var文件系统撑满,会话无法写入数据而挂起,进行把200个连接数撑满。
释放文件系统后,会话可以写入并且释放,解决故障

# cat /etc/my.cnf | grep data
datadir=/data/mysqldata
# cat /etc/my.cnf | grep log-bin
log-bin=/var/lib/mysql/mysql-master-bin.log

关于紫砂壶

感悟技术人生
此条目发表在MySQL故障诊断分类目录,贴了标签。将固定链接加入收藏夹。