文件系统用满导致的临时文件IO故障

1. 故障说明

有一套数据库发生了临时文件IO问题而导致了实例宕掉,故障报错如下:

Mon Apr 10 20:47:01 2017
KCF: read, write or open error, block=0xfe680 online=1
        file=1 '/oradata/xxxxxxxx/temp01.dbf'
        error=27061 txt: 'Linux-x86_64 Error: 28: No space left on device
Additional information: -1
Additional information: 8192'
Errors in file /oracle/diag/rdbms/xxxxxxxx/xxxxxxxx/trace/xxxxxxxx_dbw0_2791.trc:
Errors in file /oracle/diag/rdbms/xxxxxxxx/xxxxxxxx/trace/xxxxxxxx_dbw0_2791.trc:
ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 4097 (block # 1042048)
ORA-01110: data file 4097: '/oradata/xxxxxxxx/temp01.dbf'
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 28: No space left on device
Additional information: -1
Additional information: 8192
DBW0 (ospid: 2791): terminating the instance due to error 63999
......
Mon Apr 10 20:47:02 2017
System state dump requested by (instance=1, osid=2791 (DBW0)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/diag/rdbms/xxxxxxxx/xxxxxxxx/trace/xxxxxxxx_diag_2783_20170410204702.trc
Instance terminated by DBW0, pid = 2791

2. 故障分析

从报错的字面意思看,是文件系统满了,检查文件系统,确实是满了:

[root@xxxxxx ~]# df -h
文件系统              容量  已用  可用 已用%% 挂载点
/dev/mapper/VolGroup-LogVol00
                     1008M  394M  563M  42% /
tmpfs                  16G   72K   16G   1% /dev/shm
/dev/sda1             504M   67M  412M  14% /boot
/dev/mapper/VolGroup-LogVol01
                      7.9G  147M  7.4G   2% /home
/dev/mapper/VolGroup-LogVol04
                      7.9G  148M  7.4G   2% /opt
/dev/mapper/VolGroup-LogVol02
                      7.9G  147M  7.4G   2% /tmp
/dev/mapper/VolGroup-LogVol03
                      9.9G  7.5G  1.9G  81% /usr
/dev/mapper/VolGroup-LogVol05
                      7.9G  696M  6.8G  10% /var
/dev/mapper/VolGroup-lv_oracle
                       30G   18G   11G  62% /oracle
/dev/mapper/vg_data-lV_oradata
                      1.0T  972G   16M 100% /oradata

一般来讲,Oracle的文件分配完,只要不是自动扩展,大小一般是不变的,那怎么还会发生这样的问题呢?是不是自动扩展了?检查如下:

SQL> set linesize 180
SQL> set pagesize 999
SQL> column file_name format a60
column tablespace_name format a25
SQL> SQL> select tablespace_name, file_name, bytes/1024/1024, AUTOEXTENSIBLE, status 
from dba_data_files;  2  

TABLESPACE_NAME           FILE_NAME                                                    BYTES/1024/1024 AUT STATUS
------------------------- ------------------------------------------------------------ --------------- --- ---------
SYSTEM                    /oradata/xxxxxxxx/system01.dbf                                          2048 NO  AVAILABLE
SYSAUX                    /oradata/xxxxxxxx/sysaux01.dbf                                          8192 NO  AVAILABLE
UNDOTBS1                  /oradata/xxxxxxxx/undotbs01.dbf                                        30720 NO  AVAILABLE
USERS                     /oradata/xxxxxxxx/users01.dbf                                            500 NO  AVAILABLE
WLW_INAS_DATA             /oradata/xxxxxxxx/WLW_INAS_DATA_01.dbf                                 30720 NO  AVAILABLE
WLW_INAS_CONF             /oradata/xxxxxxxx/WLW_INAS_CONF_01.dbf                                  5120 NO  AVAILABLE
WLW_INAS_JS_CONF          /oradata/xxxxxxxx/WLW_INAS_JS_CONF_01.dbf                               5120 NO  AVAILABLE
TBS_AD_DATA               /oradata/xxxxxxxx/TBS_AD_DATA_01.dbf                                   30720 NO  AVAILABLE
TBS_AD_DATA_HIS           /oradata/xxxxxxxx/TBS_AD_DATA_HIS_01.dbf                               30720 NO  AVAILABLE
TBS_AD_DATA_HIS_IDX       /oradata/xxxxxxxx/TBS_AD_DATA_HIS_IDX_01.dbf                           10240 NO  AVAILABLE
TBS_AD_DATA_IDX           /oradata/xxxxxxxx/TBS_AD_DATA_IDX_01.dbf                               10240 NO  AVAILABLE
TBS_JTWLW_TFJ             /oradata/xxxxxxxx/TBS_JTWLW_TFJ_01.dbf                                  5120 NO  AVAILABLE
TBS_JTWLW_TFJ_HIS         /oradata/xxxxxxxx/TBS_JTWLW_TFJ_HIS_01.dbf                              5120 NO  AVAILABLE
TBS_WLWJTOIP              /oradata/xxxxxxxx/TBS_WLWJTOIP_01.dbf                                   5120 NO  AVAILABLE
WLW_INAS_DATA             /oradata/xxxxxxxx/WLW_INAS_DATA_02.dbf                                 30720 NO  AVAILABLE
TBS_AD_DATA_HIS           /oradata/xxxxxxxx/TBS_AD_DATA_HIS_02.dbf                               30720 NO  AVAILABLE
WLW_INAS_DATA             /oradata/xxxxxxxx/WLW_INAS_DATA_03.dbf                                 30720 NO  AVAILABLE
WLW_INAS_DATA             /oradata/xxxxxxxx/WLW_INAS_DATA_04.dbf                                 30720 NO  AVAILABLE
TBS_AD_DATA_HIS           /oradata/xxxxxxxx/TBS_AD_DATA_HIS_03.dbf                               30720 NO  AVAILABLE
WLW_INAS_DATA             /oradata/xxxxxxxx/WLW_INAS_DATA_05.dbf                                 30720 NO  AVAILABLE
TBS_AD_DATA_HIS           /oradata/xxxxxxxx/TBS_AD_DATA_HIS_04.dbf                               30720 NO  AVAILABLE
WLW_INAS_DATA             /oradata/xxxxxxxx/WLW_INAS_DATA_06.dbf                                 30720 NO  AVAILABLE
WLW_INAS_DATA             /oradata/xxxxxxxx/WLW_INAS_DATA_07.dbf                                 30720 NO  AVAILABLE
TBS_AD_DATA_HIS           /oradata/xxxxxxxx/TBS_AD_DATA_HIS_05.dbf                               30720 NO  AVAILABLE
WLW_INAS_DATA             /oradata/xxxxxxxx/WLW_INAS_DATA_08.dbf                                 30720 NO  AVAILABLE
WLW_INAS_DATA             /oradata/xxxxxxxx/WLW_INAS_DATA_09.dbf                                 30720 NO  AVAILABLE
TBS_AD_DATA_HIS           /oradata/xxxxxxxx/TBS_AD_DATA_HIS_06.dbf                               30720 NO  AVAILABLE
WLW_INAS_DATA             /oradata/xxxxxxxx/WLW_INAS_DATA_10.dbf                                 30720 NO  AVAILABLE
WLW_INAS_DATA             /oradata/xxxxxxxx/WLW_INAS_DATA_11.dbf                                 30720 NO  AVAILABLE
TBS_AD_DATA_HIS           /oradata/xxxxxxxx/TBS_AD_DATA_HIS_07.dbf                               30720 NO  AVAILABLE
TBS_WLWJTOIP              /oradata/xxxxxxxx/TBS_WLWJTOIP_02.dbf                                  30720 NO  AVAILABLE
WLW_INAS_DATA             /oradata/xxxxxxxx/WLW_INAS_DATA_12.dbf                                 30720 NO  AVAILABLE
TBS_AD_DATA_HIS           /oradata/xxxxxxxx/TBS_AD_DATA_HIS_08.dbf                               30720 NO  AVAILABLE
TBS_AD_DATA_HIS           /oradata/xxxxxxxx/TBS_AD_DATA_HIS_09.dbf                               30720 NO  AVAILABLE
WLW_INAS_DATA             /oradata/xxxxxxxx/WLW_INAS_DATA_13.dbf                                 30720 NO  AVAILABLE
TBS_AD_DATA_HIS           /oradata/xxxxxxxx/TBS_AD_DATA_HIS_10.dbf                               30720 NO  AVAILABLE
WLW_INAS_DATA             /oradata/xxxxxxxx/WLW_INAS_DATA_14.dbf                                 30720 NO  AVAILABLE
TBS_AD_DATA_HIS           /oradata/xxxxxxxx/TBS_AD_DATA_HIS_11.dbf                               30720 NO  AVAILABLE
WLW_INAS_DATA             /oradata/xxxxxxxx/WLW_INAS_DATA_15.dbf                                 30720 NO  AVAILABLE
TBS_AD_DATA_HIS           /oradata/xxxxxxxx/TBS_AD_DATA_HIS_12.dbf                               30720 NO  AVAILABLE

40 rows selected.

SQL> set linesize 180
SQL> set pagesize 999
SQL> column file_name format a60
SQL> column tablespace_name format a25
select tablespace_name, file_name, bytes/1024/1024, AUTOEXTENSIBLE, status 
SQL>   2  from dba_temp_files ;

TABLESPACE_NAME           FILE_NAME                                                    BYTES/1024/1024 AUT STATUS
------------------------- ------------------------------------------------------------ --------------- --- -------
TEMP                      /oradata/xxxxxxxx/temp01.dbf                                           20480 NO  ONLINE

数据文件和临时文件都没有自动扩展。

3. 说明

这个问题在 ORA-01114 ORA-27063 SVR4 Error: 28: No space left on device,temporary tablespace (文档 ID 121732.1) 解释如下:

Problem Description
——————-
You have a temporary tablespace of type TEMPORARY.

You are running a SQL statement using temporary storage (for example, a sort
segment to process a GROUP BY or ORDER BY ), and get these errors :

ORA-01114: IO error writing block to file 5015 (block # 199528)
ORA-27063: skgfospo: number of bytes read/written is incorrect
SVR4 Error: 28: No space left on device
Additional information: -1
Additional information: 49152

Solution Description
——————–
The files of the temporary tablespace of type TEMPORARY are in a full file
system (no free space available).

Make free space in this file system.

Explanation
———–
On certain file systems, creation of tempfiles does not guarantee the allocation
of the actual disk space of the file size specified.
The disk space is allocated later when the tempfile blocks are accessed.
This has the advantage of fast tempfile creation (as compared to datafile
creation). However, the disk could run out of space when tempfiles are accessed
later. The same thing happens when tempfiles are resized (to a larger size:
either by ALTER DATABASE … TEMPFILE … RESIZE command or by autoextend) i.e.
resize does not guarantee the actual disk space allocation on certain file
systems.

Tempfiles are currently not initialized, and (at least on Unix) this means they
are not pre-allocated. So even if a file system has enough free space at the
time the tempfile is created, later on the user may get write errors due to the
file system being full while writing to the tempfile.

DATAFILE FOR TEMPORARY TABLESPACE IS CREATED AS A SPARSE FILE
大致解释为:TEMPFILE在Oracle中不保证预分配,只有在用到时才会实际占用。

关于紫砂壶

感悟技术人生
此条目发表在Oracle故障诊断分类目录,贴了标签。将固定链接加入收藏夹。