外键无索引导致的锁问题

1. 故障说明

今天业务维护人员电话我说,有套数据库锁的情况比较严重,但是电话我时说锁已经释放,正在重启业务系统。

2. 数据库日志

我登录到数据库上,数据库的alert日志如下:

Tue Jan 03 09:09:37 2017
Global Enqueue Services Deadlock detected. More info in file
 /oracle/diag/rdbms/xxxxxx/xxxxxx2/trace/xxxxxx2_ora_18134.trc.
Tue Jan 03 09:09:37 2017
Dumping diagnostic data in directory=[cdmp_20170103090937], requested by (instance=2, osid=18134), summary=[abnormal process termination].
Tue Jan 03 09:13:08 2017
Global Enqueue Services Deadlock detected. More info in file
 /oracle/diag/rdbms/xxxxxx/xxxxxx2/trace/xxxxxx2_ora_28404.trc.
Tue Jan 03 09:13:09 2017
Dumping diagnostic data in directory=[cdmp_20170103091309], requested by (instance=2, osid=28404), summary=[abnormal process termination].
Tue Jan 03 09:17:44 2017
Thread 2 advanced to log sequence 99034 (LGWR switch)
  Current log# 16 seq# 99034 mem# 0: /oradata/xxxxxx/redo161.dbf
Tue Jan 03 09:25:53 2017
Thread 2 advanced to log sequence 99035 (LGWR switch)
  Current log# 9 seq# 99035 mem# 0: /oradata/xxxxxx/redo091.dbf
Tue Jan 03 09:30:42 2017
Thread 2 advanced to log sequence 99036 (LGWR switch)
  Current log# 10 seq# 99036 mem# 0: /oradata/xxxxxx/redo101.dbf
Tue Jan 03 09:35:26 2017
Global Enqueue Services Deadlock detected. More info in file
 /oracle/diag/rdbms/xxxxxx/xxxxxx2/trace/xxxxxx2_ora_13367.trc.
Tue Jan 03 09:35:27 2017
Dumping diagnostic data in directory=[cdmp_20170103093527], requested by (instance=2, osid=13367), summary=[abnormal process termination].
Tue Jan 03 09:45:12 2017
Thread 2 advanced to log sequence 99037 (LGWR switch)
  Current log# 11 seq# 99037 mem# 0: /oradata/xxxxxx/redo111.dbf
Tue Jan 03 09:52:18 2017
Global Enqueue Services Deadlock detected. More info in file
 /oracle/diag/rdbms/xxxxxx/xxxxxx2/trace/xxxxxx2_ora_13978.trc.
Tue Jan 03 09:52:18 2017
Dumping diagnostic data in directory=[cdmp_20170103095218], requested by (instance=2, osid=13978), summary=[abnormal process termination].
Tue Jan 03 09:54:45 2017
Global Enqueue Services Deadlock detected. More info in file
 /oracle/diag/rdbms/xxxxxx/xxxxxx2/trace/xxxxxx2_ora_13043.trc.
Tue Jan 03 09:54:45 2017
Dumping diagnostic data in directory=[cdmp_20170103095445], requested by (instance=2, osid=13043), summary=[abnormal process termination].
Tue Jan 03 09:54:55 2017
Non critical error ORA-48113 caught while writing to trace file "/oracle/diag/rdbms/xxxxxx/xxxxxx2/trace/cdmp_20170103095445/xxxxxx2_ora_14882_bucket.trc"
Error message:
Writing to the above trace file is disabled for now on...
Non critical error ORA-48181 caught while writing to trace file "/oracle/diag/rdbms/xxxxxx/xxxxxx2/trace/cdmp_20170103095445/xxxxxx2_ora_14884_bucket.trc"
Error message: Linux-x86_64 Error: 28: No space left on device
Additional information: 1
Writing to the above trace file is disabled for now on...
Non critical error ORA-48181 caught while writing to trace file "/oracle/diag/rdbms/xxxxxx/xxxxxx2/trace/cdmp_20170103095445/xxxxxx2_ora_14886_bucket.trc"
Error message: Linux-x86_64 Error: 28: No space left on device
Additional information: 1
Writing to the above trace file Tue Jan 03 11:06:23 2017
Restarting dead background process DIAG
Tue Jan 03 11:06:23 2017
DIAG started with pid=335, OS id=18355
Dumping diagnostic data in directory=[cdmp_20170103110623], requested by (instance=2, osid=13533), summary=[abnormal process termination].
Tue Jan 03 11:11:23 2017
Global Enqueue Services Deadlock detected. More info in file
 /oracle/diag/rdbms/xxxxxx/xxxxxx2/trace/xxxxxx2_ora_17418.trc.
Tue Jan 03 11:11:23 2017
Dumping diagnostic data in directory=[cdmp_20170103111123], requested by (instance=2, osid=17418), summary=[abnormal process termination].

确实有一些全局锁的问题,当前/oracle文件系统已满,先清理了监听的日志和alert日志,释放空间,让trc可以写入。这里有一些全局死锁的信息,但是光看trc,估计只能看出死锁涉及的相关SQL,而没办法确定相关原因。当然死锁问题一般与业务相关,但是本着客户至上的原则,分析一下可能的原因。

3. 检查等待

检查会话的等待事件,发现enq: TM – contention等待比较多

                                                   SQL Child
  SID USERNAME EVENT                   SQL_ID        Number         P1      P2  P3 LAST_CALL_ET
----- -------- ----------------------- ------------- ------ ---------- ------- --- ------------
 3553 xxxxxx   db file scattered read  3pv7yn5qb7tfk      0          6 1471999   6           19
 1313 xxxxxx   db file scattered read  9umr8jfzw0sw2      0         10   46120  32           68
 6857 xxxxxx   db file sequential read 1uz4zsx2q5s9d      0          8 2306092   1         3300
 3589 xxxxxx   direct path read        c4ryz9xqj9bg1      0         11  274016  32          196
 8039 xxxxxx   enq: TM - contention    2wju0jnyxfxaj      0 1414332419   88855   0            5
  180 xxxxxx   enq: TM - contention    2wju0jnyxfxaj      0 1414332419   88855   0            2
 8049 xxxxxx   enq: TM - contention    9umr8jfzw0sw2      0 1414332421   88855   0           41
  148 xxxxxx   enq: TM - contention    9umr8jfzw0sw2      0 1414332421   88855   0           40
 7945 xxxxxx   enq: TM - contention    9umr8jfzw0sw2      0 1414332421   88855   0           38
 5755 xxxxxx   enq: TM - contention    9umr8jfzw0sw2      0 1414332421   88855   0           28
 4645 xxxxxx   enq: TM - contention    9umr8jfzw0sw2      0 1414332421   88855   0           32
 4537 xxxxxx   enq: TM - contention    b9jvgzmfppzc4      0 1414332419   88855   0            1
 1154 xxxxxx   enq: TM - contention    b9jvgzmfppzc4      0 1414332419   88855   0            4
  151 xxxxxx   gc buffer busy acquire  2y82w6dk9mnak      0          6 1471999   1          177
 4680 xxxxxx   gc cr request           9gx5uvrg3wbta      0          9 2224988   1            2
 4549 xxxxxx   gc cr request           dkwrm0h83n0h6      0         17  434478   1            0
   36 xxxxxx   gc cr request           g76vbc0w64r0s      0          7 2453912   1            2

 17 rows selected.

检查9umr8jfzw0sw2的SQL,发现是delete xxxxxx.AUDIT_ORDER

xxxxxx2> ora sql 9umr8jfzw0sw2
           SQL_TEXT
    -----------------------
    delete from AUDIT_ORDER
        where ORDER_ID = :1

enq: TM – contention等待事件的P2是object_id,检查发现TM锁是在AUDIT_IMPORT_DETAIL表上

SQL> select owner, object_name, object_type from dba_objects o where o.object_id = 88855;
OWNER    OBJECT_NAME           OBJECT_TYPE
-------- --------------------- -------------------
xxxxxx   AUDIT_IMPORT_DETAIL   TABLE

这种情况,首先应该怀疑的就是可能子表上的外键没有索引

4. 检查有外键无索引的表

with con1 as
 (select /*+rule*/
   c.owner, c.constraint_name, cc.table_name, cc.column_name, cc.position
    from dba_constraints c, dba_cons_columns cc
   where c.owner = 'xxxxxx'
     and c.constraint_type = 'R'
     and c.owner = cc.owner
     and c.constraint_name = cc.constraint_name),
ind1 as
 (select /*+rule*/
   i.owner, i.table_name, i.index_name, ic.column_name, ic.column_position
    from dba_indexes i, dba_ind_columns ic
   where i.owner = 'xxxxxx'
     and i.owner = ic.index_owner
     and i.index_name = ic.index_name)
select c1.owner,
       c1.constraint_name,
       c1.table_name,
       c1.column_name,
       c1.position,
       i1.owner,
       i1.table_name,
       i1.index_name,
       i1.column_name,
       i1.column_position
  from con1 c1, ind1 i1
 where c1.owner = i1.owner(+)
   and c1.table_name = i1.table_name(+)
   and c1.column_name = i1.column_name(+)
   and c1.position = i1.column_position(+)
   and i1.column_name is null;

使用这个脚本,查询xxxxxx用户里有外键无索引的字段,

	OWNER	CONSTRAINT_NAME	            TABLE_NAME	            COLUMN_NAME	POSITION	OWNER	TABLE_NAME	INDEX_NAME	COLUMN_NAME	COLUMN_POSITION
......
2	xxxxxx	AUDIT_IMPORT_DETAIL_F	    AUDIT_IMPORT_DETAIL	    ORDER_ID	1
......

找到AUDIT_IMPORT_DETAIL这张表的ORDER_ID上没有索引,查看一下AUDIT_IMPORT_DETAIL表的DDL语句,存在外键,父表就是AUDIT_ORDER

alter table xxxxxx.AUDIT_IMPORT_DETAIL
add constraint AUDIT_IMPORT_DETAIL_F foreign key (ORDER_ID)
references xxxxxx.AUDIT_ORDER (ORDER_ID) on delete cascade;

查看一下两张表的大小如下:

xxxxxx2> ora size xxxxxx.AUDIT_ORDER
    USER# OWNER        OBJECT_NAME        PARTITION_NAME OBJECT_TYPE SIZE_MB SEQ SIZE_GB EXTENTS SEGMENTS INIT_KB NEXT_KB TABLESPACE_NAME  ROWS#
    ----- ------ ------------------------ -------------- ----------- ------- --- ------- ------- -------- ------- ------- --------------- -------
       91 xxxxxx AUDIT_ORDER                             TABLE           612   1   0.598     153        1     512       8 xxxxxx_TBS      2052985
       91 xxxxxx IDX_STATE_DATE                          INDEX           168   2   0.164      42        1      64    1024 xxxxxx_TBS
       91 xxxxxx IDX_CREATE_DATE                         INDEX            80   3   0.078      20        1      64    1024 xxxxxx_TBS
       91 xxxxxx IDX_AUDIT_ORDER_STATE                   INDEX            72   4    0.07      18        1      64    1024 xxxxxx_TBS
       91 xxxxxx IDX_AUDIT_ORDER_STAFF                   INDEX            52   5   0.051      13        1      64    1024 xxxxxx_TBS
       91 xxxxxx IDX_AUDIT_DEPT_ID                       INDEX            52   6   0.051      13        1    4096    4096 xxxxxx_TBS
       91 xxxxxx IDX_FLOW_INST_ID                        INDEX            52   7   0.051      13        1      64    1024 xxxxxx_TBS
       91 xxxxxx PK_AUDIT_ORDER                          INDEX            48   8   0.047      12        1     512    1024 xxxxxx_TBS
       91 xxxxxx IDX_AUDIT_DOING_CYCLE_ID                INDEX            44   9   0.043      11        1    4096    4096 xxxxxx_TBS

xxxxxx2> ora size xxxxxx.AUDIT_IMPORT_DETAIL
    USER# OWNER        OBJECT_NAME       PARTITION_NAME OBJECT_TYPE SIZE_MB SEQ SIZE_GB EXTENTS SEGMENTS INIT_KB NEXT_KB TABLESPACE_NAME  ROWS#
    ----- ------ ----------------------- -------------- ----------- ------- --- ------- ------- -------- ------- ------- --------------- -------
       91 xxxxxx AUDIT_IMPORT_DETAIL                    TABLE           852   1   0.832     213        1      64       8 xxxxxx_TBS      4680954
       91 xxxxxx SYS_C0011565                           INDEX           104   2   0.102      26        1      64    1024 xxxxxx_TBS

每次删除AUDIT_ORDER表的1行记录,都需要全表扫描AUDIT_IMPORT_DETAIL表,AUDIT_IMPORT_DETAIL表有852M,肯定会引起比较严重的性能问题,进而造成比较严重的锁问题,也可以理解。

对AUDIT_IMPORT_DETAIL表的ORDER_ID字段添加完索引后,观察半天,未再发生死锁的情况。

5. 回顾trc中的死锁情况

查看一下死锁情况,以xxxxxx2_ora_18134.trc文件为例:

Single resource deadlock: blocking enqueue which blocks itself, f 0
Granted global enqueue 0x23f016970
----------enqueue 0x23f016970------------------------
lock version     : 1849425
Owner inst       : 2
grant_level      : KJUSERNL
req_level        : KJUSERPW
bast_level       : KJUSERNL
notify_func      : (nil)
resp             : 0x248c62fc0
procp            : 0x24c185808
pid              : 16986
proc version     : 14197
oprocp           : (nil)
opid             : 16986
group lock owner : 0x25138ea40
possible pid     : 16986
xid              : D7000-0002-00008CF5
dd_time          : 7.0 secs
dd_count         : 3
timeout          : 0.0 secs
On_timer_q?      : N
On_dd_q?         : Y
lock_state       : OPENING CONVERTING
ast_flag         : 0x0
Open Options     : KJUSERDEADLOCK
Convert options  : KJUSERGETVALUE
History          : LOC_AST > CLOSE > FREE > FREE > REF_RES > GR2CVT
Msg_Seq          : 0x0
res_seq          : 1
valblk           : 0x00000000000000000000000000000000 .
user session for deadlock lock 0x23f016970
  sid: 7940 ser: 39569 audsid: 48783034 user: 91/xxxxxx
    flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x40009) -/-/INC
  pid: 215 O/S info: user: grid, term: UNKNOWN, ospid: 16986
    image: oracle@xxxxxx_ora02
  client details:
    O/S info: user: xxxxxx, term: unknown, ospid: 1234
    machine: xxxxxx1 program: JDBC Thin Client
    application name: JDBC Thin Client, hash value=2546894660
  current SQL:
  delete from AUDIT_ORDER
    where ORDER_ID = :1
Requesting global enqueue 0x24401f7c0
----------enqueue 0x24401f7c0------------------------
lock version     : 2011267
Owner inst       : 2
grant_level      : KJUSERCW
req_level        : KJUSERPW
bast_level       : KJUSERNL
notify_func      : (nil)
resp             : 0x248c62fc0
procp            : 0x24c241848
pid              : 18134
proc version     : 8546
oprocp           : (nil)
opid             : 18134
group lock owner : 0x255d45280
possible pid     : 18134
xid              : 54000-0002-0001F69A
dd_time          : 0.0 secs
dd_count         : 0
timeout          : 0.0 secs
On_timer_q?      : N
On_dd_q?         : N
lock_state       : CONVERTING
ast_flag         : 0x0
Open Options     : KJUSERDEADLOCK
Convert options  : KJUSERGETVALUE
History          : FREE > FREE > REF_RES > GR2CVT > LOC_AST_FG > LOC_AST
Msg_Seq          : 0x0
res_seq          : 1
valblk           : 0x7069013f02000000660ce760c2fe7f61 pi?f`a
user session for deadlock lock 0x24401f7c0
  sid: 4636 ser: 2731 audsid: 48783161 user: 91/xxxxxx
    flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x40009) -/-/INC
  pid: 84 O/S info: user: grid, term: UNKNOWN, ospid: 18134
    image: oracle@xxxxxx_ora02
  client details:
    O/S info: user: xxxxxx, term: unknown, ospid: 1234
    machine: xxxxxx4 program: JDBC Thin Client
    application name: JDBC Thin Client, hash value=2546894660
  current SQL:
  delete from AUDIT_ORDER
    where ORDER_ID = :1 

----------resource 0x248c62fc0----------------------
resname       : [0x15b17][0x0],[TM][ext 0x0,0x0]
hash mask     : x3
Local inst    : 2
dir_inst      : 2
master_inst   : 2
hv idx        : 107
hv last r.inc : 2
current inc   : 4
hv status     : 0
hv master     : 1
open options  : dd cached
grant_bits    : KJUSERNL KJUSERCW
grant mode    : KJUSERNL  KJUSERCR  KJUSERCW  KJUSERPR  KJUSERPW  KJUSEREX
count         : 5         0         11         0         0         0
val_state     : KJUSERVS_NOVALUE
valblk        : 0x01000000000000000600e30200000000 .
access_inst   : 2
vbreq_state   : 0
state         : x0
resp          : 0x248c62fc0
On Scan_q?    : N
cache level   : 0
Total accesses: 25114607
Imm.  accesses: 24920816
Granted_locks : 12
Cvting_locks  : 4
value_block:  01 00 00 00 00 00 00 00 06 00 e3 02 00 00 00 00
GRANTED_Q :
lp 0x24301fd30 gl KJUSERCW rp 0x248c62fc0 [0x15b17][0x0],[TM][ext 0x0,0x0]
  master 2 gl owner 0x2523aa220 possible pid 7358 xid 157000-0002-00000B3E bast 0 rseq 1 mseq 0
  history CLOSE > FREE > FREE > REF_RES > GR2CVT > LOC_AST
  open opt KJUSERDEADLOCK
lp 0x2420216c0 gl KJUSERCW rp 0x248c62fc0 [0x15b17][0x0],[TM][ext 0x0,0x0]
  master 2 gl owner 0x253b100a0 possible pid 330 xid CE000-0002-00009B68 bast 0 rseq 1 mseq 0
  history CLOSE > FREE > FREE > REF_RES > GR2CVT > LOC_AST
  open opt KJUSERDEADLOCK
lp 0x240015908 gl KJUSERCW rp 0x248c62fc0 [0x15b17][0x0],[TM][ext 0x0,0x0]
  master 2 gl owner 0x254bcd6e0 possible pid 3185 xid 136000-0002-00001A19 bast 0 rseq 1 mseq 0
  history CLOSE > FREE > FREE > REF_RES > GR2CVT > LOC_AST
  open opt KJUSERDEADLOCK
lp 0x244021150 gl KJUSERCW rp 0x248c62fc0 [0x15b17][0x0],[TM][ext 0x0,0x0]
  master 2 gl owner 0x2571813c0 possible pid 19811 xid 138000-0002-00001E49 bast 0 rseq 1 mseq 0
  history CLOSE > FREE > FREE > REF_RES > GR2CVT > LOC_AST
  open opt KJUSERDEADLOCK
lp 0x23f016d10 gl KJUSERCW rp 0x248c62fc0 [0x15b17][0x0],[TM][ext 0x0,0x0]
  master 2 gl owner 0x2588ff940 possible pid 19821 xid 149000-0002-00001305 bast 0 rseq 1 mseq 0
  history CLOSE > FREE > FREE > REF_RES > GR2CVT > LOC_AST
  open opt KJUSERDEADLOCK
lp 0x242022e80 gl KJUSERCW rp 0x248c62fc0 [0x15b17][0x0],[TM][ext 0x0,0x0]
  master 2 gl owner 0x2581fe7a0 possible pid 9140 xid 110000-0002-00001D05 bast 0 rseq 1 mseq 0
  history CLOSE > FREE > FREE > REF_RES > GR2CVT > LOC_AST
  open opt KJUSERDEADLOCK
lp 0x24401fd30 gl KJUSERCW rp 0x248c62fc0 [0x15b17][0x0],[TM][ext 0x0,0x0]
  master 2 gl owner 0x254be2d00 possible pid 17394 xid C6000-0002-0000F70F bast 0 rseq 1 mseq 0
  history CLOSE > FREE > FREE > REF_RES > GR2CVT > LOC_AST
  open opt KJUSERDEADLOCK
lp 0x24301f7c0 gl KJUSERCW rp 0x248c62fc0 [0x15b17][0x0],[TM][ext 0x0,0x0]
  master 2 gl owner 0x25241b280 possible pid 9136 xid 10F000-0002-00002D5D bast 0 rseq 1 mseq 0
  history CLOSE > FREE > FREE > REF_RES > GR2CVT > LOC_AST
  open opt KJUSERDEADLOCK
lp 0x23f0132b0 gl KJUSERCW rp 0x248c62fc0 [0x15b17][0x0],[TM][ext 0x0,0x0]
  master 2 gl owner 0x2556132e0 possible pid 23194 xid 143000-0002-00001D50 bast 0 rseq 1 mseq 0
  history CLOSE > FREE > FREE > REF_RES > GR2CVT > LOC_AST
  open opt KJUSERDEADLOCK
lp 0x240014c40 gl KJUSERCW rp 0x248c62fc0 [0x15b17][0x0],[TM][ext 0x0,0x0]
  master 2 gl owner 0x258f8c9a0 possible pid 27762 xid 142000-0002-0000147A bast 0 rseq 1 mseq 0
  history CLOSE > FREE > FREE > REF_RES > GR2CVT > LOC_AST
  open opt KJUSERDEADLOCK
lp 0x24401f7c0 gl KJUSERCW rp 0x248c62fc0 [0x15b17][0x0],[TM][ext 0x0,0x0]
  master 2 gl owner 0x255d45280 possible pid 18134 xid 54000-0002-0001F69A bast 0 rseq 1 mseq 0
  history FREE > FREE > REF_RES > GR2CVT > LOC_AST_FG > LOC_AST
  open opt KJUSERDEADLOCK
CONVERT_Q:
lp 0x23f016970 gl KJUSERNL rl KJUSERPW rp 0x248c62fc0 [0x15b17][0x0],[TM][ext 0x0,0x0]
  master 2 gl owner 0x25138ea40 possible pid 16986 xid D7000-0002-00008CF5 bast 0 rseq 1 mseq 0
  history LOC_AST > CLOSE > FREE > FREE > REF_RES > GR2CVT
  convert opt KJUSERGETVALUE
lp 0x23f017d90 gl KJUSERNL rl KJUSERPW rp 0x248c62fc0 [0x15b17][0x0],[TM][ext 0x0,0x0]
  master 2 gl owner 0x2566ec100 possible pid 4795 xid FB000-0002-000035F9 bast 0 rseq 1 mseq 0
  history LOC_AST > CLOSE > FREE > FREE > REF_RES > GR2CVT
  convert opt KJUSERGETVALUE
lp 0x2430303e0 gl KJUSERNL rl KJUSERPW rp 0x248c62fc0 [0x15b17][0x0],[TM][ext 0x0,0x0]
  master 2 gl owner 0x254515a40 possible pid 18402 xid FD000-0002-0000359C bast 0 rseq 1 mseq 0
  history LOC_AST > CLOSE > FREE > FREE > REF_RES > GR2CVT
  convert opt KJUSERGETVALUE
lp 0x242021e18 gl KJUSERNL rl KJUSERPW rp 0x248c62fc0 [0x15b17][0x0],[TM][ext 0x0,0x0]
  master 2 gl owner 0x257eeab40 possible pid 28404 xid 10A000-0002-00002459 bast 0 rseq 1 mseq 0
  history LOC_AST > CLOSE > FREE > FREE > REF_RES > GR2CVT
  convert opt KJUSERGETVALUE

在RAC环境下,某个会话发觉有问题,请求死锁检查,发现死锁后,会放弃自已执行的SQL,
以避免死锁的发生。从死锁会话正在执行的SQL来看,涉及到的表也是AUDIT_ORDER。

关于紫砂壶

感悟技术人生
此条目发表在Oracle故障诊断分类目录,贴了标签。将固定链接加入收藏夹。

1 则回应给 外键无索引导致的锁问题

  1. 越yan说:

    我也遇到外键无索引造成死锁问题,现在觉得这个是死锁头号嫌疑犯

评论已关闭。