ORA-7445 opiptp故障诊断处理

1. 性能故障说明

业务在执行如下这个查询时,20多秒返回,业务认为可以接爱。

select * from v_all_chkiserrchannel x

但是这带具体条件查询,或者用子查询带条件方式,就非常慢,要5分钟多

select * from v_all_chkIsErrChannel x where x.channel_status_cd = 1
  and x.IsErrChannel = 'YES';

select * from (select * from v_all_chkiserrchannel x) m
where m.channel_status_cd = 1
  and m.iserrchannel = 'YES';

需要诊断一下为什么加上条件后会这么慢。

2. 检查相关信息

v_all_chkIsErrChannel视图的脚本如下,两个crm.channel表进行关联

create or replace view data_man.v_all_chkiserrchannel as
select distinct m.channel_id,
                m.parent_channel_id,
                m.channel_status_cd,
                f_chkIsErrChannel(m.channel_id) as IsErrChannel
  from crm.channel m,
      (select distinct c.parent_channel_id from crm.channel c) x
where m.channel_id = x.parent_channel_id;

查看一crm.chanlel表的大小,只有8M左右。

BSSSZ1> ora size crm.channel
    USER# OWNER      OBJECT_NAME      PARTITION_NAME OBJECT_TYPE SIZE_MB SEQ SIZE_GB EXTENTS SEGMENTS INIT_KB NEXT_KB TABLESPACE_NAME ROWS#
    ----- ----- ----------------------- -------------- ----------- ------- --- ------- ------- -------- ------- ------- --------------- -----
      48 CRM  CHANNEL                                TABLE            8  1  0.008      2        1      64    4096 TBS_CRM_TS_S1  25964
      48 CRM  PK_CHANNEL                            INDEX            4  2  0.004      1        1      64    4096 TBS_CRM_TS_S1
      48 CRM  UIDX_CHANNEL_MANAGECODE                INDEX            4  3  0.004      1        1      64    1024 TBS_CRM_IDX_B1
      48 CRM  IDX_CHANNEL_NBR                        INDEX            4  4  0.004      1        1      64    4096 TBS_CRM_TS_S1

这两张表关联,需要20秒左右才能返回结果,主要是因为f_chkIsErrChannel这个function的原因。

create or replace function data_man.f_chkIsErrChannel(i_channelId in number)
  return varchar2 as
  v_o_IsErrChannel varchar2(4) := 'NO';
  v_cnt            number(3) := null;
  v_errMsg        varchar2(512) := null;
begin
  begin
    select count(1)
      into v_cnt
      from crm.channel cc
    where cc.channel_id = i_channelId
    connect by prior cc.channel_id = cc.parent_channel_id
    start with cc.channel_id in (i_channelId);
  exception
    when others then
      v_errMsg := substr(sqlerrm, 1, 512);
  end;
  if (upper(v_errMsg) like '%ORA-01436%' or v_errMsg is not null) then
    v_o_IsErrChannel := 'YES';
  end if;
  return v_o_IsErrChannel;
end f_chkIsErrChannel;

这个funcation是个树型查找过程。

3. 优化过程

这里业务认为20s左右可以接受,就不去理会它里面的业务逻辑了,主要研究,为什么加了条件后时间会变成5分钟。
看了下where条件,里面有iserrchannel,这个字段是根据f_chkIsErrChannel计算出来的结果进行过虑,全部计算出来只需要20s,
加了这个条件,我估计修改了相关执行计划。既然将全部结果计算出来的时间可以接受,那就将这个结果给固定住。
就这用到了MATERIALIZE,测试如下:为了不修改视图,这里使用with as来代替视图

with v_all_chkiserrchannel as
(
select /*+ MATERIALIZE */ distinct m.channel_id,
                m.parent_channel_id,
                m.channel_status_cd
                , data_man.f_chkIsErrChannel(m.channel_id) as IsErrChannel
  from crm.channel m,
      (select distinct c.parent_channel_id from crm.channel c) x
where m.channel_id = x.parent_channel_id
)
select  * from (select * from v_all_chkiserrchannel x) m
where m.channel_status_cd = 1
  and m.iserrchannel = 'YES';

测试结果,这样编写的SQL与直接查询视图时间基本一致了,应用就需修改视图语句了,加上/*+ MATERIALIZE */。

4. 处理ORA-07445 [opiptp]

以上的处理方法,在大部分的库上面没有问题,但是有两个库上报ORA-07445错误了,错误堆栈如下:

*** ACTION NAME:(with v_all_chkiserrchannel ...) 2017-08-25 09:40:32.948
*** MODULE NAME:(PL/SQL Developer) 2017-08-25 09:40:32.948
*** SERVICE NAME:(bsszj) 2017-08-25 09:40:32.948
*** SESSION ID:(4788.19585) 2017-08-25 09:40:32.948
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x2b6a86db2d50, PC: [0x30fbe10, opiptp()+56]
*** 2017-08-25 09:40:32.956
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [opiptp()+56] [SIGSEGV] [Address not mapped to object] [0x2B6A86DB2D50] [] []
Current SQL statement for this session:
SELECT COUNT(1) FROM CRM.CHANNEL CC WHERE CC.CHANNEL_ID = :B1 CONNECT BY PRIOR CC.CHANNEL_ID = CC.PARENT_CHANNEL_ID START WITH CC.CHANNEL_ID IN (:B1 )
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x27099c8608        8  function DATA_MAN.F_CHKISERRCHANNEL
----- Call Stack Trace -----
calling              call    entry                argument values in hex      
location            type    point                (? means dubious value)    
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call    ksedst1()            000000000 ? 000000001 ?
                                                  2B3DAD899D50 ? 2B3DAD899DB0 ?
                                                  2B3DAD899CF0 ? 000000000 ?
ksedmp()+610        call    ksedst()            000000000 ? 000000001 ?
                                                  2B3DAD899D50 ? 2B3DAD899DB0 ?
                                                  2B3DAD899CF0 ? 000000000 ?
ssexhd()+629        call    ksedmp()            000000003 ? 000000001 ?
                                                  2B3DAD899D50 ? 2B3DAD899DB0 ?
                                                  2B3DAD899CF0 ? 000000000 ?
__restore_rt()+0    call    ssexhd()            00000000B ? 2B3DAD89AD70 ?
                                                  2B3DAD89AC40 ? 2B3DAD899DB0 ?
                                                  2B3DAD899CF0 ? 000000000 ?
opiptp()+56          signal  __restore_rt()      000017BC8 ? 000000000 ?
                                                  7FFF527316E4 ? 7FFF527316E0 ?
                                                  000000001 ? 2B6A86DB2C78 ?
opiptc()+161        call    opiptp()            000017BC8 ? 000000000 ?
                                                  7FFF527316E4 ? 7FFF527316E0 ?
                                                  000000001 ? 2B6A86DB2C78 ?
opiexe()+24951      call    opiptc()            000000057 ? 25439F84F0 ?
                                                  7FFF527316E4 ? 7FFF527316E0 ?
                                                  000000001 ? 2B6A86DB2C78 ?
opipls()+2092        call    opiexe()            000000004 ? 000000005 ?
                                                  7FFF52732B14 ? 000000002 ?
                                                  000000001 ? 2B6A86DB2C78 ?
opiodr()+1184        call    opipls()            000000066 ? 000000006 ?
                                                  7FFF52734260 ? 000000000 ?
                                                  2B3DAD968FA8 ?
                                                  527327B800000000 ?
rpidrus()+196        call    opiodr()            000000066 ? 000000006 ?
                                                  7FFF52734260 ? 000000002 ?
                                                  005BFFE50 ?
                                                  527327B800000000 ?
skgmstack()+158      call    rpidrus()            7FFF52733898 ? 000000006 ?
                                                  7FFF52734260 ? 000000002 ?
                                                  005BFFE50 ?
                                                  527327B800000000 ?
rpidru()+116        call    skgmstack()          7FFF52733870 ? 006AFF640 ?
                                                  00000F618 ? 00242328C ?
                                                  7FFF52733898 ?
                                                  527327B800000000 ?
rpiswu2()+409        call    rpidru()            7FFF52733F38 ? 006AFF640 ?
                                                  00000F618 ? 00242328C ?
                                                  7FFF52733898 ?
                                                  527327B800000000 ?
rpidrv()+1516        call    rpiswu2()            2A5D39BE68 ? 00000003F ?
                                                  7FFF52733F18 ? 000000002 ?
                                                  7FFF52733F80 ? 00000003F ?
psddr0()+439        call    rpidrv()            000000002 ? 000000066 ?
                                                  7FFF52734260 ? 000000038 ?
                                                  7FFF52733F80 ? 00000003F ?
psdnal()+386        call    psddr0()            000000002 ? 000000066 ?
                                                  7FFF52734260 ? 000000030 ?
                                                  7FFF52733F80 ? 00000003F ?
pevm_EXECC()+376    call    psdnal()            2B3DAD9600E8 ? 000000000 ?
                                                  7FFF52734260 ? 2B3DAD969640 ?
                                                  000000008 ? 00000003F ?
pfrinstr_EXECC()+80  call    pevm_EXECC()        2B3DAD9602C0 ? 000000000 ?
                                                  0000002E0 ? 2B3DAD969640 ?
                                                  000000008 ? 00000003F ?
pfrrun_no_tool()+65  call    pfrinstr_EXECC()    2B3DAD9602C0 ? 274036A104 ?
                                                  2B3DAD960328 ? 2B3DAD969640 ?
                                                  000000008 ? 7FFF000002E0 ?
pfrrun()+898        call    pfrrun_no_tool()    2B3DAD9602C0 ? 274036A104 ?
                                                  2B3DAD960328 ? 2B3DAD969640 ?
                                                  000000008 ? 7FFF000002E0 ?
plsql_run()+839      call    pfrrun()            2B3DAD9602C0 ? 000000000 ?
                                                  2B3DAD960328 ? 2B3DAD9600E8 ?
                                                  000000008 ? 274036A0C2 ?
peidxr_run()+246    call    plsql_run()          2B3DAD9602C0 ? 000000001 ?
                                                  2B3DAD954C48 ? 2B3DAD9600E8 ?
                                                  000000008 ? 900000000 ?
peidxexe()+89        call    peidxr_run()        2B3DAD960070 ? 000000001 ?
                                                  2B3DAD954C48 ? 2B3DAD9602C0 ?
                                                  000000008 ? 900000000 ?
kkxdexe()+736        call    peidxexe()          2B3DAD960070 ? 000000001 ?
                                                  2B3DAD954C48 ? 2B3DAD9602C0 ?
                                                  000000008 ? 900000000 ?
kkxmpexe()+227      call    kkxdexe()            2B3DAD960070 ? 2B3DAD9608B0 ?
                                                  2B3DAD96EF98 ? 2B3DAD9602C0 ?
                                                  000000000 ? 000000000 ?
kgmexwi()+565        call    kkxmpexe()          006AFFA60 ? 2895E28AD0 ?
                                                  2B3DADDEE070 ? 2B3DAD960070 ?
                                                  27099C8608 ? 2B3DAD96EF98 ?
kgmexec()+1314      call    kgmexwi()            006AFFA60 ? 000000001 ?
                                                  7FFF52735800 ? 2895E28AD0 ?
                                                  2B3DADDEE070 ? 2B3DADDEE070 ?
evapls()+909        call    kgmexec()            006AFFA60 ? 000000001 ?
                                                  7FFF52735800 ? 24AEAA5618 ?
                                                  28AF7FFB28 ? 2B3DAD96F118 ?
evaopn2()+458        call    evapls()            2490D48438 ? 000000000 ?
                                                  006B07FF0 ? 006AFFA60 ?
                                                  2490D483D0 ? 2B3DAD96F118 ?
qerghRowP()+1124    call    evaopn2()            2490D48438 ? 2B3DAD965900 ?
                                                  006B07FF0 ? 006AFFA60 ?
                                                  2B3DAD969B38 ? 2B3DAD96F118 ?
qerhjInnerProbeHash  call    qerghRowP()          000000001 ? 000007FFF ?
Table()+531                                        006B07FF0 ? 2B3DAD965910 ?
                                                  2B3DAD965A00 ? 2490D485C0 ?
kdstf0000101km()+43  call    qerhjInnerProbeHash  7FFF527366F0 ? 000007FFF ?
9                            Table()              006B07FF0 ? 2B3DAD965910 ?
                                                  2B3DAD9656F0 ? 2490D485C0 ?
kdsttgr()+1997      call    kdstf0000101km()    13E0E6E0BA ? 000000000 ?
                                                  002E98012 ? 7FFF527366F0 ?
                                                  000007FFF ? 000000000 ?
qertbFetch()+650    call    kdsttgr()            2B3DAD965038 ? 000000000 ?
                                                  2B3DAD965460 ? 000000000 ?
                                                  000007FFF ? 002E98012 ?
rwsfcd()+88          call    qertbFetch()        273FFB07E8 ? 002E98012 ?
                                                  7FFF527366F0 ? 000007FFF ?
                                                  273FFB0858 ? 002E98012 ?
qerhjFetch()+549    call    rwsfcd()            273FFB07E8 ? 002E98012 ?
                                                  7FFF527366F0 ? 000007FFF ?
                                                  273FFB0858 ? 002E98012 ?
qerghFetch()+269    call    qerhjFetch()        2490D487B0 ? 002FBBF58 ?
                                                  2490D485F8 ? 000007FFF ?
                                                  273FFB0858 ? 002E98012 ?
rwsfcd()+88          call    qerghFetch()        2490D485F8 ? 002EF6EF4 ?
                                                  2490D47FC0 ? 000007FFF ?
                                                  273FFB0858 ? 2B3DAD965910 ?
qerltFetch()+1008    call    rwsfcd()            2490D485F8 ? 002EF6EF4 ?
                                                  2490D47FC0 ? 000007FFF ?
                                                  273FFB0858 ? 2B3DAD965910 ?
insdlexe()+372      call    qerltFetch()        2490D47FC0 ? 001F13B20 ?
                                                  7FFF52736B70 ? 000000001 ?
                                                  273FFB0858 ? 26F7274628 ?
insExecStmtExecIniE  call    insdlexe()          2490D47FC0 ? 235F1B3AB0 ?
ngine()+85                                        7FFF52736B70 ? 000000001 ?
                                                  273FFB0858 ? 26F7274628 ?
insexe()+867        call    insExecStmtExecIniE  2490D47FC0 ? 235F1B3AB0 ?
                              ngine()              000000001 ? 000000001 ?
                                                  273FFB0858 ? 26F7274628 ?
qes3tExecSQL()+1251  call    insexe()            273FFB1208 ? 7FFF52737048 ?
                                                  25AF9D2400 ? 000000001 ?
                                                  273FFB0858 ? 2B3DAD97F000 ?
qerleStart()+132    call    qes3tExecSQL()      238ABB49F8 ? 7FFF52737048 ?
                                                  000000000 ? 000000001 ?
                                                  273FFB0858 ? D85098D5250 ?
selexe()+667        call    qerleStart()        235F1B3ED0 ? 000000001 ?
                                                  000000000 ? 000000001 ?
                                                  273FFB0858 ? D85098D5250 ?
opiexe()+4687        call    selexe()            25AF9D2400 ? 7FFF52737E88 ?
                                                  7FFF52737E88 ? 25AF9D2400 ?
                                                  273FFB0858 ? D85098D5250 ?
kpoal8()+2305        call    opiexe()            000000049 ? 000000003 ?
                                                  7FFF52738358 ? 000000001 ?
                                                  273FFB0858 ? D85098D5250 ?
opiodr()+1184        call    kpoal8()            00000005E ? 000000017 ?
                                                  7FFF5273B268 ? 000000001 ?
                                                  D22D000000000001 ?
                                                  D85098D5250 ?
ttcpip()+1226        call    opiodr()            00000005E ? 000000017 ?
                                                  7FFF5273B268 ? 000000000 ?
                                                  005BFFD10 ? D85098D5250 ?
opitsk()+1310        call    ttcpip()            006B07FF0 ? 0054B8640 ?
                                                  7FFF5273B268 ? 000000000 ?
                                                  7FFF5273AD68 ? 7FFF5273B3D0 ?
opiino()+1024        call    opitsk()            000000003 ? 000000000 ?
                                                  7FFF5273B268 ? 000000001 ?
                                                  000000000 ?
                                                  1DA3001500000001 ?
opiodr()+1184        call    opiino()            00000003C ? 000000004 ?
                                                  7FFF5273C468 ? 000000000 ?
                                                  000000000 ?
                                                  1DA3001500000001 ?
opidrv()+548        call    opiodr()            00000003C ? 000000004 ?
                                                  7FFF5273C468 ? 000000000 ?
                                                  005BFF7C0 ?
                                                  1DA3001500000001 ?
sou2o()+114          call    opidrv()            00000003C ? 000000004 ?
                                                  7FFF5273C468 ? 000000000 ?
                                                  005BFF7C0 ?
                                                  1DA3001500000001 ?
opimai_real()+163    call    sou2o()              7FFF5273C440 ? 00000003C ?
                                                  000000004 ? 7FFF5273C468 ?
                                                  005BFF7C0 ?
                                                  1DA3001500000001 ?
main()+116          call    opimai_real()        000000002 ? 7FFF5273C4D0 ?
                                                  000000004 ? 7FFF5273C468 ?
                                                  005BFF7C0 ?
                                                  1DA3001500000001 ?
__libc_start_main()  call    main()              000000002 ? 7FFF5273C4D0 ?
+244                                              000000004 ? 7FFF5273C468 ?
                                                  005BFF7C0 ?
                                                  1DA3001500000001 ?
_start()+41          call    __libc_start_main()  00072D518 ? 000000002 ?
                                                  7FFF5273C628 ? 000000000 ?
                                                  005BFF7C0 ? 000000002 ?

怀疑是碰到Bug了,查了下,确实如此

Connection Is Lost On Insert or Select Connect by With ORA-7445 [opiptp()+50] (文档 ID 436199.1)

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.2 to 10.2.0.5 - Release: 10.2 to 10.2
Information in this document applies to any platform.
***Checked for relevance on 06-APR-2012***
Symptoms
On 10.2.0.3 in Production:
When attempting to insert append to global temporary table the following error occurs.

connection is lost with ORA-03113, logged for the client

The following error logged at the alert log file ORA-07445: Exception aufgetreten: CORE Dump [opiptp()+50] [SIGSEGV] [Address not mapped to object] [0xB718CABC] [] []

The stack trace is similar to:
opiptp opiptc opiexe opipls opiodr rpidrus

Users cannot insert append or select from  table.
Cause
The issue is caused by the following Bug 5968363 : ORA-7445 [_OPIPTP+50] FROM A COMPLEX QUERY WITH A CONNECT BY, WITH, FUNCTION CALL .

The Bug is not backport able , This issue is fixed in 11.1.0.7

Solution
Possible workarunds are:

1. set the parameter "_optimizer_connect_by_cost_based" to false , at the database level :
SQL> alter system set "_optimizer_connect_by_cost_based" = false scope=both ;

OR

2.  set the parameter "_optimizer_connect_by_cost_based" to false , at the session level :
SQL> alter session set "_optimizer_connect_by_cost_based" = false;

OR

3. Add  /*+ NO_CONNECT_BY_COST_BASED */ hint to the query, instead of setting the parameter

The effect of this parameter , is it just disables the optimizer to use cost-based transformation for connect by queries only.

BUG:5968363 - ORA-7445 [_OPIPTP+50] FROM A COMPLEX QUERY WITH A CONNECT BY, WITH, FUNCTION CAL

解决这个问题的办法呢,可以在执行SQL语句之前,先执行:

alter session set "_optimizer_connect_by_cost_based" = false;

修改会话的参数,或者直接修改数据库参数:

alter system set "_optimizer_connect_by_cost_based" = false scope=both ;

这两种方式,由于都涉及到隐含参数的修改,应用估计也不可能在会话上设置alter
session来修改会话参数,所以这里使用了第三种方式,将funcation修改为如下:

create or replace function data_man.f_chkIsErrChannel(i_channelId in number)
  return varchar2 as
  v_o_IsErrChannel varchar2(4) := 'NO';
  v_cnt            number(3) := null;
  v_errMsg        varchar2(512) := null;
begin
  begin
    select /*+ NO_CONNECT_BY_COST_BASED */ count(1)
      into v_cnt
      from crm.channel cc
......

修改后,出现ORA-07445的两个库上执行查询SQL,不再报ORA-07445错误

关于紫砂壶

感悟技术人生
此条目发表在Oracle性能问题分类目录,贴了标签。将固定链接加入收藏夹。

4 则回应给 ORA-7445 opiptp故障诊断处理

  1. lfree说:

    20秒可接受不应该.
    你应该看看
    https://jonathanlewis.wordpress.com/2016/02/15/connect-by-2/
    也许有用…

  2. 紫砂壶说:

    20s业务能接受就行了,估计是一次性数据处理

  3. 紫砂壶说:

    只解决业务需要我们解决的问题 🙂

  4. lfree说:

    上午写的不见了.重新写一遍.

    当我查询ora-1436错误时,就明白开发要表达的思想.
    $ oerr ora 1436
    01436, 00000, “CONNECT BY loop in user data”
    // *Cause:
    // *Action:

    实际上使用connect by支持2个伪列,以及一个函数.结合起来就可以判断.

    例子:
    select CONNECT_BY_ROOT(empno),CONNECT_BY_ISLEAF,CONNECT_BY_ISCYCLE,emp.* from emp
    start with mgr is null connect by NOCYCLE prior empno = mgr;

评论已关闭。