Oracle TFA Series 2: Analyzing Data

1. Basic analysis

Data is analyzed with the analyze command; see its help output:

# /oracle/product/11.2.0/grid/bin/tfactl analyze -help


  Usage : /oracle/product/11.2.0/grid/bin/tfactl analyze [-search "pattern"] [-comp <db|asm|crs|acfs|os|osw|oswslabinfo|oratop|all> [-type <error|warning|generic>] [-since <n>[h|d]] [-from "MMM/DD/YYYY HH24:MI:SS"] [-to "MMM/DD/YYYY HH24:MI:SS"] [-for "MMM/DD/YYYY HH24:MI:SS"] [-node <all | local | n1,n2,..>] [-verbose] [-o <file>]

Options:
  -search "pattern"  Search for pattern in DB and CRS alert logs in past <n> [h]ours or [d]ays. 
                 Default value of <n> is 1h
  -comp Components to analyze. Default is all.
  -type Analyze messages of specified type. Default is error.
  -node Specify comma separated list of host names. Default is all 

By default, analyze covers all nodes and all components.

# /oracle/product/11.2.0/grid/bin/tfactl analyze -since 4h
INFO: analyzing all (Alert and Unix System Logs) logs for the last 240 minutes...  Please wait...
INFO: analyzing host: xxxx1

                       Report title: Analysis of Alert,System Logs
                  Report date range: last ~4 hour(s)
         Report (default) time zone: CST - China Standard Time
                Analysis started at: 10-Feb-2017 12:03:39 PM CST
              Elapsed analysis time: 2 second(s).
                 Configuration file: /oracle/product/11.2.0/grid/tfa/xxxx1/tfa_home/ext/tnt/conf/tnt.prop
                Configuration group: all
                Total message count:         33,373, from 12-Mar-2015 08:45:55 PM CST to 10-Feb-2017 12:01:00 PM CST
  Messages matching last ~4 hour(s):            192, from 10-Feb-2017 08:04:18 AM CST to 10-Feb-2017 12:01:00 PM CST
        last ~4 hour(s) error count:              0
last ~4 hour(s) ignored error count:              0
 last ~4 hour(s) unique error count:              0

Message types for last ~4 hour(s)
   Occurrences percent  server name          type
   ----------- -------  -------------------- -----
           192  100.0%  xxxx1                generic
   ----------- -------
           192  100.0%

Unique error messages for last ~4 hour(s)
   Occurrences percent  server name          error
   ----------- -------  -------------------- -----
   ----------- -------
             0  100.0%
......
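
When the report covers several hosts, the one figure usually worth checking first is the plain "error count" line. The following is a minimal sketch, not part of TFA: error_counts is a hypothetical helper name, and it assumes the report is written to standard output in the layout shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of TFA): read a `tfactl analyze` report on
# stdin and print "<host> <error count>" for each analyzed host.
# Assumes the report layout shown in the sample output above.
error_counts() {
  awk '
    # Remember which host the following summary belongs to.
    /^INFO: analyzing host:/ { host = $NF }
    # Match only the plain "error count" line, not the
    # "ignored error count" or "unique error count" lines.
    /(hour|day)\(s\) error count:/ { print host, $NF }
  '
}

# Possible usage, assuming tfactl writes its report to stdout:
#   /oracle/product/11.2.0/grid/bin/tfactl analyze -since 4h | error_counts
```

A non-zero value for any host is the cue to read that host's "Unique error messages" section in full.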

2. Analyzing a component

You can also use -comp to choose which component to analyze:

# /oracle/product/11.2.0/grid/bin/tfactl analyze -comp db -since 4h
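
To look at several components one after another, the same command can be wrapped in a loop. This is only a sketch: analyze_components and the TFACTL variable are names introduced here for illustration, and the component names passed in should come from the -comp list in the help output above.

```shell
#!/bin/sh
# Path taken from the examples in this post; override TFACTL if yours differs.
TFACTL=${TFACTL:-/oracle/product/11.2.0/grid/bin/tfactl}

# Hypothetical wrapper: run `tfactl analyze` once per component.
# $1 is the -since window (for example 4h); the remaining arguments are
# component names as listed under -comp in the help output.
analyze_components() {
  since=$1
  shift
  for comp in "$@"; do
    echo "=== component: $comp ==="
    "$TFACTL" analyze -comp "$comp" -since "$since"
  done
}

# Possible usage:
#   analyze_components 4h db asm crs
```

Running the components separately keeps each report short, which makes it easier to see which layer a message came from.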

3. Analyzing ORA- errors

Check whether any ORA- errors were logged in the past day:

# /oracle/product/11.2.0/grid/bin/tfactl analyze -search "ORA-" -since 1d
INFO: analyzing all (Alert and Unix System Logs) logs for the last 1440 minutes...  Please wait...
INFO: analyzing host: billhis01

                    Report title: Analysis of Alert,System Logs
               Report date range: last ~1 day(s)
      Report (default) time zone: CST - China Standard Time
             Analysis started at: 10-Feb-2017 12:25:17 PM CST
           Elapsed analysis time: 0 second(s).
              Configuration file: /oracle/product/11.2.0/grid/tfa/billhis01/tfa_home/ext/tnt/conf/tnt.prop
             Configuration group: all
                       Parameter: ORA-
             Total message count:          3,339, from 13-Apr-2016 09:42:54 AM CST to 10-Feb-2017 12:13:23 PM CST
Messages matching last ~1 day(s):            142, from 09-Feb-2017 12:33:17 PM CST to 10-Feb-2017 12:13:23 PM CST
                  Matching regex: ORA-
                  Case sensitive: false
                     Match count: 0

INFO: analyzing all (Alert and Unix System Logs) logs for the last 1440 minutes...  Please wait...
INFO: analyzing host: billhis02

                    Report title: Analysis of Alert,System Logs
               Report date range: last ~1 day(s)
      Report (default) time zone: CST - China Standard Time
             Analysis started at: 10-Feb-2017 12:25:17 PM CST
           Elapsed analysis time: 9 second(s).
              Configuration file: /oracle/product/11.2.0/grid/tfa/billhis02/tfa_home/ext/tnt/conf/tnt.prop
             Configuration group: all
                       Parameter: ORA-
             Total message count:         91,472, from 13-Apr-2016 09:52:00 AM CST to 10-Feb-2017 12:12:50 PM CST
Messages matching last ~1 day(s):            278, from 09-Feb-2017 05:04:43 PM CST to 10-Feb-2017 12:12:50 PM CST
                  Matching regex: ORA-
                  Case sensitive: false
                     Match count: 4

[Source: /oracle/diag/rdbms/xxxxx/xxxxx2/trace/alert_xxxxx2.log, Line: 8681]
Feb 09 17:04:43 2017
Errors in file /oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_dbw0_192015.trc:
ORA-01157: cannot identify/lock data file 284 - see DBWR trace file
ORA-01110: data file 284: '/oradata05/xxxxx/NT_ACCT_TBS_10.dbf'
ORA-17503: ksfdopn:4 Failed to open file /oradata05/xxxxx/NT_ACCT_TBS_10.dbf
ORA-17500: ODM err:File does not exist
Errors in file /oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_dbw0_192015.trc:
ORA-01186: file 284 failed verification tests
ORA-01157: cannot identify/lock data file 284 - see DBWR trace file
ORA-01110: data file 284: '/oradata05/xxxxx/NT_ACCT_TBS_10.dbf'
File 284 not verified due to error ORA-01157
Errors in file /oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_dbw0_192015.trc:
ORA-01157: cannot identify/lock data file 284 - see DBWR trace file
ORA-01110: data file 284: '/oradata05/xxxxx/NT_ACCT_TBS_10.dbf'
ORA-17503: ksfdopn:4 Failed to open file /oradata05/xxxxx/NT_ACCT_TBS_10.dbf
ORA-17500: ODM err:File does not exist
Errors in file /oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_dbw0_192015.trc:
ORA-01186: file 284 failed verification tests
ORA-01157: cannot identify/lock data file 284 - see DBWR trace file
ORA-01110: data file 284: '/oradata05/xxxxx/NT_ACCT_TBS_10.dbf'
File 284 not verified due to error ORA-01157

[Source: /oracle/diag/rdbms/xxxxx/xxxxx2/trace/alert_xxxxx2.log, Line: 8702]
Feb 09 17:07:27 2017
Errors in file /oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_dbw0_192015.trc:
ORA-01157: cannot identify/lock data file 285 - see DBWR trace file
ORA-01110: data file 285: '/oradata05/xxxxx/NT_ACCT_TBS_11.dbf'
ORA-17503: ksfdopn:4 Failed to open file /oradata05/xxxxx/NT_ACCT_TBS_11.dbf
ORA-17500: ODM err:File does not exist
Errors in file /oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_dbw0_192015.trc:
ORA-01186: file 285 failed verification tests
ORA-01157: cannot identify/lock data file 285 - see DBWR trace file
ORA-01110: data file 285: '/oradata05/xxxxx/NT_ACCT_TBS_11.dbf'
File 285 not verified due to error ORA-01157
Errors in file /oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_dbw0_192015.trc:
ORA-01157: cannot identify/lock data file 285 - see DBWR trace file
ORA-01110: data file 285: '/oradata05/xxxxx/NT_ACCT_TBS_11.dbf'
ORA-17503: ksfdopn:4 Failed to open file /oradata05/xxxxx/NT_ACCT_TBS_11.dbf
ORA-17500: ODM err:File does not exist
Errors in file /oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_dbw0_192015.trc:
ORA-01186: file 285 failed verification tests
ORA-01157: cannot identify/lock data file 285 - see DBWR trace file
ORA-01110: data file 285: '/oradata05/xxxxx/NT_ACCT_TBS_11.dbf'
File 285 not verified due to error ORA-01157

[Source: /oracle/diag/rdbms/xxxxx/xxxxx2/trace/alert_xxxxx2.log, Line: 8732]
Feb 09 22:00:12 2017
Errors in file /oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_j001_205119.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SA_SPC_SY_961"
ORA-01157: cannot identify/lock data file 218 - see DBWR trace file
ORA-01110: data file 218: '/oradata05/xxxxx/NJ_ACCT_BAK_01_TBS_25.dbf'
ORA-06512: at "SYS.DBMS_ADVISOR", line 201
ORA-06512: at "SYS.DBMS_SPACE", line 2480
ORA-06512: at "SYS.DBMS_SPACE", line 2553

[Source: /oracle/diag/rdbms/xxxxx/xxxxx2/trace/alert_xxxxx2.log, Line: 8740]
Feb 09 22:00:28 2017
Errors in file /oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_smon_192027.trc:
ORA-01157: cannot identify/lock data file 183 - see DBWR trace file
ORA-01110: data file 183: '/oradata05/xxxxx/XZ_ACCT_BAK_01_TBS_4.dbf'
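
The matched blocks above repeat the same few errors many times, so a tally per ORA- code gives a quicker overview. A minimal sketch, assuming the report is piped in on standard input; ora_code_counts is a hypothetical name, and grep -o is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# Hypothetical helper (not part of TFA): count how often each ORA- code
# appears in text read from stdin, most frequent first.
# Every occurrence is counted, including codes mentioned mid-line such as
# "File 284 not verified due to error ORA-01157".
ora_code_counts() {
  grep -o 'ORA-[0-9][0-9]*' |
    sort |
    uniq -c |
    sort -k1,1nr -k2 |
    awk '{ print $2, $1 }'
}

# Possible usage, assuming tfactl writes its report to stdout:
#   /oracle/product/11.2.0/grid/bin/tfactl analyze -search "ORA-" -since 1d | ora_code_counts
```

In the sample above this would put ORA-01157 and ORA-01110 at the top, which points straight at the missing data files under /oradata05.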

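4. Analyzing OSWatcher data

Recent versions of TFA bundle OSWatcher, so it no longer has to be installed separately. This makes diagnosis much easier when a problem occurs.
The OSWatcher installation directory is: /oracle/product/11.2.0/grid/tfa/billhis01/tfa_home/ext/oswbb
The running OSW processes: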

# ps -ef | grep OSW
grid     176005      1  0 12:13 ?        00:00:00 /bin/sh ./OSWatcher.sh 30 48 NONE /oracle/grid/tfa/repository/suptools/billhis01/oswbb/grid/archive
grid     177419 176005  0 12:14 ?        00:00:00 /bin/sh ./OSWatcherFM.sh 48 /oracle/grid/tfa/repository/suptools/billhis01/oswbb/grid/archive

OSWatcher stores the data it collects in: /oracle/grid/tfa/repository/suptools/billhis01/oswbb/grid/archive

-comp osw shows only a summary of the top data:

[root@bsseopdb01 ~]# /oracle/product/11.2.0/grid/bin/tfactl analyze -comp osw -since 6h
INFO: analyzing host: bsseopdb01

                     Report title: OSW top logs
                Report date range: last ~6 hour(s)
       Report (default) time zone: CST - China Standard Time
              Analysis started at: 10-Feb-2017 10:40:06 AM CST
            Elapsed analysis time: 0 second(s).
               Configuration file: /oracle/product/11.2.0/grid/tfa/bsseopdb01/tfa_home/ext/tnt/conf/tnt.prop
              Configuration group: osw
                        Parameter: 
              Total osw rec count:            798, from 10-Feb-2017 04:00:21 AM CST to 10-Feb-2017 10:39:54 AM CST
OSW recs matching last ~6 hour(s):            718, from 10-Feb-2017 04:40:27 AM CST to 10-Feb-2017 10:39:54 AM CST
                        statistic: t     first   highest   (time)   lowest   (time)  average  non zero  3rd last  2nd last      last  trend
                  top.cpu.util.hi: %       0.0       0.1 @07:53AM      0.0 @04:40AM      0.0         1       0.0       0.0       0.0    n/a
                  top.cpu.util.si: %       0.0       0.1 @04:43AM      0.0 @04:40AM      0.0        74       0.0       0.0       0.1    n/a
                  top.cpu.util.sy: %       0.4       2.7 @09:35AM      0.3 @05:29AM      0.6       718       0.4       1.5       0.9   125%
                  top.cpu.util.us: %       0.9      13.8 @09:00AM      0.3 @10:31AM      1.0       718       0.5       1.0       1.0    11%
                  top.cpu.util.wa: %       0.0       0.1 @06:15AM      0.0 @04:40AM      0.0         3       0.0       0.0       0.0    n/a
            top.loadavg.last01min:        1.21      1.95 @10:20AM     1.00 @06:03AM     1.16       677      1.02      1.07      1.12    -7%
            top.loadavg.last05min:        1.18      1.37 @05:21AM     1.04 @06:04AM     1.15       677      1.09      1.10      1.10    -6%
            top.loadavg.last15min:        1.13      1.22 @10:20AM     1.06 @07:54AM     1.12       677      1.11      1.11      1.11    -1%
                     top.mem.free: k   5594072   5597176 @04:44AM  5418916 @09:05AM  5529992       718   5476040   5468508   5472056    -2%
                 top.tasks.zombie:           0         1 @04:47AM        0 @04:40AM        0         3         0         0         0    n/a
                        top.users:           0         2 @09:18AM        0 @04:40AM        0        64         0         0         0    n/a
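
The summary table is wide, and often only the peak of each statistic and the time it occurred matter. The sketch below pulls out just those two columns. It assumes the column layout shown above, where the first field beginning with "@" is the time of the highest value; osw_peaks is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper (not part of TFA): from a `tfactl analyze -comp osw`
# report on stdin, print "<statistic> <highest value> <@time>" per line.
# Some rows carry a unit column (% or k) and some do not, so the highest
# value is located as the field just before the first "@time" field.
osw_peaks() {
  awk '
    $1 ~ /^top\./ {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^@/) {
          name = $1
          sub(/:$/, "", name)   # drop the trailing colon from the name
          print name, $(i - 1), $i
          break
        }
      }
    }
  '
}

# Possible usage, assuming tfactl writes its report to stdout:
#   /oracle/product/11.2.0/grid/bin/tfactl analyze -comp osw -since 6h | osw_peaks
```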

View the oswslabinfo summary:

# /oracle/product/11.2.0/grid/bin/tfactl analyze -comp oswslabinfo -since 1h

Watching how slab usage grows over time can show whether there is a problem.
