11.2.0.4: Fixing a hang while applying a GI PSU

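1. Background

Today I installed a new 11.2.0.4 RAC. While applying GI PSU 11.2.0.4.161018 on both nodes at the same time, node 2 went through without problems, but node 1 hung.
The symptom looked like this: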

[root@xxxxxxdb01 ~]# opatch auto /tmp/24436338/24436338 -ocmrf /tmp/ocm.rsp
Executing /oracle/product/11.2.0/grid/perl/bin/perl /oracle/product/11.2.0/grid/OPatch/crs/patch11203.pl -patchdir /tmp/24436338 -patchn 24436338 -ocmrf /tmp/ocm.rsp -paramfile /oracle/product/11.2.0/grid/crs/install/crsconfig_params

This is the main log file: /oracle/product/11.2.0/grid/cfgtoollogs/opatchauto2017-02-09_08-42-37.log

This file will show your detected configuration and all the steps that opatchauto attempted to do on your system:
/oracle/product/11.2.0/grid/cfgtoollogs/opatchauto2017-02-09_08-42-37.report.log

2017-02-09 08:42:37: Starting Clusterware Patch Setup
Using configuration parameter file: /oracle/product/11.2.0/grid/crs/install/crsconfig_params

Stopping CRS...
Stopped CRS successfully

patch /tmp/24436338/24436338/24006111  apply successful for home  /oracle/product/11.2.0/grid 
patch /tmp/24436338/24436338/23054319  apply successful for home  /oracle/product/11.2.0/grid 
patch /tmp/24436338/24436338/22502505  apply successful for home  /oracle/product/11.2.0/grid 

Starting CRS...
Installing Trace File Analyzer		--> hung here

2. Troubleshooting

I grepped the process list for oracle and noticed an odd-looking unzip process:

[root@xxxxxxdb01 xxxxxxdb01]# ps -ef | grep oracle
root       567   419  0 08:55 pts/0    00:00:00 tail -f /oracle/product/11.2.0/grid/cfgtoollogs/opatchauto2017-02-09_08-42-37.log
root      5491  5141  0 09:08 pts/4    00:00:00 grep oracle
root     16513     1  0 Feb08 ?        00:02:11 /oracle/product/11.2.0/grid/jdk/jre/bin/java -Xms64m -Xmx256m -classpath /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/jar/RATFA.jar:/oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/jar/je-4.0.103.jar:/oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/jar/ojdbc6.jar oracle.rat.tfa.TFAMain /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home
root     19927 19824  0 08:42 pts/1    00:00:00 /bin/sh /oracle/product/11.2.0/grid/OPatch/opatch auto /tmp/24436338/24436338 -ocmrf /tmp/ocm.rsp
root     19989 19927  0 08:42 pts/1    00:00:00 /usr/bin/perl /oracle/product/11.2.0/grid/OPatch/crs/auto_patch.pl -patchdir /tmp/24436338 -patchn 24436338 -ocmrf /tmp/ocm.rsp
root     19997 19989  0 08:42 pts/1    00:00:00 /oracle/product/11.2.0/grid/perl/bin/perl /oracle/product/11.2.0/grid/OPatch/crs/patch11203.pl -patchdir /tmp/24436338 -patchn 24436338 -ocmrf /tmp/ocm.rsp -paramfile /oracle/product/11.2.0/grid/crs/install/crsconfig_params
root     29976 19997  0 08:47 pts/1    00:00:00 /bin/sh /oracle/product/11.2.0/grid/crs/install/tfa_setup.sh -silent -crshome /oracle/product/11.2.0/grid
root     30100 29976  0 08:47 pts/1    00:00:00 /usr/bin/unzip -q /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_install.29976.zip
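To pick the stuck child out of a listing like this without reading the PPID column by eye, the parent chain can be walked mechanically. The following is only a sketch: `descendants_of` is a hypothetical helper name, not part of OPatch, and it expects plain "PID PPID" pairs on stdin.

```shell
# Print every descendant PID of ROOT.
# Input: "PID PPID" pairs on stdin, one pair per line.
descendants_of() (
    awk -v root="$1" '
        NF >= 2 { parent[$1] = $2; pids[++n] = $1 }
        END {
            for (i = 1; i <= n; i++) {
                cur = parent[pids[i]]
                hops = 0
                # Climb towards the top of the tree; stop at ROOT,
                # at an unknown parent, or after n hops (loop guard).
                while (cur + 0 != root + 0 && (cur in parent) && hops++ < n)
                    cur = parent[cur]
                if (cur + 0 == root + 0) print pids[i]
            }
        }'
)
```

On a live system it could be fed with `ps -e -o pid= -o ppid= | descendants_of 19927`, where 19927 is the `opatch auto` shell from the listing above. The last PID printed for that tree is the unzip process, 30100.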

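I went into /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/ and tried unzip -q tfa_install.29976.zip by hand; the archive could be extracted.
At this point the patch run had already moved on to installing TFA, and TFA is designed to be independent of both RDBMS and CRS. So I decided to unblock the run first, let the GI patch finish and lock the home, and deal with TFA afterwards.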

[root@xxxxxxdb01 xxxxxxdb01]# kill -9 30100
After killing the unzip process, OPatch carried on and CRS started successfully:
Starting CRS...
Installing Trace File Analyzer
CRS-4123: Oracle High Availability Services has been started.

Checking the OPatch log: /oracle/product/11.2.0/grid/cfgtoollogs/opatchauto2017-02-09_08-42-37.log

2017-02-09 09:08:47: Command output:
>  TFA Installation Log will be written to File : /tmp/tfa_install_29976_2017_02_09-08_47_37.log
>  
>  Starting TFA installation
>  
>  TFA Build Version: 121270 Build Date: 201603041405
>  Installed Build Version: 0 Build Date: 201308012341
>  
>  TFA is already installed. Patching /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home...
>  tfahome_full.tar:  write error (disk full?).  Continue? (y/n/^C) /oracle/product/11.2.0/grid/crs/install/tfa_setup.sh: line 1234: 30100 Killed                  $UNZIP -q $ZFILE
>  /bin/tar: ./tfa_home/resources/file_type_patterns_internal.xml: Wrote only 9728 of 10240 bytes
>  /bin/tar: ./tfa_home/resources/components_saas.xml: Cannot write: No space left on device
>  /bin/tar: ./tfa_home/resources/collect_all_directories.xml: Cannot open: No space left on device
>  /bin/tar: ./tfa_home/resources/ignorefiles.txt: Cannot open: No space left on device
>  /bin/tar: ./tfa_home/resources/directory_patterns.xml: Cannot open: No space left on device
>  /bin/tar: ./tfa_home/resources/directory_patterns_jcs.xml: Cannot open: No space left on device
>  /bin/tar: ./tfa_home/resources/date_patterns.xml: Cannot open: No space left on device
>  /bin/tar: ./tfa_home/resources/gdb_commands: Cannot open: No space left on device
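The two tell-tale messages in this excerpt are easy to search for directly instead of reading the whole log. A minimal sketch follows; `disk_full_hits` is a hypothetical helper name, and the two patterns are simply the strings seen in the log above, not an exhaustive list.

```shell
# Print (with line numbers) every line of FILE that matches one of the
# disk-full messages seen in the opatchauto log above.
disk_full_hits() (
    grep -n -E 'No space left on device|write error \(disk full\?\)' "$1"
)
```

For this incident it would be run as `disk_full_hits /oracle/product/11.2.0/grid/cfgtoollogs/opatchauto2017-02-09_08-42-37.log`. grep returns a non-zero status when nothing matches, so the call can also be used as a condition in a script.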

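The log points to a full disk: unzip hit a write error and stopped at its interactive "Continue? (y/n/^C)" prompt, which would explain why the silent install appeared to hang. Checking the file systems confirmed that /tmp was full: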

[grid@xxxxxxdb01 ~]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-LogVol00
                      2.9G  611M  2.2G  22% /
tmpfs                  32G  136K   32G   1% /dev/shm
/dev/sda1             477M   83M  369M  19% /boot
/dev/mapper/VolGroup-LogVol02
                      7.8G   19M  7.4G   1% /home
/dev/mapper/VolGroup-LogVol05
                      4.8G  1.4G  3.3G  29% /opt
/dev/mapper/VolGroup-LogVol04
                      8.0G  8.0G   20M 100% /tmp			--> full
/dev/mapper/VolGroup-LogVol01
                      9.8G  5.0G  4.3G  54% /usr
/dev/mapper/VolGroup-LogVol03
                      8.0G  332M  7.7G   5% /var
/dev/mapper/VolGroup-lv_oracle
                       30G   19G   12G  63% /oracle
tmpfs                 4.0K     0  4.0K   0% /dev/vx
/dev/vx/dsk/xxxxxxdg/oradata
                      986G  577M  977G   1% /oradata
/dev/vx/dsk/xxxxxxdg/ocrvote
                      4.0G  105M  3.9G   3% /ocrvote
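A free-space check before starting `opatch auto` would have caught this. The sketch below assumes a hypothetical helper name, `has_free_mb`, and the threshold is the caller's choice, not a documented Oracle requirement. It uses `df -Pk` because the `-P` format keeps each file system on one line, unlike the wrapped output above.

```shell
# Succeed when the file system holding DIR has at least MIN_MB
# megabytes available; fail otherwise.
has_free_mb() (
    dir=$1
    min_mb=$2
    # With -P the data row is always line 2 and "Available" is column 4 (KB).
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((min_mb * 1024)) ]
)
```

It could be used as `has_free_mb /tmp 1024 || echo "/tmp is low on space"`, where 1024 is an arbitrary placeholder value.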

For comparison, the OPatch log on the healthy node:

2017-02-09 08:47:39: Installing Trace File Analyzer

2017-02-09 08:47:39: Executing cmd: /oracle/product/11.2.0/grid/crs/install/tfa_setup.sh -silent -crshome /oracle/product/11.2.0/grid
2017-02-09 08:48:31: Command output:
>  TFA Installation Log will be written to File : /tmp/tfa_install_18417_2017_02_09-08_47_39.log
> 
>  Starting TFA installation
> 
>  TFA Build Version: 121270 Build Date: 201603041405
>  Installed Build Version: 0 Build Date: 201308012341
>
>  TFA is already installed. Patching /oracle/product/11.2.0/grid/tfa/xxxxxxdb02/tfa_home...
>  TFA patching CRS or DB from zipfile is written to /oracle/product/11.2.0/grid/tfa/xxxxxxdb02/tfapatch.log
> 
>  TFA will be Patched on Node xxxxxxdb02:
> 
> 
>  Applying Patch on xxxxxxdb02:
> 
>  Stopping TFA Support Tools...
> 
>  Shutting down TFA for Patching...

Checking the TFA status: node 1 has no tfactl command

[grid@xxxxxxdb01 ~]$ tfactl print status
-bash: tfactl: command not found

TFA on node 2 is fine

[grid@xxxxxxdb02 ~]$ tfactl print status

.--------------------------------------------------------------------------------------------------.
| Host       | Status of TFA | PID   | Port | Version    | Build ID             | Inventory Status |
+------------+---------------+-------+------+------------+----------------------+------------------+
| xxxxxxdb02 | RUNNING       | 19403 | 5000 | 12.1.2.7.0 | 12127020160304140533 | COMPLETE         |
'------------+---------------+-------+------+------------+----------------------+------------------'

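3. Fixing TFA

TFA Collector – TFA with Database Support Tools Bundle (Doc ID 1513912.1)

I downloaded the TFA installer p21757377_121020_Generic.zip from MOS note 1513912.1 and uploaded it to /tmp on node 1.
The upgrade went as follows. It only needs to be run on node 1; TFA on node 2 is upgraded automatically.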

[root@xxxxxxdb01 tmp]# unzip -q p21757377_121020_Generic.zip -d ./tfa
[root@xxxxxxdb01 tmp]# cd tfa
[root@xxxxxxdb01 tfa]# ls -l
total 63228
-rwxr-xr-x 1 root root 63159734 Feb  7 03:12 installTFALite
-rw-r--r-- 1 root root     2096 Nov 23 04:27 README.txt
-rw-r--r-- 1 root root  1577349 Nov 21 12:26 TFA_User_Guide_12.1.2.8.4.pdf
[root@xxxxxxdb01 tfa]# ./installTFALite
TFA Installation Log will be written to File : /tmp/tfa_install_19751_2017_02_09-09_28_24.log

Starting TFA installation

TFA HOME : /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home
TFA Build Version: 121284 Build Date: 201702061110
Installed Build Version: 0 Build Date: 201308012341

TFA is already installed. Patching /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home...
TFA patching CRS or DB from zipfile extracted to /tmp/.19751.tfa
TFA patching CRS or DB from zipfile is written to /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfapatch.log

TFA will be Patched on: 
xxxxxxdb01
xxxxxxdb02

Do you want to continue with patching TFA? [Y|N] [Y]: Y

Checking for ssh equivalency in xxxxxxdb02
xxxxxxdb02 is configured for ssh user equivalency for root user

Creating ZIP: /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/internal/tfapatch.zip

Using SSH to patch TFA to remote nodes :

Applying Patch on xxxxxxdb02:

TFA_HOME: /oracle/product/11.2.0/grid/tfa/xxxxxxdb02/tfa_home
Stopping TFA Support Tools...
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
Shutting down TFA
oracle-tfa stop/waiting
. . . . . 
Killing TFA running with pid 19403
. . . 
Successfully shutdown TFA..
Copying files from xxxxxxdb01 to xxxxxxdb02...
 Authorized only. All activity will be monitored and reported 

 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
Current version of Berkeley DB in  is 5.0.84, so no upgrade required
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
Running commands to fix init.tfa and tfactl in xxxxxxdb02...
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
Updating init.tfa in xxxxxxdb02...
 Authorized only. All activity will be monitored and reported 
Starting TFA in xxxxxxdb02...
 Authorized only. All activity will be monitored and reported 
Starting TFA..
oracle-tfa start/running, process 13859
Waiting up to 100 seconds for TFA to be started..
. . . . . 
Successfully started TFA Process..
. . . . . 
TFA Started and listening for commands
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 
 Authorized only. All activity will be monitored and reported 

Enabling Access for Non-root Users on xxxxxxdb02...
 Authorized only. All activity will be monitored and reported 


Applying Patch on xxxxxxdb01:

Stopping TFA Support Tools...

Shutting down TFA for Patching...

Shutting down TFA
oracle-tfa stop/waiting
. . . . . 
Killing TFA running with pid 16513
. . . 
Successfully shutdown TFA..

Renaming /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/jar to /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/jlib
Adding INSTALL_TYPE = GI to tfa_setup.txt
Copying /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/output/ to /oracle/grid/tfa/xxxxxxdb01/

The current version of Berkeley DB is 4.0.103
Copying je-4.1.27.jar to /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/jlib/
Copying je-5.0.84.jar to /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/jlib/
Running DbPreUpgrade_4_1 utility
Output of upgrade : Pre-upgrade succeeded

Copying TFA Certificates...
Moving config.properties.bkp to config.properties

Running commands to fix init.tfa and tfactl in localhost

Starting TFA in xxxxxxdb01...

Creating Sym Link /etc/rc.d/rc0.d/K17init.tfa to /etc/init.d/init.tfa
Creating Sym Link /etc/rc.d/rc1.d/K17init.tfa to /etc/init.d/init.tfa
Creating Sym Link /etc/rc.d/rc2.d/K17init.tfa to /etc/init.d/init.tfa
Creating Sym Link /etc/rc.d/rc4.d/K17init.tfa to /etc/init.d/init.tfa
Creating Sym Link /etc/rc.d/rc6.d/K17init.tfa to /etc/init.d/init.tfa
Starting TFA..
oracle-tfa start/running, process 21014
Waiting up to 100 seconds for TFA to be started..
. . . . . 
Successfully started TFA Process..
. . . . . 
TFA Started and listening for commands
Removing /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/jlib/je-4.0.103.jar

Enabling Access for Non-root Users on xxxxxxdb01...

Adding default users to TFA Access list...

.------------------------------------------------------------------.
| Host       | TFA Version | TFA Build ID         | Upgrade Status |
+------------+-------------+----------------------+----------------+
| xxxxxxdb01 |  12.1.2.8.4 | 12128420170206111019 | UPGRADED       |
| xxxxxxdb02 |  12.1.2.8.4 | 12128420170206111019 | UPGRADED       |
'------------+-------------+----------------------+----------------'

After the upgrade I checked the TFA status on both nodes and confirmed that everything was fine

[grid@xxxxxxdb01 ~]$ tfactl print status

.--------------------------------------------------------------------------------------------------.
| Host       | Status of TFA | PID   | Port | Version    | Build ID             | Inventory Status |
+------------+---------------+-------+------+------------+----------------------+------------------+
| xxxxxxdb01 | RUNNING       | 21071 | 5000 | 12.1.2.8.4 | 12128420170206111019 | COMPLETE         |
| xxxxxxdb02 | RUNNING       | 13916 | 5000 | 12.1.2.8.4 | 12128420170206111019 | COMPLETE         |
'------------+---------------+-------+------+------------+----------------------+------------------'

[grid@xxxxxxdb02 ~]$ tfactl print status

.--------------------------------------------------------------------------------------------------.
| Host       | Status of TFA | PID   | Port | Version    | Build ID             | Inventory Status |
+------------+---------------+-------+------+------------+----------------------+------------------+
| xxxxxxdb02 | RUNNING       | 13916 | 5000 | 12.1.2.8.4 | 12128420170206111019 | COMPLETE         |
| xxxxxxdb01 | RUNNING       | 21071 | 5000 | 12.1.2.8.4 | 12128420170206111019 | COMPLETE         |
'------------+---------------+-------+------+------------+----------------------+------------------'
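The same check can be scripted by parsing the status table. This is a rough sketch that assumes the column layout shown above (status in the second column, version in the fifth); `tfa_status_ok` is a hypothetical helper name, and the table format may differ in other TFA releases.

```shell
# Read "tfactl print status" output on stdin.
# Succeed only if at least one host row exists, every host is RUNNING,
# and all hosts report the same version.
tfa_status_ok() (
    awk -F'|' '
        NF >= 8 {
            host = $2; status = $3; version = $6
            gsub(/ /, "", host); gsub(/ /, "", status); gsub(/ /, "", version)
            if (host == "Host") next      # skip the header row
            rows++
            if (status != "RUNNING") bad++
            versions[version] = 1
        }
        END {
            n = 0
            for (v in versions) n++
            exit !(rows > 0 && bad == 0 && n == 1)
        }'
)
```

Usage would look like `tfactl print status | tfa_status_ok && echo "TFA consistent on all nodes"`.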

4. Verifying the software

Since this was not a normal patch run, I checked the installed software on both nodes afterwards

[grid@xxxxxxdb01 ~]$ cluvfy comp software -n xxxxxxdb01,xxxxxxdb02 -verbose

Verifying software 

Check: Software

  1178 files verified                 

Software check passed

Verification of software was successful. 


[grid@xxxxxxdb02 ~]$ cluvfy comp software -n xxxxxxdb01,xxxxxxdb02 -verbose

Verifying software 

Check: Software

  1178 files verified                 

Software check passed

Verification of software was successful. 

Checking the PSU patch information

[grid@xxxxxxdb01 ~]$ $ORACLE_HOME/OPatch/opatch lsinventory | grep "Patch"
Oracle Interim Patch Installer version 11.2.0.3.12
OPatch version    : 11.2.0.3.12
Patch  22502505     : applied on Thu Feb 09 08:47:32 CST 2017
Unique Patch ID:  19880366
Patch description:  "ACFS Patch Set Update : 11.2.0.4.160419 (22502505)"
Patch  23054319     : applied on Thu Feb 09 08:47:09 CST 2017
Unique Patch ID:  20209287
Patch description:  "OCW Patch Set Update : 11.2.0.4.160719 (23054319)"
Patch  24006111     : applied on Thu Feb 09 08:46:24 CST 2017
Unique Patch ID:  20508568
Patch description:  "Database Patch Set Update : 11.2.0.4.161018 (24006111)"
Sub-patch  23054359; "Database Patch Set Update : 11.2.0.4.160719 (23054359)"
Sub-patch  22502456; "Database Patch Set Update : 11.2.0.4.160419 (22502456)"
Sub-patch  21948347; "Database Patch Set Update : 11.2.0.4.160119 (21948347)"
Sub-patch  21352635; "Database Patch Set Update : 11.2.0.4.8 (21352635)"
Sub-patch  20760982; "Database Patch Set Update : 11.2.0.4.7 (20760982)"
Sub-patch  20299013; "Database Patch Set Update : 11.2.0.4.6 (20299013)"
Sub-patch  19769489; "Database Patch Set Update : 11.2.0.4.5 (19769489)"
Sub-patch  19121551; "Database Patch Set Update : 11.2.0.4.4 (19121551)"
Sub-patch  18522509; "Database Patch Set Update : 11.2.0.4.3 (18522509)"
Sub-patch  18031668; "Database Patch Set Update : 11.2.0.4.2 (18031668)"
Sub-patch  17478514; "Database Patch Set Update : 11.2.0.4.1 (17478514)"
OPatch succeeded.
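Rather than comparing patch numbers by eye, the inventory output can be checked against the list of sub-patches that `opatch auto` reported as applied. A small sketch, assuming the "Patch  <id>  : applied on ..." line format shown above; `missing_patches` is a hypothetical helper name.

```shell
# Read "opatch lsinventory" output on stdin and print each expected
# patch ID (given as arguments) that is NOT listed as applied.
missing_patches() (
    inventory=$(cat)
    for id in "$@"; do
        printf '%s\n' "$inventory" |
            grep -Eq "^Patch[[:space:]]+${id}[[:space:]]*:" || printf '%s\n' "$id"
    done
)
```

For this PSU the call would be `$ORACLE_HOME/OPatch/opatch lsinventory | missing_patches 24006111 23054319 22502505`; empty output means all three are in the inventory.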

Both the cluvfy and OPatch checks came back clean, so there should be no problem
