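1. Background
Today I installed a new 11.2.0.4 RAC. While applying GI PSU 11.2.0.4.161018 on both nodes at the same time, node 2 went through cleanly, but node 1 hung.
The failure looked like this: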
[root@xxxxxxdb01 ~]# opatch auto /tmp/24436338/24436338 -ocmrf /tmp/ocm.rsp
Executing /oracle/product/11.2.0/grid/perl/bin/perl /oracle/product/11.2.0/grid/OPatch/crs/patch11203.pl -patchdir /tmp/24436338 -patchn 24436338 -ocmrf /tmp/ocm.rsp -paramfile /oracle/product/11.2.0/grid/crs/install/crsconfig_params
This is the main log file: /oracle/product/11.2.0/grid/cfgtoollogs/opatchauto2017-02-09_08-42-37.log
This file will show your detected configuration and all the steps that opatchauto attempted to do on your system:
/oracle/product/11.2.0/grid/cfgtoollogs/opatchauto2017-02-09_08-42-37.report.log
2017-02-09 08:42:37: Starting Clusterware Patch Setup
Using configuration parameter file: /oracle/product/11.2.0/grid/crs/install/crsconfig_params
Stopping CRS...
Stopped CRS successfully
patch /tmp/24436338/24436338/24006111 apply successful for home /oracle/product/11.2.0/grid
patch /tmp/24436338/24436338/23054319 apply successful for home /oracle/product/11.2.0/grid
patch /tmp/24436338/24436338/22502505 apply successful for home /oracle/product/11.2.0/grid
Starting CRS...
Installing Trace File Analyzer    --> hung here
2. Troubleshooting
Looking through the Oracle-related processes, I noticed an unzip process that seemed out of place:
[root@xxxxxxdb01 xxxxxxdb01]# ps -ef | grep oracle
root 567 419 0 08:55 pts/0 00:00:00 tail -f /oracle/product/11.2.0/grid/cfgtoollogs/opatchauto2017-02-09_08-42-37.log
root 5491 5141 0 09:08 pts/4 00:00:00 grep oracle
root 16513 1 0 Feb08 ? 00:02:11 /oracle/product/11.2.0/grid/jdk/jre/bin/java -Xms64m -Xmx256m -classpath /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/jar/RATFA.jar:/oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/jar/je-4.0.103.jar:/oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/jar/ojdbc6.jar oracle.rat.tfa.TFAMain /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home
root 19927 19824 0 08:42 pts/1 00:00:00 /bin/sh /oracle/product/11.2.0/grid/OPatch/opatch auto /tmp/24436338/24436338 -ocmrf /tmp/ocm.rsp
root 19989 19927 0 08:42 pts/1 00:00:00 /usr/bin/perl /oracle/product/11.2.0/grid/OPatch/crs/auto_patch.pl -patchdir /tmp/24436338 -patchn 24436338 -ocmrf /tmp/ocm.rsp
root 19997 19989 0 08:42 pts/1 00:00:00 /oracle/product/11.2.0/grid/perl/bin/perl /oracle/product/11.2.0/grid/OPatch/crs/patch11203.pl -patchdir /tmp/24436338 -patchn 24436338 -ocmrf /tmp/ocm.rsp -paramfile /oracle/product/11.2.0/grid/crs/install/crsconfig_params
root 29976 19997 0 08:47 pts/1 00:00:00 /bin/sh /oracle/product/11.2.0/grid/crs/install/tfa_setup.sh -silent -crshome /oracle/product/11.2.0/grid
root 30100 29976 0 08:47 pts/1 00:00:00 /usr/bin/unzip -q /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_install.29976.zip
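The PID/PPID columns in the ps output above form a chain (opatch -> auto_patch.pl -> patch11203.pl -> tfa_setup.sh -> unzip), and following that chain is what points at PID 30100. As a rough sketch of doing this without reading the listing by eye, the helper below prints every descendant of a given PID. The function name descendants_of is made up for illustration and is not part of OPatch or TFA; it assumes its input has the "PID PPID COMMAND" layout produced by ps -eo pid,ppid,args.

```shell
# Hypothetical helper: list all descendants of a PID.
# stdin: rows of "PID PPID COMMAND..." (e.g. from `ps -eo pid,ppid,args`)
# $1:    the PID whose descendants should be printed
descendants_of() {
    awk -v root="$1" '
        $1 ~ /^[0-9]+$/ {                 # ignore the header row
            pid[NR] = $1; ppid[NR] = $2
            $1 = ""; $2 = ""; sub(/^ +/, "")
            cmd[NR] = $0                  # remaining fields are the command line
        }
        END {
            want[root] = 1
            # keep sweeping until a pass finds no new descendants
            do {
                added = 0
                for (i in pid)
                    if ((ppid[i] in want) && !(pid[i] in want)) {
                        want[pid[i]] = 1
                        added = 1
                    }
            } while (added)
            # print in the original input order, excluding the root itself
            for (i = 1; i <= NR; i++)
                if ((i in pid) && pid[i] != root && (pid[i] in want))
                    print pid[i], ppid[i], cmd[i]
        }'
}
```

On the node in question, something like ps -eo pid,ppid,args | descendants_of 19927 would list only the processes spawned by the opatch run, which makes a stuck leaf process easier to spot than a plain grep.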
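I went into /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/ and ran unzip -q tfa_install.29976.zip by hand; the archive extracted without problems.
By this point the patch run had already reached the TFA step, and TFA is designed to be independent of both the RDBMS and CRS. So I decided to get past this step first, let the GI patch finish and relock the home, and deal with TFA afterwards.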
[root@xxxxxxdb01 xxxxxxdb01]# kill -9 30100
After this unzip process was killed, OPatch carried on and CRS started successfully:
Starting CRS...
Installing Trace File Analyzer
CRS-4123: Oracle High Availability Services has been started.
Next, I checked the OPatch log: /oracle/product/11.2.0/grid/cfgtoollogs/opatchauto2017-02-09_08-42-37.log
2017-02-09 09:08:47: Command output:
> TFA Installation Log will be written to File : /tmp/tfa_install_29976_2017_02_09-08_47_37.log
>
> Starting TFA installation
>
> TFA Build Version: 121270 Build Date: 201603041405
> Installed Build Version: 0 Build Date: 201308012341
>
> TFA is already installed. Patching /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home...
> tfahome_full.tar: write error (disk full?). Continue? (y/n/^C) /oracle/product/11.2.0/grid/crs/install/tfa_setup.sh: line 1234: 30100 Killed $UNZIP -q $ZFILE
> /bin/tar: ./tfa_home/resources/file_type_patterns_internal.xml: Wrote only 9728 of 10240 bytes
> /bin/tar: ./tfa_home/resources/components_saas.xml: Cannot write: No space left on device
> /bin/tar: ./tfa_home/resources/collect_all_directories.xml: Cannot open: No space left on device
> /bin/tar: ./tfa_home/resources/ignorefiles.txt: Cannot open: No space left on device
> /bin/tar: ./tfa_home/resources/directory_patterns.xml: Cannot open: No space left on device
> /bin/tar: ./tfa_home/resources/directory_patterns_jcs.xml: Cannot open: No space left on device
> /bin/tar: ./tfa_home/resources/date_patterns.xml: Cannot open: No space left on device
> /bin/tar: ./tfa_home/resources/gdb_commands: Cannot open: No space left on device
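The two telltale messages in the log above are unzip's "write error (disk full?)" prompt and tar's "No space left on device". A minimal sketch for pulling just those lines out of a long log follows; find_disk_full_errors is a hypothetical name, and the two patterns are simply the strings seen in this log, not an exhaustive list.

```shell
# Hypothetical helper: show log lines that point at a full disk.
# $1: path to a log file (for example the opatchauto log)
# Prints matching lines prefixed with their line numbers.
find_disk_full_errors() {
    grep -nE 'No space left on device|write error \(disk full\?\)' "$1"
}
```

Called as find_disk_full_errors /oracle/product/11.2.0/grid/cfgtoollogs/opatchauto2017-02-09_08-42-37.log, it would surface the unzip and tar errors without scrolling through the rest of the run.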
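So a disk had probably filled up. The log also suggests that unzip had stopped at its "Continue? (y/n/^C)" prompt after the write error, which would explain the hang. Checking the file systems confirmed it: /tmp was full.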
[grid@xxxxxxdb01 ~]$ df -h
Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-LogVol00   2.9G  611M  2.2G  22% /
tmpfs                            32G  136K   32G   1% /dev/shm
/dev/sda1                       477M   83M  369M  19% /boot
/dev/mapper/VolGroup-LogVol02   7.8G   19M  7.4G   1% /home
/dev/mapper/VolGroup-LogVol05   4.8G  1.4G  3.3G  29% /opt
/dev/mapper/VolGroup-LogVol04   8.0G  8.0G   20M 100% /tmp    --> full
/dev/mapper/VolGroup-LogVol01   9.8G  5.0G  4.3G  54% /usr
/dev/mapper/VolGroup-LogVol03   8.0G  332M  7.7G   5% /var
/dev/mapper/VolGroup-lv_oracle   30G   19G   12G  63% /oracle
tmpfs                           4.0K     0  4.0K   0% /dev/vx
/dev/vx/dsk/xxxxxxdg/oradata    986G  577M  977G   1% /oradata
/dev/vx/dsk/xxxxxxdg/ocrvote    4.0G  105M  3.9G   3% /ocrvote
For comparison, here is the OPatch log from the healthy node:
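A check like the df above is cheap enough to run before starting opatch auto rather than after it hangs. The sketch below flags file systems at or above a usage threshold; filesystems_over is a hypothetical name, the threshold is whatever you choose, and the field positions assume the POSIX df -P layout with mount points that contain no spaces.

```shell
# Hypothetical pre-check: list mount points at or above a usage threshold.
# $1:    threshold in percent (e.g. 90)
# stdin: output of `df -P` (one file system per line, usage in column 5)
filesystems_over() {
    awk -v limit="$1" '
        NR > 1 {                          # skip the header line
            use = $5
            sub(/%$/, "", use)            # "100%" -> "100"
            if (use + 0 >= limit + 0)
                print $6, $5              # mount point and usage
        }'
}
```

For example, df -P | filesystems_over 90 would have printed "/tmp 100%" on node 1 in this incident. Any non-empty output is a reason to free space before patching.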
2017-02-09 08:47:39: Installing Trace File Analyzer
2017-02-09 08:47:39: Executing cmd: /oracle/product/11.2.0/grid/crs/install/tfa_setup.sh -silent -crshome /oracle/product/11.2.0/grid
2017-02-09 08:48:31: Command output:
> TFA Installation Log will be written to File : /tmp/tfa_install_18417_2017_02_09-08_47_39.log
>
> Starting TFA installation
>
> TFA Build Version: 121270 Build Date: 201603041405
> Installed Build Version: 0 Build Date: 201308012341
>
> TFA is already installed. Patching /oracle/product/11.2.0/grid/tfa/xxxxxxdb02/tfa_home...
> TFA patching CRS or DB from zipfile is written to /oracle/product/11.2.0/grid/tfa/xxxxxxdb02/tfapatch.log
>
> TFA will be Patched on Node xxxxxxdb02:
>
>
> Applying Patch on xxxxxxdb02:
>
> Stopping TFA Support Tools...
>
> Shutting down TFA for Patching...
Checking the TFA status showed that node 1 had no tfactl command at all:
[grid@xxxxxxdb01 ~]$ tfactl print status
-bash: tfactl: command not found
TFA on node 2 was fine:
[grid@xxxxxxdb02 ~]$ tfactl print status
.--------------------------------------------------------------------------------------------------.
| Host       | Status of TFA | PID   | Port | Version    | Build ID             | Inventory Status |
+------------+---------------+-------+------+------------+----------------------+------------------+
| xxxxxxdb02 | RUNNING       | 19403 | 5000 | 12.1.2.7.0 | 12127020160304140533 | COMPLETE         |
'------------+---------------+-------+------+------------+----------------------+------------------'
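3. Fixing TFA
TFA Collector – TFA with Database Support Tools Bundle (Doc ID 1513912.1)
I downloaded the TFA installation package p21757377_121020_Generic.zip from MOS note 1513912.1 and uploaded it to /tmp on node 1.
The upgrade went as follows. It only needs to be run on node 1; TFA on node 2 is upgraded automatically.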
(The SSH login banner "Authorized only. All activity will be monitored and reported" was printed many times during the remote steps; each consecutive run of it is shown once below.)
[root@xxxxxxdb01 tmp]# unzip -q p21757377_121020_Generic.zip -d ./tfa
[root@xxxxxxdb01 tmp]# cd tfa
[root@xxxxxxdb01 tfa]# ls -l
total 63228
-rwxr-xr-x 1 root root 63159734 Feb 7 03:12 installTFALite
-rw-r--r-- 1 root root 2096 Nov 23 04:27 README.txt
-rw-r--r-- 1 root root 1577349 Nov 21 12:26 TFA_User_Guide_12.1.2.8.4.pdf
[root@xxxxxxdb01 tfa]# ./installTFALite
TFA Installation Log will be written to File : /tmp/tfa_install_19751_2017_02_09-09_28_24.log
Starting TFA installation
TFA HOME : /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home
TFA Build Version: 121284 Build Date: 201702061110
Installed Build Version: 0 Build Date: 201308012341
TFA is already installed. Patching /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home...
TFA patching CRS or DB from zipfile extracted to /tmp/.19751.tfa
TFA patching CRS or DB from zipfile is written to /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfapatch.log
TFA will be Patched on:
xxxxxxdb01
xxxxxxdb02
Do you want to continue with patching TFA? [Y|N] [Y]: Y
Checking for ssh equivalency in xxxxxxdb02
xxxxxxdb02 is configured for ssh user equivalency for root user
Creating ZIP: /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/internal/tfapatch.zip
Using SSH to patch TFA to remote nodes :
Applying Patch on xxxxxxdb02:
TFA_HOME: /oracle/product/11.2.0/grid/tfa/xxxxxxdb02/tfa_home
Stopping TFA Support Tools...
Authorized only. All activity will be monitored and reported
Shutting down TFA
oracle-tfa stop/waiting
. . . . .
Killing TFA running with pid 19403
. . .
Successfully shutdown TFA..
Copying files from xxxxxxdb01 to xxxxxxdb02...
Authorized only. All activity will be monitored and reported
Current version of Berkeley DB in is 5.0.84, so no upgrade required
Authorized only. All activity will be monitored and reported
Running commands to fix init.tfa and tfactl in xxxxxxdb02...
Authorized only. All activity will be monitored and reported
Updating init.tfa in xxxxxxdb02...
Authorized only. All activity will be monitored and reported
Starting TFA in xxxxxxdb02...
Authorized only. All activity will be monitored and reported
Starting TFA..
oracle-tfa start/running, process 13859
Waiting up to 100 seconds for TFA to be started..
. . . . .
Successfully started TFA Process..
. . . . .
TFA Started and listening for commands
Authorized only. All activity will be monitored and reported
Enabling Access for Non-root Users on xxxxxxdb02...
Authorized only. All activity will be monitored and reported
Applying Patch on xxxxxxdb01:
Stopping TFA Support Tools...
Shutting down TFA for Patching...
Shutting down TFA
oracle-tfa stop/waiting
. . . . .
Killing TFA running with pid 16513
. . .
Successfully shutdown TFA..
Renaming /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/jar to /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/jlib
Adding INSTALL_TYPE = GI to tfa_setup.txt
Copying /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/output/ to /oracle/grid/tfa/xxxxxxdb01/
The current version of Berkeley DB is 4.0.103
Copying je-4.1.27.jar to /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/jlib/
Copying je-5.0.84.jar to /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/jlib/
Running DbPreUpgrade_4_1 utility
Output of upgrade : Pre-upgrade succeeded
Copying TFA Certificates...
Moving config.properties.bkp to config.properties
Running commands to fix init.tfa and tfactl in localhost
Starting TFA in xxxxxxdb01...
Creating Sym Link /etc/rc.d/rc0.d/K17init.tfa to /etc/init.d/init.tfa
Creating Sym Link /etc/rc.d/rc1.d/K17init.tfa to /etc/init.d/init.tfa
Creating Sym Link /etc/rc.d/rc2.d/K17init.tfa to /etc/init.d/init.tfa
Creating Sym Link /etc/rc.d/rc4.d/K17init.tfa to /etc/init.d/init.tfa
Creating Sym Link /etc/rc.d/rc6.d/K17init.tfa to /etc/init.d/init.tfa
Starting TFA..
oracle-tfa start/running, process 21014
Waiting up to 100 seconds for TFA to be started..
. . . . .
Successfully started TFA Process..
. . . . .
TFA Started and listening for commands
Removing /oracle/product/11.2.0/grid/tfa/xxxxxxdb01/tfa_home/jlib/je-4.0.103.jar
Enabling Access for Non-root Users on xxxxxxdb01...
Adding default users to TFA Access list...
.------------------------------------------------------------------.
| Host       | TFA Version | TFA Build ID         | Upgrade Status |
+------------+-------------+----------------------+----------------+
| xxxxxxdb01 | 12.1.2.8.4  | 12128420170206111019 | UPGRADED       |
| xxxxxxdb02 | 12.1.2.8.4  | 12128420170206111019 | UPGRADED       |
'------------+-------------+----------------------+----------------'
After the upgrade I checked the TFA status on both nodes and confirmed that everything was fine:
[grid@xxxxxxdb01 ~]$ tfactl print status
.--------------------------------------------------------------------------------------------------.
| Host       | Status of TFA | PID   | Port | Version    | Build ID             | Inventory Status |
+------------+---------------+-------+------+------------+----------------------+------------------+
| xxxxxxdb01 | RUNNING       | 21071 | 5000 | 12.1.2.8.4 | 12128420170206111019 | COMPLETE         |
| xxxxxxdb02 | RUNNING       | 13916 | 5000 | 12.1.2.8.4 | 12128420170206111019 | COMPLETE         |
'------------+---------------+-------+------+------------+----------------------+------------------'
[grid@xxxxxxdb02 ~]$ tfactl print status
.--------------------------------------------------------------------------------------------------.
| Host       | Status of TFA | PID   | Port | Version    | Build ID             | Inventory Status |
+------------+---------------+-------+------+------------+----------------------+------------------+
| xxxxxxdb02 | RUNNING       | 13916 | 5000 | 12.1.2.8.4 | 12128420170206111019 | COMPLETE         |
| xxxxxxdb01 | RUNNING       | 21071 | 5000 | 12.1.2.8.4 | 12128420170206111019 | COMPLETE         |
'------------+---------------+-------+------+------------+----------------------+------------------'
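If this check has to be repeated across several clusters, the status table can be filtered instead of read by eye. The following is only a sketch: tfa_hosts_not_running is a hypothetical name, and the parsing assumes the pipe-delimited table layout shown above, which may differ in other TFA versions.

```shell
# Hypothetical helper: print hosts whose TFA status is not RUNNING.
# stdin: output of `tfactl print status` in the table layout shown above
tfa_hosts_not_running() {
    awk -F'|' '
        # data rows have at least 8 pipe-separated fields; skip the header row
        NF >= 8 && $2 !~ /^ *Host *$/ {
            host = $2; status = $3
            gsub(/ /, "", host)
            gsub(/ /, "", status)
            if (status != "RUNNING")
                print host, status
        }'
}
```

Usage would be tfactl print status | tfa_hosts_not_running. Empty output only means "all running" if tfactl itself ran; on a node where the command is missing, as on node 1 before the fix, the pipeline also prints nothing, so check the exit status of tfactl separately.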
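4. Verifying the software
Since this was not a normal patching run, once everything was sorted out I verified the installed software on both nodes: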
[grid@xxxxxxdb01 ~]$ cluvfy comp software -n xxxxxxdb01,xxxxxxdb02 -verbose
Verifying software
Check: Software
1178 files verified
Software check passed
Verification of software was successful.
[grid@xxxxxxdb02 ~]$ cluvfy comp software -n xxxxxxdb01,xxxxxxdb02 -verbose
Verifying software
Check: Software
1178 files verified
Software check passed
Verification of software was successful.
Then I checked the PSU patch information:
[grid@xxxxxxdb01 ~]$ $ORACLE_HOME/OPatch/opatch lsinventory | grep "Patch"
Oracle Interim Patch Installer version 11.2.0.3.12
OPatch version : 11.2.0.3.12
Patch 22502505 : applied on Thu Feb 09 08:47:32 CST 2017
Unique Patch ID: 19880366
Patch description: "ACFS Patch Set Update : 11.2.0.4.160419 (22502505)"
Patch 23054319 : applied on Thu Feb 09 08:47:09 CST 2017
Unique Patch ID: 20209287
Patch description: "OCW Patch Set Update : 11.2.0.4.160719 (23054319)"
Patch 24006111 : applied on Thu Feb 09 08:46:24 CST 2017
Unique Patch ID: 20508568
Patch description: "Database Patch Set Update : 11.2.0.4.161018 (24006111)"
Sub-patch 23054359; "Database Patch Set Update : 11.2.0.4.160719 (23054359)"
Sub-patch 22502456; "Database Patch Set Update : 11.2.0.4.160419 (22502456)"
Sub-patch 21948347; "Database Patch Set Update : 11.2.0.4.160119 (21948347)"
Sub-patch 21352635; "Database Patch Set Update : 11.2.0.4.8 (21352635)"
Sub-patch 20760982; "Database Patch Set Update : 11.2.0.4.7 (20760982)"
Sub-patch 20299013; "Database Patch Set Update : 11.2.0.4.6 (20299013)"
Sub-patch 19769489; "Database Patch Set Update : 11.2.0.4.5 (19769489)"
Sub-patch 19121551; "Database Patch Set Update : 11.2.0.4.4 (19121551)"
Sub-patch 18522509; "Database Patch Set Update : 11.2.0.4.3 (18522509)"
Sub-patch 18031668; "Database Patch Set Update : 11.2.0.4.2 (18031668)"
Sub-patch 17478514; "Database Patch Set Update : 11.2.0.4.1 (17478514)"
OPatch succeeded.
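The three patch IDs to look for in the inventory (24006111, 23054319, 22502505) are the same ones opatch auto reported as applied at the start. A small sketch for checking them mechanically follows; missing_patches is a hypothetical name, and the match relies on the "Patch <id> : applied on" line format seen in the lsinventory output above.

```shell
# Hypothetical helper: print each expected patch ID that is NOT in the inventory.
# stdin: output of `opatch lsinventory`
# args:  the patch IDs that should be present
missing_patches() {
    inventory=$(cat)                      # read the whole inventory listing once
    for id in "$@"; do
        if ! printf '%s\n' "$inventory" |
             grep -Eq "^Patch +${id} +: applied on"; then
            echo "$id"
        fi
    done
}
```

For this PSU the call would be $ORACLE_HOME/OPatch/opatch lsinventory | missing_patches 24006111 23054319 22502505, and no output means all three top-level patches are recorded. Running it on both nodes gives a quick side-by-side comparison.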
Both the cluvfy and OPatch checks came back clean, so the installation should be fine.