Testing the importance of the OLR in an Oracle 11g cluster
Each node in a cluster has a local registry for node-specific resources, called the Oracle Local Registry (OLR).
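On Linux, the location of a node's OLR is normally recorded in /etc/oracle/olr.loc under the key olrconfig_loc. The following is a minimal sketch for reading that entry. The function name olr_path and its optional file argument are illustrative assumptions, not part of any Oracle tool.

```shell
#!/bin/sh
# Print the OLR file path recorded in an olr.loc-style file.
# olr_path is a hypothetical helper; the default location
# /etc/oracle/olr.loc is the usual one on Linux and is assumed here.
olr_path() {
    loc_file="${1:-/etc/oracle/olr.loc}"
    # Fail early if the pointer file itself cannot be read.
    [ -r "$loc_file" ] || return 1
    # Keep only the value of the first olrconfig_loc= line.
    sed -n 's/^olrconfig_loc=//p' "$loc_file" | head -n 1
}
```

Calling olr_path with no argument on a cluster node should print the path of that node's .olr file. Oracle's own ocrcheck -local reports the OLR location as well and additionally runs an integrity check.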
Test 1: simulate the OLR file going missing
First, rename the OLR file:
[root@vmrac2 cdata]# mv vmrac2.olr vmrac2.olr.bak
Then try to start CRS:
[root@vmrac2 cdata]# crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
Next, look at what the cluster alert log shows:
[grid@vmrac2 vmrac2]$ tailf alertvmrac2.log
[ohasd(2495)]CRS-0704:Oracle High Availability Service aborted due to Oracle Local Registry error [PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]]. Details at (:OHAS00106:) in /u02/app/11.2.0.3/grid/log/vmrac2/ohasd/ohasd.log.
2014-06-16 16:51:59.491 [ohasd(2506)]CRS-0704:Oracle High Availability Service aborted due to Oracle Local Registry error [PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]]. Details at (:OHAS00106:) in /u02/app/11.2.0.3/grid/log/vmrac2/ohasd/ohasd.log.
2014-06-16 16:51:59.698 [ohasd(2517)]CRS-0704:Oracle High Availability Service aborted due to Oracle Local Registry error [PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]]. Details at (:OHAS00106:) in /u02/app/11.2.0.3/grid/log/vmrac2/ohasd/ohasd.log.
2014-06-16 16:51:59.901 [ohasd(2528)]CRS-0704:Oracle High Availability Service aborted due to Oracle Local Registry error [PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]]. Details at (:OHAS00106:) in /u02/app/11.2.0.3/grid/log/vmrac2/ohasd/ohasd.log.
2014-06-16 16:52:00.113 [ohasd(2539)]CRS-0704:Oracle High Availability Service aborted due to Oracle Local Registry error [PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]]. Details at (:OHAS00106:) in /u02/app/11.2.0.3/grid/log/vmrac2/ohasd/ohasd.log.
[client(2554)]CRS-10001:CRS-10132: No msg for has:crs-10132 [10][60]
2014-06-16 16:56:00.720 [ohasd(2717)]CRS-2112:The OLR service started on node vmrac2.
2014-06-16 16:56:00.788 [ohasd(2717)]CRS-1301:Oracle High Availability Service started on node vmrac2.
2014-06-16 16:56:00.855 [ohasd(2717)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2014-06-16 16:56:01.836 [/u02/app/11.2.0.3/grid/bin/orarootagent.bin(2768)]CRS-5016:Process "/u02/app/11.2.0.3/grid/bin/acfsload" spawned by agent "/u02/app/11.2.0.3/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u02/app/11.2.0.3/grid/log/vmrac2/agent/ohasd/orarootagent_root/orarootagent_root.log"
2014-06-16 16:56:19.876 [ohasd(2717)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2014-06-16 16:56:19.909 [gpnpd(2873)]CRS-2328:GPNPD started on node vmrac2.
2014-06-16 16:56:22.751 [cssd(2947)]CRS-1713:CSSD daemon is started in clustered mode
2014-06-16 16:56:24.073 [ohasd(2717)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2014-06-16 16:56:32.512 [cssd(2947)]CRS-1707:Lease acquisition for node vmrac2 number 2 completed
2014-06-16 16:56:33.798 [cssd(2947)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL1; details in /u02/app/11.2.0.3/grid/log/vmrac2/cssd/ocssd.log.
2014-06-16 16:56:40.342 [cssd(2947)]CRS-1601:CSSD Reconfiguration complete. Active nodes are vmrac1 vmrac2 .
2014-06-16 16:56:42.635 [ctssd(3009)]CRS-2401:The Cluster Time Synchronization Service started on host vmrac2.
2014-06-16 16:56:42.635 [ctssd(3009)]CRS-2407:The new Cluster Time Synchronization Service reference node is host vmrac1.
2014-06-16 16:56:46.726 [ctssd(3009)]CRS-2408:The clock on host vmrac2 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
[client(3047)]CRS-10001:16-Jun-14 16:56 ACFS-9391: Checking for existing ADVM/ACFS installation.
[client(3060)]CRS-10001:16-Jun-14 16:56 ACFS-9392: Validating ADVM/ACFS installation files for operating system.
[client(3062)]CRS-10001:16-Jun-14 16:56 ACFS-9393: Verifying ASM Administrator setup.
[client(3065)]CRS-10001:16-Jun-14 16:56 ACFS-9308: Loading installed ADVM/ACFS drivers.
[client(3069)]CRS-10001:16-Jun-14 16:56 ACFS-9154: Loading 'oracleoks.ko' driver.
[client(3080)]CRS-10001:16-Jun-14 16:56 ACFS-9154: Loading 'oracleadvm.ko' driver.
[client(3096)]CRS-10001:16-Jun-14 16:56 ACFS-9154: Loading 'oracleacfs.ko' driver.
[client(3180)]CRS-10001:16-Jun-14 16:56 ACFS-9327: Verifying ADVM/ACFS devices.
[client(3183)]CRS-10001:16-Jun-14 16:56 ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
[client(3187)]CRS-10001:16-Jun-14 16:56 ACFS-9156: Detecting control device '/dev/ofsctl'.
[client(3193)]CRS-10001:16-Jun-14 16:56 ACFS-9322: completed
Test 2: wipe the contents of the OLR by replacing it with an empty file
The alert log then shows the following:
[ohasd(5451)]CRS-0704:Oracle High Availability Service aborted due to Oracle Local Registry error [PROCL-26: Error while accessing the physical storage]. Details at (:OHAS00106:) in /u02/app/11.2.0.3/grid/log/vmrac2/ohasd/ohasd.log.
2014-06-16 17:19:02.723 [ohasd(5462)]CRS-0704:Oracle High Availability Service aborted due to Oracle Local Registry error [PROCL-26: Error while accessing the physical storage]. Details at (:OHAS00106:) in /u02/app/11.2.0.3/grid/log/vmrac2/ohasd/ohasd.log.
[client(5477)]CRS-10001:CRS-10132: No msg for has:crs-10132 [10][60]
Now look at the corresponding entries in ohasd.log:
[grid@vmrac2 vmrac2]$ tail -300 /u02/app/11.2.0.3/grid/log/vmrac2/ohasd/ohasd.log
2014-06-16 17:19:02.722: [ OCROSD][1923920288]utread:3: Problem reading buffer 150c4000 buflen 4096 retval 0 phy_offset 102400 retry 5
2014-06-16 17:19:02.722: [ OCRRAW][1923920288]propriogid:1_1: Failed to read the whole bootblock. Assumes invalid format.
2014-06-16 17:19:02.722: [ OCRRAW][1923920288]proprioini: all disks are not OCR/OLR formatted
2014-06-16 17:19:02.722: [ OCRRAW][1923920288]proprinit: Could not open raw device
2014-06-16 17:19:02.722: [ OCRAPI][1923920288]a_init:16!: Backend init unsuccessful : [26]
2014-06-16 17:19:02.723: [ CRSOCR][1923920288] OCR context init failure. Error: PROCL-26: Error while accessing the physical storage
2014-06-16 17:19:02.723: [ default][1923920288] Created alert : (:OHAS00106:) : OLR initialization failed, error: PROCL-26: Error while accessing the physical storage
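2014-06-16 17:19:02.723: [ default][1923920288][PANIC] OHASD exiting; Could not init OLR
2014-06-16 17:19:02.723: [ default][1923920288] Done
Summary:
The tests above show that ohasd (Oracle High Availability Services) depends on the configuration stored in the OLR (Oracle Local Registry). If the OLR file is missing, or is present but unreadable (here, an empty file), ohasd aborts with PROCL-26 and crsctl start crs fails with CRS-4124.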
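Both failure modes tested above (file missing, file present but empty) can be caught before running crsctl start crs. The following is a minimal sketch of such a pre-start check. check_olr is a hypothetical helper, not an Oracle utility, and it only tests existence and size, so it cannot detect an OLR that is non-empty but corrupted; ocrcheck -local is the tool for that.

```shell
#!/bin/sh
# check_olr FILE: hypothetical pre-start check for the two cases tested above.
# Prints a one-word state; returns 0 only if the file exists and is non-empty.
check_olr() {
    olr_file="$1"
    if [ ! -e "$olr_file" ]; then
        echo "missing"      # Test 1: file renamed or deleted
        return 1
    elif [ ! -s "$olr_file" ]; then
        echo "empty"        # Test 2: zero-length file
        return 2
    fi
    echo "present"
    return 0
}
```

A typical use would be check_olr "$OLR_FILE" && crsctl start crs, where OLR_FILE is assumed to hold the path of the node's .olr file.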