如何定位rac节点重启原因(1)–正常停止crs,正常的日志消息记录
这篇是给以后该系列文章的一个基础,来说明rac中节点重启的诊断方式和日志是如何记录及输出的。
能够引起节点重启的进程相关如下:
1、首先查看ocss守护进程是否存在
该进程属于oracle主要监控rdbms实例是否运行正常,以及在该节点实例是否可见,通过初始的脚本将节点kill。由inittab中的/etc/init.d/init.cssd启动。
[oracle@rac1 ~]$ ps -ef | grep ocss | grep -v grep oracle 5463 4869 2 11:02 ? 00:00:14 /opt/ora10g/product/crs/bin/ocssd.bin [root@rac1 ~]# ps -ef | grep fatal | grep -v grep root 4062 1 1 14:35 ? 00:00:02 /bin/sh /etc/init.d/init.cssd fatal
2、进程oprocd(10.2.0.4以后才有)
该进程主要是监测节点的硬件状态,如果一个主机资源无法释放,或者主机资源耗尽未给oracle可用资源,则该进程会将节点kill
[root@rac1 ~]# ps -ef | grep oprocd | grep -v grep root 4984 4054 0 14:42 ? 00:00:00 /bin/sh /etc/init.d/init.cssd oprocd root 5534 4984 0 14:43 ? 00:00:00 /opt/ora10g/product/crs/bin/oprocd.bin run -t 1000 -m 500 -f
3、进程OCLSOMON(10.2.0.2以后才有)
该进程主要用来监控css进程是否hang住,一旦发现出现文件,就会将节点重启。除非是一个oracle自己可感知的hang信息。
[root@rac1 ~]# ps -ef | grep oclsomon | grep -v grep root 4994 4054 0 14:42 ? 00:00:00 /bin/sh /etc/init.d/init.cssd oclsomon root 5539 4994 0 14:43 ? 00:00:00 /sbin/runuser -l oracle -c /bin/sh -c 'cd /opt/ora10g/product/crs/log/rac1/cssd/oclsomon; ulimit -c unlimited; /opt/ora10g/product/crs/bin/oclsomon || exit $?' oracle 5540 5539 0 14:43 ? 00:00:00 /bin/sh -c cd /opt/ora10g/product/crs/log/rac1/cssd/oclsomon; ulimit -c unlimited; /opt/ora10g/product/crs/bin/oclsomon || exit $? oracle 5572 5540 0 14:43 ? 00:00:00 /opt/ora10g/product/crs/bin/oclsomon.bin
定位步骤
1、查看各节点主机messages信息
* Messages file locations:
Sun: /var/adm/messages
HP-UX: /var/adm/syslog/syslog.log
Tru64: /var/adm/messages
Linux: /var/log/messages
IBM: /bin/errpt -a > messages.out
2、查看各个节点的css进程日志
** CSS log locations:
11.1 and 10.2:
10.1:
3、查看各个节点的Oprocd 进程日志
*** Oprocd log locations:
In /etc/oracle/oprocd or /var/opt/oracle/oprocd depending on version/platform.
情景1:人为停掉其中一个节点的crs,看该节点的日志消息记录情况。
[root@rac2 ~]# /opt/ora10g/product/crs/bin/crsctl stop crs Stopping resources. This could take several minutes. Successfully stopped CRS resources. Stopping CSSD. Shutting down CSS daemon. Shutdown request successfully issued.
主机messages记录如下:
[root@rac2 ~]# tail -f /var/log/messages Jan 29 21:44:40 rac2 kernel: hrtimer: interrupt took 3370332 ns Jan 29 21:48:55 rac2 avahi-daemon[3700]: Withdrawing address record for 192.168.137.154 on eth0. Jan 29 21:49:37 rac2 logger: Oracle CSS family monitor shutting down gracefully. Jan 29 21:49:37 rac2 logger: Oracle clsomon shutdown successful. Jan 29 21:49:40 rac2 logger: Oracle CSSD graceful shutdown Jan 29 21:49:40 rac2 logger: Oprocd received graceful shutdown request. Shutting down.
数据库节点的alert记录如下:
[oracle@rac2 bdump]$ tail -f alert*.log Tue Jan 29 21:48:50 CST 2013 Shutting down instance (abort) --是crs将资源(instance)直接shutdown abort License high water mark = 7 Termination issued to instance processes. Waiting for the processes to exit Tue Jan 29 21:49:01 CST 2013 Instance termination failed to kill one or more processes Tue Jan 29 21:49:31 CST 2013 Instance terminated by USER, pid = 23459
节点ocss日志记录如下:
[root@rac2 cssd]# tail -f ocssd.log [ CSSD]2013-01-29 21:49:37.965 [2996820880] >TRACE: clssgmUnregisterClient: client 0xa0429d0 expiring [ CSSD]2013-01-29 21:49:37.965 [2996820880] >WARNING: clssgmClientShutdown: graceful shutdown completed. [ CSSD]2013-01-29 21:49:37.978 [3032300432] >TRACE: clssnmvDoWork: type 7 for disk 0 [ CSSD]2013-01-29 21:49:37.992 [2996820880] >TRACE: clssnmSendManualShut: Notifying all nodes that this node has been manually shut down [ CSSD]2013-01-29 21:49:37.993 [3005213584] >TRACE: clssnmHandleManualShut: Node 2, rac2, has been manually shut down [ CSSD]2013-01-29 21:49:38.017 [2963250064] >TRACE: clssgmDispatchCMXMSG(): msg type(13) src(1) dest(2) size(148) tag(00000000) incarnation(3) [ CSSD]2013-01-29 21:49:38.017 [2963250064] >TRACE: clssgmHandleDataInvalid: grock ocr_crs, member 1 node 1, birth 3 [ CSSD]2013-01-29 21:49:38.484 [2954857360] >WARNING: clssnmPollingThread: state(3) clusterState(1) exit [ CSSD]2013-01-29 21:49:38.962 [2946464656] >WARNING: clssnmSendingThread: state(3) clusterState(0) exit [ CSSD]2013-01-29 21:49:38.994 [2996820880] >WARNING: clssscExit: CSSD terminating normally from thread GMClientListener
正常启动crs
[root@rac2 ~]# /opt/ora10g/product/crs/bin/crsctl start crs Attempting to start CRS stack The CRS stack will be started shortly
主机messages记录如下:
[root@rac2 ~]# tail -f /var/log/messages Jan 29 21:49:37 rac2 logger: Oracle CSS family monitor shutting down gracefully. Jan 29 21:49:37 rac2 logger: Oracle clsomon shutdown successful. Jan 29 21:49:40 rac2 logger: Oracle CSSD graceful shutdown Jan 29 21:49:40 rac2 logger: Oprocd received graceful shutdown request. Shutting down. Jan 29 22:06:22 rac2 logger: Cluster Ready Services completed waiting on dependencies. --从此处是正常启动crs的消息记录开始 Jan 29 22:06:34 rac2 last message repeated 2 times Jan 29 22:06:34 rac2 logger: Oracle CSS Family monitor starting. Jan 29 22:06:35 rac2 logger: Running CRSD with TZ = Jan 29 22:06:36 rac2 logger: Oracle CSS restart. 0, 1 Jan 29 22:07:21 rac2 avahi-daemon[3700]: Registering new address record for 192.168.137.154 on eth0. --将资源注册到节点2的vip上
数据库节点的alert记录如下:
[oracle@rac2 bdump]$ tail -f alert*.log Tue Jan 29 21:48:50 CST 2013 Shutting down instance (abort) License high water mark = 7 Termination issued to instance processes. Waiting for the processes to exit Tue Jan 29 21:49:01 CST 2013 Instance termination failed to kill one or more processes Tue Jan 29 21:49:31 CST 2013 Instance terminated by USER, pid = 23459 Tue Jan 29 22:07:49 CST 2013 --从此处是开始正常启动crs后的消息记录 Starting ORACLE instance (normal) LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 Interface type 1 eth1 192.168.136.0 configured from OCR for use as a cluster interconnect Interface type 1 eth0 192.168.137.0 configured from OCR for use as a public interface Picked latch-free SCN scheme 2 LICENSE_MAX_USERS = 0 SYS auditing is disabled ksdpec: called for event 13740 prior to event group initialization Starting up ORACLE RDBMS Version: 10.2.0.5.0. System parameters with non-default values: processes = 150 __shared_pool_size = 155189248 __large_pool_size = 4194304 __java_pool_size = 4194304 __streams_pool_size = 0 spfile = +DATA/racdb/spfileracdb.ora sga_target = 603979776 control_files = +DATA/racdb/controlfile/current.276.796865453 db_block_size = 8192 __db_cache_size = 436207616 compatible = 10.2.0.5.0 log_archive_dest_1 = LOCATION=+DATA/ log_archive_format = %t_%s_%r.dbf db_file_multiblock_read_count= 16 cluster_database = TRUE cluster_database_instances= 2 db_create_file_dest = +DATA thread = 2 instance_number = 2 undo_management = AUTO undo_tablespace = UNDOTBS2 remote_login_passwordfile= EXCLUSIVE db_domain = dispatchers = (PROTOCOL=TCP) (SERVICE=racdbXDB) remote_listener = LISTENERS_RACDB job_queue_processes = 10 background_dump_dest = /opt/ora10g/admin/racdb/bdump user_dump_dest = /opt/ora10g/admin/racdb/udump core_dump_dest = /opt/ora10g/admin/racdb/cdump audit_file_dest = /opt/ora10g/admin/racdb/adump db_name = racdb open_cursors = 300 pga_aggregate_target = 200278016 Cluster communication is configured to use the following interface(s) for this instance 192.168.136.152 Tue Jan 29 22:07:52 CST 2013 cluster interconnect IPC version:Oracle UDP/IP (generic) IPC Vendor 1 proto 2 PMON started with pid=2, OS id=31443 DIAG started with pid=3, OS id=31452 PSP0 started with pid=4, OS id=31457 LMON started with pid=5, OS id=31463 LMD0 started with pid=6, OS id=31508 LMS0 started with pid=7, OS id=31515 MMAN started with pid=8, OS id=31519 DBW0 started with pid=9, OS id=31521 LGWR started with pid=10, OS id=31528 CKPT started with pid=11, OS id=31530 SMON started with pid=12, OS id=31532 RECO started with pid=13, OS id=31534 CJQ0 started with pid=14, OS id=31541 MMON started with pid=15, OS id=31543 Tue Jan 29 22:07:58 CST 2013 starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'... MMNL started with pid=16, OS id=31545 Tue Jan 29 22:07:58 CST 2013 starting up 1 shared server(s) ... Tue Jan 29 22:07:58 CST 2013 lmon registered with NM - instance id 2 (internal mem no 1) Tue Jan 29 22:08:00 CST 2013 Reconfiguration started (old inc 0, new inc 8) List of nodes: 0 1 Global Resource Directory frozen * allocate domain 0, invalid = TRUE Communication channels reestablished * domain 0 valid = 1 according to instance 0 Tue Jan 29 22:08:00 CST 2013 Master broadcasted resource hash value bitmaps Non-local Process blocks cleaned out Tue Jan 29 22:08:00 CST 2013 LMS 0: 0 GCS shadows cancelled, 0 closed Set master node info Submitted all remote-enqueue requests Dwn-cvts replayed, VALBLKs dubious All grantable enqueues granted Tue Jan 29 22:08:00 CST 2013 LMS 0: 0 GCS shadows traversed, 0 replayed Tue Jan 29 22:08:00 CST 2013 Submitted all GCS remote-cache requests Fix write in gcs resources Reconfiguration complete LCK0 started with pid=19, OS id=31567 Tue Jan 29 22:08:02 CST 2013 ALTER DATABASE MOUNT Tue Jan 29 22:08:02 CST 2013 Starting background process ASMB ASMB started with pid=21, OS id=31578 Starting background process RBAL RBAL started with pid=22, OS id=31587 Tue Jan 29 22:08:09 CST 2013 SUCCESS: diskgroup DATA was mounted Tue Jan 29 22:08:13 CST 2013 Setting recovery target incarnation to 2 Tue Jan 29 22:08:14 CST 2013 Successful mount of redo thread 2, with mount id 803617404 Tue Jan 29 22:08:14 CST 2013 Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE) Completed: ALTER DATABASE MOUNT Tue Jan 29 22:08:17 CST 2013 ALTER DATABASE OPEN Picked broadcast on commit scheme to generate SCNs Tue Jan 29 22:08:18 CST 2013 LGWR: STARTING ARCH PROCESSES ARC0 started with pid=24, OS id=31660 Tue Jan 29 22:08:19 CST 2013 ARC0: Archival started ARC1: Archival started LGWR: STARTING ARCH PROCESSES COMPLETE ARC1 started with pid=25, OS id=31662 Tue Jan 29 22:08:19 CST 2013 Thread 2 opened at log sequence 23 Current log# 3 seq# 23 mem# 0: +DATA/racdb/onlinelog/group_3.272.796865777 Successful open of redo thread 2 Tue Jan 29 22:08:19 CST 2013 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Tue Jan 29 22:08:19 CST 2013 SMON: enabling cache recovery Tue Jan 29 22:08:19 CST 2013 ARC0: Becoming the 'no FAL' ARCH ARC0: Becoming the 'no SRL' ARCH Tue Jan 29 22:08:19 CST 2013 ARC1: Becoming the heartbeat ARCH Tue Jan 29 22:08:22 CST 2013 Successfully onlined Undo Tablespace 5. Tue Jan 29 22:08:22 CST 2013 SMON: enabling tx recovery Tue Jan 29 22:08:22 CST 2013 Database Characterset is WE8ISO8859P1 Opening with internal Resource Manager plan replication_dependency_tracking turned off (no async multimaster replication found) Tue Jan 29 22:08:33 CST 2013 Starting background process QMNC QMNC started with pid=29, OS id=31772 Tue Jan 29 22:08:36 CST 2013 Completed: ALTER DATABASE OPEN
ocssd.log记录如下:
[ CSSD]2013-01-29 21:49:38.994 [2996820880] >WARNING: clssscExit: CSSD terminating normally from thread GMClientListener [ CSSD]2013-01-29 22:06:50.242 >USER: Copyright 2013, Oracle version 10.2.0.5.0 --从此处开始记录用户正常手动启动crs消息开始 [ CSSD]2013-01-29 22:06:50.242 >USER: Starting CSS daemon on node rac2, number 2, in cluster crs [ CSSD]2013-01-29 22:06:50.256 [3060074704] >TRACE: clssscmain: RT queue setting is at default value [ CSSD]2013-01-29 22:06:50.308 [3060074704] >TRACE: clssscmain: local-only set to false [ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=rac2DBG_CSSD)) [ CSSD]2013-01-29 22:06:50.416 [3060074704] >TRACE: clssnmReadNodeInfo: added node 1 (rac1) to cluster [ CSSD]2013-01-29 22:06:50.562 [3060074704] >TRACE: clssnmReadNodeInfo: added node 2 (rac2) to cluster [ CSSD]2013-01-29 22:06:50.573 [3060074704] >TRACE: clssnmInitNMInfo: Initialized with unique 1359468409 [ CSSD]2013-01-29 22:06:50.631 [3060074704] >TRACE: clssNMInitialize: Initializing with OCR id (1464966774) [ CSSD]2013-01-29 22:06:51.840 [3033107344] >TRACE: clssnm_skgxninit: Compatible vendor clusterware not in use [ CSSD]2013-01-29 22:06:51.906 [3060074704] >TRACE: clssnmNMInitialize: misscount set to (60) [ CSSD]2013-01-29 22:06:51.922 [3060074704] >TRACE: clssnmStartNM: reboottime set to (3) sec [ CSSD]2013-01-29 22:06:51.922 [3060074704] >TRACE: clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 30000 ms, reconfig start (misscount) 60000 ms [ CSSD]2013-01-29 22:06:51.996 [3060074704] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw2) [ CSSD]2013-01-29 22:06:51.997 [3033107344] >TRACE: clssnmvDPT: spawned for disk 0 (/dev/raw/raw2) [ CSSD]2013-01-29 22:06:52.024 [3033107344] >TRACE: clssnmvDiskOpen: Overwrote kill block for voting disk /dev/raw/raw2 [ CSSD]2013-01-29 22:06:54.053 [3033107344] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw2) [ CSSD]2013-01-29 22:06:54.488 [3060074704] >TRACE: clssnmFatalInit: fatal mode enabled [ CSSD]2013-01-29 22:06:54.800 [3024034704] >TRACE: clssnmvKillBlockThread: spawned for disk 0 (/dev/raw/raw2) initial sleep interval (1000)ms 中间省略部分输出... ... [ CSSD]2013-01-29 22:18:56.229 [2946464656] >TRACE: clssnmSendingThread: sending status msg to all nodes [ CSSD]2013-01-29 22:18:56.230 [2946464656] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes [ CSSD]2013-01-29 22:19:00.234 [2946464656] >TRACE: clssnmSendingThread: sending status msg to all nodes [ CSSD]2013-01-29 22:19:00.235 [2946464656] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes [ CSSD]2013-01-29 22:19:04.975 [2996820880] >TRACE: clssgmAllocProc: (0x93df420) allocated [ CSSD]2013-01-29 22:19:04.979 [2980035472] >TRACE: Connect request from user root [ CSSD]2013-01-29 22:19:04.981 [2996820880] >TRACE: clssgmClientConnectMsg: Connect from con(0x93e0778) proc(0x93df420) pid() proto(10:2:1:1) [ CSSD]2013-01-29 22:19:04.990 [2963250064] >TRACE: clssgmDispatchCMXMSG(): msg type(5) src(1) dest(65535) size(416) tag(00000000) incarnation(5) [ CSSD]2013-01-29 22:19:04.990 [2963250064] >TRACE: clssgmHandleJoinUpdate: (src 1/1) grock SRVM.DATABASE.NODEAPPS.rac1, member 0 [ CSSD]2013-01-29 22:19:04.990 [2963250064] >TRACE: clssgmAddMember: granted member(0) flags(0x1001) node(1) grock (0x93dd748/SRVM.DATABASE.NODEAPPS.rac1) [ CSSD]2013-01-29 22:19:04.990 [2963250064] >TRACE: clssgmCommonAddMember: Remote member(0) node(1) flags 0x1001 0x1000001 grock (3/0x93dd748/SRVM.DATABASE.NODEAPPS.rac1) [ CSSD]2013-01-29 22:19:05.005 [2963250064] >TRACE: clssgmDispatchCMXMSG(): msg type(6) src(1) dest(65535) size(344) tag(00000000) incarnation(5) [ CSSD]2013-01-29 22:19:05.005 [2963250064] >TRACE: clssgmHandleExitUpdate: (src 1) grock SRVM.DATABASE.NODEAPPS.rac1, member 0 [ CSSD]2013-01-29 22:19:05.005 [2963250064] >TRACE: clssgmRemoveMember: grock(SRVM.DATABASE.NODEAPPS.rac1) member(0/0x93dcc18) nodeNum(1) flags(0x1001) type(3) [ CSSD]2013-01-29 22:19:05.009 [2996820880] >TRACE: clssgmRegisterClient: proc(21/0x93df420), client(1/0x93dedb0) [ CSSD]2013-01-29 22:19:05.010 [2996820880] >TRACE: clssgmExecuteClientRequest: GRKJOIN recvd from client 1 (0x93dedb0) [ CSSD]2013-01-29 22:19:05.010 [2996820880] >TRACE: clssgmJoinGrock: grock SRVM.DATABASE.NODEAPPS.rac2 new client 0x93dedb0 with con 0x93dc998, requested num -1 [ CSSD]2013-01-29 22:19:05.010 [2996820880] >TRACE: clssgmAddGrockMember: adding member to grock SRVM.DATABASE.NODEAPPS.rac2 [ CSSD]2013-01-29 22:19:05.010 [2996820880] >TRACE: clssgmAllocateRPCIndex: allocated rpc 246 (0xb4c9d910) [ CSSD]2013-01-29 22:19:05.010 [2996820880] >TRACE: clssgmRPC: rpc 0xb4c9d910 (RPC#246) tag(f6002a) sent to node 1 [ CSSD]2013-01-29 22:19:05.012 [2963250064] >TRACE: clssgmDispatchCMXMSG(): msg type(5) src(1) dest(65535) size(416) tag(00000000) incarnation(5) [ CSSD]2013-01-29 22:19:05.012 [2963250064] >TRACE: clssgmHandleJoinUpdate: (src 2/1) grock SRVM.DATABASE.NODEAPPS.rac2, member 0 [ CSSD]2013-01-29 22:19:05.012 [2963250064] >TRACE: clssgmAddMember: granted member(0) flags(0x1) node(2) grock (0x93dcd50/SRVM.DATABASE.NODEAPPS.rac2) [ CSSD]2013-01-29 22:19:05.013 [2963250064] >TRACE: clssgmQueueGrockEvent: lockName(SRVM.DATABASE.NODEAPPS.rac2) type(2) count (1/1) xwaiters(0) event(1) to memberNo(0) [ CSSD]2013-01-29 22:19:05.013 [2963250064] >TRACE: clssgmCommonAddMember: Local member(0) node(2) flags 0x1 0x1000001 grock (3/0x93dcd50/SRVM.DATABASE.NODEAPPS.rac2) [ CSSD]2013-01-29 22:19:05.013 [2963250064] >TRACE: clssgmRPCDone: rpc 0xb4c9d910 (RPC#246) state 6, flags 0x100 [ CSSD]2013-01-29 22:19:05.013 [2963250064] >TRACE: clssgmAddGrockMemCmpl: rpc 0xb4c9d910, ret 0, client 0x93dedb0 [ CSSD]2013-01-29 22:19:05.013 [2963250064] >TRACE: clssgmAddGrockMemCmpl: sending type 6, size 532 to 0x93dedb0 [ CSSD]2013-01-29 22:19:05.013 [2963250064] >TRACE: clssgmFreeRPCIndex: freeing rpc 246 [ CSSD]2013-01-29 22:19:05.042 [2996820880] >TRACE: clssgmExecuteClientRequest: GRKEXIT recvd from client 1 (0x93dedb0) [ CSSD]2013-01-29 22:19:05.042 [2996820880] >TRACE: clssgmExitGrock: client 1 (0x93dedb0), grock SRVM.DATABASE.NODEAPPS.rac2, member 0 [ CSSD]2013-01-29 22:19:05.042 [2996820880] >TRACE: clssgmUnregisterClient(): removing proc 21 client 1, flags 0x04000000 [ CSSD]2013-01-29 22:19:05.042 [2996820880] >TRACE: clssgmAllocateRPCIndex: allocated rpc 247 (0xb4c9d95c) [ CSSD]2013-01-29 22:19:05.043 [2996820880] >TRACE: clssgmRPC: rpc 0xb4c9d95c (RPC#247) tag(f7002a) sent to node 1 [ CSSD]2013-01-29 22:19:05.043 [2996820880] >TRACE: clssgmUnregisterClient: client 0x93dedb0 expiring [ CSSD]2013-01-29 22:19:05.046 [2963250064] >TRACE: clssgmDispatchCMXMSG(): msg type(6) src(1) dest(65535) size(344) tag(00000000) incarnation(5) [ CSSD]2013-01-29 22:19:05.046 [2963250064] >TRACE: clssgmHandleExitUpdate: (src 1) grock SRVM.DATABASE.NODEAPPS.rac2, member 0 [ CSSD]2013-01-29 22:19:05.046 [2963250064] >TRACE: clssgmRPCDone: rpc 0xb4c9d95c (RPC#247) state 6, flags 0x100 [ CSSD]2013-01-29 22:19:05.046 [2963250064] >TRACE: clssgmDelMemCompl: rpc 0xb4c9d95c, ret 0, client 0x93dedb0 [ CSSD]2013-01-29 22:19:05.046 [2963250064] >TRACE: clssgmFreeRPCIndex: freeing rpc 247 [ CSSD]2013-01-29 22:19:05.046 [2963250064] >TRACE: clssgmRemoveMember: grock(SRVM.DATABASE.NODEAPPS.rac2) member(0/0x93e3dc0) nodeNum(2) flags(0x1) type(3) [ CSSD]2013-01-29 22:19:05.243 [2946464656] >TRACE: clssnmSendingThread: sending status msg to all nodes [ CSSD]2013-01-29 22:19:05.244 [2946464656] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes [ CSSD]2013-01-29 22:19:06.396 [2996820880] >TRACE: clssgmDeadProc: proc 0x93df420 [ CSSD]2013-01-29 22:19:06.396 [2996820880] >TRACE: clssgmDeleteClientListener: deleting cmProc (0x93df420), with 0 clients [ CSSD]2013-01-29 22:19:06.396 [2996820880] >TRACE: clssgmDeleteClientListener: cleanup for proc(0x93df420) con(0x93e0778) pid() [ CSSD]2013-01-29 22:19:09.249 [2946464656] >TRACE: clssnmSendingThread: sending status msg to all nodes [ CSSD]2013-01-29 22:19:09.250 [2946464656] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes [ CSSD]2013-01-29 22:19:14.255 [2946464656] >TRACE: clssnmSendingThread: sending status msg to all nodes [ CSSD]2013-01-29 22:19:14.256 [2946464656] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes [ CSSD]2013-01-29 22:19:18.261 [2946464656] >TRACE: clssnmSendingThread: sending status msg to all nodes [ CSSD]2013-01-29 22:19:18.262 [2946464656] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes 下边省略正常trace消息..
crsd.log crs停止和启动的整个过程记录如下:
2013-01-29 21:08:49.058: [ CRSRES][2617236368]0CRS-1002: Resource 'ora.racdb.db' is already running on member 'rac2' 2013-01-29 21:09:40.287: [ CRSRES][2617236368]0StopResource: setting CLI values --此处开始人为停止crs后的消息记录,包括停止所有资源的消息记录 2013-01-29 21:48:44.714: [ CRSD][2608843664]0SM: Recovery processing done.0 2013-01-29 21:48:44.715: [ CRSD][2608843664]0SM: Ongoing commands done.0 2013-01-29 21:48:44.759: [ CRSD][2608843664]0SM: asked E2E to exit. 2013-01-29 21:48:44.760: [ CRSCOMM][2736737168]0CLEANUP: quiesceing connection 0x9e7c670 2013-01-29 21:48:44.760: [ CRSMAIN][2736737168]0runCommandServer for E2E exiting. 2013-01-29 21:48:44.761: [ CRSD][2608843664]0SM: E2E port closed.3 2013-01-29 21:48:44.761: [ CRSD][2608843664]0SM: stoppedResources: all =1 m_state = 3 2013-01-29 21:48:44.814: [ CRSRES][2583665552]0StopResource: setting CLI values 2013-01-29 21:48:44.836: [ CRSRES][2617236368]0StopResource: setting CLI values 2013-01-29 21:48:44.872: [ CRSRES][2736737168]0StopResource: setting CLI values 2013-01-29 21:48:44.951: [ CRSRES][2583665552]0Attempting to stop `ora.rac2.gsd` on member `rac2` 2013-01-29 21:48:45.156: [ CRSRES][2617236368]0Attempting to stop `ora.rac2.ons` on member `rac2` 2013-01-29 21:48:45.384: [ CRSRES][2736737168]0Attempting to stop `ora.racdb.db` on member `rac2` 2013-01-29 21:48:46.961: [ CRSRES][2736737168]0Stop of `ora.racdb.db` on member `rac2` succeeded. 2013-01-29 21:48:47.430: [ CRSRES][2583665552]0Stop of `ora.rac2.gsd` on member `rac2` succeeded. 2013-01-29 21:48:47.834: [ CRSRES][2617236368]0Stop of `ora.rac2.ons` on member `rac2` succeeded. 2013-01-29 21:48:47.907: [ CRSRES][2617236368]0StopResource: setting CLI values 2013-01-29 21:48:47.917: [ CRSRES][2583665552]0StopResource: setting CLI values 2013-01-29 21:48:48.023: [ CRSRES][2617236368]0Attempting to stop `ora.racdb.racdb2.inst` on member `rac2` 2013-01-29 21:48:48.175: [ CRSRES][2583665552]0Attempting to stop `ora.rac2.LISTENER_RAC2.lsnr` on member `rac2` 2013-01-29 21:48:50.604: [ CRSMAIN][2728344464]0UI Cmd received after shutdown began. 2013-01-29 21:48:50.931: [ CRSMAIN][2728344464]0UI Cmd received after shutdown began. 2013-01-29 21:48:51.281: [ CRSRES][2583665552]0Stop of `ora.rac2.LISTENER_RAC2.lsnr` on member `rac2` succeeded. 2013-01-29 21:48:51.320: [ CRSRES][2583665552]0StopResource: setting CLI values 2013-01-29 21:48:51.604: [ CRSRES][2583665552]0StopResource: setting CLI values 2013-01-29 21:48:51.723: [ CRSRES][2583665552]0Attempting to stop `ora.rac2.vip` on member `rac2` 2013-01-29 21:48:52.400: [ CRSMAIN][2728344464]0UI Cmd received after shutdown began. 2013-01-29 21:48:53.398: [ CRSMAIN][2728344464]0UI Cmd received after shutdown began. 2013-01-29 21:48:55.982: [ CRSRES][2583665552]0Stop of `ora.rac2.vip` on member `rac2` succeeded. 2013-01-29 21:49:32.346: [ CRSRES][2617236368]0Stop of `ora.racdb.racdb2.inst` on member `rac2` succeeded. 2013-01-29 21:49:32.380: [ CRSRES][2617236368]0StopResource: setting CLI values 2013-01-29 21:49:32.656: [ CRSRES][2617236368]0StopResource: setting CLI values 2013-01-29 21:49:32.778: [ CRSRES][2617236368]0Attempting to stop `ora.rac2.ASM2.asm` on member `rac2` 2013-01-29 21:49:36.878: [ CRSRES][2617236368]0Stop of `ora.rac2.ASM2.asm` on member `rac2` succeeded. 2013-01-29 21:49:37.089: [ CSSCLNT][2956516240]clssgsGGetStatus: CSS shutting down. 2013-01-29 21:49:37.089: [ CSSCLNT][2956516240]clssgsGGetStatus: returning 22 2013-01-29 21:49:37.111: [ OCRMAS][2956516240]th_master:21:clssgsgrpstat failed css ret code = 22 [ OCRAPI][2956516240]procr_ctx_set_invalid_no_abort: ctx set to invalid 2013-01-29 21:49:37.167: [ CRSEVT][2719951760]0Error dispatching EVM event; reconnecting 2013-01-29 21:49:37.181: [ CSSCLNT][2812271504]clssgsGGetStatus: CSS shutting down. 2013-01-29 21:49:37.182: [ CSSCLNT][2812271504]clssgsGGetStatus: returning 22 2013-01-29 21:49:37.182: [ CRSD][2812271504]0CRSD exiting on CSS shutdown request 2013-01-29 21:49:37.182: [ CRSD][2812271504]0Done. --节点2的资源停完后,crs此时停完 2013-01-29 22:06:35.879: [ default][3058108096][ENTER]0 --此处开始记录crs正常人为启动的开始时的消息记录 Oracle Database 10g CRS Release 10.2.0.5.0 Production Copyright 1996, 2004, Oracle. All rights reserved 2013-01-29 22:06:35.882: [ default][3058108096]0CRS Daemon Starting 2013-01-29 22:06:35.894: [ CRSMAIN][3058108096]0Checking the OCR device 2013-01-29 22:06:36.069: [ CRSMAIN][3058108096]0Connecting to the CSS Daemon 2013-01-29 22:06:36.278: [ COMMCRS][3057531792]clsc_connect: (0xa402d80) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_)) 2013-01-29 22:06:36.279: [ CSSCLNT][3058108096]clsssInitNative: connect failed, rc 9 中间省略重复输出... 2013-01-29 22:06:52.662: [ CRSRTI][3058108096]0CSS is not ready. Received status 3 from CSS. Waiting for good status .. 2013-01-29 22:06:53.844: [ COMMCRS][3057531792]clsc_connect: (0xa402dd8) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_)) 2013-01-29 22:06:53.844: [ CSSCLNT][3058108096]clsssInitNative: connect failed, rc 9 2013-01-29 22:06:53.845: [ CRSRTI][3058108096]0CSS is not ready. Received status 3 from CSS. Waiting for good status .. 2013-01-29 22:06:58.462: [ CLSVER][3058108096]0Active Version from OCR:10.2.0.5.0 2013-01-29 22:06:58.463: [ CLSVER][3058108096]0Active Version and Software Version are same 2013-01-29 22:06:58.463: [ CRSMAIN][3058108096]0Initializing OCR 2013-01-29 22:06:58.654: [ OCRRAW][3058108096]proprioo: for disk 0 (/dev/raw/raw1), id match (1), my id set (1669906634,1028247821) total id sets (1), 1st set (1669906634,1028247821), 2nd set (0,0) my votes (2), total votes (2) 2013-01-29 22:06:59.104: [ CRSD][3058108096]0ENV Logging level for Module: allcomp 0 2013-01-29 22:06:59.174: [ CRSD][3058108096]0ENV Logging level for Module: default 0 2013-01-29 22:06:59.267: [ CRSD][3058108096]0ENV Logging level for Module: COMMCRS 0 2013-01-29 22:06:59.323: [ CRSD][3058108096]0ENV Logging level for Module: COMMNS 0 2013-01-29 22:06:59.347: [ CRSD][3058108096]0ENV Logging level for Module: CRSUI 0 2013-01-29 22:06:59.366: [ CRSD][3058108096]0ENV Logging level for Module: CRSCOMM 0 2013-01-29 22:06:59.388: [ CRSD][3058108096]0ENV Logging level for Module: CRSRTI 0 2013-01-29 22:06:59.414: [ CRSD][3058108096]0ENV Logging level for Module: CRSMAIN 0 2013-01-29 22:06:59.430: [ CRSD][3058108096]0ENV Logging level for Module: CRSPLACE 0 2013-01-29 22:06:59.444: [ CRSD][3058108096]0ENV Logging level for Module: CRSAPP 0 2013-01-29 22:06:59.461: [ CRSD][3058108096]0ENV Logging level for Module: CRSRES 0 2013-01-29 22:06:59.480: [ CRSD][3058108096]0ENV Logging level for Module: CRSOCR 0 2013-01-29 22:06:59.528: [ CRSD][3058108096]0ENV Logging level for Module: CRSTIMER 0 2013-01-29 22:06:59.563: [ CRSD][3058108096]0ENV Logging level for Module: CRSEVT 0 2013-01-29 22:06:59.577: [ CRSD][3058108096]0ENV Logging level for Module: CRSD 0 2013-01-29 22:06:59.591: [ CRSD][3058108096]0ENV Logging level for Module: CLUCLS 0 2013-01-29 22:06:59.625: [ CRSD][3058108096]0ENV Logging level for Module: CLSVER 0 2013-01-29 22:06:59.642: [ CRSD][3058108096]0ENV Logging level for Module: OCRRAW 0 2013-01-29 22:06:59.659: [ CRSD][3058108096]0ENV Logging level for Module: OCROSD 0 2013-01-29 22:06:59.686: [ CRSD][3058108096]0ENV Logging level for Module: CSSCLNT 0 2013-01-29 22:06:59.721: [ CRSD][3058108096]0ENV Logging level for Module: OCRAPI 0 2013-01-29 22:06:59.738: [ CRSD][3058108096]0ENV Logging level for Module: OCRUTL 0 2013-01-29 22:06:59.760: [ CRSD][3058108096]0ENV Logging level for Module: OCRMSG 0 2013-01-29 22:06:59.772: [ CRSD][3058108096]0ENV Logging level for Module: OCRCLI 0 2013-01-29 22:06:59.787: [ CRSD][3058108096]0ENV Logging level for Module: OCRCAC 0 2013-01-29 22:06:59.803: [ CRSD][3058108096]0ENV Logging level for Module: OCRSRV 0 2013-01-29 22:06:59.818: [ CRSD][3058108096]0ENV Logging level for Module: OCRMAS 0 2013-01-29 22:06:59.819: [ CRSMAIN][3058108096]0Filename is /opt/ora10g/product/crs/crs/init/rac2.pid [ clsdmt][2805750672]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=rac2DBG_CRSD)) 2013-01-29 22:07:00.035: [ CRSMAIN][3058108096]0Using Authorizer location: /opt/ora10g/product/crs/crs/auth/ 2013-01-29 22:07:00.168: [ CRSMAIN][3058108096]0Initializing RTI 2013-01-29 22:07:00.492: [CRSTIMER][2788965264]0Timer Thread Starting. 2013-01-29 22:07:00.504: [ CRSRES][3058108096]0Parameter SECURITY = 1, running in USER Mode 2013-01-29 22:07:00.504: [ CRSMAIN][3058108096]0Initializing EVMMgr 2013-01-29 22:07:00.695: [ COMMCRS][2780572560]clsc_connect: (0xa893e10) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)) 2013-01-29 22:07:01.124: [ COMMCRS][2780572560]clsc_connect: (0xa88ff28) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)) 2013-01-29 22:06:59.760: [ CRSD][3058108096]0ENV Logging level for Module: OCRMSG 0 2013-01-29 22:06:59.772: [ CRSD][3058108096]0ENV Logging level for Module: OCRCLI 0 2013-01-29 22:06:59.787: [ CRSD][3058108096]0ENV Logging level for Module: OCRCAC 0 2013-01-29 22:06:59.803: [ CRSD][3058108096]0ENV Logging level for Module: OCRSRV 0 2013-01-29 22:06:59.818: [ CRSD][3058108096]0ENV Logging level for Module: OCRMAS 0 2013-01-29 22:06:59.819: [ CRSMAIN][3058108096]0Filename is /opt/ora10g/product/crs/crs/init/rac2.pid [ clsdmt][2805750672]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=rac2DBG_CRSD)) 2013-01-29 22:07:00.035: [ CRSMAIN][3058108096]0Using Authorizer location: /opt/ora10g/product/crs/crs/auth/ 2013-01-29 22:07:00.168: [ CRSMAIN][3058108096]0Initializing RTI 2013-01-29 22:07:00.492: [CRSTIMER][2788965264]0Timer Thread Starting. 2013-01-29 22:07:00.504: [ CRSRES][3058108096]0Parameter SECURITY = 1, running in USER Mode 2013-01-29 22:07:00.504: [ CRSMAIN][3058108096]0Initializing EVMMgr 2013-01-29 22:07:00.695: [ COMMCRS][2780572560]clsc_connect: (0xa893e10) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)) 2013-01-29 22:07:01.124: [ COMMCRS][2780572560]clsc_connect: (0xa88ff28) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)) 2013-01-29 22:07:05.230: [ CRSMAIN][3058108096]0CRSD locked during state recovery, please wait. 2013-01-29 22:07:07.642: [ CRSMAIN][3058108096]0CRSD recovered, unlocked. 2013-01-29 22:07:07.663: [ CRSMAIN][3058108096]0QS socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs)) 2013-01-29 22:07:07.663: [ CRSMAIN][3058108096]0QS socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs)) 2013-01-29 22:07:07.787: [ CRSMAIN][3058108096]0CRSD UI socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=CRSD_UI_SOCKET)) 2013-01-29 22:07:07.811: [ CRSMAIN][3058108096]0E2E socket on: (ADDRESS=(PROTOCOL=tcp)(HOST=rac2-priv)(PORT=49896)) 2013-01-29 22:07:07.811: [ CRSMAIN][3058108096]0Starting Threads 2013-01-29 22:07:07.812: [ CRSMAIN][3058108096]0CRS Daemon Started. --crs启动完毕,开始启动资源 2013-01-29 22:07:07.815: [ CRSMAIN][2730216336]0Starting runCommandServer for (UI = 1, E2E = 0). 1 2013-01-29 22:07:07.821: [ CRSMAIN][2738609040]0Starting runCommandServer for (UI = 1, E2E = 0). 0 2013-01-29 22:07:08.943: [ CRSRES][3058108096]0 startup = 1 2013-01-29 22:07:09.348: [ CRSRES][3058108096]0 startup = 1 2013-01-29 22:07:09.760: [ CRSRES][3058108096]0 startup = 1 2013-01-29 22:07:10.489: [ CRSRES][3058108096]0 startup = 1 2013-01-29 22:07:11.251: [ CRSRES][3058108096]0 startup = 1 2013-01-29 22:07:11.729: [ CRSRES][2696645520]0StopResource: setting CLI values 2013-01-29 22:07:12.891: [ CRSRES][2705038224]0startRunnable: setting CLI values 2013-01-29 22:07:12.925: [ CRSRES][2696645520]0Attempting to stop `ora.rac2.vip` on member `rac1` 2013-01-29 22:07:13.111: [ CRSRES][2705038224]0Attempting to start `ora.rac2.ASM2.asm` on member `rac2` 2013-01-29 22:07:15.414: [ CRSRES][2696645520]0Stop of `ora.rac2.vip` on member `rac1` succeeded. 2013-01-29 22:07:15.604: [ CRSRES][2696645520]0startRunnable: setting CLI values 2013-01-29 22:07:15.604: [ CRSRES][2696645520]0Attempting to start `ora.rac2.vip` on member `rac2` 2013-01-29 22:07:22.250: [ CRSRES][2696645520]0Start of `ora.rac2.vip` on member `rac2` succeeded. 2013-01-29 22:07:22.977: [ CRSRES][2696645520]0startRunnable: setting CLI values 2013-01-29 22:07:23.504: [ CRSRES][2696645520]0Attempting to start `ora.rac2.LISTENER_RAC2.lsnr` on member `rac2` 2013-01-29 22:07:29.340: [ CRSRES][2696645520]0Start of `ora.rac2.LISTENER_RAC2.lsnr` on member `rac2` succeeded. 2013-01-29 22:07:29.982: [ CRSRES][2661272464]0CRS-1002: Resource 'ora.rac2.LISTENER_RAC2.lsnr' is already running on member 'rac2' 2013-01-29 22:07:37.688: [ CRSRES][2661272464]0startRunnable: setting CLI values 2013-01-29 22:07:38.045: [ CRSRES][2661272464]0Attempting to start `ora.rac2.ons` on member `rac2` 2013-01-29 22:07:41.373: [ CRSRES][2661272464]0Start of `ora.rac2.ons` on member `rac2` succeeded. 2013-01-29 22:07:41.527: [ CRSRES][2705038224]0Start of `ora.rac2.ASM2.asm` on member `rac2` succeeded. 2013-01-29 22:07:42.692: [ CRSRES][2705038224]0startRunnable: setting CLI values 2013-01-29 22:07:42.785: [ CRSRES][2705038224]0Attempting to start `ora.racdb.racdb2.inst` on member `rac2` 2013-01-29 22:08:46.144: [ CRSRES][2705038224]0Start of `ora.racdb.racdb2.inst` on member `rac2` succeeded. 2013-01-29 22:08:46.188: [ CRSRES][2705038224]0Skip online resource: ora.rac2.ons 2013-01-29 22:08:46.907: [ CRSRES][2696645520]0startRunnable: setting CLI values 2013-01-29 22:08:46.998: [ CRSRES][2696645520]0Attempting to start `ora.rac2.gsd` on member `rac2` 2013-01-29 22:08:48.847: [ CRSRES][2696645520]0Start of `ora.rac2.gsd` on member `rac2` succeeded. 2013-01-29 22:08:51.321: [ OCRUTL][2864499600]u_freem: mem passed is null
以上就是一个正常的人为停止一个节点的crs的消息日志记录,便于大家进一步了解停止和启动crs的整个过程以及以后做对比使用。