A-A+

如何定位rac节点重启原因(1)–正常停止crs,正常的日志消息记录

2013年01月29日 DG&RAC&OGG 暂无评论 阅读 3,303 次

这篇是给以后该系列文章的一个基础,来说明rac中节点重启的诊断方式和日志是如何记录及输出的。

能够引起节点重启的进程相关如下:
1、首先查看ocss守护进程是否存在
该进程属于oracle主要监控rdbms实例是否运行正常,以及在该节点实例是否可见,通过初始的脚本将节点kill。由inittab中的/etc/init.d/init.cssd启动。

[oracle@rac1 ~]$ ps -ef | grep ocss | grep -v grep
oracle    5463  4869  2 11:02 ?        00:00:14 /opt/ora10g/product/crs/bin/ocssd.bin
[root@rac1 ~]# ps -ef | grep fatal | grep -v grep
root      4062     1  1 14:35 ?        00:00:02 /bin/sh /etc/init.d/init.cssd fatal

2、进程oprocd(10.2.0.4以后才有)
该进程主要是监测节点的硬件状态,如果一个主机资源无法释放,或者主机资源耗尽未给oracle可用资源,则该进程会将节点kill

[root@rac1 ~]# ps -ef | grep oprocd | grep -v grep
root      4984  4054  0 14:42 ?        00:00:00 /bin/sh /etc/init.d/init.cssd oprocd
root      5534  4984  0 14:43 ?        00:00:00 /opt/ora10g/product/crs/bin/oprocd.bin run -t 1000 -m 500 -f

3、进程OCLSOMON(10.2.0.2以后才有)
该进程主要用来监控css进程是否hang住,一旦发现出现文件,就会将节点重启。除非是一个oracle自己可感知的hang信息。

[root@rac1 ~]# ps -ef | grep oclsomon | grep -v grep
root      4994  4054  0 14:42 ?        00:00:00 /bin/sh /etc/init.d/init.cssd oclsomon
root      5539  4994  0 14:43 ?        00:00:00 /sbin/runuser -l oracle -c /bin/sh -c 'cd /opt/ora10g/product/crs/log/rac1/cssd/oclsomon; ulimit -c unlimited; /opt/ora10g/product/crs/bin/oclsomon  || exit $?'
oracle    5540  5539  0 14:43 ?        00:00:00 /bin/sh -c cd /opt/ora10g/product/crs/log/rac1/cssd/oclsomon; ulimit -c unlimited; /opt/ora10g/product/crs/bin/oclsomon  || exit $?
oracle    5572  5540  0 14:43 ?        00:00:00 /opt/ora10g/product/crs/bin/oclsomon.bin

定位步骤
1、查看各节点主机messages信息
* Messages file locations:
Sun: /var/adm/messages
HP-UX: /var/adm/syslog/syslog.log
Tru64: /var/adm/messages
Linux: /var/log/messages
IBM: /bin/errpt -a > messages.out

2、查看各个节点的css进程日志
** CSS log locations:
11.1 and 10.2: /log//cssd
10.1: /css/log

3、查看各个节点的Oprocd 进程日志
*** Oprocd log locations:
In /etc/oracle/oprocd or /var/opt/oracle/oprocd depending on version/platform.

情景1:人为停掉其中一个节点的crs,看该节点的日志消息记录情况。

[root@rac2 ~]# /opt/ora10g/product/crs/bin/crsctl stop crs
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

主机messages记录如下:

[root@rac2 ~]# tail -f /var/log/messages
Jan 29 21:44:40 rac2 kernel: hrtimer: interrupt took 3370332 ns
Jan 29 21:48:55 rac2 avahi-daemon[3700]: Withdrawing address record for 192.168.137.154 on eth0.
Jan 29 21:49:37 rac2 logger: Oracle CSS family monitor shutting down gracefully.
Jan 29 21:49:37 rac2 logger: Oracle clsomon shutdown successful.
Jan 29 21:49:40 rac2 logger: Oracle CSSD graceful shutdown
Jan 29 21:49:40 rac2 logger: Oprocd received graceful shutdown request. Shutting down.

数据库节点的alert记录如下:

[oracle@rac2 bdump]$ tail -f alert*.log
Tue Jan 29 21:48:50 CST 2013
Shutting down instance (abort)		--是crs将资源(instance)直接shutdown abort
License high water mark = 7
Termination issued to instance processes. Waiting for the processes to exit
Tue Jan 29 21:49:01 CST 2013
Instance termination failed to kill one or more processes
Tue Jan 29 21:49:31 CST 2013
Instance terminated by USER, pid = 23459

节点ocss日志记录如下:

[root@rac2 cssd]# tail -f ocssd.log 
[    CSSD]2013-01-29 21:49:37.965 [2996820880] >TRACE:   clssgmUnregisterClient: client 0xa0429d0 expiring
[    CSSD]2013-01-29 21:49:37.965 [2996820880] >WARNING: clssgmClientShutdown: graceful shutdown completed.
[    CSSD]2013-01-29 21:49:37.978 [3032300432] >TRACE:   clssnmvDoWork: type 7 for disk 0
[    CSSD]2013-01-29 21:49:37.992 [2996820880] >TRACE:   clssnmSendManualShut: Notifying all nodes that this node has been manually shut down
[    CSSD]2013-01-29 21:49:37.993 [3005213584] >TRACE:   clssnmHandleManualShut: Node 2, rac2, has been manually shut down
[    CSSD]2013-01-29 21:49:38.017 [2963250064] >TRACE:   clssgmDispatchCMXMSG(): msg type(13) src(1) dest(2) size(148) tag(00000000) incarnation(3)
[    CSSD]2013-01-29 21:49:38.017 [2963250064] >TRACE:   clssgmHandleDataInvalid: grock ocr_crs, member 1 node 1, birth 3
[    CSSD]2013-01-29 21:49:38.484 [2954857360] >WARNING: clssnmPollingThread: state(3) clusterState(1) exit
[    CSSD]2013-01-29 21:49:38.962 [2946464656] >WARNING: clssnmSendingThread: state(3) clusterState(0) exit
[    CSSD]2013-01-29 21:49:38.994 [2996820880] >WARNING: clssscExit: CSSD terminating normally from thread GMClientListener

正常启动crs

[root@rac2 ~]# /opt/ora10g/product/crs/bin/crsctl start crs
Attempting to start CRS stack 
The CRS stack will be started shortly

主机messages记录如下:

[root@rac2 ~]# tail -f /var/log/messages
Jan 29 21:49:37 rac2 logger: Oracle CSS family monitor shutting down gracefully.
Jan 29 21:49:37 rac2 logger: Oracle clsomon shutdown successful.
Jan 29 21:49:40 rac2 logger: Oracle CSSD graceful shutdown
Jan 29 21:49:40 rac2 logger: Oprocd received graceful shutdown request. Shutting down.		
Jan 29 22:06:22 rac2 logger: Cluster Ready Services completed waiting on dependencies.		--从此处是正常启动crs的消息记录开始
Jan 29 22:06:34 rac2 last message repeated 2 times
Jan 29 22:06:34 rac2 logger: Oracle CSS Family monitor starting.
Jan 29 22:06:35 rac2 logger: Running CRSD with TZ = 
Jan 29 22:06:36 rac2 logger: Oracle CSS restart. 0, 1
Jan 29 22:07:21 rac2 avahi-daemon[3700]: Registering new address record for 192.168.137.154 on eth0.		--将资源注册到节点2的vip上

数据库节点的alert记录如下:

[oracle@rac2 bdump]$ tail -f alert*.log
Tue Jan 29 21:48:50 CST 2013
Shutting down instance (abort)
License high water mark = 7
Termination issued to instance processes. Waiting for the processes to exit
Tue Jan 29 21:49:01 CST 2013
Instance termination failed to kill one or more processes
Tue Jan 29 21:49:31 CST 2013
Instance terminated by USER, pid = 23459
Tue Jan 29 22:07:49 CST 2013		--从此处是开始正常启动crs后的消息记录
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 eth1 192.168.136.0 configured from OCR for use as a cluster interconnect
Interface type 1 eth0 192.168.137.0 configured from OCR for use as  a public interface
Picked latch-free SCN scheme 2
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.5.0.
System parameters with non-default values:
  processes                = 150
  __shared_pool_size       = 155189248
  __large_pool_size        = 4194304
  __java_pool_size         = 4194304
  __streams_pool_size      = 0
  spfile                   = +DATA/racdb/spfileracdb.ora
  sga_target               = 603979776
  control_files            = +DATA/racdb/controlfile/current.276.796865453
  db_block_size            = 8192
  __db_cache_size          = 436207616
  compatible               = 10.2.0.5.0
  log_archive_dest_1       = LOCATION=+DATA/
  log_archive_format       = %t_%s_%r.dbf
  db_file_multiblock_read_count= 16
  cluster_database         = TRUE
  cluster_database_instances= 2
  db_create_file_dest      = +DATA
  thread                   = 2
  instance_number          = 2
  undo_management          = AUTO
  undo_tablespace          = UNDOTBS2
  remote_login_passwordfile= EXCLUSIVE
  db_domain                = 
  dispatchers              = (PROTOCOL=TCP) (SERVICE=racdbXDB)
  remote_listener          = LISTENERS_RACDB
  job_queue_processes      = 10
  background_dump_dest     = /opt/ora10g/admin/racdb/bdump
  user_dump_dest           = /opt/ora10g/admin/racdb/udump
  core_dump_dest           = /opt/ora10g/admin/racdb/cdump
  audit_file_dest          = /opt/ora10g/admin/racdb/adump
  db_name                  = racdb
  open_cursors             = 300
  pga_aggregate_target     = 200278016
Cluster communication is configured to use the following interface(s) for this instance
  192.168.136.152
Tue Jan 29 22:07:52 CST 2013
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
PMON started with pid=2, OS id=31443
DIAG started with pid=3, OS id=31452
PSP0 started with pid=4, OS id=31457
LMON started with pid=5, OS id=31463
LMD0 started with pid=6, OS id=31508
LMS0 started with pid=7, OS id=31515
MMAN started with pid=8, OS id=31519
DBW0 started with pid=9, OS id=31521
LGWR started with pid=10, OS id=31528
CKPT started with pid=11, OS id=31530
SMON started with pid=12, OS id=31532
RECO started with pid=13, OS id=31534
CJQ0 started with pid=14, OS id=31541
MMON started with pid=15, OS id=31543
Tue Jan 29 22:07:58 CST 2013
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=16, OS id=31545
Tue Jan 29 22:07:58 CST 2013
starting up 1 shared server(s) ...
Tue Jan 29 22:07:58 CST 2013
lmon registered with NM - instance id 2 (internal mem no 1)
Tue Jan 29 22:08:00 CST 2013
Reconfiguration started (old inc 0, new inc 8)
List of nodes:
 0 1
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE 
 Communication channels reestablished
 * domain 0 valid = 1 according to instance 0 
Tue Jan 29 22:08:00 CST 2013
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Tue Jan 29 22:08:00 CST 2013
 LMS 0: 0 GCS shadows cancelled, 0 closed
 Set master node info 
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
Tue Jan 29 22:08:00 CST 2013
 LMS 0: 0 GCS shadows traversed, 0 replayed
Tue Jan 29 22:08:00 CST 2013
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete
LCK0 started with pid=19, OS id=31567
Tue Jan 29 22:08:02 CST 2013
ALTER DATABASE   MOUNT
Tue Jan 29 22:08:02 CST 2013
Starting background process ASMB
ASMB started with pid=21, OS id=31578
Starting background process RBAL
RBAL started with pid=22, OS id=31587
Tue Jan 29 22:08:09 CST 2013
SUCCESS: diskgroup DATA was mounted
Tue Jan 29 22:08:13 CST 2013
Setting recovery target incarnation to 2
Tue Jan 29 22:08:14 CST 2013
Successful mount of redo thread 2, with mount id 803617404
Tue Jan 29 22:08:14 CST 2013
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Completed: ALTER DATABASE   MOUNT
Tue Jan 29 22:08:17 CST 2013
ALTER DATABASE OPEN
Picked broadcast on commit scheme to generate SCNs
Tue Jan 29 22:08:18 CST 2013
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=24, OS id=31660
Tue Jan 29 22:08:19 CST 2013
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=25, OS id=31662
Tue Jan 29 22:08:19 CST 2013
Thread 2 opened at log sequence 23
  Current log# 3 seq# 23 mem# 0: +DATA/racdb/onlinelog/group_3.272.796865777
Successful open of redo thread 2
Tue Jan 29 22:08:19 CST 2013
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Jan 29 22:08:19 CST 2013
SMON: enabling cache recovery
Tue Jan 29 22:08:19 CST 2013
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Tue Jan 29 22:08:19 CST 2013
ARC1: Becoming the heartbeat ARCH
Tue Jan 29 22:08:22 CST 2013
Successfully onlined Undo Tablespace 5.
Tue Jan 29 22:08:22 CST 2013
SMON: enabling tx recovery
Tue Jan 29 22:08:22 CST 2013
Database Characterset is WE8ISO8859P1
Opening with internal Resource Manager plan 
replication_dependency_tracking turned off (no async multimaster replication found)
Tue Jan 29 22:08:33 CST 2013
Starting background process QMNC
QMNC started with pid=29, OS id=31772
Tue Jan 29 22:08:36 CST 2013
Completed: ALTER DATABASE OPEN

ocssd.log记录如下:

[    CSSD]2013-01-29 21:49:38.994 [2996820880] >WARNING: clssscExit: CSSD terminating normally from thread GMClientListener
[    CSSD]2013-01-29 22:06:50.242 >USER:    Copyright 2013, Oracle version 10.2.0.5.0						--从此处开始记录用户正常手动启动crs消息开始
[    CSSD]2013-01-29 22:06:50.242 >USER:    Starting CSS daemon on node rac2, number 2, in cluster crs
[    CSSD]2013-01-29 22:06:50.256 [3060074704] >TRACE:   clssscmain: RT queue setting is at default value
[    CSSD]2013-01-29 22:06:50.308 [3060074704] >TRACE:   clssscmain: local-only set to false
[  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=rac2DBG_CSSD))
[    CSSD]2013-01-29 22:06:50.416 [3060074704] >TRACE:   clssnmReadNodeInfo: added node 1 (rac1) to cluster
[    CSSD]2013-01-29 22:06:50.562 [3060074704] >TRACE:   clssnmReadNodeInfo: added node 2 (rac2) to cluster
[    CSSD]2013-01-29 22:06:50.573 [3060074704] >TRACE:   clssnmInitNMInfo: Initialized with unique 1359468409
[    CSSD]2013-01-29 22:06:50.631 [3060074704] >TRACE:   clssNMInitialize: Initializing with OCR id (1464966774)
[    CSSD]2013-01-29 22:06:51.840 [3033107344] >TRACE:   clssnm_skgxninit: Compatible vendor clusterware not in use
[    CSSD]2013-01-29 22:06:51.906 [3060074704] >TRACE:   clssnmNMInitialize: misscount set to (60)
[    CSSD]2013-01-29 22:06:51.922 [3060074704] >TRACE:   clssnmStartNM: reboottime set to (3) sec
[    CSSD]2013-01-29 22:06:51.922 [3060074704] >TRACE:   clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 30000 ms, reconfig start (misscount) 60000 ms
[    CSSD]2013-01-29 22:06:51.996 [3060074704] >TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw2)
[    CSSD]2013-01-29 22:06:51.997 [3033107344] >TRACE:   clssnmvDPT: spawned for disk 0 (/dev/raw/raw2)
[    CSSD]2013-01-29 22:06:52.024 [3033107344] >TRACE:   clssnmvDiskOpen: Overwrote kill block for voting disk /dev/raw/raw2
[    CSSD]2013-01-29 22:06:54.053 [3033107344] >TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw2)
[    CSSD]2013-01-29 22:06:54.488 [3060074704] >TRACE:   clssnmFatalInit: fatal mode enabled
[    CSSD]2013-01-29 22:06:54.800 [3024034704] >TRACE:   clssnmvKillBlockThread: spawned for disk 0 (/dev/raw/raw2) initial sleep interval (1000)ms
中间省略部分输出... ...
[    CSSD]2013-01-29 22:18:56.229 [2946464656] >TRACE:   clssnmSendingThread: sending status msg to all nodes
[    CSSD]2013-01-29 22:18:56.230 [2946464656] >TRACE:   clssnmSendingThread: sent 5 status msgs to all nodes
[    CSSD]2013-01-29 22:19:00.234 [2946464656] >TRACE:   clssnmSendingThread: sending status msg to all nodes
[    CSSD]2013-01-29 22:19:00.235 [2946464656] >TRACE:   clssnmSendingThread: sent 4 status msgs to all nodes
[    CSSD]2013-01-29 22:19:04.975 [2996820880] >TRACE:   clssgmAllocProc: (0x93df420) allocated
[    CSSD]2013-01-29 22:19:04.979 [2980035472] >TRACE:   Connect request from user root
[    CSSD]2013-01-29 22:19:04.981 [2996820880] >TRACE:   clssgmClientConnectMsg: Connect from con(0x93e0778) proc(0x93df420) pid() proto(10:2:1:1)
[    CSSD]2013-01-29 22:19:04.990 [2963250064] >TRACE:   clssgmDispatchCMXMSG(): msg type(5) src(1) dest(65535) size(416) tag(00000000) incarnation(5)
[    CSSD]2013-01-29 22:19:04.990 [2963250064] >TRACE:   clssgmHandleJoinUpdate: (src 1/1) grock SRVM.DATABASE.NODEAPPS.rac1, member 0
[    CSSD]2013-01-29 22:19:04.990 [2963250064] >TRACE:   clssgmAddMember: granted member(0) flags(0x1001) node(1) grock (0x93dd748/SRVM.DATABASE.NODEAPPS.rac1)
[    CSSD]2013-01-29 22:19:04.990 [2963250064] >TRACE:   clssgmCommonAddMember: Remote member(0) node(1) flags 0x1001 0x1000001 grock (3/0x93dd748/SRVM.DATABASE.NODEAPPS.rac1)
[    CSSD]2013-01-29 22:19:05.005 [2963250064] >TRACE:   clssgmDispatchCMXMSG(): msg type(6) src(1) dest(65535) size(344) tag(00000000) incarnation(5)
[    CSSD]2013-01-29 22:19:05.005 [2963250064] >TRACE:   clssgmHandleExitUpdate: (src 1) grock SRVM.DATABASE.NODEAPPS.rac1, member 0
[    CSSD]2013-01-29 22:19:05.005 [2963250064] >TRACE:   clssgmRemoveMember: grock(SRVM.DATABASE.NODEAPPS.rac1) member(0/0x93dcc18) nodeNum(1) flags(0x1001) type(3)
[    CSSD]2013-01-29 22:19:05.009 [2996820880] >TRACE:   clssgmRegisterClient: proc(21/0x93df420), client(1/0x93dedb0)
[    CSSD]2013-01-29 22:19:05.010 [2996820880] >TRACE:   clssgmExecuteClientRequest: GRKJOIN recvd from client 1 (0x93dedb0)
[    CSSD]2013-01-29 22:19:05.010 [2996820880] >TRACE:   clssgmJoinGrock: grock SRVM.DATABASE.NODEAPPS.rac2 new client 0x93dedb0 with con 0x93dc998, requested num -1
[    CSSD]2013-01-29 22:19:05.010 [2996820880] >TRACE:   clssgmAddGrockMember: adding member to grock SRVM.DATABASE.NODEAPPS.rac2
[    CSSD]2013-01-29 22:19:05.010 [2996820880] >TRACE:   clssgmAllocateRPCIndex: allocated rpc 246 (0xb4c9d910)
[    CSSD]2013-01-29 22:19:05.010 [2996820880] >TRACE:   clssgmRPC: rpc 0xb4c9d910 (RPC#246) tag(f6002a) sent to node 1
[    CSSD]2013-01-29 22:19:05.012 [2963250064] >TRACE:   clssgmDispatchCMXMSG(): msg type(5) src(1) dest(65535) size(416) tag(00000000) incarnation(5)
[    CSSD]2013-01-29 22:19:05.012 [2963250064] >TRACE:   clssgmHandleJoinUpdate: (src 2/1) grock SRVM.DATABASE.NODEAPPS.rac2, member 0
[    CSSD]2013-01-29 22:19:05.012 [2963250064] >TRACE:   clssgmAddMember: granted member(0) flags(0x1) node(2) grock (0x93dcd50/SRVM.DATABASE.NODEAPPS.rac2)
[    CSSD]2013-01-29 22:19:05.013 [2963250064] >TRACE:   clssgmQueueGrockEvent: lockName(SRVM.DATABASE.NODEAPPS.rac2) type(2) count (1/1) xwaiters(0) event(1) to memberNo(0)
[    CSSD]2013-01-29 22:19:05.013 [2963250064] >TRACE:   clssgmCommonAddMember: Local member(0) node(2) flags 0x1 0x1000001 grock (3/0x93dcd50/SRVM.DATABASE.NODEAPPS.rac2)
[    CSSD]2013-01-29 22:19:05.013 [2963250064] >TRACE:   clssgmRPCDone: rpc 0xb4c9d910 (RPC#246) state 6, flags 0x100
[    CSSD]2013-01-29 22:19:05.013 [2963250064] >TRACE:   clssgmAddGrockMemCmpl: rpc 0xb4c9d910, ret 0, client 0x93dedb0
[    CSSD]2013-01-29 22:19:05.013 [2963250064] >TRACE:   clssgmAddGrockMemCmpl: sending type 6, size 532 to 0x93dedb0
[    CSSD]2013-01-29 22:19:05.013 [2963250064] >TRACE:   clssgmFreeRPCIndex: freeing rpc 246
[    CSSD]2013-01-29 22:19:05.042 [2996820880] >TRACE:   clssgmExecuteClientRequest: GRKEXIT recvd from client 1 (0x93dedb0)
[    CSSD]2013-01-29 22:19:05.042 [2996820880] >TRACE:   clssgmExitGrock: client 1 (0x93dedb0), grock SRVM.DATABASE.NODEAPPS.rac2, member 0
[    CSSD]2013-01-29 22:19:05.042 [2996820880] >TRACE:   clssgmUnregisterClient(): removing proc 21 client 1, flags 0x04000000
[    CSSD]2013-01-29 22:19:05.042 [2996820880] >TRACE:   clssgmAllocateRPCIndex: allocated rpc 247 (0xb4c9d95c)
[    CSSD]2013-01-29 22:19:05.043 [2996820880] >TRACE:   clssgmRPC: rpc 0xb4c9d95c (RPC#247) tag(f7002a) sent to node 1
[    CSSD]2013-01-29 22:19:05.043 [2996820880] >TRACE:   clssgmUnregisterClient: client 0x93dedb0 expiring
[    CSSD]2013-01-29 22:19:05.046 [2963250064] >TRACE:   clssgmDispatchCMXMSG(): msg type(6) src(1) dest(65535) size(344) tag(00000000) incarnation(5)
[    CSSD]2013-01-29 22:19:05.046 [2963250064] >TRACE:   clssgmHandleExitUpdate: (src 1) grock SRVM.DATABASE.NODEAPPS.rac2, member 0
[    CSSD]2013-01-29 22:19:05.046 [2963250064] >TRACE:   clssgmRPCDone: rpc 0xb4c9d95c (RPC#247) state 6, flags 0x100
[    CSSD]2013-01-29 22:19:05.046 [2963250064] >TRACE:   clssgmDelMemCompl: rpc 0xb4c9d95c, ret 0, client 0x93dedb0
[    CSSD]2013-01-29 22:19:05.046 [2963250064] >TRACE:   clssgmFreeRPCIndex: freeing rpc 247
[    CSSD]2013-01-29 22:19:05.046 [2963250064] >TRACE:   clssgmRemoveMember: grock(SRVM.DATABASE.NODEAPPS.rac2) member(0/0x93e3dc0) nodeNum(2) flags(0x1) type(3)
[    CSSD]2013-01-29 22:19:05.243 [2946464656] >TRACE:   clssnmSendingThread: sending status msg to all nodes
[    CSSD]2013-01-29 22:19:05.244 [2946464656] >TRACE:   clssnmSendingThread: sent 5 status msgs to all nodes
[    CSSD]2013-01-29 22:19:06.396 [2996820880] >TRACE:   clssgmDeadProc: proc 0x93df420
[    CSSD]2013-01-29 22:19:06.396 [2996820880] >TRACE:   clssgmDeleteClientListener: deleting cmProc (0x93df420), with 0 clients
[    CSSD]2013-01-29 22:19:06.396 [2996820880] >TRACE:   clssgmDeleteClientListener: cleanup for proc(0x93df420) con(0x93e0778) pid()
[    CSSD]2013-01-29 22:19:09.249 [2946464656] >TRACE:   clssnmSendingThread: sending status msg to all nodes
[    CSSD]2013-01-29 22:19:09.250 [2946464656] >TRACE:   clssnmSendingThread: sent 4 status msgs to all nodes
[    CSSD]2013-01-29 22:19:14.255 [2946464656] >TRACE:   clssnmSendingThread: sending status msg to all nodes
[    CSSD]2013-01-29 22:19:14.256 [2946464656] >TRACE:   clssnmSendingThread: sent 5 status msgs to all nodes
[    CSSD]2013-01-29 22:19:18.261 [2946464656] >TRACE:   clssnmSendingThread: sending status msg to all nodes
[    CSSD]2013-01-29 22:19:18.262 [2946464656] >TRACE:   clssnmSendingThread: sent 4 status msgs to all nodes
下边省略正常trace消息..

crsd.log crs停止和启动的整个过程记录如下:

2013-01-29 21:08:49.058: [  CRSRES][2617236368]0CRS-1002: Resource 'ora.racdb.db' is already running on member 'rac2'

2013-01-29 21:09:40.287: [  CRSRES][2617236368]0StopResource: setting CLI values		--此处开始人为停止crs后的消息记录,包括停止所有资源的消息记录
2013-01-29 21:48:44.714: [    CRSD][2608843664]0SM: Recovery processing done.0
2013-01-29 21:48:44.715: [    CRSD][2608843664]0SM: Ongoing commands done.0
2013-01-29 21:48:44.759: [    CRSD][2608843664]0SM: asked E2E to exit.
2013-01-29 21:48:44.760: [ CRSCOMM][2736737168]0CLEANUP: quiesceing connection 0x9e7c670
2013-01-29 21:48:44.760: [ CRSMAIN][2736737168]0runCommandServer for E2E exiting.
2013-01-29 21:48:44.761: [    CRSD][2608843664]0SM: E2E port closed.3
2013-01-29 21:48:44.761: [    CRSD][2608843664]0SM: stoppedResources: all =1 m_state = 3
2013-01-29 21:48:44.814: [  CRSRES][2583665552]0StopResource: setting CLI values
2013-01-29 21:48:44.836: [  CRSRES][2617236368]0StopResource: setting CLI values
2013-01-29 21:48:44.872: [  CRSRES][2736737168]0StopResource: setting CLI values
2013-01-29 21:48:44.951: [  CRSRES][2583665552]0Attempting to stop `ora.rac2.gsd` on member `rac2`
2013-01-29 21:48:45.156: [  CRSRES][2617236368]0Attempting to stop `ora.rac2.ons` on member `rac2`
2013-01-29 21:48:45.384: [  CRSRES][2736737168]0Attempting to stop `ora.racdb.db` on member `rac2`
2013-01-29 21:48:46.961: [  CRSRES][2736737168]0Stop of `ora.racdb.db` on member `rac2` succeeded.
2013-01-29 21:48:47.430: [  CRSRES][2583665552]0Stop of `ora.rac2.gsd` on member `rac2` succeeded.
2013-01-29 21:48:47.834: [  CRSRES][2617236368]0Stop of `ora.rac2.ons` on member `rac2` succeeded.
2013-01-29 21:48:47.907: [  CRSRES][2617236368]0StopResource: setting CLI values
2013-01-29 21:48:47.917: [  CRSRES][2583665552]0StopResource: setting CLI values
2013-01-29 21:48:48.023: [  CRSRES][2617236368]0Attempting to stop `ora.racdb.racdb2.inst` on member `rac2`
2013-01-29 21:48:48.175: [  CRSRES][2583665552]0Attempting to stop `ora.rac2.LISTENER_RAC2.lsnr` on member `rac2`
2013-01-29 21:48:50.604: [ CRSMAIN][2728344464]0UI Cmd received after shutdown began.
2013-01-29 21:48:50.931: [ CRSMAIN][2728344464]0UI Cmd received after shutdown began.
2013-01-29 21:48:51.281: [  CRSRES][2583665552]0Stop of `ora.rac2.LISTENER_RAC2.lsnr` on member `rac2` succeeded.
2013-01-29 21:48:51.320: [  CRSRES][2583665552]0StopResource: setting CLI values
2013-01-29 21:48:51.604: [  CRSRES][2583665552]0StopResource: setting CLI values
2013-01-29 21:48:51.723: [  CRSRES][2583665552]0Attempting to stop `ora.rac2.vip` on member `rac2`
2013-01-29 21:48:52.400: [ CRSMAIN][2728344464]0UI Cmd received after shutdown began.
2013-01-29 21:48:53.398: [ CRSMAIN][2728344464]0UI Cmd received after shutdown began.
2013-01-29 21:48:55.982: [  CRSRES][2583665552]0Stop of `ora.rac2.vip` on member `rac2` succeeded.
2013-01-29 21:49:32.346: [  CRSRES][2617236368]0Stop of `ora.racdb.racdb2.inst` on member `rac2` succeeded.
2013-01-29 21:49:32.380: [  CRSRES][2617236368]0StopResource: setting CLI values
2013-01-29 21:49:32.656: [  CRSRES][2617236368]0StopResource: setting CLI values
2013-01-29 21:49:32.778: [  CRSRES][2617236368]0Attempting to stop `ora.rac2.ASM2.asm` on member `rac2`
2013-01-29 21:49:36.878: [  CRSRES][2617236368]0Stop of `ora.rac2.ASM2.asm` on member `rac2` succeeded.
2013-01-29 21:49:37.089: [ CSSCLNT][2956516240]clssgsGGetStatus: CSS shutting down.

2013-01-29 21:49:37.089: [ CSSCLNT][2956516240]clssgsGGetStatus: returning 22

2013-01-29 21:49:37.111: [  OCRMAS][2956516240]th_master:21:clssgsgrpstat failed css ret code = 22
[  OCRAPI][2956516240]procr_ctx_set_invalid_no_abort: ctx set to invalid
2013-01-29 21:49:37.167: [  CRSEVT][2719951760]0Error dispatching EVM event; reconnecting
2013-01-29 21:49:37.181: [ CSSCLNT][2812271504]clssgsGGetStatus: CSS shutting down.

2013-01-29 21:49:37.182: [ CSSCLNT][2812271504]clssgsGGetStatus: returning 22

2013-01-29 21:49:37.182: [    CRSD][2812271504]0CRSD exiting on CSS shutdown request
2013-01-29 21:49:37.182: [    CRSD][2812271504]0Done.		--节点2的资源停完后,crs此时停完

2013-01-29 22:06:35.879: [ default][3058108096][ENTER]0		--此处开始记录crs正常人为启动的开始时的消息记录
Oracle Database 10g CRS Release 10.2.0.5.0 Production Copyright 1996, 2004, Oracle.  All rights reserved
2013-01-29 22:06:35.882: [ default][3058108096]0CRS Daemon Starting
2013-01-29 22:06:35.894: [ CRSMAIN][3058108096]0Checking the OCR device
2013-01-29 22:06:36.069: [ CRSMAIN][3058108096]0Connecting to the CSS Daemon
2013-01-29 22:06:36.278: [ COMMCRS][3057531792]clsc_connect: (0xa402d80) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_))

2013-01-29 22:06:36.279: [ CSSCLNT][3058108096]clsssInitNative: connect failed, rc 9

中间省略重复输出...

2013-01-29 22:06:52.662: [  CRSRTI][3058108096]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2013-01-29 22:06:53.844: [ COMMCRS][3057531792]clsc_connect: (0xa402dd8) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_))

2013-01-29 22:06:53.844: [ CSSCLNT][3058108096]clsssInitNative: connect failed, rc 9

2013-01-29 22:06:53.845: [  CRSRTI][3058108096]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2013-01-29 22:06:58.462: [  CLSVER][3058108096]0Active Version from OCR:10.2.0.5.0
2013-01-29 22:06:58.463: [  CLSVER][3058108096]0Active Version and Software Version are same
2013-01-29 22:06:58.463: [ CRSMAIN][3058108096]0Initializing OCR
2013-01-29 22:06:58.654: [  OCRRAW][3058108096]proprioo: for disk 0 (/dev/raw/raw1), id match (1), my id set (1669906634,1028247821) total id sets (1), 1st set (1669906634,1028247821), 2nd set (0,0) my votes (2), total votes (2)
2013-01-29 22:06:59.104: [    CRSD][3058108096]0ENV Logging level for Module: allcomp  0
2013-01-29 22:06:59.174: [    CRSD][3058108096]0ENV Logging level for Module: default  0
2013-01-29 22:06:59.267: [    CRSD][3058108096]0ENV Logging level for Module: COMMCRS  0
2013-01-29 22:06:59.323: [    CRSD][3058108096]0ENV Logging level for Module: COMMNS  0
2013-01-29 22:06:59.347: [    CRSD][3058108096]0ENV Logging level for Module: CRSUI  0
2013-01-29 22:06:59.366: [    CRSD][3058108096]0ENV Logging level for Module: CRSCOMM  0
2013-01-29 22:06:59.388: [    CRSD][3058108096]0ENV Logging level for Module: CRSRTI  0
2013-01-29 22:06:59.414: [    CRSD][3058108096]0ENV Logging level for Module: CRSMAIN  0
2013-01-29 22:06:59.430: [    CRSD][3058108096]0ENV Logging level for Module: CRSPLACE  0
2013-01-29 22:06:59.444: [    CRSD][3058108096]0ENV Logging level for Module: CRSAPP  0
2013-01-29 22:06:59.461: [    CRSD][3058108096]0ENV Logging level for Module: CRSRES  0
2013-01-29 22:06:59.480: [    CRSD][3058108096]0ENV Logging level for Module: CRSOCR  0
2013-01-29 22:06:59.528: [    CRSD][3058108096]0ENV Logging level for Module: CRSTIMER  0
2013-01-29 22:06:59.563: [    CRSD][3058108096]0ENV Logging level for Module: CRSEVT  0
2013-01-29 22:06:59.577: [    CRSD][3058108096]0ENV Logging level for Module: CRSD  0
2013-01-29 22:06:59.591: [    CRSD][3058108096]0ENV Logging level for Module: CLUCLS  0
2013-01-29 22:06:59.625: [    CRSD][3058108096]0ENV Logging level for Module: CLSVER  0
2013-01-29 22:06:59.642: [    CRSD][3058108096]0ENV Logging level for Module: OCRRAW  0
2013-01-29 22:06:59.659: [    CRSD][3058108096]0ENV Logging level for Module: OCROSD  0
2013-01-29 22:06:59.686: [    CRSD][3058108096]0ENV Logging level for Module: CSSCLNT  0
2013-01-29 22:06:59.721: [    CRSD][3058108096]0ENV Logging level for Module: OCRAPI  0
2013-01-29 22:06:59.738: [    CRSD][3058108096]0ENV Logging level for Module: OCRUTL  0
2013-01-29 22:06:59.760: [    CRSD][3058108096]0ENV Logging level for Module: OCRMSG  0
2013-01-29 22:06:59.772: [    CRSD][3058108096]0ENV Logging level for Module: OCRCLI  0
2013-01-29 22:06:59.787: [    CRSD][3058108096]0ENV Logging level for Module: OCRCAC  0
2013-01-29 22:06:59.803: [    CRSD][3058108096]0ENV Logging level for Module: OCRSRV  0
2013-01-29 22:06:59.818: [    CRSD][3058108096]0ENV Logging level for Module: OCRMAS  0
2013-01-29 22:06:59.819: [ CRSMAIN][3058108096]0Filename is /opt/ora10g/product/crs/crs/init/rac2.pid
[  clsdmt][2805750672]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=rac2DBG_CRSD))
2013-01-29 22:07:00.035: [ CRSMAIN][3058108096]0Using Authorizer location: /opt/ora10g/product/crs/crs/auth/
2013-01-29 22:07:00.168: [ CRSMAIN][3058108096]0Initializing RTI
2013-01-29 22:07:00.492: [CRSTIMER][2788965264]0Timer Thread Starting.
2013-01-29 22:07:00.504: [  CRSRES][3058108096]0Parameter SECURITY = 1, running in USER Mode
2013-01-29 22:07:00.504: [ CRSMAIN][3058108096]0Initializing EVMMgr
2013-01-29 22:07:00.695: [ COMMCRS][2780572560]clsc_connect: (0xa893e10) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

2013-01-29 22:07:01.124: [ COMMCRS][2780572560]clsc_connect: (0xa88ff28) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
2013-01-29 22:06:59.760: [    CRSD][3058108096]0ENV Logging level for Module: OCRMSG  0
2013-01-29 22:06:59.772: [    CRSD][3058108096]0ENV Logging level for Module: OCRCLI  0
2013-01-29 22:06:59.787: [    CRSD][3058108096]0ENV Logging level for Module: OCRCAC  0
2013-01-29 22:06:59.803: [    CRSD][3058108096]0ENV Logging level for Module: OCRSRV  0
2013-01-29 22:06:59.818: [    CRSD][3058108096]0ENV Logging level for Module: OCRMAS  0
2013-01-29 22:06:59.819: [ CRSMAIN][3058108096]0Filename is /opt/ora10g/product/crs/crs/init/rac2.pid
[  clsdmt][2805750672]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=rac2DBG_CRSD))
2013-01-29 22:07:00.035: [ CRSMAIN][3058108096]0Using Authorizer location: /opt/ora10g/product/crs/crs/auth/
2013-01-29 22:07:00.168: [ CRSMAIN][3058108096]0Initializing RTI
2013-01-29 22:07:00.492: [CRSTIMER][2788965264]0Timer Thread Starting.
2013-01-29 22:07:00.504: [  CRSRES][3058108096]0Parameter SECURITY = 1, running in USER Mode
2013-01-29 22:07:00.504: [ CRSMAIN][3058108096]0Initializing EVMMgr
2013-01-29 22:07:00.695: [ COMMCRS][2780572560]clsc_connect: (0xa893e10) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

2013-01-29 22:07:01.124: [ COMMCRS][2780572560]clsc_connect: (0xa88ff28) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

2013-01-29 22:07:05.230: [ CRSMAIN][3058108096]0CRSD locked during state recovery, please wait.
2013-01-29 22:07:07.642: [ CRSMAIN][3058108096]0CRSD recovered, unlocked.
2013-01-29 22:07:07.663: [ CRSMAIN][3058108096]0QS socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))
2013-01-29 22:07:07.663: [ CRSMAIN][3058108096]0QS socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))
2013-01-29 22:07:07.787: [ CRSMAIN][3058108096]0CRSD UI socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=CRSD_UI_SOCKET))
2013-01-29 22:07:07.811: [ CRSMAIN][3058108096]0E2E socket on: (ADDRESS=(PROTOCOL=tcp)(HOST=rac2-priv)(PORT=49896))
2013-01-29 22:07:07.811: [ CRSMAIN][3058108096]0Starting Threads
2013-01-29 22:07:07.812: [ CRSMAIN][3058108096]0CRS Daemon Started.					--crs启动完毕,开始启动资源
2013-01-29 22:07:07.815: [ CRSMAIN][2730216336]0Starting runCommandServer for (UI = 1, E2E = 0). 1
2013-01-29 22:07:07.821: [ CRSMAIN][2738609040]0Starting runCommandServer for (UI = 1, E2E = 0). 0
2013-01-29 22:07:08.943: [  CRSRES][3058108096]0 startup = 1
2013-01-29 22:07:09.348: [  CRSRES][3058108096]0 startup = 1
2013-01-29 22:07:09.760: [  CRSRES][3058108096]0 startup = 1
2013-01-29 22:07:10.489: [  CRSRES][3058108096]0 startup = 1
2013-01-29 22:07:11.251: [  CRSRES][3058108096]0 startup = 1
2013-01-29 22:07:11.729: [  CRSRES][2696645520]0StopResource: setting CLI values
2013-01-29 22:07:12.891: [  CRSRES][2705038224]0startRunnable: setting CLI values
2013-01-29 22:07:12.925: [  CRSRES][2696645520]0Attempting to stop `ora.rac2.vip` on member `rac1`
2013-01-29 22:07:13.111: [  CRSRES][2705038224]0Attempting to start `ora.rac2.ASM2.asm` on member `rac2`
2013-01-29 22:07:15.414: [  CRSRES][2696645520]0Stop of `ora.rac2.vip` on member `rac1` succeeded.
2013-01-29 22:07:15.604: [  CRSRES][2696645520]0startRunnable: setting CLI values
2013-01-29 22:07:15.604: [  CRSRES][2696645520]0Attempting to start `ora.rac2.vip` on member `rac2`
2013-01-29 22:07:22.250: [  CRSRES][2696645520]0Start of `ora.rac2.vip` on member `rac2` succeeded.
2013-01-29 22:07:22.977: [  CRSRES][2696645520]0startRunnable: setting CLI values
2013-01-29 22:07:23.504: [  CRSRES][2696645520]0Attempting to start `ora.rac2.LISTENER_RAC2.lsnr` on member `rac2`
2013-01-29 22:07:29.340: [  CRSRES][2696645520]0Start of `ora.rac2.LISTENER_RAC2.lsnr` on member `rac2` succeeded.
2013-01-29 22:07:29.982: [  CRSRES][2661272464]0CRS-1002: Resource 'ora.rac2.LISTENER_RAC2.lsnr' is already running on member 'rac2'

2013-01-29 22:07:37.688: [  CRSRES][2661272464]0startRunnable: setting CLI values
2013-01-29 22:07:38.045: [  CRSRES][2661272464]0Attempting to start `ora.rac2.ons` on member `rac2`
2013-01-29 22:07:41.373: [  CRSRES][2661272464]0Start of `ora.rac2.ons` on member `rac2` succeeded.
2013-01-29 22:07:41.527: [  CRSRES][2705038224]0Start of `ora.rac2.ASM2.asm` on member `rac2` succeeded.
2013-01-29 22:07:42.692: [  CRSRES][2705038224]0startRunnable: setting CLI values
2013-01-29 22:07:42.785: [  CRSRES][2705038224]0Attempting to start `ora.racdb.racdb2.inst` on member `rac2`
2013-01-29 22:08:46.144: [  CRSRES][2705038224]0Start of `ora.racdb.racdb2.inst` on member `rac2` succeeded.
2013-01-29 22:08:46.188: [  CRSRES][2705038224]0Skip online resource: ora.rac2.ons
2013-01-29 22:08:46.907: [  CRSRES][2696645520]0startRunnable: setting CLI values
2013-01-29 22:08:46.998: [  CRSRES][2696645520]0Attempting to start `ora.rac2.gsd` on member `rac2`
2013-01-29 22:08:48.847: [  CRSRES][2696645520]0Start of `ora.rac2.gsd` on member `rac2` succeeded.
2013-01-29 22:08:51.321: [  OCRUTL][2864499600]u_freem: mem passed is null

以上就是一个正常的人为停止一个节点的crs的消息日志记录,便于大家进一步了解停止和启动crs的整个过程以及以后做对比使用。

给我留言

Copyright © YallonKing 保留所有权利.   Theme  Ality

用户登录

分享到: