A-A+
sql语句hang住后的分析解决–视图 VS 10046 VS hanganalyze
该篇文章,从一条sql hang住,从视图,10046事件以及hanganalyze分析并解决。
版本如下:
SQL> set line 400 SQL> select * from gv$version; INST_ID BANNER ---------- -------------------------------------------------------------------------------- 1 Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production 1 PL/SQL Release 11.2.0.3.0 - Production 1 CORE 11.2.0.3.0 Production 1 TNS for Linux: Version 11.2.0.3.0 - Production 1 NLSRTL Version 11.2.0.3.0 - Production
模拟如下:
SQL> conn yallonking/yallonking Connected. SQL> update yallonking set name='cfca' where id=3; --hang
该会话此刻hang住
(一)通过相关视图查看锁信息
SQL> SELECT inst_id,DECODE(request, 0, 'Holder :', 'Waiter :') || sid sess, 2 id1, 3 id2, 4 lmode, 5 2 request, 6 type 7 FROM gV$LOCK 8 WHERE (id1, id2, type) IN 9 (SELECT id1, id2, type FROM V$LOCK WHERE request > 0) 10 ORDER BY inst_id,id1, request; INST_ID SESS ID1 ID2 LMODE REQUEST TY ---------- ------------------------------------------------ ---------- ---------- ---------- ---------- -- 1 Holder :33 589849 746 6 2 TX 1 Waiter :32 589849 746 0 2 TX
此处发现32号会话在等待33号会话所持有行锁
TM - DML enqueue
LMODE 6 - exclusive (X)
REQUEST 2 - row-S (SS)
查看32号会话对应的sql(也是hang住的sql)
SQL> select s.inst_id,s.sid,s.MACHINE, s.serial#, s.status, s.username, t.sql_text from gv$session s, gv$sqltext t where s.sql_address = t.address and s.sid in ('33','32') order by s.sid, t.piece; INST_ID SID MACHINE SERIAL# STATUS USERNAME SQL_TEXT ------- ---- --------------- ---------- -------- --------------- ---------------------------------------------------------------- 1 32 rhl6.0x64 401 ACTIVE YALLONKING update yallonking set name='cfca' where id=3
(二)通过10046查看该会话所对应的等待事件
SQL> select a.spid from v$process a ,v$session b 2 where a.addr=b.paddr and b.sid in (32); SPID ------------------------ 9601 SQL> oradebug setospid 9601 Oracle pid: 27, Unix process pid: 9601, image: oracle@rhl6.0x64 (TNS V1-V3) SQL> oradebug unlimit Statement processed. SQL> oradebug tracefile_name /u01/app/oracle/diag/rdbms/ora11gr2/ora11gr2/trace/ora11gr2_ora_9601.trc SQL> oradebug Event 10046 trace name context forever, level 12 Statement processed. //此处重新在原窗口执行hang住的sql语句,并等待一会儿 SQL> oradebug Event 10046 trace name context off Statement processed.
查看10046生成的trace文件
[root@rhl6 ~]# cat /u01/app/oracle/diag/rdbms/ora11gr2/ora11gr2/trace/ora11gr2_ora_9601.trc Trace file /u01/app/oracle/diag/rdbms/ora11gr2/ora11gr2/trace/ora11gr2_ora_9601.trc Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1 System name: Linux Node name: rhl6.0x64 Release: 2.6.32-71.el6.x86_64 Version: #1 SMP Wed Sep 1 01:33:01 EDT 2010 Machine: x86_64 VM name: VMWare Version: 6 Instance name: ora11gr2 Redo thread mounted by this instance: 1 Oracle process number: 27 Unix process pid: 9601, image: oracle@rhl6.0x64 (TNS V1-V3) ... 省略 ... *** 2013-11-15 06:14:55.634 Finished processing ORADEBUG command (#7) 'Event 10046 trace name context forever, level 12' *** 2013-11-15 06:14:58.767 WAIT #140147045660632: nam='enq: TX - row lock contention' ela= 772381307 name|mode=1415053318 usn<<16 | slot=589849 sequence=746 obj#=71458 tim=1384467298767000 //发现等待事件,对象信息 ERROR #140147045660632:err=1013 tim=1384467298767346 //用户人为取消当前操作 WAIT #140147045660632: nam='SQL*Net break/reset to client' ela= 686 driver id=1650815232 break?=0 p3=0 obj#=71458 tim=1384467298768299 WAIT #140147045660632: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=71458 tim=1384467298768388 *** 2013-11-15 06:15:06.190 WAIT #140147045660632: nam='SQL*Net message from client' ela= 7421834 driver id=1650815232 #bytes=1 p3=0 obj#=71458 tim=1384467306190279 CLOSE #140147045660632:c=0,e=18,dep=0,type=1,tim=1384467306190408 ===================== PARSING IN CURSOR #140147045663736 len=44 dep=0 uid=80 oct=6 lid=80 tim=1384467306240216 hv=1429454996 ad='8ee71a38' sqlid='ft3g9b9am7h4n' update yallonking set name='cfca' where id=3 //等待的会话所执行的sql END OF STMT PARSE #140147045663736:c=3999,e=49757,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=1625575613,tim=1384467306240215 *** 2013-11-15 06:16:39.102 Received ORADEBUG command (#8) 'Event 10046 trace name context off' from process 'Unix process pid: 9655, image: <none>' *** 2013-11-15 06:16:39.103 Finished processing ORADEBUG command (#8) 'Event 10046 trace name context off'
从上边可以看出等待事件是"enq: TX - row lock contention" obj#=71458
SQL> col OWNER for a20 SQL> col OBJECT_NAME for a20 SQL> col OBJECT_ID for 999999 SQL> select OWNER,OBJECT_NAME,OBJECT_ID from dba_objects where OBJECT_ID=71458; OWNER OBJECT_NAME OBJECT_ID -------------------- -------------------- --------- YALLONKING YALLONKING 71458
(三)通过hanganalyze分析
在另一个会话中
SQL>oradebug setmypid SQL>oradebug unlimit; SQL>oradebug dump hanganalyze 3 //此处选择3级,指定单一进程 SQL> oradebug tracefile_name /u01/app/oracle/diag/rdbms/ora11gr2/ora11gr2/trace/ora11gr2_ora_9781.trc
10 Dump all processes (IGN state)
5 Level 4 + Dump all processes involved in wait chains (NLEAF state)
4 Level 3 + Dump leaf nodes (blockers) in wait chains (LEAF,LEAF_NW,IGN_DMP state)
3 Level 2 + Dump only processes thought to be in a hang (IN_HANG state)
1-2 Only HANGANALYZE output, no process dump at all
查看生成的trace文件
[root@rhl6 ~]# cat /u01/app/oracle/diag/rdbms/ora11gr2/ora11gr2/trace/ora11gr2_ora_9781.trc Trace file /u01/app/oracle/diag/rdbms/ora11gr2/ora11gr2/trace/ora11gr2_ora_9781.trc Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1 System name: Linux Node name: rhl6.0x64 Release: 2.6.32-71.el6.x86_64 Version: #1 SMP Wed Sep 1 01:33:01 EDT 2010 Machine: x86_64 VM name: VMWare Version: 6 Instance name: ora11gr2 Redo thread mounted by this instance: 1 Oracle process number: 28 Unix process pid: 9781, image: oracle@rhl6.0x64 (TNS V1-V3) *** 2013-11-15 06:20:03.265 *** SESSION ID:(35.153) 2013-11-15 06:20:03.265 *** CLIENT ID:() 2013-11-15 06:20:03.265 *** SERVICE NAME:(SYS$USERS) 2013-11-15 06:20:03.265 *** MODULE NAME:(sqlplus@rhl6.0x64 (TNS V1-V3)) 2013-11-15 06:20:03.265 *** ACTION NAME:() 2013-11-15 06:20:03.265 Processing Oradebug command 'setmypid' *** 2013-11-15 06:20:03.265 Oradebug command 'setmypid' console output: <none> *** 2013-11-15 06:20:08.081 Processing Oradebug command 'unlimit' *** 2013-11-15 06:20:08.081 Oradebug command 'unlimit' console output: <none> *** 2013-11-15 06:20:13.734 Processing Oradebug command 'dump hanganalyze 3' *** 2013-11-15 06:20:13.805 =============================================================================== HANG ANALYSIS: instances (db_name.oracle_sid): ora11gr2.ora11gr2 oradebug_node_dump_level: 3 os thread scheduling delay history: (sampling every 1.000000 secs) 0.000000 secs at [ 06:20:13 ] NOTE: scheduling delay has not been sampled for 0.692331 secs 0.000000 secs from [ 06:20:09 - 06:20:14 ], 5 sec avg 0.000000 secs from [ 06:19:14 - 06:20:14 ], 1 min avg 0.000000 secs from [ 06:15:14 - 06:20:14 ], 5 min avg =============================================================================== Chains most likely to have caused the hang: [a] Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention' Chain 1 Signature Hash: 0x38c48850 =============================================================================== Non-intersecting chains: ------------------------------------------------------------------------------- Chain 1: ------------------------------------------------------------------------------- Oracle session identified by: { instance: 1 (ora11gr2.ora11gr2) os id: 9601 process id: 27, oracle@rhl6.0x64 (TNS V1-V3) session id: 32 session serial #: 401 } is waiting for 'enq: TX - row lock contention' with wait info: //32号会话等待行锁 { p1: 'name|mode'=0x54580006 p2: 'usn<<16 | slot'=0x90019 p3: 'sequence'=0x2ea time in wait: 5 min 7 sec //等待时长 timeout after: never wait id: 36 blocking: 0 sessions wait history: * time between current wait and wait #1: 0.050514 sec 1. event: 'SQL*Net message from client' time waited: 7.421834 sec wait id: 35 p1: 'driver id'=0x62657100 p2: '#bytes'=0x1 * time between wait #1 and #2: 0.000057 sec 2. event: 'SQL*Net message to client' time waited: 0.000002 sec wait id: 34 p1: 'driver id'=0x62657100 p2: '#bytes'=0x1 * time between wait #2 and #3: 0.000087 sec 3. event: 'SQL*Net break/reset to client' time waited: 0.000686 sec wait id: 33 p1: 'driver id'=0x62657100 p2: 'break?'=0x0 } and is blocked by => Oracle session identified by: //32号会话被33号会话阻塞 { instance: 1 (ora11gr2.ora11gr2) os id: 9548 process id: 26, oracle@rhl6.0x64 (TNS V1-V3) session id: 33 session serial #: 263 } which is waiting for 'SQL*Net message from client' with wait info: //33号会话是一个空闲等待的会话 { p1: 'driver id'=0x62657100 p2: '#bytes'=0x1 time in wait: 24 min 49 sec //空闲时长 timeout after: never wait id: 53 blocking: 1 session wait history: * time between current wait and wait #1: 0.000011 sec 1. event: 'SQL*Net message to client' time waited: 0.000002 sec wait id: 52 p1: 'driver id'=0x62657100 p2: '#bytes'=0x1 * time between wait #1 and #2: 0.002205 sec 2. event: 'db file sequential read' time waited: 0.015846 sec wait id: 51 p1: 'file#'=0x1 p2: 'block#'=0x109b8 p3: 'blocks'=0x1 * time between wait #2 and #3: 0.001734 sec 3. event: 'SQL*Net message from client' time waited: 1 min 11 sec wait id: 50 p1: 'driver id'=0x62657100 p2: '#bytes'=0x1 } Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention' Chain 1 Signature Hash: 0x38c48850 ------------------------------------------------------------------------------- =============================================================================== Extra information that will be dumped at higher levels: [level 4] : 1 node dumps -- [LEAF] [LEAF_NW] [level 5] : 1 node dumps -- [NO_WAIT] [INVOL_WT] [SINGLE_NODE] [NLEAF] [SINGLE_NODE_NW] State of ALL nodes ([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]): [31]/1/32/401/0x97bbfaa0/9601/NLEAF/[32] [32]/1/33/263/0x97bbcaf8/9548/LEAF/ *** 2013-11-15 06:20:13.807 =============================================================================== END OF HANG ANALYSIS =============================================================================== *** 2013-11-15 06:20:13.807 =============================================================================== HANG ANALYSIS DUMPS: oradebug_node_dump_level: 3 =============================================================================== State of LOCAL nodes ([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]): [31]/1/32/401/0x97bbfaa0/9601/NLEAF/[32] [32]/1/33/263/0x97bbcaf8/9548/LEAF/ No processes qualify for dumping. =============================================================================== HANG ANALYSIS DUMPS: END =============================================================================== *** 2013-11-15 06:20:13.808 Oradebug command 'dump hanganalyze 3' console output: <none> *** 2013-11-15 06:21:12.687 Processing Oradebug command 'tracefile_name' *** 2013-11-15 06:21:12.687 Oradebug command 'tracefile_name' console output: /u01/app/oracle/diag/rdbms/ora11gr2/ora11gr2/trace/ora11gr2_ora_9781.trc
杀掉持有锁的会话(该会话是一个空闲会话),阻塞的会话(32)即刻恢复
SQL> select 'alter system kill session '||''''||c.sid||','||c.serial#||''''||';' from v$session c where c.sid=33; 'ALTERSYSTEMKILLSESSION'||''''||C.SID||','||C.SERIAL#||''''||';' -------------------------------------------------------------------------------- alter system kill session '33,263'; SQL> alter system kill session '33,263'; System altered.