A-A+

sql语句hang住后的分析解决–视图 VS 10046 VS hanganalyze

2013年12月06日 TroubleShooting 暂无评论 阅读 1,575 次

该篇文章,从一条sql hang住,从视图,10046事件以及hanganalyze分析并解决。

版本如下:

SQL> set line 400
SQL> select * from gv$version;

   INST_ID BANNER
---------- --------------------------------------------------------------------------------
         1 Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
         1 PL/SQL Release 11.2.0.3.0 - Production
         1 CORE 11.2.0.3.0      Production
         1 TNS for Linux: Version 11.2.0.3.0 - Production
         1 NLSRTL Version 11.2.0.3.0 - Production

模拟如下:

SQL> conn yallonking/yallonking
Connected.
SQL> update yallonking set name='cfca' where id=3;
--hang

该会话此刻hang住

(一)通过相关视图查看锁信息

SQL> SELECT inst_id,DECODE(request, 0, 'Holder :', 'Waiter :') || sid sess,
  2         id1,
  3         id2,
  4         lmode,
  5         2 request,
  6         type
  7    FROM gV$LOCK
  8   WHERE (id1, id2, type) IN
  9         (SELECT id1, id2, type FROM V$LOCK WHERE request > 0)
 10   ORDER BY inst_id,id1, request;

   INST_ID SESS                                                    ID1        ID2      LMODE    REQUEST TY
---------- ------------------------------------------------ ---------- ---------- ---------- ---------- --
         1 Holder :33                                           589849        746          6          2 TX
         1 Waiter :32                                           589849        746          0          2 TX

此处发现32号会话在等待33号会话所持有行锁
TM - DML enqueue
LMODE 6 - exclusive (X)
REQUEST 2 - row-S (SS)

查看32号会话对应的sql(也是hang住的sql)

SQL> select s.inst_id,s.sid,s.MACHINE, s.serial#, s.status, s.username, t.sql_text  from gv$session s, gv$sqltext t where s.sql_address = t.address   and s.sid in ('33','32') order by s.sid, t.piece;

INST_ID  SID MACHINE            SERIAL# STATUS   USERNAME        SQL_TEXT
------- ---- --------------- ---------- -------- --------------- ----------------------------------------------------------------
      1   32 rhl6.0x64              401 ACTIVE   YALLONKING      update yallonking set name='cfca' where id=3		 

(二)通过10046查看该会话所对应的等待事件

SQL> select a.spid from v$process a ,v$session b
  2  where a.addr=b.paddr and b.sid in (32);

SPID
------------------------
9601
 
SQL> oradebug setospid 9601
Oracle pid: 27, Unix process pid: 9601, image: oracle@rhl6.0x64 (TNS V1-V3)
SQL> oradebug unlimit
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/ora11gr2/ora11gr2/trace/ora11gr2_ora_9601.trc
SQL> oradebug Event 10046 trace name context forever, level 12
Statement processed.
//此处重新在原窗口执行hang住的sql语句,并等待一会儿
SQL> oradebug Event 10046 trace name context off
Statement processed.

查看10046生成的trace文件

[root@rhl6 ~]# cat /u01/app/oracle/diag/rdbms/ora11gr2/ora11gr2/trace/ora11gr2_ora_9601.trc
Trace file /u01/app/oracle/diag/rdbms/ora11gr2/ora11gr2/trace/ora11gr2_ora_9601.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name:    Linux
Node name:      rhl6.0x64
Release:        2.6.32-71.el6.x86_64
Version:        #1 SMP Wed Sep 1 01:33:01 EDT 2010
Machine:        x86_64
VM name:        VMWare Version: 6
Instance name: ora11gr2
Redo thread mounted by this instance: 1
Oracle process number: 27
Unix process pid: 9601, image: oracle@rhl6.0x64 (TNS V1-V3)
... 省略 ...
*** 2013-11-15 06:14:55.634
Finished processing ORADEBUG command (#7) 'Event 10046 trace name context forever, level 12'

*** 2013-11-15 06:14:58.767
WAIT #140147045660632: nam='enq: TX - row lock contention' ela= 772381307 name|mode=1415053318 usn<<16 | slot=589849 sequence=746 obj#=71458 tim=1384467298767000	//发现等待事件,对象信息
ERROR #140147045660632:err=1013 tim=1384467298767346	//用户人为取消当前操作
WAIT #140147045660632: nam='SQL*Net break/reset to client' ela= 686 driver id=1650815232 break?=0 p3=0 obj#=71458 tim=1384467298768299
WAIT #140147045660632: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=71458 tim=1384467298768388

*** 2013-11-15 06:15:06.190
WAIT #140147045660632: nam='SQL*Net message from client' ela= 7421834 driver id=1650815232 #bytes=1 p3=0 obj#=71458 tim=1384467306190279
CLOSE #140147045660632:c=0,e=18,dep=0,type=1,tim=1384467306190408
=====================
PARSING IN CURSOR #140147045663736 len=44 dep=0 uid=80 oct=6 lid=80 tim=1384467306240216 hv=1429454996 ad='8ee71a38' sqlid='ft3g9b9am7h4n'
update yallonking set name='cfca' where id=3	//等待的会话所执行的sql
END OF STMT
PARSE #140147045663736:c=3999,e=49757,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=1625575613,tim=1384467306240215

*** 2013-11-15 06:16:39.102
Received ORADEBUG command (#8) 'Event 10046 trace name context off' from process 'Unix process pid: 9655, image: <none>'

*** 2013-11-15 06:16:39.103
Finished processing ORADEBUG command (#8) 'Event 10046 trace name context off'

从上边可以看出等待事件是"enq: TX - row lock contention" obj#=71458

SQL> col OWNER for a20
SQL> col OBJECT_NAME for a20
SQL> col OBJECT_ID for 999999
SQL> select OWNER,OBJECT_NAME,OBJECT_ID from dba_objects where OBJECT_ID=71458;

OWNER                OBJECT_NAME          OBJECT_ID
-------------------- -------------------- ---------
YALLONKING           YALLONKING               71458

(三)通过hanganalyze分析
在另一个会话中

SQL>oradebug setmypid
SQL>oradebug unlimit;
SQL>oradebug dump hanganalyze 3	//此处选择3级,指定单一进程
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/ora11gr2/ora11gr2/trace/ora11gr2_ora_9781.trc

10 Dump all processes (IGN state)
5 Level 4 + Dump all processes involved in wait chains (NLEAF state)
4 Level 3 + Dump leaf nodes (blockers) in wait chains (LEAF,LEAF_NW,IGN_DMP state)
3 Level 2 + Dump only processes thought to be in a hang (IN_HANG state)
1-2 Only HANGANALYZE output, no process dump at all

查看生成的trace文件

[root@rhl6 ~]# cat /u01/app/oracle/diag/rdbms/ora11gr2/ora11gr2/trace/ora11gr2_ora_9781.trc
Trace file /u01/app/oracle/diag/rdbms/ora11gr2/ora11gr2/trace/ora11gr2_ora_9781.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name:    Linux
Node name:      rhl6.0x64
Release:        2.6.32-71.el6.x86_64
Version:        #1 SMP Wed Sep 1 01:33:01 EDT 2010
Machine:        x86_64
VM name:        VMWare Version: 6
Instance name: ora11gr2
Redo thread mounted by this instance: 1
Oracle process number: 28
Unix process pid: 9781, image: oracle@rhl6.0x64 (TNS V1-V3)


*** 2013-11-15 06:20:03.265
*** SESSION ID:(35.153) 2013-11-15 06:20:03.265
*** CLIENT ID:() 2013-11-15 06:20:03.265
*** SERVICE NAME:(SYS$USERS) 2013-11-15 06:20:03.265
*** MODULE NAME:(sqlplus@rhl6.0x64 (TNS V1-V3)) 2013-11-15 06:20:03.265
*** ACTION NAME:() 2013-11-15 06:20:03.265
 
Processing Oradebug command 'setmypid'

*** 2013-11-15 06:20:03.265
Oradebug command 'setmypid' console output: <none>

*** 2013-11-15 06:20:08.081
Processing Oradebug command 'unlimit'

*** 2013-11-15 06:20:08.081
Oradebug command 'unlimit' console output: <none>

*** 2013-11-15 06:20:13.734
Processing Oradebug command 'dump hanganalyze 3'

*** 2013-11-15 06:20:13.805
===============================================================================
HANG ANALYSIS:
  instances (db_name.oracle_sid): ora11gr2.ora11gr2
  oradebug_node_dump_level: 3
  os thread scheduling delay history: (sampling every 1.000000 secs)
    0.000000 secs at [ 06:20:13 ]
      NOTE: scheduling delay has not been sampled for 0.692331 secs    0.000000 secs from [ 06:20:09 - 06:20:14 ], 5 sec avg
    0.000000 secs from [ 06:19:14 - 06:20:14 ], 1 min avg
    0.000000 secs from [ 06:15:14 - 06:20:14 ], 5 min avg
===============================================================================
 
Chains most likely to have caused the hang:
 [a] Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
     Chain 1 Signature Hash: 0x38c48850
 
===============================================================================
Non-intersecting chains:
 
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (ora11gr2.ora11gr2)
                   os id: 9601
              process id: 27, oracle@rhl6.0x64 (TNS V1-V3)
              session id: 32
        session serial #: 401
    }
    is waiting for 'enq: TX - row lock contention' with wait info:		//32号会话等待行锁
    {
                      p1: 'name|mode'=0x54580006
                      p2: 'usn<<16 | slot'=0x90019
                      p3: 'sequence'=0x2ea
            time in wait: 5 min 7 sec	//等待时长
           timeout after: never
                 wait id: 36
                blocking: 0 sessions
            wait history:
              * time between current wait and wait #1: 0.050514 sec
              1.       event: 'SQL*Net message from client'
                 time waited: 7.421834 sec
                     wait id: 35              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.000057 sec
              2.       event: 'SQL*Net message to client'
                 time waited: 0.000002 sec
                     wait id: 34              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #2 and #3: 0.000087 sec
              3.       event: 'SQL*Net break/reset to client'
                 time waited: 0.000686 sec
                     wait id: 33              p1: 'driver id'=0x62657100
                                              p2: 'break?'=0x0
    }
    and is blocked by
 => Oracle session identified by:		//32号会话被33号会话阻塞
    {
                instance: 1 (ora11gr2.ora11gr2)
                   os id: 9548
              process id: 26, oracle@rhl6.0x64 (TNS V1-V3)
              session id: 33
        session serial #: 263
    }
    which is waiting for 'SQL*Net message from client' with wait info:		//33号会话是一个空闲等待的会话
    {
                      p1: 'driver id'=0x62657100
                      p2: '#bytes'=0x1
            time in wait: 24 min 49 sec		//空闲时长
           timeout after: never
                 wait id: 53
                blocking: 1 session
            wait history:
              * time between current wait and wait #1: 0.000011 sec
              1.       event: 'SQL*Net message to client'
                 time waited: 0.000002 sec
                     wait id: 52              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.002205 sec
              2.       event: 'db file sequential read'
                 time waited: 0.015846 sec
                     wait id: 51              p1: 'file#'=0x1
                                              p2: 'block#'=0x109b8
                                              p3: 'blocks'=0x1
              * time between wait #2 and #3: 0.001734 sec
              3.       event: 'SQL*Net message from client'
                 time waited: 1 min 11 sec
                     wait id: 50              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
    }
 
Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
Chain 1 Signature Hash: 0x38c48850
-------------------------------------------------------------------------------
 
===============================================================================
Extra information that will be dumped at higher levels:
[level  4] :   1 node dumps -- [LEAF] [LEAF_NW] 
[level  5] :   1 node dumps -- [NO_WAIT] [INVOL_WT] [SINGLE_NODE] [NLEAF] [SINGLE_NODE_NW] 
 
State of ALL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[31]/1/32/401/0x97bbfaa0/9601/NLEAF/[32]
[32]/1/33/263/0x97bbcaf8/9548/LEAF/

*** 2013-11-15 06:20:13.807
===============================================================================
END OF HANG ANALYSIS
===============================================================================

*** 2013-11-15 06:20:13.807
===============================================================================
HANG ANALYSIS DUMPS:
  oradebug_node_dump_level: 3
===============================================================================
 
State of LOCAL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[31]/1/32/401/0x97bbfaa0/9601/NLEAF/[32]
[32]/1/33/263/0x97bbcaf8/9548/LEAF/
 
 
No processes qualify for dumping.
 
===============================================================================
HANG ANALYSIS DUMPS: END
===============================================================================

*** 2013-11-15 06:20:13.808
Oradebug command 'dump hanganalyze 3' console output: <none>

*** 2013-11-15 06:21:12.687
Processing Oradebug command 'tracefile_name'

*** 2013-11-15 06:21:12.687
Oradebug command 'tracefile_name' console output: 
/u01/app/oracle/diag/rdbms/ora11gr2/ora11gr2/trace/ora11gr2_ora_9781.trc

杀掉持有锁的会话(该会话是一个空闲会话),阻塞的会话(32)即刻恢复

SQL> select 'alter system kill session '||''''||c.sid||','||c.serial#||''''||';' from v$session c where c.sid=33;

'ALTERSYSTEMKILLSESSION'||''''||C.SID||','||C.SERIAL#||''''||';'
--------------------------------------------------------------------------------
alter system kill session '33,263';

SQL> alter system kill session '33,263';

System altered.
标签:

给我留言

Copyright © YallonKing 保留所有权利.   Theme  Ality

用户登录

分享到: