ASM Diskgroup Failure Scenario Testing - Normal Redundancy
Test scenario 1:
Diskgroup data2 has two failure groups (fgdata2_c, fgdata2_d). All disks of each failure group reside on a single, dedicated disk controller (sdc and sdd respectively).
Test: when one disk controller fails, add new disks and re-add the failure group to the diskgroup, verifying that normal database operation is unaffected throughout.
==========================================
Test scenario 2:
Diskgroup data3 has two failure groups (fgdata3_cd1, fgdata3_cd2). The disks of each failure group are spread evenly across both disk controllers (sdc, sdd).
Test: when one disk controller fails, add new disks and re-add the failure group to the diskgroup, verifying that normal database operation is unaffected throughout.
==========================================
Testing scenario 1:
Add two disks to the VM and create four partitions on each.
[root@test1 ~]# fdisk -l /dev/sdc

Disk /dev/sdc: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1          63      506016   83  Linux
/dev/sdc2              64         126      506047+  83  Linux
/dev/sdc3             127         189      506047+  83  Linux
/dev/sdc4             190         261      578340   83  Linux

[root@test1 ~]# fdisk -l /dev/sdd

Disk /dev/sdd: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1          63      506016   83  Linux
/dev/sdd2              64         126      506047+  83  Linux
/dev/sdd3             127         189      506047+  83  Linux
/dev/sdd4             190         261      578340   83  Linux
After binding the partitions via udev, the devices look like this:
[root@test1 rules.d]# ls -l /dev/disk{c,d}*
brw-rw---- 1 oracle oinstall 8, 33 May 30 14:26 /dev/diskc_1
brw-rw---- 1 oracle oinstall 8, 34 May 30 14:26 /dev/diskc_2
brw-rw---- 1 oracle oinstall 8, 35 May 30 14:26 /dev/diskc_3
brw-rw---- 1 oracle oinstall 8, 36 May 30 14:26 /dev/diskc_4
brw-rw---- 1 oracle oinstall 8, 49 May 30 14:26 /dev/diskd_1
brw-rw---- 1 oracle oinstall 8, 50 May 30 14:26 /dev/diskd_2
brw-rw---- 1 oracle oinstall 8, 51 May 30 14:26 /dev/diskd_3
brw-rw---- 1 oracle oinstall 8, 52 May 30 14:26 /dev/diskd_4
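The rules file that produced this binding is not shown in the test; a minimal sketch of what it might look like on this era of udev (the filename and the matching keys are assumptions inferred from the device names above):

```conf
# /etc/udev/rules.d/99-asm.rules -- hypothetical filename
# Rename sdc1..4 / sdd1..4 to diskc_N / diskd_N, owned by oracle:oinstall,
# so ASM can open them with stable names and the right permissions.
KERNEL=="sdc[1-4]", NAME="diskc_%n", OWNER="oracle", GROUP="oinstall", MODE="0660"
KERNEL=="sdd[1-4]", NAME="diskd_%n", OWNER="oracle", GROUP="oinstall", MODE="0660"
```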
Create the diskgroup for scenario 1
SQL> create diskgroup data2 normal redundancy
  2  failgroup fgdata2_c disk
  3  '/dev/diskc_1',
  4  '/dev/diskc_2',
  5  '/dev/diskc_3',
  6  '/dev/diskc_4'
  7  failgroup fgdata2_d disk
  8  '/dev/diskd_1',
  9  '/dev/diskd_2',
 10  '/dev/diskd_3',
 11  '/dev/diskd_4';

Diskgroup created.

SQL> select GROUP_NUMBER,NAME,TYPE,STATE,TOTAL_MB,FREE_MB from v$asm_diskgroup;

GROUP_NUMBER NAME                           TYPE   STATE         TOTAL_MB    FREE_MB
------------ ------------------------------ ------ ----------- ---------- ----------
           1 DATA2                          NORMAL MOUNTED           4092       3978

SQL> select group_number g#,disk_number d#,name,mount_status,header_status,total_mb,free_mb,path from v$asm_disk;

        G#         D# NAME                           MOUNT_S HEADER_STATU   TOTAL_MB    FREE_MB PATH
---------- ---------- ------------------------------ ------- ------------ ---------- ---------- --------------------
         1          6 DATA2_0006                     CACHED  MEMBER              494        477 /dev/diskd_3
         1          7 DATA2_0007                     CACHED  MEMBER              564        554 /dev/diskd_4
         1          5 DATA2_0005                     CACHED  MEMBER              494        480 /dev/diskd_2
         1          4 DATA2_0004                     CACHED  MEMBER              494        478 /dev/diskd_1
         1          1 DATA2_0001                     CACHED  MEMBER              494        478 /dev/diskc_2
         1          3 DATA2_0003                     CACHED  MEMBER              564        553 /dev/diskc_4
         1          2 DATA2_0002                     CACHED  MEMBER              494        479 /dev/diskc_3
         1          0 DATA2_0000                     CACHED  MEMBER              494        479 /dev/diskc_1

8 rows selected.
Create test data
SQL> sho user
USER is "SYS"
SQL> create tablespace yallonking datafile '+data2/yallonking01.dbf' size 10m;

Tablespace created.

SQL> create table yallonking tablespace yallonking as select * from dba_objects;

Table created.

SQL> select count(*) from YALLONKING;

  COUNT(*)
----------
      9400
Next, deliberately overwrite every partition on sdd with zeros and then remove the disk from the VM, simulating a storage failure.
[root@test1 ~]# dd if=/dev/zero of=/dev/sdd1 bs=1M count=512
dd: writing `/dev/sdd1': No space left on device
495+0 records in
494+0 records out
518160384 bytes (518 MB) copied, 2.80467 seconds, 185 MB/s
[root@test1 ~]# dd if=/dev/zero of=/dev/sdd2 bs=1M count=512
dd: writing `/dev/sdd2': No space left on device
495+0 records in
494+0 records out
518192640 bytes (518 MB) copied, 3.36884 seconds, 154 MB/s
[root@test1 ~]# dd if=/dev/zero of=/dev/sdd3 bs=1M count=512
dd: writing `/dev/sdd3': No space left on device
495+0 records in
494+0 records out
518192640 bytes (518 MB) copied, 9.98685 seconds, 51.9 MB/s
[root@test1 ~]# dd if=/dev/zero of=/dev/sdd4 bs=1M count=1000
dd: writing `/dev/sdd4': No space left on device
565+0 records in
564+0 records out
592220160 bytes (592 MB) copied, 16.1438 seconds, 36.7 MB/s
[root@test1 ~]# fdisk -l

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          64      514048+  83  Linux
/dev/sda2              65         586     4192965   82  Linux swap / Solaris
/dev/sda3             587        2610    16257780   83  Linux

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        1305    10482381   83  Linux

Disk /dev/sdc: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1          63      506016   83  Linux
/dev/sdc2              64         126      506047+  83  Linux
/dev/sdc3             127         189      506047+  83  Linux
/dev/sdc4             190         261      578340   83  Linux

SQL> alter system flush buffer_cache;

System altered.
You may run into the following bug at this point:
SQL> select count(*) from YALLONKING;
select count(*) from YALLONKING
*
ERROR at line 1:
ORA-08103: object no longer exists
At this point the ASM instance itself has not yet reported any error.
Repair from the ASM instance
SQL> alter diskgroup data2 check all norepair;

Diskgroup altered.

SQL> alter system flush buffer_cache;

System altered.

SQL> select count(*) from YALLONKING;

  COUNT(*)
----------
      9400
ASM alert log
Fri May 30 12:37:04 2014
SQL> alter diskgroup data2 check all norepair
Fri May 30 12:37:04 2014
NOTE: starting check of diskgroup DATA2
WARNING: cache failed to read dsk=4 blk=3 from disk(s): 4
ORA-15196: invalid ASM block header [kfc.c:7910] [endian_kfbh] [2147483652] [3] [0 != 1]
NOTE: a corrupted block was dumped to the trace file
System State dumped to trace file /u01/app/oracle/product/10.2.0/db_1/admin/+ASM/udump/+asm_ora_5569.trc
NOTE: cache initiating offline of disk 4 group 1
WARNING: offlining disk 4.3916022481 (DATA2_0004) with mask 0x3
NOTE: PST update: grp = 1, dsk = 4, mode = 0x6
Fri May 30 12:37:05 2014
NOTE: group DATA2: relocated PST to: disk 0000 (PST copy 0)
NOTE: group DATA2: relocated PST to: disk 0005 (PST copy 1)
Fri May 30 12:37:05 2014
NOTE: PST update: grp = 1, dsk = 4, mode = 0x4
NOTE: cache closing disk 4 of grp 1: DATA2_0004
WARNING: cache failed to read dsk=5 blk=3 from disk(s): 5
ORA-15196: invalid ASM block header [kfc.c:7910] [endian_kfbh] [2147483653] [3] [0 != 1]
NOTE: a corrupted block was dumped to the trace file
System State dumped to trace file /u01/app/oracle/product/10.2.0/db_1/admin/+ASM/udump/+asm_ora_5569.trc
NOTE: cache initiating offline of disk 5 group 1
WARNING: offlining disk 5.3916022482 (DATA2_0005) with mask 0x3
NOTE: PST update: grp = 1, dsk = 5, mode = 0x6
Fri May 30 12:37:05 2014
NOTE: group DATA2: relocated PST to: disk 0000 (PST copy 0)
NOTE: group DATA2: relocated PST to: disk 0006 (PST copy 1)
Fri May 30 12:37:05 2014
NOTE: PST update: grp = 1, dsk = 5, mode = 0x4
NOTE: cache closing disk 5 of grp 1: DATA2_0005
WARNING: cache failed to read dsk=6 blk=3 from disk(s): 6
ORA-15196: invalid ASM block header [kfc.c:7910] [endian_kfbh] [2147483654] [3] [0 != 1]
NOTE: a corrupted block was dumped to the trace file
System State dumped to trace file /u01/app/oracle/product/10.2.0/db_1/admin/+ASM/udump/+asm_ora_5569.trc
NOTE: cache initiating offline of disk 6 group 1
WARNING: offlining disk 6.3916022483 (DATA2_0006) with mask 0x3
NOTE: PST update: grp = 1, dsk = 6, mode = 0x6
Fri May 30 12:37:05 2014
NOTE: group DATA2: relocated PST to: disk 0000 (PST copy 0)
NOTE: group DATA2: relocated PST to: disk 0007 (PST copy 1)
Fri May 30 12:37:05 2014
NOTE: PST update: grp = 1, dsk = 6, mode = 0x4
NOTE: cache closing disk 6 of grp 1: DATA2_0006
WARNING: cache failed to read dsk=7 blk=3 from disk(s): 7
ORA-15196: invalid ASM block header [kfc.c:7910] [endian_kfbh] [2147483655] [3] [0 != 1]
NOTE: a corrupted block was dumped to the trace file
System State dumped to trace file /u01/app/oracle/product/10.2.0/db_1/admin/+ASM/udump/+asm_ora_5569.trc
NOTE: cache initiating offline of disk 7 group 1
WARNING: offlining disk 7.3916022484 (DATA2_0007) with mask 0x3
NOTE: PST update: grp = 1, dsk = 7, mode = 0x6
Fri May 30 12:37:05 2014
NOTE: group DATA2: relocated PST to: disk 0000 (PST copy 0)
Fri May 30 12:37:05 2014
NOTE: PST update: grp = 1, dsk = 7, mode = 0x4
Fri May 30 12:37:05 2014
NOTE: group DATA2: relocated PST to: disk 0000 (PST copy 0)
NOTE: cache closing disk 7 of grp 1: DATA2_0007
Fri May 30 12:37:05 2014
SUCCESS: check of diskgroup DATA2 found no errors
Fri May 30 12:38:13 2014
WARNING: PST-initiated drop disk 1(520696412).4(3916022481) (DATA2_0004)
WARNING: PST-initiated drop disk 1(520696412).5(3916022482) (DATA2_0005)
WARNING: PST-initiated drop disk 1(520696412).6(3916022483) (DATA2_0006)
WARNING: PST-initiated drop disk 1(520696412).7(3916022484) (DATA2_0007)
NOTE: PST update: grp = 1
Fri May 30 12:38:13 2014
NOTE: group DATA2: relocated PST to: disk 0000 (PST copy 0)
Fri May 30 12:38:13 2014
NOTE: requesting all-instance PST refresh for group=1
Fri May 30 12:38:13 2014
NOTE: PST refresh pending for group 1/0x1f09325c (DATA2)
SUCCESS: refreshed PST for 1/0x1f09325c (DATA2)
Fri May 30 12:38:16 2014
NOTE: starting rebalance of group 1/0x1f09325c (DATA2) at power 1
Starting background process ARB0
ARB0 started with pid=15, OS id=5858
Fri May 30 12:38:16 2014
NOTE: assigning ARB0 to group 1/0x1f09325c (DATA2)
Fri May 30 12:38:16 2014
SUCCESS: PST-initiated disk drop completed
Fri May 30 12:38:20 2014
Errors in file /u01/app/oracle/product/10.2.0/db_1/admin/+ASM/bdump/+asm_arb0_5858.trc:
ORA-15041: diskgroup space exhausted
Fri May 30 12:38:20 2014
NOTE: stopping process ARB0
Fri May 30 12:38:22 2014
WARNING: rebalance not completed for group 1/0x1f09325c (DATA2)
Fri May 30 12:38:22 2014
SUCCESS: rebalance completed for group 1/0x1f09325c (DATA2)
NOTE: PST update: grp = 1
Fri May 30 12:38:22 2014
NOTE: group DATA2: relocated PST to: disk 0000 (PST copy 0)
Fri May 30 12:38:22 2014
WARNING: offline disk number 4 has references (14 AUs)
WARNING: offline disk number 5 has references (14 AUs)
WARNING: offline disk number 6 has references (16 AUs)
WARNING: offline disk number 7 has references (17 AUs)
NOTE: PST update: grp = 1
Fri May 30 12:38:22 2014
NOTE: group DATA2: relocated PST to: disk 0000 (PST copy 0)
Note: ASM first locates the failed disks and the affected AUs, then attempts a rebalance. Because only one failure group remains, it reports insufficient diskgroup space (ORA-15041); the rebalance cannot complete, and the affected disks are taken offline.
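The space-exhaustion condition above can be anticipated before any failure happens. A sketch of a pre-check query (10gR2 and later expose these columns in v$asm_diskgroup; a negative USABLE_FILE_MB means there is not enough free space to re-mirror after losing a failure group):

```sql
-- Sketch: check whether each diskgroup still has enough headroom to
-- restore full redundancy after the loss of one failure group.
select name,
       type,
       total_mb,
       free_mb,
       required_mirror_free_mb,
       usable_file_mb      -- negative => cannot tolerate a failgroup loss
from   v$asm_diskgroup;
```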
Check the test data
SQL> alter system flush buffer_cache;

System altered.

SQL> select count(*) from YALLONKING;

  COUNT(*)
----------
      9400

SQL> insert into yallonking select * from yallonking;

9400 rows created.

SQL> commit;

Commit complete.

SQL> select count(*) from YALLONKING;

  COUNT(*)
----------
     18800
Check the ASM disk status
SQL> select group_number g#,disk_number d#,name,mount_status,header_status,total_mb,free_mb,path from v$asm_disk;

        G#         D# NAME                           MOUNT_S HEADER_STATU   TOTAL_MB    FREE_MB PATH
---------- ---------- ------------------------------ ------- ------------ ---------- ---------- --------------------
         1          7 DATA2_0007                     MISSING CANDIDATE             0          0
         1          4 DATA2_0004                     MISSING CANDIDATE             0          0
         1          5 DATA2_0005                     MISSING CANDIDATE             0          0
         1          6 DATA2_0006                     MISSING CANDIDATE             0          0
         1          0 DATA2_0000                     CACHED  MEMBER              494        475 /dev/diskc_1
         1          1 DATA2_0001                     CACHED  MEMBER              494        476 /dev/diskc_2
         1          2 DATA2_0002                     CACHED  MEMBER              494        477 /dev/diskc_3
         1          3 DATA2_0003                     CACHED  MEMBER              564        549 /dev/diskc_4

8 rows selected.
Note: ASM has now marked the lost disks as MISSING.
Restart the database, the ASM instance, and the host, to rule out any caching effects.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> shutdown immediate;
ASM diskgroups dismounted
ASM instance shutdown
[root@test1 ~]# reboot

Broadcast message from root (pts/1) (Fri May 30 15:18:59 2014):

The system is going down for reboot NOW!
Start the ASM instance
[oracle@test1 ~]$ export ORACLE_SID=+ASM
[oracle@test1 ~]$ sqlplus "/as sysdba"

SQL*Plus: Release 10.2.0.1.0 - Production on Fri May 30 15:21:53 2014

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ASM instance started

Total System Global Area   83886080 bytes
Fixed Size                  1217836 bytes
Variable Size              57502420 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
Start the database instance and test the data
SQL> startup
ORACLE instance started.

Total System Global Area  314572800 bytes
Fixed Size                  1219184 bytes
Variable Size             100664720 bytes
Database Buffers          209715200 bytes
Redo Buffers                2973696 bytes
Database mounted.
Database opened.
SQL> insert into yallonking select * from yallonking;

18800 rows created.

SQL> commit;

Commit complete.

SQL> select count(*) from yallonking;

  COUNT(*)
----------
     37600
Next, add a new disk to the VM to simulate the storage being restored.
[root@test1 ~]# fdisk -l /dev/sdd

Disk /dev/sdd: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdd doesn't contain a valid partition table
After partitioning:
[root@test1 ~]# fdisk -l /dev/sdd

Disk /dev/sdd: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1          63      506016   83  Linux
/dev/sdd2              64         126      506047+  83  Linux
/dev/sdd3             127         189      506047+  83  Linux
/dev/sdd4             190         261      578340   83  Linux
After binding:
[root@test1 ~]# ls -l /dev/diskd*
brw-rw---- 1 oracle oinstall 8, 49 May 30 15:51 /dev/diskd_1
brw-rw---- 1 oracle oinstall 8, 50 May 30 15:51 /dev/diskd_2
brw-rw---- 1 oracle oinstall 8, 51 May 30 15:51 /dev/diskd_3
brw-rw---- 1 oracle oinstall 8, 52 May 30 15:51 /dev/diskd_4
Check the ASM disk status
SQL> select group_number g#,disk_number d#,name,mount_status,header_status,total_mb,free_mb,path from v$asm_disk;

        G#         D# NAME                           MOUNT_S HEADER_STATU   TOTAL_MB    FREE_MB PATH
---------- ---------- ------------------------------ ------- ------------ ---------- ---------- --------------------
         0          0                                CLOSED  CANDIDATE           494          0 /dev/diskd_2
         0          1                                CLOSED  CANDIDATE           564          0 /dev/diskd_4
         0          2                                CLOSED  CANDIDATE           494          0 /dev/diskd_3
         0          3                                CLOSED  CANDIDATE           494          0 /dev/diskd_1
         1          6 DATA2_0006                     MISSING CANDIDATE             0          0
         1          7 DATA2_0007                     MISSING CANDIDATE             0          0
         1          5 DATA2_0005                     MISSING CANDIDATE             0          0
         1          4 DATA2_0004                     MISSING CANDIDATE             0          0
         1          1 DATA2_0001                     CACHED  MEMBER              494        476 /dev/diskc_2
         1          3 DATA2_0003                     CACHED  MEMBER              564        549 /dev/diskc_4
         1          0 DATA2_0000                     CACHED  MEMBER              494        475 /dev/diskc_1
         1          2 DATA2_0002                     CACHED  MEMBER              494        477 /dev/diskc_3

12 rows selected.

SQL> select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,MODE_STATUS,STATE,REDUNDANCY,FAILGROUP,PATH from v$asm_disk;

GROUP_NUMBER DISK_NUMBER MOUNT_S MODE_ST STATE    REDUNDA FAILGROUP                      PATH
------------ ----------- ------- ------- -------- ------- ------------------------------ --------------------
           0           0 CLOSED  ONLINE  NORMAL   UNKNOWN                                /dev/diskd_2
           0           1 CLOSED  ONLINE  NORMAL   UNKNOWN                                /dev/diskd_4
           0           2 CLOSED  ONLINE  NORMAL   UNKNOWN                                /dev/diskd_3
           0           3 CLOSED  ONLINE  NORMAL   UNKNOWN                                /dev/diskd_1
           1           6 MISSING OFFLINE HUNG     UNKNOWN
           1           7 MISSING OFFLINE HUNG     UNKNOWN
           1           5 MISSING OFFLINE HUNG     UNKNOWN
           1           4 MISSING OFFLINE HUNG     UNKNOWN
           1           1 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA2_C                      /dev/diskc_2
           1           3 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA2_C                      /dev/diskc_4
           1           0 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA2_C                      /dev/diskc_1
           1           2 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA2_C                      /dev/diskc_3

12 rows selected.
Note: ASM still shows the original disks as MISSING, while the newly added disks appear as CANDIDATE.
Add the new failure group to the diskgroup
SQL> alter diskgroup data2 add
  2  failgroup fgdata2_d disk
  3  '/dev/diskd_1',
  4  '/dev/diskd_2',
  5  '/dev/diskd_3',
  6  '/dev/diskd_4';

Diskgroup altered.
Check the ASM disk status
SQL> select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,MODE_STATUS,STATE,REDUNDANCY,FAILGROUP,PATH from v$asm_disk;

GROUP_NUMBER DISK_NUMBER MOUNT_S MODE_ST STATE    REDUNDA FAILGROUP                      PATH
------------ ----------- ------- ------- -------- ------- ------------------------------ --------------------
           1           0 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA2_C                      /dev/diskc_1
           1           2 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA2_C                      /dev/diskc_3
           1           1 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA2_C                      /dev/diskc_2
           1           3 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA2_C                      /dev/diskc_4
           1          10 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA2_D                      /dev/diskd_3
           1          11 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA2_D                      /dev/diskd_4
           1           9 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA2_D                      /dev/diskd_2
           1           8 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA2_D                      /dev/diskd_1

8 rows selected.
ASM alert log
Fri May 30 14:00:12 2014
SQL> alter diskgroup data2 add failgroup fgdata2_d disk '/dev/diskd_1', '/dev/diskd_2', '/dev/diskd_3', '/dev/diskd_4'
Fri May 30 14:00:12 2014
NOTE: initializing header on grp 1 disk DATA2_0008
NOTE: initializing header on grp 1 disk DATA2_0009
NOTE: initializing header on grp 1 disk DATA2_0010
NOTE: initializing header on grp 1 disk DATA2_0011
NOTE: cache opening disk 8 of grp 1: DATA2_0008 path:/dev/diskd_1
NOTE: cache opening disk 9 of grp 1: DATA2_0009 path:/dev/diskd_2
NOTE: cache opening disk 10 of grp 1: DATA2_0010 path:/dev/diskd_3
NOTE: cache opening disk 11 of grp 1: DATA2_0011 path:/dev/diskd_4
NOTE: PST update: grp = 1
Fri May 30 14:00:12 2014
NOTE: group DATA2: relocated PST to: disk 0000 (PST copy 0)
NOTE: group DATA2: relocated PST to: disk 0008 (PST copy 1)
Fri May 30 14:00:12 2014
NOTE: requesting all-instance disk validation for group=1
Fri May 30 14:00:12 2014
NOTE: disk validation pending for group 1/0x2098c19b (DATA2)
SUCCESS: validated disks for 1/0x2098c19b (DATA2)
Fri May 30 14:00:13 2014
NOTE: PST update: grp = 1
NOTE: requesting all-instance PST refresh for group=1
Fri May 30 14:00:13 2014
NOTE: PST refresh pending for group 1/0x2098c19b (DATA2)
SUCCESS: refreshed PST for 1/0x2098c19b (DATA2)
Fri May 30 14:00:16 2014
NOTE: requesting all-instance PST refresh for group=1
Fri May 30 14:00:16 2014
NOTE: PST refresh pending for group 1/0x2098c19b (DATA2)
SUCCESS: refreshed PST for 1/0x2098c19b (DATA2)
Fri May 30 14:00:22 2014
NOTE: starting rebalance of group 1/0x2098c19b (DATA2) at power 1
Starting background process ARB0
ARB0 started with pid=15, OS id=5309
Fri May 30 14:00:23 2014
NOTE: assigning ARB0 to group 1/0x2098c19b (DATA2)
Fri May 30 14:00:23 2014
NOTE: F1X0 copy 2 relocating from 4:2 to 11:2
NOTE: F1X0 copy 3 relocating from 65534:4294967294 to 65534:4294967294
Fri May 30 14:00:36 2014
NOTE: stopping process ARB0
Fri May 30 14:00:38 2014
SUCCESS: rebalance completed for group 1/0x2098c19b (DATA2)
Fri May 30 14:00:38 2014
SUCCESS: rebalance completed for group 1/0x2098c19b (DATA2)
NOTE: PST update: grp = 1
SUCCESS: disk number 4 force dropped offline
SUCCESS: disk number 5 force dropped offline
SUCCESS: disk number 6 force dropped offline
SUCCESS: disk number 7 force dropped offline
NOTE: PST update: grp = 1
Note: ASM first initializes the new candidate disks and brings them into the diskgroup cache, then performs a rebalance, and finally force-drops the failed disks.
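While the rebalance described above is running, its progress can be watched from the ASM instance; a sketch (v$asm_operation is populated only for the duration of the operation, and EST_MINUTES is the remaining-time estimate):

```sql
-- Sketch: monitor an in-flight ASM rebalance from the ASM instance.
select group_number,
       operation,       -- e.g. REBAL
       state,           -- RUN / WAIT
       power,
       sofar,
       est_work,
       est_minutes
from   v$asm_operation;
```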
Run the test again
SQL> insert into yallonking select * from yallonking;

37600 rows created.

SQL> commit;

Commit complete.

SQL> alter system flush buffer_cache;

System altered.

SQL> select count(*) from yallonking;

  COUNT(*)
----------
     75200
Scenario 1 completed successfully.
=================================================
Test scenario 2:
Diskgroup data3 has two failure groups (fgdata3_cd1, fgdata3_cd2). The disks of each failure group are spread evenly across both disk controllers (sdc, sdd).
Test: when one disk controller fails, add new disks and re-add the failure group to the diskgroup, verifying that normal database operation is unaffected throughout.
=================================================
The VM again has two disks, with four partitions each:
[root@test1 ~]# fdisk -l /dev/sdc

Disk /dev/sdc: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1          63      506016   83  Linux
/dev/sdc2              64         126      506047+  83  Linux
/dev/sdc3             127         189      506047+  83  Linux
/dev/sdc4             190         261      578340   83  Linux

[root@test1 ~]# fdisk -l /dev/sdd

Disk /dev/sdd: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1          63      506016   83  Linux
/dev/sdd2              64         126      506047+  83  Linux
/dev/sdd3             127         189      506047+  83  Linux
/dev/sdd4             190         261      578340   83  Linux
After binding:
[root@test1 rules.d]# ls -l /dev/disk{c,d}*
brw-rw---- 1 oracle oinstall 8, 33 May 30 14:26 /dev/diskc_1
brw-rw---- 1 oracle oinstall 8, 34 May 30 14:26 /dev/diskc_2
brw-rw---- 1 oracle oinstall 8, 35 May 30 14:26 /dev/diskc_3
brw-rw---- 1 oracle oinstall 8, 36 May 30 14:26 /dev/diskc_4
brw-rw---- 1 oracle oinstall 8, 49 May 30 14:26 /dev/diskd_1
brw-rw---- 1 oracle oinstall 8, 50 May 30 14:26 /dev/diskd_2
brw-rw---- 1 oracle oinstall 8, 51 May 30 14:26 /dev/diskd_3
brw-rw---- 1 oracle oinstall 8, 52 May 30 14:26 /dev/diskd_4
Create the diskgroup for scenario 2
SQL> drop diskgroup data2 including contents;

Diskgroup dropped.

SQL> create diskgroup data3 normal redundancy
  2  failgroup fgdata3_cd1 disk
  3  '/dev/diskc_1',
  4  '/dev/diskc_3',
  5  '/dev/diskd_1',
  6  '/dev/diskd_3'
  7  failgroup fgdata3_cd2 disk
  8  '/dev/diskc_2',
  9  '/dev/diskc_4',
 10  '/dev/diskd_2',
 11  '/dev/diskd_4';

Diskgroup created.

SQL> select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,MODE_STATUS,STATE,REDUNDANCY,FAILGROUP,PATH from v$asm_disk;

GROUP_NUMBER DISK_NUMBER MOUNT_S MODE_ST STATE    REDUNDA FAILGROUP                      PATH
------------ ----------- ------- ------- -------- ------- ------------------------------ --------------------
           1           0 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA3_CD1                    /dev/diskc_1
           1           1 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA3_CD1                    /dev/diskc_3
           1           4 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA3_CD2                    /dev/diskc_2
           1           5 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA3_CD2                    /dev/diskc_4
           1           7 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA3_CD2                    /dev/diskd_4
           1           3 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA3_CD1                    /dev/diskd_3
           1           6 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA3_CD2                    /dev/diskd_2
           1           2 CACHED  ONLINE  NORMAL   UNKNOWN FGDATA3_CD1                    /dev/diskd_1

8 rows selected.
Create test data
SQL> drop tablespace yallonking INCLUDING CONTENTS and datafiles;

Tablespace dropped.

SQL> create tablespace yallonking datafile '+data3/yallonking01.dbf' size 10m;

Tablespace created.

SQL> create table yallonking tablespace yallonking as select * from dba_objects;

Table created.

SQL> select TABLE_NAME,TABLESPACE_NAME from dba_tables where TABLE_NAME='YALLONKING';

TABLE_NAME                     TABLESPACE_NAME
------------------------------ ------------------------------
YALLONKING                     YALLONKING

SQL> select count(*) from YALLONKING;

  COUNT(*)
----------
      9400
Next, remove the sdd disk from the VM to simulate a disk failure.
[root@test1 ~]# fdisk -l

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          64      514048+  83  Linux
/dev/sda2              65         586     4192965   82  Linux swap / Solaris
/dev/sda3             587        2610    16257780   83  Linux

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        1305    10482381   83  Linux

Disk /dev/sdc: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1          63      506016   83  Linux
/dev/sdc2              64         126      506047+  83  Linux
/dev/sdc3             127         189      506047+  83  Linux
/dev/sdc4             190         261      578340   83  Linux

Note: sdd no longer appears; it has failed.
Check the test data
SQL> select count(*) from YALLONKING;

  COUNT(*)
----------
      9400

SQL> alter system flush BUFFER_CACHE;

System altered.

SQL> select count(*) from YALLONKING;
select count(*) from YALLONKING
*
ERROR at line 1:
ORA-01115: IO error reading block from file 9 (block # 12)
ORA-01110: data file 9: '+DATA3/yallonking01.dbf'
ORA-15078: ASM diskgroup was forcibly dismounted
Check the ASM disk status
SQL> select GROUP_NUMBER,NAME,STATE from v$asm_diskgroup;

GROUP_NUMBER NAME                 STATE
------------ -------------------- -----------
           0 DATA3                DISMOUNTED

SQL> select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,MODE_STATUS,STATE,REDUNDANCY,FAILGROUP,PATH from v$asm_disk;

GROUP_NUMBER DISK_NUMBER MOUNT_S MODE_ST STATE    REDUNDA FAILGROUP                      PATH
------------ ----------- ------- ------- -------- ------- ------------------------------ --------------------
           0           0 CLOSED  ONLINE  NORMAL   UNKNOWN                                /dev/diskd_4
           0           1 CLOSED  ONLINE  NORMAL   UNKNOWN                                /dev/diskc_1
           0           2 CLOSED  ONLINE  NORMAL   UNKNOWN                                /dev/diskc_3
           0           7 CLOSED  ONLINE  NORMAL   UNKNOWN                                /dev/diskd_2
           0           4 CLOSED  ONLINE  NORMAL   UNKNOWN                                /dev/diskd_3
           0           5 CLOSED  ONLINE  NORMAL   UNKNOWN                                /dev/diskc_2
           0           6 CLOSED  ONLINE  NORMAL   UNKNOWN                                /dev/diskc_4
           0           3 CLOSED  ONLINE  NORMAL   UNKNOWN                                /dev/diskd_1

8 rows selected.
The ASM alert log at this point:
Fri May 30 16:27:16 2014
WARNING: offlining disk 7.3915950641 (DATA3_0007) with mask 0x3
NOTE: PST update: grp = 1, dsk = 7, mode = 0x6
NOTE: PST update: grp = 1, dsk = 7, mode = 0x4
NOTE: cache closing disk 7 of grp 1: DATA3_0007
Fri May 30 16:30:03 2014
WARNING: PST-initiated drop disk 1(2211994373).7(3915950641) (DATA3_0007)
Fri May 30 16:30:03 2014
Errors in file /u01/app/oracle/product/10.2.0/db_1/admin/+ASM/bdump/+asm_lgwr_3434.trc:
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
Linux Error: 5: Input/output error
Additional information: 4
Additional information: 4248
Additional information: -1
Fri May 30 16:30:03 2014
NOTE: cache initiating offline of disk 2 group 1
WARNING: offlining disk 2.3915950644 (DATA3_0002) with mask 0x3
NOTE: PST update: grp = 1, dsk = 2, mode = 0x6
Fri May 30 16:30:03 2014
ERROR: too many offline disks in PST (grp 1)
Fri May 30 16:30:03 2014
NOTE: halting all I/Os to diskgroup DATA3
NOTE: active pin found: 0x0x243512f0
NOTE: active pin found: 0x0x2435135c
Fri May 30 16:30:03 2014
ERROR: ORA-15130 signalled during reconfiguration of diskgroup DATA3
ERROR: PST-initiated disk drop failed
Fri May 30 16:30:03 2014
Errors in file /u01/app/oracle/product/10.2.0/db_1/admin/+ASM/bdump/+asm_b000_5382.trc:
ORA-15130: diskgroup "" is being dismounted
Fri May 30 16:30:03 2014
ERROR: PST-initiated MANDATORY DISMOUNT of group DATA3
NOTE: cache dismounting group 1/0x83D85B05 (DATA3)
SUCCESS: diskgroup DATA3 was dismounted
Scenario 2 failed at this point.
Conclusion:
Test scenario 1: diskgroup data2 with two failure groups (fgdata2_c, fgdata2_d), where all disks of each failure group sit on their own controller (sdc, sdd). When one controller failed, new disks were added and the failure group was re-added to the diskgroup, and normal database operation was unaffected throughout. Result: success (although the ORA-08103 bug may be encountered along the way).
Test scenario 2: diskgroup data3 with two failure groups (fgdata3_cd1, fgdata3_cd2), where the disks of each failure group are spread evenly across both controllers (sdc, sdd). When one controller failed, both failure groups lost disks at once, ASM force-dismounted the diskgroup, and database access was interrupted. Result: failure.
In short, scenario 1 succeeded (possibly hitting the bug) and scenario 2 failed: with normal redundancy, failure-group boundaries must match the real failure domains (here, the disk controllers), otherwise losing one domain takes out both mirror copies at once.
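A layout like scenario 2's can be caught before it bites. A sketch of a sanity-check query (the `substr(path, 1, 10)` prefix logic is an assumption that the controller is encoded in the device path, as with /dev/diskc_* and /dev/diskd_* here):

```sql
-- Sketch: a failure group whose disks span more than one controller
-- (controllers > 1) cannot survive the loss of either controller.
select failgroup,
       count(*)                            disks,
       count(distinct substr(path, 1, 10)) controllers  -- '/dev/diskc' vs '/dev/diskd'
from   v$asm_disk
where  group_number > 0
group by failgroup;
```

For the scenario-1 layout this would report one controller per failure group; for the scenario-2 layout it would report two, flagging the risky configuration.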