Case Study 2 – Replacement of Faulty HDD in NetApp FAS 2050 Storage


Client:

Wipro Technologies

What the customer wanted:

The customer logged a case and shared the Auto_FS logs, highlighting “HDD Failed in NetApp 2050 Hard Disk ID < 0c.00.14 >.”

Logs analysed:

Disk ID 0c.00.14 is reported as broken.

Error : Tue Oct 18 09:28:22 IST [nec-netapp1: raid.assim.disk.badlabelversion:error]: Disk 0c.00.14 Shelf 0 Bay 14 [NETAPP   X290_S15K7560A15 NA00] S/N [3SL0PL9800009045W7Q5] has raid label with version (10), which is not within the currently supported range (5 – 9). Please contact NetApp Global Services.

Tue Oct 18 09:28:22 IST [nec-netapp1: raid.config.disk.bad.label.version:error]: Disk 0c.00.14 Shelf 0 Bay 14 [NETAPP   X290_S15K7560A15 NA00] S/N [3SL0PL9800009045W7Q5] has an unsupported label version.
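
When many such messages have to be triaged, the disk ID, shelf, bay and serial number can be pulled out of each line programmatically. The following is a minimal sketch in Python, not a NetApp tool: the pattern is written only against the two log lines shown above, and the function and field names are hypothetical.

```python
import re

# Pattern for the "[host: event:severity]: Disk <id> Shelf <n> Bay <n> [...] S/N [...]"
# layout shared by the two log lines in this case study. Field names are our own.
DISK_EVENT = re.compile(
    r"\[(?P<host>[\w-]+): (?P<event>[\w.]+):(?P<severity>\w+)\]: "
    r"Disk (?P<disk>\S+) Shelf (?P<shelf>\d+) Bay (?P<bay>\d+) "
    r"\[(?P<model>[^\]]+)\] S/N \[(?P<serial>[^\]]+)\]"
)


def parse_disk_event(line):
    """Return the disk fields of one log line as a dict, or None if it does not match."""
    match = DISK_EVENT.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["shelf"] = int(fields["shelf"])
    fields["bay"] = int(fields["bay"])
    return fields


sample = (
    "Tue Oct 18 09:28:22 IST [nec-netapp1: raid.config.disk.bad.label.version:error]: "
    "Disk 0c.00.14 Shelf 0 Bay 14 [NETAPP   X290_S15K7560A15 NA00] "
    "S/N [3SL0PL9800009045W7Q5] has an unsupported label version."
)
event = parse_disk_event(sample)
print(event["disk"], event["shelf"], event["bay"], event["serial"])
```

The serial number extracted here is the one to compare against the `disk show` output later, to be sure the right physical disk is pulled.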

Plan of action:

  • Confirm via the NetApp AutoSupport mail that the HDD has failed, and note the disk ID and capacity.
  • Ensure the spare disk carried for the replacement is unowned and its disk label is set to zero.
  • Physically verify the storage and identify the failed HDD by its amber LED.
  • Ensure auto-assign is set to false on the controller, so that the controller does not assign the new disk automatically.
  • Log in via the console and check which controller the failed HDD ID is assigned to.
  • Log in to the controller that owns the failed HDD ID.
  • Blink the LED on and off to confirm the particular HDD (not necessary if the failed HDD already shows an amber indication).
  • Remove the HDD, wait about 60 seconds, and insert the new HDD (ensure the new HDD has the same capacity and P/N as the failed one).
  • Check that the disk ID is detected by the storage.
  • Assign the disk ID, as the new disk will be unowned.
  • Check that the disk ID has been added to the controller's pool of spare disks.
  • If yes, close the call.
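
The console part of this plan can be written down as an ordered command list for a given disk ID. The sketch below is only an illustration in Python; it uses the commands that appear in the activities of this case study and nothing else, and the function name is hypothetical.

```python
def replacement_commands(disk_id):
    """Return the console steps, in order, used in this case study for one failed disk."""
    return [
        "disk show",                             # find the owning controller
        "aggr status -f",                        # list broken disks
        f"disk assign {disk_id} -s unowned -f",  # unown the failed disk
        "aggr status -f",                        # broken list should now be empty
        "aggr status -r",                        # review RAID, spare and partner disks
        "(replace the HDD physically)",          # manual step, not a command
        "disk show -n",                          # new disk should appear as unowned
        f"disk assign {disk_id}",                # assign the new disk
        "disk show -n",                          # no unowned disks should remain
        "disk show",                             # confirm the owner
        "aggr status -r",                        # confirm the disk is a spare
    ]


for command in replacement_commands("0c.00.14"):
    print(command)
```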

Activities performed:

All commands and the procedure, with examples, are given below.

1) >disk show

“To check the status of the disks and which controller each HDD is assigned to”

 

eg: nec-netapp2> disk show

DISK       OWNER                  POOL   SERIAL NUMBER

0c.00.19     nec-netapp2(135068646)   Pool0  LXVHKJJM

0c.00.17     nec-netapp2(135068646)   Pool0  LXWU79EL

0c.00.1      nec-netapp2(135068646)   Pool0  JZVLNW2J

0c.00.15     nec-netapp2(135068646)   Pool0  LXVUPD5M

0c.00.5      nec-netapp2(135068646)   Pool0  CZY8JS2N

0c.00.7      nec-netapp2(135068646)   Pool0  CZY78MTN

0c.00.8      nec-netapp1(135068651)   Pool0  6SL3TAS30000N2401EX3

0c.00.18     nec-netapp2(135068646)   Pool0  3SL01T5V00009010Q4A2

0c.00.6      nec-netapp1(135068651)   Pool0  3SL05E1N0000901713GA

0c.00.14     nec-netapp2(135068646)   Pool0  3SL0PL9800009045W7Q5 <-- failed

0c.00.4      nec-netapp1(135068651)   Pool0  6SL3SCA90000N240MN5X

0c.00.9      nec-netapp1(135068651)   Pool0  3SL056G600009017TQX7

0c.00.13     nec-netapp1(135068651)   Pool0  3SL0PRSM00009045XQLL

0c.00.12     nec-netapp1(135068651)   Pool0  3SL01Y2X00009008GCFL

0c.00.10     nec-netapp1(135068651)   Pool0  3SL02MLT00009008GCCJ

0c.00.11     nec-netapp1(135068651)   Pool0  3SL05E1Z00009018163B

0c.00.2      nec-netapp1(135068651)   Pool0  3SL05EQE00009018HT1H

0c.00.0      nec-netapp2(135068646)   Pool0  3SL0PAHH00009045G4ZW

0c.00.3      nec-netapp2(135068646)   Pool0  3SL053J300009017076C

0a.24        nec-netapp2(135068646)   Pool0  J80S2HEL

0a.27        nec-netapp1(135068651)   Pool0  J80PSE5L

0a.29        nec-netapp1(135068651)   Pool0  J80SNR5L

0a.17        nec-netapp1(135068651)   Pool0  J80SP1HL

0a.16        nec-netapp2(135068646)   Pool0  J80N202L

0a.19        nec-netapp1(135068651)   Pool0  J80SNZLL

0a.20        nec-netapp2(135068646)   Pool0  J80NWKVL

0a.26        nec-netapp2(135068646)   Pool0  J80S23ML

0a.28        nec-netapp2(135068646)   Pool0  J80STU5L

0a.23        nec-netapp1(135068651)   Pool0  J80S1Y7L

0a.22        nec-netapp2(135068646)   Pool0  J80S240L

0a.25        nec-netapp1(135068651)   Pool0  J80PSGPL

0a.21        nec-netapp1(135068651)   Pool0  J80RZ4EL

0a.18        nec-netapp2(135068646)   Pool0  J80SMTUL

0c.00.16     nec-netapp1(135068651)   Pool0  6SL75TX40000N4102TAX
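
With this many rows, it is easy to misread which controller owns the failed disk. A minimal Python sketch for looking up the owner from pasted `disk show` output follows; it assumes the four-column layout shown above, and the function name is hypothetical.

```python
import re

# One owned-disk row: disk ID, owner name, owner system ID, pool, serial number.
ROW = re.compile(r"^(\S+)\s+([\w-]+)\((\d+)\)\s+(\S+)\s+(\S+)")


def parse_disk_show(text):
    """Map disk ID -> (owner, system_id, pool, serial) for every owned-disk row."""
    disks = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # header and blank lines do not match and are skipped
            disk, owner, system_id, pool, serial = match.groups()
            disks[disk] = (owner, system_id, pool, serial)
    return disks


sample = """\
DISK       OWNER                  POOL   SERIAL NUMBER

0c.00.8      nec-netapp1(135068651)   Pool0  6SL3TAS30000N2401EX3

0c.00.14     nec-netapp2(135068646)   Pool0  3SL0PL9800009045W7Q5

0a.24        nec-netapp2(135068646)   Pool0  J80S2HEL
"""
disks = parse_disk_show(sample)
owner, system_id, pool, serial = disks["0c.00.14"]
print(owner, serial)
```

Here the lookup confirms that 0c.00.14 belongs to nec-netapp2, so the remaining commands are run on that controller.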

 

2) >aggr status -f

“To check if any broken HDD is present”

 

eg: nec-netapp2> aggr status -f

 

Broken disks

 

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

label version   0c.00.14        0c    0   14  SA:A   0  SAS  15000 560000/1146880000 560208/1147307688
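
The same check can be scripted against captured output. The sketch below is an assumption-laden illustration in Python: it relies on the layout printed above (a reason column that may contain spaces, followed by the device, HA, shelf and bay columns), and the function name is hypothetical.

```python
import re

# The reason column ("label version") may contain single spaces, so it is
# separated from the device column by a run of two or more spaces.
BROKEN_ROW = re.compile(
    r"^(?P<reason>.+?)\s{2,}(?P<device>\S+)\s+(?P<ha>\S+)\s+"
    r"(?P<shelf>\d+)\s+(?P<bay>\d+)\s"
)


def broken_disks(text):
    """Return (device, reason) pairs listed after the 'Broken disks' heading."""
    rows = []
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Broken disks"):
            in_section = True
            continue
        if in_section:
            match = BROKEN_ROW.match(stripped)
            if match:  # the column header has no numeric shelf/bay, so it is skipped
                rows.append((match["device"], match["reason"]))
    return rows


before = """\
Broken disks

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

label version   0c.00.14        0c    0   14  SA:A   0  SAS  15000 560000/1146880000 560208/1147307688
"""
after = "Broken disks (empty)\n"

print(broken_disks(before))
print(broken_disks(after))
```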

 

3) >disk assign 0c.00.14 -s unowned -f

“To unown the failed HDD”

 

4) >aggr status -f

“To check whether any broken HDD is still listed after the disk ID has been unowned”

 

eg: nec-netapp2> aggr status -f

 

Broken disks (empty)

 

5) >aggr status -r

“To check the RAID data and parity disks”

 

eg: nec-netapp2> aggr status -r

Aggregate aggr0 (online, raid_dp) (block checksums)

Plex /aggr0/plex0 (online, normal, active, pool0)

RAID group /aggr0/plex0/rg0 (normal)

 

RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

dparity   0c.00.5         0c    0   5   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

parity    0c.00.3         0c    0   3   SA:A   0  SAS  15000 560000/1146880000 560208/1147307688

data      0c.00.19        0c    0   19  SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

data      0c.00.0         0c    0   0   SA:A   0  SAS  15000 560000/1146880000 560208/1147307688

data      0c.00.17        0c    0   17  SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

data      0c.00.7         0c    0   7   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

data      0c.00.1         0c    0   1   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

data      0c.00.18        0c    0   18  SA:A   0  SAS  15000 560000/1146880000 560208/1147307688

data      0c.00.15        0c    0   15  SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

 

Aggregate aggr1 (online, raid4) (block checksums)

Plex /aggr1/plex0 (online, normal, active, pool0)

RAID group /aggr1/plex0/rg0 (normal)

 

RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

parity    0a.16           0a    1   0   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.18           0a    1   2   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.20           0a    1   4   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.22           0a    1   6   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.24           0a    1   8   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.26           0a    1   10  FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

 

Pool1 spare disks (empty)

 

Pool0 spare disks

 

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

Spare disks for block or zoned checksum traditional volumes or aggregates

spare           0a.28           0a    1   12  FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

 

Partner disks

 

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

partner         0c.00.13        0c    0   13  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.16        0c    0   16  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.8         0c    0   8   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.4         0c    0   4   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.10        0c    0   10  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.9         0c    0   9   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.2         0c    0   2   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.11        0c    0   11  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.6         0c    0   6   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.12        0c    0   12  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0a.25           0a    1   9   FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.27           0a    1   11  FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.29           0a    1   13  FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.23           0a    1   7   FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.21           0a    1   5   FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.19           0a    1   3   FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.17           0a    1   1   FC:A   0  ATA   7200 0/0               847827/1736350304
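
The Used and Phys columns matter when checking that a replacement has the same capacity as the failed disk. The figures in this output are consistent with 2048 blocks per MB, that is, 512-byte blocks with 1 MB = 1024 × 1024 bytes and the MB value rounded down. That ratio is inferred from the numbers above rather than taken from NetApp documentation, so treat the Python sketch below as a sanity check only.

```python
# Assumption inferred from the output above: 512-byte blocks,
# 1 MB = 1024 * 1024 bytes, MB figure truncated to a whole number.
BLOCKS_PER_MB = (1024 * 1024) // 512  # 2048


def blocks_to_mb(blocks):
    """Convert a block count from the 'MB/blks' columns to whole megabytes."""
    return blocks // BLOCKS_PER_MB


# (MB, blocks) pairs copied from the aggr status -r output above.
pairs = [
    (560000, 1146880000),  # SAS, Used
    (560208, 1147307688),  # SAS, Phys
    (560879, 1148681096),  # SAS, Phys (slightly larger disks)
    (847555, 1735794176),  # ATA, Used
    (847827, 1736350304),  # ATA, Phys
]
for mb, blocks in pairs:
    print(mb, blocks_to_mb(blocks))
```

Note that the SAS disks report two different Phys sizes (560208 MB and 560879 MB) but the same Used size of 560000 MB, and the replacement disk in bay 14 should match on these figures.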

 

Physically replace the HDD, then continue.

 

6) >disk show -n

“To check that the new HDD is detected and is unowned”

 

eg : nec-netapp2> disk show -n

DISK       OWNER                  POOL   SERIAL NUMBER

0c.00.14     Not Owned            NONE   3SL0PRVG00009045XPBG
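
At this point it is worth confirming that the unowned disk in bay 14 really is the new one, by comparing its serial number with that of the failed disk. A minimal Python sketch, assuming the `disk show -n` layout shown above; the function name is hypothetical.

```python
def unowned_disks(text):
    """Return {disk_id: serial} for rows whose owner column reads 'Not Owned'."""
    found = {}
    for line in text.splitlines():
        if "Not Owned" in line:
            fields = line.split()
            found[fields[0]] = fields[-1]  # first column is the ID, last is the serial
    return found


FAILED_SERIAL = "3SL0PL9800009045W7Q5"  # serial of the failed disk, from the error log

after_swap = """\
DISK       OWNER                  POOL   SERIAL NUMBER

0c.00.14     Not Owned            NONE   3SL0PRVG00009045XPBG
"""
after_assign = "disk show: No disks match option -n.\n"

unowned = unowned_disks(after_swap)
print(unowned)
print(unowned["0c.00.14"] != FAILED_SERIAL)  # True means a different disk is in the bay
print(unowned_disks(after_assign))
```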

 

7) >disk assign 0c.00.14

“To assign the HDD to the controller”

 

8) >disk show -n

“To confirm that no unowned disks remain”

 

eg: nec-netapp2> disk show -n

disk show: No disks match option -n.

 

9) >disk show

“To confirm that the new HDD is now owned by the controller”

eg: nec-netapp2> disk show

DISK       OWNER                   POOL   SERIAL NUMBER

0c.00.19     nec-netapp2(135068646)   Pool0  LXVHKJJM

0c.00.17     nec-netapp2(135068646)   Pool0  LXWU79EL

0c.00.1      nec-netapp2(135068646)   Pool0  JZVLNW2J

0c.00.15     nec-netapp2(135068646)   Pool0  LXVUPD5M

0c.00.5      nec-netapp2(135068646)   Pool0  CZY8JS2N

0c.00.7      nec-netapp2(135068646)   Pool0  CZY78MTN

0c.00.8      nec-netapp1(135068651)   Pool0  6SL3TAS30000N2401EX3

0c.00.18     nec-netapp2(135068646)   Pool0  3SL01T5V00009010Q4A2

0c.00.6      nec-netapp1(135068651)   Pool0  3SL05E1N0000901713GA

0c.00.4      nec-netapp1(135068651)   Pool0  6SL3SCA90000N240MN5X

0c.00.9      nec-netapp1(135068651)   Pool0  3SL056G600009017TQX7

0c.00.13     nec-netapp1(135068651)   Pool0  3SL0PRSM00009045XQLL

0c.00.12     nec-netapp1(135068651)   Pool0  3SL01Y2X00009008GCFL

0c.00.10     nec-netapp1(135068651)   Pool0  3SL02MLT00009008GCCJ

0c.00.11     nec-netapp1(135068651)   Pool0  3SL05E1Z00009018163B

0c.00.2      nec-netapp1(135068651)   Pool0  3SL05EQE00009018HT1H

0c.00.0      nec-netapp2(135068646)   Pool0  3SL0PAHH00009045G4ZW

0c.00.3      nec-netapp2(135068646)   Pool0  3SL053J300009017076C

0c.00.14     nec-netapp2(135068646)   Pool0  3SL0PRVG00009045XPBG

0a.24        nec-netapp2(135068646)   Pool0  J80S2HEL

0a.27        nec-netapp1(135068651)   Pool0  J80PSE5L

0a.29        nec-netapp1(135068651)   Pool0  J80SNR5L

0a.17        nec-netapp1(135068651)   Pool0  J80SP1HL

0a.16        nec-netapp2(135068646)   Pool0  J80N202L

0a.19        nec-netapp1(135068651)   Pool0  J80SNZLL

0a.20        nec-netapp2(135068646)   Pool0  J80NWKVL

0a.26        nec-netapp2(135068646)   Pool0  J80S23ML

0a.28        nec-netapp2(135068646)   Pool0  J80STU5L

0a.23        nec-netapp1(135068651)   Pool0  J80S1Y7L

0a.22        nec-netapp2(135068646)   Pool0  J80S240L

0a.25        nec-netapp1(135068651)   Pool0  J80PSGPL

0a.21        nec-netapp1(135068651)   Pool0  J80RZ4EL

0a.18        nec-netapp2(135068646)   Pool0  J80SMTUL

0c.00.16     nec-netapp1(135068651)   Pool0  6SL75TX40000N4102TAX

 

In the output above, disk 0c.00.14 (S/N 3SL0PRVG00009045XPBG) is now assigned to nec-netapp2 by the command in step 7.

 

10) >aggr status -r

“To confirm that the new HDD is listed among the spare disks”

eg: nec-netapp2> aggr status -r

Aggregate aggr0 (online, raid_dp) (block checksums)

Plex /aggr0/plex0 (online, normal, active, pool0)

RAID group /aggr0/plex0/rg0 (normal)

 

RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

dparity   0c.00.5         0c    0   5   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

parity    0c.00.3         0c    0   3   SA:A   0  SAS  15000 560000/1146880000 560208/1147307688

data      0c.00.19        0c    0   19  SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

data      0c.00.0         0c    0   0   SA:A   0  SAS  15000 560000/1146880000 560208/1147307688

data      0c.00.17        0c    0   17  SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

data      0c.00.7         0c    0   7   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

data      0c.00.1         0c    0   1   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

data      0c.00.18        0c    0   18  SA:A   0  SAS  15000 560000/1146880000 560208/1147307688

data      0c.00.15        0c    0   15  SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

 

Aggregate aggr1 (online, raid4) (block checksums)

Plex /aggr1/plex0 (online, normal, active, pool0)

RAID group /aggr1/plex0/rg0 (normal)

 

RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

parity    0a.16           0a    1   0   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.18           0a    1   2   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.20           0a    1   4   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.22           0a    1   6   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.24           0a    1   8   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.26           0a    1   10  FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

 

Pool1 spare disks (empty)

 

Pool0 spare disks

 

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

Spare disks for block or zoned checksum traditional volumes or aggregates

spare           0a.28           0a    1   12  FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

spare           0c.00.14        0c    0   14  SA:A   0  SAS  15000 560000/1146880000 560208/1147307688

 

Partner disks

 

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

partner         0c.00.13        0c    0   13  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.16        0c    0   16  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.8         0c    0   8   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.4         0c    0   4   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.10        0c    0   10  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.9         0c    0   9   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.2         0c    0   2   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.11        0c    0   11  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.6         0c    0   6   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.12        0c    0   12  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0a.25           0a    1   9   FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.27           0a    1   11  FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.29           0a    1   13  FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.23           0a    1   7   FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.21           0a    1   5   FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.19           0a    1   3   FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.17           0a    1   1   FC:A   0  ATA   7200 0/0               847827/1736350304

nec-netapp2>
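
The final check, that the replaced disk now appears among the spares, can also be done on captured output. The Python sketch below assumes the `aggr status -r` layout shown above, where each spare row starts with the lowercase word `spare`; the function name is hypothetical.

```python
def spare_devices(text):
    """Return the device IDs of all rows whose first column is exactly 'spare'."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # The heading "Spare disks for block or zoned checksum ..." starts with a
        # capital S, so the exact lowercase comparison skips it.
        if len(fields) >= 2 and fields[0] == "spare":
            devices.append(fields[1])
    return devices


sample = """\
Pool0 spare disks

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

Spare disks for block or zoned checksum traditional volumes or aggregates

spare           0a.28           0a    1   12  FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

spare           0c.00.14        0c    0   14  SA:A   0  SAS  15000 560000/1146880000 560208/1147307688
"""
spares = spare_devices(sample)
print(spares)
print("0c.00.14" in spares)
```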

 

Conclusion:

As the output above shows, disk 0c.00.14 is assigned to the controller and is listed as a Pool0 spare. Call closed.