Case Study 2 – Replacement of Faulty HDD in NetApp FAS 2050 Storage

Client:

Wipro Technologies

What the customer wanted:

The customer logged a case and shared the Auto_FS logs, highlighting “HDD Failed in NetApp 2050 Hard Disk ID < 0c.00.14 >.”

Log analysis:

Disk ID 0c.00.14 is reported as broken.

Error : Tue Oct 18 09:28:22 IST [nec-netapp1: raid.assim.disk.badlabelversion:error]: Disk 0c.00.14 Shelf 0 Bay 14 [NETAPP   X290_S15K7560A15 NA00] S/N [3SL0PL9800009045W7Q5] has raid label with version (10), which is not within the currently supported range (5 – 9). Please contact NetApp Global Services.

Tue Oct 18 09:28:22 IST [nec-netapp1: raid.config.disk.bad.label.version:error]: Disk 0c.00.14 Shelf 0 Bay 14 [NETAPP   X290_S15K7560A15 NA00] S/N [3SL0PL9800009045W7Q5] has an unsupported label version.
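The fields that matter for the replacement (disk ID, shelf, bay, serial number and label version) can be pulled out of the first error message programmatically. The following is a minimal sketch in Python; the regular expression and the function name `parse_bad_label` are assumptions written for this one message, not an official log grammar.

```python
import re

# Sample line copied from the log excerpt above.
LOG_LINE = (
    "Tue Oct 18 09:28:22 IST [nec-netapp1: raid.assim.disk.badlabelversion:error]: "
    "Disk 0c.00.14 Shelf 0 Bay 14 [NETAPP   X290_S15K7560A15 NA00] "
    "S/N [3SL0PL9800009045W7Q5] has raid label with version (10), which is not "
    "within the currently supported range (5 – 9). Please contact NetApp Global Services."
)

# Hypothetical pattern for this message only; accepts a hyphen or an en dash
# between the two range bounds.
PATTERN = re.compile(
    r"Disk (?P<disk>\S+) Shelf (?P<shelf>\d+) Bay (?P<bay>\d+) .*?"
    r"S/N \[(?P<serial>[^\]]+)\] has raid label with version \((?P<version>\d+)\), "
    r"which is not within the currently supported range "
    r"\((?P<low>\d+)\s*[-\u2013]\s*(?P<high>\d+)\)"
)


def parse_bad_label(line):
    """Return the disk details from a bad-label-version message, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("shelf", "bay", "version", "low", "high"):
        info[key] = int(info[key])
    # The label is unsupported when its version falls outside [low, high].
    info["in_range"] = info["low"] <= info["version"] <= info["high"]
    return info


details = parse_bad_label(LOG_LINE)
print(details["disk"], details["serial"], details["version"], details["in_range"])
```

For the line above, the label version (10) lies outside the supported range of 5 to 9, which is why the controller marks the disk as broken.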

 Plan of action:

  • Confirm that the HDD has failed via the NetApp auto-support mail, and note the disk ID and capacity.
  • Ensure the spare disk carried for the replacement is unowned and its disk label is set to zero.
  • Physically verify the storage and identify the failed HDD by its amber indicator.
  • Ensure auto-assign is set to false on the controller, to prevent the controller from auto-assigning the disk.
  • Log in via the console and check which controller the failed HDD ID is assigned to.
  • Log in to the controller that owns the failed HDD ID.
  • Blink the LED on and off to confirm the particular HDD (not necessary if the failed HDD already shows an amber indication).
  • Remove the HDD, wait about 60 seconds, and insert the new HDD (ensure the new HDD has the same capacity and part number as the one it replaces).
  • Check that the disk ID is detected by the storage.
  • Assign the disk ID, as the new disk will be unowned.
  • Check that the disk ID has been taken into the controller's pool of spare disks.
  • If yes, the call is closed.

Activities performed:

All commands and the procedure, with examples, are listed below.

1) >disk show

“to check the status and to which controller the HDD is assigned”

 

eg: nec-netapp2> disk show

DISK       OWNER                  POOL   SERIAL NUMBER

0c.00.19     nec-netapp2(135068646)   Pool0  LXVHKJJM

0c.00.17     nec-netapp2(135068646)   Pool0  LXWU79EL

0c.00.1      nec-netapp2(135068646)   Pool0  JZVLNW2J

0c.00.15     nec-netapp2(135068646)   Pool0  LXVUPD5M

0c.00.5      nec-netapp2(135068646)   Pool0  CZY8JS2N

0c.00.7      nec-netapp2(135068646)   Pool0  CZY78MTN

0c.00.8      nec-netapp1(135068651)   Pool0  6SL3TAS30000N2401EX3

0c.00.18     nec-netapp2(135068646)   Pool0  3SL01T5V00009010Q4A2

0c.00.6      nec-netapp1(135068651)   Pool0  3SL05E1N0000901713GA

0c.00.14     nec-netapp2(135068646)   Pool0  3SL0PL9800009045W7Q5   <-- failed

0c.00.4      nec-netapp1(135068651)   Pool0  6SL3SCA90000N240MN5X

0c.00.9      nec-netapp1(135068651)   Pool0  3SL056G600009017TQX7

0c.00.13     nec-netapp1(135068651)   Pool0  3SL0PRSM00009045XQLL

0c.00.12     nec-netapp1(135068651)   Pool0  3SL01Y2X00009008GCFL

0c.00.10     nec-netapp1(135068651)   Pool0  3SL02MLT00009008GCCJ

0c.00.11     nec-netapp1(135068651)   Pool0  3SL05E1Z00009018163B

0c.00.2      nec-netapp1(135068651)   Pool0  3SL05EQE00009018HT1H

0c.00.0      nec-netapp2(135068646)   Pool0  3SL0PAHH00009045G4ZW

0c.00.3      nec-netapp2(135068646)   Pool0  3SL053J300009017076C

0a.24        nec-netapp2(135068646)   Pool0  J80S2HEL

0a.27        nec-netapp1(135068651)   Pool0  J80PSE5L

0a.29        nec-netapp1(135068651)   Pool0  J80SNR5L

0a.17        nec-netapp1(135068651)   Pool0  J80SP1HL

0a.16        nec-netapp2(135068646)   Pool0  J80N202L

0a.19        nec-netapp1(135068651)   Pool0  J80SNZLL

0a.20        nec-netapp2(135068646)   Pool0  J80NWKVL

0a.26        nec-netapp2(135068646)   Pool0  J80S23ML

0a.28        nec-netapp2(135068646)   Pool0  J80STU5L

0a.23        nec-netapp1(135068651)   Pool0  J80S1Y7L

0a.22        nec-netapp2(135068646)   Pool0  J80S240L

0a.25        nec-netapp1(135068651)   Pool0  J80PSGPL

0a.21        nec-netapp1(135068651)   Pool0  J80RZ4EL

0a.18        nec-netapp2(135068646)   Pool0  J80SMTUL

0c.00.16     nec-netapp1(135068651)   Pool0  6SL75TX40000N4102TAX
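When the listing is long, the owner of the failed disk can be looked up with a small script instead of by eye. The sketch below is in Python and works on a few rows copied from the output above; the function name `parse_disk_show` and the assumption that every owned row has the form `disk owner(sysid) pool serial` are mine, inferred from this listing only.

```python
# A few rows copied from the `disk show` output above.
SAMPLE = """\
DISK       OWNER                  POOL   SERIAL NUMBER
0c.00.8      nec-netapp1(135068651)   Pool0  6SL3TAS30000N2401EX3
0c.00.14     nec-netapp2(135068646)   Pool0  3SL0PL9800009045W7Q5
0a.28        nec-netapp2(135068646)   Pool0  J80STU5L
"""


def parse_disk_show(text):
    """Map each owned disk ID to its owner, system ID, pool and serial number."""
    disks = {}
    for line in text.splitlines():
        fields = line.split()
        # Owned rows have at least four fields and an owner written as name(sysid);
        # the header row and "Not Owned" rows fail this check and are skipped.
        if len(fields) < 4 or "(" not in fields[1]:
            continue
        disk, owner, pool, serial = fields[:4]
        name, _, sysid = owner.rstrip(")").partition("(")
        disks[disk] = {"owner": name, "sysid": sysid, "pool": pool, "serial": serial}
    return disks


disks = parse_disk_show(SAMPLE)
print(disks["0c.00.14"]["owner"])  # prints nec-netapp2
```

Here the failed disk 0c.00.14 belongs to nec-netapp2, so the remaining commands are run on that controller.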

 

2) >aggr status -f

“To check if any broken HDD is present”

 

eg: nec-netapp2> aggr status -f

 

Broken disks

 

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

label version   0c.00.14        0c    0   14  SA:A   0  SAS  15000 560000/1146880000 560208/1147307688
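The Used and Phys columns pair a size in MB with a block count. The figures in this output are consistent with 512-byte blocks and MB counted as 1024 × 1024 bytes, rounded down; that relationship is inferred from the numbers shown here, not taken from NetApp documentation. A short Python check, with hypothetical helper names, makes the arithmetic explicit:

```python
BLOCK_SIZE = 512       # bytes per block: an assumption inferred from the figures
MIB = 1024 * 1024      # bytes per "MB" as used in this output (also inferred)


def parse_size(field):
    """Split an 'MB/blks' field such as '560000/1146880000' into two integers."""
    mb, blocks = field.split("/")
    return int(mb), int(blocks)


def blocks_to_mb(blocks):
    """Convert a block count to whole MB, rounding down."""
    return blocks * BLOCK_SIZE // MIB


for field in ("560000/1146880000", "560208/1147307688"):
    mb, blocks = parse_size(field)
    print(field, blocks_to_mb(blocks) == mb)
```

Both fields from the broken-disk row satisfy the relationship, which is a quick way to confirm that a replacement disk reports the same usable and physical size as the failed one.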

 

3) >disk assign 0c.00.14 -s unowned -f

“To remove ownership of the failed HDD”

 

4) >aggr status -f

“To check whether any broken HDD is still listed after removing ownership of the disk ID”

 

eg: nec-netapp2> aggr status -f

 

Broken disks (empty)

 

5) >aggr status -r

“To check the RAID data disks and parity disks”

 

eg: nec-netapp2> aggr status -r

Aggregate aggr0 (online, raid_dp) (block checksums)

Plex /aggr0/plex0 (online, normal, active, pool0)

RAID group /aggr0/plex0/rg0 (normal)

 

RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

dparity   0c.00.5         0c    0   5   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

parity    0c.00.3         0c    0   3   SA:A   0  SAS  15000 560000/1146880000 560208/1147307688

data      0c.00.19        0c    0   19  SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

data      0c.00.0         0c    0   0   SA:A   0  SAS  15000 560000/1146880000 560208/1147307688

data      0c.00.17        0c    0   17  SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

data      0c.00.7         0c    0   7   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

data      0c.00.1         0c    0   1   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

data      0c.00.18        0c    0   18  SA:A   0  SAS  15000 560000/1146880000 560208/1147307688

data      0c.00.15        0c    0   15  SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

 

Aggregate aggr1 (online, raid4) (block checksums)

Plex /aggr1/plex0 (online, normal, active, pool0)

RAID group /aggr1/plex0/rg0 (normal)

 

RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

parity    0a.16           0a    1   0   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.18           0a    1   2   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.20           0a    1   4   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.22           0a    1   6   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.24           0a    1   8   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.26           0a    1   10  FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

 

Pool1 spare disks (empty)

 

Pool0 spare disks

 

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

Spare disks for block or zoned checksum traditional volumes or aggregates

spare           0a.28           0a    1   12  FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

 

Partner disks

 

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

partner         0c.00.13        0c    0   13  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.16        0c    0   16  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.8         0c    0   8   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.4         0c    0   4   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.10        0c    0   10  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.9         0c    0   9   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.2         0c    0   2   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.11        0c    0   11  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.6         0c    0   6   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.12        0c    0   12  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0a.25           0a    1   9   FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.27           0a    1   11  FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.29           0a    1   13  FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.23           0a    1   7   FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.21           0a    1   5   FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.19           0a    1   3   FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.17           0a    1   1   FC:A   0  ATA   7200 0/0               847827/1736350304

 

Replace the HDD, then continue with the steps below.

 

6) >disk show -n

“To list the disks that are not owned by any controller”

 

eg : nec-netapp2> disk show -n

DISK       OWNER                  POOL   SERIAL NUMBER

0c.00.14     Not Owned            NONE   3SL0PRVG00009045XPBG

 

7) >disk assign 0c.00.14

“To assign the HDD to the controller”

 

8) >disk show -n

 

eg: nec-netapp2> disk show -n

disk show: No disks match option -n.

 

9) >disk show

eg: nec-netapp2> disk show

DISK       OWNER                   POOL   SERIAL NUMBER

0c.00.19     nec-netapp2(135068646)   Pool0  LXVHKJJM

0c.00.17     nec-netapp2(135068646)   Pool0  LXWU79EL

0c.00.1      nec-netapp2(135068646)   Pool0  JZVLNW2J

0c.00.15     nec-netapp2(135068646)   Pool0  LXVUPD5M

0c.00.5      nec-netapp2(135068646)   Pool0  CZY8JS2N

0c.00.7      nec-netapp2(135068646)   Pool0  CZY78MTN

0c.00.8      nec-netapp1(135068651)   Pool0  6SL3TAS30000N2401EX3

0c.00.18     nec-netapp2(135068646)   Pool0  3SL01T5V00009010Q4A2

0c.00.6      nec-netapp1(135068651)   Pool0  3SL05E1N0000901713GA

0c.00.4      nec-netapp1(135068651)   Pool0  6SL3SCA90000N240MN5X

0c.00.9      nec-netapp1(135068651)   Pool0  3SL056G600009017TQX7

0c.00.13     nec-netapp1(135068651)   Pool0  3SL0PRSM00009045XQLL

0c.00.12     nec-netapp1(135068651)   Pool0  3SL01Y2X00009008GCFL

0c.00.10     nec-netapp1(135068651)   Pool0  3SL02MLT00009008GCCJ

0c.00.11     nec-netapp1(135068651)   Pool0  3SL05E1Z00009018163B

0c.00.2      nec-netapp1(135068651)   Pool0  3SL05EQE00009018HT1H

0c.00.0      nec-netapp2(135068646)   Pool0  3SL0PAHH00009045G4ZW

0c.00.3      nec-netapp2(135068646)   Pool0  3SL053J300009017076C

0c.00.14     nec-netapp2(135068646)   Pool0  3SL0PRVG00009045XPBG

0a.24        nec-netapp2(135068646)   Pool0  J80S2HEL

0a.27        nec-netapp1(135068651)   Pool0  J80PSE5L

0a.29        nec-netapp1(135068651)   Pool0  J80SNR5L

0a.17        nec-netapp1(135068651)   Pool0  J80SP1HL

0a.16        nec-netapp2(135068646)   Pool0  J80N202L

0a.19        nec-netapp1(135068651)   Pool0  J80SNZLL

0a.20        nec-netapp2(135068646)   Pool0  J80NWKVL

0a.26        nec-netapp2(135068646)   Pool0  J80S23ML

0a.28        nec-netapp2(135068646)   Pool0  J80STU5L

0a.23        nec-netapp1(135068651)   Pool0  J80S1Y7L

0a.22        nec-netapp2(135068646)   Pool0  J80S240L

0a.25        nec-netapp1(135068651)   Pool0  J80PSGPL

0a.21        nec-netapp1(135068651)   Pool0  J80RZ4EL

0a.18        nec-netapp2(135068646)   Pool0  J80SMTUL

0c.00.16     nec-netapp1(135068651)   Pool0  6SL75TX40000N4102TAX

 

As the 0c.00.14 entry above shows, the replacement disk (S/N 3SL0PRVG00009045XPBG) is now assigned to nec-netapp2 under the disk ID given in step 7.

 

10) >aggr status -r

eg: nec-netapp2> aggr status -r

Aggregate aggr0 (online, raid_dp) (block checksums)

Plex /aggr0/plex0 (online, normal, active, pool0)

RAID group /aggr0/plex0/rg0 (normal)

 

RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

dparity   0c.00.5         0c    0   5   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

parity    0c.00.3         0c    0   3   SA:A   0  SAS  15000 560000/1146880000 560208/1147307688

data      0c.00.19        0c    0   19  SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

data      0c.00.0         0c    0   0   SA:A   0  SAS  15000 560000/1146880000 560208/1147307688

data      0c.00.17        0c    0   17  SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

data      0c.00.7         0c    0   7   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

data      0c.00.1         0c    0   1   SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

data      0c.00.18        0c    0   18  SA:A   0  SAS  15000 560000/1146880000 560208/1147307688

data      0c.00.15        0c    0   15  SA:A   0  SAS  15000 560000/1146880000 560879/1148681096

 

Aggregate aggr1 (online, raid4) (block checksums)

Plex /aggr1/plex0 (online, normal, active, pool0)

RAID group /aggr1/plex0/rg0 (normal)

 

RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

parity    0a.16           0a    1   0   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.18           0a    1   2   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.20           0a    1   4   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.22           0a    1   6   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.24           0a    1   8   FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

data      0a.26           0a    1   10  FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

 

Pool1 spare disks (empty)

 

Pool0 spare disks

 

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

Spare disks for block or zoned checksum traditional volumes or aggregates

spare           0a.28           0a    1   12  FC:A   0  ATA   7200 847555/1735794176 847827/1736350304

spare           0c.00.14        0c    0   14  SA:A   0  SAS  15000 560000/1146880000 560208/1147307688

 

Partner disks

 

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)

partner         0c.00.13        0c    0   13  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.16        0c    0   16  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.8         0c    0   8   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.4         0c    0   4   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.10        0c    0   10  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.9         0c    0   9   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.2         0c    0   2   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.11        0c    0   11  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.6         0c    0   6   SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0c.00.12        0c    0   12  SA:A   0  SAS  15000 0/0               560208/1147307688

partner         0a.25           0a    1   9   FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.27           0a    1   11  FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.29           0a    1   13  FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.23           0a    1   7   FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.21           0a    1   5   FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.19           0a    1   3   FC:A   0  ATA   7200 0/0               847827/1736350304

partner         0a.17           0a    1   1   FC:A   0  ATA   7200 0/0               847827/1736350304

nec-netapp2>

 

Conclusion:

As the Pool0 spare disks listing above shows, disk 0c.00.14 is assigned and available as a spare. Call closed.
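The final check, that the replaced disk appears among the spares, can also be scripted. The sketch below is in Python and runs on rows copied from the last `aggr status -r` output; the function name `spare_devices` and the rule that a spare row starts with the lowercase word `spare` are assumptions based on this output only.

```python
# Rows copied from the spare and partner sections of the final `aggr status -r` output.
SAMPLE = """\
Pool0 spare disks
RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
Spare disks for block or zoned checksum traditional volumes or aggregates
spare           0a.28           0a    1   12  FC:A   0  ATA   7200 847555/1735794176 847827/1736350304
spare           0c.00.14        0c    0   14  SA:A   0  SAS  15000 560000/1146880000 560208/1147307688
Partner disks
partner         0c.00.13        0c    0   13  SA:A   0  SAS  15000 0/0               560208/1147307688
"""


def spare_devices(text):
    """Return the device names of all rows whose RAID Disk column is 'spare'."""
    spares = []
    for line in text.splitlines():
        fields = line.split()
        # The section heading begins with a capitalised "Spare", so it is not matched.
        if len(fields) >= 2 and fields[0] == "spare":
            spares.append(fields[1])
    return spares


print("0c.00.14" in spare_devices(SAMPLE))  # prints True
```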

Navigator System business model is to offer a comprehensive portfolio of hardware and software products, services and solutions for the most diverse platforms and multi-brand environments. Call +91 984545 1006 or Email sales@navigatorsystem.com

NAVIGATOR SYSTEMS PRIVATE LIMITED
NO. 37/27, MEANEE AVENUE, TANK ROAD CROSS
OPP LAKE SIDE HOSPITAL
BANGALORE – 560042

Phone: +91 080 25307537/ 38/ 49
Call: +91 9986288377
Email: sales@navigatorsystem.com

Case Study 1 – Replacement of SPS in EMC CX4-120 Storage

Client: – Net4India

Location: – Chennai

The Challenge: – EMC Storage shows alert on Standby Power Supply (SPS).

Scenario: – Enclosure SPE SPS A Failure

Error: – SPS A: (1.2KW) FLT

How To Replace an EMC CX3 / CX4 SPS:-

Replacing a faulty or dead EMC CX4-120 SPS is crucial to keeping your EMC CX4 series storage system up and running. A faulty SPS can cause a range of performance issues and can prevent your system from working altogether.

Plan of Action:-

  1. Asked the customer to send the latest SP logs for analysis.
  2. Once the logs were received from the customer, we started analyzing them.
  3. Found SPS A showing as faulty and the SPS B cabling in an unknown state.
  4. Physically verified the storage and identified the failed SPS by its amber indicator.

SPS B:  (1.2KW) OK, but showing an unknown configuration state.

Solution: – SPS A needs to be replaced.

After SPS A was replaced, the customer found that the EMC storage was performing slowly. We discussed this with the customer and asked them to share the latest SP logs taken after the replacement. According to the onsite engineer the SPS LED was green, but when we analyzed the logs we found the cabling state of both SPS units reported as unknown and the write cache disabled.

Scenario: – Enclosure SPE SPS A Cabling State: Cabling Status is unknown

Enclosure SPE SPS B Cabling State: Cabling Status is unknown

Error: – Write Cache Disabled


This issue has been addressed in the initial release of FLARE OS; it requires rebooting the SP on the side reporting the error. Before the restart, check for trespassed LUNs, multipathing, host connectivity and any predictive HDD failures.

We took a remote session from the customer and checked the storage. We found four paths connected on the host, and some LUNs connected to Celerra NAS storage for which multipathing could not be confirmed. We told the customer that only SP restarts were required and that IOPS had to be stopped.

 


The customer raised some queries; our answers follow each question:

  1. How can we stop IOPS from the customers' end?
  • Ask the customers not to use the mapped LUNs until the reboot completes.
  2. How are you sure that Celerra (NFS) is not in multipath?
  • At the CX storage level we found 4 paths assigned to the Celerra NAS box, which we identified while analyzing the SP logs.
  3. If rebooting SP A and SP B does not resolve the problem, what is your next plan?
  • After the reboot it will definitely work; this is the procedure recommended by EMC.
  4. We are rebooting only the controllers, one at a time, not power cycling the entire storage.

Solution: – SP restart required.

POA: –

1)    Ping both SPs.

2)    Before starting the activity, collect fresh SP logs.

3)    Confirm that multipathing is configured on the host side.

4)    Check host connectivity.

5)    If the customer cannot confirm multipathing, confirm that IOPS are stopped from the user end.

6)    Restart SP A; once SP A is back online, restart SP B.

7)    After the restart, check both SPs and confirm that the write cache is enabled.

After confirming the points above with the customer, we took a remote session and restarted SP A.

After the SP A reboot, the write cache was enabled on both SPs and the storage was working fine.
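The restart order in the POA (one SP at a time, waiting for each to come back online) can be expressed as a small control loop. The following Python sketch is illustrative only: `rolling_restart` and its hooks `restart`, `is_online`, `cache_enabled` and `wait` are hypothetical names supplied by the caller, and nothing here talks to a real array. Stopping as soon as the write cache is enabled again is one reading of the narrative above, where only SP A had to be restarted, not a rule stated in the POA.

```python
def rolling_restart(sps, restart, is_online, cache_enabled, max_polls=40, wait=lambda: None):
    """Restart SPs one at a time; stop early once the write cache is enabled again.

    restart, is_online, cache_enabled and wait are caller-supplied hooks
    (hypothetical); this sketch does not contact any storage system.
    """
    restarted = []
    for sp in sps:
        restart(sp)
        restarted.append(sp)
        # Poll until the SP reports online, or give up after max_polls checks.
        for _ in range(max_polls):
            if is_online(sp):
                break
            wait()
        else:
            raise TimeoutError(f"{sp} did not come back online")
        # Do not touch the next SP if the fault has already cleared.
        if cache_enabled():
            break
    return restarted


# Simulated run: the SP comes online on the second poll and the write cache
# is enabled after the first restart.
polls = {"count": 0}


def fake_is_online(sp):
    polls["count"] += 1
    return polls["count"] % 2 == 0


print(rolling_restart(["SP A", "SP B"], lambda sp: None, fake_is_online, lambda: True))
# prints ['SP A']
```

Passing the hooks in as functions keeps the ordering logic separate from whichever management tool is actually used to restart and query the SPs.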

