HBA Card Failure

Dev Account

Summary

LSI or similar host bus adapter (HBA) cards stop enumerating in BIOS or the OS, taking all attached drives offline. This is distinct from ordinary drive failure because the storage devices disappear behind the add-in controller rather than failing individually.

Frequency

2 tickets

Common Causes

  1. The HBA itself no longer enumerates in BIOS or lspci, which strongly suggests card-level failure rather than a single-disk problem (Tickets #14712, #21276)
  2. Early confusion over part identification, which slows resolution when the system also contains other PCIe networking cards and the failing add-in card is not clearly named at first (Ticket #14712)
  3. Catastrophic controller failure where the card visibly smoked or burned and the system only booted again after physical removal (Ticket #40247)

Diagnostic Steps

  1. Confirm the missing device at both BIOS inventory and lspci level before assuming an OS-only driver issue (Tickets #14712, #21276)
  2. Separate the HBA from other installed PCIe adapters so the customer and agent are discussing the same card family (Ticket #14712)
  3. Use slot or host swap testing when practical to distinguish a dead HBA from a motherboard or slot issue, but do not over-insist when production storage impact makes disruptive testing costly (Ticket #14712)
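
The OS-level half of step 1, and the card separation in step 2, can be sketched in Python. This is a minimal sketch, not a definitive tool: it assumes a Linux host with lspci available, the function names and the SAMPLE text are hypothetical, and the class list is an assumption that may need extending for a particular controller. It matches on the PCI device class rather than the vendor name so that an HBA is not confused with a network card.

```python
import subprocess

# PCI class names that lspci prints for storage controllers (assumed list;
# extend it if your controller reports a different class).
STORAGE_CLASSES = {
    "Serial Attached SCSI controller",
    "RAID bus controller",
    "SCSI storage controller",
}


def find_storage_controllers(lspci_text):
    """Return (slot, device_class, description) for each storage controller line."""
    found = []
    for line in lspci_text.splitlines():
        # Default lspci lines look like "<slot> <class>: <vendor and device>".
        slot, _, rest = line.strip().partition(" ")
        device_class, sep, description = rest.partition(": ")
        if sep and device_class in STORAGE_CLASSES:
            found.append((slot, device_class, description))
    return found


def read_lspci():
    """Run lspci on the local host (assumes the pciutils package is installed)."""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical sample output, for illustration only.
SAMPLE = """\
01:00.0 Ethernet controller: Example Vendor 10G Network Adapter (rev 01)
03:00.0 Serial Attached SCSI controller: Example Vendor SAS HBA (rev 02)
"""

if __name__ == "__main__":
    for slot, device_class, description in find_storage_controllers(SAMPLE):
        print(slot, device_class, description)
```

On a live host, passing read_lspci() instead of SAMPLE and getting an empty list back would match the OS-level symptom described here. The BIOS inventory still has to be checked separately.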

Solutions

  1. Move to component RMA once the HBA is absent from BIOS and the storage impact is already severe, especially when additional swap testing is operationally expensive (Tickets #14712, #21276)
  2. Document the exact controller family and downstream impact so replacement logistics are not delayed by PCIe-card ambiguity (Ticket #14712)
  3. Remove the physically damaged card immediately when there is evidence of smoke or burning, before attempting any other diagnostic steps (Ticket #40247)
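
The order in which these solutions apply can be summarized as a small decision function. This is an illustrative sketch only: the HbaObservation fields, the recommend_action name, and the returned strings are hypothetical and do not come from any ticketing tool. It treats physical damage as the first check and absence from BIOS as the deciding signal for RMA.

```python
from dataclasses import dataclass


@dataclass
class HbaObservation:
    """What the agent has established so far (hypothetical field names)."""
    visible_in_bios: bool
    visible_in_lspci: bool
    smoke_or_burn_damage: bool = False
    swap_test_is_costly: bool = False


def recommend_action(obs):
    """Map observations to the next step, following the article's ordering."""
    # Physical damage overrides everything else: pull the card first.
    if obs.smoke_or_burn_damage:
        return "remove the damaged card before any other diagnostics"
    # A card that still enumerates at both levels is not this failure mode.
    if obs.visible_in_bios and obs.visible_in_lspci:
        return "card enumerates; not this failure mode"
    # Present in BIOS but missing in the OS points away from a dead card.
    if obs.visible_in_bios:
        return "present in BIOS only; investigate an OS or driver issue"
    # Absent from BIOS: go straight to RMA when swap testing is too disruptive.
    if obs.swap_test_is_costly:
        return "component RMA"
    return "slot or host swap test, then component RMA if the card stays absent"


if __name__ == "__main__":
    production_ceph_node = HbaObservation(
        visible_in_bios=False,
        visible_in_lspci=False,
        swap_test_is_costly=True,
    )
    print(recommend_action(production_ceph_node))
```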

Edge Cases

  • Ceph/JBOD environments magnify the impact because one failed HBA can strand many otherwise healthy drives at once (Ticket #14712)
  • A smoking RAID controller needed immediate removal, and the system only recovered after the damaged controller was physically pulled (Ticket #40247)
  • A controller that stops enumerating across multiple systems confirms card-level failure rather than a board or slot problem (Ticket #21276)

Related Issues

Referenced by

  • Dennis Cuenca — handled tickets on this issue (×1)
  • RMA Workflow — co-occurs with this issue (×3)
  • Nam Luong — handled tickets on this issue (×1)
