HBA Card Failure
Summary
LSI or similar host bus adapter (HBA) cards stop enumerating in the BIOS or OS, taking all attached drives offline. This is distinct from ordinary drive failure because the storage devices disappear behind the add-in controller rather than failing individually.
Frequency
2 tickets
Common Causes
- The HBA itself no longer enumerates in BIOS or lspci, strongly suggesting card-level failure rather than a single-disk problem (Tickets #14712, #21276)
- Early part-identification confusion slowing the path when the system also contains other PCIe networking cards and the failing add-in card is not named clearly at first (Ticket #14712)
- Catastrophic controller failure where the card visibly smoked or burned and the system only booted again after physical removal (Ticket #40247)
Diagnostic Steps
- Confirm the missing device at both BIOS inventory and lspci level before assuming an OS-only driver issue (Tickets #14712, #21276)
- Separate the HBA from other installed PCIe adapters so the customer and agent are discussing the same card family (Ticket #14712)
- Use slot or host swap testing when practical to distinguish a dead HBA from a motherboard or slot issue, but do not over-insist when production storage impact makes disruptive testing costly (Ticket #14712)
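The lspci check above can be sketched in Python. This is a minimal illustration, not a tool used in the tickets: the function names, the keyword list in STORAGE_CLASSES, and the sample device lines in the usage note are assumptions chosen for the example. It filters lspci output down to SAS and RAID controllers so the HBA is not confused with other PCIe cards such as network adapters.

```python
import re
import subprocess

# Class names lspci prints for SAS HBAs and RAID controllers.
# Assumption: these two cover the card families of interest; extend as needed.
STORAGE_CLASSES = ("Serial Attached SCSI controller", "RAID bus controller")

# One lspci line looks like: "<slot> <class>: <vendor and device name>"
LINE_RE = re.compile(r"^(?P<slot>\S+)\s+(?P<cls>[^:]+):\s+(?P<name>.+)$")


def find_storage_controllers(lspci_text):
    """Return (slot, class, name) tuples for storage controllers in lspci output."""
    found = []
    for line in lspci_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("cls") in STORAGE_CLASSES:
            found.append((match.group("slot"), match.group("cls"), match.group("name")))
    return found


def read_lspci():
    """Run lspci on the local host and return its text output (Linux, pciutils installed)."""
    return subprocess.run(
        ["lspci"], capture_output=True, text=True, check=True
    ).stdout
```

On a live host, find_storage_controllers(read_lspci()) returning an empty list is consistent with the card not enumerating, but it should still be cross-checked against the BIOS inventory as described above. Passing saved lspci text (for example, pasted from a customer reply) works the same way without running any command.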
Solutions
- Move to component RMA once the HBA is absent from BIOS and the storage impact is already severe, especially when additional swap testing is operationally expensive (Tickets #14712, #21276)
- Document the exact controller family and downstream impact so replacement logistics are not delayed by PCIe-card ambiguity (Ticket #14712)
- Remove the physically damaged card immediately when there is evidence of smoke or burning, before attempting any other diagnostic steps (Ticket #40247)
Edge Cases
- Ceph/JBOD environments magnifying the impact because one failed HBA can strand many otherwise healthy drives at once (Ticket #14712)
- RAID controller smoking where the card needed immediate removal and the system only recovered after the damaged controller was physically pulled (Ticket #40247)
- Controller stopping enumeration across systems confirming card-level failure rather than a board or slot problem (Ticket #21276)
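To make the Ceph/JBOD impact concrete, the sketch below groups block devices by the PCI controller they sit behind, using the resolved sysfs path of each device. It is an illustration under stated assumptions: the function names are hypothetical, the approach assumes a Linux host, and it only reports drives that are currently visible, so it is useful as a baseline on a healthy or peer host rather than after the HBA has already disappeared.

```python
import os
import re
from collections import defaultdict

# Matches a full PCI address such as 0000:01:00.0 inside a sysfs path.
PCI_ADDR_RE = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_for_path(device_path):
    """Return the PCI address closest to the block device, or None if there is none."""
    matches = PCI_ADDR_RE.findall(device_path)
    return matches[-1] if matches else None


def group_by_controller(resolved_paths):
    """Map each PCI address to the sorted block device names behind it."""
    groups = defaultdict(list)
    for name, path in resolved_paths.items():
        addr = controller_for_path(path)
        if addr is not None:  # skip virtual devices such as loop0
            groups[addr].append(name)
    return {addr: sorted(names) for addr, names in groups.items()}


def read_sysfs_block(root="/sys/block"):
    """Resolve every entry under /sys/block to its full device path (Linux only)."""
    return {
        name: os.path.realpath(os.path.join(root, name))
        for name in os.listdir(root)
    }
```

Calling group_by_controller(read_sysfs_block()) gives a per-controller drive list; an address that also appears in lspci as the HBA shows how many drives would be stranded if that one card failed.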
Related Issues
Referenced by
- Dennis Cuenca — handled tickets on this issue (×1)
- RMA Workflow — co-occurs with this issue (×3)
- Nam Luong — handled tickets on this issue (×1)