Power Supply Failure
Summary
These tickets cover systems or redundant PSUs that lose standby power, show amber or fault LEDs, pop, spark, or shut the system off under load. Some are clean single-PSU failures; others begin as suspected PSU faults and later prove to be cable, firmware, motherboard, cooling, or power-distribution problems.
Frequency
- 240 tickets
Common Causes
- A PSU itself failed and the fault followed the unit: the clearest pattern is a bad module that still faults after slot, cable, or chassis swaps. Examples: #10000, #10784, #17087, #27403, #9685, and 40 more.
- Intermittent or load-triggered power loss: some systems stayed up at idle but shut down during training, burn-in, or heavy GPU use, with the PSU fault becoming obvious only under sustained draw. Examples: #10301, #18948, #18951, #25196, #26472, and 17 more.
- Cabling, AC source, or redundant-PSU path issues: a meaningful subset involved bad power cords, slot-specific backplane issues, a fault in only one PSU of a redundant pair, or confusion around wall power and feed redundancy. Examples: #10664, #11064, #12820, #13885, #15219, and 23 more.
- Burn, pop, or visible electrical damage: several tickets report sparks, popping sounds, burnt smell, smoke, or charred hardware, making the PSU failure explicit and urgent. Examples: #12927, #17374, #18086, #20065, #23811, and 10 more.
- Suspected PSU failure that turned out to be broader platform trouble: some cases were later traced to motherboard, cooler, PDB, or firmware behavior rather than the PSU module alone. Examples: #12820, #18951, #19329, #24079, #42135, and 18 more.
Diagnostic Steps
- Prove whether the fault follows the PSU: swap the suspect unit into another slot or chassis, or insert a known-good PSU into the system, before assuming the server itself is bad. Representative tickets: #10000, #10784, #12820, #17087, #27403.
- Observe the PSU indicators and failure mode carefully: note amber LEDs, a missing standby LED, fan behavior, alarm state, pop sounds, smell, and whether the system still runs on the remaining PSU. Representative tickets: #14762, #15219, #20374, #23811, #9685.
- Check system logs for power events, not just the customer's description: SEL, BMC, and PSU over-current or thermal messages helped separate PSU failure from thermal or motherboard events. Representative tickets: #13564, #18951, #24079, #39535, #39651.
- Rule out simple external power-path issues: confirm outlet, PDU, cable, and feed behavior, especially in redundant or datacenter deployments. Representative tickets: #11064, #13592, #13812, #20720, #29782.
- Escalate quickly when there is electrical damage or repeated no-power behavior: smoke, popping, burnt odor, or repeat no-power after swap-testing usually justified immediate RMA. Representative tickets: #12927, #17374, #18086, #20065, #37472.
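The log-review step can be partly mechanized. The sketch below sorts lines of SEL text into power, thermal, and other events so that a thermal trip is not mistaken for a PSU fault. It assumes the log has already been exported as plain text (for example with `ipmitool sel elist`). The keyword patterns, the function names, and the sample lines are illustrative assumptions, not a vendor-defined format; real SEL wording varies by platform and BMC firmware.

```python
import re

# Keyword patterns are assumptions for illustration only; actual SEL wording
# differs across vendors and BMC firmware versions.
POWER_PATTERN = re.compile(
    r"power supply|\bpsu|over-?current|ac lost|power unit", re.IGNORECASE
)
THERMAL_PATTERN = re.compile(r"temperature|thermal|overheat|fan", re.IGNORECASE)


def classify_sel_line(line: str) -> str:
    """Return 'power', 'thermal', or 'other' for one line of SEL text.

    Power keywords are checked first, so a PSU-specific thermal or fan
    message is counted as a power event.
    """
    if POWER_PATTERN.search(line):
        return "power"
    if THERMAL_PATTERN.search(line):
        return "thermal"
    return "other"


def summarize_sel(lines):
    """Count non-empty SEL lines per category."""
    counts = {"power": 0, "thermal": 0, "other": 0}
    for line in lines:
        if line.strip():
            counts[classify_sel_line(line)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from any ticket.
    sample = [
        "1 | 01/01/2024 | 10:00:00 | Power Supply PS2 | Failure detected | Asserted",
        "2 | 01/01/2024 | 10:05:00 | Temperature CPU1 Temp | Upper Critical going high | Asserted",
        "3 | 01/01/2024 | 10:06:00 | System Event | Timestamp Clock Sync | Asserted",
    ]
    print(summarize_sel(sample))
```

A log dominated by thermal entries just before a shutdown points toward the cooling path rather than the PSU, which is the distinction the log-review step is meant to draw. The counts are only a first-pass filter; the individual events still need to be read.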
Solutions
- Replace the failed PSU module: the most reliable fix was component RMA or advance replacement once swap tests proved the fault followed the PSU. Examples: #10784, #17087, #27403, #36217, #9685, and 60 more.
- RMA the full system when the power fault was not isolated to one module: full-system return was common when shutdowns persisted, multiple subsystems were implicated, or the backplane or board might be involved. Examples: #14762, #18951, #19329, #42135, #24079, and 35 more.
- Correct cabling, feed, or redundant-power configuration: some tickets resolved after fixing external power-path problems rather than replacing hardware. Examples: #10664, #13592, #13812, #20720, #29782.
- Apply firmware or platform remediation when PSU hardware passed: a minority of cases stabilized after BIOS, BMC, or PSU firmware updates changed power behavior under load. Examples: #18951, #19685, #37170, #41046.
- Set realistic out-of-warranty replacement or quote paths: older systems often ended with replacement-part sales guidance instead of warranty RMA. Examples: #15058, #19110, #38890, #42137.
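The solution paths above follow from what triage established, and that mapping can be written down as a small decision function. This is a minimal sketch, not a documented support policy: the `PowerTicket` fields, the `suggest_resolution` name, and especially the order in which the checks are applied are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PowerTicket:
    """Hypothetical summary of what triage established for one ticket."""
    electrical_damage: bool   # smoke, popping, burnt odor, or charring reported
    external_path_ok: bool    # outlet, PDU, cord, and feed were verified
    fault_follows_psu: bool   # swap test showed the fault moves with the module
    psu_passed_tests: bool    # PSU behaved normally in a known-good system
    in_warranty: bool


def suggest_resolution(t: PowerTicket) -> str:
    """Map triage findings to one of the solution paths listed above.

    The check order is an assumption: damage first, then the external
    power path, then the swap-test result, then firmware, then full RMA.
    """
    if t.electrical_damage:
        action = "immediate RMA"
    elif not t.external_path_ok:
        # No hardware replacement involved, so warranty status is irrelevant.
        return "correct cabling or feed"
    elif t.fault_follows_psu:
        action = "replace PSU module"
    elif t.psu_passed_tests:
        return "apply firmware or platform remediation"
    else:
        action = "full-system RMA"

    # Hardware replacement on an out-of-warranty system becomes a quote.
    if not t.in_warranty:
        return "quote out-of-warranty replacement"
    return action


if __name__ == "__main__":
    ticket = PowerTicket(
        electrical_damage=False,
        external_path_ok=True,
        fault_follows_psu=True,
        psu_passed_tests=False,
        in_warranty=True,
    )
    print(suggest_resolution(ticket))
```

The function returns a suggestion only. Cases that mix signals, such as a PSU that passes swap tests in a system that still shuts down under load, are the ones the edge cases describe and still need manual review.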
Edge Cases
- Thermal trips can look like PSU failure: some systems shut off under load and only later proved to have CPU or chassis cooling faults rather than bad PSUs. See #18951, #24079, #34618.
- Motherboard or PDB faults can mimic dead PSUs: several cases started as bad-PSU reports but later implicated the board-level power path. See #12820, #19329, #35091, #39631.
- Known manufacturing or fleet defects: a few tickets were batch-style replacement requests tied to known PSU defects rather than new diagnosis. See #8210, #36217, #39510.
- System still runs on one PSU, masking urgency: redundant designs sometimes hid the failure until alarms or slot swaps made the bad module obvious. See #10000, #10784, #27403, #9685.
Related Issues
- GPU Jobs Crash Node Poweroff
- Power Distribution Board Failure
- Motherboard Hardware Failure
- Overheating
- System Boot Failure
- RMA Workflow
Referenced by
- Pws 2k20a 1r — product affected by this issue (×13)
- Sheng Ye — handled tickets on this issue (×3)
- Vws 135223847 — product affected by this issue (×3)
- Ian Dicarlo — handled tickets on this issue (×41)
- David Nguyen — handled tickets on this issue (×4)
- RMA Workflow — co-occurs with this issue (×184)
- David — handled tickets on this issue (×9)
- Jared Royster — handled tickets on this issue (×24)
- Jason Chen — handled tickets on this issue (×16)
- Shipping Damage — co-occurs with this issue (×3)