Fan Speed Issues

Dev Account

Summary

Fan speed issues cover fans that run too fast at idle, fluctuate, fail to spin, report wrong telemetry or GPU fan ERR, or make abnormal noise. Causes include failed fan modules, BMC/BIOS control faults, chassis or fan-board path faults, seating or contact issues, and thermal-control mismatch; some reports turn out to be normal but loud airflow.

Frequency

  • 192 tickets mention fan speed, fan noise, fan telemetry, or fan-control faults.

Common Causes

  1. BMC/BIOS fan-control or telemetry faults. Fans ran high, oscillated, reacted backwards to temperature, or appeared mislabeled because control firmware or sensor interpretation was wrong (#22547, #24162, #31541, #42596, #41986, …and 60+ more).
  2. Single failed or noisy fan module. Cases often narrowed to one chassis, CPU-adjacent, or liquid-cooler radiator fan with grinding, humming, non-spin, or repeated warnings (#21703, #32480, #38267, #41804, #43535, #44453, …and 50+ more).
  3. Chassis, fan-board, or harness faults. Some systems needed chassis/fan-board repair or broader depot work because fan swaps did not clear the path fault (#31541, #32471, #34438, #35680, #40699).
  4. Thermal-control mismatch after service/configuration change. Fan behavior sometimes changed after RMA, firmware, or platform updates while the system otherwise ran (#24162, #24164, #30085, #40557, #40811).
  5. Expected or unconfirmed acoustics. Some reports were normal high airflow, separate CPU/GPU fan zones, load-dependent behavior, or intermittent noise closed before the root cause was confirmed. One 4x RTX PRO 6000 Blackwell Max-Q TS4 server was described as expected to run at about 11,000 RPM at idle and nearly double that under load (#11548, #17481, #23091, #37709, #43894, #44674).

Diagnostic Steps

  1. Classify the symptom. Separate constant high RPM, oscillation/reversed response, one non-spinning/noisy fan, telemetry-only alarms, and loud-but-normal cooling (#21703, #22547, #31541, #37709).
  2. Check management evidence. Review BMC/IPMI readings, fan mode/profile, BIOS/BMC versions, SEL/event logs, sensor-to-temperature consistency, and physical label mapping; for normal-but-loud servers, confirm whether IPMI Power Saving mode is available before deeper firmware changes (#22547, #24164, #31541, #39977, #42596, #44674).
  3. Isolate the physical path. Swap/reseat suspect fans, GPUs, and cables as appropriate, inspect fan boards/chassis harnesses, and check dust or foreign-object obstruction before assuming firmware or chassis failure (#32471, #34438, #38267, #39878, #43894, #44357).
  4. Reproduce under controlled load/temperature. Compare idle and load behavior when inverted or unstable fan response is suspected (#31541, #36410, #40557).
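
Steps 1 and 2 can be partly scripted. The following is a minimal sketch, assuming fan readings are available as pipe-delimited text similar to what `ipmitool sdr type Fan` commonly prints. The sample lines, the sensor names, the helper names (`parse_fan_lines`, `classify`), and the `IDLE_HIGH_RPM` threshold are illustrative assumptions, not values taken from any ticket. Because some platforms in this article idle at about 11,000 RPM by design, the threshold must be set per platform.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold; real idle RPM varies widely by platform.
IDLE_HIGH_RPM = 9000


@dataclass
class FanReading:
    name: str
    status: str
    rpm: Optional[int]  # None when the sensor reports no numeric reading


def parse_fan_lines(text: str) -> list[FanReading]:
    """Parse pipe-delimited sensor lines whose last field looks like '5400 RPM'."""
    readings = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        value = fields[-1]
        rpm = int(float(value.split()[0])) if value.endswith("RPM") else None
        readings.append(FanReading(fields[0], fields[2], rpm))
    return readings


def classify(reading: FanReading, idle_high_rpm: int = IDLE_HIGH_RPM) -> str:
    """Assign a coarse symptom label to one idle-time fan reading."""
    if reading.rpm is None:
        return "no-reading"
    if reading.rpm == 0:
        # Could be a failed fan or a telemetry-only fault; confirm physically.
        return "zero-rpm"
    if reading.rpm >= idle_high_rpm:
        return "high-at-idle"
    return "ok"


# Assumed sample text, shaped like typical ipmitool fan sensor output.
SAMPLE = """\
FAN1 | 41h | ok | 29.1 | 5400 RPM
FAN2 | 42h | ok | 29.2 | 0 RPM
FAN3 | 43h | ns | 29.3 | No Reading
FAN4 | 44h | ok | 29.4 | 12100 RPM
"""

labels = {r.name: classify(r) for r in parse_fan_lines(SAMPLE)}
print(labels)
```

The labels only sort readings into the symptom classes from step 1. A "zero-rpm" or "high-at-idle" label still needs the physical and firmware checks in steps 2 and 3 before any repair decision.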

Solutions

  1. Replace failed fan/module. Clean fix for noise, non-spin, or single-fan alerts, including liquid-cooler radiator fans (#21703, #32480, #38267, #41804, #43535, #44453, …and 50+ more).
  2. Update/reset BMC or BIOS fan control. Firmware/settings remediation resolves false telemetry, unstable curves, or control issues when hardware is healthy (#22547, #24164, #31541, #37005, #41268).
  3. Repair chassis-side control hardware. Use chassis replacement, fan-board work, or depot repair when fan swaps fail (#31541, #32471, #34438, #35680, #40699).
  4. Validate before return. Burn-in and thermal checks confirm corrected fan behavior after reproduction/repair (#22547, #31541, #32471, #36410).
  5. Clarify expected behavior or monitor after reseat. Explain normal load acoustics, separate CPU/GPU fan banks, or platform-specific server airflow when no defect is found. IPMI Power Saving mode may reduce idle noise somewhat, but the difference may be small and fans can still scale up under workload. Intermittent GPU fan ERR can be monitored after a GPU reseat/swap once symptoms clear (#11548, #17481, #23091, #42596, #44357, #44674).
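
The validation in solution 4 can be sketched as a check on logged (temperature, RPM) pairs from a burn-in run. This is a hedged example, not a support procedure: the function names, the `idle_temp_c` cutoff, and the `max_idle_swing_rpm` tolerance are hypothetical, and the sample data is invented for illustration. It fits a least-squares slope of RPM against temperature to flag a reversed response, and measures the RPM swing at idle to flag oscillation.

```python
from statistics import mean


def response_slope(samples):
    """Least-squares slope of RPM versus temperature, in RPM per degree C."""
    temps = [t for t, _ in samples]
    rpms = [r for _, r in samples]
    t_bar, r_bar = mean(temps), mean(rpms)
    denom = sum((t - t_bar) ** 2 for t in temps)
    if denom == 0:
        raise ValueError("need at least two distinct temperatures")
    return sum((t - t_bar) * (r - r_bar) for t, r in samples) / denom


def check_fan_response(samples, idle_temp_c=45, max_idle_swing_rpm=500):
    """Flag reversed response and idle oscillation; thresholds are assumptions."""
    slope = response_slope(samples)
    idle_rpms = [r for t, r in samples if t <= idle_temp_c]
    swing = max(idle_rpms) - min(idle_rpms) if idle_rpms else 0
    return {
        "slope_rpm_per_c": slope,
        "reversed": slope < 0,  # fans slow down as temperature rises
        "idle_oscillation": swing > max_idle_swing_rpm,
    }


# Invented (temperature C, RPM) samples for illustration only.
healthy = [(40, 3000), (42, 3100), (60, 5200), (75, 7400)]
inverted = [(40, 7400), (42, 7300), (60, 5000), (75, 3000)]
hunting = [(40, 3000), (41, 6000), (40, 3100), (70, 7000)]

for name, data in [("healthy", healthy), ("inverted", inverted), ("hunting", hunting)]:
    print(name, check_fan_response(data))
```

A single slope is a coarse summary. It will not catch a curve that is correct at idle but wrong only near the thermal limit, so the raw log is still worth reviewing.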

Edge Cases

  • Fan symptoms can recur after a prior RMA repair (#24162, #24164, #40699).
  • Reversed control logic can make fans speed up as temperatures drop, or otherwise respond opposite to what is expected (#31541, #36410).
  • Fan tickets may co-occur with overheating, GPU instability, software corruption, or boot failures, complicating intake; one follow-up loud-fan/high-temperature case ultimately recovered after university IT reinstalled the OS rather than after confirmed cooling hardware repair (#30085, #32471, #40557, #41684, #43116).
  • Fan inoperability can be secondary during no-POST: a Tensor system's fans recovered after DIMM reseat/CMOS reset, but POST 00/no-video still required platform RMA (#43675).
  • IPMI/part-label ambiguity may look like mixed PSU/fan telemetry failure, while comparison testing shows normal CPU/GPU fan-zone behavior (#42596).
  • 0-RPM IPMI readings with physically spinning GPU fans can indicate telemetry/firmware interpretation rather than failed fan modules (#41986).
  • Intermittent GPU fan ERR can resolve after reseating/swapping GPU positions, so not every GPU fan alert requires immediate fan or GPU replacement when the symptom clears under follow-up testing (#44357).
  • False high CPU temperature telemetry can drive full-speed fans at idle; support requested IPMI sensor/SEL, GPU, and workload evidence before repair disposition (#39977).
  • A simple fan replacement or acoustic inspection can still be delayed by part identification, shipment timing, follow-up gaps, return logistics, or confusion over whether only the fan or the full workstation must be returned (#21703, #32480, #41804, #43535, #43894, #44453).
  • A dead fan inside a PSU should be treated as a PSU module failure rather than a standalone chassis-fan replacement; customer photos of a non-spinning PSU fan and orange fault LED can be sufficient RMA evidence when logs are unavailable (#44958).
  • Reducing the number of active power supplies is not a meaningful acoustic fix for normal loud server airflow when system fans dominate the sound, and BIOS/firmware power-management changes such as ASPM should be avoided unless validated because they can cause GPUs to fall off the bus (#44674).
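
Several of these edge cases reduce to comparing what the BMC reports with what is physically observed. The sketch below encodes that comparison as a small triage helper. The `Disposition` names and the `triage` function are hypothetical, and the rules are a simplification of the cases above rather than an official decision tree.

```python
from enum import Enum


class Disposition(Enum):
    TELEMETRY_REVIEW = "review BMC/IPMI telemetry and firmware"
    PSU_MODULE = "treat as PSU module failure"
    FAN_MODULE = "suspect the fan module or its path"
    MONITOR = "no fault observed; monitor"


def triage(reported_rpm: int, physically_spinning: bool,
           inside_psu: bool = False) -> Disposition:
    """Suggest a first disposition from reported RPM and physical observation."""
    if not physically_spinning:
        # A dead fan inside a PSU is handled as a PSU failure, not a fan swap.
        # Note: during no-POST, stopped fans can be secondary to another fault.
        return Disposition.PSU_MODULE if inside_psu else Disposition.FAN_MODULE
    if reported_rpm == 0:
        # Fan spins but telemetry says 0 RPM: likely a reporting problem.
        return Disposition.TELEMETRY_REVIEW
    return Disposition.MONITOR


print(triage(0, physically_spinning=True).value)
print(triage(0, physically_spinning=False, inside_psu=True).value)
```

The helper only proposes a starting point. Cases such as intermittent GPU fan ERR or false CPU temperature telemetry still need the log and workload evidence described above.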
