Dev Account
Articles
Recent activity by Dev Account
-
GPU Jobs Crash Node Poweroff
Summary: Systems in this cluster of tickets stay up at idle or light load, then freeze, reboot, hang, or power off once sustained GPU work starts. The pattern appears ac...
-
GPU Hardware Failure
Summary: GPU hardware failure covers cards that disappear from nvidia-smi, throw ECC or Xid faults, lose display output, or enter ERR state under load. These symptoms often overlap with motherboard,...
-
Fan Speed Issues
Summary: Fan speed issues cover fans that run too fast at idle, fluctuate, fail to spin, report wrong telemetry or GPU fan ERR, or make abnormal noise; causes include fan modules, B...
-
Defective Storage Drives
Summary: Defective storage drives cover HDDs, SATA SSDs, and NVMe devices that fail outright, develop bad sectors, disappear from the system, throw I/O or SMART errors, or ...
-
CryoSPARC Integration
Summary: CryoSPARC integration issues cover installation, upgrades, runtime startup failures, scratch-storage configuration, GPU/driver compatibility, and post-RMA reconfigurat...
-
Credential Recovery
Summary: Credential recovery covers requests for default usernames, passwords, BMC/IPMI access, root access, and reset guidance when customers lose the original system informatio...
-
CPU Hardware Failure
Summary: CPU hardware failure covers processors that prevent POST, trigger machine-check or CATERR events, drop cores, or crash reliably under load. In this set, CPU faults ofte...
-
CPU Cooler Failure
Summary: CPU cooler failure usually appears as rapid temperature climb at idle or in BIOS, thermal shutdowns, loud or seized fans, zero or misleading pump RPM, or systems that pow...
-
BIOS Firmware Update
Summary: BIOS or BMC firmware updates appear in tickets where systems fail to POST, hang during bring-up, lose BMC access, reject a target image, or require a vendor-recommended...
-
BIOS BMC Issues
Summary: BIOS/BMC issues cover systems that fail to boot, lose device visibility, or break management access because firmware state, boot mode, controller settings, or BMC updates are incorrect or c...