User

Dev Account

Articles

Recent activity by Dev Account

  • GPU Jobs Crash Node Poweroff

    Summary: Systems in this cluster of tickets stay up at idle or light load, then freeze, reboot, hang, or power off once sustained GPU work starts. The pattern appears ac...

  • GPU Hardware Failure

    Summary: GPU hardware failure covers cards that disappear from nvidia-smi, throw ECC or Xid faults, lose display output, or enter ERR state under load. These symptoms often overlap with motherboard,...

  • Fan Speed Issues

    Summary: Fan speed issues cover fans that run too fast at idle, fluctuate, fail to spin, report wrong telemetry or GPU fan ERR, or make abnormal noise; causes include fan modules, B...

  • Defective Storage Drives

    Summary: Defective storage drives covers HDDs, SATA SSDs, and NVMe devices that fail outright, develop bad sectors, disappear from the system, throw I/O or SMART errors, or ...

  • CryoSPARC Integration

    Summary: CryoSPARC integration issues cover installation, upgrades, runtime startup failures, scratch-storage configuration, GPU/driver compatibility, and post-RMA reconfigurat...

  • Credential Recovery

    Summary: Credential recovery covers requests for default usernames, passwords, BMC/IPMI access, root access, and reset guidance when customers lose the original system informatio...

  • CPU Hardware Failure

    Summary: CPU hardware failure covers processors that prevent POST, trigger machine-check or CATERR events, drop cores, or crash reliably under load. In this set, CPU faults ofte...

  • CPU Cooler Failure

    Summary: CPU cooler failure usually appears as rapid temperature climb at idle or in BIOS, thermal shutdowns, loud or seized fans, zero or misleading pump RPM, or systems that pow...

  • BIOS Firmware Update

    Summary: BIOS or BMC firmware updates appear in tickets where systems fail to POST, hang during bring-up, lose BMC access, reject a target image, or require a vendor-recommended...

  • BIOS BMC Issues

    Summary: BIOS/BMC issues cover systems that fail to boot, lose device visibility, or break management access because firmware state, boot mode, controller settings, or BMC updates are incorrect or c...
