A100

Dev Account

Overview

The A100 appears in this ticket set both as a standalone NVIDIA GPU and as part of larger Exxact AI / HGX servers and workstations. The tickets cover A100 40GB and A100 80GB cards, multi-GPU A100 servers, and NVLink-connected systems, and they span hardware failure, power and cooling behavior, upgrade planning, and pre-sales sizing. (#10065, #14573, #27169, #29945, #35787)

Known Issues

  • gpu-hardware-failure: the most common issue in this set. Repeated evidence includes uncorrectable ECC errors, memory faults, row-remapper / Xid events, and “fallen off the bus” failures that often led to RMA or replacement. (#11864, #14573, #22781, #31300, #40657, and 10+ more)
  • fan-speed-issues: recurring, but often chassis-side rather than GPU-side. Customers sometimes attribute noise to the A100, but at least some tickets show that the passively cooled card was not the noise source and point instead to system cooling or fan behavior. (#20537, #25315, #7159)
  • power-supply-failure: important in dense A100 systems. Tickets include PSU isolation, breaker trips during gpu-burn, and questions about whether partial PSU population is sufficient for a full 8x A100 load. (#35787, #7159, #34261)
  • firmware-driver-compatibility: moderate frequency. Several tickets revolve around performance gaps, OS-upgrade planning, cable or integration mismatches, and host-environment differences rather than confirmed GPU failure. (#10065, #17776, #27169, #41042)
  • no-trouble-found-rma: notable for borderline memory-error cases. Some A100s that customers believed were bad passed Exxact stress testing and were returned without replacement. (#41042, #26170, #38630)
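
When collecting evidence for the memory and bus failures listed above, it can help to count Xid events per GPU from the kernel log before opening a ticket. The following is a minimal sketch, not an Exxact tool. It assumes the driver's usual "NVRM: Xid (PCI:...): <code>" log prefix, and the sample log lines are synthetic and written only for this example. The code meanings follow NVIDIA's public Xid documentation, while the "memory" and "bus" grouping and the function names are assumptions made here.

```python
import re
from collections import Counter

# Xid codes related to the failure types above. Meanings follow NVIDIA's
# public Xid documentation; the "memory"/"bus" buckets are an assumption
# made for this sketch, not an official classification.
XID_MEANINGS = {
    48: ("memory", "double-bit ECC error"),
    63: ("memory", "row-remapping / page-retirement event recorded"),
    64: ("memory", "row-remapper / page-retirement recording failure"),
    79: ("bus", "GPU has fallen off the bus"),
    94: ("memory", "contained ECC error"),
    95: ("memory", "uncontained ECC error"),
}

# Matches the prefix of an Xid line and captures the PCI address and code.
XID_LINE = re.compile(r"NVRM: Xid \(PCI:(?P<bdf>[0-9a-fA-F:.]+)\): (?P<code>\d+)")


def summarize_xids(log_text):
    """Count Xid events per (PCI address, Xid code) in kernel log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = XID_LINE.search(line)
        if match:
            counts[(match.group("bdf"), int(match.group("code")))] += 1
    return counts


def describe(counts):
    """Render one line per (GPU, Xid) pair; unknown codes are labeled 'other'."""
    lines = []
    for (bdf, code), n in sorted(counts.items()):
        kind, text = XID_MEANINGS.get(code, ("other", "see NVIDIA Xid documentation"))
        lines.append(f"{bdf} Xid {code} x{n} [{kind}] {text}")
    return lines


# Synthetic input for illustration only; not taken from any ticket.
SAMPLE_LOG = """\
[  101.1] NVRM: Xid (PCI:0000:3b:00): 79, synthetic example line
[  101.9] NVRM: Xid (PCI:0000:3b:00): 79, synthetic example line
[  250.4] NVRM: Xid (PCI:0000:af:00): 94, synthetic example line
"""

if __name__ == "__main__":
    for row in describe(summarize_xids(SAMPLE_LOG)):
        print(row)
```

In practice the input would be saved kernel log text (for example, the output of dmesg written to a file). Repeated events on one PCI address are the kind of reproduction evidence that makes an RMA case stronger.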

Common Questions

  • How do I tell a bad A100 from a bad slot or host? Multiple tickets start with Exxact asking customers to move the card to another slot or another server before approving RMA, because that is the fastest way to separate GPU failure from host issues. (#11864, #16481, #22781, #31300)
  • What do ECC, row-remapper, or Xid memory errors usually mean? They are treated seriously and often lead to evidence collection or RMA review, but not every report reproduces as a hardware fault in-house. (#11864, #14573, #40657, #41042)
  • Is A100 fan noise normal? The A100 itself may be passively cooled in some Exxact systems, so loud cooling behavior can come from chassis fans or PSU cooling rather than from the card. (#20537, #25315, #7159)
  • Can I upgrade older A100 systems to newer Ubuntu and CUDA stacks? Yes, but the safe path may require staged LTS upgrades or a clean install, especially for older Ubuntu 18.04 deployments targeting newer CUDA and Ubuntu releases. (#27169)
  • What happens if I only have some PSUs active in a large A100 server? At least one ticket shows this as a real operational concern during gpu-burn testing, so PSU count and site power capacity matter for dense A100 configurations. (#35787, #7159)
  • How should A100 RMAs be validated? Strong cases usually include repeated reproduction, cross-system testing, ECC/Xid evidence, and the correct physical serial or Exxact purchase reference. (#14573, #22781, #31300, #40657, #4706)
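
The slot-swap and cross-system checks described above follow a simple rule: a fault that moves with the card points to the GPU, and a fault that stays with the location points to the host or slot. The sketch below encodes that rule under stated assumptions. The record fields, the function name, the serials, and the three verdict strings are all hypothetical and chosen for illustration; this is not Exxact's RMA approval logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwapTest:
    """One swap-test observation. All field names are hypothetical."""
    gpu_serial: str
    host: str
    slot: str
    fault_seen: bool


def triage(tests, suspect_serial):
    """Return 'gpu', 'host-or-slot', or 'inconclusive' for the suspect card."""
    suspect = [t for t in tests if t.gpu_serial == suspect_serial]
    others = [t for t in tests if t.gpu_serial != suspect_serial]

    # Locations (host, slot) where the suspect card showed the fault.
    suspect_fault_locs = {(t.host, t.slot) for t in suspect if t.fault_seen}

    # The fault followed the card to at least two distinct locations.
    if len(suspect_fault_locs) >= 2:
        return "gpu"

    # The card ran clean somewhere, and a different card faulted in the
    # same location where the suspect card had faulted.
    suspect_ran_clean = any(not t.fault_seen for t in suspect)
    other_fault_locs = {(t.host, t.slot) for t in others if t.fault_seen}
    if suspect_ran_clean and (suspect_fault_locs & other_fault_locs):
        return "host-or-slot"

    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical serials and locations, for illustration only.
    follows_card = [
        SwapTest("SN-A", "host1", "slot3", True),
        SwapTest("SN-A", "host2", "slot1", True),
    ]
    stays_with_slot = [
        SwapTest("SN-A", "host1", "slot3", True),
        SwapTest("SN-A", "host2", "slot1", False),
        SwapTest("SN-B", "host1", "slot3", True),
    ]
    print(triage(follows_card, "SN-A"))
    print(triage(stays_with_slot, "SN-A"))
```

A single failing observation stays "inconclusive" by design, which mirrors why the tickets ask for a swap before an RMA is approved.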

Related Products

  • A100 40GB vs A100 80GB: both appear in this set and are often treated similarly operationally, but cable, memory-capacity, and replacement details differ. (#10065, #14573, #27169, #31300)
  • HGX A100 servers: common when the support issue is really about platform power, cooling, PSU redundancy, or dense multi-GPU behavior rather than a single card. (#35787, #7159, #34261)
  • RTX A6000: appears as a comparison point in pre-sales sizing, where customers ask whether the A100 is the better fit for large-scale deep-learning workloads. (#29945)
  • Other datacenter / add-in GPUs, such as L40-class or RTX professional parts, can look similar at intake, but the ticket evidence here consistently treats the A100 as the product family for compute-heavy HPC and AI deployments. (#29945, #40657)
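
For the HGX A100 power and PSU-redundancy questions noted above, a rough budget check is simple arithmetic: estimated load against the capacity of the PSUs that are actually active. The sketch below is illustrative only. The 400 W per-GPU figure is used as the commonly quoted A100 SXM4 power limit, and the platform overhead, PSU rating, and PSU counts are hypothetical placeholders; substitute the values from the actual system specification. It also ignores PSU efficiency, input-voltage derating, and breaker limits at the site.

```python
def psu_budget(gpu_count, gpu_watts, platform_watts,
               psu_watts, psus_active, psus_redundant=0):
    """Compare estimated full load with usable PSU capacity.

    psus_redundant is the number of active PSUs held in reserve
    (for example 1 for an N+1 design); they do not count as capacity.
    """
    load = gpu_count * gpu_watts + platform_watts
    capacity = max(psus_active - psus_redundant, 0) * psu_watts
    return {
        "load_watts": load,
        "capacity_watts": capacity,
        "headroom_watts": capacity - load,
        "sufficient": capacity >= load,
    }


if __name__ == "__main__":
    # Hypothetical 8-GPU system: 8 x 400 W GPUs plus an assumed 1500 W
    # for CPUs, memory, fans, and storage, with assumed 2200 W PSUs.
    partial = psu_budget(8, 400, 1500, psu_watts=2200, psus_active=2)
    full = psu_budget(8, 400, 1500, psu_watts=2200, psus_active=4,
                      psus_redundant=1)
    print(partial)  # 4700 W load vs 4400 W capacity: not sufficient
    print(full)     # 4700 W load vs 6600 W capacity: sufficient
```

With these placeholder numbers, two active PSUs fall 300 W short of the estimated load, which is the sort of condition that would only surface under a sustained stress run such as gpu-burn.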
