GPU Troubleshooting Guide: When the GPU Is Not Visible with lspci

Alexander Hill

Purpose

This document provides systematic troubleshooting procedures for Linux systems where NVIDIA GPUs are not visible via lspci or nvidia-smi. It serves as a reference for anyone who needs to diagnose and resolve GPU detection issues.

Pre-requisites

Before diving into troubleshooting, ensure you have:

  • Verified your System BIOS is the latest version
  • Installed the latest NVIDIA driver for your GPU
  • Confirmed your PSU provides enough total wattage AND amperage per rail

Setting Persistence Mode

To enable NVIDIA persistence mode automatically on every boot:

  1. Create a systemd service file:
vi /etc/systemd/system/nvidia-persistenced.service
  2. Add the following content:
[Unit]
Description=Enable NVIDIA Persistence Mode

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm ENABLED
RemainAfterExit=true

[Install]
WantedBy=multi-user.target
 
  3. Reload systemd, then enable and start the service:
systemctl daemon-reload
systemctl enable --now nvidia-persistenced.service
  4. Verify that persistence mode is enabled:
nvidia-smi -q | grep "Persistence Mode"
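
On multi-GPU systems it can be easier to list only the GPUs where persistence mode is still off. The following is a minimal sketch, not part of the original procedure: the function name list_nonpersistent_gpus is hypothetical, and it assumes the CSV output of nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader, where each line looks like "0, Enabled".

```shell
# Hypothetical helper: reads "index, persistence_mode" CSV lines on stdin
# and prints the index of every GPU whose mode is not "Enabled".
list_nonpersistent_gpus() {
    awk -F', *' 'NF >= 2 && $2 != "Enabled" { print $1 }'
}
```

Example usage (empty output means every visible GPU has persistence mode enabled):

nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader | list_nonpersistent_gpus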
 

Troubleshooting Steps

Step 1: Verify GPU Visibility

Check if the GPU is visible via PCI:

lspci -tvnn | grep -i nvidia
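
When the expected number of GPUs is known, this check can be scripted. The sketch below is an illustration under stated assumptions: count_nvidia_devices is a hypothetical helper that counts lines of lspci -nn output carrying NVIDIA's PCI vendor ID, 10de. Other NVIDIA PCI functions (for example the audio function on some cards, or NVSwitch devices) share that vendor ID, so the count can be higher than the number of GPUs.

```shell
# Hypothetical helper: counts lines of `lspci -nn` output (read from stdin)
# whose [vendor:device] field carries NVIDIA's PCI vendor ID, 10de.
count_nvidia_devices() {
    grep -ci '\[10de:[0-9a-f]\{4\}\]'
}
```

Example usage:

lspci -nn | count_nvidia_devices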
 

Step 2: Check System Logs

If the GPU is not visible, check logs for any GPU-related errors:

dmesg | grep -Ei 'err|nvrm|xid'
ipmitool sel list
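
If the kernel log contains many NVRM messages, summarizing the Xid codes per PCI address can make patterns easier to spot. This is a minimal sketch that assumes the commonly seen message layout "NVRM: Xid (PCI:0000:3b:00): 79, ..."; the function name summarize_xids is hypothetical, and lines in any other layout are silently ignored.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints one line
# per (PCI address, Xid code) pair with the number of occurrences.
summarize_xids() {
    sed -n 's/.*NVRM: Xid (\(PCI:[^)]*\)): \([0-9][0-9]*\),.*/\1 \2/p' |
        sort | uniq -c |
        awk '{ printf "%s xid=%s count=%s\n", $2, $3, $1 }'
}
```

Example usage:

dmesg | summarize_xids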
 

Step 3: System Reboot

Reboot the system to see if PCI devices reinitialize properly.

If this resolves the issue:

  • Verify you're using the latest driver version
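  • Consider adding the kernel parameters pcie_aspm=off and acpi=off to GRUB_CMDLINE_LINUX in /etc/default/grub. Note that acpi=off disables ACPI entirely, so try pcie_aspm=off on its own first. In the example below, the crashkernel and resume values are specific to the example system; keep your own existing values:
    example:
    GRUB_CMDLINE_LINUX="crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=28070ec6-f366-44fc-90c9-1ec719f63153 rhgb quiet pcie_aspm=off acpi=off"
     
  • Regenerate the GRUB configuration after making changes, using the command for your distribution: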

RHEL/CentOS/Rocky/Fedora

sudo grub2-mkconfig -o /boot/grub2/grub.cfg

Ubuntu/Debian

sudo update-grub

Reboot your system

sudo reboot
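
After the reboot, it is worth confirming that the new parameters actually reached the running kernel, since they only take effect once the GRUB configuration has been regenerated. The sketch below assumes the active command line is read from /proc/cmdline; the function name has_kernel_param is hypothetical.

```shell
# Hypothetical helper: succeeds if the kernel command line in $1 contains
# the parameter in $2 as a complete, space-delimited token.
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Example usage:

has_kernel_param "$(cat /proc/cmdline)" pcie_aspm=off && echo "pcie_aspm=off is active"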

 

Step 4: Hardware Check

If issues persist after reboot:

  • Physically reseat the GPUs and their power cables
  • Ensure all connections are secure

If reseating fixes the issue:

  • Update to the latest driver
  • Add the recommended kernel parameters as in Step 3

Step 5: Isolate GPU Issues

Identify Specific GPU Issues

If only certain GPUs aren't showing:

  1. Map all visible GPUs and their locations:
nvidia-smi --query-gpu=name,index,serial,pci.bus_id --format=csv
  2. Compare this output with your expected GPU configuration to identify which specific GPUs are missing
  3. Note the PCI bus ID of any missing or problematic GPUs for further troubleshooting
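
The comparison in the steps above can be sketched as a small script. It assumes you maintain a file of expected PCI bus IDs, one per line, in the same format nvidia-smi prints (for example 00000000:3B:00.0); the file name expected_gpus.txt and the function name find_missing_gpus are hypothetical.

```shell
# Hypothetical helper: $1 is a file of expected PCI bus IDs (one per line);
# stdin carries the bus IDs currently reported. Prints every expected ID
# that was not reported. The comparison is case-insensitive.
find_missing_gpus() {
    awk -v expected="$1" '
        { seen[toupper($1)] = 1 }
        END {
            while ((getline line < expected) > 0) {
                if (line != "" && !(toupper(line) in seen)) print line
            }
        }'
}
```

Example usage:

nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader | find_missing_gpus expected_gpus.txt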

Isolate Hardware Issues

If problems continue:

  • Remove all GPUs and boot with only one GPU
  • If this works, reinstall the latest driver and add kernel parameters
  • Add GPUs back one at a time

If GPUs stop appearing after adding more, you may have hit one of these limits:

  • Power limit
  • Thermal limit
  • BIOS limitation
  • PCIe bifurcation limit

Step 6: Check and Adjust Power Settings

First, check the current power information and limits:

nvidia-smi -q -d POWER

This will show you:

  • Power management mode
  • Power draw
  • Enforced power limit
  • Default power limit
  • Min/max power limits

If you suspect power issues, reduce consumption to see if you're hitting system power limits:

nvidia-smi -pl 250 # Set 250W power limit, adjust to your GPU model
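
To see which GPUs are running close to their enforced limit before lowering it, a filter over the CSV query output may help. This is a minimal sketch: the function name flag_gpus_near_power_limit and the 0.95 threshold are assumptions for illustration, and the input is expected in the form produced by nvidia-smi --query-gpu=index,power.draw,power.limit --format=csv,noheader,nounits. It only covers GPUs that nvidia-smi can already see.

```shell
# Hypothetical helper: $1 is a threshold expressed as a fraction of the
# power limit (e.g. 0.95). Reads "index, power.draw, power.limit" CSV lines
# on stdin and prints GPUs drawing at least that fraction of their limit.
# Rows with non-numeric values such as [N/A] are skipped.
flag_gpus_near_power_limit() {
    awk -F', *' -v t="$1" '
        NF >= 3 && $3 + 0 > 0 && $2 / $3 >= t {
            printf "GPU %s: %.0f%% of limit\n", $1, 100 * $2 / $3
        }'
}
```

Example usage:

nvidia-smi --query-gpu=index,power.draw,power.limit --format=csv,noheader,nounits | flag_gpus_near_power_limit 0.95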
 

Step 7: Consider RMA

If you've tried all steps and the unit consistently fails, consider returning the system or GPU under warranty (RMA).

Common Causes of lspci Not Showing NVIDIA GPUs

  • Power Issues: Insufficient power or loose connections
  • Driver Problems: Incompatible or corrupted drivers (these typically hide a GPU from nvidia-smi rather than from lspci, which lists PCI devices whether or not a driver is loaded)
  • BIOS Settings: Outdated BIOS or incorrect PCIe settings
  • Hardware Failures: Damaged GPU or PCIe slots
  • System Overloading: Too many GPUs for your system configuration
  • Kernel Issues: Linux kernel parameters interfering with GPU detection

 
