Frequently Encountered Issues - Network Troubleshooting in Linux

Andrew Rodriguez

1 . Scope & Assumptions

  • Applies to: RHEL 8/9 (CentOS Stream, Rocky, Alma), Ubuntu 20.04/22.04 LTS, Debian 12, openSUSE Leap 15, and SUSE Linux Enterprise 15.

  • Hardware: x86-64 systems with PCIe or SoC NICs (Intel, Broadcom, Mellanox/NVIDIA, Marvell).

  • Uses systemd + NetworkManager by default (RHEL/Fedora, Ubuntu desktop) or ifupdown/netplan/systemd-networkd on server flavours.

  • User has sudo access and console/KVM or IPMI in case of network loss.

 

2 . Troubleshooting Workflow (Rule-Out Method)

Layer | Goal | Key Tools | Quick Verdict
Physical | Link present & correct speed | ethtool <SUP_IF>, LEDs | No link = hardware/cable
Driver/Firmware | Driver loaded & NIC enumerated | lspci -nnk, lsmod, dmesg, fwupdate | Missing/failed = driver/fw
Config (IP/L2-L3) | Correct IP, mask, VLAN, MTU | ip addr, ip route, nmcli, netplan apply | Wrong/missing = config
Host Firewall | Not blocking | nft list ruleset / firewall-cmd --list-all / ufw status | Drop rules = adjust
Connectivity | Path and DNS resolve | ping, traceroute, dig, curl | Fail = path or DNS
Performance | Expected throughput/latency | iperf3, ss, sar -n DEV | Poor = perf tuning

Proceed layer-by-layer; stop when the fault is found. Details follow.

 

3 . Physical & Hardware Checks

  1. Verify cabling & optics

    • Confirm correct cable type (Cat6A/SFP+/QSFP) and known-good port on the switch.

    • Observe NIC and switch link/activity LEDs. No light → suspect cable, transceiver, or switch port.

  2. Check link parameters

    bash
     
    sudo ethtool enp129s0 # replace with your interface
    • Link detected: yes?

    • Speed / Duplex matches expected (e.g., 10000Mb/s, Full).

    • Speed mismatch → autoneg or switch config problem.

  3. Validate NIC presence

    bash
     
    lspci -nnk | grep -i -A3 Ethernet
    • Ensure the device and driver in use are listed.

    • Absent device → seating/BIOS/PCIe slot issue.

  4. Inspect kernel logs

    bash
     
    sudo dmesg --follow
    • Look for PCI errors (AER: Corrected error), firmware load failures, or link flaps.

  5. Firmware & driver updates

    • RHEL/SUSE: vendor DKMS or kmod-<driver> + fwupdate packages.

    • Ubuntu/Debian: apt install linux-firmware (Ubuntu) or apt install firmware-linux-nonfree (Debian), or vendor scripts (mlxup, bnxt-vf).
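The link check in step 2 can be turned into a quick pass/fail verdict. The sketch below is one possible way to do that, assuming the usual `Speed:`, `Duplex:` and `Link detected:` lines in `ethtool` output; the function name `link_verdict` is hypothetical and not part of any tool.

```bash
# link_verdict EXPECTED_SPEED  (hypothetical helper, not part of ethtool)
# Reads `ethtool <interface>` output on stdin and prints a one-line verdict.
# EXPECTED_SPEED is written the way ethtool prints it, e.g. 10000Mb/s.
link_verdict() {
    awk -v want="$1" '
        /Speed:/         { speed = $2 }
        /Duplex:/        { duplex = $2 }
        /Link detected:/ { link = $3 }
        END {
            if (link != "yes")
                print "no-link: check cable, transceiver, switch port"
            else if (speed != want)
                print "speed-mismatch: got " speed ", expected " want
            else if (duplex != "Full")
                print "duplex-mismatch: got " duplex
            else
                print "ok: " speed " " duplex
        }'
}

# Usage (interface name is an example):
#   sudo ethtool enp129s0 | link_verdict 10000Mb/s
```

A `speed-mismatch` result corresponds to the autoneg or switch config problem described in step 2; `no-link` points back to step 1.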

 

4 . Software Configuration Checks

4.1 Identify interface names

bash
 
ip link             # universal
nmcli dev status    # NetworkManager distros

4.2 Verify IP settings

bash
 
ip addr show enp129s0
ip route
  • Missing address?

    • NetworkManager: nmcli con show; nmcli con up <name>

    • Netplan (Ubuntu Server ≥18.04): edit /etc/netplan/*.yaml, sudo netplan apply.

    • ifupdown (Debian): edit /etc/network/interfaces, sudo ifup enp129s0.

    • wicked (SUSE): wicked ifstatus eth0.

4.3 DNS resolution

bash
 
resolvectl status   # on older systemd releases: systemd-resolve --status

Check /etc/resolv.conf symlink (systemd-resolved vs static file). Mis-configured DNS often looks like network loss.
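To tell the systemd-resolved case from a static file quickly, the first `nameserver` entry is usually enough: 127.0.0.53 is the systemd-resolved stub listener. The helper below is a minimal sketch of that check; the name `resolv_mode` is hypothetical.

```bash
# resolv_mode FILE  (hypothetical helper)
# Classifies a resolv.conf-style file by its first nameserver entry.
resolv_mode() {
    ns=$(awk '$1 == "nameserver" { print $2; exit }' "$1")
    case "$ns" in
        "")         echo "none: no nameserver configured" ;;
        127.0.0.53) echo "stub: systemd-resolved stub listener" ;;
        *)          echo "direct: $ns" ;;
    esac
}

# Usage:
#   readlink -f /etc/resolv.conf    # where the symlink (if any) points
#   resolv_mode /etc/resolv.conf
```

With a `stub` result, the upstream servers are the ones reported by systemd-resolved rather than the ones in the file.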

4.4 VLAN & MTU

bash
 
ip -d link show enp129s0 # reveals VLAN IDs, qdisc, MTU

Ensure VLAN tags match switch port; jumbo MTU must match end-to-end.
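One way to test whether an MTU really holds end-to-end is a ping with the Don't Fragment bit set and a payload sized to fill the packet exactly. For IPv4 the payload is the MTU minus 20 bytes of IP header and 8 bytes of ICMP header. The sketch below assumes IPv4 without IP options; `icmp_payload` is a hypothetical helper name.

```bash
# icmp_payload MTU  (hypothetical helper)
# Largest ICMP echo payload that fits in one IPv4 packet of size MTU:
# MTU - 20 (IPv4 header, no options) - 8 (ICMP header).
icmp_payload() {
    echo $(( $1 - 28 ))
}

# Usage (gateway address is a placeholder):
#   ping -M do -c3 -s "$(icmp_payload 9000)" <gw_IP>
```

If the jumbo-sized probe fails while the same command with `icmp_payload 1500` succeeds, some hop on the path is likely still at the standard MTU.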

 

5 . Host Firewall & SELinux/AppArmor

  1. List rules:

    • nftables: sudo nft list ruleset

    • firewalld (RHEL, Fedora, openSUSE): sudo firewall-cmd --list-all

    • ufw (Ubuntu): sudo ufw status verbose

  2. Temporarily disable for a test (re-enable afterwards!):

    bash
     
    sudo systemctl stop firewalld   # firewalld systems
    sudo ufw disable                # ufw systems
  3. SELinux (RHEL): getenforce → set to Permissive for quick test (sudo setenforce 0).

 

6 . Connectivity & Routing Tests

Test | Command | Interpretation
Loopback | ping -c3 127.0.0.1 | Fail = kernel network stack
Local NIC | ping -c3 <own_IP> | Fail = IP binding
Gateway | ping -c3 <gw_IP> | Fail = L2 problem
Outside | traceroute 8.8.8.8 | Stops early = routing/NAT
DNS | dig example.com @<DNS> | NXDOMAIN/time-out = DNS issues
HTTP | curl -Iv https://example.com | TLS/connect failures suggest a proxy issue
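The tests above are meant to be run in order, stopping at the first failure. A minimal sketch of that loop follows. The function names `probe` and `first_failure` and the variables `OWN_IP`, `GW_IP` and `DNS_IP` are assumptions of this example, and the "outside" step uses `ping` rather than `traceroute` because the exit status of `traceroute` is not a reliable pass/fail signal.

```bash
# probe STEP: runs the command for one step and returns its exit status.
# OWN_IP, GW_IP and DNS_IP are hypothetical variables set by the caller.
probe() {
    case "$1" in
        loopback) ping -c3 127.0.0.1 ;;
        local)    ping -c3 "$OWN_IP" ;;
        gateway)  ping -c3 "$GW_IP" ;;
        outside)  ping -c3 8.8.8.8 ;;    # swapped in for traceroute
        dns)      dig +short example.com "@$DNS_IP" | grep -q . ;;
        http)     curl -sSI https://example.com ;;
    esac >/dev/null 2>&1
}

# first_failure: walks the steps in order, prints the first one that fails
# (or "none") and returns non-zero on failure.
first_failure() {
    for step in loopback local gateway outside dns http; do
        if ! probe "$step"; then
            echo "$step"
            return 1
        fi
    done
    echo "none"
}

# Usage (addresses are placeholders):
#   OWN_IP=192.0.2.10; GW_IP=192.0.2.1; DNS_IP=192.0.2.53
#   first_failure
```

The printed step name maps directly to the Interpretation column: for example, `gateway` suggests an L2 problem, while `dns` with everything before it passing suggests a resolver issue.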

 

7 . Performance Problems

  1. Measure throughput & latency

    bash
     
    iperf3 -s          # on server
    iperf3 -c server   # on client
  2. Detect congestion/offload issues

    bash
     
    ethtool -k enp129s0   # show offload settings (lowercase -k; -K changes them)
    ss -plt               # check for LISTEN backlog
    sar -n DEV 1 5        # utilisation spikes
  3. Tune

    • Interrupt coalescing: ethtool -C enp129s0 rx-usecs 3 rx-frames 0.

    • Ring buffers: ethtool -G enp129s0 rx 4096 tx 4096.

    • Enable flow control (ethtool -A) or, on multi-core hosts, spread receive load across queues with RSS (ethtool -L / ethtool -X).
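Before raising ring buffers, it helps to check what the NIC actually supports, since values such as 4096 may exceed the hardware maximum. The sketch below parses `ethtool -g` output, assuming its usual "Pre-set maximums" and "Current hardware settings" sections; `ring_headroom` is a hypothetical helper name.

```bash
# ring_headroom  (hypothetical helper)
# Reads `ethtool -g <interface>` output on stdin and prints, for RX and TX,
# the current ring size followed by the pre-set maximum.
ring_headroom() {
    awk '
        /^Pre-set maximums/          { section = "max" }
        /^Current hardware settings/ { section = "cur" }
        $1 == "RX:" || $1 == "TX:"   { val[section, $1] = $2 }
        END {
            print "RX: " val["cur", "RX:"] " / " val["max", "RX:"]
            print "TX: " val["cur", "TX:"] " / " val["max", "TX:"]
        }'
}

# Usage (interface name is an example):
#   ethtool -g enp129s0 | ring_headroom
```

If the current value already equals the maximum, ring size is not the knob to turn and the other tuning options are worth trying first.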

 

8 . Hardware Isolation Steps

  1. Move NIC to different PCIe slot (eliminates backplane issues).

  2. Swap cable and switch port.

  3. Boot a Live-USB of another distro (distinguishes an OS installation fault from a hardware fault).

  4. Loopback plug test (fiber loop or twinax).

  5. Replace NIC with known-good unit.

  6. If dual-port card: test each port separately.

 

9 . Distro-Specific Notes

Task | RHEL/CentOS/Rocky | Ubuntu (netplan) | Debian (ifupdown) | openSUSE/SLE
Restart networking | nmcli networking off && nmcli networking on | netplan apply | ifdown eth0 && ifup eth0 | wicked ifreload eth0
System logs | journalctl -u NetworkManager | journalctl -u systemd-networkd | /var/log/syslog | journalctl -u wickedd
Enable service | systemctl enable --now NetworkManager | systemctl enable --now systemd-networkd | N/A | systemctl enable --now wicked
Persistent firewall | firewall-cmd --permanent | ufw enable | iptables-save | firewall-cmd
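When the same runbook is used across distros, the right column can be picked automatically from the `ID` field in /etc/os-release. The sketch below does this for the "System logs" row only; `log_cmd_for` is a hypothetical helper, and the ID values listed are the ones these distros normally report.

```bash
# log_cmd_for ID  (hypothetical helper)
# Maps an os-release ID to the log command from the "System logs" row.
log_cmd_for() {
    case "$1" in
        rhel|centos|rocky|almalinux) echo "journalctl -u NetworkManager" ;;
        ubuntu)                      echo "journalctl -u systemd-networkd" ;;
        debian)                      echo "less /var/log/syslog" ;;
        opensuse-leap|sles)          echo "journalctl -u wickedd" ;;
        *)  echo "unknown distro ID: $1" >&2
            return 1 ;;
    esac
}

# Usage: read ID from /etc/os-release, then look up the command.
#   . /etc/os-release && log_cmd_for "$ID"
```

The mapping follows the table as written; hosts that deviate from their distro default (for example an Ubuntu desktop running NetworkManager) need the command for the stack they actually run.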

 

10 . When to Escalate

  • Repeated kernel oopses or PCIe AER fatal errors → open hardware RMA.

  • Driver bug reported in upstream change-log; try latest kernel from distro backport channel.

  • Switch shows errors (FCS, CRC, flapping) on port even after cable/NIC swap → escalate to network team.

 

11 . Quick Reference Command Cheat-Sheet

bash
 
# Interface & link
ip link show
ethtool -i enp129s0
watch -n1 -d cat /proc/net/dev

# Driver & firmware
lspci -nnk
modinfo <driver>
sudo ethtool -i enp129s0

# Logs
sudo journalctl -k -b | grep -i 'eth\|enp'

# Connectivity
ping -c3 $(ip route | awk '/default/ {print $3}')
traceroute 8.8.8.8
dig +short google.com

 

12 . Conclusion

Start at the physical layer, work upward, and change one variable at a time. Document every step (interface name, IP, switch port, cable ID) so you can roll back and provide clear information if you must open a vendor ticket. Following this structured approach should isolate more than 90 % of network faults on Linux workstations and servers.
