1 . Scope & Assumptions
Applies to: RHEL 8/9 (CentOS Stream, Rocky, Alma), Ubuntu 20.04/22.04 LTS, Debian 12, openSUSE Leap 15, and SUSE Linux Enterprise 15.
Hardware: x86-64 systems with PCIe or SoC NICs (Intel, Broadcom, Mellanox/NVIDIA, Marvell).
Networking is managed by systemd + NetworkManager by default (RHEL/Fedora, Ubuntu desktop), or by ifupdown, netplan, or systemd-networkd on server flavours.
User has sudo access and console/KVM or IPMI in case of network loss.
2 . Troubleshooting Workflow (Rule-Out Method)
| Layer | Goal | Key Tools | Quick Verdict |
|---|---|---|---|
| Physical | Link present & correct speed | ethtool <SUP_IF>, LEDs | No link = hardware/cable |
| Driver/Firmware | Driver loaded & NIC enumerated | lspci -nnk, lsmod, dmesg, fwupdate | Missing/failed = driver/fw |
| Config (IP/L2-L3) | Correct IP, mask, VLAN, MTU | ip addr, ip route, nmcli, netplan apply | Wrong/missing = config |
| Host Firewall | Not blocking | nft list ruleset / firewall-cmd --list-all / ufw status | Drop rules = adjust |
| Connectivity | Path and DNS resolve | ping, traceroute, dig, curl | Fail = path or DNS |
| Performance | Expected throughput/latency | iperf3, ss, sar -n DEV | Poor = perf tuning |
Proceed layer-by-layer; stop when the fault is found. Details follow.
3 . Physical & Hardware Checks
Verify cabling & optics
Confirm correct cable type (Cat6A/SFP+/QSFP) and known-good port on the switch.
Observe NIC and switch link/activity LEDs. No light → suspect cable, transceiver, or switch port.
Check link parameters
Link detected: yes?
Speed / Duplex matches expected (e.g., 10000Mb/s, Full).
Speed mismatch → autoneg or switch config problem.
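The link check can be sketched as a small shell helper. This is only an illustration: `check_link` is a hypothetical function name, and `enp129s0` is the example interface used elsewhere in this article; substitute your own.

```shell
# Sketch: print only the link-related lines that ethtool reports.
# check_link is a hypothetical helper name, not part of any package.
check_link() {
    iface="${1:-enp129s0}"   # example interface name; replace with yours
    # ethtool prints Speed, Duplex, Auto-negotiation and Link detected
    # lines for Ethernet devices; everything else is filtered out.
    ethtool "$iface" | grep -E 'Speed:|Duplex:|Auto-negotiation:|Link detected:'
}
```

Call it as `check_link enp129s0`. Some fields may only appear when run as root.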
Validate NIC presence
Ensure the device and driver in use are listed.
Absent device → seating/BIOS/PCIe slot issue.
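A minimal sketch of the presence check, assuming `lspci` (from pciutils) is installed. `list_nics` is a hypothetical helper name.

```shell
# Sketch: show each Ethernet controller with the driver bound to it.
# list_nics is a hypothetical helper name.
list_nics() {
    # -nn adds numeric vendor:device IDs, -k adds the kernel driver lines.
    # -A3 keeps the three lines after each match (Subsystem / driver / modules).
    lspci -nnk | grep -i -A3 'ethernet controller'
}
```

If the controller appears but there is no "Kernel driver in use" line, the device is enumerated and the driver is missing or failed to bind.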
Inspect kernel logs
Look for PCI errors (AER: Corrected error), firmware load failures, or link flaps.
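One possible way to filter the kernel log for these events is sketched here. `nic_kernel_log` is a hypothetical helper name, and the search pattern is an assumption about typical driver wording; adjust it for your driver.

```shell
# Sketch: filter the kernel ring buffer for NIC-related events.
# nic_kernel_log is a hypothetical helper; the regex is a guess at
# common message wording and will not match every driver.
nic_kernel_log() {
    iface="${1:-enp129s0}"   # example interface name
    # -T prints human-readable timestamps.
    dmesg -T | grep -iE "aer:|firmware|link (is )?(up|down)|$iface"
}
```

On systems that restrict the kernel log to root, run this from a root shell.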
Firmware & driver updates
RHEL/SUSE: vendor DKMS or kmod-<driver> + fwupdate packages.
Ubuntu/Debian: apt install firmware-linux-nonfree or vendor scripts (mlxup, bnxt-vf).
4 . Software Configuration Checks
4.1 Identify interface names
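A brief listing from iproute2 is usually enough to map names to link state. The helper below is a sketch; `list_ifaces` is a hypothetical name.

```shell
# Sketch: print "<name> <state>" for every interface except loopback.
# list_ifaces is a hypothetical helper name.
list_ifaces() {
    # -br (brief) gives one line per interface: name, state, MAC, flags.
    ip -br link show | awk '$1 != "lo" { print $1, $2 }'
}
```

Predictable names such as `enp129s0` encode the PCI bus and slot, which helps match a name to a physical card.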
4.2 Verify IP settings
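The current address and routing state can be inspected with iproute2 before any configuration file is touched. This is a sketch; `show_ip_state` is a hypothetical helper name and `enp129s0` an example interface.

```shell
# Sketch: show addresses and routes for one interface.
# show_ip_state is a hypothetical helper name.
show_ip_state() {
    iface="${1:-enp129s0}"           # example interface name
    ip -br addr show dev "$iface"    # addresses with prefix lengths
    ip route show dev "$iface"       # routes that use this interface
    ip route show default            # default gateway, if one is set
}
```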
Missing address?
NetworkManager: nmcli con show; nmcli con up <name>
Netplan (Ubuntu Server ≥18.04): edit /etc/netplan/*.yaml, sudo netplan apply.
ifupdown (Debian): edit /etc/network/interfaces, sudo ifup enp129s0.
wicked (SUSE): wicked ifstatus eth0.
4.3 DNS resolution
Check /etc/resolv.conf symlink (systemd-resolved vs static file). Mis-configured DNS often looks like network loss.
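The symlink check could be scripted roughly as follows. `dns_mode` is a hypothetical helper, and the paths in the `case` statement are the usual systemd-resolved and NetworkManager locations; treat them as assumptions and verify them on your distribution.

```shell
# Sketch: report what manages resolv.conf and which servers it lists.
# dns_mode is a hypothetical helper; the paths are typical defaults.
dns_mode() {
    file="${1:-/etc/resolv.conf}"
    target=$(readlink -f "$file")    # resolve symlinks to the real file
    case "$target" in
        /run/systemd/resolve/stub-resolv.conf) echo "systemd-resolved (local stub)" ;;
        /run/systemd/resolve/resolv.conf)      echo "systemd-resolved (uplink servers)" ;;
        /run/NetworkManager/*)                 echo "NetworkManager-managed" ;;
        *)                                     echo "static or other: $target" ;;
    esac
    # Show the resolver entries actually in effect.
    grep -E '^(nameserver|search)' "$file"
}
```

When systemd-resolved is in use, `resolvectl status` shows the per-link upstream servers that the stub forwards to.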
4.4 VLAN & MTU
Ensure VLAN tags match switch port; jumbo MTU must match end-to-end.
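An end-to-end MTU check can be done with a non-fragmenting ping. The largest ICMP payload that fits in one IPv4 packet is the MTU minus 28 bytes (20 for the IPv4 header, 8 for the ICMP header). The helper below is a sketch: `mtu_probe` is a hypothetical name, and it assumes the iputils `ping` shipped with the distributions listed above.

```shell
# Sketch: probe whether a path carries packets of a given MTU unfragmented.
# mtu_probe is a hypothetical helper; assumes iputils ping and IPv4.
mtu_probe() {
    target="$1"
    mtu="${2:-1500}"
    payload=$((mtu - 28))   # 20-byte IPv4 header + 8-byte ICMP header
    # -M do sets Don't Fragment, so oversized packets fail instead of
    # being silently fragmented along the path.
    ping -c 3 -M do -s "$payload" "$target"
}
```

For a jumbo-frame path, `mtu_probe <gw_IP> 9000` sends 8972-byte payloads. If that fails while the default 1500 succeeds, some hop is not configured for jumbo frames. For VLANs, `ip -d link show <iface>` prints the 802.1Q ID of a VLAN sub-interface so it can be compared with the switch port.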
5 . Host Firewall & SELinux/AppArmor
List rules:
nftables: sudo nft list ruleset
firewalld (RHEL, Fedora, openSUSE): sudo firewall-cmd --list-all
ufw (Ubuntu): sudo ufw status verbose
Temporarily disable for test (re-enable afterwards!):
SELinux (RHEL): check the mode with getenforce, then set it to Permissive for a quick test (sudo setenforce 0). Restore enforcement with sudo setenforce 1.
6 . Connectivity & Routing Tests
| Test | Command | Interpretation |
|---|---|---|
| Loopback | ping -c3 127.0.0.1 | Fail = kernel network stack |
| Local NIC | ping -c3 <own_IP> | Fail = IP binding |
| Gateway | ping -c3 <gw_IP> | Fail = L2 problem |
| Outside | traceroute 8.8.8.8 | Stops early = routing/NAT |
| DNS | dig example.com @<DNS> | NXDOMAIN/time-out = DNS issues |
| HTTP | curl -Iv https://example.com | TLS/connect failures point to a proxy or TLS interception |
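The pass/fail rows of the table can be chained so that the first failing layer is reported. This is a sketch under stated assumptions: `connectivity_ladder` is a hypothetical helper, the timeouts are arbitrary choices, and traceroute is left out because its output needs reading rather than a pass/fail verdict.

```shell
# Sketch: run the table's checks in order and stop at the first failure.
# connectivity_ladder is a hypothetical helper; timeouts are arbitrary.
connectivity_ladder() {
    own_ip="$1"; gw_ip="$2"; dns="$3"
    ping -c3 -W2 127.0.0.1 >/dev/null || { echo "FAIL loopback: kernel network stack"; return 1; }
    ping -c3 -W2 "$own_ip" >/dev/null || { echo "FAIL local NIC: IP binding"; return 1; }
    ping -c3 -W2 "$gw_ip"  >/dev/null || { echo "FAIL gateway: L2 problem"; return 1; }
    # dig +short prints nothing on NXDOMAIN and a ';;' comment on time-out,
    # so require at least one output line that is not a comment.
    dig +short +time=2 +tries=1 example.com "@$dns" | grep -qv '^;' \
        || { echo "FAIL DNS: NXDOMAIN or time-out"; return 1; }
    curl -sI --max-time 10 https://example.com >/dev/null \
        || { echo "FAIL HTTP: TLS or connect failure"; return 1; }
    echo "OK"
}
```

Usage: `connectivity_ladder <own_IP> <gw_IP> <DNS>`. Some gateways drop ICMP, so a gateway failure is worth confirming with `ip neigh` before concluding there is an L2 fault.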
7 . Performance Problems
Measure throughput & latency
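A measurement run might look like the sketch below. It assumes an iperf3 server (`iperf3 -s`) is already listening on the far end. `measure_path` is a hypothetical helper, and the durations and stream count are illustrative choices.

```shell
# Sketch: latency summary plus TCP throughput in both directions.
# measure_path is a hypothetical helper; assumes "iperf3 -s" runs on $server.
measure_path() {
    server="$1"
    ping -c 20 -i 0.2 -q "$server"    # quiet mode: prints min/avg/max/mdev only
    iperf3 -c "$server" -t 10 -P 4    # 10-second test, 4 parallel streams
    iperf3 -c "$server" -t 10 -R      # reverse: server sends, client receives
}
```

Running both directions helps separate a transmit-side problem from a receive-side one.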
Detect congestion/offload issues
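Driver statistics are one place to look for drops. The sketch below keeps only non-zero counters whose names suggest trouble. `nic_error_counters` is a hypothetical helper, and because counter names are driver-specific, the name filter is an assumption that may need widening.

```shell
# Sketch: list non-zero error/drop counters from the NIC driver.
# nic_error_counters is a hypothetical helper; counter names vary by driver.
nic_error_counters() {
    iface="${1:-enp129s0}"   # example interface name
    ethtool -S "$iface" | awk -F': *' '
        tolower($1) ~ /err|drop|discard|miss|crc/ && $2 + 0 > 0 {
            gsub(/^[ \t]+/, "", $1)   # strip the indentation ethtool adds
            print $1 ": " $2
        }'
}
```

Run it twice a few seconds apart: counters that keep increasing indicate an active problem, not history. `ethtool -k <iface>` lists the current offload settings, and `ss -ti` shows per-socket TCP details such as retransmits.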
Tune
Interrupt coalescing: ethtool -C enp129s0 rx-usecs 3 rx-frames 0.
Ring buffers: ethtool -G enp129s0 rx 4096 tx 4096.
Enable flow-control or RSS for multi-core.
8 . Hardware Isolation Steps
Move the NIC to a different PCIe slot (rules out slot or backplane issues).
Swap cable and switch port.
Boot a Live-USB of another distro (distinguishes an OS-install fault from a hardware fault).
Loopback plug test (fiber loop or twinax).
Replace NIC with known-good unit.
If dual-port card: test each port separately.
9 . Distro-Specific Notes
| Task | RHEL/CentOS/Rocky | Ubuntu (netplan) | Debian (ifupdown) | openSUSE/SLE |
|---|---|---|---|---|
| Restart networking | nmcli networking off && nmcli networking on | netplan apply | ifdown eth0 && ifup eth0 | wicked ifrestart eth0 |
| System logs | journalctl -u NetworkManager | journalctl -u systemd-networkd | /var/log/syslog | journalctl -u wickedd |
| Enable service | systemctl enable --now NetworkManager | systemctl enable --now systemd-networkd | N/A | systemctl enable --now wicked |
| Persistent firewall | firewall-cmd --permanent | ufw enable | iptables-save | firewall-cmd --permanent |
10 . When to Escalate
Repeated kernel oopses or PCIe AER fatal errors → open hardware RMA.
Driver bug confirmed in the upstream changelog → try the latest kernel from the distro's backport channel.
Switch shows errors (FCS, CRC, flapping) on port even after cable/NIC swap → escalate to network team.
11 . Quick Reference Command Cheat-Sheet
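As a quick reference, the read-only commands from the earlier sections can be run in one pass. This is a sketch: `net_quickref` is a hypothetical helper, the command selection and the 20-line truncation are illustrative choices, and firewall listing is left out because it needs sudo.

```shell
# Sketch: run the main read-only checks in layer order.
# net_quickref is a hypothetical helper; the command list is a selection
# from this article, not an exhaustive set.
net_quickref() {
    iface="${1:-enp129s0}"   # example interface name
    for cmd in \
        "ethtool $iface" \
        "lspci -nnk" \
        "ip -br addr" \
        "ip route" \
        "ss -s"; do
        echo "### $cmd"
        # Unquoted $cmd is intentional: the string is split into
        # command and arguments. Output is capped at 20 lines.
        $cmd 2>&1 | head -n 20
    done
}
```

Redirecting the output to a file (`net_quickref enp129s0 > netstate.txt`) gives a snapshot that can be attached to a ticket.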
12 . Conclusion
Start at the physical layer, work upward, and change one variable at a time. Document every step (interface name, IP, switch port, cable ID) so you can roll back and provide clear information if you must open a vendor ticket. Following this structured approach will isolate more than 90% of network faults on Linux workstations and servers.