Recently, one of our production servers experienced an unexpected boot failure, repeatedly dropping into Linux Emergency Mode with the error:
EXT4-fs (nvme0n1): VFS: Can't find ext4 filesystem
At the same time, the server’s hardware RAID controller reported a degraded RAID-1 array with rebuild issues.
This blog explains:
- What actually went wrong
- Why Linux Emergency Mode appeared
- Why RAID rebuild alone did not fix the issue
- The correct step-by-step resolution
- Lessons learned for future prevention
🧩 The Problem Symptoms
1. Linux Boot Failure
- Server booted into Emergency Mode
- Error displayed:
EXT4-fs (nvme0n1): Can't find ext4 filesystem
- Network services did not start
2. RAID Controller Alerts
- RAID level: RAID-1 (2 SSDs)
- Status: Degraded
- Rebuild started, but:
  - Progress stalled
  - Repeated “Fault detected on drive” messages
  - Disk status toggled between Online and Failed
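From Emergency Mode itself, the failing unit is usually easy to identify. A quick triage with standard systemd tooling looks like this:
# List units that failed during the current boot
systemctl list-units --state=failed
# Read the boot log around the failure (look for the mount unit and the EXT4 error)
journalctl -xb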
🔍 Root Cause Analysis
This incident had two independent issues that combined into one major outage.
❌ Root Cause #1: Incorrect /etc/fstab Entry
The system was trying to mount this entry:
/dev/nvme0n1 /home/vm_storage ext4 defaults 0 1
Why this is wrong:
- /dev/nvme0n1 is a raw block device
- The server uses a Dell PERC H730 hardware RAID controller
- Filesystems exist on partitions or LVM, not raw disks
- During RAID instability, this device temporarily disappeared
- systemd failed the mount → Emergency Mode
👉 This alone is enough to force Emergency Mode, even if the rest of the system is healthy.
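For context, systemd converts every /etc/fstab line into a mount unit at boot, and a required mount whose device never appears sends the system to the emergency target. The unit name for this mount can be derived as shown below (purely illustrative commands):
# Derive the mount unit name systemd generates for /home/vm_storage
systemd-escape -p --suffix=mount /home/vm_storage
# -> home-vm_storage.mount
# Inspect that unit and the reason it failed
systemctl status home-vm_storage.mount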
❌ Root Cause #2: Unstable RAID Rebuild
- One SSD was failing under rebuild load
- RAID rebuild stalled repeatedly
- SATA link resets caused intermittent disk failures
- Rebuild could not complete safely
This caused device renumbering and made the fstab issue much more visible.
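Device renumbering is exactly why persistent identifiers matter: kernel names such as /dev/nvme0n1 or /dev/sdX can change across boots and RAID events, while UUIDs and by-id links stay stable. The kernel already exposes those stable names:
# Persistent device names that survive renumbering
ls -l /dev/disk/by-uuid/
ls -l /dev/disk/by-id/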
✅ The Correct Resolution
Step 1: Fix /etc/fstab
The invalid mount entry was removed so that only stable mounts remained:
vi /etc/fstab
Removed:
/dev/nvme0n1 /home/vm_storage ext4 defaults 0 1
Best practice:
- Always use UUID, LVM, or partition paths
- Never mount raw disks directly
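As a sketch of what the stable replacement looks like, the UUID below is a placeholder; the real value comes from running blkid against the partition that actually holds the data:
# Find the filesystem UUID of the correct partition
blkid
# Example fstab entry (placeholder UUID; nofail keeps a missing data volume from blocking boot)
UUID=0a1b2c3d-1111-2222-3333-444455556666 /home/vm_storage ext4 defaults,nofail 0 2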
Step 2: Rebuild initramfs
Even after fixing fstab, Linux still booted into Emergency Mode because the old device reference was cached in initramfs.
Fix:
dracut -f -v
This regenerated the boot image without the invalid disk reference.
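On dracut-based distributions (RHEL, Rocky, AlmaLinux, Fedora) this rewrites the image for the running kernel; a quick sanity check is that its timestamp just changed:
# Confirm the initramfs for the running kernel was regenerated
ls -l /boot/initramfs-$(uname -r).img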
Step 3: Reset systemd failed units
systemd remembers failed mounts until explicitly cleared.
systemctl reset-failed
systemctl daemon-reexec
systemctl default
✔️ System booted normally after this step.
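After editing /etc/fstab it is also common to run a daemon-reload so systemd regenerates its mount units from the new file, then confirm nothing is still marked failed:
# Pick up the edited /etc/fstab
systemctl daemon-reload
# Should list no failed mount units after the fix
systemctl --failed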
Step 4: Stabilize RAID
Because the rebuild stalled and disks showed repeated faults:
- Rebuild was stopped
- The unstable SSD was removed
- Server booted safely with single-disk RAID-1 (degraded)
- Data remained intact
- Faulty SSD scheduled for replacement
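On a PERC H730 the controller state can also be confirmed from the OS with Dell's perccli utility, if it is installed; the controller and enclosure numbers below are examples, not taken from this incident:
# Controller summary, including virtual and physical drive status
perccli64 /c0 show
# Virtual drive state (a degraded but online RAID-1 shows as Dgrd)
perccli64 /c0/vall show
# Physical drive states (the removed SSD will no longer be listed as Online)
perccli64 /c0/eall/sall show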
🛡️ Final Outcome
- ✅ Server booted successfully
- ✅ No data loss
- ✅ Filesystem integrity preserved
- ✅ RAID safely stabilized
- ✅ Root cause fully resolved
📌 Key Lessons Learned
1️⃣ Never Mount Raw Disks in /etc/fstab
Always use:
- UUID
- LVM paths
- Partition devices
Raw disks are volatile during RAID events.
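Concretely, any of these fstab forms is acceptable (values are illustrative only):
# By filesystem UUID (from blkid)
UUID=0a1b2c3d-1111-2222-3333-444455556666 /home/vm_storage ext4 defaults 0 2
# By LVM logical volume path
/dev/mapper/vg_data-lv_vm /home/vm_storage ext4 defaults 0 2
# By partition device, never the whole disk
/dev/sdb1 /home/vm_storage ext4 defaults 0 2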
2️⃣ RAID Health ≠ Linux Boot Health
- RAID rebuild happens below the OS
- Linux can still fail due to cached configs
- Fixing RAID alone may not fix boot issues
3️⃣ Always Rebuild initramfs After Storage Changes
Any time disks, RAID, or LVM change:
dracut -f
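If several kernels are installed, dracut can regenerate all of their images in one pass:
# Rebuild the initramfs for every installed kernel, not just the running one
dracut -f --regenerate-all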
4️⃣ If RAID Rebuild Stalls → Stop and Replace
If:
- Rebuild stalls for 30+ minutes
- Disks repeatedly fail under load
👉 Stop the rebuild and replace the disk
Forcing the rebuild risks total data loss.
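Rebuild progress is best watched from the controller so a stall is caught early. With perccli the check looks roughly like this (enclosure and slot IDs are examples):
# Show rebuild progress for one physical drive (enclosure 32, slot 0 here)
perccli64 /c0/e32/s0 show rebuild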
🏁 Conclusion
This incident highlights how hardware RAID issues and Linux boot configuration problems can interact, creating confusing symptoms.
By:
- Correcting filesystem configuration
- Rebuilding initramfs
- Resetting systemd state
- Handling RAID rebuilds cautiously
we restored the system safely and cleanly.
At PrenHost, we apply best-practice diagnostics and recovery procedures to ensure maximum uptime and data safety for our customers.