Recently, one of our production servers experienced an unexpected boot failure, repeatedly dropping into Linux Emergency Mode with the error:
EXT4-fs (nvme0n1): VFS: Can't find ext4 filesystem
At the same time, the server’s hardware RAID controller reported a degraded RAID-1 array with rebuild issues.
This blog explains:
- What actually went wrong
- Why Linux Emergency Mode appeared
- Why RAID rebuild alone did not fix the issue
- The correct step-by-step resolution
- Lessons learned for future prevention
🧩 The Problem Symptoms
1. Linux Boot Failure
- Server booted into Emergency Mode
- Error displayed:
EXT4-fs (nvme0n1): Can't find ext4 filesystem
- Network services did not start
2. RAID Controller Alerts
- RAID level: RAID-1 (2 SSDs)
- Status: Degraded
- Rebuild started, but:
  - Progress stalled
  - Repeated “Fault detected on drive” messages
  - Disk status toggled between Online and Failed
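From Emergency Mode itself, the failing unit is usually easy to identify. A quick triage with standard systemd tooling looks like this:
# List units that failed during the current boot
systemctl list-units --state=failed
# Read the boot log around the failure (look for the mount unit and the EXT4 error)
journalctl -xb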
🔍 Root Cause Analysis
This incident had two independent issues that combined into one major outage.
❌ Root Cause #1: Incorrect /etc/fstab Entry
The system was trying to mount this entry:
/dev/nvme0n1 /home/vm_storage ext4 defaults 0 1
Why this is wrong:
- /dev/nvme0n1 is a raw block device
- The server uses a Dell PERC H730 hardware RAID controller
- Filesystems exist on partitions or LVM, not raw disks
- During RAID instability, this device temporarily disappeared
- systemd failed the mount → Emergency Mode
👉 This alone is enough to force Emergency Mode, even if the rest of the system is healthy.
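For context, systemd converts every /etc/fstab line into a mount unit at boot, and a required mount whose device never appears sends the system to the emergency target. The unit name for this mount can be derived as shown below (purely illustrative commands):
# Derive the mount unit name systemd generates for /home/vm_storage
systemd-escape -p --suffix=mount /home/vm_storage
# -> home-vm_storage.mount
# Inspect that unit and the reason it failed
systemctl status home-vm_storage.mount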
❌ Root Cause #2: Unstable RAID Rebuild
- One SSD was failing under rebuild load
- RAID rebuild stalled repeatedly
- SATA link resets caused intermittent disk failures
- Rebuild could not complete safely
This caused device renumbering and made the fstab issue much more visible.
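Device renumbering is exactly why persistent identifiers matter: kernel names such as /dev/nvme0n1 or /dev/sdX can change across boots and RAID events, while UUIDs and by-id links stay stable. The kernel already exposes those stable names:
# Persistent device names that survive renumbering
ls -l /dev/disk/by-uuid/
ls -l /dev/disk/by-id/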
✅ The Correct Resolution
Step 1: Fix /etc/fstab
The invalid mount entry was removed so that only stable mounts remained:
vi /etc/fstab
Removed:
/dev/nvme0n1 /home/vm_storage ext4 defaults 0 1
Best practice:
- Always use UUID, LVM, or partition paths
- Never mount raw disks directly
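As a sketch of what the stable replacement looks like, the UUID below is a placeholder; the real value comes from running blkid against the partition that actually holds the data:
# Find the filesystem UUID of the correct partition
blkid
# Example fstab entry (placeholder UUID; nofail keeps a missing data volume from blocking boot)
UUID=0a1b2c3d-1111-2222-3333-444455556666 /home/vm_storage ext4 defaults,nofail 0 2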
Step 2: Rebuild initramfs
Even after fixing fstab, Linux still booted into Emergency Mode because the old device reference was cached in initramfs.
Fix:
dracut -f -v
This regenerated the boot image without the invalid disk reference.
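On dracut-based distributions (RHEL, Rocky, AlmaLinux, Fedora) this rewrites the image for the running kernel; a quick sanity check is that its timestamp just changed:
# Confirm the initramfs for the running kernel was regenerated
ls -l /boot/initramfs-$(uname -r).img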
Step 3: Reset systemd failed units
systemd remembers failed mounts until explicitly cleared.
systemctl reset-failed
systemctl daemon-reexec
systemctl default
✔️ System booted normally after this step.
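After editing /etc/fstab it is also common to run a daemon-reload so systemd regenerates its mount units from the new file, then confirm nothing is still marked failed:
# Pick up the edited /etc/fstab
systemctl daemon-reload
# Should list no failed mount units after the fix
systemctl --failed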
Step 4: Stabilize RAID
Because the rebuild stalled and disks showed repeated faults:
- Rebuild was stopped
- The unstable SSD was removed
- Server booted safely with single-disk RAID-1 (degraded)
- Data remained intact
- Faulty SSD scheduled for replacement
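On a PERC H730 the controller state can also be confirmed from the OS with Dell's perccli utility, if it is installed; the controller and enclosure numbers below are examples, not taken from this incident:
# Controller summary, including virtual and physical drive status
perccli64 /c0 show
# Virtual drive state (a degraded but online RAID-1 shows as Dgrd)
perccli64 /c0/vall show
# Physical drive states (the removed SSD will no longer be listed as Online)
perccli64 /c0/eall/sall show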
🛡️ Final Outcome
- ✅ Server booted successfully
- ✅ No data loss
- ✅ Filesystem integrity preserved
- ✅ RAID safely stabilized
- ✅ Root cause fully resolved
📌 Key Lessons Learned
1️⃣ Never Mount Raw Disks in /etc/fstab
Always use:
- UUID
- LVM paths
- Partition devices
Raw disks are volatile during RAID events.
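Concretely, any of these fstab forms is acceptable (values are illustrative only):
# By filesystem UUID (from blkid)
UUID=0a1b2c3d-1111-2222-3333-444455556666 /home/vm_storage ext4 defaults 0 2
# By LVM logical volume path
/dev/mapper/vg_data-lv_vm /home/vm_storage ext4 defaults 0 2
# By partition device, never the whole disk
/dev/sdb1 /home/vm_storage ext4 defaults 0 2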
2️⃣ RAID Health ≠ Linux Boot Health
- RAID rebuild happens below the OS
- Linux can still fail due to cached configs
- Fixing RAID alone may not fix boot issues
3️⃣ Always Rebuild initramfs After Storage Changes
Any time disks, RAID, or LVM change:
dracut -f
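If several kernels are installed, dracut can regenerate all of their images in one pass:
# Rebuild the initramfs for every installed kernel, not just the running one
dracut -f --regenerate-all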
4️⃣ If RAID Rebuild Stalls → Stop and Replace
If:
- Rebuild stalls for 30+ minutes
- Disks repeatedly fail under load
👉 Stop the rebuild and replace the disk
Forcing the rebuild risks total data loss.
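Rebuild progress is best watched from the controller so a stall is caught early. With perccli the check looks roughly like this (enclosure and slot IDs are examples):
# Show rebuild progress for one physical drive (enclosure 32, slot 0 here)
perccli64 /c0/e32/s0 show rebuild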
🏁 Conclusion
This incident highlights how hardware RAID issues and Linux boot configuration problems can interact, creating confusing symptoms.
By:
- Correcting filesystem configuration
- Rebuilding initramfs
- Resetting systemd state
- Handling RAID rebuilds cautiously
we restored the system safely and cleanly.
At PrenHost, we apply best-practice diagnostics and recovery procedures to ensure maximum uptime and data safety for our customers.