What's covered in this article:

1. How to regain access to a pool when zpool import keeps segfaulting (perhaps it's even rebooting your system on every import).

What's not covered here:

1. The fact that you should also go through the ZFS forums and submit your error to the devs so they can fix it.

2. How to zfs send and receive your data and restore it. This article only covers how to regain access.

The idea

If your ZFS system segfaults during a zpool import of a volume, try importing it read-only. That will give you access to the data. You can then copy your data out if you need access to it right away.

If you had snapshots, you can do a zfs send and zfs receive to another volume/system to get your data back.

Or you can do a file-level copy to another volume instead. Maybe a simple cp, or an rsync to a Samba share, NFS share, etc.
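As a rough sketch of that file-level copy, something like the following could work. The function name copy_out, the error log and the paths in the usage line are made up for illustration; swap cp for rsync if you prefer.

```shell
# Hypothetical helper: copy everything under SRC into DEST, keep going past
# files that fail to read, and append any cp errors to LOG.
copy_out() {
    src=$1
    dest=$2
    log=$3
    mkdir -p "$dest" || return 1
    # -R recurses, -p preserves mode, ownership and timestamps where possible.
    # "$src"/. copies the contents of SRC, including hidden files.
    if cp -Rp "$src"/. "$dest"/ 2>> "$log"; then
        echo "copy finished without errors"
    else
        echo "copy finished with errors, see $log"
        return 1
    fi
}
```

Usage would look like: copy_out /pool1/datashare /mnt/backup/datashare /root/copy-errors.log (all three paths are assumptions, use your own).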

The point of this blog entry is to get access back to your segfaulted volume.

First, boot into a mode where the system doesn't try to import the volume at boot, some sort of recovery mode.

See your volume:

zpool import

Get history of that volume:

zdb -he <pool>

Mount that volume readonly:

zpool import -o readonly=on <pool>
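If you want the read-only import wrapped up so you can preview the exact command first, here is a minimal sketch. The helper names run and import_readonly, the DRY_RUN variable and the /mnt/recovery default are my own assumptions, not part of ZFS. The -R option sets an alternate root and also sets cachefile to none, so the pool should not be picked up again automatically on the next boot.

```shell
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Import POOL read-only under an alternate root (default /mnt/recovery, an
# assumed path) so its datasets do not mount over the running system.
import_readonly() {
    pool=$1
    altroot=${2:-/mnt/recovery}
    # -o readonly=on : import without writing to the pool
    # -R <dir>       : set altroot and cachefile=none for this import
    run zpool import -o readonly=on -R "$altroot" "$pool"
}
```

Run DRY_RUN=1 import_readonly pool1 to see the command, then import_readonly pool1 to actually do it. If the pool was last used by another system you may also need -f, which is not included here.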

Check that its filesystems came up and mounted:

zfs list

Test zfs send

zfs send <pool>/<filesystem from zfs> > /dev/null

Example:

zfs send pool1/datashare > /dev/null

That will tell you whether you can work with zfs send. If you didn't have a snapshot, don't bother taking one now; it won't work:

zfs snapshot <pool>/<filesystem>@<snapshotname>

Example:

zfs snapshot pool1/datashare@snap1

That will fail on a read-only pool.

Make sure your pool is at least in good status in read-only mode:

zpool status

zpool status -v

Check every file

find /pool/ -type f -exec echo {} \; -exec cp {} /dev/null \;
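The one-liner above prints every file name but does not single out the files that failed to read. A minimal sketch that logs only the failures could look like this; the function name check_files and the log path are assumptions for illustration, and /pool/ stands for wherever your pool is mounted.

```shell
# Hypothetical helper: read every regular file under DIR and write the paths
# of the ones that could not be read to LOG (one per line).
check_files() {
    dir=$1
    log=$2
    : > "$log"
    # find hands batches of files to a small sh loop; each file is read in
    # full with cat, and only the paths that fail are printed into LOG.
    find "$dir" -type f -exec sh -c '
        for f in "$@"; do
            cat -- "$f" > /dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} + >> "$log"
    if [ -s "$log" ]; then
        echo "unreadable files listed in $log"
        return 1
    fi
    echo "all files read cleanly"
}
```

For example: check_files /pool/ /root/bad-files.log. Keep the log outside the pool, since the pool is read-only.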

The file check above is similar to a scrub: it reads every file to look for problematic ones (hopefully you have no problem files).

While every file is being checked, keep an eye on the I/O. Ideally latency stays low, below 100ms for each device and for the volume as a whole. If latency is high, though, there isn't much we can do, since it's a corrupt volume that we are troubleshooting. This is just to get an idea of the I/O:

zpool iostat -v 10

Also keep checking on the zpool status every few seconds:

while true; do date;  zpool status; sleep 10; clear; done
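The loop above clears the screen on every pass, so earlier output is lost. If you would rather keep a record to look back through, a small variation is to append each pass to a log file. This is only a sketch; the function name log_status and the log path are assumptions.

```shell
# Hypothetical helper: run a command COUNT times, INTERVAL seconds apart,
# appending a timestamp and the command's output to LOG on each pass.
log_status() {
    log=$1
    count=$2
    interval=$3
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        { date; "$@"; } >> "$log" 2>&1
        i=$((i + 1))
        # Skip the sleep after the final pass.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, log_status /root/zpool-status.log 360 10 zpool status -v would record the status every 10 seconds for about an hour, and you can follow it from another terminal with tail -f.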
