BTRFS – transid issue explained and fix

USE THESE METHODS AT YOUR OWN RISK. IF YOU DECIDE TO FOLLOW THIS ARTICLE, HEED ALL OF THE SAFETY MEASURES (IF I MISSED A SAFETY MEASURE, DON'T BLAME ME, I SAID USE AT YOUR OWN RISK).

All of this is inspired by my answer on this forum: http://askubuntu.com/questions/157917/how-do-i-recover-a-btrfs-partition-that-will-not-mount/407766#407766

To fix a transid issue with a hammer just hit it with:

 btrfs-zero-log DEVICE

ex:

btrfs-zero-log /dev/sda3

However, not so fast. That's a hammer that might overdo it and destroy your system (lose some files, or make it completely unrestorable, at least from hearsay). Still, it is a correct move to “zero” the log when the log is off by a transaction.

Note: some of these tools might not be available on your system, and you will need to install the package. The package for all of this is btrfs-tools (newer distributions ship it as btrfs-progs). apt-get might not have the latest version, so you might need to look online for the sources and compile them.

NOTE: I assume your btrfs volume is /dev/sda3

If you see things like this in your btrfs filesystem check, btrfsck /dev/sda3 (a read-only check of your system), or in dmesg after an unsuccessful mount:

parent transid verify failed on 109973766144 wanted 1823 found 1821

or:

parent transid verify failed on 31302336512 wanted 62455 found 62456

That's because the journal is out of sync with the disk. btrfs keeps a journal of the transactions going on to a disk (that journal can also be on the disk).

How it works:

So when data is written (a transaction: a write or a delete, and remember a delete is also a write), the data is first written to the journal and then to the disk (or at the same time, with the journal just saving metadata about the upcoming write; I'm not sure, that part needs more research).

Anyhow, if you turn off the system in the middle of this write/delete, or do something that hiccups the system (such as unplugging the USB drive that holds your btrfs mount point), then when it comes back the mount will fail (dmesg and btrfsck will show you the errors in more detail).

Looking at dmesg you will see those same transid messages.

You will see something like this:

parent transid verify failed on 109973766144 wanted 1823 found 1821

It means that btrfs wanted transid 1823 (that was in the journal) but on the disk it saw 1821. So the disk was 2 transactions away from being in sync with the journal. I personally would risk a btrfs-zero-log here just because it's only 2 transactions. I mean, how much data could really be in 2 transactions, and could that small amount of data destroy my system if it's off on my disk? I'm assuming a transaction in btrfs is small; I don't currently have more info on how much data a btrfs transaction holds.

But to be 100% safe, if this is your only copy of the data, go through the safe steps below first. By the way, if you have critical data you should NEVER EVER have only 1 copy of it; always keep a copy/backup in another safe location. Blaming the creators of btrfs doesn't excuse a person's own failure to keep a backup. btrfs is not a backup solution, it's a filesystem. Nothing is a true backup solution besides having a copy of it elsewhere, not even parity or mirrored drives. A true backup is sitting somewhere underground in the Alps while its active copy is in your office in Texas.

parent transid verify failed on 31302336512 wanted 62455 found 62456

Here the journal wants 62455 but the disk is one ahead at 62456, so in this case I would just clear the journal. The journal didn't update this time. Again, remember what I said about being safe: if it's your only copy of the data and it's mega critical (shame on you), I would do the operations below first.
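To make the wanted/found arithmetic concrete, here is a minimal shell sketch that reads transid messages (for example piped from dmesg or btrfsck) and prints how far apart the two ids are. The function name transid_gap and the output format are made up for illustration. The gap is computed as found minus wanted, so a negative number means the disk is behind and a positive number means it is ahead.

```shell
# transid_gap: read "parent transid verify failed" lines on stdin and print
# the block address, both ids, and the gap (found minus wanted).
# Hypothetical helper for illustration; it only reads text, it never
# touches the filesystem.
transid_gap() {
    awk '/parent transid verify failed on/ {
        # Scan the fields so that a dmesg timestamp prefix does not matter.
        for (i = 1; i <= NF; i++) {
            if ($i == "on")     addr   = $(i + 1)
            if ($i == "wanted") wanted = $(i + 1)
            if ($i == "found")  found  = $(i + 1)
        }
        printf "%s wanted=%s found=%s gap=%d\n", addr, wanted, found, found - wanted
    }'
}
```

Run it as dmesg | transid_gap. For the first example message above it prints 109973766144 wanted=1823 found=1821 gap=-2, and for the second one it prints a gap of 1.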

Running btrfsck /dev/sda3 (which by the way just does a read-only check of the filesystem, so it's completely safe; it's only the btrfsck options that you have to worry about) will also show you those messages.

But beware: if that data is critical, I would follow these steps in order (as the other gents said):

SAFE STEPS

1. Recovery mounts (if they fail, continue; if successful, copy the mount's contents to a safe location and continue with step 3, or, as a bad option, keep the recovery mount as your main mount in /etc/fstab, which I highly recommend against)

2. Try btrfs restore to another disk (if it fails, continue; if successful, continue to btrfs-zero-log)

3. btrfs-zero-log, which will clear the log, and your filesystem should be mountable... hopefully... unless those transactions were way, way off in the journal and something was damaged

Note: I would always consider having a backup, or using btrfs restore/recovery mounts to back up your system.

(Step 3) btrfs-zero-log

First I'll talk about step 3, as it's the point of the article, and I already have an article for steps 1 and 2 (link below).

To zero the log and get those transid messages to go away:

# btrfs-zero-log /dev/sda3
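On newer btrfs-progs releases the standalone btrfs-zero-log binary may be missing, and the same operation lives under btrfs rescue zero-log. The sketch below only prints which command line to use on your system; the function name zero_log_cmd is an assumption for illustration, and nothing is run against the device.

```shell
# zero_log_cmd: print the zero-log command line that fits the installed tools.
# Hypothetical helper; it only prints the command, it never executes it.
zero_log_cmd() {
    dev=$1
    if command -v btrfs-zero-log >/dev/null 2>&1; then
        # Older btrfs-tools: standalone binary, as used in this article.
        echo "btrfs-zero-log $dev"
    elif command -v btrfs >/dev/null 2>&1; then
        # Newer btrfs-progs: the same operation as a "rescue" subcommand.
        echo "btrfs rescue zero-log $dev"
    else
        echo "no btrfs tools found in PATH" >&2
        return 1
    fi
}
```

Calling zero_log_cmd /dev/sda3 prints the line to run, so you can read it over (and finish your backups) before actually typing it.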

Beware, heed the warnings I mentioned throughout the article (if you skipped those, read the full article).

(Steps 1 and 2) Everything below is what you should try before btrfs-zero-log

Here is an article on the below stuff – read it first then read this: HERE

# mount -t btrfs -o ro /dev/sda3 /mnt/sda3

# mount -t btrfs -o ro,recovery /dev/sda3 /mnt/sda3

# mount -t btrfs -o recovery /dev/sda3 /mnt/sda3

# mount -t btrfs -o rootflags=recovery,nospace_cache /dev/sda3 /mnt/sda3

# mount -t btrfs -o rootflags=recovery,nospace_cache,clear_cache /dev/sda3 /mnt/sda3

# mount -t btrfs -o recovery,nospace_cache /dev/sda3 /mnt/sda3

# mount -t btrfs -o ro,rootflags=recovery,nospace_cache /dev/sda3 /mnt/sda3

# mount -t btrfs -o ro,rootflags=recovery,nospace_cache,clear_cache /dev/sda3 /mnt/sda3

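# mount -t btrfs -o ro,recovery,nospace_cache /dev/sda3 /mnt/sda3

Note: rootflags= is normally a kernel command-line parameter for the root filesystem, so if a manual mount rejects those lines, retry with the plain options. On newer kernels the recovery option is called usebackuproot.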
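If you don't want to type the attempts one by one, the read-only ones can be walked through in a loop. This is a minimal sketch, not a tested recovery tool: the function name try_ro_mounts and the DRY_RUN switch are assumptions for illustration, and the option sets are just the read-only lines from the list above. With DRY_RUN=1 (the default here) it only prints the commands.

```shell
# try_ro_mounts: try read-only mount options in order, least invasive first.
# Hypothetical helper. Set DRY_RUN=0 to really call mount (needs root).
try_ro_mounts() {
    dev=${1:-/dev/sda3}
    mnt=${2:-/mnt/sda3}
    for opts in ro ro,recovery ro,recovery,nospace_cache; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            # Dry run: show what would be executed.
            echo "mount -t btrfs -o $opts $dev $mnt"
        elif mount -t btrfs -o "$opts" "$dev" "$mnt"; then
            echo "mounted with -o $opts"
            return 0
        fi
    done
    # A dry run always succeeds; a real run that gets here failed every attempt.
    if [ "${DRY_RUN:-1}" = 1 ]; then
        return 0
    fi
    return 1
}
```

Sticking to read-only options in the loop is deliberate: nothing is written to the damaged filesystem until you have copied your files off it.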

Then cp or rsync all of your files over to a safe location. When they're safe, do the btrfs-zero-log. If it's a successful operation, you just wasted a lot of time backing up your system (but if it's not successful, you just saved your arse).

If the mounts failed, do a btrfs restore (a dump of the filesystem; as I understand it, it's a resumable operation, however it keeps asking for Y or y every now and again, so watch the output):

# btrfs restore /dev/sda3 /USB

Then when safe (when the btrfs restore is done), do the btrfs-zero-log. If it's a successful operation, you just wasted a lot of time backing up your system (but if it's not successful, you just saved your arse).
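The "backup first, zero the log second" rule can be turned into a small guard. This is only a sketch under my own assumptions: the function name zero_log_after_backup is made up, "backed up" is crudely taken to mean "the backup directory exists and is not empty", and the function only prints the zero-log command instead of running it.

```shell
# zero_log_after_backup: print the zero-log command only if a backup exists.
# Hypothetical helper; the check is deliberately crude (non-empty directory).
zero_log_after_backup() {
    backup_dir=$1
    dev=$2
    if [ -d "$backup_dir" ] && [ -n "$(ls -A "$backup_dir" 2>/dev/null)" ]; then
        echo "btrfs-zero-log $dev"
    else
        echo "no backup found in $backup_dir, not touching $dev" >&2
        return 1
    fi
}
```

For example, zero_log_after_backup /USB /dev/sda3 prints the command once /USB holds your restored files, and refuses while it is still empty.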

You can run screen first

# screen /bin/bash

# btrfs restore /dev/sda3 /USB

SCREEN SIDE NOTE

To detach (the command will still run): press CONTROL-a, then type “:detach” without the quotes, then press ENTER (CONTROL-a followed by d does the same thing)

Another way to detach: just close PuTTY or your terminal and it will detach (the command/restore will still run).

To check up on it, just screen back to it:

# screen -x

(screen -x will attach to sessions, even if detached, and unlike what -h says, it will attach even if the session is already attached as well)

If you have several screens, screen -x will tell you that you need to be more specific to attach to a session:

# screen -ls

ls stands for list all sessions, easy to remember.

To see the PID you can also do this:

# ps aux | grep screen

or:

# pstree -ap

Once you find out your PID, then run screen like this:

# screen -x PID

That will attach to a specific session. You can have several sessions/PuTTYs attached to the same screen (they will show the same output; you can type commands in one and they will be mirrored in the other PuTTY).

UPDATE – BIGGER TRANSID ISSUE and INSTALLING LATEST BTRFS-TOOLS:

NOTE: everywhere you see <dev>, replace it with the device that holds your btrfs volume (if you have a couple of devices sharing the same btrfs volume, like in a raided btrfs setup, then just pick one of them, and it will automatically know that the other devices are linked to it). So in my case <dev> is /dev/sda5.

I was able to fix transid issues that were off by hundreds (556 in the messages below). Here are the error messages:

I got these when I tried to mount the filesystem with # mount -t btrfs <dev> <path>, or even when I tried to see its label with # btrfs fi label <dev>:

parent transid verify failed on 13891821568 wanted 540620 found 541176
parent transid verify failed on 13891821568 wanted 540620 found 541176
parent transid verify failed on 13891821568 wanted 540620 found 541176
parent transid verify failed on 13891821568 wanted 540620 found 541176
btrfs: open_ctree failed

1) Get the latest btrfs-tools and also the latest btrfsck

Look here for the most up-to-date instructions: HERE. But they should go something like this:

# mkdir ~/src
# cd ~/src
# git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git 
# cd btrfs-progs
# make 
# make btrfs-select-super 
# make btrfs-zero-log

# mkdir ~/src1
# cd ~/src1
# git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
# cd btrfs-progs
# git checkout dangerdonteveruse
# make

3) First try to do some recovery mounts as seen here; if that doesn't work, try btrfs restore: HERE

4) Finally, try to zero the log (good data in the journal might be lost, but I don't know of another way to get it out, besides submitting a bug to the btrfs forums): # btrfs-zero-log <dev> ex: # btrfs-zero-log /dev/sda5

NOTE: for me this move didn't remove the transid issue (even though that's its job), and it didn't remove the csum error at the bottom either.

5) Clear the csum tree. The message “btrfs: open_ctree failed” hinted me at this move.

like this:

# btrfsck --init-csum-tree <dev>

ex:

# btrfsck --init-csum-tree /dev/sda5

That command runs really quickly. Now I was able to mount the data, however all of the csums are gone. If it doesn't mount for you, try mounting with the crc/csum tree ignored. For me it mounted no problem. It fixed my transid issue and my csum issue. SIDENOTE: after running init-csum-tree, my mount options didn't say anything about the filesystem being mounted with any special option to ignore csums, but that doesn't mean yours will behave the same way. The following note explains that.

NOTE: articles online state that the mount command after an --init-csum-tree should have the option “-o nodatasum” because of the zeroed-out csum tree. However, for me it just mounted without it. Also, the mount options didn't list nodatasum afterwards, so I don't know about that. If you randomly still can't mount after the --init-csum-tree, look into mounting with “-o nodatasum”.

At this point I decided it's logical to run

# btrfsck --fix-crc <dev>

to fix the crcs. This command takes its time to go through everything and fix the crc tree.

It may even be a good idea to run this after that:

# btrfsck --repair <dev>

2 thoughts on “BTRFS – transid issue explained and fix”

Vojtěch Kletečka says:

2014-08-28 at 2:52 am

Thank you very much, I’ve been searching for a long time how to repair btrfs filesystem on my disc and finaly found clear sequence of commands here. In my case “mount -t btrfs -o ro,recovery /dev/sdb1 /mnt/sdb1” worked

Dave Olson says:

2016-09-08 at 10:02 am

None of the above methods were able to fix a similar problem that I encountered on the root filesystem after a power cycle.

What fixed it for me (on a 4.1 kernel) was

btrfs check --repair --init-csum-tree -b /dev/sda4

infotinks

My Notes, Articles & Guides for Linux, Windows and Networking.
