If you're just looking to rename files rather than fix encodings, check out these articles: fixing bad filenames or fixing movie names

WINDOWS and LINUX: Looking at shares that have folders/files without names

If you're accessing a share and see files/folders without names, you probably have an encoding issue.

Here is the setup:

1) Windows Server/PC (hostname: win1)

2) Linux server running Samba file sharing, with the shares Music and Backup (hostname: lin1)

3) Windows Server/PC (hostname: win2)

You are using your Windows PC (win1), you open the Music share on the Linux server (lin1), and you see this:

NOTE: I had to scratch out the folder name to keep information private. Imagine this is the Music share (which has a lot of songs and artists from many different countries … hint hint)

What causes the above? Windows and Linux prefer different encodings. Windows uses UTF-16 for filenames, whereas Linux typically uses UTF-8, so sometimes the conversion doesn't go too well. The most likely cause (and in this case the actual cause) is a copy from another Windows server (say the hostname is win2) to this Linux server. There appears to be a bug with Samba where the copy doesn't convert filenames with international characters correctly. For example, the “é” could at least have turned into a plain “e”, but instead it turns into an upside-down question mark “?” with a hotdog on top, or it looks like this: “./Expos¦” (it can look many different ways depending on the encoding of your terminal, your browser, etc.).
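
To make the mismatch concrete, here is a small sketch (it assumes the iconv and od tools are installed, which is normal on Linux; the variable names are just for illustration). It shows that “é” is one byte in ISO-8859-1 but two bytes in UTF-8. A lone 0xE9 byte is not valid UTF-8, which is why a UTF-8 terminal draws a placeholder instead of the letter.

```shell
# "é" as a single ISO-8859-1 byte (0xE9), written as the octal escape \351
latin1_e=$(printf '\351')

# The same character converted to UTF-8
utf8_e=$(printf '%s' "$latin1_e" | iconv -f ISO-8859-1 -t UTF-8)

# Dump both as hex so the difference is visible on any terminal
latin1_hex=$(printf '%s' "$latin1_e" | od -An -tx1 | tr -d ' \n')
utf8_hex=$(printf '%s' "$utf8_e" | od -An -tx1 | tr -d ' \n')

echo "ISO-8859-1: $latin1_hex"   # prints: ISO-8859-1: e9
echo "UTF-8:      $utf8_hex"     # prints: UTF-8:      c3a9
```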

Solution 1 – best one? Log in to the Linux server and go directly to that share

# ======= #
# PREPARE #
# ======= #

# We are sitting at an SSH session on lin1 (where the badly named files are)

# Install the main app
apt-get install convmv

# Change to the directory with the bad folders/files
cd /data/Music
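
# Optional check before running convmv (a minimal sketch, not part of convmv): list every name under a directory that is not valid UTF-8. The function name list_non_utf8 is made up for this example. It assumes iconv is installed and exits with an error on invalid input (the glibc version does).

```shell
# Print every path under directory $1 whose name is not valid UTF-8.
list_non_utf8() {
  find "$1" -exec sh -c '
    for path do
      # iconv fails when the bytes of the path are not valid UTF-8
      printf "%s" "$path" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1 ||
        printf "%s\n" "$path"
    done' sh {} +
}

# Example (after the cd above):
# list_non_utf8 .
```

# If this prints nothing, the names are already valid UTF-8 and the problem is probably somewhere else (for example the terminal's own encoding).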

# ===================================== #
# Find bad files with Convmv (Readonly) #
# ===================================== #

# This next command will go recursively through all of the folders and show you how the conversion should go from one encoding to another

convmv -r -f ISO-8859-1 -t UTF-8 . 

# -r goes recursively through every subfolder, -f is the source encoding (the encoding the names are in now), and -t is the destination encoding (we want UTF-8). Remember, the problem is that the names got saved in the wrong encoding; in this case, on lin1 everything with international characters got saved as ISO-8859-1. Without --notest, convmv only prints what it would do and renames nothing.

# OR save it to a file at the same time
convmv -r -f ISO-8859-1 -t UTF-8 . | tee /data/Backup/badfilesinmusic.txt

# Save it to a file with errors saved as well
# bash 3:
convmv -r -f ISO-8859-1 -t UTF-8 . 2>&1 | tee /data/Backup/badfilesinmusic.txt
# bash 4 and later:
convmv -r -f ISO-8859-1 -t UTF-8 . |& tee /data/Backup/badfilesinmusic.txt

# NOTE ABOUT OUTPUT: it will look like a bunch of mv commands (move/rename commands) that never got run. Look at the output and make sure that the encoding conversion looks good.

# Meaning a file that looks like this:
# ./Expos▒ changes to ./Exposé
# Via command:
# mv ./Expos▒ ./Exposé
# ./Expos? changes to ./Exposé
# Via command:
# mv ./Expos? ./Exposé

# If it tries to change them to something that looks wrong, then try another source encoding:

# ==== OTHER ENCODINGS TO TRY ====
convmv -r -f cp850 -t UTF-8 . 
convmv -r -f windows-1252 -t UTF-8 . 
# WARNING: the next command includes --notest, so it actually renames the files. Remove --notest to preview first.
convmv -t utf8 --nfc -f iso-8859-1 --notest -r . 
# As per a forum entry: 
# "By this I mean, if, as OP, runs a Debian server, one certainly would assume UTF8 these days, in which case, one can keep the original letters. I had a folder of some nordic chars, and used: convmv -t utf8 --nfc -f iso-8859-1 --notest -r . – The --nfc was to conform to Linux ahead of OS X or so, simply typing convmv gives up the (useful) options."

# ======== #
# 2 Fixes  #
# ======== #

# (Fix1) - run the convmv command on lin1 with --notest (the --notest option makes convmv actually perform the renames)

convmv --notest -r -f ISO-8859-1 -t UTF-8 . 

# NOTE: --notest is not read-only; it actually performs the renames

# (Fix2) - Since all of this began on server win2 when the files got transferred to lin1 via Samba or whatever, let's just rename the files & folders manually on win2

# You have the list of bad files in the Backup share: Backup/badfilesinmusic.txt

# Just open it up in Windows with Notepad++ or WordPad (not Notepad; older versions of Notepad do not handle Linux line endings, so the file will look like one long continuous line of text)
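
# If only an older Notepad is available, one option is to write a second copy of the list with Windows line endings. This is a minimal sketch using awk; the function name to_crlf and the output filename are made up for this example, and the input is assumed to have plain Linux (LF) line endings.

```shell
# Copy file $1 to file $2, ending every line with CRLF instead of LF.
to_crlf() {
  awk '{ printf "%s\r\n", $0 }' "$1" > "$2"
}

# Example:
# to_crlf /data/Backup/badfilesinmusic.txt /data/Backup/badfilesinmusic-windows.txt
```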

 

Solution 2 – not the best one? Go into the win2 server and change all of the international characters, so that a name like “Exposé” becomes simply “Expose”. That way the bad conversion doesn't happen.

You also have the output in the Backup share on lin1 that tells you which files and folders need to be renamed

 
