Remove Invalid Characters From Filenames
########################################

The sed regular expression used below comes from this Server Fault question: http://serverfault.com/questions/348482/how-to-remove-invalid-characters-from-filenames

I have another article on converting file names between encodings that might fit your needs: converting between encodings

Intro

Imagine a file with this name: 009_-_�%86ndringshåndtering.html

Or suppose you need to convert a full path to a single filename, for example /data/folder/file.txt to _data_folder_file.txt. This is useful in scripts, especially one that operates on folders and should name its log file after the folder path. Say you write your own defrag tool and run it on a folder called /data/folder; the log file can then be called defrag_log_data_folder.log. This matters because a filename cannot contain a / character. Try creating a file called defrag_log_/data/folder.log and you will see it fail, because every / is treated as a directory separator.

How do you rename it so that it contains only plain ASCII characters and can sit safely as a file?

Use sed: tell sed to replace every character that is not A-Z, a-z, 0-9, a dot, an underscore or a dash with an underscore.

Answer: mv 'file' "$(echo 'file' | sed -e 's/[^A-Za-z0-9._-]/_/g')"

If the filename variable is set to the filename (quote it, so that spaces and wildcard characters in the name survive):

echo "$filename" | sed -e 's/[^A-Za-z0-9._-]/_/g'
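The one-liner above can be wrapped in a small function so it is easy to reuse. This is only a sketch: the name sanitize_name is made up for illustration, and LC_ALL=C is my own addition so that sed works byte by byte (a multi-byte character such as å then becomes one underscore per byte, and broken encodings are still replaced). It works on one line at a time, so a filename containing a newline is not handled.

```shell
#!/bin/sh
# sanitize_name is a hypothetical helper, not something from the original article.
# It prints its first argument with every byte outside A-Z a-z 0-9 . _ -
# replaced by an underscore.
sanitize_name() {
    # printf is used instead of echo so names starting with "-" or
    # containing backslashes are passed through unchanged.
    printf '%s\n' "$1" | LC_ALL=C sed -e 's/[^A-Za-z0-9._-]/_/g'
}

sanitize_name '/data/folder/file.txt'            # prints _data_folder_file.txt
sanitize_name 'this|file|has|pipe|symbols.txt'   # prints this_file_has_pipe_symbols.txt
```

To rename a file with it, something like mv -- "$old" "$(sanitize_name "$old")" should do, as long as $old is a bare filename and not a path.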

Here is how you operate on full paths, as mentioned earlier. This is also useful for converting a full file path to a string so that it can be appended to another filename, for example:

# Example:
(filename='/data/some random path@#$/to a file which can be made you knowwhat i mean.txt'
echo "old file:"
echo "$filename"
echo "new file:"
# $(...) is the modern form of backticks; the quotes keep spaces intact
newname=$(echo "$filename" | sed -e 's/[^A-Za-z0-9._-]/_/g')
newfile="/newlocation/rename-to-this-single-file-$newname"
echo "$newfile")

# OUTPUT:
# old file:
# /data/some random path@#$/to a file which can be made you knowwhat i mean.txt
# new file:
# /newlocation/rename-to-this-single-file-_data_some_random_path____to_a_file_which_can_be_made_you_knowwhat_i_mean.txt
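One thing to keep in mind with this trick is that it is lossy: /data/a b and /data/a_b both flatten to _data_a_b. If that matters for your log names, one possible workaround (my own assumption, not part of the original recipe) is to append a checksum of the original path. The helper name flat_name is made up, and cksum is the standard POSIX CRC tool.

```shell
#!/bin/sh
# flat_name is a hypothetical helper: it flattens a path into a single
# filename-safe string and appends the CRC of the original path, so that
# two different paths do not end up with the same flattened name.
flat_name() {
    clean=$(printf '%s\n' "$1" | LC_ALL=C sed -e 's/[^A-Za-z0-9._-]/_/g')
    # cksum reading stdin prints "<crc> <byte count>"; keep only the crc
    sum=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    printf '%s-%s\n' "$clean" "$sum"
}

# Both lines start with _data_a_b- but end in different numbers.
flat_name '/data/a b'
flat_name '/data/a_b'
```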

How to convert many bad file and folder names to good ones:

First download rename.pl and edit the script so that its output is easier to read, as I have done here: beautifying rename.pl script

SIDENOTE: here is an interesting article showing how you can use the rename.pl script to rename movie folders

Now go to your folder:

cd /folder

Now run this command. It will not harm anything (because of the -n dry-run option), but it will list all of the files and folders (in this folder, not recursively) with questionable characters:

rename -n 's/[^A-Za-z0-9._-]/_/g' *

What is happening here: we tell rename to look through all files and hypothetically replace every character that is not an upper or lower case letter, a digit, a dot, an underscore or a hyphen with an underscore. Hypothetically, because the -n dry-run option keeps it from committing any change. Here we convert everything to an underscore, but you can convert to a hyphen instead (rename -n 's/[^A-Za-z0-9._-]/-/g' *) or to a space (rename -n 's/[^A-Za-z0-9._-]/ /g' *).

Don’t remove the -n and run it, unless you want all of those changes to commit.
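If the Perl rename tool is not installed, the same kind of dry run can be approximated in plain shell. The following is a rough sketch, not a replacement for rename -n: the function name preview_renames is made up, it only looks at one folder (not recursive, hidden files skipped), and it never touches the files.

```shell
#!/bin/sh
# preview_renames is a hypothetical helper. It prints "old -> new" for every
# entry in the given folder whose name would change, and renames nothing.
preview_renames() {
    for path in "$1"/*; do
        [ -e "$path" ] || continue      # the glob matched nothing
        old=${path##*/}                 # strip the folder part
        new=$(printf '%s\n' "$old" | LC_ALL=C sed -e 's/[^A-Za-z0-9._-]/_/g')
        [ "$old" = "$new" ] || printf '%s -> %s\n' "$old" "$new"
    done
}

# Demo in a throwaway folder
demo=$(mktemp -d)
touch "$demo/a b.txt" "$demo/ok.txt"
preview_renames "$demo"                 # prints: a b.txt -> a_b.txt
rm -r "$demo"
```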

Study the output and find a file that you want to rename. Then put its name in like this:

rename -v 's/[^A-Za-z0-9._-]/_/g' 'this|file|has|pipe|symbols.txt'

NOTE: Windows will show the above file under a mangled name such as A~2+34GB, because | is not a valid filename character there, but Linux will show it as is.

Notice that without the -n option it will commit the change, so it is a good idea to log everything.

I would keep track of all of my changes like this (tee -a appends to the renames.log file and creates it if it does not exist; tee is like > and tee -a is like >>, except that you also get the output on screen):

rename -v 's/[^A-Za-z0-9._-]/_/g' 'this|file|has|pipe|symbols.txt' | tee -a /var/log/renames.log
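The same commit-and-log idea can be sketched in plain shell for a single folder. Everything here is an assumption for illustration: the function name rename_logged, its two arguments, and the message format (which only loosely imitates the verbose output of rename). Unlike a bare mv, it refuses to overwrite an existing file.

```shell
#!/bin/sh
# rename_logged is a hypothetical helper, not part of the Perl rename tool.
# Usage: rename_logged DIR LOGFILE
rename_logged() {
    dir=$1
    log=$2
    for path in "$dir"/*; do
        [ -e "$path" ] || continue          # the glob matched nothing
        old=${path##*/}
        new=$(printf '%s\n' "$old" | LC_ALL=C sed -e 's/[^A-Za-z0-9._-]/_/g')
        [ "$old" = "$new" ] && continue     # name is already clean
        if [ -e "$dir/$new" ]; then
            printf 'skipped %s: %s already exists\n' "$old" "$new"
        else
            mv -- "$path" "$dir/$new" && printf '%s renamed as %s\n' "$old" "$new"
        fi
    done | tee -a "$log"                    # show on screen and append to the log
}

# Demo in throwaway locations
demo=$(mktemp -d)
log=$(mktemp)
touch "$demo/this|file|has|pipe|symbols.txt"
rename_logged "$demo" "$log"
rm -r "$demo" "$log"
```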

NOTE: The rename tool I am using is the Perl rename tool, sometimes called prename, rename.pl, or simply rename. (Sidenote: there is another tool also called rename, from the util-linux package, which takes different arguments and is not as good as the one we work with here; search for "rename util-linux".) More information on Perl rename and the not-as-good util-linux rename: http://www.infotinks.com/renameperl/

Maybe your files only have one or two offending characters.

Say all of your files have the | character and you want to convert it to an underscore _. Note that | means alternation in a Perl regular expression, so it has to be escaped as \| to match a literal pipe.

#### working in current directory (not recursive) ####
# first do a dry run:
rename -n 's/\|/_/g' *
# then commit the change, if you like what you see
rename 's/\|/_/g' *
# or run it verbose to see the changes (and save them with tee)
rename -v 's/\|/_/g' * | tee -a /var/log/renames.log

#### working recursively in the current folder ####
# -depth handles children before their parents, and -execdir runs rename
# inside each parent folder, so only the last part of the path is changed
# **STEP1**
# - first we will change the folders & then the files - #
# first do a dry run:
find . -depth -type d -execdir rename -n 's/\|/_/g' {} \;
# commit the changes silently if you like what you see
find . -depth -type d -execdir rename 's/\|/_/g' {} \;
# or commit them and see the changes
find . -depth -type d -execdir rename -v 's/\|/_/g' {} \; | tee -a /var/log/renames.log
# **STEP2**
# - now we change the files in all of the folders - #
# first do a dry run:
find . -type f -execdir rename -n 's/\|/_/g' {} \;
# commit the changes silently if you like what you see
find . -type f -execdir rename 's/\|/_/g' {} \;
# or commit them and see the changes
find . -type f -execdir rename -v 's/\|/_/g' {} \; | tee -a /var/log/renames.log
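For comparison, here is a rough sketch of a recursive pipe-to-underscore rename that uses only find, sed and mv, handling folders and files in one pass. The function name rename_tree is made up. The important details are -depth (so a folder's contents are renamed before the folder itself) and changing only the last path component. Note that in sed's basic regular expressions a bare | is a literal pipe, unlike in Perl.

```shell
#!/bin/sh
# rename_tree is a hypothetical helper. It replaces | with _ in the names of
# all files and folders below the given folder, deepest entries first.
rename_tree() {
    find "$1" -depth -name '*|*' -exec sh -c '
        for path do
            dir=${path%/*}      # parent folder, left untouched
            old=${path##*/}     # last path component, the part we rename
            new=$(printf "%s\n" "$old" | sed -e "s/|/_/g")
            if [ -e "$dir/$new" ]; then
                printf "skipped %s: target exists\n" "$path"
            else
                mv -- "$path" "$dir/$new" && printf "%s -> %s\n" "$path" "$dir/$new"
            fi
        done' sh {} +
}

# Demo in a throwaway folder
demo=$(mktemp -d)
mkdir -p "$demo/a|b/c|d"
touch "$demo/a|b/c|d/f|g.txt"
rename_tree "$demo"
find "$demo" -type f            # the file is now at .../a_b/c_d/f_g.txt
rm -r "$demo"
```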

If you have a couple of trouble characters, say the pipe | and the colon :, you can work on both at once. The slow method is to rerun the above commands for the other character; the fast way is to have the regular expression match either | or : and convert both to _.

#### NOTES ####
# before we were looking only for a pipe | using this regular expression
# (the pipe is escaped, because a bare | means alternation in Perl):
rename -n 's/\|/_/g' *
# now to look for colons we would do this:
rename -n 's/:/_/g' *
# and to look for both we would do this (inside [] the pipe needs no escape):
rename -n 's/[|:]/_/g' *
# or this (order doesn't matter inside a [] character class):
rename -n 's/[:|]/_/g' *
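The character class can be tried out on a sample name with sed before pointing rename at real files. A minimal sketch (the sample name is made up; sed -E selects extended regular expressions and is available in GNU and BSD sed):

```shell
#!/bin/sh
# Made-up sample name containing both trouble characters.
name='a|b:c.txt'

# Character class: matches either character, in any order.
printf '%s\n' "$name" | sed -e 's/[|:]/_/g'       # prints a_b_c.txt
printf '%s\n' "$name" | sed -e 's/[:|]/_/g'       # prints a_b_c.txt

# Same result with alternation in extended syntax; here the literal pipe
# must be escaped, and the bare | separates the two alternatives.
printf '%s\n' "$name" | sed -E -e 's/\||:/_/g'    # prints a_b_c.txt
```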

#### working in current directory (not recursive) ####
# first do a dry run:
rename -n 's/[|:]/_/g' *
# then commit the change, if you like what you see
rename 's/[|:]/_/g' *
# or run it verbose to see the changes (and save them with tee)
rename -v 's/[|:]/_/g' * | tee -a /var/log/renames.log

#### working recursively in the current folder ####
# -depth handles children before their parents, and -execdir runs rename
# inside each parent folder, so only the last part of the path is changed
# **STEP1**
# - first we will change the folders & then the files - #
# first do a dry run:
find . -depth -type d -execdir rename -n 's/[|:]/_/g' {} \;
# commit the changes silently if you like what you see
find . -depth -type d -execdir rename 's/[|:]/_/g' {} \;
# or commit them and see the changes
find . -depth -type d -execdir rename -v 's/[|:]/_/g' {} \; | tee -a /var/log/renames.log
# **STEP2**
# - now we change the files in all of the folders - #
# first do a dry run:
find . -type f -execdir rename -n 's/[|:]/_/g' {} \;
# commit the changes silently if you like what you see
find . -type f -execdir rename 's/[|:]/_/g' {} \;
# or commit them and see the changes
find . -type f -execdir rename -v 's/[|:]/_/g' {} \; | tee -a /var/log/renames.log

The End
