The content below is sourced from the links given throughout. Most of it is not my original content, and I'm giving credit via those links.

Windows Remove EMPTY Folders:
##############################
##############################

http://sourceforge.net/projects/rem-empty-dir/

Or you can use the Linux methods below on Windows, if you have Cygwin installed (which gives you Linux commands on your Windows system).
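For example, from a Cygwin terminal your Windows drives appear under /cygdrive (such as /cygdrive/c), so a quick sketch of cleaning empty folders out of the D: drive (the path here is just an illustration) would be to preview first:

# find /cygdrive/d -type d -empty

and then delete:

# find /cygdrive/d -type d -empty -delete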


 

Windows Dealing with duplicates
###############################
###############################

using linux tools in windows:
========================

You can try the commands below on Windows through Cygwin; all of your local drives live under the /cygdrive folder, e.g. /cygdrive/c, /cygdrive/d, etc.
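So, as a sketch (the folder name is just an example, and this assumes fdupes is installed or built inside your Cygwin environment), finding duplicates on a Windows drive could look like:

# cd /cygdrive/d/videos
# fdupes -r .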

The fdupes source code is available online; you might be able to compile it with a native Windows gcc/C compiler, or under Cygwin with its gcc/C toolchain.
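If you do try building it, a typical from-source build looks roughly like the following (only a sketch: the tarball name is just a placeholder, and depending on the fdupes release you may need to run ./configure before make):

# tar xzf fdupes-<version>.tar.gz
# cd fdupes-<version>
# make
# make install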

fdupes uses MD5 sums to look for duplicates, so it finds exact duplicates only.

See the commands further below for how to run fdupes.
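If you want a quick sanity check that it really only matches exact, byte-for-byte duplicates, try something like this in a scratch folder (paths and filenames here are just examples):

# mkdir /tmp/dupetest
# echo "hello" > /tmp/dupetest/a.txt
# cp /tmp/dupetest/a.txt /tmp/dupetest/b.txt
# echo "hello there" > /tmp/dupetest/c.txt
# fdupes /tmp/dupetest

fdupes should report a.txt and b.txt as a duplicate set and leave c.txt out, since its contents (and md5 sum) differ.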

using windows tools:
=================

Also there is the following free app:
Duplicate Cleaner Free
It looks for duplicate files based on MD5 sums.
http://www.digitalvolcano.co.uk/duplicatecleaner.html

It might be worth getting the full version, as it has an interesting feature that finds similar images using image-processing algorithms.


 

REMOVE EMPTY FOLDERS IN LINUX
###############################
###############################

citation:

http://www.itworld.com/it-managementstrategy/67200/recursively-removing-empty-directories

http://www.commandlinefu.com/commands/view/5131/recursively-remove-all-empty-directories

Method 1 – with find
====================

See what it will delete:

# find . -type d -empty

Then delete:

# find . -type d -empty -delete

OR:

Delete with a prompt for each folder. Note that plain rm -i will refuse to remove directories, so use find's -ok action, which asks for confirmation before running rmdir on each empty folder:

# find . -type d -empty -ok rmdir {} \;

 

Method 2 – with find and rmdir
===============================

You can also use rmdir, which only deletes empty folders. Remember that rmdir will complain if you try to delete a folder that has anything in it, so by default it removes nothing but empty folders:

# find . -type d | xargs rmdir

That spits an error for every non-empty folder, so silence them like this:

# find . -type d | xargs rmdir 2> /dev/null

or use rmdir's --ignore-fail-on-non-empty option so it keeps quiet about the folders it cannot remove:

# find . -type d | xargs rmdir --ignore-fail-on-non-empty

 

Method 3 – delete empty parent folders too
===========================================

But what about the parents of those folders? What if they become empty once their contents are removed? You could repeat the commands above, or use rmdir's -p option, which also removes each folder's now-empty parent directories:

# find . -type d | xargs rmdir -p --ignore-fail-on-non-empty

 

or

# find . -type d | xargs rmdir -p 2> /dev/null

 

or, if you like seeing the errors ("Directory not empty" messages, which in essence tell you which folders are not being deleted), just run "# find . -type d | xargs rmdir -p" without redirecting stderr.
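Another way to handle the "repeat the commands" idea from above is a small loop that keeps deleting until no empty folders remain (a sketch; -mindepth 1 keeps it from trying to remove the starting directory itself):

# while [ -n "$(find . -mindepth 1 -type d -empty)" ]; do find . -mindepth 1 -type d -empty -delete; done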


 

Linux Dealing with DUPLICATES
################################
################################

From commandlinefu.com:
http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash

I will separate the entries with divider lines.

FDUPES
#######

This little fdupes section is mine (I wrote it, unlike the sections below, which are excerpts from commandlinefu.com).

# apt-get install fdupes

NOTE: with these command-line arguments fdupes will not delete your duplicates (to delete, use the -d argument; see the example at the end of this section). To see all options, use "fdupes --help".

cd to the folder in which you want to find duplicates:

# cd /videos
# fdupes -r .

The -r option (or -R, which works similarly) makes the search recursive; otherwise fdupes only looks for duplicates within the given directory itself.

fdupes prints a newline-separated list of files (with a blank line between duplicate sets), and the shell will split that list into arguments for md5sum. So pump the output into md5sum via command substitution (not through a pipe); note that this simple form will break on filenames containing spaces:

# md5sum $(fdupes -r .)
# md5sum `fdupes -r .`
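Finally, as noted above, the listing commands never remove anything. When you actually want to delete, the -d argument walks through each duplicate set and prompts you for which copy to keep (some versions also accept -N alongside -d to keep the first file of each set without prompting; check fdupes --help for your version):

# fdupes -r -d .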

—————————-
—————————-

# find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

Find Duplicate Files (based on size first, then MD5 hash)

This dup finder saves time by comparing sizes first and only computing md5sums for files of matching sizes; it doesn't delete anything, it just lists the duplicates.
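To make it easier to follow, here is the same pipeline split across lines, with a comment on what each stage does (GNU find and coreutils assumed):

find -not -empty -type f -printf "%s\n" |           # print the size in bytes of every non-empty file
  sort -rn |                                        # sort the sizes numerically, largest first
  uniq -d |                                         # keep only sizes that occur more than once
  xargs -I{} -n1 find -type f -size {}c -print0 |   # for each such size, list the files of exactly that size
  xargs -0 md5sum |                                 # md5sum those candidate files
  sort |                                            # group identical hashes next to each other
  uniq -w32 --all-repeated=separate                 # print only groups whose first 32 chars (the hash) repeat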

—————————-
—————————-

# find -type f -exec md5sum '{}' ';' | sort | uniq -w32 --all-repeated=separate | cut -c 35-

Find Duplicate Files (based on MD5 hash)

Calculates the md5 sum of each file, sorts the output (required for uniq to work), runs uniq on only the hash, then uses cut to remove the hash from the result.

—————————-
—————————-

# fdupes -r .

Find Duplicate Files (based on size first, then MD5 hash)

If you have the fdupes command, you’ll save a lot of typing. It can do recursive searches (-r,-R) and it allows you to interactively select which of the duplicate files found you wish to keep or delete.

—————————-
—————————-

# find -type d -name ".svn" -prune -o -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type d -name ".svn" -prune -o -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

Find Duplicate Files, excluding .svn-directories (based on size first, then MD5 hash)

Improvement of the command "Find Duplicate Files (based on size first, then MD5 hash)" for searching for duplicate files in a directory that contains a Subversion working copy. This way the (many duplicated) files in the metadata directories are ignored.
It can easily be adapted for other version control systems as well. For CVS, for example, change ".svn" into "CVS":

find -type d -name "CVS" -prune -o -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type d -name "CVS" -prune -o -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

—————————-
—————————-

# find . -type f -size +0 -printf "%-25s%p\n" | sort -n | uniq -D -w 25 | sed 's/^\w* *\(.*\)/md5sum "\1"/' | sh | sort | uniq -w32 --all-repeated=separate

Find Duplicate Files (based on size first, then MD5 hash)

Avoids the nested ‘find’ commands but doesn’t seem to run any faster than syssyphus’s solution.

—————————-
—————————-

# find . -type f -not -empty -printf "%-25s%p\n"|sort -n|uniq -D -w25|cut -b26-|xargs -d"\n" -n1 md5sum|sed "s/ /\x0/"|uniq -D -w32|awk -F"\0" 'BEGIN{l="";}{if(l!=$1||l==""){printf "\n%s\0",$1}printf "\0%s",$2;l=$1}END{printf "\n"}'|sed "/^$/d"

Find Duplicate Files (based on size first, then MD5 hash)

* Find all file sizes and file names from the current directory down (replace “.” with a target directory as needed).
* sort the file sizes in numeric order
* List only the duplicated file sizes
* drop the file sizes so there is simply a list of files (retaining order)
* calculate md5sums on all of the files
* replace the first instance of two spaces (md5sum output) with a \0
* drop the unique md5sums so only duplicate files remain listed
* Use AWK to aggregate identical files on one line.
* Remove the blank line from the beginning (This was done more efficiently by putting another “IF” into the AWK command, but then the whole line exceeded the 255 char limit).
>>>> Each output line contains the md5sum and then all of the files that have that identical md5sum. All fields are \0 delimited. All records are \n delimited.
