Using fdupes to report on duplicate files, or even remove them automatically, is excellent. Sometimes, however, working on just a subset of the files takes 10% of the time and yields 90% of the cleanup.
fdupesGreatestSavings is a Perl script that takes the output of fdupes (on stdin) and determines which entries will yield the greatest savings, whether that is 100 copies of a 1 MB file or 2 copies of a 20 GB file. The output is sorted from greatest savings to least.
fdupesGreatestSavings takes one parameter, the number of entries to display. It accepts input from stdin and sends output to stdout, so it is a filter.
fdupes must be run with only the --recurse and --size flags, and --recurse itself is optional.
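The script itself is not reproduced here, but the core idea is simple enough to sketch. The following is a minimal illustration (not the actual fdupesGreatestSavings code), assuming fdupes --size precedes each duplicate set with a "<size> bytes each:" header and separates sets with blank lines; the output format below is purely illustrative.

#!/usr/bin/perl
# Sketch only: compute potential savings per duplicate set from fdupes --size output.
use strict;
use warnings;

my $top = shift // 10;          # number of entries to display
my (@sets, $size, @files);

while (my $line = <STDIN>) {
    chomp $line;
    if ($line =~ /^(\d+)\s+bytes?\s+each/) {   # assumed header format for a new set
        $size  = $1;
        @files = ();
    } elsif ($line eq '') {                    # blank line ends the current set
        push @sets, { size => $size, files => [@files] } if @files;
        @files = ();
    } else {
        push @files, $line;                    # one file path per line
    }
}
push @sets, { size => $size, files => [@files] } if @files;

# Savings for a set = size of one copy times the number of redundant copies.
$_->{savings} = $_->{size} * (scalar(@{ $_->{files} }) - 1) for @sets;

# Sort by greatest savings and print the top N.
for my $s ((sort { $b->{savings} <=> $a->{savings} } @sets)[0 .. $top - 1]) {
    last unless defined $s;
    printf "%12d bytes recoverable (%d copies of %d bytes)\n",
        $s->{savings}, scalar(@{ $s->{files} }), $s->{size};
    print "    $_\n" for @{ $s->{files} };
}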
The following command will search the entire file system on a Unix machine and report the 10 duplicate sets whose removal would save the most space.
fdupes --recurse --size / | ./fdupesGreatestSavings 10
If you want to save the results (for other processing), you could do something like
fdupes --recurse --size /path/to/be/checked > /tmp/duplicate_files
fdupesGreatestSavings 100 < /tmp/duplicate_files > /tmp/fdupe.savings
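Since the fdupes scan is by far the slowest step, the saved file can be fed back through the filter with a different cutoff without re-scanning the disk, for example:

fdupesGreatestSavings 10 < /tmp/duplicate_files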
The script is available via Subversion:
svn co http://svn.dailydata.net/svn/sysadmin_scripts/trunk/fdupes
The only bug I have found so far is that the reported count of files in each entry is incorrect (always 2); I haven't tracked it down yet. The total space used by an entry is correct, however.