# Find greatest savings after an fdupes run

Using fdupes to report on duplicate files, or even remove them
automatically, is excellent. However, working on just a subset of the files
can often be done in 10% of the time and yield 90% of the cleanup.

fdupesGreatestSavings is a Perl script that reads the output of fdupes (on
stdin) and determines which entries will yield the greatest savings, whether
that is 100 copies of a 1 MB file or 2 copies of a 20 GB file (roughly 99 MB
recoverable versus 20 GB recoverable, respectively). The output is sorted
from greatest savings to least.

fdupesGreatestSavings takes one parameter, the number of entries to display.
It accepts input from stdin and sends output to stdout, so it is a filter.

> fdupes must be run with only the flags --recurse and --size, though
> --recurse is optional.

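The heart of the idea is a simple calculation: for each duplicate set, the
recoverable space is the size of one copy multiplied by the number of copies
that could be removed. The following is a minimal sketch of that idea in
Perl, not the actual script; it assumes fdupes --size output in which each
set begins with a line such as "1048576 bytes each:", the member files
follow one per line, and sets are separated by blank lines.

    #!/usr/bin/perl
    # Minimal sketch of the greatest-savings calculation, not the actual
    # fdupesGreatestSavings script. Assumes fdupes --size output on stdin.
    use strict;
    use warnings;

    my $top = shift @ARGV || 10;    # number of entries to display

    my @sets;
    my $size  = 0;
    my @files = ();

    while ( my $line = <STDIN> ) {
        chomp $line;
        if ( $line =~ /^(\d+)\s+bytes?\s+each/ ) {    # size line starts a set
            push @sets, { size => $size, files => [@files] } if @files;
            $size  = $1;
            @files = ();
        } elsif ( $line eq '' ) {                     # blank line ends a set
            push @sets, { size => $size, files => [@files] } if @files;
            @files = ();
        } else {
            push @files, $line;                       # one member of the set
        }
    }
    push @sets, { size => $size, files => [@files] } if @files;   # last set

    # Savings = size of one copy times the number of removable copies.
    $_->{savings} = $_->{size} * ( @{ $_->{files} } - 1 ) for @sets;

    # Print the sets with the greatest savings first.
    my @sorted = sort { $b->{savings} <=> $a->{savings} } @sets;
    foreach my $set ( @sorted[ 0 .. $top - 1 ] ) {
        last unless defined $set;
        printf "%12d bytes recoverable (%d copies) %s\n",
            $set->{savings}, scalar @{ $set->{files} }, $set->{files}[0];
    }
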
The following command will scan the entire file system on a Unix machine and
report the 10 duplicate sets offering the greatest savings.

    fdupes --recurse --size / | ./fdupesGreatestSavings 10

If you want to save the results (for other processing), you could do
something like

    fdupes --recurse --size /path/to/be/checked > /tmp/duplicate_files
    fdupesGreatestSavings 100 < /tmp/duplicate_files > /tmp/fdupe.savings

#### Downloading

The script is available via Subversion:

    svn co http://svn.dailydata.net/svn/sysadmin_scripts/trunk/fdupes

#### Bugs

The only bug I have found so far is that the reported count of files in an
entry is incorrect (always 2), and I haven't tracked it down yet. The total
space used by an entry is correct, however.