# find greatest savings after fdupes run

Using fdupes to report on duplicate files, or even remove them automatically, is excellent. However, sometimes working on just a subset of the files can be done in 10% of the time and result in a 90% cleanup.

fdupesGreatestSavings is a Perl script that takes the output of fdupes (on
stdin) and determines which entries will result in the greatest savings,
whether that is 100 copies of a 1 MB file (99 MB recoverable) or 2 copies of
a 20 GB file (20 GB recoverable). The output is sorted from greatest savings
to least.

fdupesGreatestSavings takes one parameter, the number of entries to display.
It accepts input from stdin and sends output to stdout, so it is a filter.

> fdupes must be run with only the flags --recurse and --size, though
> --recurse is optional.
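
With those flags, each set of duplicates is preceded by its size, which is what fdupesGreatestSavings parses. The input typically looks something like this (the paths are only illustrative):

    1048576 bytes each:
    /home/alice/photos/img_0001.jpg
    /home/alice/backup/img_0001.jpg

    20971520 bytes each:
    /srv/iso/debian.iso
    /home/bob/downloads/debian.iso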

The following command will look through the entire file system on a Unix
machine and report the top 10 duplicates it finds.

    fdupes --recurse --size / | ./fdupesGreatestSavings 10

If you want to save the results (for other processing), you could do
something like

    fdupes --recurse --size /path/to/be/checked > /tmp/duplicate_files
    fdupesGreatestSavings 100 < /tmp/duplicate_files > /tmp/fdupe.savings
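
For reference, the core selection logic amounts to something like the Perl sketch below. This is not the actual script: the parsing of the "bytes each:" headers, the default of 10 entries, and the report format are assumptions made for illustration.

    #!/usr/bin/env perl
    # Sketch only: rank duplicate sets from `fdupes --recurse --size` (stdin)
    # by the space reclaimed if each set were reduced to a single copy.
    use strict;
    use warnings;

    my $top = shift @ARGV || 10;    # number of entries to display (assumed default)

    my ( @sets, $current );
    while ( my $line = <STDIN> ) {
        chomp $line;
        if ( $line =~ /^(\d+)\s+bytes?\s+each/ ) {    # size header starts a new set
            $current = { size => $1, files => [] };
            push @sets, $current;
        }
        elsif ( $line =~ /\S/ && $current ) {         # a file path in the current set
            push @{ $current->{files} }, $line;
        }
        else {                                        # blank line ends the set
            $current = undef;
        }
    }

    # Savings for a set = size * (copies - 1): keep one copy, remove the rest.
    $_->{savings} = $_->{size} * ( @{ $_->{files} } - 1 ) for @sets;

    for my $set ( ( sort { $b->{savings} <=> $a->{savings} } @sets )[ 0 .. $top - 1 ] ) {
        last unless defined $set;    # fewer sets than requested
        printf "%d bytes recoverable: %d copies of a %d-byte file\n",
            $set->{savings}, scalar @{ $set->{files} }, $set->{size};
        print "    $_\n" for @{ $set->{files} };
    }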

#### Downloading

The script is available via Subversion:

    svn co http://svn.dailydata.net/svn/sysadmin_scripts/trunk/fdupes

#### Bugs

The only bug I have found so far is that the reported count of files in an
entry is incorrect (always 2), and I haven't tracked it down. The total space
used by an entry is correct, however.