# find greatest savings after fdupes run
 
fdupes is excellent for reporting duplicate files, or even removing them automatically. Often, though, dealing with just a subset of the duplicates takes 10% of the time and yields 90% of the cleanup.
 
fdupesGreatestSavings is a Perl script that takes the output of fdupes (on stdin) and determines which entries will result in the greatest savings, whether it is 100 copies of a 1 MB file or 2 copies of a 20 GB file. The output is sorted from greatest savings to least.
 
fdupesGreatestSavings takes one parameter, the number of entries to display. It accepts input from stdin and sends output to stdout, so it is a filter.
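
The ranking described above can be sketched in a few lines of Python. This is not the Perl script itself, only an illustration. It assumes that `fdupes --size` prints a header such as `1048576 bytes each:` before each set of duplicates and separates sets with a blank line. The names `parse_groups` and `greatest_savings` are hypothetical. The space saved by a set is taken to be the file size times the number of copies beyond the first.

```python
import re

# Assumed header format printed by `fdupes --size`, e.g. "1048576 bytes each:"
HEADER = re.compile(r"^(\d+) bytes? each:$")


def parse_groups(lines):
    """Yield (size_in_bytes, [paths]) for each set of duplicates."""
    size, paths = None, []
    for raw in lines:
        line = raw.rstrip("\n")
        match = HEADER.match(line)
        if match:
            if paths:
                # A new header also closes any set still open.
                yield size, paths
            size, paths = int(match.group(1)), []
        elif line.strip() and size is not None:
            paths.append(line)
        elif size is not None:
            # A blank line ends the current set.
            if paths:
                yield size, paths
            size, paths = None, []
    if size is not None and paths:
        yield size, paths


def greatest_savings(lines, limit):
    """Return the `limit` sets that free the most space, largest first."""
    ranked = []
    for size, paths in parse_groups(lines):
        # Keeping one copy frees the space used by all the others.
        saved = size * (len(paths) - 1)
        ranked.append((saved, size, paths))
    ranked.sort(key=lambda entry: entry[0], reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    # Made-up sample input: three 100-byte copies and two 500-byte copies.
    sample = "100 bytes each:\n/a\n/b\n/c\n\n500 bytes each:\n/x\n/y\n\n"
    for saved, size, paths in greatest_savings(sample.splitlines(), 1):
        print(f"{saved} bytes saved: {len(paths)} copies of {size} bytes")
```

To use the sketch as a filter, pass `sys.stdin` as `lines` and `int(sys.argv[1])` as `limit`.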
 
> fdupes must be run with --size and no other flags except --recurse,
> which is optional.
 
The following command will look through the entire file system on a Unix machine and report the top 10 duplicates it finds.
 
    fdupes --recurse --size / | ./fdupesGreatestSavings 10
 
If you want to save the results (for other processing), you could do something like
 
    fdupes --recurse --size /path/to/be/checked > /tmp/duplicate_files
    fdupesGreatestSavings 100 < /tmp/duplicate_files > /tmp/fdupe.savings
 
#### Downloading

The script is available via Subversion at
    svn co http://svn.dailydata.net/svn/sysadmin_scripts/trunk/fdupes
 
#### Bugs
 
The only bug I have found so far is that the reported number of files is incorrect (always 2), and I haven't tracked down the cause. The total space used by an entry is correct, however.