Unique contribution of a folder to disk usage
I have a backup containing folders for daily snapshots. To save space, identical files in different snapshots are deduplicated via hard links (generated by rsync).
When I'm running out of space, one option is to delete older snapshots. But because of the hard links, it is hard to figure out how much space I would gain by deleting a given snapshot.
One option I can think of would be to use du -s first on all snapshot folders, then on all but the one I might delete; the difference would give me the expected gained space. However, that's quite cumbersome and would have to be repeated when I'm trying to find a suitable snapshot for deletion.
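For concreteness, the kind of comparison I have in mind looks like this (just a rough sketch; snapshot-* and the 2018-09-01 candidate are placeholder names, not my real folder layout):

du -cs snapshot-*/ | tail -n 1                                # grand total over all snapshots
du -cs $(ls -d snapshot-*/ | grep -v 2018-09-01) | tail -n 1  # grand total without the candidate
# du counts each hard-linked file only once per invocation, so the difference
# between the two grand totals is the space the candidate would free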
Is there an easier way?
After trying out and thinking about the answers by Stéphane Chazelas and derobert, I realized that my question was not precise enough. Here's an attempt to be more precise:
I have a set of directories ("snapshots") whose files are partially storage-identical with (hard linked to) files in other snapshots. I'm looking for a solution that gives me a list of the snapshots and, for each one, the amount of disk storage taken up by its files, excluding storage that is also used by a file in another snapshot. The solution should allow for the possibility that there are hard links within each snapshot, too.
The idea is that I can look at that list to decide which of the snapshots to delete when I run out of space, weighing the storage space gained by deletion against the value of the snapshot (e.g. based on age).
disk-usage hard-link
asked Oct 31 at 20:02 by A. Donda, edited Nov 17 at 21:54
See also unix.stackexchange.com/questions/52876/…
– derobert
Nov 2 at 17:09
But again, looking at the unique disk usage of directories in isolation is not necessarily useful. You may find that deleting dir1 saves nothing, that deleting dir2 saves nothing either, but that deleting both saves terabytes because they have large files in common that are not found elsewhere.
– Stéphane Chazelas
Nov 18 at 17:12
3 Answers
Accepted answer (2 votes)
answered Oct 31 at 21:54 by Stéphane Chazelas, edited Nov 18 at 16:47
You could do it by hand with GNU find:
find snapshot-dir -type d -printf '1 %b\n' -o -printf '%n %b %i\n' |
  awk '$1 == 1 || ++c[$3] == $1 {t+=$2; delete c[$3]}
       END{print t*512}'
That counts the disk usage of files whose link count would drop to 0 once all the links found in the snapshot directory are removed.
find prints:
  1 <disk-usage>                             for directories
  <link-count> <disk-usage> <inode-number>   for other types of files.
We pretend the link count is always one for directories because, when in practice it isn't, that is due to the .. entries of subdirectories; find doesn't list those entries, and directories generally don't have other hard links.
From that output, awk adds up the disk usage of the entries that have a link count of 1 and of the inodes it has seen <link-count> times (that is, the ones all of whose hard links are inside the given directory and which, like the ones with a link count of one, would have their space reclaimed once the directory tree is deleted).
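For reference, here is the awk part again, just annotated (same logic, nothing changed):

awk '
    # input: "1 <blocks>" for directories, "<link-count> <blocks> <inode>" otherwise
    $1 == 1 || ++c[$3] == $1 {   # sole link, or we have now seen every link of this inode
        t += $2                  # add its 512-byte block count to the running total
        delete c[$3]             # counter no longer needed
    }
    END { print t * 512 }        # convert blocks to bytes
'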
You can also use find snapshot-dir1 snapshot-dir2 to find out how much disk space would be reclaimed if both dirs were removed (which may be more than the sum of the space for the two directories taken individually if there are files that are found in both and only in those snapshots).
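If you want one figure per snapshot (the space freed by deleting just that one while keeping all the others), a plain loop around the command above also works; a rough sketch, assuming the snapshots are the snapshot-* subdirectories of the current directory:

for d in snapshot-*/; do
    printf '%s ' "$d"
    find "$d" -type d -printf '1 %b\n' -o -printf '%n %b %i\n' |
        awk '$1 == 1 || ++c[$3] == $1 {t+=$2; delete c[$3]} END{print t*512}'
done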
If you want to find out how much space you would save after each snapshot-dir deletion (cumulatively), you could do:
find snapshot-dir* \( -path '*/*' -o -printf "%p:\n" \) \
  -type d -printf '1 %b\n' -o -printf '%n %b %i\n' |
  awk '/:$/ {if (NR>1) print t*512; printf "%s ", $0; next}
       $1 == 1 || ++c[$3] == $1 {t+=$2; delete c[$3]}
       END{print t*512}'
That processes the list of snapshots in lexical order. If you processed it in a different order, that would likely give you different numbers except for the final one (when all snapshots are removed).
See numfmt to make the numbers more readable.
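For instance, appending a numfmt stage to the cumulative command above renders the byte counts (the second whitespace-separated field, so this assumes the snapshot names contain no blanks) in human-readable form:

... | numfmt --field=2 --to=iec    # second field rendered with K/M/G/T suffixes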
That assumes all files are on the same filesystem. If not, you can replace %i with %D:%i (if they're not all on the same filesystem, that would mean you'd have a mount point in there which you couldn't remove anyway).
This seems to work, thanks, which is why I have upvoted it. As you write, I would have to repeat this, which I can do with a simple loop. However, that would take a long time (like derobert's answer) and is therefore not really practical. I believe there must be a solution to do this more effectively for many or all snapshot folders at once, which is why I don't accept your answer yet.
– A. Donda
Nov 2 at 16:50
@A.Donda, I'm not sure what you mean. What do you want to repeat? What do you want to achieve? Do you want to know how many snapshots you need to remove to be able to reclaim say 1TB? Would you delete snapshots in sequence or based on some criteria?
– Stéphane Chazelas
Nov 2 at 16:55
@A.Donda, see if the edit answers your loop question.
– Stéphane Chazelas
Nov 2 at 17:18
The idea would be to have a list of the snapshots, each with the amount of space freed if that respective snapshot is deleted. This way I would avoid deleting snapshots that don't amount to much anyway, and not have to repeat your original command manually for each snapshot. I believe this is what you have solved with your update?
– A. Donda
Nov 4 at 1:10
I ended up creating my own bash script (see new answer), but I learned a lot from yours: I also use find, awk and numfmt. The difference is that I filter out inodes using comm. Thanks again!
– A. Donda
Nov 18 at 1:47
Answer (1 vote)
answered Oct 31 at 20:33 by derobert, edited Nov 1 at 16:19
If your file names don't contain pattern characters or newlines, you can use find + du's exclude feature to do this:
find -links +1 -type f \
  | cut -d/ -f2- \
  | du --exclude-from=- -s *
The find bit gets all the files (-type f) with a hardlink count greater than 1 (-links +1). The cut trims off the leading ./ that find prints out. Then du is asked for the disk usage of every directory, excluding all the files with multiple links. Of course, once you delete a snapshot, it's possible there are now files with only one link that previously had two; so every few deletes, you really ought to re-run it.
If it needs to work with arbitrary file names, it'd require some more scripting to replace du (those are shell patterns, so escaping is not possible).
Also, as Stéphane Chazelas points out, if there are hardlinks inside of one snapshot (all the names of the file reside within a single snapshot, not hardlinks between snapshots), those files will be excluded from the totals (even though deleting the snapshot would recover that space).
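For what it's worth, a possible invocation from the directory holding the snapshot folders, with the LC_ALL=C speed-up suggested in the comments below and -h added for human-readable sizes (/backups is just a placeholder path):

cd /backups
find -links +1 -type f \
    | cut -d/ -f2- \
    | LC_ALL=C du --exclude-from=- -sh *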
This seems to work, thanks, which is why I have upvoted it. However, it takes extremely long, probably due to the many du calls. I believe there must be a solution to do this more effectively, which is why I don't accept your answer yet.
– A. Donda
Nov 2 at 16:51
@A.Donda there is only one du call, but it's passed a potentially very long list of exclude patterns, which might be slowing it down. Curious if putting LC_ALL=C in front of du speeds it up (as long as your file names are ASCII). I fear doing this quickly needs a utility that actually tracks all the files.
– derobert
Nov 2 at 17:04
Yes, I suspect, too, that it would be necessary to write one's own tool which basically does the same as du, building a list of files and the inodes they refer to, but then processes this list differently.
– A. Donda
Nov 4 at 1:11
I now believe what makes your solution slow is that the list of excluded files is huge (overlap between snapshots is only on the order of 10%). I experimented with a solution which instead explicitly includes files, but for reasons I don't completely understand it wasn't really working. I then decided not to rely on du, but make a tool that creates and modifies file lists itself, see my new answer. Thanks again!
– A. Donda
Nov 18 at 1:44
Answer (0 votes)
answered by A. Donda (the asker)
Since I wrote this answer, Stéphane Chazelas has convinced me that his answer was right all along. I leave my answer including code because it works well, too, and provides some pretty-printing. Its output looks like this:
              total              unique
--T---G---M---k---B --T---G---M---k---B
     91,044,435,456         665,754,624 back-2018-03-01T06:00:01
     91,160,015,360         625,541,632 back-2018-04-01T06:00:01
     91,235,970,560         581,360,640 back-2018-05-01T06:00:01
     91,474,846,208         897,665,536 back-2018-06-01T06:00:01
     91,428,597,760         668,853,760 back-2018-07-01T06:00:01
     91,602,767,360         660,594,176 back-2018-08-01T06:00:01
     91,062,218,752       1,094,236,160 back-2018-09-01T06:00:01
    230,810,647,552      50,314,291,712 back-2018-11-01T06:00:01
    220,587,811,328         256,036,352 back-2018-11-12T06:00:01
    220,605,425,664         267,876,352 back-2018-11-13T06:00:01
    220,608,163,328         268,711,424 back-2018-11-14T06:00:01
    220,882,714,112         272,000,000 back-2018-11-15T06:00:01
    220,882,118,656         263,202,304 back-2018-11-16T06:00:01
    220,882,081,792         263,165,440 back-2018-11-17T06:00:01
    220,894,113,280         312,208,896 back-2018-11-18T06:00:01
Since I wasn't 100% happy with either of the two answers (as of 2018-11-18) – though I learned from both of them – I created my own tool and am publishing it here.
Similar to Stéphane Chazelas's answer, it uses find to obtain a list of inodes and associated file / directory sizes, but doesn't rely on the "at most one link" heuristic. Instead, it creates a list of unique inodes (not files/directories!) for each input directory, filters out the inodes found in the other directories, and then sums the remaining inodes' sizes. This way it accounts for possible hard links within each input directory. As a side effect, it disregards possible hard links from outside of the set of input directories.
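The filtering itself is plain set subtraction on sorted lists; a toy illustration of the comm -23 step (the files a and b are made up for the example):

printf '1\n2\n3\n' > a    # sorted inode list of the snapshot under consideration
printf '2\n4\n'   > b     # sorted, merged inode lists of all the other snapshots
comm -23 a b              # prints 1 and 3: the inodes unique to this snapshot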
bash-external tools that are used: find, xargs, mktemp, sort, tput, awk, tr, numfmt, touch, cat, comm, rm. I know, not exactly lightweight, but it does exactly what I want it to do. I share it here in case someone else has similar needs.
If anything can be done more efficiently or in a more foolproof way, comments are welcome! I'm anything but a bash master.
To use it, save the following code to a script file duu.sh. A short usage instruction is contained in the first comment block.
#!/bin/bash

# duu
#
# disk usage unique to a directory within a set of directories
#
# Call with a list of directory names. If called without arguments,
# it operates on the subdirectories of the current directory.

# no arguments: call itself with subdirectories of .
if [ "$#" -eq 0 ]
then
    exec find . -maxdepth 1 -type d ! -name . -printf '%P\0' | sort -z \
        | xargs -r --null "$0"
    exit
fi

# create temporary directory
T=`mktemp -d`

# array of directory names
dirs=("$@")
# number of directories
n="$#"

# for each directory, create list of (unique) inodes with size
for i in $(seq 1 $n)
do
    echo -n "reading $i/$n: ${dirs[$i - 1]} "
    find "${dirs[$i - 1]}" -printf "%i\t%b\n" | sort -u > "$T/$i"
    # find %b: "The amount of disk space used for this file in 512-byte blocks."
    echo -ne "\r"
    tput el
done

# print header
echo "              total              unique"
echo "--T---G---M---k---B --T---G---M---k---B"

# for each directory
for i in $(seq 1 $n)
do
    # compute and print total size
    # sum block sizes and multiply by 512
    awk '{s += $2} END{printf "%.0f", s * 512}' "$T/$i" \
        | tr -d '\n' \
        | numfmt --grouping --padding 19
    echo -n " "

    # compute and print unique size
    # create list of (unique) inodes in the other directories
    touch "$T/o$i"
    for j in $(seq 1 $n)
    do
        if [ "$j" -ne "$i" ]
        then
            cat "$T/$j" >> "$T/o$i"
        fi
    done
    sort -o "$T/o$i" -u "$T/o$i"
    # create list of (unique) inodes that are in this but not in the other directories
    comm -23 "$T/$i" "$T/o$i" > "$T/u$i"
    # sum block sizes and multiply by 512
    awk '{s += $2} END{printf "%.0f", s * 512}' "$T/u$i" \
        | tr -d '\n' \
        | numfmt --grouping --padding 19
    # append directory name
    echo " ${dirs[$i - 1]}"
done

# remove temporary files
rm -rf "$T"
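A hypothetical invocation, assuming the snapshot folders are the back-* directories under /backup (both names are placeholders):

cd /backup
bash /path/to/duu.sh back-*/
# or, using the no-argument mode, on every subdirectory of the current directory:
bash /path/to/duu.sh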
I think you misunderstood my answer. It doesn't rely on the "at most one link" heuristic. It counts the disk usage of the inodes that would be deleted if the directory was deleted, that is, of all the files all of whose links are found in the given directory.
– Stéphane Chazelas
Nov 18 at 9:00
@StéphaneChazelas, it's quite possible that I didn't understand your answer, and maybe it does exactly the right thing. If so, I would like to accept it. Could you explain your code in more detail?
– A. Donda
Nov 18 at 14:58
See edit of my answer. Does it make it any clearer?
– Stéphane Chazelas
Nov 18 at 16:49
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
accepted
You could do it by hand with GNU find
:
find snapshot-dir -type d -printf '1 %bn' -o -printf '%n %b %in' |
awk '$1 == 1 || ++c[$3] == $1 {t+=$2;delete c[$3]}
END{print t*512}'
That counts the disk usage of files whose link count would go down to 0 after all the links found in the snapshot directory have been found.
find
prints:
1 <disk-usage>
for directories
<link-count> <disk-usage> <inode-number>
for other types of files.
We pretend the link count is always one for directories, because when in practice it's not, its because of the ..
entries, and find
doesn't list those entries, and directories generally don't have other hardlinks.
From that output, awk
counts the disk usage of the entries that have link count of 1 and also of the inodes which it has seen <link-count>
times (that is the ones whose all hard links are in the current directory and so, like the ones with a link-count of one would have their space reclaimed once the directory tree is deleted).
You can also use find snapshot-dir1 snapshot-dir2
to find out how much disk space would be reclaimed if both dirs were removed (which may be more than the sum of the space for the two directories taken individually if there are are files that are found in both and only in those snapshots).
If you want to find out how much space you would save after each snapshot-dir deletion (in a cumulated fashion), you could do:
find snapshot-dir* ( -path '*/*' -o -printf "%p:n" )
-type d -printf '1 %bn' -o -printf '%n %b %in' |
awk '/:$/ {if (NR>1) print t*512; printf "%s ", $0; next}
$1 == 1 || ++c[$3] == $1 {t+=$2;delete c[$3]}
END{print t*512}'
That processes the list of snapshots in lexical order. If you processed it in a different order, that would likely give you different numbers except for the final one (when all snapshots are removed).
See numfmt
to make the numbers more readable.
That assumes all files are on the same filesystem. If not, you can replace %i
with %D:%i
(if they're not all on the same filesystem, that would mean you'd have a mount point in there which you couldn't remove anyway).
This seems to work, thanks, which is why I have upvoted it. As you write, I would have to repeat this, which I can do with a simple loop. However, that would take a long time (like derobert's answer) and is therefore not really practical. I believe there must be a solution to do this more effectively for many or all snapshot folders at once, which is why I don't accept your answer yet.
– A. Donda
Nov 2 at 16:50
1
@A.Donda, I'm not sure what you mean. What do you want to repeat? What do you want to achieve? Do you want to know how many snapshots you need to remove to be able to reclaim say 1TB? Would you delete snapshots in sequence or based on some criteria?
– Stéphane Chazelas
Nov 2 at 16:55
@A.Donda, see if the edit answers your loop question.
– Stéphane Chazelas
Nov 2 at 17:18
The idea would be to have a list of the snapshots, each with the amount of space freed if that respective snapshot is deleted. This way I would avoid deleting snapshots that don't amount to much anyway, and not have to repeat your original command manually for each snapshot. I believe this is what you have solved with your update?
– A. Donda
Nov 4 at 1:10
I ended up creating my own bash script (see new answer), but I learned a lot from yours: I also use find, awk and numfmt. The difference is that I filter out inodes using comm. Thanks again!
– A. Donda
Nov 18 at 1:47
|
show 2 more comments
up vote
2
down vote
accepted
You could do it by hand with GNU find
:
find snapshot-dir -type d -printf '1 %bn' -o -printf '%n %b %in' |
awk '$1 == 1 || ++c[$3] == $1 {t+=$2;delete c[$3]}
END{print t*512}'
That counts the disk usage of files whose link count would go down to 0 after all the links found in the snapshot directory have been found.
find
prints:
1 <disk-usage>
for directories
<link-count> <disk-usage> <inode-number>
for other types of files.
We pretend the link count is always one for directories, because when in practice it's not, its because of the ..
entries, and find
doesn't list those entries, and directories generally don't have other hardlinks.
From that output, awk
counts the disk usage of the entries that have link count of 1 and also of the inodes which it has seen <link-count>
times (that is the ones whose all hard links are in the current directory and so, like the ones with a link-count of one would have their space reclaimed once the directory tree is deleted).
You can also use find snapshot-dir1 snapshot-dir2
to find out how much disk space would be reclaimed if both dirs were removed (which may be more than the sum of the space for the two directories taken individually if there are are files that are found in both and only in those snapshots).
If you want to find out how much space you would save after each snapshot-dir deletion (in a cumulated fashion), you could do:
find snapshot-dir* ( -path '*/*' -o -printf "%p:n" )
-type d -printf '1 %bn' -o -printf '%n %b %in' |
awk '/:$/ {if (NR>1) print t*512; printf "%s ", $0; next}
$1 == 1 || ++c[$3] == $1 {t+=$2;delete c[$3]}
END{print t*512}'
That processes the list of snapshots in lexical order. If you processed it in a different order, that would likely give you different numbers except for the final one (when all snapshots are removed).
See numfmt
to make the numbers more readable.
That assumes all files are on the same filesystem. If not, you can replace %i
with %D:%i
(if they're not all on the same filesystem, that would mean you'd have a mount point in there which you couldn't remove anyway).
This seems to work, thanks, which is why I have upvoted it. As you write, I would have to repeat this, which I can do with a simple loop. However, that would take a long time (like derobert's answer) and is therefore not really practical. I believe there must be a solution to do this more effectively for many or all snapshot folders at once, which is why I don't accept your answer yet.
– A. Donda
Nov 2 at 16:50
1
@A.Donda, I'm not sure what you mean. What do you want to repeat? What do you want to achieve? Do you want to know how many snapshots you need to remove to be able to reclaim say 1TB? Would you delete snapshots in sequence or based on some criteria?
– Stéphane Chazelas
Nov 2 at 16:55
@A.Donda, see if the edit answers your loop question.
– Stéphane Chazelas
Nov 2 at 17:18
The idea would be to have a list of the snapshots, each with the amount of space freed if that respective snapshot is deleted. This way I would avoid deleting snapshots that don't amount to much anyway, and not have to repeat your original command manually for each snapshot. I believe this is what you have solved with your update?
– A. Donda
Nov 4 at 1:10
I ended up creating my own bash script (see new answer), but I learned a lot from yours: I also use find, awk and numfmt. The difference is that I filter out inodes using comm. Thanks again!
– A. Donda
Nov 18 at 1:47
|
show 2 more comments
up vote
2
down vote
accepted
up vote
2
down vote
accepted
You could do it by hand with GNU find
:
find snapshot-dir -type d -printf '1 %bn' -o -printf '%n %b %in' |
awk '$1 == 1 || ++c[$3] == $1 {t+=$2;delete c[$3]}
END{print t*512}'
That counts the disk usage of files whose link count would go down to 0 after all the links found in the snapshot directory have been found.
find
prints:
1 <disk-usage>
for directories
<link-count> <disk-usage> <inode-number>
for other types of files.
We pretend the link count is always one for directories, because when in practice it's not, its because of the ..
entries, and find
doesn't list those entries, and directories generally don't have other hardlinks.
From that output, awk
counts the disk usage of the entries that have link count of 1 and also of the inodes which it has seen <link-count>
times (that is the ones whose all hard links are in the current directory and so, like the ones with a link-count of one would have their space reclaimed once the directory tree is deleted).
You can also use find snapshot-dir1 snapshot-dir2
to find out how much disk space would be reclaimed if both dirs were removed (which may be more than the sum of the space for the two directories taken individually if there are are files that are found in both and only in those snapshots).
If you want to find out how much space you would save after each snapshot-dir deletion (in a cumulated fashion), you could do:
find snapshot-dir* ( -path '*/*' -o -printf "%p:n" )
-type d -printf '1 %bn' -o -printf '%n %b %in' |
awk '/:$/ {if (NR>1) print t*512; printf "%s ", $0; next}
$1 == 1 || ++c[$3] == $1 {t+=$2;delete c[$3]}
END{print t*512}'
That processes the list of snapshots in lexical order. If you processed it in a different order, that would likely give you different numbers except for the final one (when all snapshots are removed).
See numfmt
to make the numbers more readable.
That assumes all files are on the same filesystem. If not, you can replace %i
with %D:%i
(if they're not all on the same filesystem, that would mean you'd have a mount point in there which you couldn't remove anyway).
You could do it by hand with GNU find
:
find snapshot-dir -type d -printf '1 %bn' -o -printf '%n %b %in' |
awk '$1 == 1 || ++c[$3] == $1 {t+=$2;delete c[$3]}
END{print t*512}'
That counts the disk usage of files whose link count would go down to 0 after all the links found in the snapshot directory have been found.
find
prints:
1 <disk-usage>
for directories
<link-count> <disk-usage> <inode-number>
for other types of files.
We pretend the link count is always one for directories, because when in practice it's not, its because of the ..
entries, and find
doesn't list those entries, and directories generally don't have other hardlinks.
From that output, awk
counts the disk usage of the entries that have link count of 1 and also of the inodes which it has seen <link-count>
times (that is the ones whose all hard links are in the current directory and so, like the ones with a link-count of one would have their space reclaimed once the directory tree is deleted).
You can also use find snapshot-dir1 snapshot-dir2
to find out how much disk space would be reclaimed if both dirs were removed (which may be more than the sum of the space for the two directories taken individually if there are are files that are found in both and only in those snapshots).
If you want to find out how much space you would save after each snapshot-dir deletion (in a cumulated fashion), you could do:
find snapshot-dir* ( -path '*/*' -o -printf "%p:n" )
-type d -printf '1 %bn' -o -printf '%n %b %in' |
awk '/:$/ {if (NR>1) print t*512; printf "%s ", $0; next}
$1 == 1 || ++c[$3] == $1 {t+=$2;delete c[$3]}
END{print t*512}'
That processes the list of snapshots in lexical order. If you processed it in a different order, that would likely give you different numbers except for the final one (when all snapshots are removed).
See numfmt
to make the numbers more readable.
That assumes all files are on the same filesystem. If not, you can replace %i
with %D:%i
(if they're not all on the same filesystem, that would mean you'd have a mount point in there which you couldn't remove anyway).
edited Nov 18 at 16:47
answered Oct 31 at 21:54
Stéphane Chazelas
294k54553894
294k54553894
This seems to work, thanks, which is why I have upvoted it. As you write, I would have to repeat this, which I can do with a simple loop. However, that would take a long time (like derobert's answer) and is therefore not really practical. I believe there must be a solution to do this more effectively for many or all snapshot folders at once, which is why I don't accept your answer yet.
– A. Donda
Nov 2 at 16:50
1
@A.Donda, I'm not sure what you mean. What do you want to repeat? What do you want to achieve? Do you want to know how many snapshots you need to remove to be able to reclaim say 1TB? Would you delete snapshots in sequence or based on some criteria?
– Stéphane Chazelas
Nov 2 at 16:55
@A.Donda, see if the edit answers your loop question.
– Stéphane Chazelas
Nov 2 at 17:18
The idea would be to have a list of the snapshots, each with the amount of space freed if that respective snapshot is deleted. This way I would avoid deleting snapshots that don't amount to much anyway, and not have to repeat your original command manually for each snapshot. I believe this is what you have solved with your update?
– A. Donda
Nov 4 at 1:10
I ended up creating my own bash script (see new answer), but I learned a lot from yours: I also use find, awk and numfmt. The difference is that I filter out inodes using comm. Thanks again!
– A. Donda
Nov 18 at 1:47
|
show 2 more comments
This seems to work, thanks, which is why I have upvoted it. As you write, I would have to repeat this, which I can do with a simple loop. However, that would take a long time (like derobert's answer) and is therefore not really practical. I believe there must be a solution to do this more effectively for many or all snapshot folders at once, which is why I don't accept your answer yet.
– A. Donda
Nov 2 at 16:50
1
@A.Donda, I'm not sure what you mean. What do you want to repeat? What do you want to achieve? Do you want to know how many snapshots you need to remove to be able to reclaim say 1TB? Would you delete snapshots in sequence or based on some criteria?
– Stéphane Chazelas
Nov 2 at 16:55
@A.Donda, see if the edit answers your loop question.
– Stéphane Chazelas
Nov 2 at 17:18
The idea would be to have a list of the snapshots, each with the amount of space freed if that respective snapshot is deleted. This way I would avoid deleting snapshots that don't amount to much anyway, and not have to repeat your original command manually for each snapshot. I believe this is what you have solved with your update?
– A. Donda
Nov 4 at 1:10
I ended up creating my own bash script (see new answer), but I learned a lot from yours: I also use find, awk and numfmt. The difference is that I filter out inodes using comm. Thanks again!
– A. Donda
Nov 18 at 1:47
This seems to work, thanks, which is why I have upvoted it. As you write, I would have to repeat this, which I can do with a simple loop. However, that would take a long time (like derobert's answer) and is therefore not really practical. I believe there must be a solution to do this more effectively for many or all snapshot folders at once, which is why I don't accept your answer yet.
– A. Donda
Nov 2 at 16:50
This seems to work, thanks, which is why I have upvoted it. As you write, I would have to repeat this, which I can do with a simple loop. However, that would take a long time (like derobert's answer) and is therefore not really practical. I believe there must be a solution to do this more effectively for many or all snapshot folders at once, which is why I don't accept your answer yet.
– A. Donda
Nov 2 at 16:50
1
1
@A.Donda, I'm not sure what you mean. What do you want to repeat? What do you want to achieve? Do you want to know how many snapshots you need to remove to be able to reclaim say 1TB? Would you delete snapshots in sequence or based on some criteria?
– Stéphane Chazelas
Nov 2 at 16:55
@A.Donda, I'm not sure what you mean. What do you want to repeat? What do you want to achieve? Do you want to know how many snapshots you need to remove to be able to reclaim say 1TB? Would you delete snapshots in sequence or based on some criteria?
– Stéphane Chazelas
Nov 2 at 16:55
@A.Donda, see if the edit answers your loop question.
– Stéphane Chazelas
Nov 2 at 17:18
@A.Donda, see if the edit answers your loop question.
– Stéphane Chazelas
Nov 2 at 17:18
The idea would be to have a list of the snapshots, each with the amount of space freed if that respective snapshot is deleted. This way I would avoid deleting snapshots that don't amount to much anyway, and not have to repeat your original command manually for each snapshot. I believe this is what you have solved with your update?
– A. Donda
Nov 4 at 1:10
The idea would be to have a list of the snapshots, each with the amount of space freed if that respective snapshot is deleted. This way I would avoid deleting snapshots that don't amount to much anyway, and not have to repeat your original command manually for each snapshot. I believe this is what you have solved with your update?
– A. Donda
Nov 4 at 1:10
I ended up creating my own bash script (see new answer), but I learned a lot from yours: I also use find, awk and numfmt. The difference is that I filter out inodes using comm. Thanks again!
– A. Donda
Nov 18 at 1:47
I ended up creating my own bash script (see new answer), but I learned a lot from yours: I also use find, awk and numfmt. The difference is that I filter out inodes using comm. Thanks again!
– A. Donda
Nov 18 at 1:47
|
show 2 more comments
up vote
1
down vote
If your file names don't contain pattern characters or newlines, you can use find
+ du
's exclude feature to do this:
find -links +1 -type f
| cut -d/ -f2-
| du --exclude-from=- -s *
The find
bit gets all the files (-type f
) with a hardlink count greater than 1 (-links +1
). The cut
trims off the leading ./
find prints out. Then du
is asked for disk usage of every directory, excluding all the files with multiple links. Of course, once you delete a snapshot, it's possible there are now files with only one link that previously had two — so every few deletes, you really ought to re-run it.
If it needs to work with arbitrary file names, it'd require some more scripting to replace du
(those are shell patterns, so escaping is not possible).
Also, as Stéphane Chazelas points out, if there are hardlinks inside of one snapshot (all the names of the file reside within a single snapshot, not hardlinks between snapshots), those files will be excluded from the totals (even though deleting the snapshot would recover that space).
This seems to work, thanks, which is why I have upvoted it. However, it takes extremely long, probably due to the manydu
calls. I believe there must be a solution to do this more effectively, which is why I don't accept your answer yet.
– A. Donda
Nov 2 at 16:51
1
@A.Donda there is only onedu
call, but it's passed a potentially very long list of exclude patterns — that might be slowing it down. Curious if puttingLC_ALL=C
in front ofdu
speeds it up (as long as your file names are ASCII). I fear doing this quickly needs a utility that actually tracks all the files.
– derobert
Nov 2 at 17:04
Yes I suspect, too, that it would be necessary to write one's own tool, which basically does the same as du, build a list of files and which inodes they refer to, but them process this list differently.
– A. Donda
Nov 4 at 1:11
I now believe what makes your solution slow is that the list of excluded files is huge (overlap between snapshots is only on the order of 10%). I experimented with a solution which instead explicitly includes files, but for reasons I don't completely understand it wasn't really working. I then decided not to rely on du, but make a tool that creates and modifies file lists itself, see my new answer. Thanks again!
– A. Donda
Nov 18 at 1:44
add a comment |
up vote
1
down vote
If your file names don't contain pattern characters or newlines, you can use find
+ du
's exclude feature to do this:
find -links +1 -type f
| cut -d/ -f2-
| du --exclude-from=- -s *
The find
bit gets all the files (-type f
) with a hardlink count greater than 1 (-links +1
). The cut
trims off the leading ./
find prints out. Then du
is asked for disk usage of every directory, excluding all the files with multiple links. Of course, once you delete a snapshot, it's possible there are now files with only one link that previously had two — so every few deletes, you really ought to re-run it.
If it needs to work with arbitrary file names, it'd require some more scripting to replace du
(those are shell patterns, so escaping is not possible).
Also, as Stéphane Chazelas points out, if there are hardlinks inside of one snapshot (all the names of the file reside within a single snapshot, not hardlinks between snapshots), those files will be excluded from the totals (even though deleting the snapshot would recover that space).
This seems to work, thanks, which is why I have upvoted it. However, it takes extremely long, probably due to the manydu
calls. I believe there must be a solution to do this more effectively, which is why I don't accept your answer yet.
– A. Donda
Nov 2 at 16:51
1
@A.Donda there is only onedu
call, but it's passed a potentially very long list of exclude patterns — that might be slowing it down. Curious if puttingLC_ALL=C
in front ofdu
speeds it up (as long as your file names are ASCII). I fear doing this quickly needs a utility that actually tracks all the files.
– derobert
Nov 2 at 17:04
Yes I suspect, too, that it would be necessary to write one's own tool, which basically does the same as du, build a list of files and which inodes they refer to, but them process this list differently.
– A. Donda
Nov 4 at 1:11
I now believe what makes your solution slow is that the list of excluded files is huge (overlap between snapshots is only on the order of 10%). I experimented with a solution which instead explicitly includes files, but for reasons I don't completely understand it wasn't really working. I then decided not to rely on du, but make a tool that creates and modifies file lists itself, see my new answer. Thanks again!
– A. Donda
Nov 18 at 1:44
add a comment |
up vote
1
down vote
up vote
1
down vote
If your file names don't contain pattern characters or newlines, you can use find
+ du
's exclude feature to do this:
find -links +1 -type f
| cut -d/ -f2-
| du --exclude-from=- -s *
The find
bit gets all the files (-type f
) with a hardlink count greater than 1 (-links +1
). The cut
trims off the leading ./
find prints out. Then du
is asked for disk usage of every directory, excluding all the files with multiple links. Of course, once you delete a snapshot, it's possible there are now files with only one link that previously had two — so every few deletes, you really ought to re-run it.
If it needs to work with arbitrary file names, it'd require some more scripting to replace du
(those are shell patterns, so escaping is not possible).
Also, as Stéphane Chazelas points out, if there are hardlinks inside of one snapshot (all the names of the file reside within a single snapshot, not hardlinks between snapshots), those files will be excluded from the totals (even though deleting the snapshot would recover that space).
If your file names don't contain pattern characters or newlines, you can use find
+ du
's exclude feature to do this:
find -links +1 -type f
| cut -d/ -f2-
| du --exclude-from=- -s *
The find
bit gets all the files (-type f
) with a hardlink count greater than 1 (-links +1
). The cut
trims off the leading ./
find prints out. Then du
is asked for disk usage of every directory, excluding all the files with multiple links. Of course, once you delete a snapshot, it's possible there are now files with only one link that previously had two — so every few deletes, you really ought to re-run it.
If it needs to work with arbitrary file names, it'd require some more scripting to replace du
(those are shell patterns, so escaping is not possible).
Also, as Stéphane Chazelas points out, if there are hardlinks inside of one snapshot (all the names of the file reside within a single snapshot, not hardlinks between snapshots), those files will be excluded from the totals (even though deleting the snapshot would recover that space).
edited Nov 1 at 16:19
answered Oct 31 at 20:33
derobert
70.9k8151210
70.9k8151210
This seems to work, thanks, which is why I have upvoted it. However, it takes extremely long, probably due to the manydu
calls. I believe there must be a solution to do this more effectively, which is why I don't accept your answer yet.
– A. Donda
Nov 2 at 16:51
1
@A.Donda there is only onedu
call, but it's passed a potentially very long list of exclude patterns — that might be slowing it down. Curious if puttingLC_ALL=C
in front ofdu
speeds it up (as long as your file names are ASCII). I fear doing this quickly needs a utility that actually tracks all the files.
– derobert
Nov 2 at 17:04
Yes I suspect, too, that it would be necessary to write one's own tool, which basically does the same as du, build a list of files and which inodes they refer to, but them process this list differently.
– A. Donda
Nov 4 at 1:11
I now believe what makes your solution slow is that the list of excluded files is huge (overlap between snapshots is only on the order of 10%). I experimented with a solution which instead explicitly includes files, but for reasons I don't completely understand it wasn't really working. I then decided not to rely on du, but make a tool that creates and modifies file lists itself, see my new answer. Thanks again!
– A. Donda
Nov 18 at 1:44
add a comment |
This seems to work, thanks, which is why I have upvoted it. However, it takes extremely long, probably due to the manydu
calls. I believe there must be a solution to do this more effectively, which is why I don't accept your answer yet.
– A. Donda
Nov 2 at 16:51
1
@A.Donda there is only onedu
call, but it's passed a potentially very long list of exclude patterns — that might be slowing it down. Curious if puttingLC_ALL=C
in front ofdu
speeds it up (as long as your file names are ASCII). I fear doing this quickly needs a utility that actually tracks all the files.
– derobert
Nov 2 at 17:04
Yes I suspect, too, that it would be necessary to write one's own tool, which basically does the same as du, build a list of files and which inodes they refer to, but them process this list differently.
– A. Donda
Nov 4 at 1:11
I now believe what makes your solution slow is that the list of excluded files is huge (overlap between snapshots is only on the order of 10%). I experimented with a solution which instead explicitly includes files, but for reasons I don't completely understand it wasn't really working. I then decided not to rely on du, but make a tool that creates and modifies file lists itself, see my new answer. Thanks again!
– A. Donda
Nov 18 at 1:44
This seems to work, thanks, which is why I have upvoted it. However, it takes extremely long, probably due to the many
du
calls. I believe there must be a solution to do this more effectively, which is why I don't accept your answer yet.– A. Donda
Nov 2 at 16:51
This seems to work, thanks, which is why I have upvoted it. However, it takes extremely long, probably due to the many
du
calls. I believe there must be a solution to do this more effectively, which is why I don't accept your answer yet.– A. Donda
Nov 2 at 16:51
1
1
@A.Donda there is only one
du
call, but it's passed a potentially very long list of exclude patterns — that might be slowing it down. Curious if putting LC_ALL=C
in front of du
speeds it up (as long as your file names are ASCII). I fear doing this quickly needs a utility that actually tracks all the files.– derobert
Nov 2 at 17:04
@A.Donda there is only one
du
call, but it's passed a potentially very long list of exclude patterns — that might be slowing it down. Curious if putting LC_ALL=C
in front of du
speeds it up (as long as your file names are ASCII). I fear doing this quickly needs a utility that actually tracks all the files.– derobert
Nov 2 at 17:04
Yes I suspect, too, that it would be necessary to write one's own tool, which basically does the same as du, build a list of files and which inodes they refer to, but them process this list differently.
– A. Donda
Nov 4 at 1:11
Yes I suspect, too, that it would be necessary to write one's own tool, which basically does the same as du, build a list of files and which inodes they refer to, but them process this list differently.
– A. Donda
Nov 4 at 1:11
I now believe what makes your solution slow is that the list of excluded files is huge (overlap between snapshots is only on the order of 10%). I experimented with a solution which instead explicitly includes files, but for reasons I don't completely understand it wasn't really working. I then decided not to rely on du, but make a tool that creates and modifies file lists itself, see my new answer. Thanks again!
– A. Donda
Nov 18 at 1:44
I now believe what makes your solution slow is that the list of excluded files is huge (overlap between snapshots is only on the order of 10%). I experimented with a solution which instead explicitly includes files, but for reasons I don't completely understand it wasn't really working. I then decided not to rely on du, but make a tool that creates and modifies file lists itself, see my new answer. Thanks again!
– A. Donda
Nov 18 at 1:44
add a comment |
up vote
0
down vote
Since I wrote this answer, Stéphane Chazelas has convinced me that his answer was right all along. I leave my answer including code because it works well, too, and provides some pretty-printing. Its output looks like this:
total unique
--T---G---M---k---B --T---G---M---k---B
91,044,435,456 665,754,624 back-2018-03-01T06:00:01
91,160,015,360 625,541,632 back-2018-04-01T06:00:01
91,235,970,560 581,360,640 back-2018-05-01T06:00:01
91,474,846,208 897,665,536 back-2018-06-01T06:00:01
91,428,597,760 668,853,760 back-2018-07-01T06:00:01
91,602,767,360 660,594,176 back-2018-08-01T06:00:01
91,062,218,752 1,094,236,160 back-2018-09-01T06:00:01
230,810,647,552 50,314,291,712 back-2018-11-01T06:00:01
220,587,811,328 256,036,352 back-2018-11-12T06:00:01
220,605,425,664 267,876,352 back-2018-11-13T06:00:01
220,608,163,328 268,711,424 back-2018-11-14T06:00:01
220,882,714,112 272,000,000 back-2018-11-15T06:00:01
220,882,118,656 263,202,304 back-2018-11-16T06:00:01
220,882,081,792 263,165,440 back-2018-11-17T06:00:01
220,894,113,280 312,208,896 back-2018-11-18T06:00:01
Since I wasn't 100% happy with either of the two answers (as of 2018-11-18) – though I learned from both of them – I created my own tool and am publishing it here.
Similar to Stéphane Chazelas's answer, it uses find
to obtain a list of inodes and associated file / directory sizes, but doesn't rely on the "at most one link" heuristic. Instead, it creates a list of unique inodes (not files/directories!) for each input directory, filters out the inodes from the other directories, and them sums the remaining inodes' sizes. This way it accounts for possible hardlinks within each input directory. As a side effect, it disregards possible hardlinks from outside of the set of input directories.
bash-external tools that are used: find
, xargs
, mktemp
, sort
, tput
, awk
, tr
, numfmt
, touch
, cat
, comm
, rm
. I know, not exactly lightweight, but it does exactly what I want it to do. I share it here in case someone else has similar needs.
If anything can be done more efficiently or foolproof, comments are welcome! I'm anything but a bash master.
To use it, save the following code to a script file duu.sh
. A short usage instruction is contained in the first comment block.
#!/bin/bash
# duu
#
# disk usage unique to a directory within a set of directories
#
# Call with a list of directory names. If called without arguments,
# it operates on the subdirectories of the current directory.
# no arguments: call itself with subdirectories of .
if [ "$#" -eq 0 ]
then
exec find . -maxdepth 1 -type d ! -name . -printf '%P' | sort -z
| xargs -r --null "$0"
exit
fi
# create temporary directory
T=`mktemp -d`
# array of directory names
dirs=("$@")
# number of directories
n="$#"
# for each directory, create list of (unique) inodes with size
for i in $(seq 1 $n)
do
echo -n "reading $i/$n: ${dirs[$i - 1]} "
find "${dirs[$i - 1]}" -printf "%it%bn" | sort -u > "$T/$i"
# find %b: "The amount of disk space used for this file in 512-byte blocks."
echo -ne "r"
tput el
done
# print header
echo " total unique"
echo "--T---G---M---k---B --T---G---M---k---B"
# for each directory
for i in $(seq 1 $n)
do
# compute and print total size
# sum block sizes and multiply by 512
awk '{s += $2} END{printf "%.0f", s * 512}' "$T/$i"
| tr -d 'n'
| numfmt --grouping --padding 19
echo -n " "
# compute and print unique size
# create list of (unique) inodes in the other directories
touch "$T/o$i"
for j in $(seq 1 $n)
do
if [ "$j" -ne "$i" ]
then
cat "$T/$j" >> "$T/o$i"
fi
done
sort -o "$T/o$i" -u "$T/o$i"
# create list of (unique) inodes that are in this but not in the other directories
comm -23 "$T/$i" "$T/o$i" > "$T/u$i"
# sum block sizes and multiply by 512
awk '{s += $2} END{printf "%.0f", s * 512}' "$T/u$i"
| tr -d 'n'
| numfmt --grouping --padding 19
# append directory name
echo " ${dirs[$i - 1]}"
done
# remove temporary files
rm -rf "$T"
I think you misunderstood my answer. It doesn't rely on the "at most one link" heuristic. It counts the disk usage of inodes that would be deleted if the directory was deleted, of all the files whose all links are found in the current directory.
– Stéphane Chazelas
Nov 18 at 9:00
@StéphaneChazelas, it's quite possible that I didn't understand your answer, and maybe it does exactly the right thing. If so, I would like to accept it. Could you explain your code in more detail?
– A. Donda
Nov 18 at 14:58
See edit of my answer. Does it make it any clearer?
– Stéphane Chazelas
Nov 18 at 16:49
add a comment |
up vote
0
down vote
Since I wrote this answer, Stéphane Chazelas has convinced me that his answer was right all along. I leave my answer including code because it works well, too, and provides some pretty-printing. Its output looks like this:
total unique
--T---G---M---k---B --T---G---M---k---B
91,044,435,456 665,754,624 back-2018-03-01T06:00:01
91,160,015,360 625,541,632 back-2018-04-01T06:00:01
91,235,970,560 581,360,640 back-2018-05-01T06:00:01
91,474,846,208 897,665,536 back-2018-06-01T06:00:01
91,428,597,760 668,853,760 back-2018-07-01T06:00:01
91,602,767,360 660,594,176 back-2018-08-01T06:00:01
91,062,218,752 1,094,236,160 back-2018-09-01T06:00:01
230,810,647,552 50,314,291,712 back-2018-11-01T06:00:01
220,587,811,328 256,036,352 back-2018-11-12T06:00:01
220,605,425,664 267,876,352 back-2018-11-13T06:00:01
220,608,163,328 268,711,424 back-2018-11-14T06:00:01
220,882,714,112 272,000,000 back-2018-11-15T06:00:01
220,882,118,656 263,202,304 back-2018-11-16T06:00:01
220,882,081,792 263,165,440 back-2018-11-17T06:00:01
220,894,113,280 312,208,896 back-2018-11-18T06:00:01
Since I wasn't 100% happy with either of the two answers (as of 2018-11-18) – though I learned from both of them – I created my own tool and am publishing it here.
Similar to Stéphane Chazelas's answer, it uses find
to obtain a list of inodes and associated file / directory sizes, but doesn't rely on the "at most one link" heuristic. Instead, it creates a list of unique inodes (not files/directories!) for each input directory, filters out the inodes from the other directories, and them sums the remaining inodes' sizes. This way it accounts for possible hardlinks within each input directory. As a side effect, it disregards possible hardlinks from outside of the set of input directories.
bash-external tools that are used: find
, xargs
, mktemp
, sort
, tput
, awk
, tr
, numfmt
, touch
, cat
, comm
, rm
. I know, not exactly lightweight, but it does exactly what I want it to do. I share it here in case someone else has similar needs.
If anything can be done more efficiently or foolproof, comments are welcome! I'm anything but a bash master.
To use it, save the following code to a script file duu.sh
. A short usage instruction is contained in the first comment block.
#!/bin/bash
# duu
#
# disk usage unique to a directory within a set of directories
#
# Call with a list of directory names. If called without arguments,
# it operates on the subdirectories of the current directory.
# no arguments: call itself with subdirectories of .
if [ "$#" -eq 0 ]
then
exec find . -maxdepth 1 -type d ! -name . -printf '%P' | sort -z
| xargs -r --null "$0"
exit
fi
# create temporary directory
T=`mktemp -d`
# array of directory names
dirs=("$@")
# number of directories
n="$#"
# for each directory, create list of (unique) inodes with size
for i in $(seq 1 $n)
do
echo -n "reading $i/$n: ${dirs[$i - 1]} "
find "${dirs[$i - 1]}" -printf "%it%bn" | sort -u > "$T/$i"
# find %b: "The amount of disk space used for this file in 512-byte blocks."
echo -ne "r"
tput el
done
# print header
echo " total unique"
echo "--T---G---M---k---B --T---G---M---k---B"
# for each directory
for i in $(seq 1 $n)
do
# compute and print total size
# sum block sizes and multiply by 512
awk '{s += $2} END{printf "%.0f", s * 512}' "$T/$i"
| tr -d 'n'
| numfmt --grouping --padding 19
echo -n " "
# compute and print unique size
# create list of (unique) inodes in the other directories
touch "$T/o$i"
for j in $(seq 1 $n)
do
if [ "$j" -ne "$i" ]
then
cat "$T/$j" >> "$T/o$i"
fi
done
sort -o "$T/o$i" -u "$T/o$i"
# create list of (unique) inodes that are in this but not in the other directories
comm -23 "$T/$i" "$T/o$i" > "$T/u$i"
# sum block sizes and multiply by 512
awk '{s += $2} END{printf "%.0f", s * 512}' "$T/u$i"
| tr -d 'n'
| numfmt --grouping --padding 19
# append directory name
echo " ${dirs[$i - 1]}"
done
# remove temporary files
rm -rf "$T"
edited Nov 18 at 17:53
answered Nov 18 at 1:33
A. Donda
I think you misunderstood my answer. It doesn't rely on the "at most one link" heuristic. It counts the disk usage of the inodes that would be deleted if the directory were deleted, that is, of all files whose links are all found within that directory.
– Stéphane Chazelas
Nov 18 at 9:00
@StéphaneChazelas, it's quite possible that I didn't understand your answer, and maybe it does exactly the right thing. If so, I would like to accept it. Could you explain your code in more detail?
– A. Donda
Nov 18 at 14:58
See edit of my answer. Does it make it any clearer?
– Stéphane Chazelas
Nov 18 at 16:49
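For readers following that exchange, here is one way the idea from the first comment could be expressed. It is a rough illustration only, not Stéphane Chazelas's actual answer code; GNU find is assumed and DIR is a placeholder.
# For a single directory DIR: sum the blocks of every inode whose total link
# count (%n) equals the number of links to it found under DIR, i.e. inodes
# that would really be freed by deleting DIR.
find DIR -type f -printf '%i\t%n\t%b\n' \
    | awk -F'\t' '
        { seen[$1]++; links[$1] = $2; blocks[$1] = $3 }
        END {
            for (i in seen)
                if (seen[i] == links[i])
                    freed += blocks[i] * 512
            printf "%.0f bytes freed by deleting this directory\n", freed
        }'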