Why are there so many different ways to measure disk usage?
When I sum up the sizes of my files, I get one figure. If I run du, I get another figure. If I run du on all the files on my partition, it doesn't match what df claims is used. Why are there so many different figures for the total size of my files? Can't computers add?

Speaking of adding: when I add the “Used” and “Available” columns of df, I don't get the total figure. And that total figure is smaller than the size of my partition. And if I add up my partition sizes I don't get my disk size! What gives?
filesystems partition disk-usage
asked Mar 19 '14 at 3:28
Gilles
4 Answers
Adding up numbers is easy. The problem is, there are many different numbers to add.
How much disk space does a file use?
The basic idea is that a file containing n bytes uses n bytes of disk space, plus a bit for some control information: the file's metadata (permissions, timestamps, etc.), and a bit of overhead for the information that the system needs to find where the file is stored. However, there are many complications.
Microscopic complications
Think of each file as a series of books in a library. Smaller files make up just one volume, but larger files consist of many volumes, like an encyclopedia. In order to be able to locate the files, there is a card catalog which references every volume. Each volume has a bit of overhead due to the covers. If a file is very small, this overhead is relatively large. Also the card catalog itself takes up some room.
Going a bit more technical: in a typical simple filesystem, the space is divided into blocks. A typical block size is 4 KiB. Each file takes up an integer number of blocks. Unless the file size is a multiple of the block size, the last block is only partially used. So a 1-byte file and a 4096-byte file both take up 1 block, whereas a 4097-byte file takes up two blocks. You can observe this with the du command: if your filesystem has a 4 KiB block size, then du will report 4 KiB for a 1-byte file.
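Here is a quick way to see the rounding in action (a sketch; the 4.0K figure assumes a filesystem with 4 KiB blocks, and tiny is a scratch file name):

    printf x > tiny              # create a 1-byte file
    du -h tiny                   # typically prints 4.0K: one whole block
    du -h --apparent-size tiny   # GNU du: prints the 1-byte logical size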
If a file is large, then additional blocks are needed just to store the list of blocks that make up the file (these are indirect blocks; more sophisticated filesystems may optimize this in the form of extents). Those don't show in the file size as reported by ls -l or GNU du --apparent-size; du, which reports disk usage as opposed to size, does account for them.
Some filesystems try to reuse the free space left in the last block to pack several file tails in the same block. Some filesystems (such as ext4 since Linux 3.8) use 0 blocks for tiny files (just a few bytes) that fit entirely in the inode.
Macroscopic complications
Generally, as seen above, the total size reported by du is the sum of the sizes of the blocks or extents used by the file.
The size reported by du may be smaller if the file is compressed. Unix systems traditionally support a crude form of compression: if a file block contains only null bytes, then instead of storing a block of zeroes, the filesystem can omit that block altogether. A file with omitted blocks like this is called a sparse file. Sparse files are not automatically created when a file contains a large series of null bytes; the application must arrange for the file to become sparse.
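You can make a sparse file yourself to see the difference (a sketch using GNU coreutils; sparse.img is a scratch file name):

    truncate -s 1G sparse.img         # 1 GiB logical size, no blocks written
    du -h sparse.img                  # disk usage: typically 0 or a few KiB
    du -h --apparent-size sparse.img  # logical size: 1.0G
    ls -lh sparse.img                 # ls -l also shows the apparent size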
Some filesystems such as btrfs and zfs support general-purpose compression.
Advanced complications
Two major features of very modern filesystems such as zfs and btrfs make the relationship between file size and disk usage significantly more distant: snapshots and deduplication.
Snapshots are a frozen state of the filesystem at a certain date. Filesystems that support this feature can contain multiple snapshots taken at different dates. These snapshots take room, of course. At one extreme, if you delete all the files from the active version of the filesystem, the filesystem won't become empty if there are snapshots remaining.
Any file or block that hasn't changed since a snapshot was taken, or between two snapshots, exists identically in the snapshot and in the active version or other snapshot. This is implemented via copy-on-write. In some edge cases, it's possible that deleting a file on a full filesystem will fail due to insufficient available space, because removing that file would require making a copy of a block in the directory, and there's no more room for even that one block.
Deduplication is a storage optimization technique that consists of avoiding storing identical blocks. With typical data, looking for duplicates isn't always worth the effort. Both zfs and btrfs support deduplication as an optional feature.
Why is the total from du different from the sum of the file sizes?
As we've seen above, the size reported by du for each file is normally the sum of the sizes of the blocks or extents used by the file. Note that by default, ls -l lists sizes in bytes, but du lists sizes in KiB, or in 512-byte units (sectors) on some more traditional systems (du -k forces the use of kilobytes). Most modern unices support ls -lh and du -h to use “human-readable” numbers with K, M, G, etc. suffixes (for KiB, MiB, GiB) as appropriate.
When you run du on a directory, it sums up the disk usage of all the files in the directory tree, including the directories themselves. A directory contains data (the names of the files, and a pointer to where each file's metadata is), so it needs a bit of storage space. A small directory will take up one block; a larger directory will require more blocks. The amount of storage used by a directory sometimes depends not only on the files it contains but also on the order in which they were inserted and in which some files are removed (with some filesystems, this can leave holes, a compromise between disk space and performance), but the difference will be tiny (an extra block here and there). When you run ls -ld /some/directory, the directory's size is listed. (Note that the “total NNN” line at the top of the output from ls -l is an unrelated number: it's the sum of the sizes in blocks of the listed items, expressed in KiB or sectors.)
Keep in mind that du includes dot files, which ls doesn't show unless you use the -A or -a option.
Sometimes du reports less than the expected sum. This happens if there are hard links inside the directory tree: du counts each file only once.
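A small experiment makes this visible (a sketch; d, a and b are scratch names):

    mkdir d
    dd if=/dev/urandom of=d/a bs=1M count=10   # a 10 MiB file
    ln d/a d/b                                 # a second name for the same inode
    ls -l d        # both entries show the full 10 MiB size
    du -sh d       # about 10M: the shared blocks are counted only once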
On some file systems, like ZFS on Linux, du does not report the full disk space occupied by the extended attributes of a file.
Beware that if there are mount points under a directory, du will count all the files on these mount points as well, unless given the -x option. So if for instance you want the total size of the files in your root filesystem, run du -x /, not du /.
If a filesystem is mounted on a non-empty directory, the files in that directory are hidden by the mounted filesystem. They still occupy their space, but du won't find them.
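On Linux, you can expose such hidden files by bind-mounting the parent filesystem somewhere else and running du there (a sketch; the paths are examples, with /mnt/data standing for the mount point that hides the files):

    mkdir /tmp/rootview
    sudo mount --bind / /tmp/rootview
    sudo du -sh /tmp/rootview/mnt/data   # sizes the files hidden under /mnt/data
    sudo umount /tmp/rootview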
Deleted files
When a file is deleted, this only removes the directory entry, not necessarily the file itself. Two conditions are necessary in order to actually delete a file and thus reclaim its disk space:
- The file's link count must drop to 0: if a file has multiple hard links, removing one doesn't affect the others.
- As long as the file is open by some process, the data remains. Only when all processes have closed the file is the file deleted. The output of fuser -m or lsof on a mount point includes the processes that have a file open on that filesystem, even if the file is deleted (see the example after this list).
- Even if no process has the deleted file open, the file's space may not be reclaimed if that file is the backend of a loop device. losetup -a (as root) can tell you which loop devices are currently set up and on what file. The loop device must be destroyed (with losetup -d) before the disk space can be reclaimed.
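For example, to track down space held by deleted-but-open files on one filesystem (a sketch; /home stands for the mount point you are investigating, and +aL1 restricts the listing to open files whose link count has dropped to 0 on that filesystem):

    sudo lsof +aL1 /home     # deleted files still held open, with the owning PIDs
    sudo fuser -m /home      # just the PIDs with any file open on that filesystem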
If you delete a file in some file managers or GUI environments, it may be put into a trash area where it can be undeleted. As long as the file can be undeleted, its space is still consumed.
What are these numbers from df exactly?
A typical filesystem contains:
- Blocks containing file data (including directories) and some metadata (including indirect blocks, and extended attributes on some filesystems).
- Free blocks.
- Blocks that are reserved to the root user.
- Superblocks and other control information.
- Inodes.
- A journal.
Only the first kind is reported by du. When it comes to df, what goes into the “used”, “available” and total columns depends on the filesystem. Of course, used blocks (including indirect ones) are always in the “used” column, and unused blocks are always in the “available” column.
Filesystems in the ext2/ext3/ext4 family reserve 5% of the space for the root user. This is useful on the root filesystem, to keep the system going if it fills up (in particular for logging, and to let the system administrator store a bit of data while fixing the problem). Even for data partitions such as /home, keeping that reserved space is useful because an almost-full filesystem is prone to fragmentation. Linux tries to avoid fragmentation (which slows down file access, especially on rotating mechanical devices such as hard disks) by pre-allocating many consecutive blocks when a file is being written, but if there are not many consecutive blocks available, that can't work.
Traditional filesystems, up to and including ext4 but not btrfs, reserve a fixed number of inodes when the filesystem is created. This significantly simplifies the design of the filesystem, but has the downside that the number of inodes needs to be sized properly: with too many inodes, space is wasted; with too few inodes, the filesystem may run out of inodes before running out of space. The command df -i reports how many inodes are in use and how many are available (filesystems where the concept is not applicable may report 0).
Running tune2fs -l on the volume containing an ext2/ext3/ext4 filesystem reports some statistics, including the total and free numbers of inodes and blocks.
Another feature that can confuse matters is subvolumes (supported in btrfs, and in zfs under the name datasets). Multiple subvolumes share the same space, but have separate directory tree roots.
If a filesystem is mounted over the network (NFS, Samba, etc.) and the server exports a portion of that filesystem (e.g. the server has a /home filesystem, and exports /home/bob), then df on a client reflects the data for the whole filesystem, not just for the part that is exported and mounted on the client.
What's using the space on my disk?
As we've seen above, the total size reported by df does not always take all the control data of the filesystem into account. Use filesystem-specific tools to get the exact size of the filesystem if needed. For example, with ext2/ext3/ext4, run tune2fs -l and multiply the block size by the block count.
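For instance (a sketch; /dev/sda1 is a placeholder for the block device holding the ext4 filesystem):

    sudo tune2fs -l /dev/sda1 |
      awk -F: '/^Block count/ {c=$2} /^Block size/ {s=$2} END {print c*s, "bytes"}'

The awk script picks the “Block count” and “Block size” lines out of the report and multiplies them, giving the exact filesystem size in bytes.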
When you create a filesystem, it normally fills up the available space on the enclosing partition or volume. Sometimes you might end up with a smaller filesystem when you've been moving filesystems around or resizing volumes.
On Linux, lsblk presents a nice overview of the available storage volumes. For additional information, or if you don't have lsblk, use specialized volume management or partitioning tools to check what partitions you have. On Linux, there's lvs, vgs, pvs for LVM, fdisk for traditional PC-style (“MBR”) partitions (as well as GPT on recent systems), gdisk for GPT partitions, disklabel for BSD disklabels, Parted, etc. Under Linux, cat /proc/partitions gives a quick summary. Typical installations have at least two partitions or volumes used by the operating system: a filesystem (sometimes more), and a swap volume.
Some computers have a partition containing the BIOS or other diagnostic software. Computers with UEFI have a dedicated bootloader partition.
Finally, note that most computer programs use units based on powers of 1024 = 2^10 (because programmers love binary and powers of 2). So 1 kB = 1024 B, 1 MB = 1048576 B, 1 GB = 1073741824 B, 1 TB = 1099511627776 B, … Officially, these units are known as kibibyte (KiB), mebibyte (MiB), etc., but most software just reports k or kB, M or MB, etc. On the other hand, hard disk manufacturers systematically use metric (1000-based) units. So that 1 TB drive is only 931 GiB or 0.909 TiB.
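The conversion is easy to check with shell arithmetic (bash syntax; 10^12 bytes is the marketing “1 TB”):

    echo $(( 10**12 / 2**30 ))         # 931  (GiB, integer part)
    echo 'scale=3; 10^12 / 2^40' | bc  # .909 (TiB)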
1
@Kiwy tune2fs requires having read access to the block device that contains the filesystem, which in general requires being root since that lets you read the content of any file.
– Gilles
Mar 19 '14 at 10:10
18
I know that 'thank you' is discouraged in SE, but Gilles you deserve a huge 'Thank you' for this terrific post.
– dotancohen
Mar 19 '14 at 10:52
1
I remember seeing a card catalog when I was like 6. I wonder how many won't know what they are?
– Izkata
Mar 19 '14 at 14:32
1
@illuminÉ That's too advanced Solaris for me, I don't know at what level it fits.
– Gilles
Mar 19 '14 at 15:56
1
du does account for indirect blocks. That's the main difference from the file size as reported by ls -l.
– Stéphane Chazelas
Feb 6 '17 at 14:07
A short summary of the complications in calculating file sizes and disk space:
The space the file takes on disk is the number of blocks it occupies multiplied by the size of each block, plus an inode. A 1-byte-long file will take at least 1 block, 1 inode and one directory entry.
But it could take only 1 additional directory entry if the file is a hard link to another file. It would be just another reference to the same set of blocks.
- The size of the contents of the file. This is what ls displays.
- Free disk space is not the size of the largest file you can fit in, nor the sum of all file content sizes that will fit on the disk. It's somewhere in between. It depends on the number of files (taking up inodes), the block size, and how closely each file's contents fill blocks completely.
This is just scratching the surface of file systems and it is overly simplified. Also remember that different file systems operate differently.
stat is very helpful for spotting some of this information. Here are some examples of how to use stat and what it is good for: http://landoflinux.com/linux_stat_command_examples.html
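For instance, GNU stat can show the logical size and the allocated blocks side by side (a sketch; somefile is a placeholder):

    stat -c '%n: %s bytes, %b blocks of %B bytes' somefile

Comparing %s (the byte size) with %b × %B (the allocated space, counted in 512-byte units by default) shows how much of the difference comes from block rounding, sparseness or indirect blocks.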
1
A 1-byte file would typically take one block, not 8. Creating a hard link doesn't create an inode at all: one file is one inode no matter how many links there are to the file. Creating a hard link only requires space for the directory entry.
– Gilles
Feb 20 '18 at 12:46
Thanks for the corrections, admittedly my memory re: studying ext2 in depth is now a little fuzzy. I was following the output of stat re: the block count - it did feel excessive but that's what's there. I'll correct the answer.
– Pedro
Feb 20 '18 at 15:24
1
That's because 1 ext2 block = 8 stat blocks, if the ext2 filesystem uses 4kB blocks: stat counts in 512-byte blocks for historical reasons. See unix.stackexchange.com/questions/14409/…
– Gilles
Feb 20 '18 at 16:10
I will illustrate here different cases that cause du to differ from df.
df counts the blocks allocated in the filesystem; du uses the size information of each file.
A difference can have many causes:
1) Unlinked (deleted) files that are still open by an application. The file information is missing, but the blocks are still allocated. lsof +aL1 <filesystem> will help you identify the processes. Most of the time you have to kill the processes to free the space (it depends on the process; sometimes a configuration reload is sufficient).
2) Files beneath mount points, hidden from du but not from df. debugfs can help you read the filesystem.
$ sudo debugfs
debugfs 1.42.12 (29-Aug-2014)
debugfs: open /dev/xxx (the desired file system device)
debugfs: cd /boot
debugfs: ls -l
1966081 40755 (2) 0 0 4096 26-May-2016 16:28 .
2 40555 (2) 0 0 4096 11-May-2016 10:43 ..
1974291 100644 (1) 0 0 0 26-May-2016 16:28 bob <---<<< /boot/bob is hidden by /boot fs
3) Sparse files that look bigger than they really are: unallocated blocks are not counted by df, but the apparent file size is what du --apparent-size reports.
Note that hard links do not fool du: it counts each file only once.
df is generally used to see what the file systems are, how full each is, and where they're mounted. Very useful when you're running out of space in a file system, and maybe want to shift things around among the file systems, or buy a bigger disk, etc.

du shows details of how much cumulative storage each of one's directories is consuming (sort of like windirstat in Windows). Great for finding where you're hogging up space when trying to do file cleanup.
Aside from small numerical differences explained by others, I think the du and df utilities serve very different purposes.
Adding up numbers is easy. The problem is, there are many different numbers to add.
How much disk space does a file use?
The basic idea is that a file containing n bytes uses n bytes of disk space, plus a bit for some control information: the file's metadata (permissions, timestamps, etc.), and a bit of overhead for the information that the system needs to find where the file is stored. However there are many complications.
Microscopic complications
Think of each file as a series of books in a library. Smaller files make up just one volume, but larger files consist of many volumes, like an encyclopedia. In order to be able to locate the files, there is a card catalog which references every volume. Each volume has a bit of overhead due to the covers. If a file is very small, this overhead is relatively large. Also the card catalog itself takes up some room.
Going a bit more technical, in a typical simple filesystem, the space is divided in blocks. A typical block size is 4KiB. Each file takes up an integer number of blocks. Unless the file size is a multiple of the block size, the last block is only partially used. So a 1-byte file and a 4096-byte file both take up 1 block, whereas a 4097-byte file takes up two blocks. You can observe this with the du
command: if your filesystem has a 4KiB block size, then du
will report 4KiB for a 1-byte file.
If a file is large, then additional blocks are needed just to store the list of blocks that make up the file (these are indirect blocks; more sophisticated filesystems may optimize this in the form of extents). Those don't show in the file size as reported by ls -l
or GNU du --apparent-size
; du
, which reports disk usage as opposed to size, does account for them.
Some filesystems try to reuse the free space left in the last block to pack several file tails in the same block. Some filesystems (such as ext4 since Linux 3.8 use 0 blocks for tiny files (just a few bytes) that entirely fit in the inode.
Macroscopic complications
Generally, as seen above, the total size reported by du
is the sum of the sizes of the blocks or extents used by the file.
The size reported by du
may be smaller if the file is compressed. Unix systems traditionally support a crude form of compression: if a file block contains only null bytes, then instead of storing a block of zeroes, the filesystem can omit that block altogether. A file with omitted blocks like this is called a sparse file. Sparse files are not automatically created when a file contains a large series of null bytes, the application must arrange for the file to become sparse.
Some filesystems such as btrfs and zfs support general-purpose compression.
Advanced complications
Two major features of very modern filesystems such as zfs and btrfs make the relationship between file size and disk usage significantly more distant: snapshots and deduplication.
Snapshots are a frozen state of the filesystem at a certain date. Filesystems that support this feature can contain multiple snapshots taken at different dates. These snapshots take room, of course. At one extreme, if you delete all the files from the active version of the filesystem, the filesystem won't become empty if there are snapshots remaining.
Any file or block that hasn't changed since a snapshot, or between two snapshots was taken exists identically in the snapshot and in the active version or other snapshot. This is implemented via copy-on-write. In some edge cases, it's possible that deleting a file on a full filesystem will fail due to insufficient available space — because removing that file would require making a copy of a block in the directory, and there's no more room for even that one block.
Deduplication is a storage optimization technique that consists of avoiding storing identical blocks. With typical data, looking for duplicates isn't always worth the effort. Both zfs and btrfs support deduplication as an optional feature.
Why is the total from du
different from the sum of the file sizes?
As we've seen above, the size reported by du
for each file is normally is the sum of the sizes of the blocks or extents used by the file. Note that by default, ls -l
lists sizes in bytes, but du
lists sizes in KiB, or in 512-byte units (sectors) on some more traditional systems (du -k
forces the use of kilobytes). Most modern unices support ls -lh
and du -h
to use “human-readable” numbers using K, M, G, etc. suffices (for KiB, MiB, GiB) as appropriate.
When you run du
on a directory, it sums up the disk usage of all the files in the directory tree, including the directories themselves. A directory contains data (the names of the files, and a pointer to where the file's metadata is), so it needs a bit of storage space. A small directory will take up one block, a larger directory will require more blocks. The amount of storage used by a directory sometimes depends not only on the files it contains but also the order in which they were inserted and in which some files are removed (with some filesystems, this can leave holes — a compromise between disk space and performance), but the difference will be tiny (an extra block here and there). When you run ls -ld /some/directory
, the directory's size is listed. (Note that the “total NNN” line at the top of the output from ls -l
is an unrelated number, it's the sum of the sizes in blocks of the listed items, expressed in KiB or sectors.)
Keep in mind that du
includes dot files which ls
doesn't show unless you use the -A
or -a
option.
Sometimes du
reports less than the expected sum. This happens if there are hard links inside the directory tree: du
counts each file only once.
On some file systems like ZFS
on Linux, du
does not report the full disk space occupied by extended attributes of a file.
Beware that if there are mount points under a directory, du
will count all the files on these mount points as well, unless given the -x
option. So if for instance you want the total size of the files in your root filesystem, run du -x /
, not du /
.
If a filesystem is mounted to a non-empty directory, the files in that directory are hidden by the mounted filesystem. They still occupy their space, but du
won't find them.
Deleted files
When a file is deleted, this only removes the directory entry, not necessarily the file itself. Two conditions are necessary in order to actually delete a file and thus reclaim its disk space:
- The file's link count must drop to 0: if a file has multiple hard links, removing one doesn't affect the others.
- As long as the file is open by some process, the data remains. Only when all processes have closed the file is the file deleted. The output
fuser -m
orlsof
on a mount point includes the processes that have a file open on that filesystem, even if the file is deleted. - even if no process has the deleted file open, the file's space may not be reclaimed if that file is the backend of a
loop
device.losetup -a
(asroot
) can tell you whichloop
devices are currently set up and on what file. The loop device must be destroyed (withlosetup -d
) before the disk space can be reclaimed.
If you delete a file in some file managers or GUI environments, it may be put into a trash area where it can be undeleted. As long as the file can be undeleted, its space is still consumed.
What are these numbers from df
exactly?
A typical filesystem contains:
- Blocks containing file (including directories) data and some metadata (including indirect blocks, and extended attributes on some filesystems).
- Free blocks.
- Blocks that are reserved to the root user.
- superblocks and other control information.
- Inodes
- A journal
Only the first kind is reported by du
. When it comes to df
, what goes into the “used”, “available” and total columns depends on the filesystem (of course used blocks (including indirect ones) are always in the “used” column, and unused blocks are always in the “available” column).
Filesystems in the ext2/ext3/ext4 reserve 5% of the space to the root user. This is useful on the root filesystem, to keep the system going if it fills up (in particular for logging, and to let the system administrator store a bit of data while fixing the problem). Even for data partitions such as /home
, keeping that reserved space is useful because an almost-full filesystem is prone to fragmentation. Linux tries to avoid fragmentation (which slows down file access, especially on rotating mechanical devices such as hard disks) by pre-allocating many consecutive blocks when a file is being written, but if there are not many consecutive blocks, that can't work.
Traditional filesystems, up to and including ext4 but not btrfs, reserve a fixed number of inodes when the filesystem is created. This significantly simplifies the design of the filesystem, but has the downside that the number of inodes needs to be sized properly: with too many inodes, space is wasted; with too few inodes, the filesystem may run out of inodes before running out of space. The command df -i
reports how many inodes are in use and how many are available (filesystems where the concept is not applicable may report 0).
Running tune2fs -l
on the volume containing an ext2/ext3/ext4 filesystem reports some statistics including the total number and number of free inodes and blocks.
Another feature that can confuse matter is subvolumes (supported in btrfs, and in zfs under the name datasets). Multiple subvolumes share the same space, but have separate directory tree roots.
If a filesystem is mounted over the network (NFS, Samba, etc.) and the server exports a portion of that filesystem (e.g. the server has a /home
filesystem, and exports /home/bob
), then df
on a client reflects the data for the whole filesystem, not just for the part that is exported and mounted on the client.
What's using the space on my disk?
As we've seen above, the total size reported by df
does not always take all the control data of the filesystem into account. Use filesystem-specific tools to get the exact size of the filesystem if needed. For example, with ext2/ext3/ext4, run tune2fs -l
and multiply the block size by the block count.
When you create a filesystem, it normally fills up the available space on the enclosing partition or volume. Sometimes you might end up with a smaller filesystem when you've been moving filesystems around or resizing volumes.
On Linux, lsblk
presents a nice overview of the available storage volumes. For additional information or if you don't have lsblk
, use specialized volume management or partitioning tools to check what partitions you have. On Linux, there's lvs
, vgs
, pvs
for LVM, fdisk
for traditional PC-style (“MBR”) partitions (as well as GPT on recent systems), gdisk
for GPT partitions, disklabel
for BSD disklabels, Parted, etc. Under Linux, cat /proc/partitions
gives a quick summary. Typical installations have at least two partitions or volumes used by the operating system: a filesystem (sometimes more), and a swap volume.
Some computers have a partition containing the BIOS or other diagnostic software. Computers with UEFI have a dedicated bootloader partition.
Finally, note that most computer programs use units based on powers of 1024 = 210 (because programmers love binary and powers of 2). So 1 kB = 1024 B, 1 MB = 1048576 B, 1 GB = 1073741824, 1 TB = 1099511627776 B, … Officially, these units are known as kibibyte KiB, mebibyte MiB, etc., but most software just reports k or kB, M or MB, etc. On the other hand, hard disk manufacturers systematically use metric (1000-based units). So that 1 TB drive is only 931 GiB or 0.904 TiB.
1
@Kiwytune2fs
requires having read access to the block device that contains the filesystem, which in general requires being root since that lets you read the content of any file.
– Gilles
Mar 19 '14 at 10:10
18
I know that 'thank you' is discouraged in SE, but Gilles you deserve a huge 'Thank you' for this terrific post.
– dotancohen
Mar 19 '14 at 10:52
1
I remember seeing a card catalog when I was like 6. I wonder how many won't know what they are?
– Izkata
Mar 19 '14 at 14:32
1
@illuminÉ That's too advanced Solaris for me, I don't know at what level it fits.
– Gilles
Mar 19 '14 at 15:56
1
du
does account for indirect blocks. That's the main difference from the file size as reported byls -l
.
– Stéphane Chazelas
Feb 6 '17 at 14:07
|
show 5 more comments
Adding up numbers is easy. The problem is, there are many different numbers to add.
How much disk space does a file use?
The basic idea is that a file containing n bytes uses n bytes of disk space, plus a bit for some control information: the file's metadata (permissions, timestamps, etc.), and a bit of overhead for the information that the system needs to find where the file is stored. However there are many complications.
Microscopic complications
Think of each file as a series of books in a library. Smaller files make up just one volume, but larger files consist of many volumes, like an encyclopedia. In order to be able to locate the files, there is a card catalog which references every volume. Each volume has a bit of overhead due to the covers. If a file is very small, this overhead is relatively large. Also the card catalog itself takes up some room.
Going a bit more technical, in a typical simple filesystem, the space is divided in blocks. A typical block size is 4KiB. Each file takes up an integer number of blocks. Unless the file size is a multiple of the block size, the last block is only partially used. So a 1-byte file and a 4096-byte file both take up 1 block, whereas a 4097-byte file takes up two blocks. You can observe this with the du
command: if your filesystem has a 4KiB block size, then du
will report 4KiB for a 1-byte file.
If a file is large, then additional blocks are needed just to store the list of blocks that make up the file (these are indirect blocks; more sophisticated filesystems may optimize this in the form of extents). Those don't show in the file size as reported by ls -l
or GNU du --apparent-size
; du
, which reports disk usage as opposed to size, does account for them.
Some filesystems try to reuse the free space left in the last block to pack several file tails in the same block. Some filesystems (such as ext4 since Linux 3.8 use 0 blocks for tiny files (just a few bytes) that entirely fit in the inode.
Macroscopic complications
Generally, as seen above, the total size reported by du
is the sum of the sizes of the blocks or extents used by the file.
The size reported by du
may be smaller if the file is compressed. Unix systems traditionally support a crude form of compression: if a file block contains only null bytes, then instead of storing a block of zeroes, the filesystem can omit that block altogether. A file with omitted blocks like this is called a sparse file. Sparse files are not automatically created when a file contains a large series of null bytes, the application must arrange for the file to become sparse.
Some filesystems such as btrfs and zfs support general-purpose compression.
Advanced complications
Two major features of very modern filesystems such as zfs and btrfs make the relationship between file size and disk usage significantly more distant: snapshots and deduplication.
Snapshots are a frozen state of the filesystem at a certain date. Filesystems that support this feature can contain multiple snapshots taken at different dates. These snapshots take room, of course. At one extreme, if you delete all the files from the active version of the filesystem, the filesystem won't become empty if there are snapshots remaining.
Any file or block that hasn't changed since a snapshot, or between two snapshots was taken exists identically in the snapshot and in the active version or other snapshot. This is implemented via copy-on-write. In some edge cases, it's possible that deleting a file on a full filesystem will fail due to insufficient available space — because removing that file would require making a copy of a block in the directory, and there's no more room for even that one block.
Deduplication is a storage optimization technique that consists of avoiding storing identical blocks. With typical data, looking for duplicates isn't always worth the effort. Both zfs and btrfs support deduplication as an optional feature.
Why is the total from du
different from the sum of the file sizes?
As we've seen above, the size reported by du
for each file is normally is the sum of the sizes of the blocks or extents used by the file. Note that by default, ls -l
lists sizes in bytes, but du
lists sizes in KiB, or in 512-byte units (sectors) on some more traditional systems (du -k
forces the use of kilobytes). Most modern unices support ls -lh
and du -h
to use “human-readable” numbers using K, M, G, etc. suffices (for KiB, MiB, GiB) as appropriate.
When you run du
on a directory, it sums up the disk usage of all the files in the directory tree, including the directories themselves. A directory contains data (the names of the files, and a pointer to where the file's metadata is), so it needs a bit of storage space. A small directory will take up one block, a larger directory will require more blocks. The amount of storage used by a directory sometimes depends not only on the files it contains but also the order in which they were inserted and in which some files are removed (with some filesystems, this can leave holes — a compromise between disk space and performance), but the difference will be tiny (an extra block here and there). When you run ls -ld /some/directory
, the directory's size is listed. (Note that the “total NNN” line at the top of the output from ls -l
is an unrelated number, it's the sum of the sizes in blocks of the listed items, expressed in KiB or sectors.)
Keep in mind that du
includes dot files which ls
doesn't show unless you use the -A
or -a
option.
Sometimes du
reports less than the expected sum. This happens if there are hard links inside the directory tree: du
counts each file only once.
On some file systems like ZFS
on Linux, du
does not report the full disk space occupied by extended attributes of a file.
Beware that if there are mount points under a directory, du
will count all the files on these mount points as well, unless given the -x
option. So if for instance you want the total size of the files in your root filesystem, run du -x /
, not du /
.
If a filesystem is mounted to a non-empty directory, the files in that directory are hidden by the mounted filesystem. They still occupy their space, but du
won't find them.
Deleted files
When a file is deleted, this only removes the directory entry, not necessarily the file itself. Two conditions are necessary in order to actually delete a file and thus reclaim its disk space:
- The file's link count must drop to 0: if a file has multiple hard links, removing one doesn't affect the others.
- As long as the file is open by some process, the data remains. Only when all processes have closed the file is the file deleted. The output
fuser -m
orlsof
on a mount point includes the processes that have a file open on that filesystem, even if the file is deleted. - even if no process has the deleted file open, the file's space may not be reclaimed if that file is the backend of a
loop
device.losetup -a
(asroot
) can tell you whichloop
devices are currently set up and on what file. The loop device must be destroyed (withlosetup -d
) before the disk space can be reclaimed.
If you delete a file in some file managers or GUI environments, it may be put into a trash area where it can be undeleted. As long as the file can be undeleted, its space is still consumed.
What are these numbers from df
exactly?
A typical filesystem contains:
- Blocks containing file (including directories) data and some metadata (including indirect blocks, and extended attributes on some filesystems).
- Free blocks.
- Blocks that are reserved to the root user.
- superblocks and other control information.
- Inodes
- A journal
Only the first kind is reported by du
. When it comes to df
, what goes into the “used”, “available” and total columns depends on the filesystem (of course used blocks (including indirect ones) are always in the “used” column, and unused blocks are always in the “available” column).
Filesystems in the ext2/ext3/ext4 reserve 5% of the space to the root user. This is useful on the root filesystem, to keep the system going if it fills up (in particular for logging, and to let the system administrator store a bit of data while fixing the problem). Even for data partitions such as /home
, keeping that reserved space is useful because an almost-full filesystem is prone to fragmentation. Linux tries to avoid fragmentation (which slows down file access, especially on rotating mechanical devices such as hard disks) by pre-allocating many consecutive blocks when a file is being written, but if there are not many consecutive blocks, that can't work.
Traditional filesystems, up to and including ext4 but not btrfs, reserve a fixed number of inodes when the filesystem is created. This significantly simplifies the design of the filesystem, but has the downside that the number of inodes needs to be sized properly: with too many inodes, space is wasted; with too few inodes, the filesystem may run out of inodes before running out of space. The command df -i
reports how many inodes are in use and how many are available (filesystems where the concept is not applicable may report 0).
Running tune2fs -l
on the volume containing an ext2/ext3/ext4 filesystem reports some statistics including the total number and number of free inodes and blocks.
Another feature that can confuse matter is subvolumes (supported in btrfs, and in zfs under the name datasets). Multiple subvolumes share the same space, but have separate directory tree roots.
If a filesystem is mounted over the network (NFS, Samba, etc.) and the server exports a portion of that filesystem (e.g. the server has a /home
filesystem, and exports /home/bob
), then df
on a client reflects the data for the whole filesystem, not just for the part that is exported and mounted on the client.
What's using the space on my disk?
As we've seen above, the total size reported by df
does not always take all the control data of the filesystem into account. Use filesystem-specific tools to get the exact size of the filesystem if needed. For example, with ext2/ext3/ext4, run tune2fs -l
and multiply the block size by the block count.
When you create a filesystem, it normally fills up the available space on the enclosing partition or volume. Sometimes you might end up with a smaller filesystem when you've been moving filesystems around or resizing volumes.
On Linux, lsblk
presents a nice overview of the available storage volumes. For additional information or if you don't have lsblk
, use specialized volume management or partitioning tools to check what partitions you have. On Linux, there's lvs
, vgs
, pvs
for LVM, fdisk
for traditional PC-style (“MBR”) partitions (as well as GPT on recent systems), gdisk
for GPT partitions, disklabel
for BSD disklabels, Parted, etc. Under Linux, cat /proc/partitions
gives a quick summary. Typical installations have at least two partitions or volumes used by the operating system: a filesystem (sometimes more), and a swap volume.
Some computers have a partition containing the BIOS or other diagnostic software. Computers with UEFI have a dedicated bootloader partition.
Finally, note that most computer programs use units based on powers of 1024 = 210 (because programmers love binary and powers of 2). So 1 kB = 1024 B, 1 MB = 1048576 B, 1 GB = 1073741824, 1 TB = 1099511627776 B, … Officially, these units are known as kibibyte KiB, mebibyte MiB, etc., but most software just reports k or kB, M or MB, etc. On the other hand, hard disk manufacturers systematically use metric (1000-based units). So that 1 TB drive is only 931 GiB or 0.904 TiB.
1
@Kiwytune2fs
requires having read access to the block device that contains the filesystem, which in general requires being root since that lets you read the content of any file.
– Gilles
Mar 19 '14 at 10:10
18
I know that 'thank you' is discouraged in SE, but Gilles you deserve a huge 'Thank you' for this terrific post.
– dotancohen
Mar 19 '14 at 10:52
1
I remember seeing a card catalog when I was like 6. I wonder how many won't know what they are?
– Izkata
Mar 19 '14 at 14:32
1
@illuminÉ That's too advanced Solaris for me, I don't know at what level it fits.
– Gilles
Mar 19 '14 at 15:56
1
du
does account for indirect blocks. That's the main difference from the file size as reported byls -l
.
– Stéphane Chazelas
Feb 6 '17 at 14:07
|
show 5 more comments
Adding up numbers is easy. The problem is, there are many different numbers to add.
How much disk space does a file use?
The basic idea is that a file containing n bytes uses n bytes of disk space, plus a bit for some control information: the file's metadata (permissions, timestamps, etc.), and a bit of overhead for the information that the system needs to find where the file is stored. However there are many complications.
Microscopic complications
Think of each file as a series of books in a library. Smaller files make up just one volume, but larger files consist of many volumes, like an encyclopedia. In order to be able to locate the files, there is a card catalog which references every volume. Each volume has a bit of overhead due to the covers. If a file is very small, this overhead is relatively large. Also the card catalog itself takes up some room.
Going a bit more technical, in a typical simple filesystem, the space is divided in blocks. A typical block size is 4KiB. Each file takes up an integer number of blocks. Unless the file size is a multiple of the block size, the last block is only partially used. So a 1-byte file and a 4096-byte file both take up 1 block, whereas a 4097-byte file takes up two blocks. You can observe this with the du
command: if your filesystem has a 4KiB block size, then du
will report 4KiB for a 1-byte file.
If a file is large, then additional blocks are needed just to store the list of blocks that make up the file (these are indirect blocks; more sophisticated filesystems may optimize this in the form of extents). Those don't show in the file size as reported by ls -l
or GNU du --apparent-size
; du
, which reports disk usage as opposed to size, does account for them.
Some filesystems try to reuse the free space left in the last block to pack several file tails in the same block. Some filesystems (such as ext4 since Linux 3.8 use 0 blocks for tiny files (just a few bytes) that entirely fit in the inode.
Macroscopic complications
Generally, as seen above, the total size reported by du
is the sum of the sizes of the blocks or extents used by the file.
The size reported by du
may be smaller if the file is compressed. Unix systems traditionally support a crude form of compression: if a file block contains only null bytes, then instead of storing a block of zeroes, the filesystem can omit that block altogether. A file with omitted blocks like this is called a sparse file. Sparse files are not automatically created when a file contains a large series of null bytes, the application must arrange for the file to become sparse.
Some filesystems such as btrfs and zfs support general-purpose compression.
Advanced complications
Two major features of very modern filesystems such as zfs and btrfs make the relationship between file size and disk usage significantly more distant: snapshots and deduplication.
Snapshots are a frozen state of the filesystem at a certain date. Filesystems that support this feature can contain multiple snapshots taken at different dates. These snapshots take room, of course. At one extreme, if you delete all the files from the active version of the filesystem, the filesystem won't become empty if there are snapshots remaining.
Any file or block that hasn't changed since a snapshot, or between two snapshots was taken exists identically in the snapshot and in the active version or other snapshot. This is implemented via copy-on-write. In some edge cases, it's possible that deleting a file on a full filesystem will fail due to insufficient available space — because removing that file would require making a copy of a block in the directory, and there's no more room for even that one block.
Deduplication is a storage optimization technique that consists of avoiding storing identical blocks. With typical data, looking for duplicates isn't always worth the effort. Both zfs and btrfs support deduplication as an optional feature.
Why is the total from du
different from the sum of the file sizes?
As we've seen above, the size reported by du
for each file is normally is the sum of the sizes of the blocks or extents used by the file. Note that by default, ls -l
lists sizes in bytes, but du
lists sizes in KiB, or in 512-byte units (sectors) on some more traditional systems (du -k
forces the use of kilobytes). Most modern unices support ls -lh
and du -h
to use “human-readable” numbers using K, M, G, etc. suffices (for KiB, MiB, GiB) as appropriate.
When you run du
on a directory, it sums up the disk usage of all the files in the directory tree, including the directories themselves. A directory contains data (the names of the files, and a pointer to where the file's metadata is), so it needs a bit of storage space. A small directory will take up one block, a larger directory will require more blocks. The amount of storage used by a directory sometimes depends not only on the files it contains but also the order in which they were inserted and in which some files are removed (with some filesystems, this can leave holes — a compromise between disk space and performance), but the difference will be tiny (an extra block here and there). When you run ls -ld /some/directory
, the directory's size is listed. (Note that the “total NNN” line at the top of the output from ls -l
is an unrelated number, it's the sum of the sizes in blocks of the listed items, expressed in KiB or sectors.)
Keep in mind that du
includes dot files which ls
doesn't show unless you use the -A
or -a
option.
Sometimes du
reports less than the expected sum. This happens if there are hard links inside the directory tree: du
counts each file only once.
On some file systems like ZFS
on Linux, du
does not report the full disk space occupied by extended attributes of a file.
Beware that if there are mount points under a directory, du
will count all the files on these mount points as well, unless given the -x
option. So if for instance you want the total size of the files in your root filesystem, run du -x /
, not du /
.
If a filesystem is mounted to a non-empty directory, the files in that directory are hidden by the mounted filesystem. They still occupy their space, but du
won't find them.
Deleted files
When a file is deleted, this only removes the directory entry, not necessarily the file itself. Two conditions are necessary in order to actually delete a file and thus reclaim its disk space:
- The file's link count must drop to 0: if a file has multiple hard links, removing one doesn't affect the others.
- As long as the file is open by some process, the data remains. Only when all processes have closed the file is the file deleted. The output
fuser -m
orlsof
on a mount point includes the processes that have a file open on that filesystem, even if the file is deleted. - even if no process has the deleted file open, the file's space may not be reclaimed if that file is the backend of a
loop
device.losetup -a
(asroot
) can tell you whichloop
devices are currently set up and on what file. The loop device must be destroyed (withlosetup -d
) before the disk space can be reclaimed.
If you delete a file in some file managers or GUI environments, it may be put into a trash area where it can be undeleted. As long as the file can be undeleted, its space is still consumed.
What are these numbers from df
exactly?
A typical filesystem contains:
- Blocks containing file (including directories) data and some metadata (including indirect blocks, and extended attributes on some filesystems).
- Free blocks.
- Blocks that are reserved to the root user.
- superblocks and other control information.
- Inodes
- A journal
Only the first kind is reported by du
. When it comes to df
, what goes into the “used”, “available” and total columns depends on the filesystem (of course used blocks (including indirect ones) are always in the “used” column, and unused blocks are always in the “available” column).
Filesystems in the ext2/ext3/ext4 reserve 5% of the space to the root user. This is useful on the root filesystem, to keep the system going if it fills up (in particular for logging, and to let the system administrator store a bit of data while fixing the problem). Even for data partitions such as /home
, keeping that reserved space is useful because an almost-full filesystem is prone to fragmentation. Linux tries to avoid fragmentation (which slows down file access, especially on rotating mechanical devices such as hard disks) by pre-allocating many consecutive blocks when a file is being written, but if there are not many consecutive blocks, that can't work.
Traditional filesystems, up to and including ext4 but not btrfs, reserve a fixed number of inodes when the filesystem is created. This significantly simplifies the design of the filesystem, but has the downside that the number of inodes needs to be sized properly: with too many inodes, space is wasted; with too few inodes, the filesystem may run out of inodes before running out of space. The command df -i
reports how many inodes are in use and how many are available (filesystems where the concept is not applicable may report 0).
Running tune2fs -l
on the volume containing an ext2/ext3/ext4 filesystem reports some statistics including the total number and number of free inodes and blocks.
Another feature that can confuse matter is subvolumes (supported in btrfs, and in zfs under the name datasets). Multiple subvolumes share the same space, but have separate directory tree roots.
If a filesystem is mounted over the network (NFS, Samba, etc.) and the server exports a portion of that filesystem (e.g. the server has a /home
filesystem, and exports /home/bob
), then df
on a client reflects the data for the whole filesystem, not just for the part that is exported and mounted on the client.
What's using the space on my disk?
As we've seen above, the total size reported by df
does not always take all the control data of the filesystem into account. Use filesystem-specific tools to get the exact size of the filesystem if needed. For example, with ext2/ext3/ext4, run tune2fs -l
and multiply the block size by the block count.
When you create a filesystem, it normally fills up the available space on the enclosing partition or volume. Sometimes you might end up with a smaller filesystem when you've been moving filesystems around or resizing volumes.
On Linux, lsblk
presents a nice overview of the available storage volumes. For additional information or if you don't have lsblk
, use specialized volume management or partitioning tools to check what partitions you have. On Linux, there's lvs
, vgs
, pvs
for LVM, fdisk
for traditional PC-style (“MBR”) partitions (as well as GPT on recent systems), gdisk
for GPT partitions, disklabel
for BSD disklabels, Parted, etc. Under Linux, cat /proc/partitions
gives a quick summary. Typical installations have at least two partitions or volumes used by the operating system: a filesystem (sometimes more), and a swap volume.
Some computers have a partition containing the BIOS or other diagnostic software. Computers with UEFI have a dedicated bootloader partition.
Finally, note that most computer programs use units based on powers of 1024 = 210 (because programmers love binary and powers of 2). So 1 kB = 1024 B, 1 MB = 1048576 B, 1 GB = 1073741824, 1 TB = 1099511627776 B, … Officially, these units are known as kibibyte KiB, mebibyte MiB, etc., but most software just reports k or kB, M or MB, etc. On the other hand, hard disk manufacturers systematically use metric (1000-based units). So that 1 TB drive is only 931 GiB or 0.904 TiB.
edited Feb 20 '18 at 9:41
Stéphane Chazelas
299k54564913
answered Mar 19 '14 at 3:28
Gilles
528k12810581583
1
@Kiwy tune2fs requires having read access to the block device that contains the filesystem, which in general requires being root since that lets you read the content of any file.
– Gilles
Mar 19 '14 at 10:10
18
I know that 'thank you' is discouraged in SE, but Gilles you deserve a huge 'Thank you' for this terrific post.
– dotancohen
Mar 19 '14 at 10:52
1
I remember seeing a card catalog when I was like 6. I wonder how many won't know what they are?
– Izkata
Mar 19 '14 at 14:32
1
@illuminÉ That's too advanced Solaris for me, I don't know at what level it fits.
– Gilles
Mar 19 '14 at 15:56
1
du does account for indirect blocks. That's the main difference from the file size as reported by ls -l.
– Stéphane Chazelas
Feb 6 '17 at 14:07
A short summary of complications to calculating file sizes and disk spaces:
The space a file takes on disk is the number of blocks it uses multiplied by the block size, plus one inode. A 1-byte file will take at least 1 block, 1 inode and one directory entry.
But it could take only 1 additional directory entry if the file is a hard link to another file: that's just another reference to the same inode and the same set of blocks.
- The size of the contents of the file. This is what ls displays.
- Free disk space is not the size of the largest file you can fit in, nor the sum of all file content sizes that will fit on the disk. It's somewhere in between: it depends on the number of files (each taking up an inode), the block size, and how closely each file's contents fill blocks completely.
This is just scratching the surface of file systems and it is overly simplified. Also remember that different file systems operate differently.
stat is very helpful for inspecting this kind of information. Here are some examples of how to use stat and what it is good for: http://landoflinux.com/linux_stat_command_examples.html
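For instance, with GNU stat on a filesystem with 4KiB blocks (file name arbitrary, output abridged). Note that stat counts Blocks in 512-byte units, so one 4KiB filesystem block shows up as 8:
$ printf hello > myfile
$ stat myfile
  File: myfile
  Size: 5         Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d  Inode: 1835023   Links: 1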
edited Mar 30 '18 at 17:37
agc
4,43111036
answered Feb 20 '18 at 11:00
Pedro
62929
1
A 1-byte file would typically take one block, not 8. Creating a hard link doesn't create an inode at all: one file is one inode no matter how many links there are to the file. Creating a hard link only requires space for the directory entry.
– Gilles
Feb 20 '18 at 12:46
Thanks for the corrections, admittedly my memory re: studying ext2 in depth is now a little fuzzy. I was following the output of stat re: the block count - it did feel excessive but that's what's there. I'll correct the answer.
– Pedro
Feb 20 '18 at 15:24
1
That's because 1 ext2 block = 8 stat blocks, if the ext2 filesystem uses 4kB blocks: stat counts in 512-byte blocks for historical reasons. See unix.stackexchange.com/questions/14409/…
– Gilles
Feb 20 '18 at 16:10
add a comment |
I will illustrate here different cases that cause du to differ from df.
df counts the filesystem's allocated blocks; du uses the size information of each file.
A difference can have many causes:
1) Unlinked (deleted) files that are still open by an application. The file's directory entry is gone, but its blocks are still allocated. lsof +aL1 <filesystem>
will help you identify the processes. Most of the time you have to kill the processes to free the space (it depends on the process; sometimes a configuration reload is sufficient).
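For example (mount point hypothetical, output illustrative; the NLINK column is 0 for deleted files):
$ sudo lsof +aL1 /var
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NLINK  NODE NAME
rsyslogd 814 root    7w   REG    8,2  5242880     0  1835 /var/log/syslog (deleted)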
2) Files beneath mount points, hidden from du but not from df. debugfs can help you read the filesystem.
$ sudo debugfs
debugfs 1.42.12 (29-Aug-2014)
debugfs: open /dev/xxx (the desired file system device)
debugfs: cd /boot
debugfs: ls -l
1966081 40755 (2) 0 0 4096 26-May-2016 16:28 .
2 40555 (2) 0 0 4096 11-May-2016 10:43 ..
1974291 100644 (1) 0 0 0 26-May-2016 16:28 bob <---<<< /boot/bob is hidden by /boot fs
3) Sparse files that look bigger than they really are: unallocated blocks are not counted by df or du, but they do count toward the apparent file size shown by ls -l or du --apparent-size.
Note that hard links do not fool du: it counts each file only once.
answered Sep 2 '16 at 17:10
Emmanuel
3,00911120
add a comment |
df
is generally used to see what the file systems are, how full each is and where they're mounted. Very useful when you're running out of space in a file system, and maybe want to shift things around among the file systems, or buy a bigger disk, etc.
du
shows details of how much cumulative storage each of one's directories is consuming (sort of like windirstat
in Windows). Great for finding where you're hogging up space when trying to do file cleanup.
Aside from small numerical differences explained by others, I think the du
and df
utilities serve very different purposes.
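A common workflow combining the two, with GNU tools (paths hypothetical):
$ df -h /home                              # is the filesystem nearly full?
$ du -sh /home/* 2>/dev/null | sort -h     # which directories are the hogs?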
edited Jan 26 '18 at 2:10
yeti
2,39611224
answered Jan 26 '18 at 0:11
Jim Robertson
211
add a comment |