DU & LS – APPARENT SIZE vs DISK USAGE Size – Sparse Files and stuff

NOTE ABOUT UNITS BELOW: I incorrectly call them Kilobytes, but in reality they are kibibytes. I have the shorthand notation correct. A KiB is 1024 bytes. A real Kilobyte is 1000 bytes. No one really uses Kilobyte to mean 1000 bytes, even though it's the accurate meaning; well, no one except drive manufacturers. When they tell you you're getting a 3 TB drive, you're really getting 3,000,000,000,000 bytes and not 3,298,534,883,328 bytes (Why? It saves them money to sell smaller drives).
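To see the gap between the two meanings, here is a minimal sketch using plain shell integer arithmetic. The variable names are only for illustration.

```shell
# Decimal (SI) terabytes vs. binary tebibytes.
tb=$((3 * 1000 * 1000 * 1000 * 1000))    # 3 TB, as printed on the drive box
tib=$((3 * 1024 * 1024 * 1024 * 1024))   # 3 TiB, as most tools count it

echo "3 TB  = $tb bytes"
echo "3 TiB = $tib bytes"
echo "difference = $((tib - tb)) bytes"
```

The difference works out to a bit under 300 billion bytes on a "3 TB" drive.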

APPARENT Size vs DISK USAGE size

LS and DU can both display APPARENT SIZE and DISK USAGE (normal) size. DU displays the DISK USAGE size without any arguments; to display APPARENT SIZE you have to give du the --apparent-size argument (command examples below). ls -l gives APPARENT SIZE (in the middle of the output) and ls -s shows DISK USAGE size (on the left side of the output).

* APPARENT size (middle value in “ls -lsk”) is how the size appears to applications.
* DISK USAGE size (left size value in “ls -lsk”) is how much disk space the file takes up (so if you have filesystem compression or deduplication, this size would be smaller). DISK USAGE is what DU shows when you don't give it --apparent-size, and what LS shows when you give it -s. This is the space that is accounted for with DF.

According to the excerpt quoted below, DISK USAGE is the amount of space that can't be taken up by some other file on the filesystem. It is usually slightly bigger than APPARENT size because the filesystem has to occupy the last block all the way, so DISK USAGE is a whole multiple of the blocksize. E.g. a file with 3003 bytes of characters has 3003 bytes APPARENT size. In a 4k blocksize filesystem it takes up the full first block, so DISK USAGE is 4096 bytes, yet only 3003 bytes are useful and 1093 bytes are wasted. Likewise in a 2k blocksize filesystem the first block is completely used, all 2048 bytes of it; the second block holds 955 useful bytes and the other 1093 bytes of that last block are wasted.

You can see that this waste compounds with more files on the filesystem. It's safe to assume every file wastes part of its last block (on average, half a block). So you can estimate the total wasted space by taking the number of files and multiplying by half the block size. In a 4k blocksize filesystem with 1 million files, you would waste about 2 million KiB, or roughly 2.048 gigabytes.
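The block rounding described above can be sketched with shell arithmetic. This is only a simplified model: disk_usage is a made-up helper name, and real filesystems can differ (compression, small files stored inline, and so on).

```shell
# disk_usage BYTES BLOCKSIZE: round a byte count up to a whole number of blocks.
disk_usage() {
    echo $(( ( ($1 + $2 - 1) / $2 ) * $2 ))
}

size=3003
for bs in 4096 2048; do
    used=$(disk_usage "$size" "$bs")
    echo "blocksize $bs: disk usage $used bytes, wasted $((used - size)) bytes"
done

# Rough total waste estimate: number of files * half a block.
files=1000000
echo "estimated waste: $(( files * 4096 / 2 )) bytes"
```

Both block sizes end up at 4096 bytes of DISK USAGE for the 3003 byte file, which matches the 1093 wasted bytes worked out above.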

Most often though, without compression, deduplication, or sparse files (more on this in a second), APPARENT size is actually smaller than DISK USAGE. Why? DISK USAGE counts the full size of the last block of the file, even when that block is only partially full. APPARENT size only counts the data that's actually in the last block.

So if you have compression or sparse files, you can have a file that appears to be 11 TB but really takes up 1 TB.

When you have sparse files (think of thin luns or thin vmdks, those are sparse files), then APPARENT SIZE is bigger, because the long runs of zeroes (00s) are only virtually written: they count toward the size but take up no blocks. Check out this explanation of sparse files:

Excerpt from http://en.wikipedia.org/wiki/Sparse_file

A sparse file is one that attempts to use file system space more efficiently when the file itself is mostly empty. This is achieved by writing brief information (metadata) representing the empty blocks to disk instead of the actual “empty” space which makes up the block, using less disk space. The full block size is written to disk as the actual size only when the block contains “real” (non-empty) data.

Excerpt from http://stackoverflow.com/questions/5694741/why-is-the-output-of-du-often-so-different-from-du-b

Apparent size is the number of bytes your applications think are in the file. It’s the amount of data that would be transferred over the network (not counting protocol headers) if you decided to send the file over FTP or HTTP. It’s also the result of cat theFile | wc -c, and the amount of address space that the file would take up if you loaded the whole thing using mmap.

Disk usage is the amount of space that can’t be used for something else because your file is occupying that space.

In most cases, the apparent size is smaller than the disk usage because the disk usage counts the full size of the last (partial) block of the file, and apparent size only counts the data that’s in that last block. However, apparent size is larger when you have a sparse file (sparse files are created when you seek somewhere past the end of the file, and then write something there — the OS doesn’t bother to create lots of blocks filled with zeros — it only creates a block for the part of the file you decided to write to).

3 Different file types

* A sparse file (think of thin luns if you're familiar with lun files) accounts for lun files or vmdks that contain a lot of unused space. The unused space reads back as NULLs (00 bytes if you're looking at the hex code) but isn't actually stored.

Example: Say you had a sparse file/thin lun that was made to be 1 TB big. At first it will have DISK USAGE of 0 bytes, but it will have APPARENT SIZE of 1 TB (it will always have that APPARENT SIZE, it will never change). Now if you were to load up that thin lun and install a 10 gig Operating System inside it, the DISK USAGE will change to 10 gig, but the APPARENT SIZE will stay at 1 TB.

Creation of sparse/thin file: We just use truncate

truncate -s <size in KiB>K file1
# Example: "truncate -s 1000K filename" will create a 1000 KiB file, 1000*1024 bytes
# M is for Mebibytes (1024*1024 bytes). G is for Gibibytes (1024*1024*1024 bytes).
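Here is a small sketch that creates a sparse file and compares both sizes. It assumes GNU coreutils and a filesystem that supports sparse files; the temp file from mktemp and the variable names are just for illustration.

```shell
# Create a 1000 KiB sparse file in a temporary location and compare both sizes.
sparse_file=$(mktemp)
truncate -s 1000K "$sparse_file"

# du prints "<size><TAB><name>", so cut -f1 keeps only the number.
apparent=$(du --apparent-size -B1 "$sparse_file" | cut -f1)
disk=$(du -B1 "$sparse_file" | cut -f1)

echo "apparent size: $apparent bytes"   # 1000 * 1024 bytes
echo "disk usage:    $disk bytes"       # typically 0 until real data is written

rm -f "$sparse_file"
```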

Side note: to copy a sparse file use cp --sparse=always srcfile dstfile. There are other commands that can preserve sparseness too; just make sure you use each command's preserve-sparse option.
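As a quick check that the copy stays sparse, here is a minimal sketch assuming GNU cp and du. The file names come from mktemp and are only placeholders.

```shell
# Copy a sparse file with --sparse=always and confirm the copy is still sparse.
src=$(mktemp)
dst=$(mktemp)
truncate -s 1000K "$src"

cp --sparse=always "$src" "$dst"

dst_apparent=$(du --apparent-size -B1 "$dst" | cut -f1)
dst_disk=$(du -B1 "$dst" | cut -f1)

# cmp -s is silent and only sets the exit status: 0 means identical contents.
if cmp -s "$src" "$dst"; then same=yes; else same=no; fi

echo "copy apparent size: $dst_apparent bytes"
echo "copy disk usage:    $dst_disk bytes"
echo "contents identical: $same"

rm -f "$src" "$dst"
```

The copy should read back exactly the same as the source while still using far less disk than its APPARENT SIZE.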

* A normal file will take up the space when it's written.
Both APPARENT SIZE and DISK USAGE will grow as more data is written.

Example: When you first create the lun, it will be DISK USAGE 0 gig and APPARENT SIZE 0 gig. When you write the 10 gig OS, it will be 10 gig DISK USAGE and 10 gig APPARENT SIZE.

Creation of normal file: We just use touch

touch file1

Side note: to copy a normal file use cp srcfile dstfile
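To watch a normal file grow, here is a minimal sketch assuming GNU coreutils. The 3003 byte size is borrowed from the block example earlier, and the variable names are made up.

```shell
# A normal file starts empty and grows as data is written to it.
normal_file=$(mktemp)
size_before=$(du --apparent-size -B1 "$normal_file" | cut -f1)

# Write 3003 bytes of real (non-zero) data.
head -c 3003 /dev/urandom > "$normal_file"

size_after=$(du --apparent-size -B1 "$normal_file" | cut -f1)
disk_after=$(du -B1 "$normal_file" | cut -f1)

echo "apparent size before: $size_before bytes"
echo "apparent size after:  $size_after bytes"
# Disk usage depends on the filesystem and may lag until the data is flushed.
echo "disk usage after:     $disk_after bytes"

rm -f "$normal_file"
```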

* A preallocated file (think of thick luns if you're familiar with lun files) will take up all of the space immediately, so we know that any future writes to that file will only happen in the area that was preallocated.

Example: When you first create a thick lun, it will create a file that is 1 TB for both DISK USAGE and APPARENT SIZE. When you write the 10 gig OS, the DISK USAGE and APPARENT SIZE will not change. To know how much of that space is really used, the user has to check from the OS layer: go into Windows and see how much of that 1 TB drive is available. The OS within the LUN has no problem deciphering what is used space and what is not.

Creation of Preallocated/Thick file: We use touch and fallocate

# First we create the file
touch file1
# Now we preallocate space to the file
fallocate -l <size in KiB>K file1

Side note: a thick file is copied just like a normal file: use cp srcfile dstfile
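A preallocated file can be checked the same way. This sketch assumes the fallocate tool from util-linux is installed; not every filesystem supports preallocation, so the script falls back to a message in that case. Variable names are made up.

```shell
# Preallocate 1000 KiB and compare both sizes (only if fallocate works here).
thick_file=$(mktemp)

if fallocate -l 1000K "$thick_file" 2>/dev/null; then
    thick_apparent=$(du --apparent-size -B1 "$thick_file" | cut -f1)
    thick_disk=$(du -B1 "$thick_file" | cut -f1)
    echo "apparent size: $thick_apparent bytes"
    # On most filesystems this is about the same as the apparent size.
    echo "disk usage:    $thick_disk bytes"
else
    echo "fallocate is not supported on this filesystem"
fi

rm -f "$thick_file"
```

Compare this with the sparse file case, where DISK USAGE starts out near 0.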

Sidenotes

Sidenote: Remember empty space can be described as 00s, or it's simply space that is not in your extents. Empty space can have data in it. But brand new empty space, such as in new drives, should in general always be zeroed out.

Sidenote: 2 out of the 3 cases above describe the scenarios for how you can create/provision your VMDK files when you use something like VMware (below excerpted from https://communities.vmware.com/message/2199560)
* Thick Provision Lazy Zeroed (thick file that doesn't fill the empty space with 00s up front; it writes the 0s when it needs to)
Creates a virtual disk in a default thick format. Space required for the virtual disk is allocated when the virtual disk is created. Data remaining on the physical device is not erased during creation, but is zeroed out on demand at a later time on first write from the virtual machine.
Using the default flat virtual disk format does not zero out or eliminate the possibility of recovering deleted files or restoring old data that might be present on this allocated space. You cannot convert a flat disk to a thin disk.
* Thick Provision Eager Zeroed (thick file that fills the empty space with 00s at the moment of allocation/creation, so it might take much longer to create)
A type of thick virtual disk that supports clustering features such as Fault Tolerance. Space required for the virtual disk is allocated at creation time. In contrast to the flat format, the data remaining on the physical device is zeroed out when the virtual disk is created. It might take much longer to create disks in this format than to create other types of disks.
* Thin Provision
Use this format to save storage space. For the thin disk, you provision as much datastore space as the disk would require based on the value that you enter for the disk size. However, the thin disk starts small and at first, uses only as much datastore space as the disk needs for its initial operations.

Command Examples

So now that we know the difference between APPARENT SIZE and DISK USAGE SIZE (or normal size), where do we see these sizes in our commands?

With DU:

# in KiB (Kilobytes where 1 KiB is 1024 bytes)
du --apparent-size FILE
du FILE
# in Human Readable
du --apparent-size -h FILE
du -h FILE
# In Bytes:
du --apparent-size -B1 FILE
du -B1 FILE

Example on a sparse file that is 1.1 TB big, but only has 180 gigs used.

# du --apparent-size FILE1
1073741825 FILE1
# du FILE1
188295196 FILE1

# du --apparent-size -h FILE1
1.1T FILE1
# du -h FILE1
180G FILE1

# du --apparent-size -B1 FILE1
1099511628288 FILE1
# du -B1 FILE1
192814280704 FILE1

Running to see the size of all folders in current directory (1 folder deep):

# Bytes:
diff -y <(du -cd1 --apparent-size -B1) <(du -cd1 -B1) | sort -hk1
# Kilobytes (1024 bytes = 1 KiB):
diff -y <(du -cd1 --apparent-size) <(du -cd1) | sort -hk1
# Human Readable:
diff -y <(du -cd1 --apparent-size -h) <(du -cd1 -h) | sort -hk1

Running to see the size of all folders & files in current directory (1 folder deep):

# Bytes:
diff -y <(du -cad1 --apparent-size -B1) <(du -cad1 -B1) | sort -hk1
# Kilobytes (1024 bytes = 1 KiB):
diff -y <(du -cad1 --apparent-size) <(du -cad1) | sort -hk1
# Human Readable:
diff -y <(du -cad1 --apparent-size -h) <(du -cad1 -h) | sort -hk1

Options:
-c gives the total at the bottom
-d1 makes sure it only goes 1 folder deep, or else there will be a lot of output (every nested folder gets its own line)
-a will list files as well as folders

Example of output:

# diff -y <(du -cad1 --apparent-size -B1) <(du -cad1 -B1)
140     ./.profile                                            | 4096    ./.profile
570     ./.bashrc                                             | 4096    ./.bashrc
2132    ./.ssh                                                | 8192    ./.ssh
91      ./keepalivesdf.sh                                     | 4096    ./keepalivesdf.sh
107     ./notes                                               | 4096    ./notes
1524    ./.bash_history                                       | 4096    ./.bash_history
4668    .                                                     | 28672   .
4668    total                                                 | 28672   total

Notice the APPARENT SIZE on the left reflects the real size of the file, and it's smaller than the DISK USAGE in these cases where it's not a sparse file.

The same kind of output from the drive that holds FILE1, which is a SPARSE FILE, will look like this:

# diff -y <(du -cad1 --apparent-size -h) <(du -cad1 -h)
1.4T    ./FILE1FOLDER                                         | 1.3T    ./FILE1FOLDER
2.1T    ./BACKUP                                              | 239G    ./BACKUP
178M    ./sys.tgz                                             | 178M    ./sys.tgz
16K     ./lost+found                                          | 16K     ./lost+found
34      ./.keepalive                                          | 4.0K    ./.keepalive
6.7T    ./ATTEMPT                                             | 157G    ./ATTEMPT
11T     .                                                     | 1.7T    .
11T     total                                                 | 1.7T    total

NOTE: It's hard to tell what is a file or folder with this output, but I'll tell you that sys.tgz and .keepalive are files and the rest are folders. The total line is not a folder, and the “.” dot folder is the current folder. Total and current folder are the same size because we are calculating the total size of everything in this folder.

One thing to realize is that all of the above folders are sitting on a 3 TB USB drive. Yet notice how the APPARENT SIZE is 11 TB, but the DISK USAGE is 1.7 TB.
It's the DISK USAGE that can't go above the available disk space.

With LS:

# shows APPARENT SIZE in middle column, in units of bytes (APPARENT SIZE will be in the middle) - NOTICE that default output is in BYTES
ls -l FILE
ls -l --block-size 1 FILE
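# shows APPARENT SIZE in middle column, in units of kilobytes (APPARENT SIZE will be in the middle)
# (newer GNU ls applies -k only to the -s column and to totals, so --block-size is the reliable way to get KiB here)
ls -l --block-size 1K FILE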
# shows APPARENT SIZE in middle column, in units of human readable (APPARENT SIZE will be in the middle)
ls -lh FILE

# shows DISK USAGE on the left most size, in units of Kilobytes (Where 1 KiB is 1024 bytes) - NOTICE that default output is in KILOBYTES (1 KiB = 1024 Bytes)
ls -s FILE
ls -sk FILE
# DISK USAGE is in human readable format
ls -sh FILE

# to see both in kilobytes (DISK USAGE on the left, APPARENT SIZE in the middle)
# (with newer GNU ls, -lsk would leave APPARENT SIZE in bytes, so use --block-size)
ls -ls --block-size 1K FILE
# to see both in bytes (DISK USAGE on the left, APPARENT SIZE in the middle)
ls -ls --block-size 1 FILE
# to see both in Human readable (DISK USAGE on the left, APPARENT SIZE in the middle)
ls -lsh FILE

NOTE: you can run the above without FILE and it will list all of the files in the current folder

# just remember these:
ls -lisah
ls -lisa --block-size 1K
ls -lisa --block-size 1

Examples:

# On a SPARSE FILE
# ls -ls --block-size 1 FILE1
192814280704 -rw-r--r--+ 1 root root 1099511628288 Dec 13 18:03 FILE1
# ls -lsh FILE1
180G -rw-r--r--+ 1 root root 1.1T Dec 13 18:03 FILE1

192814280704 bytes and 180G on the left are the DISK USAGE
1099511628288 bytes and 1.1T in the middle is the APPARENT SIZE
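If you want to pull those two columns out in a script, here is a rough sketch assuming GNU ls and awk. Parsing ls output is fragile in general; the -n flag (numeric owner and group) is added only to keep the column positions predictable, and the variable names are made up.

```shell
# Pick DISK USAGE (column 1) and APPARENT SIZE (column 6) out of "ls -ls" output.
f=$(mktemp)
head -c 3003 /dev/urandom > "$f"

# Columns with -lsn: blocks, permissions, links, uid, gid, size, date..., name
ls_line=$(ls -lsn --block-size 1 "$f")
ls_disk=$(echo "$ls_line" | awk '{print $1}')
ls_apparent=$(echo "$ls_line" | awk '{print $6}')

echo "disk usage (left):      $ls_disk bytes"
echo "apparent size (middle): $ls_apparent bytes"

rm -f "$f"
```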

Commit to memory:

### DU ###
# Apparent Size (human readable)
du -cd1 --apparent-size -h
# Disk Usage (human readable)
du -cd1 -h
# Both (folders only, 1 depth, human readable)
diff -y <(du -cd1 --apparent-size -h) <(du -cd1 -h)
# Both (folders and files, 1 depth, human readable)
diff -y <(du -cad1 --apparent-size -h) <(du -cad1 -h)

### LS ###
# (DISK USAGE on the left, APPARENT SIZE in the middle)
# units are human readable
ls -lisah
# units in kilobytes (where 1 KiB is 1024 bytes)
ls -lisa --block-size 1K
# units in bytes
ls -lisa --block-size 1

TIP

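Sometimes DU takes forever to run. You can have it run under nohup like so (bash -c is needed because <( ) is a bash feature; du-sizes.txt is just an example output file name):

# HUMAN
nohup bash -c 'diff -y <(du -chd1 --apparent-size) <(du -chd1) | sort -hk1' > du-sizes.txt 2>&1 &
# KILOBYTES
nohup bash -c 'diff -y <(du -cd1 --apparent-size) <(du -cd1) | sort -hk1' > du-sizes.txt 2>&1 &
# BYTES
nohup bash -c 'diff -y <(du -cd1 --apparent-size -B1) <(du -cd1 -B1) | sort -hk1' > du-sizes.txt 2>&1 &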

NOTE ABOUT DEFAULT OUTPUT OF DU and LS:

When not specifying the unit to be shown (so when you're not using -k, --block-size, or -h), the default unit displayed is Kilobytes. The only exception is “ls -l”, where APPARENT SIZE is displayed in Bytes.

ls -l: APPARENT SIZE in units of Bytes
ls -s: DISK USAGE in units of Kilobytes (Where 1 KiB is 1024 bytes)
du: DISK USAGE in units of Kilobytes (Where 1 KiB is 1024 bytes)
du --apparent-size: APPARENT SIZE in units of Kilobytes (Where 1 KiB is 1024 bytes)
