Cache behavior in dd command
I am performing some dd writes and running vmstat in parallel:
With no direct write (i.e. going through the page cache):
$ dd if=/dev/urandom of=somefile.txt bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1,31332 s, 160 MB/s
$ vmstat 1 1000
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 64 2659604 383244 3596884 0 0 3 47 34 14 27 7 66 0 0
1 0 64 2509432 383244 3746080 0 0 0 0 1005 2278 5 20 75 0 0
0 0 64 2452560 383248 3807932 0 0 4 204880 1175 2321 4 12 75 9 0
0 0 64 2453144 383248 3807548 0 0 0 0 814 2677 5 2 93 0 0
1 0 64 2444868 383248 3814516 0 0 0 244 529 1746 4 2 94 0 0
0 0 64 2445756 383248 3814516 0 0 0 0 495 1957 3 1 96 0 0
I see that it performed more or less one bulk write.
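That single bulk flush is the page cache being written back by the kernel rather than by dd itself. As a rough way to watch this (a minimal sketch; the Dirty/Writeback fields of /proc/meminfo are standard on Linux, but watch may need to be installed), one can sample the dirty-page counters in a second terminal while the cached dd run is in progress:
$ watch -n 0.5 "grep -E '^(Dirty|Writeback):' /proc/meminfo"
During the cached run the Dirty value should grow by roughly 200 MB and then drop when writeback kicks in; with oflag=direct it should stay close to its baseline.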
With direct write:
$ dd if=/dev/urandom of=somefile.txt bs=1M count=200 oflag=direct
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1,6902 s, 124 MB/s
$ vmstat 1 1000
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 64 2623556 383248 3603572 0 0 3 47 35 14 27 7 66 0 0
1 0 64 2613784 383248 3611220 0 0 0 88064 1001 2573 5 15 79 1 0
0 0 64 2612236 383256 3611804 0 0 8 116736 912 2033 1 18 78 3 0
4 0 64 2621076 383256 3604232 0 0 0 96 1086 3250 8 3 89 0 0
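For reference, one way to see how dd itself submits the data in the direct case is to count its write() calls; with bs=1M it should issue roughly one write per 1 MiB block, and vmstat's bo column then just reports whatever completed within each 1-second sample. A sketch (assumes strace is installed; syscall names may vary slightly by platform):
$ strace -c -e trace=write dd if=/dev/urandom of=somefile.txt bs=1M count=200 oflag=direct
The summary should show on the order of 200 write calls of 1 MiB each, plus a few small writes for dd's own status messages.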
My questions are the following:
How does this fragmentation of the direct write operation occur?
What (kernel?) parameter determines that the data, when not passing through the cache, is written out in 2 chunks of roughly 88 and 116 MB (and not, say, in 4 chunks of 50 MB)?
Would the results be different if I had used oflag=sync instead of oflag=direct?
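For reference, flushing of cached data is governed by the vm.dirty_* sysctls, and dd exposes a few different flush-related options; a sketch of what one might inspect and compare (the sysctl names and dd flags are the standard Linux/GNU coreutils ones, but the values and resulting timings are system-specific):
$ sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs vm.dirty_writeback_centisecs
# write through the page cache, flushed later by the kernel's writeback threads
$ dd if=/dev/urandom of=somefile.txt bs=1M count=200
# O_DIRECT: bypass the page cache entirely
$ dd if=/dev/urandom of=somefile.txt bs=1M count=200 oflag=direct
# O_SYNC: still uses the cache, but each write() returns only once the data has reached the device
$ dd if=/dev/urandom of=somefile.txt bs=1M count=200 oflag=sync
# cached write with a single fsync() at the end
$ dd if=/dev/urandom of=somefile.txt bs=1M count=200 conv=fsync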
Tags: memory, dd, io, cache
You are using vmstat to get the stats output once every second (for 1000 seconds), so why do you give significance to the fact that in one run of dd all data was written more or less during one such interval, while in the other run the interval ended partway through the operation? – Kusalananda, Dec 15 at 8:42
Because after multiple experiments this behavior seems consistent: after repeating it with a 1G output, using the direct flag seems to break it into more or less 10 writes of 100 MB each, while not using it caused 3-4 writes (albeit uneven in size); what is more, not using the direct flag causes the writes to appear immediately (in terms of the bo column showing up right away in the vmstat output). – pkaramol, Dec 15 at 8:47
If you are really interested in why the kernel does things this way, I'd (1) read kernel code, (2) use kernel tracing, e.g. perf or ftrace. I wouldn't expect the high-level behaviour you observe to have a simple explanation or a single parameter (but I don't know). – dirkt, Dec 15 at 9:28
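Following up on the tracing suggestion: a minimal sketch of how one could watch the actual block requests reaching the device, either with perf's block tracepoints or with blktrace (assumes perf and blktrace are installed, and /dev/sdX is a placeholder for the device backing the filesystem):
$ sudo perf record -e 'block:block_rq_issue' -a -- dd if=/dev/urandom of=somefile.txt bs=1M count=200 oflag=direct
$ sudo perf script | head
# or, capture and decode the block-layer trace directly
$ sudo blktrace -d /dev/sdX -o - | blkparse -i -
The request sizes and timestamps in the trace show how the 1 MiB writes are split or merged on their way to the device, which is more precise than the 1-second granularity of vmstat.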