System lags when doing large R/W operations on external disks

I am having some issues with system-wide latency/lag when doing large disk imaging operations on an Ubuntu 18.04 system. Here are the system specs:

• Processor: Intel Core i7 (never near capacity on any core)
• Memory: 12 GB (never near capacity)
• System disk: SSD (never near capacity)
• External disks: USB 3.0, 5400 and 7200 RPM spinning disks

These large disk imaging operations are basically:

nice ionice dd if=/dev/usbdisk1 of=/dev/usbdisk2

Since none of my system files are on any of the USB disks, in theory this shouldn't introduce much latency. But I find that when I'm imaging more than one USB disk, the system just comes to a crawl. Why? My understanding is that each disk has its own IO queue, so what's going on here? How can I remedy it?

Also, FWIW, I don't care at all about the imaging speed of the USB disks, so solutions which slow these operations in favor of the system running smoothly are fine by me.

Tags: performance, dd, usb-drive, io

asked Oct 9 '18 at 3:54 by Mr. T (edited Oct 9 '18 at 11:57 by fduff)

1 Answer

How can I remedy it?

When you write a disk image, just use dd with oflag=direct. The O_DIRECT writes will avoid writing the data through the page cache. Note that oflag=direct will require a larger block size in order to get good performance. Here is an example:

dd if=/dev/usbdisk1 of=/dev/usbdisk2 oflag=direct bs=32M status=progress

NOTE: Sometimes you might want to pipe a disk image from another program, such as gunzip. In this case, good performance also depends on iflag=fullblock and piping through another dd command. There is a full example in the answer here: Why does a gunzip to dd pipeline slow down at the end?
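
As a minimal sketch of that pattern, assuming a gzip-compressed image file (the name disk-image.gz is purely illustrative): iflag=fullblock makes dd keep reading from the pipe until it has a whole block, so the O_DIRECT writes stay large.

# hypothetical compressed image; gunzip feeds dd, which collects full 32M blocks
# from the pipe before writing each of them with O_DIRECT
gunzip -c disk-image.gz | dd of=/dev/usbdisk2 iflag=fullblock oflag=direct bs=32M status=progress
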
(An alternative solution is to use oflag=sync instead of oflag=direct. This works by not building up a lot of unwritten cache pages.)
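
That alternative would look something like this; only the output flag changes compared to the example above.

# oflag=sync makes each output block reach the device before the next write,
# so dirty pages never pile up in RAM
dd if=/dev/usbdisk1 of=/dev/usbdisk2 oflag=sync bs=32M status=progress
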
My understanding is that each disk has its own IO queue, so what's going on here?

They do. However, the written data is first stored in the system page cache (in RAM), before queuing IO...
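
You can see this happening while a copy runs: the Dirty and Writeback counters in /proc/meminfo show how much written data is still sitting in RAM waiting to reach the devices. One way to watch them:

# show how much written data is cached in RAM but not yet on disk, refreshed every second
watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'
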
EDIT:

Since this answer was accepted, I assume you re-tested with oflag=direct, and that it fixes your problem where "the system just comes to a crawl". Great.

The safest option would be to add iflag=direct as well. Without this option, dd is still reading data through the system page cache. I assume you did not quietly add this option without mentioning it, which is one hint towards your specific problem.
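
In other words, a fully "direct" version of the imaging command would be something like:

# read and write with O_DIRECT, so neither side of the copy goes through the page cache
dd if=/dev/usbdisk1 of=/dev/usbdisk2 iflag=direct oflag=direct bs=32M status=progress
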
It should be clear that reading too much data through the page cache could affect system performance. The total amount of data you are pushing through the page cache is several times larger than your system RAM :-). Depending on the pattern of reads, the kernel could decide to start dropping (or swapping) other cached data to make space.

The kernel does not have infallible foresight. If you need to use the data that was dropped from the cache, it will have to be re-loaded from your disk/SSD. The evidence seems to tell us this is not your problem.

Dirty page cache limits

More likely, your problem has to do with writing data through the page cache. The unwritten cache, aka the "dirty" page cache, is limited. For example, you can imagine the overall dirty page cache is limited to 20% of RAM. (This is a convenient lie to imagine. The truth is messily written here.)

If your dd command(s) manage to fill the maximum dirty page cache, they will be forced to "block" (wait) until some of the data has been written out.

But at the same time, any other program that wants to write will also be blocked (unless it uses O_DIRECT). This can stall a lot of your desktop programs, e.g. when they try to write log files, even though they are writing to a different device.

The overall dirty limit is named dirty_ratio or dirty_bytes, but the full story is much more complicated. There is supposed to be some level of arbitration between the dirty cache for different devices: earlier thresholds kick in and try to limit the proportion of the maximum dirty cache used by any one device. It is hard to understand exactly how well it all works, though.
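
If you want to see what your system is currently using, these knobs can simply be read with sysctl:

# current writeback thresholds; the *_ratio values are percentages of available memory,
# and the *_bytes values override the ratios whenever they are non-zero
sysctl vm.dirty_background_ratio vm.dirty_ratio vm.dirty_background_bytes vm.dirty_bytes
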
I think you mention you have a problem when imaging "more than one USB disk". It might be that the per-device thresholds work well when you are writing to one of your disks, but break down once you are writing to more than one at the same time.

Related:

• Some users have observed their whole system lags when they write to slow USB sticks, and found that lowering the overall dirty limit helped avoid the lag (one way to try this is sketched just below). I do not know a good explanation for this.

• Why were "USB-stick stall" problems reported in 2013? Why wasn't this problem solved by the existing "No-I/O dirty throttling" code?

• Is "writeback throttling" a solution to the "USB-stick stall problem"?
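
If you want to experiment with lowering the overall dirty limit, it can be changed at runtime with sysctl; the numbers below are only an illustration, not a recommendation.

# cap unwritten cache at roughly 64 MiB overall, start background writeback at 16 MiB;
# setting the *_bytes knobs automatically disables the corresponding *_ratio knobs
sudo sysctl -w vm.dirty_bytes=67108864 vm.dirty_background_bytes=16777216
# to make the change permanent, the same settings could go in /etc/sysctl.conf or /etc/sysctl.d/
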
answered Oct 9 '18 at 8:23 by sourcejedi (edited Jan 4 at 13:22)

• This is an excellent answer, thank you! – Mr. T, Oct 9 '18 at 9:19

• @Mr.T I'm afraid it is a bit messier now. I edited it because I don't trust the 2013 LWN article any more. – sourcejedi, Jan 4 at 13:23

• Thanks for updating this answer. Indeed your original one fixed my issues, but I'm now using direct reads and writes on all the imaging operations and the system is no longer having any performance issues. – Mr. T, Jan 7 at 7:00