how to impute the distance to a value












10














I'd like to fill missing values with a "row distance" to the nearest non-NA value. In other words, how would I convert column x in this sample dataframe into column y?



#    x y
#1 0 0
#2 NA 1
#3 0 0
#4 NA 1
#5 NA 2
#6 NA 1
#7 0 0
#8 NA 1
#9 NA 2
#10 NA 3
#11 NA 2
#12 NA 1
#13 0 0


I can't seem to find the right combination of dplyr group_by and mutate row_number() statements to do the trick. The various imputation packages that I've investigated are designed for more complicated scenarios where imputation is performed using statistics and other variables.



d <- data.frame(x = c(0, NA, 0, rep(NA, 3), 0, rep(NA, 5), 0), y = c(0, 1, 0, 1, 2, 1, 0, 1, 2, 3, 2, 1, 0))









Tags: r, imputation

asked Dec 21 '18 at 18:07 by Dan Strobridge; edited Dec 21 '18 at 18:52 by markus
























          3 Answers


















          5














          We can use



          d$z = sapply(seq_along(d$x), function(z) min(abs(z - which(!is.na(d$x)))))
          # x y z
          # 1 0 0 0
          # 2 NA 1 1
          # 3 0 0 0
          # 4 NA 1 1
          # 5 NA 2 2
          # 6 NA 1 1
          # 7 0 0 0
          # 8 NA 1 1
          # 9 NA 2 2
          # 10 NA 3 3
          # 11 NA 2 2
          # 12 NA 1 1
          # 13 0 0 0


          If you want to do this in dplyr, you can just wrap the sapply part in a mutate.



          library(dplyr)
          d %>%
            mutate(z = sapply(seq_along(x), function(z) min(abs(z - which(!is.na(x))))))


          Or, also using library(purrr) (thanks to @Onyambu):



          d %>% mutate(m=map_dbl(1:n(),~min(abs(.x-which(!is.na(x))))))
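To see why the sapply() call works, it may help to trace a single row by hand. The sketch below is not part of the original answer: it rebuilds the question's x column as a plain vector, and the helper name dist_to_value is made up here for illustration.

```r
# Sample column from the question, as a plain vector
x <- c(0, NA, 0, rep(NA, 3), 0, rep(NA, 5), 0)

# Row positions that hold a real (non-NA) value
known <- which(!is.na(x))
known
# [1]  1  3  7 13

# For row 10: the distance to every known row, then the smallest one
abs(10 - known)
# [1] 9 7 3 3
min(abs(10 - known))
# [1] 3

# Hypothetical wrapper that repeats the same step for every row
dist_to_value <- function(x) {
  known <- which(!is.na(x))
  vapply(seq_along(x), function(i) min(abs(i - known)), numeric(1))
}
dist_to_value(x)
# [1] 0 1 0 1 2 1 0 1 2 3 2 1 0
```

Each row is compared against every known row, so the work grows with the number of rows times the number of non-NA values. That is unlikely to matter for small data frames.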




























          • very useful. however, in my effort to keep my question short and simple, i forgot to mention that i'm a big tidyverse fan and would ideally like something i can use in my dplyr chain. i suspect that i can work with this solution in my chain but i sure wouldn't mind knowing if there's a "tidier" method.
            – Dan Strobridge
            Dec 21 '18 at 19:28






          • 1




            d%>%mutate(m = map_dbl(1:n(), ~min(abs(.x - which(!is.na(x))))))
            – Onyambu
            Dec 21 '18 at 19:35





















          3














          Here is a way using data.table



          library(data.table)
          setDT(d)
          d[, out := pmin(cumsum(is.na(x)), rev(cumsum(is.na(x)))), by = rleid(is.na(x))]
          d
          # x y out
          # 1: 0 0 0
          # 2: NA 1 1
          # 3: 0 0 0
          # 4: NA 1 1
          # 5: NA 2 2
          # 6: NA 1 1
          # 7: 0 0 0
          # 8: NA 1 1
          # 9: NA 2 2
          #10: NA 3 3
          #11: NA 2 2
          #12: NA 1 1
          #13: 0 0 0


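          For each run of consecutive NAs (identified by rleid(is.na(x))) we calculate the parallel minimum of cumsum(is.na(x)) and its reverse. Within a run, the cumulative sum counts the rows since the value before the run, and its reverse counts the rows until the value after it. Runs of non-NA values get 0 because is.na(x) is FALSE throughout. Call setDF(d) if you want to continue with a data.frame.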
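As a small illustration of that reasoning (not from the original answer), the sketch below takes the run of five NAs in rows 8 to 12 of the sample data on its own and shows the two counts that pmin() compares. The variable names are chosen here for illustration only.

```r
# is.na(x) for rows 8 to 12 of the sample data: one run of five NAs
run <- rep(TRUE, 5)

from_left  <- cumsum(run)     # rows since the value before the run
from_right <- rev(from_left)  # rows until the value after the run

from_left
# [1] 1 2 3 4 5
from_right
# [1] 5 4 3 2 1
pmin(from_left, from_right)
# [1] 1 2 3 2 1

# A run of non-NA values contributes only FALSE, so its cumsum stays 0
cumsum(FALSE)
# [1] 0
```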



          Instead of calculating cumsum(is.na(x)) twice, we could also do



          d[, out := {
            tmp <- cumsum(is.na(x))
            pmin(tmp, rev(tmp))
          }, by = rleid(is.na(x))]


          This might give a performance gain, but I didn't test.





          Using dplyr syntax this would read



          library(dplyr)
          d %>%
            group_by(grp = data.table::rleid(is.na(x))) %>%
            mutate(out = pmin(cumsum(is.na(x)), rev(cumsum(is.na(x))))) %>%
            ungroup()
          # A tibble: 13 x 4
          # x y grp out
          # <dbl> <dbl> <int> <int>
          # 1 0 0 1 0
          # 2 NA 1 2 1
          # 3 0 0 3 0
          # 4 NA 1 4 1
          # 5 NA 2 4 2
          # 6 NA 1 4 1
          # 7 0 0 5 0
          # 8 NA 1 6 1
          # 9 NA 2 6 2
          #10 NA 3 6 3
          #11 NA 2 6 2
          #12 NA 1 6 1
          #13 0 0 7 0




          The same idea in base R



          rle_x <- rle(is.na(d$x))
          grp <- rep(seq_along(rle_x$lengths), times = rle_x$lengths)

          transform(d, out = ave(is.na(x), grp, FUN = function(i) pmin(cumsum(i), rev(cumsum(i)))))
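One case the sample data does not cover is a run of NAs at the very start or end of the column, where there is a value on only one side. The sketch below is a check under stated assumptions, not part of the original answer: it wraps the base R version above in a made-up helper run_dist() and compares it with a made-up index-based helper index_dist() on a small vector that starts with two NAs.

```r
# Hypothetical wrapper around the run-based base R approach
run_dist <- function(x) {
  na <- is.na(x)
  r <- rle(na)
  grp <- rep(seq_along(r$lengths), times = r$lengths)
  ave(as.integer(na), grp, FUN = function(i) pmin(cumsum(i), rev(cumsum(i))))
}

# Hypothetical index-based helper: distance to the nearest non-NA row
index_dist <- function(x) {
  known <- which(!is.na(x))
  vapply(seq_along(x), function(i) min(abs(i - known)), numeric(1))
}

x <- c(NA, NA, 0, NA)  # leading run has no value on its left
run_dist(x)
# [1] 1 1 0 1
index_dist(x)
# [1] 2 1 0 1
```

In this sketch the run-based count treats the edge of the column as if a value sat just outside it, so row 1 gets 1 rather than 2. If the column always starts and ends with a value, as in the question, the two agree.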
























          • 1




            That's a pretty nice solution
            – Tjebo
            Dec 21 '18 at 18:37



















          1














          Here is a solution using vapply



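          d$y <- 0
          na_rows <- which(is.na(d$x))      # rows to fill
          value_rows <- which(!is.na(d$x))  # rows holding a value, including row 1
          # for each NA row, the smallest row distance to any row holding a value
          d$y[na_rows] <- vapply(na_rows,
                                 function(k) min(abs(value_rows - k)),
                                 numeric(1))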
          d
          x y
          1 0 0
          2 NA 1
          3 0 0
          4 NA 1
          5 NA 2
          6 NA 1
          7 0 0
          8 NA 1
          9 NA 2
          10 NA 3
          11 NA 2
          12 NA 1
          13 0 0


          with the data defined as



          d <- structure(list(x = c(0, NA, 0, NA, NA, NA, 0, NA, NA, NA, NA, NA, 0)), 
          class = "data.frame", row.names = c(NA, -13L))


























            Answer 1 (sapply): answered Dec 21 '18 at 18:59 by dww; edited Dec 21 '18 at 19:40




















            Answer 2 (data.table): answered Dec 21 '18 at 18:30 by markus; edited Dec 22 '18 at 11:52


















            Answer 3 (vapply): answered Dec 21 '18 at 19:09 by nate.edwinton





























