Filtering with awk between a range

I have this code that compares genes to a large list of SNPs on the same chromosome. I only want to compare genes and SNPs that are within +/- 1000000 bases of each other, but when I try to filter with awk it's not working.



The file I'm extracting from looks like this:



CHR# SNP_ID    POS     samp_1 samp_2 ...
chr1 rs1212 174654646 0 2 ...
chr1 rs1331 321311111 1 1 ...
... ... ... ... ... ...


My filtering process looks like this:



upper_bound=$(expr $gene_stop + 1000000)
lower_bound=$(expr $gene_start - 1000000)
zcat chr1.genotypes.txt.gz | tail -n +2 | awk '{if ($3 >= $lower_bound && $3 <= $upper_bound) print $0}' > tmp_filtered


It is currently outputting empty files. When I change the awk conditional to only ($3 >= $lower_bound), nothing is printed, and when I change it to ($3 <= $upper), it prints but doesn't filter anything. I've checked that the lower and upper bound variables are reasonable. First, manually checking the positions of my SNPs, I see that some of them lie between the two thresholds. Second, printing the length of each variable with ${#foo} gives the correct length, so I assume there are no hidden characters making it act as a string.



Can anyone advise me?



TL;DR: I'm trying to grab items with a position within a given range, and awk is not working as I expect.

  • Shell variables are not accessible inside awk. Try something like zcat your_file | awk -v start="$start" -v stop="$stop" '$3 >= start - 1000000 && $3 <= stop + 1000000'. Notice that you don't need an explicit if and print in awk; simply use your condition as the "pattern".
    – mosvy
    Dec 1 at 0:08

  • Please always post the input file and the expected output.
    – Praveen Kumar BS
    Dec 1 at 6:29

bash awk

edited Dec 1 at 0:45
Rui F Ribeiro

asked Nov 30 at 23:04
Ryan Schubert

1 Answer

accepted

Your shell variables are inside single quotes, and the shell does not expand variables inside single quotes.



$ start=100
$ echo '$start'
$start


The same happens to a single-quoted awk program:



$ start=100
$ echo awk '$3>=$start'
awk $3>=$start
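
This would also explain the symptoms described in the question. Inside the single-quoted program, awk treats lower_bound and upper_bound as its own, unset variables, so $lower_bound is evaluated as $0, the whole input line. A numeric-looking field compared against a non-numeric line falls back to string comparison. The sketch below reproduces that on two sample rows; the LC_ALL=C setting and the counting idiom are additions for illustration, not part of the original script.

```shell
# Two sample rows in the question's layout (header already removed).
sample='chr1 rs1212 174654646 0 2
chr1 rs1331 321311111 1 1'

# lower_bound is never set inside awk, so $lower_bound means $0 (the whole line).
# "174654646" >= "chr1 rs1212 ..." is a string comparison, and "1" sorts before "c".
ge_count=$(printf '%s\n' "$sample" |
    LC_ALL=C awk '$3 >= $lower_bound {n++} END {print n+0}')

# The reverse test is true for every row, so nothing is filtered out.
le_count=$(printf '%s\n' "$sample" |
    LC_ALL=C awk '$3 <= $upper_bound {n++} END {print n+0}')

echo "rows passing >= : $ge_count"
echo "rows passing <= : $le_count"
```

With these rows, the first count should be zero and the second should equal the number of input rows, matching "nothing is printed" and "prints but doesn't filter anything".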


The usual solution is to set the values with -v:



awk -v var1="$lower" -v var2="$upper" '{if ($3 >= var1 && $3 <= var2) print $0}'


So, your script should work with:



up_b=$(expr "$gene_stop" + 1000000)
lo_b=$(expr "$gene_start" - 1000000)
zcat chr1.genotypes.txt.gz | tail -n +2 |
    awk -v lo="$lo_b" -v up="$up_b" '{if ($3 >= lo && $3 <= up) print $0}' > tmp_filtered
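
As a minimal sketch of the same fix written a little more defensively: the -v values are quoted, $(( )) replaces expr, and the condition is used directly as an awk pattern, as suggested in the comments. The gene coordinates and the inline sample rows are made-up stand-ins for the real inputs, and NR > 1 skips the header in place of tail -n +2.

```shell
# Hypothetical gene coordinates; in the real script these come from elsewhere.
gene_start=175000000
gene_stop=175500000

# POSIX shell arithmetic instead of external expr calls.
lo_b=$((gene_start - 1000000))
up_b=$((gene_stop + 1000000))

# The printf lines stand in for: zcat chr1.genotypes.txt.gz
filtered=$(printf '%s\n' \
    'CHR# SNP_ID POS samp_1 samp_2' \
    'chr1 rs1212 174654646 0 2' \
    'chr1 rs1331 321311111 1 1' |
    awk -v lo="$lo_b" -v up="$up_b" 'NR > 1 && $3 >= lo && $3 <= up')

echo "$filtered"
```

Because lo and up are passed with -v, awk receives them as numeric-looking values and compares them with $3 numerically rather than as strings.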






answered Dec 1 at 6:18
Isaac