Filtering with awk between a range
I have code that compares genes to a large list of SNPs on the same chromosome. I only want to compare genes and SNPs that are within +/- 1000000 bases of each other, but when I try to filter with awk it's not working.
The file I'm extracting from looks like this:
CHR# SNP_ID POS samp_1 samp_2 ...
chr1 rs1212 174654646 0 2 ...
chr1 rs1331 321311111 1 1 ...
... ... ... ... ... ...
My filtering process looks like this:
upper_bound=$(expr $gene_stop + 1000000)
lower_bound=$(expr $gene_start - 1000000)
zcat chr1.genotypes.txt.gz | tail -n +2 | awk '{if ($3 >= $lower_bound && $3 <= $upper_bound) print $0}' > tmp_filtered
It is currently outputting empty files. When I change the awk condition to only ($3 >= $lower_bound), nothing is printed, and when I change it to ($3 <= $upper_bound), it prints every line without filtering anything. I've checked that the lower and upper bound variables are reasonable. First, by manually checking the positions of my SNPs, I can see that some of them lie between the two thresholds. Second, printing the length of each variable with ${#foo} gives the correct length, so I assume there are no hidden characters making it act as a string.
Can anyone advise me?
TL;DR: I'm trying to grab items with a position within a given range, and awk is not working as I expect.
bash awk
The shell variables are not accessible inside awk. Try something like zcat your_file | awk -v start="$start" -v stop="$stop" '$3 >= start - 1000000 && $3 <= stop + 1000000'. Notice that you don't need an explicit if and print in awk; simply use your condition as the "pattern".
– mosvy
Dec 1 at 0:08
Please always post the input file and the expected output.
– Praveen Kumar BS
Dec 1 at 6:29
edited Dec 1 at 0:45
Rui F Ribeiro
asked Nov 30 at 23:04
Ryan Schubert
1 Answer
accepted
Your awk program is inside single quotes, and the shell does not expand variables inside single quotes:
$ start=100
$ echo '$start'
$start
The same happens with the awk program:
$ start=100
$ echo awk '$3>=$start'
awk $3>=$start
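This also seems to account for the symptoms described in the question. Inside the awk program, lower_bound is an awk variable that was never set, so $lower_bound most likely evaluates as $0, the whole line, and $3 is then compared with that line as a string. A small sketch with a made-up row (the row and the names same, ge and le are assumptions for illustration only):

```shell
# Hypothetical sample row in the layout shown in the question.
row='chr1 rs1212 174654646 0 2'

# lower_bound is an unset awk variable, so $lower_bound refers to $0.
same=$(printf '%s\n' "$row" |
    awk '{ print (($lower_bound == $0) ? "same" : "different") }')

# $3 >= $0 is a string comparison here: "1..." sorts before "chr1 ...".
ge=$(printf '%s\n' "$row" | awk '$3 >= $lower_bound')

# For the same reason, $3 <= $0 is true for this row.
le=$(printf '%s\n' "$row" | awk '$3 <= $upper_bound')

printf 'same=[%s]\nge=[%s]\nle=[%s]\n' "$same" "$ge" "$le"
```

With this row, ge comes out empty and le holds the full line, which matches "nothing is printed" and "prints but doesn't filter anything".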
The usual solution is to pass the values to awk with -v:
awk -v var1="$lower" -v var2="$upper" '{if ($3 >= var1 && $3 <= var2) print $0}'
So, your script should work with:
# Compute the window in the shell, then hand both bounds to awk with -v.
up_b=$(expr "$gene_stop" + 1000000)
lo_b=$(expr "$gene_start" - 1000000)
zcat chr1.genotypes.txt.gz | tail -n +2 |
awk -v lo="$lo_b" -v up="$up_b" '{if ($3 >= lo && $3 <= up) print $0}' > tmp_filtered
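As the comment under the question points out, the if and print are optional. Below is a minimal sketch of the same filter written as a bare awk pattern, with the header skipped by NR > 1 instead of tail. The function name filter_window, the gene coordinates and the sample rows are hypothetical and only there to make the example self-contained:

```shell
# filter_window START STOP
# Reads the genotype table on stdin and prints the rows whose POS
# (column 3) lies within 1,000,000 bases of the gene.
filter_window() {
    awk -v lo="$(( $1 - 1000000 ))" -v up="$(( $2 + 1000000 ))" \
        'NR > 1 && $3 >= lo && $3 <= up'
}

# Hypothetical gene coordinates, for illustration only.
gene_start=175000000
gene_stop=175200000

# Feed a header plus the two sample rows from the question.
result=$(printf '%s\n' \
    'CHR# SNP_ID POS samp_1 samp_2' \
    'chr1 rs1212 174654646 0 2' \
    'chr1 rs1331 321311111 1 1' |
    filter_window "$gene_start" "$gene_stop")

echo "$result"
```

With the real data this would be used as zcat chr1.genotypes.txt.gz | filter_window "$gene_start" "$gene_stop" > tmp_filtered. Values passed with -v that look like numbers are compared numerically against $3.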
answered Dec 1 at 6:18
Isaac