Can I use a variable storing a regular expression wherever a regular expression is expected?
In Awk, when I store a regular expression in a variable, can I use the variable wherever a regular expression is expected?
The AWK Programming Language by Aho, Kernighan and Weinberger says:
"Note that the string-matching pattern /Asia/ is a shorthand for $0 ~ /Asia/"
I have a text file:
$ cat f1
line 1; li
ne
2
line 3
lin
e 4
Why do the following two commands work
$ awk -v pat='in' '{if (match($0, pat)) print $0; } ' f1
line 1; li
line 3
lin
$ awk -v pat='in' ' $0 ~ pat {print $0} ' f1
line 1; li
line 3
lin
while the following doesn't
$ awk -v pat='in' ' pat {print $0} ' f1
line 1; li
ne
2
line 3
lin
e 4
?
Thanks.
awk
asked Nov 15 at 20:36 by Tim (edited Nov 15 at 20:43)
You can't replace the syntax /pattern/ with a variable pat. – Kusalananda, Nov 15 at 20:40
Is there a rule governing that? – Tim, Nov 15 at 20:43
@Tim The grammar for the language disallows it. What you have in your non-working example is an expression that evaluates to true (it's a non-empty string), therefore all lines are printed. – Kusalananda, Nov 15 at 20:48
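To make that concrete, here is a minimal sketch, assuming a POSIX shell with mktemp and a POSIX-conforming awk. It recreates f1 from the question in a temporary file; the shell variable names (matched, pat_only, empty_pat) are made up for illustration. With $0 ~ pat the string in pat is used as a dynamic regexp, while pat on its own is just a string expression, and a non-empty string is true.

```shell
# Recreate the question's f1 in a temporary file.
f1=$(mktemp)
printf '%s\n' 'line 1; li' 'ne' '2' 'line 3' 'lin' 'e 4' > "$f1"

# $0 ~ pat: the string in pat is used as a (dynamic) regexp.
matched=$(awk -v pat='in' '$0 ~ pat' "$f1")

# pat alone: just a string expression; "in" is non-empty, hence true
# for every record, so every line is printed.
pat_only=$(awk -v pat='in' 'pat' "$f1")

# An empty string is false, so this prints nothing.
empty_pat=$(awk -v pat='' 'pat' "$f1")

rm -f "$f1"
printf '%s\n' "$matched"
```

The last case shows the flip side of the same rule: with an empty pat, the pattern is false for every record.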
1 Answer
Accepted answer:
Only /foo/ on its own is short for $0 ~ /foo/.
In ... ~ /.../, match(..., /.../) and the like, /.../ is only a quoting operator for regexps, while in other contexts it's an operator that resolves to a number (0 or 1).
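A small sketch of both roles, assuming a POSIX shell and awk; the sample records and variable names are illustrative only. A bare /in/ evaluates to 1 or 0 against the current record, whereas as an argument of match() or on the right of ~ it is just the regexp being applied to whatever string is on the left.

```shell
# Outside ~ and match(), /in/ is an expression worth 1 or 0 for the current $0.
as_expr=$(printf '%s\n' 'line 3' 'ne' | awk '{ print /in/ }')

# As the second argument of match(), /in/ is just the regexp: match()
# returns the position where it matches (RSTART), or 0.
pos=$(printf '%s\n' 'line 3' 'ne' | awk '{ print match($0, /in/) }')

# On the right of ~, /foo/ is also just the regexp. $0 is "foo", so a bare
# /foo/ would be 1, but "1" ~ /foo/ tests the string "1" against foo: 0.
tilde=$(echo foo | awk '{ print ("1" ~ /foo/) }')

printf '%s\n' "$as_expr" "$pos" "$tilde"
```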
That double meaning can be a bit confusing. There are a lot of such double meanings and ambiguities in awk.
/foo/ expands to 1 or 0 depending on whether $0 matches the foo regexp, but "1" ~ /foo/ is not "1" ~ "1" when $0 happens to match foo: here, /foo/ is no longer short for ($0 ~ /foo/). In the case of "1" ~ (/foo/) or "1" ~ +/foo/, though, you'll see that the behaviour varies between implementations.
var, on the other hand, is only var: it is not short for $0 ~ var.
As a condition, var is true if it is numeric (or a numeric string) and resolves to a number other than zero, or if it is a string and resolves to a non-empty string.
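A sketch of those truth rules, assuming a POSIX awk; x is just an illustrative variable. The values are assigned inside BEGIN, so "0" there is a string constant, not a numeric string.

```shell
truth=$(awk 'BEGIN {
  x = 0;    print (x ? "true" : "false")   # number zero
  x = 3;    print (x ? "true" : "false")   # non-zero number
  x = "";   print (x ? "true" : "false")   # empty string
  x = "0";  print (x ? "true" : "false")   # string constant, non-empty
  x = "in"; print (x ? "true" : "false")   # non-empty, non-numeric string
}')
printf '%s\n' "$truth"
```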
Variables assigned with -v var=value are among those that are considered numeric strings if they look like numbers, and strings otherwise.
awk -v var=in 'var {print "x"}'
prints x for every record, because in doesn't look like a number and is not the empty string.
awk -v var=0 'var {print "x"}'
would not print x, while:
awk 'BEGIN{var = "0"}; var {print "x"}'
would print x for every record, as var was explicitly assigned a string constant. So even though its value looks like a number, it's not considered as such.
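The three commands above can be run side by side on a two-record input. This is a sketch assuming a POSIX shell and a POSIX-conforming awk (some implementations have been known to deviate on numeric strings); the names with_in, with_zero and with_string are hypothetical.

```shell
# Two input records, so "x for every record" means two lines of x.
with_in=$(printf 'a\nb\n' | awk -v var=in 'var {print "x"}')        # string "in": true
with_zero=$(printf 'a\nb\n' | awk -v var=0 'var {print "x"}')       # numeric string 0: false
with_string=$(printf 'a\nb\n' | awk 'BEGIN{var = "0"}; var {print "x"}')  # string constant: true
printf '[%s] [%s] [%s]\n' "$with_in" "$with_zero" "$with_string"
```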
That's another one of those double meanings: a variable may be considered numeric or a string depending on context. See also >, which, depending on context, is taken as a comparison operator or an output redirection operator (which again leads to several ambiguous situations where the behaviour varies between implementations).
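A minimal sketch of the > ambiguity in its simplest, portable form, assuming a POSIX shell with mktemp and a POSIX awk (the directory and variable names are illustrative). Inside print, an unparenthesised > is output redirection; in parentheses it is a comparison.

```shell
dir=$(mktemp -d)
# Unparenthesised, > after print is output redirection: "1" is written to
# a file named "2" (inside $dir) and nothing reaches stdout.
redirected=$(cd "$dir" && awk 'BEGIN { print 1 > 2 }')
written=$(cat "$dir/2")
# Parenthesised, > is the comparison operator: 1 > 2 is false, so 0 is printed.
compared=$(awk 'BEGIN { print (1 > 2) }')
rm -rf "$dir"
printf 'stdout: [%s]  file 2: [%s]  comparison: [%s]\n' "$redirected" "$written" "$compared"
```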
Note that you can also do things like:
awk '{print /foo/ + /bar/}'
which is the same as:
awk '{print ($0 ~ /foo/) + ($0 ~ /bar/)}'
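A quick check that the two forms agree, assuming a POSIX shell and awk; the three sample records are made up (both words, only foo, neither).

```shell
sum=$(printf '%s\n' foobar foo baz | awk '{ print /foo/ + /bar/ }')
explicit=$(printf '%s\n' foobar foo baz | awk '{ print ($0 ~ /foo/) + ($0 ~ /bar/) }')
printf '%s\n' "$sum"
```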
But if you use concatenation instead of +:
awk '{print /foo/ /bar/}'
that doesn't work, as there's again an ambiguity between the /RE/ operator and the / division operator. When in doubt, use parentheses:
awk '{print (/foo/) (/bar/)}'
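With the parentheses, each /.../ is unambiguously an operand, so the two 0/1 results are simply concatenated. A sketch assuming a POSIX shell and awk, on the same illustrative records as before:

```shell
both=$(printf '%s\n' foobar foo baz | awk '{ print (/foo/) (/bar/) }')
printf '%s\n' "$both"
```

The unparenthesised form is deliberately not run here, since how it fails depends on the implementation.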
By the way, you should avoid using -v to store regexps or anything else that may contain backslashes, as ANSI C escape sequences (such as \t or \\) are expanded in them. Instead, you should use environment variables, for instance:
RE='\.txt$' awk '$0 ~ ENVIRON["RE"] {...}'
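A sketch of the difference, assuming a POSIX shell and a POSIX awk (the variable names and sample file names are illustrative). With -v, the two characters \t become a single tab; through ENVIRON, the backslash survives, so a regexp like \.txt$ reaches awk intact.

```shell
# -v processes escape sequences: a, TAB, b
via_v=$(awk -v s='a\tb' 'BEGIN { print length(s) }')
# ENVIRON passes the value through untouched: a, backslash, t, b
via_env=$(s='a\tb' awk 'BEGIN { print length(ENVIRON["s"]) }')

# Matching a literal ".txt" suffix with a regexp taken from the environment.
hits=$(printf '%s\n' notes.txt notestxt | RE='\.txt$' awk '$0 ~ ENVIRON["RE"]')
printf '%s %s %s\n' "$via_v" "$via_env" "$hits"
```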
answered Nov 15 at 20:51 by Stéphane Chazelas (edited Nov 15 at 22:43)
Thanks. (1) If I am correct, in a pattern-action statement, the pattern can be an expression, which may or may not be a regular expression. So using a variable as a pattern is not using it where only a regular expression is expected. That's the reason it doesn't work. (2) For RE='\.txt$' awk '$0 ~ ENVIRON["RE"] {...}', can I just use awk -v RE='\\.txt$' '$0 ~ RE {...}' (doubling the backslash) equally well? – Tim, Nov 15 at 21:14