How are backslashes processed successively by bash, gawk and gensub()?
I have a file
$ cat f2
line 1; li
ne 2$
Note that the final $ is the bash prompt, not part of the file content.
I am trying to join each line that doesn't end in a digit with the line that follows it, using gawk. Unlike in my previous post, I am now trying to figure out how backslashes are handled by bash, gawk and gensub(), by experimenting with different numbers of backslashes in front of the n of the newline escape \n. Why do the gawk commands with more than three backslashes before the n fail to find a line not ending in a digit, while the others succeed?
Generally, how are backslashes processed successively by bash, gawk and gensub()? Thanks.
$ gawk 'BEGIN{RS="f"} {b=gensub("([^[:digit:] ]) *\n", "\\1", "g"); print b}' f2
line 1; line 2
$ gawk 'BEGIN{RS="f"} {b=gensub("([^[:digit:] ]) *\\n", "\\1", "g"); print b}' f2
line 1; line 2
$ gawk 'BEGIN{RS="f"} {b=gensub("([^[:digit:] ]) *\\\n", "\\1", "g"); print b}' f2
line 1; line 2
$ gawk 'BEGIN{RS="f"} {b=gensub("([^[:digit:] ]) *\\\\n", "\\1", "g"); print b}' f2
line 1; li
ne 2
Can someone explain what gawk and gensub() see when \n, \\n, \\\n and \\\\n pass through bash and gawk respectively?
Take \n as an example: does bash leave it unmodified (because of the single quotes) so that gawk sees \n? Does gawk then modify \n before gensub() sees it, and if so, how does gensub() know that it is a newline to match?
bash awk
edited yesterday
asked 2 days ago
Tim
Regarding the bash part, if you're using single quotes ('...') then everything, including backslashes, is preserved as is, so they're all passed down to gawk.
– Filipe Brandenburger
2 days ago
1 Answer
accepted
In bash, '...' are strong quotes, so with '\n', a literal \n is passed to awk and with '\\n', a literal \\n. There's no transformation.
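A quick way to confirm this, as a minimal sketch: hand the same single-quoted words to printf '%s', which prints its argument without interpreting it, and count the characters. The variable names one and two are only for illustration.

```shell
# printf '%s' writes its argument verbatim, so it shows exactly what
# bash hands over after single-quote processing.
one=$(printf '%s' '\n')     # backslash, n
two=$(printf '%s' '\\n')    # backslash, backslash, n
printf '%s is %d characters long\n' "$one" "${#one}"
printf '%s is %d characters long\n' "$two" "${#two}"
```

The first line reports 2 characters and the second 3, so no backslash has been consumed by the shell.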
In awk, inside "...", escape sequences such as \n and \\ are expanded. So when "\n" is passed to gensub() (or print or anything else in awk), that's an actual newline character, and when "\\" is passed, that's a \.
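As a small sketch of that expansion (assuming any POSIX awk; nothing here is gawk-specific), length() can be used to count the characters left once awk has processed each string literal. The variable lens is just an illustrative name.

```shell
# Inside an awk string literal, "\n" becomes one character (a newline)
# and "\\" becomes one character (a backslash), so "\\n" is two
# characters: a backslash followed by n.
lens=$(awk 'BEGIN { print length("\n"), length("\\"), length("\\n") }')
echo "$lens"    # prints: 1 1 2
```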
Now, gensub() also understands its first argument as a regular expression, where \ also has a special meaning, one which varies between implementations.
What is consistent between implementations is that a \\ regexp matches a literal \, just like \. matches a literal . (dot). However, for a \n regexp, whether that matches a newline character or an n varies with the implementation. In the case of gawk, it matches a newline. So both gensub("\n", "x") and gensub("\\n", "x") replace newline characters with x: the first because a literal newline character is passed to gensub(), the second because \n is passed to gensub(), which understands it as a regexp that matches a newline character.
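The gawk behaviour described above can be checked with a short sketch. It uses the full gensub(regexp, replacement, how, target) form on a hypothetical test string s rather than the abbreviated calls quoted above, and adds the four-backslash case from the question for contrast. This assumes gawk is installed; gensub() is a gawk extension.

```shell
# a: the string literal "\n" is already a newline when gensub() gets it.
# b: "\\n" reaches gensub() as \n, which gawk's regexp engine reads as newline.
# c: "\\\\n" reaches gensub() as \\n, a regexp for a literal backslash
#    followed by n, which does not occur in s, so nothing is replaced.
a=$(gawk 'BEGIN { s = "1\n2"; print gensub("\n",    "x", "g", s) }')
b=$(gawk 'BEGIN { s = "1\n2"; print gensub("\\n",   "x", "g", s) }')
c=$(gawk 'BEGIN { s = "1\n2"; print gensub("\\\\n", "x", "g", s) }')
printf 'a: %s\nb: %s\nc: %s\n' "$a" "$b" "$c"
```

Here a and b both come out as 1x2, while c still contains the newline, which mirrors the failing four-backslash command in the question.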
Note that the POSIX specification used to have several issues when it came to backslash processing in regular expressions in awk. That will be corrected in the next version of the specification. See http://austingroupbugs.net/view.php?id=1105 for details.
It gets even more confusing when using /\n/ instead of "\n".
answered yesterday
Stéphane Chazelas
Thanks. I seem to remember that you posted some shell commands for automatically doubling the backslashes in a string (possibly for the same purpose). Do you happen to remember whether and where you posted them?
– Tim
yesterday