How are backslashes processed successively by bash, gawk and gensub()?

I have a file



$ cat f2
line 1; li
ne 2$


Note that the final $ is the bash prompt, not part of the file content.
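To reproduce the experiments, a file with the same content can be created with printf. This is an assumption about how f2 was made: the listing only shows that the prompt follows "ne 2" directly, which suggests there is no trailing newline.

```sh
#!/bin/sh
# Recreate f2: "line 1; li", a newline, then "ne 2" with no trailing newline
# (inferred from the prompt appearing right after "ne 2" in the listing).
printf 'line 1; li\nne 2' > f2
cat f2; echo
```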



I am trying to use gawk to join each line that doesn't end in a digit with the next line. Unlike in my previous post, this time I want to understand how backslashes are handled by bash, gawk and gensub(), so I experimented with different numbers of backslashes in front of n in the newline escape sequence \n. Why do the gawk commands with more than three backslashes before n fail to find the line that doesn't end in a digit, while the others succeed?
More generally, how are backslashes processed successively by bash, gawk and gensub()? Thanks.



$ gawk 'BEGIN{RS="\f"} {b=gensub("([^[:digit:] ]) *\n", "\\1", "g"); print b}' f2
line 1; line 2
$ gawk 'BEGIN{RS="\f"} {b=gensub("([^[:digit:] ]) *\\n", "\\1", "g"); print b}' f2
line 1; line 2
$ gawk 'BEGIN{RS="\f"} {b=gensub("([^[:digit:] ]) *\\\n", "\\1", "g"); print b}' f2
line 1; line 2
$ gawk 'BEGIN{RS="\f"} {b=gensub("([^[:digit:] ]) *\\\\n", "\\1", "g"); print b}' f2
line 1; li
ne 2


Can someone explain what gawk and gensub() see when \n, \\n, \\\n and \\\\n pass through bash and then gawk?



Take \\n as an example: does bash leave it unmodified (because of the single quotes), so that gawk sees \\n? Does gawk then turn \\n into \n, so that gensub() sees \n? And if so, how does gensub() know that \n is a newline to match?

bash awk

edited yesterday

asked 2 days ago

Tim

  • Regarding the bash part: if you're using single quotes ('...'), everything, including backslashes, is preserved as is, so it is all passed down to gawk.
    – Filipe Brandenburger
    2 days ago

1 Answer
accepted

In bash, '...' are strong quotes, so with '\n', a literal \n is passed to awk, and with '\\n', a literal \\n. There's no transformation.
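A quick way to confirm that single quotes hand the backslashes over untouched is to print each of the four quoted arguments from the question together with its length. This minimal sketch uses only POSIX shell features; the loop variable name is arbitrary.

```sh
#!/bin/sh
# Inside '...' the shell does no backslash processing: every argument
# reaches the command exactly as typed.
for arg in '\n' '\\n' '\\\n' '\\\\n'; do
  printf '%s is %d characters\n' "$arg" "${#arg}"
done
```

The lengths go from 2 to 5, one more than the number of backslashes, so nothing was removed before gawk ever runs.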



In awk, inside "...", \n, \\ and other escape sequences are expanded. So when you pass "\n" to gensub() (or to print, or anything else in awk), that's an actual newline character, and when you pass "\\", that's a single \.
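The string-literal layer can be observed on its own, before any regex is involved, by measuring the four strings with length(). This sketch assumes any POSIX awk is available, since these string escapes are not gawk-specific.

```sh
#!/bin/sh
# What gensub() actually receives after awk expands the string literals:
#   "\n"     -> newline                 (1 character)
#   "\\n"    -> backslash, n            (2 characters)
#   "\\\n"   -> backslash, newline      (2 characters)
#   "\\\\n"  -> backslash, backslash, n (3 characters)
lens=$(awk 'BEGIN {
  printf "%d %d %d %d", length("\n"), length("\\n"), length("\\\n"), length("\\\\n")
}')
echo "$lens"
```

This prints 1 2 2 3. Only the fourth string still contains the letter n after a backslash-free path to the regex engine is ruled out, which is why it behaves differently.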



Now, gensub() also understands its first argument as a regular expression, where \ also has a special meaning, which varies between implementations.



What is consistent between implementations is that a \\ regexp matches a literal \, just like \. matches a literal dot. However, for a \n regexp, whether it matches a newline character or an n varies with the implementation. In the case of gawk, it matches a newline. So both gensub("\n", "x", "g") and gensub("\\n", "x", "g") replace newline characters with x: the first because a literal newline character is passed to gensub(), the second because \n is passed to gensub(), which gawk understands as a regexp that matches a newline character.



Note that the POSIX specification used to have several issues when it came to backslash processing in regular expressions in awk. That will be corrected in the next version of the specification. See http://austingroupbugs.net/view.php?id=1105 for details.



It gets even more confusing when using /\n/ instead of "\n".

answered yesterday

Stéphane Chazelas

  • Thanks. I seem to remember that you have posted some shell commands for automatically doubling backslashes in a string (possibly for the same purpose). Do you happen to remember whether and where you posted them?
    – Tim
    yesterday
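These are not the commands Tim refers to, but as a minimal sketch: a hypothetical helper, double_backslashes, that uses sed to double every backslash, so that a string survives one round of awk string-literal expansion (for example when it is pasted between double quotes in an awk program).

```sh
#!/bin/sh
# Hypothetical helper (not from the thread): double every backslash.
double_backslashes() {
  printf '%s' "$1" | sed 's/\\/\\\\/g'
}

pat='\n'                           # two characters: backslash, n
esc=$(double_backslashes "$pat")   # three characters: \\n
printf '%s\n' "$esc"
```

Only backslashes are handled; a double quote inside the string would also need escaping before it could be embedded in an awk "..." literal.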