How are backslashes processed successively by bash, gawk and gensub()?

I have a file:

$ cat f2
line 1; li
ne 2$

Note that the last $ is the bash prompt, not part of the file content.

I am trying to use gawk to join each line that doesn't end in a digit with the next line. Unlike in my previous post, I am now trying to figure out how backslashes are handled by bash, gawk and gensub(), by experimenting with different numbers of backslashes in front of the n of the newline escape \n. Why do the gawk commands with more than three backslashes before n fail to match a line not ending in a digit, while the others succeed?
Generally, how are backslashes processed successively by bash, gawk and gensub()? Thanks.

$ gawk 'BEGIN{RS="\f"} {b=gensub("([^[:digit:] ]) *\n", "\\1", "g"); print b}' f2
line 1; line 2

$ gawk 'BEGIN{RS="\f"} {b=gensub("([^[:digit:] ]) *\\n", "\\1", "g"); print b}' f2
line 1; line 2

$ gawk 'BEGIN{RS="\f"} {b=gensub("([^[:digit:] ]) *\\\n", "\\1", "g"); print b}' f2
line 1; line 2

$ gawk 'BEGIN{RS="\f"} {b=gensub("([^[:digit:] ]) *\\\\n", "\\1", "g"); print b}' f2
line 1; li
ne 2

Can someone explain what gawk and gensub() see when \n, \\n, \\\n, and \\\\n pass through bash and gawk respectively?

Taking \n as an example: does bash leave it unmodified (because of the single quotes), so that gawk sees \n? Does gawk then transform it before gensub() sees it, and if so, how does gensub() know that it should match a newline?

edited yesterday

asked 2 days ago

Tim


Regarding the bash part, if you're using single quotes ('...') then everything, including backslashes, is preserved as is, so it is all passed down to gawk.
– Filipe Brandenburger
2 days ago

bash awk

1 Answer

accepted

In bash, '...' are strong quotes, so with '\n', a literal \n is passed to awk, and with '\\n', a literal \\n. There's no transformation.
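The shell-level claim can be checked without awk at all. The following is a minimal sketch; the variable names one, two and four are made up for illustration.

```shell
# Single quotes pass every character through unchanged, backslashes included.
one='\n'      # 2 characters: backslash, n
two='\\n'     # 3 characters: backslash, backslash, n
four='\\\\n'  # 5 characters: four backslashes, n

# %s prints its argument verbatim; ${#var} is the length of the value.
printf '%s has %d characters\n' "$one" "${#one}"
printf '%s has %d characters\n' "$two" "${#two}"
printf '%s has %d characters\n' "$four" "${#four}"
```

The reported lengths are 2, 3 and 5, so whatever is typed between the single quotes is exactly what the awk program text contains.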

In awk, inside "...", escape sequences such as \n and \\ are expanded. So when "\n" is passed to gensub() (or print, or anything else in awk), that is an actual newline character, and when "\\" is passed, that is a single \.
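The string-level expansion can be observed on its own, before any regexp is involved. This is a small sketch that measures string literals with length(); the variable name lens is illustrative, and it assumes an awk is available on the PATH.

```shell
# length() counts the characters left after awk has processed the escapes
# in each string literal. The shell's single quotes pass them through as typed.
lens=$(awk 'BEGIN { print length("\n"), length("\\n"), length("\\\n"), length("\\\\n") }')
echo "$lens"   # prints: 1 2 2 3
```

The four literals become, in order: a newline; a backslash and n; a backslash and a newline; two backslashes and n. Those strings are what gensub() receives as its first argument.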

Now, gensub() also interprets its first argument as a regular expression, where \ also has a special meaning, which varies between implementations.

What is consistent between implementations is that a \\ regexp matches a literal \, just as \. matches a literal . (dot). For a \n regexp, however, whether it matches a newline character or an n varies with the implementation. In the case of gawk, it matches a newline. So both gensub("\n", "x") and gensub("\\n", "x") replace newline characters with x: the first because a literal newline character is passed to gensub(), the second because \n is passed to gensub(), which interprets it as a regexp that matches a newline character.

Note that the POSIX specification used to have several issues when it came to backslash processing in regular expressions in awk. That will be corrected in the next version of the specification. See http://austingroupbugs.net/view.php?id=1105 for details.

It gets even more confusing when using /\n/ instead of "\n".

answered yesterday

Stéphane Chazelas


Thanks. I seem to remember that you have posted some shell commands for automatically doubling the backslashes in a string (possibly for the same purpose). Do you happen to remember whether and where you posted them?
– Tim
yesterday
