Scripting for text processing: Delete a set of lines only if entire pattern matches

I want to delete a set of lines (globally) only if the entire pattern matches.



Pattern Description:



Line1:^[#]+ .*



Line2:^[[:space:]]*$



Line3:^-[[:space:]]*$



Line4:^[[:space:]]*$



Line5:^[#]+ .*$|^[-]+[[:space:]]*$



Note:

  1. Line3 can have space(s) after -

  2. Line2 and Line4 are either empty or contain only whitespace

  3. Line5 matches either ^[#]+ .*$ or ^[-]+[[:space:]]*$

  4. I don't want to delete the last line of the pattern, i.e. Line5 in the pattern description.


Example:



# Body

- Inside Body

# Summary

-

# Bibliography

- Read this book


Expected output:



# Body

- Inside Body

# Bibliography

- Read this book




Note: The provided solution works. Is it possible to write it more clearly, as follows?

e = '(^|\n)[#]+ .*
\n[\t ]*
\n-[\t ]*
\n[\t ]*
\n([#]+ .*|[-]+[\t ]*)\n'

Also, how can the provided solution be applied to multiple occurrences of the multiline pattern?










  • Do you know the line terminator that will be present? Also, would an answer using awk (or any other text processing tool) be acceptable?
    – goodguy5
    Dec 18 at 13:31










  • I would be happy if it's portable to both Windows and Unix; if that's not possible, Unix would be preferable. Other scripting languages such as awk, Python, or JavaScript are also fine.
    – Nikhil
    Dec 18 at 13:33










  • Thank you. Corrected.
    – Nikhil
    Dec 18 at 13:36










  • Related: sed multiple lines
    – goodguy5
    Dec 18 at 13:42












  • Is this document in a known format? Does it have a parser?
    – Kusalananda
    Dec 19 at 13:37
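
On the line-terminator question raised in the comments: one possible approach, sketched here under the assumption that rewriting CRLF as LF is acceptable, is to normalize line endings before matching, so that a pattern written with \n applies to files from either platform. The helper name normalize_newlines is made up for illustration. Python 3 already performs this translation when reading in text mode with default settings, so the step mainly matters for text read in binary mode or with newline="".

```python
def normalize_newlines(text):
    """Convert CRLF (Windows) and lone CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_newlines("# Summary\r\n\r\n-\r\n")))
# prints '# Summary\n\n-\n'
```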
















shell-script text-processing awk sed python






edited Dec 19 at 13:07

























asked Dec 18 at 13:09









Nikhil

1 Answer
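A Python solution; it should work with Python 2 or 3.
It reads from stdin and writes to stdout. About the only thing I did was change the expression for [[:space:]] to [\t ].

#!/usr/bin/python3

import sys
import re

# Line1..Line4 are dropped; group 1 (start of text or the preceding newline)
# and group 2 (Line5) are kept.
e = '(^|\n)[#]+ .*\n[\t ]*\n-[\t ]*\n[\t ]*\n([#]+ .*|[-]+[\t ]*)\n'
# sys.stdout.write avoids the extra trailing newline that print() would add.
sys.stdout.write(re.sub(e, '\\1\\2\n', sys.stdin.read()))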
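
The question's note asks whether the expression can be laid out with one input line per source line, and how to handle repeated occurrences. The following is a minimal sketch, not the answer author's code. It uses re.VERBOSE for the layout and, as a deliberate change from the expression above, checks Line5 with a lookahead instead of capturing it. Because Line5 is then not consumed, it can also serve as Line1 of the next block, which may be what stops a consuming pattern from matching back-to-back blocks. The names BLOCK and drop_empty_sections are hypothetical.

```python
import re

# Under re.VERBOSE, unescaped whitespace in the pattern is ignored and an
# unescaped '#' starts a comment, so the literal space is written as [ ]
# and the '#' stays inside a character class.
BLOCK = re.compile(r"""
    ^ [#]+ [ ] .* \n                        # Line1: heading such as "# Summary"
    [\t ]* \n                               # Line2: blank or whitespace only
    - [\t ]* \n                             # Line3: a lone "-"
    [\t ]* \n                               # Line4: blank or whitespace only
    (?= [#]+ [ ] .* $ | [-]+ [\t ]* $ )     # Line5: must follow, but is not consumed
""", re.VERBOSE | re.MULTILINE)


def drop_empty_sections(text):
    """Remove Line1..Line4 of every matching block, leaving Line5 in place."""
    return BLOCK.sub("", text)


sample = "# Body\n\n- Inside Body\n\n# Summary\n\n-\n\n# Bibliography\n\n- Read this book\n"
print(drop_empty_sections(sample), end="")
```

For the example input in the question this drops the "# Summary" block and leaves the other two sections untouched. Unlike the original expression, Line5 does not need a trailing newline here, so a block at the very end of the input is also handled.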





  • Clarification: Can we do in-place substitution when reading from a file?
    – Nikhil
    Dec 18 at 14:34












  • Linux doesn't have any primitive to do this, as the file changes length. So you need to write to a temporary file and rename it. See stackoverflow.com/questions/42429320/… for examples of modules to do it.
    – icarus
    Dec 18 at 14:44










  • Thanks, I did this and it works: python Code.py < ./Input.md > ./Output.md
    – Nikhil
    Dec 18 at 15:04










  • Additionally, please see the Note at the end of the question.
    – Nikhil
    Dec 19 at 13:05












  • I tried to do a global replacement using print(re.sub(e, '\\1\\2\n', sys.stdin.read(), flags=re.MULTILINE)), but it replaces only once. Could you please have a look? I am using Python 3.6.
    – Nikhil
    Dec 19 at 13:29
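
On the in-place question raised in the comments above: a minimal sketch of the write-to-a-temporary-file-and-rename approach described there. It is not taken from the linked Stack Overflow question. The function name rewrite_in_place and its transform parameter are hypothetical, and the sketch assumes Python 3.3+ (for os.replace) and a file small enough to read into memory.

```python
import os
import tempfile


def rewrite_in_place(path, transform):
    """Replace the file at `path` with transform(old_text) via a temp file and rename."""
    with open(path, "r") as src:
        new_text = transform(src.read())

    # Create the temp file in the same directory so the rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_text)
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file if anything failed
        raise
```

Any function from str to str can be passed as transform, for example one that applies the re.sub call from the answer. Note that tempfile.mkstemp creates the file readable and writable only by the current user, so the rewritten file may end up with different permissions than the original.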











answered Dec 18 at 14:20









icarus
