How to extract data between two different xml tags























I have looked but haven't been able to find anyone else with the same sort of problem I have.



I have an xml file like this:



<ID>1</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>3</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>4</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>


Basically, it's a whole bunch of data all on one line, with no line breaks.
I need to extract the info (preferably as-is, with tags intact) between a specific <ID> tag (e.g. <ID>2) and the very next </dateAccessed> tag. I have about 50 files to check for a particular ID and the related data that follows it. I get that this is not standard; there is no nesting.



I originally tried to do this using grep and sed, but I just get the whole file returned, which seems odd to me. Can't I just treat this like a text file?



EDIT:



I didn't realise the formatter removed text enclosed in < and >, so after re-reading my question this morning, I realised it was asking something completely different.
TL;DR:
I need what is between a specific value in the ID tags and the next closing dateAccessed tag, not what is between a matching pair of opening and closing tags (i.e. between ID and /ID).



So I can get something like this result:



<ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>





























  • 3




    I can't help but feel this is the "wrong question". If you're working with XML files you should really be using an XML parser (such as xmlstarlet). I appreciate this won't give you an unbalanced segment, and so is not a suitable answer to your question as asked. But trying to treat XML as text will almost certainly lead to unintended consequences down the road. It's not a good place to be. Really.
    – roaima
    Oct 18 '17 at 7:14















text-processing xml














edited Feb 27 '17 at 23:36

























asked Feb 27 '17 at 1:37









averagescripter



















5 Answers






























As noted in the comments, your data isn't well-formed XML and it isn't completely clear what the structure of your document is, e.g. judging by your example data, it looks like you have no nested elements - is that really the case?



With that caveat in mind, here's a Python script that uses the BeautifulSoup4 parsing library to do what you want (i.e. it produces the desired output data for the given example input data):



#!/usr/bin/env python
# coding: ascii
"""extract.py

Extract everything between two XML tags
in a (possibly poorly formed) XML document."""

from bs4 import BeautifulSoup
import sys

# Set the opening tag name and value
opening_name = "ID"
opening_text = "2"

# Set the closing tag name
closing_name = "dateAccessed"

# Get the XML data from a file and instantiate a BeautifulSoup parser
# We add a root node because the input data is missing a root
# (the 'xml' parser requires the lxml package to be installed)
with open(sys.argv[1], 'r') as xmlfile:
    xmldoc = "<root>" + xmlfile.read() + "</root>"
soup = BeautifulSoup(xmldoc, 'xml')

# Iterate through the elements of the XML data and collect
# all of the elements in between the opening and closing tags
elements = []
match = False
for e in soup.find_all():
    if match:
        # Already inside the wanted record: keep every element
        # up to and including the closing tag
        elements.append(str(e))
        if e.name == closing_name:
            break
    elif e.name == opening_name and e.text == opening_text:
        # Found the opening tag with the wanted value
        match = True
        elements.append(str(e))

# Output the results on a single line
print("".join(elements))


You would run it something like this:



python extract.py data.xml


For your given example data:



<ID>1</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>3</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>4</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>


It produces the following output:



<ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>
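If installing a parsing library is not an option, the same span can also be pulled out as plain text, which is what the question originally attempted. grep prints whole matching lines, and the file is a single line, so it returns the whole file; a non-greedy regular expression avoids that. The following is only a minimal sketch using Python's standard library; the function name extract_record and the shortened SAMPLE string are made up for illustration, and it assumes the ID tag is written exactly as <ID>value</ID> with no attributes or extra whitespace.

```python
import re

# Hypothetical sample in the same flat layout as the question's data.
SAMPLE = (
    "<ID>1</ID><data>a</data><dateAccessed>d1</dateAccessed>"
    "<ID>2</ID><data>b</data><dateAccessed>d2</dateAccessed>"
    "<ID>12</ID><data>c</data><dateAccessed>d3</dateAccessed>"
)


def extract_record(text, wanted_id, closing_name="dateAccessed"):
    """Return the text from <ID>wanted_id</ID> up to the next closing tag, or None."""
    pattern = re.compile(
        # .*? is non-greedy, so the match stops at the first closing tag
        "<ID>" + re.escape(wanted_id) + "</ID>.*?</" + re.escape(closing_name) + ">",
        re.DOTALL,
    )
    found = pattern.search(text)
    return found.group(0) if found else None


print(extract_record(SAMPLE, "2"))
```

For the 50 or so files mentioned in the question, the same function could be called on the contents of each file in turn. Unlike the parser-based script, this treats the data purely as text, so it will not notice malformed or nested elements.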














































    Assuming that the XML document actually has a root tag (your XML does not and is therefore not well formed), then you may use XMLstarlet like this:



    xmlstarlet sel -t -m '//ID[. = 2]' \
        -c . -c './following-sibling::*[position()<5]' -nl file.xml


    For the given data (modified to insert <root> at the start and </root> at the end), this would return



    <ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>


    The XMLstarlet query selects any ID node whose content is 2 (-m '//ID[. = 2]'). For each of these nodes (only one in the given data), it returns a copy of the node itself (-c .) along with a copy of the following four sibling nodes (-c './following-sibling::*[position()<5]'), ending the output by inserting a newline (-nl).



    The <root> start and end tags could be inserted into the document itself, or be handed to XMLstarlet like so:



    { echo '<root>'; cat file.xml; echo '</root>'; } |
    xmlstarlet sel -t -m '//ID[. = 2]' \
        -c . -c './following-sibling::*[position()<5]' -nl
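The query above copies a fixed number of following siblings, so it depends on every record having the same number of elements after its ID. As a rough alternative sketch (not part of XMLstarlet), the same selection can be written with Python's standard xml.etree.ElementTree so that it stops at the dateAccessed element instead of counting. The names select_record and SAMPLE are hypothetical, and the sketch assumes the fragment becomes well-formed once it is wrapped in a root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the same flat layout as the question's data.
SAMPLE = (
    "<ID>1</ID><data>a1</data><data2>b1</data2><dataX>c1</dataX>"
    "<dateAccessed>d1</dateAccessed>"
    "<ID>2</ID><data>a2</data><data2>b2</data2><dataX>c2</dataX>"
    "<dateAccessed>d2</dateAccessed>"
)


def select_record(fragment, wanted_id, closing_name="dateAccessed"):
    """Return the ID element with the wanted text plus its siblings up to closing_name."""
    # Wrap the rootless fragment so that it parses as one document.
    root = ET.fromstring("<root>" + fragment + "</root>")
    collected = []
    collecting = False
    for child in root:
        if not collecting and child.tag == "ID" and (child.text or "").strip() == wanted_id:
            collecting = True
        if collecting:
            collected.append(ET.tostring(child, encoding="unicode"))
            if child.tag == closing_name:
                break
    return "".join(collected)


print(select_record(SAMPLE, "2"))
```

If no ID element has the wanted text, the function returns an empty string.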
















































      Grep



      grep -oE '<data>[^<]*</data>' yourxmlfile


      Bash



      tag='data'
      tL="<$tag>" tR="</$tag>"
      xml=$(< yourxmlfile)
      while case $xml in *"$tL"* ) :;; * ) break;; esac; do
        t1=${xml#*"$tL"} t2=${t1%%"$tR"*} xml=${t1#*"$tR"}
        echo "${tL}${t2}${tR}"
      done


      Perl



      perl -lne "print for /<$tag>.*?<\/$tag>/g" yourxmlfile


      Sed



      sed -e "
      s|<$tag>|\n&|
      s/.*\n//
      s|</$tag>|&\n|
      /\n/P;D
      " yourxmlfile


      Output



      <data>asdf</data>
      <data>asdf</data>
      <data>asdf</data>
      <data>asdf</data>
















































        If you want to extract the ID value, and assuming ID always comes as the first tag, you can use this:



        awk -F"[<>]" '{print $3}' input.txt


        If you want to search for a specific tag, try this awk command; change the value of input=ID to the tag name you need:



        awk -F"[<>]" '{for(i=1;i<=NF;i++)if($i~input){print $(i+1);next}}' input=ID input.txt
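Because the whole file is a single line, and therefore a single awk record, the next statement means the command above prints only the value after the first matching tag. As a rough illustration of the same field-splitting idea applied to every occurrence, here is a small Python sketch; the name tag_values and the SAMPLE string are hypothetical, and it assumes no element text contains < or >.

```python
import re

# Hypothetical sample in the same flat layout as the question's data.
SAMPLE = (
    "<ID>1</ID><data>a1</data><dateAccessed>d1</dateAccessed>"
    "<ID>2</ID><data>a2</data><dateAccessed>d2</dateAccessed>"
)


def tag_values(text, tag):
    """Return the text that follows every <tag> opening tag, in document order."""
    # Same splitting as awk -F"[<>]": break the line at every < or >.
    fields = re.split(r"[<>]", text)
    # An opening tag appears as a field equal to the tag name;
    # its value is the field that comes right after it.
    return [fields[i + 1] for i in range(len(fields) - 1) if fields[i] == tag]


print(tag_values(SAMPLE, "ID"))
```

Note that this returns only the values of one tag, not the full span from an ID to the next dateAccessed that the question asks for.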














































          Given that the XML has no line breaks,
          why don't you try inserting a newline (\n) between each >< pair, which puts every element on its own line?



          Example:
          I have created a file called stack containing the given XML.



          Below is the sed operation that introduces the line breaks (\n in the replacement works with GNU sed; only an excerpt of the output is shown):



           cat stack|sed -e 's/></>\n</g'

          <ID>2</ID>
          <data>asdf</data>
          <data2>asdf</data2>
          <dataX>asdf</dataX>
          <dateAccessed>somedate</dateAccessed>


          Now you can access the tags you want.
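Once every element sits on its own line, picking out one record means keeping the lines from the wanted ID line through the next dateAccessed line. The sketch below shows that idea in Python rather than sed; the names split_elements, record_lines and SAMPLE are invented for illustration, and it assumes the data contains no line breaks of its own and that no element text contains the sequence ><.

```python
# Hypothetical sample in the same flat layout as the question's data.
SAMPLE = (
    "<ID>1</ID><data>a1</data><dateAccessed>d1</dateAccessed>"
    "<ID>2</ID><data>a2</data><dateAccessed>d2</dateAccessed>"
    "<ID>3</ID><data>a3</data><dateAccessed>d3</dateAccessed>"
)


def split_elements(text):
    """Put each element on its own line by breaking at every '><' boundary."""
    return text.replace("><", ">\n<").split("\n")


def record_lines(text, wanted_id, closing_name="dateAccessed"):
    """Return the lines from <ID>wanted_id</ID> through the next closing_name line."""
    start = "<ID>" + wanted_id + "</ID>"
    end = "</" + closing_name + ">"
    picked = []
    for line in split_elements(text):
        if not picked and line != start:
            continue  # still before the wanted record
        picked.append(line)
        if line.endswith(end):
            break  # reached the closing element of the record
    return picked


print("\n".join(record_lines(SAMPLE, "2")))
```

Joining the returned lines with an empty string instead of a newline gives back the record in its original one-line form.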



























                answered Jul 11 at 22:23









                igal

                5,0871031




                5,0871031
























                    up vote
                    1
                    down vote













                    Assuming that the XML document actually has a root tag (your XML does not and is therefore not well formed), then you may use XMLstarlet like this:



                    xmlstarlet sel -t -m '//ID[. = 2]' 
                    -c . -c './following-sibling::*[position()<5]' -nl file.xml


                    For the given data (modified to insert <root> at the start and </root> at the end), this would return



                    <ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>


                    The XMLstarlet query selects any ID node whose contents is 2 (-m '//ID[. = 2]'). For each of these nodes (only one in the given data), it returns a copy of the node itself (-c .) along with a copy of the following five sibling nodes (-c './following-sibling::*[position()<5]'), ending the output by inserting a newline (-nl).



                    The <root> start and end tags could be inserted into the document itself, or be handed to XMLstarlet like so:



                    { echo '<root>'; cat file.xml; echo '</root>'; } |
                    xmlstarlet sel -t -m '//ID[. = 2]'
                    -c . -c './following-sibling::*[position()<5]' -nl





                        edited Nov 23 at 23:29
                        answered Nov 23 at 23:22
                        Kusalananda






                            Grep



                            grep -oE '<data>[^<]*</data>' yourxmlfile


                            Bash



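                            tag='data'
                            tL="<$tag>" tR="</$tag>"
                            xml=$(< yourxmlfile)    # read the whole file (bash-specific)
                            # Loop while an opening tag remains in the unprocessed text.
                            while case $xml in *"$tL"* ) :;; * ) break;; esac; do
                                # t1: text after the opening tag; t2: content up to the closing tag;
                                # xml: whatever follows that closing tag.
                                t1=${xml#*"$tL"} t2=${t1%%"$tR"*} xml=${t1#*"$tR"}
                                echo "${tL}${t2}${tR}"
                            done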


                            Perl (uses $tag as set in the Bash example)



                            perl -lne "print for /<$tag>.*?<\/$tag>/g" yourxmlfile


                            Sed (uses $tag as set in the Bash example; as written, relies on GNU sed's handling of \n)



                            sed -e "
                            s|<$tag>|\n&|
                            s/.*\n//
                            s|</$tag>|&\n|
                            /\n/P;D
                            " yourxmlfile


                            Output



                            <data>asdf</data>
                            <data>asdf</data>
                            <data>asdf</data>
                            <data>asdf</data>
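
The commands above pull out every <data> element. For the span the question actually asks about (from a given <ID> through the next </dateAccessed>), a plain substring search is one option. A minimal sketch, assuming the file layout shown in the question; the sample file and the id variable are made up for illustration:

```shell
#!/bin/sh
# Sample input modelled on the question: several records on one line.
file=$(mktemp)
printf '%s\n' '<ID>1</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>3</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>' > "$file"

id=2    # hypothetical: the ID value to look for

result=$(awk -v first="<ID>$id</ID>" -v last="</dateAccessed>" '
{
    s = index($0, first)        # where the wanted <ID> starts (0 if absent)
    if (s == 0) next
    rest = substr($0, s)        # everything from that <ID> onward
    e = index(rest, last)       # first closing tag after it
    if (e == 0) next
    print substr(rest, 1, e + length(last) - 1)
}' "$file")

printf '%s\n' "$result"
rm -f "$file"
```

Because index() matches literal strings, no regular-expression escaping is needed. This treats the file purely as text and makes no attempt to validate the XML.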





                                edited Feb 27 '17 at 4:33
                                answered Feb 27 '17 at 3:48
                                Rakesh Sharma






                                    If you want to extract the ID value, and ID is always the first tag, you can use:



                                    awk -F"[<>]" '{print $3}' input.txt


                                    To search for a specific tag instead, try this awk command, setting input= to the tag name you want (ID here):



                                    awk -F"[<>]" '{for(i=1;i<=NF;i++)if($i~input){print $(i+1);next}}' input=ID input.txt
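
Because the whole file is a single line, both commands stop at the first match. A small variation that lists every ID value, as a sketch; the sample file is made up, and the exact comparison ($i == "ID") is swapped in for the regex match used above so that the closing /ID field is not matched as well:

```shell
#!/bin/sh
# Sample input modelled on the question (shortened to two records).
file=$(mktemp)
printf '%s\n' '<ID>1</ID><data>asdf</data><dateAccessed>somedate</dateAccessed><ID>2</ID><data>asdf</data><dateAccessed>somedate</dateAccessed>' > "$file"

# Split on < and >; a field equal to "ID" is an opening tag name,
# and the field right after it is that tag's text content.
ids=$(awk -F'[<>]' '{ for (i = 1; i < NF; i++) if ($i == "ID") print $(i + 1) }' "$file")

printf '%s\n' "$ids"
rm -f "$file"
```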





                                        answered Feb 27 '17 at 3:32
                                        Kamaraj






                                            Since the XML has no line breaks, why not insert a newline (\n) between each >< pair? That puts every element on its own line.



                                            Example: I created a file called stack containing the given XML.



                                            The sed operation below introduces the line breaks (only the part of the output for ID 2 is shown):



                                            sed -e 's/></>\n</g' stack

                                            <ID>2</ID>
                                            <data>asdf</data>
                                            <data2>asdf</data2>
                                            <dataX>asdf</dataX>
                                            <dateAccessed>somedate</dateAccessed>


                                            Now you can access the tags you want with line-oriented tools.
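
Once every element is on its own line, a sed address range can print a single record. A sketch under the assumption that each record starts with <ID> and ends with </dateAccessed>, as in the question; the sample file is made up, and awk's gsub does the splitting here only because \n in a sed replacement is not portable across sed implementations:

```shell
#!/bin/sh
# Sample input modelled on the question: several records on one line.
file=$(mktemp)
printf '%s\n' '<ID>1</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>3</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>' > "$file"

# Step 1: put a newline between every "><" pair.
# Step 2: print from the <ID>2</ID> line through the next </dateAccessed> line.
block=$(awk '{ gsub(/></, ">\n<"); print }' "$file" |
    sed -n '/^<ID>2<\/ID>$/,/<\/dateAccessed>$/p')

printf '%s\n' "$block"
rm -f "$file"
```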






                                                answered Oct 18 '17 at 7:08
                                                user256118