How to extract data between two different xml tags


I have looked but haven't been able to find anyone else with the same sort of problem I have.

I have an xml file like this:

<ID>1</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>3</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>4</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>

Basically a whole bunch of data all on one line, no line breaks.
I need to extract the info (preferably as-is, with tags intact) between a specific <ID> tag (e.g. <ID>2) and the very next </dateAccessed> tag. I have about 50 files to check for a particular ID and the related data that follows it. I get that this is not standard; there is no nesting.

I originally tried to do this using grep and sed, but I just get the whole file returned, which seems odd to me. Can't I just treat this like a text file?
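Getting the whole file back is what line-oriented tools do: grep prints every line that contains a match, and sed also works one line at a time. With no line breaks, the whole file is one line, so any match returns all of it. A minimal Python illustration of that behaviour (grep_like is a hypothetical helper that only mimics grep's default output, not a real API):

```python
# Sample data from the question: four records on a single line.
RECORD = ("<ID>{}</ID><data>asdf</data><data2>asdf</data2>"
          "<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>")
SAMPLE = "".join(RECORD.format(i) for i in range(1, 5))


def grep_like(text, needle):
    """Mimic grep's default output: every whole line that contains needle."""
    return [line for line in text.splitlines() if needle in line]


matches = grep_like(SAMPLE, "<ID>2</ID>")
print(len(matches))          # 1: the file has only one line...
print(matches[0] == SAMPLE)  # True: ...so that line is the whole file
```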

EDIT:

I didn't realise the formatter removed text enclosed in < and >, so after re-reading my question this morning, I realised it was asking something completely different.
TL;DR
I need what lies between a specific value in ID tags (e.g. <ID>2</ID>) and the next closing </dateAccessed> tag, not what lies between the same opening and closing tags, i.e. between <ID> and </ID>.

So I can get something like this result:

<ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>
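Treated purely as text, the request reduces to two substring searches: find <ID>2</ID>, then find the first </dateAccessed> after that position, and slice. A minimal Python sketch, assuming each ID value is written exactly as <ID>value</ID> with no whitespace inside the tags; extract_record and the extract_text.py file name are hypothetical:

```python
import sys

# Sample data from the question: four records on a single line.
RECORD = ("<ID>{}</ID><data>asdf</data><data2>asdf</data2>"
          "<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>")
SAMPLE = "".join(RECORD.format(i) for i in range(1, 5))


def extract_record(text, id_value, end_tag="</dateAccessed>"):
    """Return text from <ID>id_value</ID> through the next end_tag, else None."""
    start_marker = f"<ID>{id_value}</ID>"
    start = text.find(start_marker)
    if start == -1:
        return None                          # no such ID in this text
    end = text.find(end_tag, start)          # first closing tag after the ID
    if end == -1:
        return None                          # record never closes
    return text[start:end + len(end_tag)]


print(extract_record(SAMPLE, 2) == RECORD.format(2))  # True

# Hypothetical command-line use: python extract_text.py 2 file1.xml file2.xml ...
if __name__ == "__main__" and len(sys.argv) > 2:
    wanted = sys.argv[1]
    for path in sys.argv[2:]:
        with open(path, encoding="utf-8") as fh:
            record = extract_record(fh.read(), wanted)
        if record is not None:
            print(f"{path}: {record}")
```

Searching for the full <ID>2</ID> string rather than just <ID>2 keeps ID 2 from also matching <ID>20</ID>.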

edited Feb 27 '17 at 23:36

asked Feb 27 '17 at 1:37

averagescripter

2124


I can't help but feel this is the "wrong question". If you're working with XML files you should really be using an XML parser (such as xmlstarlet). I appreciate this won't give you an unbalanced segment, and so is not a suitable answer to your question as asked. But trying to treat XML as text will almost certainly lead to unintended consequences down the road. It's not a good place to be. Really.
– roaima
Oct 18 '17 at 7:14
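Following the parser suggestion, here is a sketch with Python's standard-library xml.etree.ElementTree (swapped in here for illustration; the comment names xmlstarlet). In this flat layout the wanted range happens to be a run of complete sibling elements, so a parser can return it once a dummy <root> element is added around the data. The wrapper element and extract_with_parser are assumptions, not part of the question:

```python
import xml.etree.ElementTree as ET

# Sample data from the question: four records on a single line.
RECORD = ("<ID>{}</ID><data>asdf</data><data2>asdf</data2>"
          "<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>")
SAMPLE = "".join(RECORD.format(i) for i in range(1, 5))


def extract_with_parser(text, id_value, last_tag="dateAccessed"):
    """Return the sibling elements from <ID>id_value</ID> through the next
    <last_tag> element, serialized back to a single string."""
    # The data has no single root element, so wrap it in a dummy one.
    root = ET.fromstring(f"<root>{text}</root>")
    collected = []
    for elem in root:  # direct children, in document order
        if collected:
            collected.append(elem)
            if elem.tag == last_tag:
                break
        elif elem.tag == "ID" and (elem.text or "").strip() == str(id_value):
            collected.append(elem)
    # tostring() also writes each element's tail (text after its end tag),
    # so strip any whitespace that sat between elements in the file.
    return "".join(ET.tostring(e, encoding="unicode").strip() for e in collected)


print(extract_with_parser(SAMPLE, 2) == RECORD.format(2))  # True
```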


5 Answers


As noted in the comments, your data isn't well-formed XML and it isn't completely clear what the structure of your document is, e.g. judging by your example data, it looks like you have no nested elements - is that really the case?

With that caveat in mind, here's a Python script that uses the BeautifulSoup4 parsing library to do what you want (i.e. it produces the desired output data for the given example input data):

#!/usr/bin/env python
# coding: ascii
"""extract.py

Extract everything between two XML tags
in a (possibly poorly formed) XML document."""

from bs4 import BeautifulSoup
import sys

# Set the opening tag name and value
opening_name = "ID"
opening_text = "2"

# Set the closing tag name
closing_name = "dateAccessed"

# Get the XML data from a file and instantiate a BeautifulSoup parser.
# We add a root node because the input data is missing a root.
# (BeautifulSoup's 'xml' parser needs the lxml package installed.)
with open(sys.argv[1], 'r') as xmlfile:
    xmldoc = "<root>" + xmlfile.read() + "</root>"
    soup = BeautifulSoup(xmldoc, 'xml')

# Iterate through the elements of the XML data and collect
# all of the elements in between the opening and closing tags
elements = []
match = False
for e in soup.find_all():
    if match:
        elements.append(str(e))
        if e.name == closing_name:
            break
    else:
        try:
            if e.name == opening_name and e.text == opening_text:
                match = True
                elements.append(str(e))
        except AttributeError:
            pass

# Output the results on a single line
print("".join(elements))

You would run it something like this:

python extract.py data.xml

For your given example data:

<ID>1</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>3</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>4</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>

It produces the following output:

<ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>
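The script wraps the input in <root>...</root> because an XML document may have only one top-level element, and the question's data has many. A quick check of that rule with Python's standard-library xml.etree.ElementTree (a different, stricter parser than the BeautifulSoup/lxml one the script uses; is_well_formed is a hypothetical helper):

```python
import xml.etree.ElementTree as ET

# Sample data from the question: four records on a single line.
RECORD = ("<ID>{}</ID><data>asdf</data><data2>asdf</data2>"
          "<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>")
SAMPLE = "".join(RECORD.format(i) for i in range(1, 5))


def is_well_formed(text):
    """Return True if text parses as a single XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(SAMPLE))                         # False: many top-level elements
print(is_well_formed("<root>" + SAMPLE + "</root>"))  # True: one root element
```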

answered Jul 11 at 22:23

igal

5,0871031


Assuming that the XML document actually has a root tag (your XML does not and is therefore not well formed), then you may use XMLstarlet like this:

xmlstarlet sel -t -m '//ID[. = 2]' \
    -c . -c './following-sibling::*[position()<5]' -nl file.xml

For the given data (modified to insert <root> at the start and </root> at the end), this would return

<ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>

The XMLstarlet query selects any ID node whose content is 2 (-m '//ID[. = 2]'). For each of these nodes (only one in the given data), it returns a copy of the node itself (-c .) along with copies of the following four sibling nodes, since position()<5 matches positions 1 to 4 (-c './following-sibling::*[position()<5]'), ending the output by inserting a newline (-nl).
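Because position()<5 is true for positions 1 to 4, the query copies the ID element plus the next four siblings, which in the example are data, data2, dataX and dateAccessed. Python's standard-library ElementTree has no following-sibling axis, so this sketch (sibling_window is a hypothetical helper) approximates the same fixed-size window by slicing the wrapper's list of children; unlike //ID, it only looks at direct children of the root:

```python
import xml.etree.ElementTree as ET

# Sample data from the question: four records on a single line.
RECORD = ("<ID>{}</ID><data>asdf</data><data2>asdf</data2>"
          "<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>")
SAMPLE = "".join(RECORD.format(i) for i in range(1, 5))


def sibling_window(text, id_value, following=4):
    """Tags of the matching <ID> element plus its next `following` siblings,
    roughly what -c . -c './following-sibling::*[position()<5]' copies."""
    children = list(ET.fromstring(f"<root>{text}</root>"))
    for i, elem in enumerate(children):
        if elem.tag == "ID" and elem.text == str(id_value):
            return [e.tag for e in children[i:i + 1 + following]]
    return []


print(sibling_window(SAMPLE, 2))
# ['ID', 'data', 'data2', 'dataX', 'dateAccessed']
```

The window is a fixed count, so this approach (like the XPath it mirrors) assumes every record has exactly four elements after its ID.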

The <root> start and end tags could be inserted into the document itself, or be handed to XMLstarlet like so:

{ echo '<root>'; cat file.xml; echo '</root>'; } |
xmlstarlet sel -t -m '//ID[. = 2]' \
    -c . -c './following-sibling::*[position()<5]' -nl

edited Nov 23 at 23:29

answered Nov 23 at 23:22

Kusalananda

118k16222360


Grep

grep -oE '<data>[^<]*</data>' yourxmlfile

Bash

tag='data'
tL="<$tag>" tR="</$tag>"
xml=$(< yourxmlfile)
while case $xml in *"$tL"* ) :;; * ) break;; esac; do
  t1=${xml#*"$tL"} t2=${t1%%"$tR"*} xml=${t1#*"$tR"}
  echo "${tL}${t2}${tR}"
done
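In that loop, ${xml#*"$tL"} drops everything up to and including the first <data>, ${t1%%"$tR"*} keeps what precedes the first </data>, and ${t1#*"$tR"} keeps what follows it, so each pass peels one element off the front. The same steps written with Python's str.partition, which splits at the first occurrence of a separator; a sketch assuming every <tag> has a matching </tag> (the Python names are illustrative, not from the answer):

```python
# Sample data from the question: four records on a single line.
RECORD = ("<ID>{}</ID><data>asdf</data><data2>asdf</data2>"
          "<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>")
SAMPLE = "".join(RECORD.format(i) for i in range(1, 5))


def extract_all(xml, tag):
    """Mimic the Bash loop: repeatedly cut <tag>...</tag> off the front."""
    t_left, t_right = f"<{tag}>", f"</{tag}>"
    found = []
    while t_left in xml:                     # case $xml in *"$tL"*
        t1 = xml.partition(t_left)[2]        # t1=${xml#*"$tL"}
        t2, _, xml = t1.partition(t_right)   # t2=${t1%%"$tR"*}; xml=${t1#*"$tR"}
        found.append(t_left + t2 + t_right)  # echo "${tL}${t2}${tR}"
    return found


for line in extract_all(SAMPLE, "data"):
    print(line)  # <data>asdf</data>, four times
```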

Perl

perl -lne "print for/<$tag>.*?<\/$tag>/g" yourxmlfile
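The ? in .*? makes the match lazy, so each match ends at the first closing tag instead of the last one on the line; that matters here because the whole file is a single line. (The grep version gets the same effect with [^<]*.) A Python sketch of the difference using the re module, whose lazy quantifiers behave like Perl's for this pattern:

```python
import re

# Sample data from the question: four records on a single line.
RECORD = ("<ID>{}</ID><data>asdf</data><data2>asdf</data2>"
          "<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>")
SAMPLE = "".join(RECORD.format(i) for i in range(1, 5))

tag = re.escape("data")
lazy = re.findall(rf"<{tag}>.*?</{tag}>", SAMPLE)   # stops at the FIRST </data>
greedy = re.findall(rf"<{tag}>.*</{tag}>", SAMPLE)  # runs on to the LAST </data>
print(len(lazy), len(greedy))  # 4 1
print(lazy[0])                 # <data>asdf</data>

# The same lazy pattern applied to the question's actual range.
m = re.search(r"<ID>2</ID>.*?</dateAccessed>", SAMPLE)
print(m.group(0) == RECORD.format(2))  # True
```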

Sed

sed -e "
  s|<$tag>|\n&|
  s/.*\n//
  s|</$tag>|&\n|
  /\n/P;D
" yourxmlfile

Output

<data>asdf</data>
<data>asdf</data>
<data>asdf</data>
<data>asdf</data>

edited Feb 27 '17 at 4:33

answered Feb 27 '17 at 3:48

Rakesh Sharma

62013


If you want to extract the ID value, and assuming ID always comes as the first tag, then you can use this:

awk -F"[<>]" '{print $3}' input.txt

If you want to search for a specific tag, then try this awk command. You need to change the value of input=ID:

awk -F"[<>]" '{for(i=1;i<=NF;i++)if($i~input){print $(i+1);next}}' input=ID input.txt
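Splitting on < and > makes every tag name and every value its own field, and since the whole file is one line, awk sees a single record. Both commands therefore print only 1, the first ID, and cannot reach <ID>2</ID> without further changes. A Python sketch of the same field split (re.split stands in for awk's -F; awk_like is a hypothetical helper):

```python
import re

# Sample data from the question: four records on a single line.
RECORD = ("<ID>{}</ID><data>asdf</data><data2>asdf</data2>"
          "<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>")
SAMPLE = "".join(RECORD.format(i) for i in range(1, 5))

fields = re.split(r"[<>]", SAMPLE)  # like awk -F"[<>]"; awk's $1 is fields[0]
print(fields[:4])  # ['', 'ID', '1', '/ID']
print(fields[2])   # awk's $3: 1


def awk_like(line, pattern):
    """Mimic: for(i=1;i<=NF;i++) if($i~pattern){print $(i+1); next}"""
    parts = re.split(r"[<>]", line)
    for i, field in enumerate(parts):
        if re.search(pattern, field):  # awk's ~ is an unanchored regex match
            return parts[i + 1] if i + 1 < len(parts) else ""
    return None


print(awk_like(SAMPLE, "ID"))  # 1: the first match on the single line
```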

answered Feb 27 '17 at 3:32

Kamaraj

2,9161513


Provided the XML has no line breaks, why don't you try inserting \n between >< ? That puts the XML into a standard one-element-per-line layout.

Example:
I have created a file called stack with the given XML.

Below is the sed operation to introduce line breaks:

cat stack | sed -e 's/></>\n</g'

Output (excerpt, the ID 2 record):

<ID>2</ID>
<data>asdf</data>
<data2>asdf</data2>
<dataX>asdf</dataX>
<dateAccessed>somedate</dateAccessed>

Now you can access the tags you want.
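Once each element is on its own line, the record can be picked out line by line: start at the <ID>2</ID> line and stop after the next dateAccessed line. A Python sketch of both steps, with str.replace standing in for the sed substitution (record_lines is a hypothetical helper):

```python
# Sample data from the question: four records on a single line.
RECORD = ("<ID>{}</ID><data>asdf</data><data2>asdf</data2>"
          "<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>")
SAMPLE = "".join(RECORD.format(i) for i in range(1, 5))

# Same effect as sed 's/></>\n</g': one element per line.
lines = SAMPLE.replace("><", ">\n<").splitlines()


def record_lines(lines, id_value):
    """Lines from <ID>id_value</ID> through the next <dateAccessed> line."""
    out = []
    for line in lines:
        if out:
            out.append(line)
            if line.startswith("<dateAccessed>"):
                break
        elif line == f"<ID>{id_value}</ID>":
            out.append(line)
    return out


print("\n".join(record_lines(lines, 2)))
```

In the shell, the second step could be a sed range address on the split output, for example sed -n '/<ID>2<\/ID>/,/<\/dateAccessed>/p'.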

answered Oct 18 '17 at 7:08

user256118
