From gpx to csv file
<wpt lat="1.345529841" lon="103.7577152"><time>2010-01-01T00:00:00Z</time></wpt>
<wpt lat="1.345529841" lon="103.7577152"><time>2010-01-01T00:00:00Z</time></wpt>
<wpt lat="1.3982529841" lon="103.90877152"><time>2010-01-01T00:00:00Z</time></wpt>
I have a file containing lines like the ones above, which need to be converted into:
1.345529841,103.7577152,2010-01-01 00:00:00
1.345529841,103.7577152,2010-01-01 00:00:00
1.3982529841,103.90877152,2010-01-01 00:00:00
linux awk sed xml
edited Nov 24 at 19:41
Rui F Ribeiro
asked Feb 9 '17 at 6:00
RKR
5 Answers
Answer (score 1, accepted)
You could use sed to strip out the characters you don't want:
sed -E 's/[^0-9.T:-]+/,/g;s/T/ /;s/^,|,$//g' file
s/[^0-9.T:-]+/,/g replaces each run of unwanted characters with a comma
s/T/ / replaces the character T with a space
s/^,|,$//g removes the leading and trailing commas
The -E flag enables extended regular expressions, which the + and | operators require.
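To see what each substitution contributes, the command can be run one stage at a time. This is only an illustrative sketch, not part of the answer itself: it assumes a sed that accepts -E (GNU and BSD sed both do), and the function name gpx_line_to_csv and the sample variable are made up for the example.

```shell
#!/bin/sh
# Hypothetical wrapper around the three substitutions.
gpx_line_to_csv() {
    sed -E 's/[^0-9.T:-]+/,/g; s/T/ /; s/^,|,$//g'
}

sample='<wpt lat="1.345529841" lon="103.7577152"><time>2010-01-01T00:00:00Z</time></wpt>'

# Stage 1 only: every run of unwanted characters collapses to one comma,
# which leaves a leading and a trailing comma behind.
printf '%s\n' "$sample" | sed -E 's/[^0-9.T:-]+/,/g'

# All three stages: T becomes a space and the outer commas are dropped.
printf '%s\n' "$sample" | gpx_line_to_csv
```

The first command should print the fields wrapped in commas, with the T still in the timestamp; the second should print the CSV record the question asks for. Note that the character class keeps every T, so this only works while no other uppercase T appears on the line.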
edited Feb 10 '17 at 6:56
answered Feb 9 '17 at 8:43
oliv
Yours is OK, but the commas do not appear. We have to add them ourselves.
– RKR
Feb 9 '17 at 10:29
Didn't notice the comma; answer updated.
– oliv
Feb 9 '17 at 10:46
oliv, everything is fine, but instead of 103.75... in the second column I am getting 103,75 (for everything in column 2). Is anything wrong in the code?
– RKR
Feb 10 '17 at 1:57
Look at the updated answer; it should solve the comma/dot problem.
– oliv
Feb 10 '17 at 6:57
Sorry, I know this is technically correct, but... sed is a really awful tool for XML parsing. XML is contextual, and regular expressions are not.
– Sobrique
Feb 10 '17 at 8:51
Answer (score 3)
GPX is an XML format, so you can't use awk or sed to parse it reliably.
Instead, use something like XMLStarlet:
$ xml sel -t -m '//wpt' \
    -v '@lat' -o ',' -v '@lon' -o ',' \
    -v 'time' -nl data.gpx
1.345529841,103.7577152,2010-01-01T00:00:00Z
1.345529841,103.7577152,2010-01-01T00:00:00Z
1.3982529841,103.90877152,2010-01-01T00:00:00Z
Alternatively:
$ xml sel -t -m '//wpt' -v 'concat(@lat, ",", @lon, ",", time)' -nl data.gpx
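The output above still carries the T and Z of the ISO timestamp, whereas the question asks for a space-separated date and time. One way to close that gap inside the XPath expression is the XPath 1.0 translate() function. This is a sketch under stated assumptions: the binary is installed as xmlstarlet (it may be xml on some systems), the helper name gpx_to_csv is made up, and the file has no default XML namespace. The last point holds for the sample in the question, but real GPX files usually declare one, in which case //wpt would match nothing without extra namespace handling.

```shell
#!/bin/sh
# Hypothetical helper: print "lat,lon,YYYY-MM-DD hh:mm:ss" for each <wpt>.
# translate(time, "TZ", " ") maps T to a space and, because the third
# argument has no counterpart for Z, deletes the Z.
gpx_to_csv() {
    xmlstarlet sel -t -m '//wpt' \
        -v 'concat(@lat, ",", @lon, ",", translate(time, "TZ", " "))' \
        -n "$1"
}

# Usage (file name is a placeholder):
#   gpx_to_csv data.gpx > data.csv
```

Doing the cleanup in XPath keeps the whole conversion inside the XML parser, so no regular expression ever touches the markup.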
edited Feb 9 '17 at 9:49
answered Feb 9 '17 at 9:38
Kusalananda
No command xml found. Should I download anything?
– RKR
Feb 9 '17 at 10:31
@RKR I'm using XMLStarlet. It's likely available as a package for your Linux distribution. The command may sometimes be xmlstarlet rather than just xml.
– Kusalananda
Feb 9 '17 at 10:32
Answer (score 1)
This answer is based on the input given:
awk -F'[<>"]' '{print $3,$5,$9}' OFS=, input.txt | sed 's/T/ /; s/Z$//'
1.345529841,103.7577152,2010-01-01 00:00:00
1.345529841,103.7577152,2010-01-01 00:00:00
1.3982529841,103.90877152,2010-01-01 00:00:00
Or, doing the timestamp cleanup inside awk:
awk -F'[<>"]' '{sub(/T/," ",$9); sub(/Z$/,"",$9); print $3,$5,$9}' OFS=, input.txt
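The field numbers 3, 5 and 9 follow from splitting each line on the characters <, > and ". A small sketch that prints every non-empty field with its index makes the numbering visible. The helper name show_fields is hypothetical, and the numbering only holds while each waypoint sits on a single line in exactly this layout.

```shell
#!/bin/sh
# Print each non-empty field with its index, using the same separator set
# as the awk command above.
show_fields() {
    awk -F'[<>"]' '{
        for (i = 1; i <= NF; i++)
            if ($i != "")
                printf "$%d = %s\n", i, $i
    }'
}

printf '%s\n' '<wpt lat="1.345529841" lon="103.7577152"><time>2010-01-01T00:00:00Z</time></wpt>' | show_fields
```

In the listing, the latitude, longitude and timestamp should appear as fields 3, 5 and 9; the remaining non-empty fields hold tag and attribute names. Single-quoting the separator also avoids an unterminated double quote, which leaves the shell waiting for more input.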
edited Feb 9 '17 at 6:58
answered Feb 9 '17 at 6:51
Kamaraj
It runs for quite a long time and does not seem to finish, even for a small file.
– RKR
Feb 9 '17 at 10:30
What is your OS? Please provide the exact command you typed in your terminal.
– Kamaraj
Feb 9 '17 at 23:27
It seems you missed some double quotes in the command, so the shell expects a closing quote and keeps waiting...
– Kamaraj
Feb 9 '17 at 23:47
Answer (score 1)
Please, please - don't use a regular expression based solution, like awk or sed.
XML is contextual, where regular expressions are not - so they can NEVER work properly, they're only at best a bit of a hack.
But XML does have a solution to this problem: it's called XPath, and it lets you 'search' in a contextual way.
So to take your example:
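#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;

# Parse the whole file, then visit every <wpt> element via XPath.
my $xml = XML::Twig->new->parsefile('your_file.xml');
foreach my $wpt ( $xml->get_xpath('//wpt') ) {
    # Join only the three fields, so no comma is emitted before the newline.
    print join( ",",
        $wpt->att('lat'),
        $wpt->att('lon'),
        $wpt->first_child_text('time') ), "\n";
}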
Which gives the desired result, but it will also handle a variety of otherwise perfectly valid and semantically identical forms of your XML.
Like indented:
<xml>
  <wpt lat="1.345529841" lon="103.7577152">
    <time>2010-01-01T00:00:00Z</time>
  </wpt>
  <wpt lat="1.345529841" lon="103.7577152">
    <time>2010-01-01T00:00:00Z</time>
  </wpt>
  <wpt lat="1.3982529841" lon="103.90877152">
    <time>2010-01-01T00:00:00Z</time>
  </wpt>
</xml>
All on a single line:
<xml><wpt lat="1.345529841" lon="103.7577152"><time>2010-01-01T00:00:00Z</time></wpt><wpt lat="1.345529841" lon="103.7577152"><time>2010-01-01T00:00:00Z</time></wpt><wpt lat="1.3982529841" lon="103.90877152"><time>2010-01-01T00:00:00Z</time></wpt></xml>
Another style of indenting:
<xml>
  <wpt
      lat="1.345529841"
      lon="103.7577152">
    <time>2010-01-01T00:00:00Z</time>
  </wpt>
  <wpt
      lat="1.345529841"
      lon="103.7577152">
    <time>2010-01-01T00:00:00Z</time>
  </wpt>
  <wpt
      lat="1.3982529841"
      lon="103.90877152">
    <time>2010-01-01T00:00:00Z</time>
  </wpt>
</xml>
Or even:
<xml
><wpt
lat="1.345529841"
lon="103.7577152"
><time
>2010-01-01T00:00:00Z</time></wpt><wpt
lat="1.345529841"
lon="103.7577152"
><time
>2010-01-01T00:00:00Z</time></wpt><wpt
lat="1.3982529841"
lon="103.90877152"
><time
>2010-01-01T00:00:00Z</time></wpt></xml>
These are all semantically identical, and should be parsed the same way. Hopefully it's fairly clear that a regular expression to do this is a LOT more complicated than just using an XML parser.
For the sake of being concise though:
perl -MXML::Twig -0777 -e 'XML::Twig->new(twig_handlers=>{wpt=>sub{print join(",", $_->att("lat"), $_->att("lon"), $_->first_child_text("time")), "\n"}})->parse(<>)'
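The fragility of a line-oriented approach is easy to reproduce. The sketch below feeds the same waypoint in two layouts to a sed command of the kind suggested elsewhere on this page. The function name line_based_csv and the variable names are made up, and -E is assumed so that + and | act as regex operators.

```shell
#!/bin/sh
# Line-oriented conversion: processes one physical line at a time.
line_based_csv() {
    sed -E 's/[^0-9.T:-]+/,/g; s/T/ /; s/^,|,$//g'
}

flat='<wpt lat="1.345529841" lon="103.7577152"><time>2010-01-01T00:00:00Z</time></wpt>'

indented='<xml>
  <wpt lat="1.345529841" lon="103.7577152">
    <time>2010-01-01T00:00:00Z</time>
  </wpt>
</xml>'

# One waypoint per line: a single CSV record comes out.
printf '%s\n' "$flat" | line_based_csv

# Same data, indented: the record is split across several lines and the
# tag-only lines turn into empty lines.
printf '%s\n' "$indented" | line_based_csv
```

The XML parser based solutions treat both inputs identically, which is the point being made above.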
edited May 23 '17 at 12:40
Community♦
answered Feb 10 '17 at 9:08
Sobrique
Answer (score 0)
Assuming f.xml is our input (a valid xml):
$ perl -MXML::DT -E 'dt("f.xml",
time=>sub{$a=father;
$c =~ s/[TZ]/ /g;
say "$a->{lat},$a->{lon},$c"}
)'
-MXML::DT: load the XML::DT module (XML down translator)
dt(file, time => sub {...}): parse the file and, every time a time element is seen, execute the corresponding sub
$a = father: get the attributes from the parent element
$c: the content of the current element
Warning: I am one of the authors of XML::DT (install with cpan XML::DT)
edited Feb 10 '17 at 11:52
answered Feb 10 '17 at 11:45
JJoao