How to extract between 2 strings when file contains multiple symbols
I've been trying to extract data from a huge file. I need a very specific pattern, and so far I haven't managed to match it.
I have this consistent part of the log:
Machine info and user info blah blah blah [senderID=60,
ipaddress=/10.1.1.11:8443, serviceIdinList=[13], serviceBitbox=11111,
servicesList= | BeatController | BeatMaker | WaveShow, client=apache,
All lines look like this.
From each line I need to produce this:
senderID=60, ipaddress=/10.1.1.11:8443, serviceIdinList=[13],
serviceBitbox=11111, servicesList= | BeatController | BeatMaker | WaveShow,
Note: everything after "WaveShow," is irrelevant, as is everything before "senderID".
I tried this command from another post here:
sed -n '/servicesList=/{s/.*servicesList=//;s/\S*=.*//;p}'
but it only prints out
servicesList= | BeatController | BeatMaker | WaveShow
I have tried several variations of the regex and experimented with grep and sed, but made no progress.
Please assist :)
linux text-processing sed grep
edited Nov 14 at 10:46
ctrl-alt-delor
asked Nov 14 at 10:14
dtuaev25
2 Answers
accepted
If what you are trying to do is output everything between and including "senderID=" and "WaveShow,", then you need this sed command:
sed -n 's/.*\(senderID=.*WaveShow,\).*/\1/p'
This captures everything between those two strings using the \( and \) brackets and outputs it using \1 (and \2 etc. if you have more captures).
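As a quick check, here is a minimal sketch that runs that sed command on the sample record from the question. It assumes the whole record sits on one physical line (the question shows it wrapped), and the variable names `line` and `sed_out` are only illustrative.

```shell
# Sample record from the question, joined onto a single line (an assumption).
line='Machine info and user info blah blah blah [senderID=60, ipaddress=/10.1.1.11:8443, serviceIdinList=[13], serviceBitbox=11111, servicesList= | BeatController | BeatMaker | WaveShow, client=apache,'

# \( ... \) captures the wanted part and \1 writes it back.
# -n together with the p flag prints only lines where the substitution succeeded.
sed_out=$(printf '%s\n' "$line" | sed -n 's/.*\(senderID=.*WaveShow,\).*/\1/p')

printf '%s\n' "$sed_out"
```

Lines that contain no "senderID=" or no "WaveShow," produce no output at all, because of the -n option.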
Note that the leading ".*" is "greedy", meaning that if the string "senderID=" occurs twice in an input line, the first occurrence is discarded along with everything else before the second one. If this is not what you want, then sed is not the correct tool; perl can handle this. The command then becomes:
perl -ne 'print if s/.*?(senderID=.*WaveShow,).*/$1/'
-n means "execute a loop for each line of input, and don't print the line at the end of the loop". -e specifies the expression to execute inside the loop.
The "?" after the ".*" changes the "*" to match as little as possible (i.e. match non-greedily). The brackets cause perl to group that part and capture it; the capture can then be used as $1 for the first capture, $2 for the second, etc.
However, that is not the optimal way of doing it in perl. This is a lot better, as it does not change strings needlessly; it captures the text and prints just that:
perl -ne 'print "$1\n" if /(senderID=.*WaveShow,)/'
There are probably many more ways of doing this in perl, perhaps even more efficient ones.
answered 2 days ago
wurtel
Is the trailing comma required?
If not, this should work:
grep senderID filename | cut -d '[' -f 2- | cut -d ',' -f -5
Output:
senderID=60, ipaddress=/10.1.1.11:8443, serviceIdinList=[13], serviceBitbox=11111, servicesList= | BeatController | BeatMaker | WaveShow
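If the trailing comma is required after all, one possible way to get it is to append it with sed at the end of the pipeline. The sketch below assumes the record is on a single line, that the first "[" on the line is the one before "senderID", and that none of the five wanted fields contains a comma of its own (a list such as [13,14] would shift the field count). The names `line`, `cut_out` and `with_comma` are illustrative.

```shell
# Sample record from the question, joined onto a single line (an assumption).
line='Machine info and user info blah blah blah [senderID=60, ipaddress=/10.1.1.11:8443, serviceIdinList=[13], serviceBitbox=11111, servicesList= | BeatController | BeatMaker | WaveShow, client=apache,'

# First cut: drop everything up to and including the first "[".
# Second cut: keep the first five comma-separated fields.
cut_out=$(printf '%s\n' "$line" | grep senderID | cut -d '[' -f 2- | cut -d ',' -f -5)

# Append the trailing comma that the second cut removed.
with_comma=$(printf '%s\n' "$cut_out" | sed 's/$/,/')

printf '%s\n' "$with_comma"
```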
answered 2 days ago
Panki