How to extract between 2 strings when file contains multiple symbols











      I've been trying to extract form data from a huge file. I need a very specific pattern, which so far eludes me.

      I have this consistent part of the log:



      Machine info and user info blah blah blah [senderID=60, 
      ipaddress=/10.1.1.11:8443, serviceIdinList=[13], serviceBitbox=11111,
      servicesList= | BeatController | BeatMaker | WaveShow, client=apache,


      All lines look like this.

      I need to turn each such line into this:



      senderID=60, ipaddress=/10.1.1.11:8443, serviceIdinList=[13], 
      serviceBitbox=11111, servicesList= | BeatController | BeatMaker | WaveShow,


      Note: everything after "WaveShow," is irrelevant, as is everything before "senderID".



      I've tried this command from a post here,



      sed -n '/servicesList=/{s/.*servicesList=//;s/\S*=.*//;p}'



      but it only prints out



      servicesList= | BeatController | BeatMaker | WaveShow



      I have tried several variations of the regex and experimented with grep and sed, but made no progress.

      Please assist :)







      linux text-processing sed grep






      edited Nov 14 at 10:46









      ctrl-alt-delor

      asked Nov 14 at 10:14









      dtuaev25

          2 Answers
                  accepted






                  If you want to output everything from senderID= through WaveShow, (both included), you can use this sed command:



                  sed -n 's/.*\(senderID=.*WaveShow,\).*/\1/p'


                  This captures everything between those two strings using the \( and \) brackets and outputs it using \1 (and \2 etc. if you have more captures).
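
As a quick check, the sed command can be run against the sample record from the question. This is only a sketch: it assumes the record sits on a single line (the question shows it wrapped), and the variable names `line` and `result` are made up for the illustration.

```shell
# Sample record from the question, assumed to be one physical line.
line='Machine info and user info blah blah blah [senderID=60, ipaddress=/10.1.1.11:8443, serviceIdinList=[13], serviceBitbox=11111, servicesList= | BeatController | BeatMaker | WaveShow, client=apache,'

# \( ... \) captures the wanted span; \1 replaces the whole line with it.
# -n together with the trailing p prints only lines where the substitution succeeded.
result=$(printf '%s\n' "$line" | sed -n 's/.*\(senderID=.*WaveShow,\).*/\1/p')

printf '%s\n' "$result"
```

On that input the command should print the span from senderID=60 up to and including the comma after WaveShow.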



                  Note that the leading .* is "greedy": if the string senderID= appears twice on a line, the output starts at the last occurrence and the earlier one is discarded. If this is not what you want, then sed is not the correct tool, because its regular expressions have no non-greedy quantifier; perl can handle this. The command then becomes:



                  perl -ne 'print if s/.*?(senderID=.*WaveShow,).*/$1/'


                  -n means "execute a loop for each line of input, and don't print the line at the end of the loop". -e specifies the expression to execute inside the loop.



                  The ? after the .* changes the * to match as little as possible (i.e. match non-greedily). The brackets cause perl to group that part and to capture it, which then can be used as $1 for the first capture, $2 for the second, etc.



                  However, that is not the optimal way of doing it in perl. The following is a lot better: it does not modify the string needlessly, but captures the text and prints just that:



                  perl -ne 'print "$1\n" if /(senderID=.*WaveShow,)/'


                  There are probably many more ways of doing this in perl, perhaps even more efficiently.







                  answered 2 days ago









                  wurtel

                          Is the trailing comma required?



                          If not, this should work:



                          grep senderID filename | cut -d '[' -f 2- | cut -d ',' -f -5



                          Output:



                          senderID=60, ipaddress=/10.1.1.11:8443, serviceIdinList=[13], serviceBitbox=11111, servicesList= | BeatController | BeatMaker | WaveShow
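
The pipeline can be tried on a throwaway file, as sketched below. It relies on two assumptions about the log that the question does not confirm: the first '[' on each line is the one right before senderID, and the wanted part is always exactly five comma-separated fields. The last step, which puts the trailing comma back with sed, is one possible addition and not part of the answer's command; `logfile`, `without_comma` and `with_comma` are hypothetical names.

```shell
# Throwaway file holding the sample record, assumed to be a single line.
logfile=$(mktemp)
printf '%s\n' 'Machine info and user info blah blah blah [senderID=60, ipaddress=/10.1.1.11:8443, serviceIdinList=[13], serviceBitbox=11111, servicesList= | BeatController | BeatMaker | WaveShow, client=apache,' > "$logfile"

# cut -d '[' -f 2-  keeps everything after the first '['.
# cut -d ',' -f -5  keeps the first five comma-separated fields.
without_comma=$(grep senderID "$logfile" | cut -d '[' -f 2- | cut -d ',' -f -5)

# If the trailing comma is needed, it can be appended to each line afterwards.
with_comma=$(printf '%s\n' "$without_comma" | sed 's/$/,/')
rm -f "$logfile"

printf '%s\n' "$without_comma" "$with_comma"
```

If the number of fields before servicesList varies between lines, the fixed field count of 5 would cut in the wrong place, so the regex-based approaches are likely the safer choice for irregular logs.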







                          answered 2 days ago









                          Panki
