How to extract duplicate numbers from a log file? [closed]


























I have some log files on one of my servers that contain entries like the following.



FTM.FC103.20181228034503.20181228035250:2018-12-28 08:19:59.893 FAIL DROP: Too many resend tries failed Failed for request id: 8397796 Cause: unknown Info: Code: ,USSD RequestId=8397796 OriginalId=8397545 EventCorrelationId="03a4264124" CreationTime="20181228081949" ResendCount=1 Timestamp=1545968994377 (Fri Dec 28 08:19:54 AFT 2018) State=STATE_SENT SubscriberNumber=96700606310 UssdText=Last event was charged 3.00 RYL, Duration 0:00:52, Remaining balance 35.29 AFN and will expire 25.12.2020.1500 RYL = 32GB valid 30 Days, Dial *811*32*1#. NumberingPlan=1 Nadi=4 UssdFormat=2



I want to extract the following information from these logs:

1. Extract every SubscriberNumber value from the log files.

2. Find the SubscriberNumber values that occur more than once in the logs.

























closed as too broad by G-Man, αғsнιη, Thomas, Archemar, peterh Jan 1 at 4:31


Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.



























      linux shell














      asked Dec 30 '18 at 5:41 by Jack Anderson
      edited Dec 30 '18 at 7:54 by Rui F Ribeiro




























          1 Answer
          You could use:

          grep -oP 'SubscriberNumber=\K(\d+)' logfile | sort -n | uniq -cd

          • grep -oP 'SubscriberNumber=\K(\d+)' logfile isolates all individual SubscriberNumbers from your logfile;
          • sort -n sorts them numerically, and
          • uniq -cd prints any duplicate numbers, i.e. those with multiple occurrences, including a count.
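
          Some grep implementations do not support -P. In that case, a variant along the following lines may give the same result. This is only a sketch: the function name find_duplicates and the sample data are made up for illustration, and it assumes the value after SubscriberNumber= consists of digits only.

```shell
#!/bin/sh
# Sketch: same result as the pipeline above without relying on grep -P.
# Assumption: the value after "SubscriberNumber=" consists of digits only.
find_duplicates() {
    # -o prints only the matched text, -h suppresses "filename:" prefixes
    # when more than one file is given, -E enables extended regex syntax.
    grep -ohE 'SubscriberNumber=[0-9]+' "$@" |
        cut -d= -f2 |   # keep the part after the "="
        sort -n |       # bring equal numbers next to each other
        uniq -cd        # count, and print only the repeated numbers
}

# Hypothetical sample data, shortened from the log entry in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
State=STATE_SENT SubscriberNumber=96700606310 UssdText=first
State=STATE_SENT SubscriberNumber=96700000165 UssdText=second
State=STATE_SENT SubscriberNumber=96700606310 UssdText=third
EOF

duplicates=$(find_duplicates "$sample")
printf '%s\n' "$duplicates"
rm -f "$sample"
```

          With the sample data, only 96700606310 is reported, with a count of 2. Several log files can be passed at once, for example find_duplicates *.log.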















          answered Dec 30 '18 at 7:20 by ozzy
          edited Dec 30 '18 at 8:27












          • Many thanks, Ozzy. Could you please also help with counting the duplicate occurrences of each number? For example: 11 ---- 967090099887
            – Jack Anderson
            Dec 30 '18 at 8:10












          • Thanks again, Ozzy. I now have the duplicate occurrences. Is it possible to list only the numbers with the most occurrences, for example between 20 and 100, or, let's say, more than 30? This is what I get so far:
            grep -oP 'SubscriberNumber=\K(\d+)' 30.unknown.txt | sort -n | uniq -cd
            3 96700000165
            2 96700000584
            23 96700001632
            6 96700001744
            4 96700001876
            2 96700002632
            2 96700003071
            2 96700004656
            3 96700004948
            10 96700006053
            2 96700007154
            2 96700007248
            – Jack Anderson
            Dec 30 '18 at 9:47






          • @JackAnderson It would be easier to help you if you'd first formulate a complete question. Alternatively, you can mark the original question as answered and open a new one. Answering an iteratively changing question may require repeated revisions of the answer, and even backtracking on a chosen approach.
            – ozzy
            Dec 30 '18 at 10:09
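
          For the follow-up above about listing only numbers whose count falls in a range, one possible approach is to filter the uniq -c output with awk, since the count is the first column. The sketch below is not part of the original answer; the helper name filter_by_count and its MIN/MAX arguments are made up for illustration, and the input is a subset of the output posted in the comment.

```shell
#!/bin/sh
# filter_by_count MIN [MAX]: read "count number" lines (uniq -c output) on
# stdin and keep those with MIN <= count <= MAX. Hypothetical helper;
# when MAX is omitted there is no upper limit.
filter_by_count() {
    awk -v min="$1" -v max="${2:-0}" '$1 >= min && (max == 0 || $1 <= max)'
}

# A few lines taken from the output posted in the comment.
counts='3 96700000165
2 96700000584
23 96700001632
6 96700001744
10 96700006053'

at_least_6=$(printf '%s\n' "$counts" | filter_by_count 6)
between_20_and_100=$(printf '%s\n' "$counts" | filter_by_count 20 100)

printf '%s\n' "$between_20_and_100"
```

          In a real run the helper would be appended to the pipeline, for example ... | sort -n | uniq -cd | filter_by_count 31 for "more than 30". Piping the result through sort -rn would additionally put the most frequent numbers first.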










          • Ozzy, could you please also explain the purpose of the following part of the pattern? \K(\d+)
            – Jack Anderson
            Dec 31 '18 at 7:05










          • @JackAnderson The \d matches any digit; (\d+) is a group of one or more consecutive digits. The \K is explained here: stackoverflow.com/questions/33573920/….
            – ozzy
            Dec 31 '18 at 8:10
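
          To see the effect of \K in practice, the sketch below runs one line through the pattern with and without it. The sample line is shortened from the question and the variable names are made up; it assumes a grep with -P support (for example GNU grep built with PCRE).

```shell
#!/bin/sh
# Shortened, hypothetical sample line based on the log entry in the question.
line='State=STATE_SENT SubscriberNumber=96700606310 UssdText=Last event'

# Without \K, -o prints the whole match, including the key.
with_key=$(printf '%s\n' "$line" | grep -oP 'SubscriberNumber=\d+')

# \K drops everything matched so far from the reported match,
# so only the digits matched by \d+ are printed.
digits_only=$(printf '%s\n' "$line" | grep -oP 'SubscriberNumber=\K\d+')

printf '%s\n%s\n' "$with_key" "$digits_only"
```

          The first line printed is SubscriberNumber=96700606310 and the second is 96700606310. The parentheses around \d+ in the original command form a capture group, which grep -o does not need here, so the sketch leaves them out.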

















