How to extract duplicate numbers from a log file? [closed]
I have some log files on one of my servers containing log entries like the one below.
FTM.FC103.20181228034503.20181228035250:2018-12-28 08:19:59.893 FAIL DROP: Too many resend tries failed Failed for request id: 8397796 Cause: unknown Info: Code: ,USSD RequestId=8397796 OriginalId=8397545 EventCorrelationId="03a4264124" CreationTime="20181228081949" ResendCount=1 Timestamp=1545968994377 (Fri Dec 28 08:19:54 AFT 2018) State=STATE_SENT SubscriberNumber=96700606310 UssdText=Last event was charged 3.00 RYL, Duration 0:00:52, Remaining balance 35.29 AFN and will expire 25.12.2020.1500 RYL = 32GB valid 30 Days, Dial *811*32*1#. NumberingPlan=1 Nadi=4 UssdFormat=2
I want to extract the following information from these logs:
1. Extract all SubscriberNumber values from the log files.
2. Find the SubscriberNumbers that occur more than once in the logs.
linux shell
closed as too broad by G-Man, αғsнιη, Thomas, Archemar, peterh Jan 1 at 4:31
Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
edited Dec 30 '18 at 7:54
Rui F Ribeiro
asked Dec 30 '18 at 5:41
Jack Anderson
1 Answer
You could use:
grep -oP 'SubscriberNumber=\K\d+' logfile | sort -n | uniq -cd
grep -oP 'SubscriberNumber=\K\d+' logfile isolates all individual SubscriberNumbers from your logfile;
sort -n sorts them numerically, and
uniq -cd prints only the duplicate numbers, i.e. those with multiple occurrences, each preceded by its count.
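A minimal, self-contained demonstration of the pipeline, assuming GNU grep (for `-P`); the file path and subscriber numbers below are made up for illustration:

```shell
# Build a hypothetical sample log with one duplicated SubscriberNumber
cat > /tmp/sample.log <<'EOF'
2018-12-28 08:19:59.893 FAIL DROP State=STATE_SENT SubscriberNumber=96700606310 UssdText=a
2018-12-28 08:20:03.101 FAIL DROP State=STATE_SENT SubscriberNumber=96700123456 UssdText=b
2018-12-28 08:21:11.442 FAIL DROP State=STATE_SENT SubscriberNumber=96700606310 UssdText=c
EOF

# -o prints only the matched part; -P enables Perl-compatible regexes;
# \K drops the "SubscriberNumber=" prefix from the reported match
grep -oP 'SubscriberNumber=\K\d+' /tmp/sample.log | sort -n | uniq -cd
```

Only 96700606310 is printed here, with a count of 2, because it is the only number that appears more than once.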
Many thanks, dear Ozzy. Can you please also help in counting the duplicate occurrences of each number? For example: 11 ---- 967090099887
– Jack Anderson
Dec 30 '18 at 8:10
Thanks again, dear Ozzy. I have got the duplicate occurrences. Now, is it possible to get the list of those numbers with the most duplicate occurrences, for example between 20 and 100, or say more than 30?
grep -oP 'SubscriberNumber=\K\d+' 30.unknown.txt | sort -n | uniq -cd
3 96700000165
2 96700000584
23 96700001632
6 96700001744
4 96700001876
2 96700002632
2 96700003071
2 96700004656
3 96700004948
10 96700006053
2 96700007154
2 96700007248
– Jack Anderson
Dec 30 '18 at 9:47
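The count-range filter asked about above can be sketched by appending an awk condition on the count field that uniq -cd produces. The thresholds and the demo file below are illustrative, not from the original logs:

```shell
# Build a tiny demo log: 96700001632 appears 3 times, 96700000584 twice
{
  for i in 1 2 3; do echo "x SubscriberNumber=96700001632 y"; done
  echo "x SubscriberNumber=96700000584 y"
  echo "x SubscriberNumber=96700000584 y"
} > /tmp/demo.log

# uniq -cd puts the count in awk's field $1, so we can filter on it.
# Keep only numbers occurring more than 2 times; on real data you would
# use e.g. '$1 > 30', or a range like '$1 >= 20 && $1 <= 100'
grep -oP 'SubscriberNumber=\K\d+' /tmp/demo.log | sort -n | uniq -cd | awk '$1 > 2'
```

Only the number with count 3 survives the filter; the one with count 2 is dropped.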
@JackAnderson It would be easier to help you if you'd first formulate a complete question. Alternatively, you can mark the original question as answered and open a new one. Answering an iteratively changing question may require repeated revisions of the answer, and even backtracking on a chosen approach.
– ozzy
Dec 30 '18 at 10:09
Dear Ozzy, can you please also explain the purpose of the following parts of the pattern? \K(\d+)
– Jack Anderson
Dec 31 '18 at 7:05
@JackAnderson The \d matches any digit; (\d+) is a group of one or more consecutive digits. The \K is explained here: stackoverflow.com/questions/33573920/….
– ozzy
Dec 31 '18 at 8:10
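A quick way to see what \K does is to run the same pattern with and without it (the sample number is arbitrary):

```shell
# With \K: "SubscriberNumber=" must match, but is excluded from the output
echo 'SubscriberNumber=96700606310' | grep -oP 'SubscriberNumber=\K\d+'

# Without \K: grep -o prints the whole match, prefix included
echo 'SubscriberNumber=96700606310' | grep -oP 'SubscriberNumber=\d+'
```

The first command prints just the digits; the second prints the entire matched string, prefix and all.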
edited Dec 30 '18 at 8:27
answered Dec 30 '18 at 7:20
ozzy