detecting unique lines from log file
I have a large log file and would like to detect the recurring patterns in it, rather than work with every specific line.

For example:
/path/messages-20181116:11/15/2018 14:23:05.159|worker001|clusterm|I|userx deleted job 5018
/path/messages-20181116:11/15/2018 14:41:25.662|worker001|clusterm|I|userx deleted job 4895
/path/messages-20181116:11/15/2018 14:41:25.673|worker000|clusterm|I|userx deleted job 4890
/path/messages-20181116:11/15/2018 14:41:25.681|worker000|clusterm|I|userx deleted job 4889
11/09/2018 06:18:55.115|scheduler000|clusterm|P|PROF: job profiling(low job) of 9473507.1
11/09/2018 06:18:55.118|scheduler000|clusterm|P|PROF: job profiling(low job) of 9473507.1
11/09/2018 06:18:55.120|scheduler000|clusterm|P|PROF: job profiling(low job) of 9473507.1
11/09/2018 06:18:55.140|scheduler000|clusterm|P|PROF: job dispatching took 5.005 s (10 fast)
11/09/2018 06:18:55.143|scheduler000|clusterm|P|PROF: dispatched 1 job(s)
11/09/2018 06:18:55.143|scheduler000|clusterm|P|PROF: dispatched 5 job(s)
11/09/2018 06:18:55.143|scheduler000|clusterm|P|PROF: dispatched 3 job(s)
11/09/2018 06:18:55.145|scheduler000|clusterm|P|PROF: parallel matching 14 0438 107668
11/09/2018 06:18:55.148|scheduler000|clusterm|P|PROF: sequential matching 9 0261 8203
11/09/2018 06:18:55.561|scheduler000|clusterm|P|PROF(1776285440): job sorting :wc =0.006s
11/09/2018 06:18:55.564|scheduler000|clusterm|P|PROF(1776285440): job dispatching: wc=5.005
11/09/2018 06:18:55.561|scheduler000|clusterm|P|PROF(1776285440): job sorting : wc=0.006s
11/09/2018 06:18:55.564|scheduler000|clusterm|P|PROF(1776285440): job dispatching: wc =0.015


becomes something like this:



/path/messages-*NUMBER*:*DATE* *TIME*|worker001|clusterm|I|userx deleted job *NUMBER*
*DATE* *TIME*|scheduler*NUMBER*|clusterm|P|PROF: job profiling(low job) of *NUMBER*
*DATE* *TIME*|scheduler*NUMBER*|clusterm|P|PROF: job dispatching took *NUMBER* s (*NUMBER* fast)
*DATE* *TIME*|scheduler*NUMBER*|clusterm|P|PROF: dispatched *NUMBER* job(s)
*DATE* *TIME*|scheduler*NUMBER*|clusterm|P|PROF: parallel matching *NUMBER* *NUMBER* *NUMBER*
*DATE* *TIME*|scheduler*NUMBER*|clusterm|P|PROF: sequential matching *NUMBER* *NUMBER* *NUMBER*
*DATE* *TIME*|scheduler*NUMBER*|clusterm|P|PROF(*NUMBER*): job sorting :wc =*NUMBER*s
*DATE* *TIME*|scheduler*NUMBER*|clusterm|P|PROF(*NUMBER*): job dispatching: wc=*NUMBER*


This would greatly reduce the number of lines and make analyzing/reading the log by eye much easier.

Basically: detect the variable words on each line and replace them with a placeholder symbol.
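One possible sketch of this idea (an assumption on my part, not a tested answer): substitute the dates and times first, then any remaining numbers, using `sed -E` extended regexes, and let `sort | uniq -c` count how often each resulting pattern occurs. The file name `logfile` is a placeholder, and the regexes only cover the field formats shown in the sample above.

```shell
# Sketch only: normalize the variable fields, then count distinct patterns.
# "logfile" is a placeholder name; adjust the regexes to your log's format.
sed -E '
# dates like 11/09/2018
s#[0-9]{2}/[0-9]{2}/[0-9]{4}#*DATE*#g
# times like 06:18:55.115
s#[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+#*TIME*#g
# any remaining integer or decimal
s#[0-9]+(\.[0-9]+)?#*NUMBER*#g
' logfile | sort | uniq -c | sort -rn
```

The ordering of the substitutions matters: if the generic number substitution ran first, it would destroy the date and time fields before their more specific patterns could match. Note this collapses `worker001` to `worker*NUMBER*` as well, like the `scheduler*NUMBER*` lines in the example.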










  • What steps have you tried on your own, and what were the results? Please include this in your question.
    – Panki
    Nov 20 at 12:07

  • Have you looked at cut and uniq?
    – ctrl-alt-delor
    Nov 20 at 12:14

  • uniq only works on exactly matching lines; it won't collapse two otherwise-identical lines with different timestamps. With cut you would still have to read the whole log file, and you don't know the patterns in advance.
    – user772266
    Nov 20 at 16:25

  • I did try sort -u | uniq, but it still prints lines that differ only in the timestamp as separate entries.
    – user772266
    Nov 20 at 16:31

  • I don't understand the transformations you're expecting. Do you want the dates and times replaced by *DATE* *TIME*, or are those placeholders for real values of some sort? What makes a line non-unique?
    – Jeff Schaller
    Nov 20 at 21:04
command-line logs wildcards text






asked Nov 20 at 12:06 by user772266
