How to find duplicate fields in over 100 files
I have about 120 files, each with more than 1000 lines.
Each line has its own key in it. The columns are | separated.
Here is an example line; the key column (always column 11) is: 201075ITE854075_RECardProtectionlogi.msg
Error: null, Data: |862799|00318070L|EMA|EMAIL|null|20100705|2010-07-05 14:59:39.0|null|AUTO_20100705|201075ITE854075_RECardProtectionlogi.msg|Country not known
Is there a way to find all lines that have a matching key/column 11 value (the whole line won't match)?
Can I do this on the command line or in a script?
I will be using Cygwin.
I have no idea how to even start, so even if you only feel willing to give me suitable commands to look up, that would be gratefully received.
Each line has its own key, so there are potentially as many keys as there are lines.
I just want the script to run on an entire directory and report duplicate keys among all the files, without any other user input.
What defines a key is being in column 11.
shell-script files
I'm confused by your question. Are you simply looking for the literal string 201075ITE854075_RECardProtectionlogi.msg in all files or will the search key be dynamic in some way? If you are only looking for that one string you can probably just use grep -r '201075ITE854075_RECardProtectionlogi.msg' /directory/with/files
– Jesse_b
Nov 21 at 16:18
@Jesse_b I believe this was easier to understand until it was edited; the key was in bold in the middle of the row. I have added extra detail.
– WendyG
Nov 21 at 16:36
Knowing the key is not the issue. I don't understand whether the key is dynamic or not.
– Jesse_b
Nov 21 at 16:56
Do you want the user of your script to enter a key to search for? Or do you want to extract all of the keys and find if any are duplicated (title)? Are the keys the 4th pipe-delimited field in all the files?
– Jeff Schaller
Nov 21 at 17:14
edited Nov 21 at 17:47
asked Nov 21 at 16:00
WendyG
2 Answers
Assuming that by "key" you mean "column", you could use something like this:
cut -f 11 -d "|" $(find . -type f -iname "*.txt") | sort | uniq -d | sed 's/\\/./g' | while read duplicate; do grep -rHn "|$duplicate|" * ; done
You will probably have to change the -iname pattern inside the $(find ...) to whatever extension your log files have (or just remove it if the only files in the directory are log files). This will recursively find all log files and match them.
The output for some test data looks like this:
test_data.txt:1:Error: null, Data: |862799|00318070L|EMA|EMAIL|null|20100705|2010-07-05 14:59:39.0|null|AUTO_20100705|201075ITE854075_RECardProtectionlogi.msg|Country not known
test_data.txt:5:Error: null, Data: |862799|00318070L|EMA|EMAIL|null|20100705|2010-07-05 14:59:39.0|null|AUTO_20100705|201075ITE854075_RECardProtectionlogi.msg|Country not known
test_data_2.txt:2:Error: null, Data: |862799|00318070L|EMA|EMAIL|null|20100705|2010-07-05 14:59:39.0|null|AUTO_20100705|201075ITE854075_RECardProtectionlogi.msgIDONTMATCH|Country not known
test_data.txt:3:Error: null, Data: |862799|00318070L|EMA|EMAIL|null|20100705|2010-07-05 14:59:39.0|null|AUTO_20100705|201075ITE854075_RECardProtectionlogi.msgIDONTMATCH|Country not known
test_data_2.txt:4:Error: null, Data: |862799|00318070L|EMA|EMAIL|null|20100705|2010-07-05 14:59:39.0|null|AUTO_20100705|201075ITE854075_RECardProtectionlogi.msgIlikecake|Country not known
test_data.txt:7:Error: null, Data: |862799|00318070L|EMA|EMAIL|null|20100705|2010-07-05 14:59:39.0|null|AUTO_20100705|201075ITE854075_RECardProtectionlogi.msgIlikecake|Country not known
Those are all the lines in the files whose field 11 is duplicated.
Explanation of what the command does:
cut -f 11 -d "|" - get the 11th field (as delimited by |)
find . -type f -iname "*.txt" - consider any files ending in .txt in the current directory (recursively)
sort | uniq -d - show all duplicated "field 11s"
sed 's/\\/./g' - this is a hack because \ messes up bash. We replace it with ., which grep matches as any character.
while read duplicate; do grep -rHn "|$duplicate|" *; done - iterate over the list of duplicates and find all occurrences of them, outputting the filenames and line numbers where the duplicates occurred.
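If the keys might contain characters that are special to grep (the . in .msg already matches any character), one alternative is to let awk compare field 11 as a plain string. The following is only a sketch under some assumptions: report_dups is a made-up function name, the files are passed as arguments, and no file name looks like name=value (awk would read that as a variable assignment; prefixing ./ avoids it).

```shell
#!/bin/sh
# Sketch: print file:line:content for every line whose 11th |-separated
# field occurs more than once across all the given files.
# `report_dups` is a hypothetical helper name, not an existing command.
report_dups() {
    # The file list is given twice: pass 1 counts each key,
    # pass 2 prints the lines whose key was counted more than once.
    awk -F'|' '
        NF < 11        { next }                      # no column 11 on this line
        pass == 1      { count[$11]++; next }        # first pass: count keys
        count[$11] > 1 { print FILENAME ":" FNR ":" $0 }
    ' pass=1 "$@" pass=2 "$@"
}

# Example call (assumes the log files end in .txt):
#   report_dups ./*.txt
```

Because every file is read twice, this never has to build a grep pattern out of the key, so no escaping hack is needed.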
Yes, I did mean column; these are errors from a DB migration, and these are the keys in this DB table (filename).
– WendyG
Nov 21 at 17:31
It's not clear what you are trying to do, but I'll give it a try.
First, what is your line? You gave this as a line:
Error: null, Data:|862799|00318070L|EMA|EMAIL|null|20100705|2010-07-05 14:59:39.0|null|AUTO_20100705|201175ITE854075_RECardProtectionlogi.msg|Country not known
If your lines look like that, then your key is in field 11:
201175ITE854075_RECardProtectionlogi.msg
But what defines your key? Is it just being in field 11?
If so, you can do something like this in the directory with your target files:
sort --field-separator='|' --key=11 <(\grep --recursive --line-number --color=always --with-filename '' *)
This will give you colorized output: the name of the file, followed by the line number in that file, and then the line itself, all sorted by key field 11. So in the output, all matching keys from any file appear one on top of the other.
I think this will give you a clue, at least.
Note: the backslash in front of grep is there to bypass any grep aliases.
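The sorted listing still contains every line, so the duplicates have to be picked out by eye. As a rough sketch of one way to narrow it down, the sorted output can be piped through awk so that only runs of equal keys are printed. Here sorted_dups is a made-up function name, the files are passed as arguments, file names are assumed not to contain a |, and LC_ALL=C is set only to make the ordering predictable.

```shell
#!/bin/sh
# Sketch: list file:line:content for all lines, sort them on column 11,
# then keep only the lines whose key equals that of a neighbouring line.
# `sorted_dups` is a hypothetical helper name, not an existing command.
sorted_dups() {
    grep -Hn '' "$@" |                  # prefix each line with file:line:
    LC_ALL=C sort -t'|' -k11,11 |       # equal keys end up next to each other
    awk -F'|' '
        NF < 11 { next }                # no column 11 on this line
        {
            cur = $11 ""                # force a string comparison
            if (seen && cur == key) {
                if (!shown) print prev  # first line of the run, printed once
                print
                shown = 1
            } else {
                shown = 0
            }
            key = cur; prev = $0; seen = 1
        }
    '
}

# Example call (assumes the log files end in .txt):
#   sorted_dups ./*.txt
```

The file name and line number added by grep land in front of the first |, so the key stays in field 11 for both sort and awk.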
answered Nov 21 at 17:18
f41lurizer
edited Nov 21 at 17:37
answered Nov 21 at 17:29
matsib.dev