Is there a Unix command that searches for similar strings, based mostly on how they sound when spoken?
I have a file of names, and I want to search within it without caring too much about whether I have spelled the name I am searching for correctly. I know that grep has quite a bit of functionality for finding a whole slew of similar strings within a file or stream, but as far as I am aware, it cannot correct for spelling errors. Even if it could, these are names of people, so they wouldn't be found in a standard dictionary.
Perhaps I can make my file of names into a special dictionary and then use some standard spell-checking tool? Of particular importance in this application is the ability to match similar-sounding words.
For example, "jacob" should return "Jakob". Even better would be if inter-language similarities were also accounted for, so that "miguel" matches "Michael".
Is this something that has been implemented already, or will I have to build my own?
search text pattern-matching natural-language
3
agrep for approximate grep (not for sound/language). Also, in zsh pattern matching, (#a3) allows up to 3 mistakes.
– Stéphane Chazelas, Jun 14 '13 at 9:36
7
Take a look at the Text::Soundex perl core module too: pastebin.com/UbeVFBQA
– manatwork, Jun 14 '13 at 10:40
1
@manatwork - you should write that up as an answer!
– slm♦, Jun 14 '13 at 14:01
2
@slm, not sure how useful that can be in practice. For example, “miguel”, “Michael”, “michelle”, “majkul” and “mysql” all have soundex code “M240”. When I tried to use it, I discovered that it's too broad to be useful for most tasks. So I'd better leave it to someone capable of fine-tuning it to make it an answer.
– manatwork, Jun 14 '13 at 14:17
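To make the point about breadth concrete, here is a minimal pure-Perl sketch of a simplified Soundex. It is not the Text::Soundex implementation and may differ from it on edge cases. The name simple_soundex is made up for this illustration, and the sketch assumes that every uncoded letter (vowels, H, W, Y) separates repeated codes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified Soundex (illustrative sketch, not the Text::Soundex code):
# keep the first letter, turn the remaining consonants into digits,
# collapse runs of the same digit, then pad or cut to 4 characters.
sub simple_soundex {
    my ($word) = @_;

    # Upper-case the word and delete everything that is not A-Z.
    (my $letters = uc $word) =~ tr/A-Z//cd;
    return '' if $letters eq '';
    my $first = substr($letters, 0, 1);

    # Map each letter to its digit; 0 marks letters that are not coded
    # (A E I O U H W Y). Assumption: all of these break up repeats.
    (my $digits = $letters) =~ tr/A-Z/01230120022455012623010202/;

    $digits =~ tr/0-6//s;          # squeeze runs of the same digit
    $digits = substr($digits, 1);  # the first letter is kept as a letter
    $digits =~ tr/0//d;            # drop the uncoded letters

    return substr($first . $digits . '000', 0, 4);
}

# The five words from the comment above all collapse to M240.
printf "%-8s %s\n", $_, simple_soundex($_)
    for qw(miguel Michael michelle majkul mysql);
```

Only the first letter and the first three coded consonants survive, which is why such different words end up with the same code.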
edited Jan 8 at 7:07
asked Jun 14 '13 at 9:20 by gabkdlly
1 Answer
@manatwork has it right; Soundex is probably the tool you're looking for.
Install the perl Soundex module using CPAN:
$ sudo cpan Text::Soundex
CPAN: Storable loaded ok (v2.27)
....
Text::Soundex is up to date (3.04).
Create a test file of names called names.txt:
jacob
Jakob
miguel
Michael
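Now write a Perl script that uses the Soundex module, soundslike.pl:
#!/usr/bin/perl
use strict;
use warnings;
use Text::Soundex;

# Soundex code of the name given on the command line
my $targetSoundex = soundex($ARGV[0]);
print "Target soundex of $ARGV[0] is $targetSoundex\n";

# Compare it with the code of every name in names.txt
open(my $fh, '<', 'names.txt') or die "Cannot open names.txt: $!";
while (<$fh>) {
    chomp;
    print "Soundex of $_ is " . soundex($_);
    if ($targetSoundex eq soundex($_)) {
        print " (match).\n";
    } else {
        print " (no match).\n";
    }
}
close($fh);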
Make it executable and run some examples:
$ chmod +x soundslike.pl
$ ./soundslike.pl michael
Target soundex of michael is M240
Soundex of jacob is J210 (no match).
Soundex of Jakob is J210 (no match).
Soundex of miguel is M240 (match).
Soundex of Michael is M240 (match).
$ ./soundslike.pl jagub
Target soundex of jagub is J210
Soundex of jacob is J210 (match).
Soundex of Jakob is J210 (match).
Soundex of miguel is M240 (no match).
Soundex of Michael is M240 (no match).
answered Jun 17 '13 at 13:04 by Nate from Kalamazoo
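If a grep-like filter is more convenient than the verbose report, the same idea could be sketched as follows. This is an assumption-laden variant, not part of the answer above: the script name soundgrep.pl and the helper sounds_like are made up for illustration, and Text::Soundex is assumed to be installed as described in the answer.

```perl
#!/usr/bin/perl
# soundgrep.pl (hypothetical name): print only the lines whose Soundex
# code equals that of the search term.
# Usage: soundgrep.pl NAME [FILE...]
use strict;
use warnings;
use Text::Soundex;

# True if both strings have a Soundex code and the codes are equal.
sub sounds_like {
    my ($term, $name) = @_;
    my ($want, $got) = (soundex($term), soundex($name));
    return defined $want && defined $got && $want eq $got;
}

if (@ARGV) {
    my $term = shift @ARGV;
    # <> reads the remaining files, or standard input if none are given.
    while (my $line = <>) {
        chomp $line;
        print "$line\n" if sounds_like($term, $line);
    }
} else {
    print STDERR "usage: $0 NAME [FILE...]\n";
}
```

With the names.txt from the answer, `./soundgrep.pl jagub names.txt` should print just jacob and Jakob, since those are the two J210 entries in the runs shown above.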