Name comparison using fuzzy string matching
I'm somewhat new to Python and wrote this piece of code to do a string comparison of accounts that are being requested for import into our database against accounts that are already present. The problem is that our DB currently holds over 65K accounts and I'm comparing over 5K accounts for import, which causes this code to take over 5 hours to run. I suspect this has to do with the loop I'm using, but I'm not certain how to improve it.
TL;DR: I need help optimizing this code so it has a shorter run time.
import pandas as pd
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

accounts_db = pd.read_csv("file.csv")  # 65,000 rows and 15 columns
accounts_sf = pd.read_csv("Requested Import.csv")  # 5,000 rows and 30 columns

def NameComparison(DB_account, choices):
    """Use the fuzzywuzzy module to perform a Levenshtein distance string comparison."""
    # Keep only the choices that score at least 95 against DB_account
    return process.extractBests(DB_account, choices, score_cutoff=95)

options = accounts_sf["Account Name"]
a_list = []
# One fuzzy lookup per database row, each scanning every requested import
for i in range(len(accounts_db)):
    a_list.append(NameComparison(accounts_db.at[i, "Company Name"], options))

b_list = pd.DataFrame(a_list)
b_list.to_csv("Matched Accounts.csv")
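For a sense of scale, scoring every database name against every requested name means roughly 65,000 × 5,000 = 325,000,000 fuzzy comparisons. One common way to cut that down is to normalize the names once and resolve exact hits with a dictionary lookup, so that only the leftovers are fuzzy-scored. The following is a minimal sketch of that idea, not a drop-in replacement: it uses `difflib.SequenceMatcher` from the standard library as a stand-in scorer instead of fuzzywuzzy, and the helper names (`normalize`, `similarity`, `match_accounts`) and the sample account names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(name.lower().split())


def similarity(a, b):
    """Stand-in scorer on a 0-100 scale (difflib, not fuzzywuzzy)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def match_accounts(db_names, import_names, cutoff=95):
    """Return {import_name: [(db_name, score), ...]} for scores >= cutoff."""
    # Normalize the large list once, outside the loop, and index it by normalized name.
    db_by_norm = {}
    for name in db_names:
        db_by_norm.setdefault(normalize(name), name)

    matches = {}
    for name in import_names:
        key = normalize(name)
        if key in db_by_norm:
            # Exact hit after normalization: no fuzzy scoring needed.
            matches[name] = [(db_by_norm[key], 100)]
            continue
        # Only names without an exact hit pay for the full fuzzy scan.
        scored = [(original, similarity(key, norm)) for norm, original in db_by_norm.items()]
        matches[name] = [(n, s) for n, s in scored if s >= cutoff]
    return matches


# Hypothetical sample data standing in for the two CSV columns.
db = ["Acme Corporation", "Globex Corporation", "Initech"]
incoming = ["acme  corporation", "Globex Corporaton", "Umbrella"]
result = match_accounts(db, incoming)
```

How much this helps depends on how many requested names already exist verbatim (after normalization) in the database; names with no exact hit still need a full scan, so the worst case is unchanged.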
python performance
Welcome to Code Review! What task does this code accomplish? Please tell us, and also make that the title of the question via edit. Maybe you missed the placeholder on the title element: "State the task that your code accomplishes. Make your title distinctive.". Also from How to Ask: "State what your code does in your title, not your main concerns about it.".
– Sᴀᴍ Onᴇᴌᴀ
14 mins ago
edited 1 min ago
asked 1 hour ago
Jason L