Name comparison using fuzzy string matching












0














I'm somewhat new to python and wrote this piece of code to do a string comparison of accounts that are being requested for import into our data base against accounts that are already present. The issue is that the accounts currently in our DB is over 65K and I'm comparing over 5K accounts for import causing this code to take over 5 hours to run. I suspect this has to do with the loop I'm using but I'm not certain how to improve it.



TLDR; I need help optimizing this code so it has a shorter run time.



from fuzzywuzzy import fuzz
from fuzzywuzzy import process

accounts_DB = pd.read_csv("file.csv") #65,000 rows and 15 columns
accounts_SF = pd.read_csv("Requested Import.csv") #5,000 rows and 30 columns


def NameComparison(DB_account, choices):
"""Function uses fuzzywuzzy module to perform Levenshtein distance string comparison"""
return(process.extractBests(DB_account, choices, score_cutoff= 95))

options = accounts_sf["Account Name"]
a_list =
for i in range(len(accounts_db)):
a_list.append(NameComparison(accounts_db.at[i,"Company Name"], options))
b_list = pd.DataFrame(a_list)
b_list.to_csv("Matched Accounts.csv")









share|improve this question









New contributor




Jason L is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.




















  • Welcome to Code Review! What task does this code accomplish? Please tell us, and also make that the title of the question via edit. Maybe you missed the placeholder on the title element: "State the task that your code accomplishes. Make your title distinctive.". Also from How to Ask: "State what your code does in your title, not your main concerns about it.".
    – Sᴀᴍ Onᴇᴌᴀ
    14 mins ago
















0














I'm somewhat new to python and wrote this piece of code to do a string comparison of accounts that are being requested for import into our data base against accounts that are already present. The issue is that the accounts currently in our DB is over 65K and I'm comparing over 5K accounts for import causing this code to take over 5 hours to run. I suspect this has to do with the loop I'm using but I'm not certain how to improve it.



TLDR; I need help optimizing this code so it has a shorter run time.



from fuzzywuzzy import fuzz
from fuzzywuzzy import process

accounts_DB = pd.read_csv("file.csv") #65,000 rows and 15 columns
accounts_SF = pd.read_csv("Requested Import.csv") #5,000 rows and 30 columns


def NameComparison(DB_account, choices):
"""Function uses fuzzywuzzy module to perform Levenshtein distance string comparison"""
return(process.extractBests(DB_account, choices, score_cutoff= 95))

options = accounts_sf["Account Name"]
a_list =
for i in range(len(accounts_db)):
a_list.append(NameComparison(accounts_db.at[i,"Company Name"], options))
b_list = pd.DataFrame(a_list)
b_list.to_csv("Matched Accounts.csv")









share|improve this question









New contributor




Jason L is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.




















  • Welcome to Code Review! What task does this code accomplish? Please tell us, and also make that the title of the question via edit. Maybe you missed the placeholder on the title element: "State the task that your code accomplishes. Make your title distinctive.". Also from How to Ask: "State what your code does in your title, not your main concerns about it.".
    – Sᴀᴍ Onᴇᴌᴀ
    14 mins ago














0












0








0







I'm somewhat new to python and wrote this piece of code to do a string comparison of accounts that are being requested for import into our data base against accounts that are already present. The issue is that the accounts currently in our DB is over 65K and I'm comparing over 5K accounts for import causing this code to take over 5 hours to run. I suspect this has to do with the loop I'm using but I'm not certain how to improve it.



TLDR; I need help optimizing this code so it has a shorter run time.



from fuzzywuzzy import fuzz
from fuzzywuzzy import process

accounts_DB = pd.read_csv("file.csv") #65,000 rows and 15 columns
accounts_SF = pd.read_csv("Requested Import.csv") #5,000 rows and 30 columns


def NameComparison(DB_account, choices):
"""Function uses fuzzywuzzy module to perform Levenshtein distance string comparison"""
return(process.extractBests(DB_account, choices, score_cutoff= 95))

options = accounts_sf["Account Name"]
a_list =
for i in range(len(accounts_db)):
a_list.append(NameComparison(accounts_db.at[i,"Company Name"], options))
b_list = pd.DataFrame(a_list)
b_list.to_csv("Matched Accounts.csv")









share|improve this question









New contributor




Jason L is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











I'm somewhat new to python and wrote this piece of code to do a string comparison of accounts that are being requested for import into our data base against accounts that are already present. The issue is that the accounts currently in our DB is over 65K and I'm comparing over 5K accounts for import causing this code to take over 5 hours to run. I suspect this has to do with the loop I'm using but I'm not certain how to improve it.



TLDR; I need help optimizing this code so it has a shorter run time.



from fuzzywuzzy import fuzz
from fuzzywuzzy import process

accounts_DB = pd.read_csv("file.csv") #65,000 rows and 15 columns
accounts_SF = pd.read_csv("Requested Import.csv") #5,000 rows and 30 columns


def NameComparison(DB_account, choices):
"""Function uses fuzzywuzzy module to perform Levenshtein distance string comparison"""
return(process.extractBests(DB_account, choices, score_cutoff= 95))

options = accounts_sf["Account Name"]
a_list =
for i in range(len(accounts_db)):
a_list.append(NameComparison(accounts_db.at[i,"Company Name"], options))
b_list = pd.DataFrame(a_list)
b_list.to_csv("Matched Accounts.csv")






python performance






share|improve this question









New contributor




Jason L is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question









New contributor




Jason L is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question








edited 1 min ago





















New contributor




Jason L is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked 1 hour ago









Jason L

12




12




New contributor




Jason L is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





Jason L is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






Jason L is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.












  • Welcome to Code Review! What task does this code accomplish? Please tell us, and also make that the title of the question via edit. Maybe you missed the placeholder on the title element: "State the task that your code accomplishes. Make your title distinctive.". Also from How to Ask: "State what your code does in your title, not your main concerns about it.".
    – Sᴀᴍ Onᴇᴌᴀ
    14 mins ago


















  • Welcome to Code Review! What task does this code accomplish? Please tell us, and also make that the title of the question via edit. Maybe you missed the placeholder on the title element: "State the task that your code accomplishes. Make your title distinctive.". Also from How to Ask: "State what your code does in your title, not your main concerns about it.".
    – Sᴀᴍ Onᴇᴌᴀ
    14 mins ago
















Welcome to Code Review! What task does this code accomplish? Please tell us, and also make that the title of the question via edit. Maybe you missed the placeholder on the title element: "State the task that your code accomplishes. Make your title distinctive.". Also from How to Ask: "State what your code does in your title, not your main concerns about it.".
– Sᴀᴍ Onᴇᴌᴀ
14 mins ago




Welcome to Code Review! What task does this code accomplish? Please tell us, and also make that the title of the question via edit. Maybe you missed the placeholder on the title element: "State the task that your code accomplishes. Make your title distinctive.". Also from How to Ask: "State what your code does in your title, not your main concerns about it.".
– Sᴀᴍ Onᴇᴌᴀ
14 mins ago















active

oldest

votes











Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});






Jason L is a new contributor. Be nice, and check out our Code of Conduct.










draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210651%2fname-comparison-using-fuzzy-string-matching%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown






























active

oldest

votes













active

oldest

votes









active

oldest

votes






active

oldest

votes








Jason L is a new contributor. Be nice, and check out our Code of Conduct.










draft saved

draft discarded


















Jason L is a new contributor. Be nice, and check out our Code of Conduct.













Jason L is a new contributor. Be nice, and check out our Code of Conduct.












Jason L is a new contributor. Be nice, and check out our Code of Conduct.
















Thanks for contributing an answer to Code Review Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.





Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


Please pay close attention to the following guidance:


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210651%2fname-comparison-using-fuzzy-string-matching%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Morgemoulin

Scott Moir

Souastre