Did Statistics.com publish the wrong answer?
Statistics.com published a problem of the week:
The rate of residential insurance fraud is 10% (one out of ten claims is fraudulent). A consultant has proposed a machine learning system to review claims and classify them as fraud or no-fraud. The system is 90% effective in detecting the fraudulent claims, but only 80% effective in correctly classifying the non-fraud claims (it mistakenly labels one in five as “fraud”). If the system classifies a claim as fraudulent, what is the probability that it really is fraudulent?
https://www.statistics.com/news/231/192/Conditional-Probability/?showtemplate=true
My peer and I both came up with the same answer independently, and it doesn't match the published solution.
Our solution:
(.9*.1)/((.9*.1)+(.2*.9))=1/3
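A minimal Python sketch of the calculation above. It assumes "90% effective in detecting the fraudulent claims" is the sensitivity P(flagged | fraud) and "80% effective in correctly classifying the non-fraud claims" is the specificity P(not flagged | OK). The function name `posterior_fraud` is hypothetical, and exact fractions are used so no floating-point rounding creeps in.

```python
from fractions import Fraction

def posterior_fraud(prior, sensitivity, specificity):
    """P(fraud | flagged) via Bayes' rule (hypothetical helper)."""
    false_positive_rate = 1 - specificity          # OK claims wrongly flagged
    flagged_and_fraud = sensitivity * prior        # P(flagged and fraud)
    flagged_total = flagged_and_fraud + false_positive_rate * (1 - prior)
    return flagged_and_fraud / flagged_total

p = posterior_fraud(Fraction(1, 10), Fraction(9, 10), Fraction(8, 10))
print(p)  # 1/3
```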
Their solution:
This is a problem in conditional probability. (It’s also a Bayesian problem, but applying the formula in Bayes Rule only helps to obscure what’s going on.) Consider 100 claims. 10 will be fraudulent, and the system will correctly label 9 of them as “fraud.” 90 claims will be OK, but the system will incorrectly classify 72 (80%) as “fraud.” So a total of 81 claims have been labeled as fraudulent, but only 9 of them, 11%, are actually fraudulent.
Who was right?
probability bayesian puzzle
edited Dec 18 at 15:44 by Tim♦
asked Dec 18 at 15:38 by ChrisG
Looks like they corrected the solution on their website to be in line with what you calculated. – nope, Dec 18 at 16:11
@nope, quietly corrected the answer. Sneaky. – Aksakal, Dec 18 at 16:26
Trivia: in behavioral decision-making, this problem is often referred to as the "mammogram problem", since its usual presentation is about the chance of a patient having cancer given a positive mammogram. – Kodiologist, Dec 18 at 19:40
"The good news is, our system classifies 90% of fraud as fraud. The bad news is, it classifies 80% of non-fraud as fraud." Note that the 11% they calculate is only slightly higher than the 10% base rate. A machine learning model whose fraud rate among flagged cases is only about 10% higher than the base rate is quite terrible. – Acccumulation, Dec 18 at 20:49
This is known as the false positive paradox. – BlueRaja - Danny Pflughoeft, Dec 18 at 23:18
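To illustrate the false positive paradox mentioned in the comments, here is a small sketch that keeps the classifier fixed at 90% sensitivity and 80% specificity (the rates stated in the problem) and varies only the base rate of fraud. The function name `ppv` (positive predictive value) and the chosen base rates are illustrative assumptions. With these rates, P(fraud | flagged) is 9/11 at a 50% base rate, 1/3 at 10%, and falls to 1/23 (about 4%) at 1%.

```python
from fractions import Fraction

def ppv(prior, sensitivity=Fraction(9, 10), specificity=Fraction(8, 10)):
    """Positive predictive value P(fraud | flagged) for a given base rate."""
    true_pos = sensitivity * prior                  # fraudulent and flagged
    false_pos = (1 - specificity) * (1 - prior)     # OK but flagged
    return true_pos / (true_pos + false_pos)

# Same classifier, progressively rarer fraud.
for prior in [Fraction(1, 2), Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)]:
    print(f"base rate {float(prior):.3f} -> P(fraud | flagged) = {float(ppv(prior)):.4f}")
```

The rarer the condition, the more the flagged pool is dominated by false positives, even though the classifier itself has not changed.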
2 Answers
I believe that you and your colleague are correct. Statistics.com has the right line of thinking but makes a simple mistake: out of the 90 "OK" claims, we expect 20% of them, not 80%, to be incorrectly classified as fraud. 20% of 90 is 18, so 9 fraudulent claims are correctly flagged and 18 OK claims are incorrectly flagged. That makes 9 of the 27 flagged claims, i.e. 1/3, actually fraudulent, exactly what Bayes' rule yields.
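The counting argument above can be sketched in Python as expected counts out of 100 claims, run once with the stated 20% false-flag rate and once with the 80% figure the published solution used. The helper name `flagged_counts` and its parameters are hypothetical; exact fractions keep the counts whole.

```python
from fractions import Fraction

def flagged_counts(n_claims, fraud_rate, detection_rate, false_flag_rate):
    """Expected (fraudulent, OK) claims that get flagged (hypothetical helper)."""
    fraudulent = n_claims * fraud_rate
    ok = n_claims - fraudulent
    return fraudulent * detection_rate, ok * false_flag_rate

for label, false_flag_rate in [("stated (20%)", Fraction(1, 5)),
                               ("published (80%)", Fraction(4, 5))]:
    tp, fp = flagged_counts(100, Fraction(1, 10), Fraction(9, 10), false_flag_rate)
    share = tp / (tp + fp)  # fraction of flagged claims that are fraudulent
    print(f"{label}: {tp} fraud + {fp} OK flagged -> {share} fraudulent")
```

With the stated rate this reproduces 9 + 18 flagged claims and a share of 1/3; with the misread rate it reproduces the published 9 + 72 and 1/9 (about 11%).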
answered Dec 18 at 16:15 by James Otto
You are correct. The solution that the website posted is based on a misreading of the problem: it treats 80% of the non-fraudulent claims as classified as fraudulent, instead of the given 20%.
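As an independent check that does not rely on the formula, here is a Monte Carlo sketch under the same reading of the rates. The function `simulated_share`, its parameters, the seed, and the number of simulated claims are all arbitrary, hypothetical choices. With the stated 20% false-flag rate the estimate should land close to 1/3; plugging in the misread 80% rate instead should land close to 1/9 (about 11%), the published figure.

```python
import random

def simulated_share(false_flag_rate, n_claims=200_000, fraud_rate=0.1,
                    detection_rate=0.9, seed=0):
    """Monte Carlo estimate of P(fraudulent | flagged) (hypothetical helper)."""
    rng = random.Random(seed)  # seeded for reproducibility
    flagged = flagged_fraud = 0
    for _ in range(n_claims):
        is_fraud = rng.random() < fraud_rate
        # Fraudulent claims are flagged at the detection rate,
        # OK claims at the false-flag rate.
        flag_prob = detection_rate if is_fraud else false_flag_rate
        if rng.random() < flag_prob:
            flagged += 1
            if is_fraud:
                flagged_fraud += 1
    return flagged_fraud / flagged

print(simulated_share(0.2))  # stated reading
print(simulated_share(0.8))  # misread reading used in the published solution
```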
answered Dec 18 at 16:15 by Dilip Sarwate