Did Statistics.com publish the wrong answer?

Statistics.com published a problem of the week:
The rate of residential insurance fraud is 10% (one out of ten claims is fraudulent). A consultant has proposed a machine learning system to review claims and classify them as fraud or no-fraud. The system is 90% effective in detecting the fraudulent claims, but only 80% effective in correctly classifying the non-fraud claims (it mistakenly labels one in five as “fraud”). If the system classifies a claim as fraudulent, what is the probability that it really is fraudulent?



https://www.statistics.com/news/231/192/Conditional-Probability/?showtemplate=true



My peer and I both came up with the same answer independently and it doesn't match the published solution.



Our solution:




(.9*.1)/((.9*.1)+(.2*.9))=1/3
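
For clarity, the same calculation written out as Bayes' rule, with $F$ = "claim is fraudulent" and $+$ = "system flags the claim" (notation added here, not part of the original post):

$$P(F \mid +) = \frac{P(+ \mid F)\,P(F)}{P(+ \mid F)\,P(F) + P(+ \mid F^c)\,P(F^c)} = \frac{0.9 \times 0.1}{0.9 \times 0.1 + 0.2 \times 0.9} = \frac{0.09}{0.27} = \frac{1}{3}$$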




Their solution:




This is a problem in conditional probability. (It’s also a Bayesian problem, but applying the formula in Bayes Rule only helps to obscure what’s going on.) Consider 100 claims. 10 will be fraudulent, and the system will correctly label 9 of them as “fraud.” 90 claims will be OK, but the system will incorrectly classify 72 (80%) as “fraud.” So a total of 81 claims have been labeled as fraudulent, but only 9 of them, 11%, are actually fraudulent.




Who was right?

probability bayesian puzzle

asked Dec 18 at 15:38 by ChrisG, edited Dec 18 at 15:44 by Tim

  • looks like they corrected the solution on their website to be in line with what you calculated
    – nope
    Dec 18 at 16:11






  • @nope, quietly corrected the answer. sneaky
    – Aksakal
    Dec 18 at 16:26










  • Trivia: in behavioral decision-making, this problem is often referred to as the "mammogram problem", since its usual presentation is about the chance of a patient having cancer given a positive mammogram.
    – Kodiologist
    Dec 18 at 19:40










  • "The good news is, our system classifies 90% of fraud as fraud. The bad news is, it classifies 80% of non-fraud as fraud." Note that the 11% they calculate is only slightly higher than the 10% base rate. A machine learning model where the fraud rate in the flagged cases is only 10% more than the base rate is quite terrible.
    – Acccumulation
    Dec 18 at 20:49










  • This is known as the false positive paradox
    – BlueRaja - Danny Pflughoeft
    Dec 18 at 23:18
















2 Answers














I believe that you and your colleague are correct. Statistics.com has the correct line of thinking, but makes a simple mistake. Out of the 90 "OK" claims, we expect 20% of them to be incorrectly classified as fraud, not 80%. 20% of 90 is 18, leading to 9 correctly identified claims and 18 incorrect claims, with a ratio of 1/3, exactly what Bayes' rule yields.
– James Otto, answered Dec 18 at 16:15

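A minimal sketch in Python (ours, not part of the answer) that reproduces the counting argument above; the result matches the Bayes' rule calculation:

    # Counting argument for 100 hypothetical claims, assuming a 10% fraud base rate.
    fraud_claims, ok_claims = 10, 90
    flagged_fraud = 0.9 * fraud_claims   # 9 fraudulent claims correctly flagged
    flagged_ok = 0.2 * ok_claims         # 18 legitimate claims wrongly flagged (20%, not 80%)
    precision = flagged_fraud / (flagged_fraud + flagged_ok)
    print(precision)                     # 0.3333... = 1/3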














You are correct. The solution that the website posted is based on a misreading of the problem in that 80% of the nonfraudulent claims are classified as fraudulent instead of the given 20%.
– Dilip Sarwate, answered Dec 18 at 16:15