How do I create a sum row and sum column in pandas?











up vote
8
down vote

favorite
1












I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.



I've got a table that looks like this from Khan Academy:



             | Undergraduate | Graduate | Total
-------------+---------------+----------+------
Straight A's | 240 | 60 | 300
-------------+---------------+----------+------
Not | 3,760 | 440 | 4,200
-------------+---------------+----------+------
Total | 4,000 | 500 | 4,500


I would like to recreate this table using pandas. Of course I could create a DataFrame using something like



"Graduate": {...},
"Undergraduate": {...},
"Total": {...},


But that seems like a naive approach that would both fall over quickly and just not really be extensible.



I've got the non-totals part of the table like this:



df = pd.DataFrame(
{
"Undergraduate": {"Straight A's": 240, "Not": 3_760},
"Graduate": {"Straight A's": 60, "Not": 440},
}
)
df


I've been looking and found a couple of promising things, like:



df['Total'] = df.sum(axis=1)


But I didn't find anything terribly elegant.



I did find the crosstab function that looks like it should do what I want, but it seems like in order to do that I'd have to create a dataframe consisting of 1/0 for all of these values, which seems silly because I've already got an aggregate.



I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:



totals(df, rows=True, columns=True)


or something.



Does this exist in pandas, or do I have to just cobble together my own approach?










share|improve this question


























    up vote
    8
    down vote

    favorite
    1












    I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.



    I've got a table that looks like this from Khan Academy:



                 | Undergraduate | Graduate | Total
    -------------+---------------+----------+------
    Straight A's | 240 | 60 | 300
    -------------+---------------+----------+------
    Not | 3,760 | 440 | 4,200
    -------------+---------------+----------+------
    Total | 4,000 | 500 | 4,500


    I would like to recreate this table using pandas. Of course I could create a DataFrame using something like



    "Graduate": {...},
    "Undergraduate": {...},
    "Total": {...},


    But that seems like a naive approach that would both fall over quickly and just not really be extensible.



    I've got the non-totals part of the table like this:



    df = pd.DataFrame(
    {
    "Undergraduate": {"Straight A's": 240, "Not": 3_760},
    "Graduate": {"Straight A's": 60, "Not": 440},
    }
    )
    df


    I've been looking and found a couple of promising things, like:



    df['Total'] = df.sum(axis=1)


    But I didn't find anything terribly elegant.



    I did find the crosstab function that looks like it should do what I want, but it seems like in order to do that I'd have to create a dataframe consisting of 1/0 for all of these values, which seems silly because I've already got an aggregate.



    I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:



    totals(df, rows=True, columns=True)


    or something.



    Does this exist in pandas, or do I have to just cobble together my own approach?










    share|improve this question
























      up vote
      8
      down vote

      favorite
      1









      up vote
      8
      down vote

      favorite
      1






      1





      I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.



      I've got a table that looks like this from Khan Academy:



                   | Undergraduate | Graduate | Total
      -------------+---------------+----------+------
      Straight A's | 240 | 60 | 300
      -------------+---------------+----------+------
      Not | 3,760 | 440 | 4,200
      -------------+---------------+----------+------
      Total | 4,000 | 500 | 4,500


      I would like to recreate this table using pandas. Of course I could create a DataFrame using something like



      "Graduate": {...},
      "Undergraduate": {...},
      "Total": {...},


      But that seems like a naive approach that would both fall over quickly and just not really be extensible.



      I've got the non-totals part of the table like this:



      df = pd.DataFrame(
      {
      "Undergraduate": {"Straight A's": 240, "Not": 3_760},
      "Graduate": {"Straight A's": 60, "Not": 440},
      }
      )
      df


      I've been looking and found a couple of promising things, like:



      df['Total'] = df.sum(axis=1)


      But I didn't find anything terribly elegant.



      I did find the crosstab function that looks like it should do what I want, but it seems like in order to do that I'd have to create a dataframe consisting of 1/0 for all of these values, which seems silly because I've already got an aggregate.



      I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:



      totals(df, rows=True, columns=True)


      or something.



      Does this exist in pandas, or do I have to just cobble together my own approach?










      share|improve this question













      I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.



      I've got a table that looks like this from Khan Academy:



                   | Undergraduate | Graduate | Total
      -------------+---------------+----------+------
      Straight A's | 240 | 60 | 300
      -------------+---------------+----------+------
      Not | 3,760 | 440 | 4,200
      -------------+---------------+----------+------
      Total | 4,000 | 500 | 4,500


      I would like to recreate this table using pandas. Of course I could create a DataFrame using something like



      "Graduate": {...},
      "Undergraduate": {...},
      "Total": {...},


      But that seems like a naive approach that would both fall over quickly and just not really be extensible.



      I've got the non-totals part of the table like this:



      df = pd.DataFrame(
      {
      "Undergraduate": {"Straight A's": 240, "Not": 3_760},
      "Graduate": {"Straight A's": 60, "Not": 440},
      }
      )
      df


      I've been looking and found a couple of promising things, like:



      df['Total'] = df.sum(axis=1)


      But I didn't find anything terribly elegant.



      I did find the crosstab function that looks like it should do what I want, but it seems like in order to do that I'd have to create a dataframe consisting of 1/0 for all of these values, which seems silly because I've already got an aggregate.



      I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:



      totals(df, rows=True, columns=True)


      or something.



      Does this exist in pandas, or do I have to just cobble together my own approach?







      python pandas






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 21 at 15:07









      Wayne Werner

      26.4k13109191




      26.4k13109191
























          3 Answers
          3






          active

          oldest

          votes

















          up vote
          8
          down vote













          Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):



          import pandas as pd

          df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})


          df.loc['Total',:]= df.sum(axis=0)
          df.loc[:,'Total'] = df.sum(axis=1)


          Output:



                        Graduate  Undergraduate  Total
          Not 440 3760 4200
          Straight A's 60 240 300
          Total 500 4000 4500





          share|improve this answer























          • Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
            – Wayne Werner
            Nov 21 at 15:20










          • That's weird, I get 4200 as it is supposed to? Maybe a typo?
            – Archie
            Nov 21 at 15:22






          • 5




            @WayneWerner that is because this is an in place operation. It seems you've run it twice
            – piRSquared
            Nov 21 at 15:23










          • Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
            – Wayne Werner
            Nov 21 at 15:27


















          up vote
          7
          down vote














          append and assign



          The point of this answer is to provide an in line and not an in place solution.



          append



          I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.



          assign



          I use assign to add a column. However, the DataFrame I'm working on is in the in between nether space. So I use a lambda in the assign argument which tells Pandas to apply it to the calling DataFrame.





          df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))

          Graduate Undergraduate Total
          Not 440 3760 4200
          Straight A's 60 240 300
          Total 500 4000 4500




          Fun alternative



          Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.



          Also, still in line.



          def tc(d):
          return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))

          df.pipe(tc).T.pipe(tc).T

          Graduate Undergraduate Total
          Not 440 3760 4200
          Straight A's 60 240 300
          Total 500 4000 4500





          share|improve this answer






























            up vote
            4
            down vote













            From the original data using crosstab, if just base on your input, you just need melt before crosstab



            s=df.reset_index().melt('index')
            pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)
            Out[33]:
            variable Graduate Undergraduate All
            index
            Not 440 3760 4200
            Straight A's 60 240 300
            All 500 4000 4500




            Toy data



            df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]}) 
            # before `agg`, I think your input is the result after `groupby`
            df
            Out[37]:
            c1 c2 c3
            0 1 2 1
            1 2 2 2
            2 2 3 3
            3 3 3 4
            4 4 3 5


            pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins
            =True)
            Out[38]:
            c2 2 3 All
            c1
            1 1.0 NaN 1
            2 2.0 3.0 5
            3 NaN 4.0 4
            4 NaN 5.0 5
            All 3.0 12.0 15





            share|improve this answer























              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














               

              draft saved


              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53414960%2fhow-do-i-create-a-sum-row-and-sum-column-in-pandas%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              3 Answers
              3






              active

              oldest

              votes








              3 Answers
              3






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes








              up vote
              8
              down vote













              Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):



              import pandas as pd

              df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})


              df.loc['Total',:]= df.sum(axis=0)
              df.loc[:,'Total'] = df.sum(axis=1)


              Output:



                            Graduate  Undergraduate  Total
              Not 440 3760 4200
              Straight A's 60 240 300
              Total 500 4000 4500





              share|improve this answer























              • Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
                – Wayne Werner
                Nov 21 at 15:20










              • That's weird, I get 4200 as it is supposed to? Maybe a typo?
                – Archie
                Nov 21 at 15:22






              • 5




                @WayneWerner that is because this is an in place operation. It seems you've run it twice
                – piRSquared
                Nov 21 at 15:23










              • Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
                – Wayne Werner
                Nov 21 at 15:27















              up vote
              8
              down vote













              Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):



              import pandas as pd

              df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})


              df.loc['Total',:]= df.sum(axis=0)
              df.loc[:,'Total'] = df.sum(axis=1)


              Output:



                            Graduate  Undergraduate  Total
              Not 440 3760 4200
              Straight A's 60 240 300
              Total 500 4000 4500





              share|improve this answer























              • Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
                – Wayne Werner
                Nov 21 at 15:20










              • That's weird, I get 4200 as it is supposed to? Maybe a typo?
                – Archie
                Nov 21 at 15:22






              • 5




                @WayneWerner that is because this is an in place operation. It seems you've run it twice
                – piRSquared
                Nov 21 at 15:23










              • Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
                – Wayne Werner
                Nov 21 at 15:27













              up vote
              8
              down vote










              up vote
              8
              down vote









              Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):



              import pandas as pd

              df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})


              df.loc['Total',:]= df.sum(axis=0)
              df.loc[:,'Total'] = df.sum(axis=1)


              Output:



                            Graduate  Undergraduate  Total
              Not 440 3760 4200
              Straight A's 60 240 300
              Total 500 4000 4500





              share|improve this answer














              Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):



              import pandas as pd

              df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})


              df.loc['Total',:]= df.sum(axis=0)
              df.loc[:,'Total'] = df.sum(axis=1)


              Output:



                            Graduate  Undergraduate  Total
              Not 440 3760 4200
              Straight A's 60 240 300
              Total 500 4000 4500






              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Nov 21 at 15:25

























              answered Nov 21 at 15:12









              Archie

              516721




              516721












              • Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
                – Wayne Werner
                Nov 21 at 15:20










              • That's weird, I get 4200 as it is supposed to? Maybe a typo?
                – Archie
                Nov 21 at 15:22






              • 5




                @WayneWerner that is because this is an in place operation. It seems you've run it twice
                – piRSquared
                Nov 21 at 15:23










              • Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
                – Wayne Werner
                Nov 21 at 15:27


















              • Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
                – Wayne Werner
                Nov 21 at 15:20










              • That's weird, I get 4200 as it is supposed to? Maybe a typo?
                – Archie
                Nov 21 at 15:22






              • 5




                @WayneWerner that is because this is an in place operation. It seems you've run it twice
                – piRSquared
                Nov 21 at 15:23










              • Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
                – Wayne Werner
                Nov 21 at 15:27
















              Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
              – Wayne Werner
              Nov 21 at 15:20




              Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
              – Wayne Werner
              Nov 21 at 15:20












              That's weird, I get 4200 as it is supposed to? Maybe a typo?
              – Archie
              Nov 21 at 15:22




              That's weird, I get 4200 as it is supposed to? Maybe a typo?
              – Archie
              Nov 21 at 15:22




              5




              5




              @WayneWerner that is because this is an in place operation. It seems you've run it twice
              – piRSquared
              Nov 21 at 15:23




              @WayneWerner that is because this is an in place operation. It seems you've run it twice
              – piRSquared
              Nov 21 at 15:23












              Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
              – Wayne Werner
              Nov 21 at 15:27




              Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
              – Wayne Werner
              Nov 21 at 15:27












              up vote
              7
              down vote














              append and assign



              The point of this answer is to provide an in line and not an in place solution.



              append



              I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.



              assign



              I use assign to add a column. However, the DataFrame I'm working on is in the in between nether space. So I use a lambda in the assign argument which tells Pandas to apply it to the calling DataFrame.





              df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))

              Graduate Undergraduate Total
              Not 440 3760 4200
              Straight A's 60 240 300
              Total 500 4000 4500




              Fun alternative



              Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.



              Also, still in line.



              def tc(d):
              return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))

              df.pipe(tc).T.pipe(tc).T

              Graduate Undergraduate Total
              Not 440 3760 4200
              Straight A's 60 240 300
              Total 500 4000 4500





              share|improve this answer



























                up vote
                7
                down vote














                append and assign



                The point of this answer is to provide an in line and not an in place solution.



                append



                I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.



                assign



                I use assign to add a column. However, the DataFrame I'm working on is in the in between nether space. So I use a lambda in the assign argument which tells Pandas to apply it to the calling DataFrame.





                df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))

                Graduate Undergraduate Total
                Not 440 3760 4200
                Straight A's 60 240 300
                Total 500 4000 4500




                Fun alternative



                Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.



                Also, still in line.



                def tc(d):
                return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))

                df.pipe(tc).T.pipe(tc).T

                Graduate Undergraduate Total
                Not 440 3760 4200
                Straight A's 60 240 300
                Total 500 4000 4500





                share|improve this answer

























                  up vote
                  7
                  down vote










                  up vote
                  7
                  down vote










                  append and assign



                  The point of this answer is to provide an in line and not an in place solution.



                  append



                  I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.



                  assign



                  I use assign to add a column. However, the DataFrame I'm working on is in the in between nether space. So I use a lambda in the assign argument which tells Pandas to apply it to the calling DataFrame.





                  df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))

                  Graduate Undergraduate Total
                  Not 440 3760 4200
                  Straight A's 60 240 300
                  Total 500 4000 4500




                  Fun alternative



                  Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.



                  Also, still in line.



                  def tc(d):
                  return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))

                  df.pipe(tc).T.pipe(tc).T

                  Graduate Undergraduate Total
                  Not 440 3760 4200
                  Straight A's 60 240 300
                  Total 500 4000 4500





                  share|improve this answer















                  append and assign



                  The point of this answer is to provide an in line and not an in place solution.



                  append



                  I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.



                  assign



                  I use assign to add a column. However, the DataFrame I'm working on is in the in between nether space. So I use a lambda in the assign argument which tells Pandas to apply it to the calling DataFrame.





                  df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))

                  Graduate Undergraduate Total
                  Not 440 3760 4200
                  Straight A's 60 240 300
                  Total 500 4000 4500




                  Fun alternative



                  Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.



                  Also, still in line.



                  def tc(d):
                  return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))

                  df.pipe(tc).T.pipe(tc).T

                  Graduate Undergraduate Total
                  Not 440 3760 4200
                  Straight A's 60 240 300
                  Total 500 4000 4500






                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Nov 21 at 15:27

























                  answered Nov 21 at 15:09









                  piRSquared

                  149k21135275




                  149k21135275






















                      up vote
                      4
                      down vote













                      From the original data using crosstab, if just base on your input, you just need melt before crosstab



                      s=df.reset_index().melt('index')
                      pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)
                      Out[33]:
                      variable Graduate Undergraduate All
                      index
                      Not 440 3760 4200
                      Straight A's 60 240 300
                      All 500 4000 4500




                      Toy data



                      df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]}) 
                      # before `agg`, I think your input is the result after `groupby`
                      df
                      Out[37]:
                      c1 c2 c3
                      0 1 2 1
                      1 2 2 2
                      2 2 3 3
                      3 3 3 4
                      4 4 3 5


                      pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins
                      =True)
                      Out[38]:
                      c2 2 3 All
                      c1
                      1 1.0 NaN 1
                      2 2.0 3.0 5
                      3 NaN 4.0 4
                      4 NaN 5.0 5
                      All 3.0 12.0 15





                      share|improve this answer



























                        up vote
                        4
                        down vote













                        From the original data using crosstab, if just base on your input, you just need melt before crosstab



                        s=df.reset_index().melt('index')
                        pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)
                        Out[33]:
                        variable Graduate Undergraduate All
                        index
                        Not 440 3760 4200
                        Straight A's 60 240 300
                        All 500 4000 4500




                        Toy data



                        df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]}) 
                        # before `agg`, I think your input is the result after `groupby`
                        df
                        Out[37]:
                        c1 c2 c3
                        0 1 2 1
                        1 2 2 2
                        2 2 3 3
                        3 3 3 4
                        4 4 3 5


                        pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins
                        =True)
                        Out[38]:
                        c2 2 3 All
                        c1
                        1 1.0 NaN 1
                        2 2.0 3.0 5
                        3 NaN 4.0 4
                        4 NaN 5.0 5
                        All 3.0 12.0 15





                        share|improve this answer

























                          up vote
                          4
                          down vote










                          up vote
                          4
                          down vote









                          From the original data using crosstab, if just base on your input, you just need melt before crosstab



                          s=df.reset_index().melt('index')
                          pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)
                          Out[33]:
                          variable Graduate Undergraduate All
                          index
                          Not 440 3760 4200
                          Straight A's 60 240 300
                          All 500 4000 4500




                          Toy data



                          df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]}) 
                          # before `agg`, I think your input is the result after `groupby`
                          df
                          Out[37]:
                          c1 c2 c3
                          0 1 2 1
                          1 2 2 2
                          2 2 3 3
                          3 3 3 4
                          4 4 3 5


                          pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins
                          =True)
                          Out[38]:
                          c2 2 3 All
                          c1
                          1 1.0 NaN 1
                          2 2.0 3.0 5
                          3 NaN 4.0 4
                          4 NaN 5.0 5
                          All 3.0 12.0 15





                          share|improve this answer














                          From the original data using crosstab, if just base on your input, you just need melt before crosstab



                          s=df.reset_index().melt('index')
                          pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)
                          Out[33]:
                          variable Graduate Undergraduate All
                          index
                          Not 440 3760 4200
                          Straight A's 60 240 300
                          All 500 4000 4500




                          Toy data



                          df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]}) 
                          # before `agg`, I think your input is the result after `groupby`
                          df
                          Out[37]:
                          c1 c2 c3
                          0 1 2 1
                          1 2 2 2
                          2 2 3 3
                          3 3 3 4
                          4 4 3 5


                          pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins
                          =True)
                          Out[38]:
                          c2 2 3 All
                          c1
                          1 1.0 NaN 1
                          2 2.0 3.0 5
                          3 NaN 4.0 4
                          4 NaN 5.0 5
                          All 3.0 12.0 15






                          share|improve this answer














                          share|improve this answer



                          share|improve this answer








                          edited Nov 21 at 15:21

























                          answered Nov 21 at 15:16









                          W-B

                          93.4k72755




                          93.4k72755






























                               

                              draft saved


                              draft discarded



















































                               


                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53414960%2fhow-do-i-create-a-sum-row-and-sum-column-in-pandas%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              Morgemoulin

                              Scott Moir

                              Souastre