Count appearances of a value until it changes to another value























I have the following DataFrame:



import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])


I want to calculate the frequency of each value, but not an overall count - the count of each value until it changes to another value.



I tried:



df['values'].value_counts()


but it gives me



10    6
9     3
23    2
12    1


The desired output is



10:2 
23:2
9:3
10:4
12:1


How can I do this?


































  • You might want to have a look at "run-length encoding", since that's basically what you want to be doing.
    – Buhb
    2 days ago
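
The run-length encoding mentioned in this comment can be sketched with NumPy. This is only an illustration of the idea, not code from the thread; the function name run_length_encode is hypothetical.

```python
import numpy as np

def run_length_encode(values):
    """Return (run_values, run_lengths) for a 1-D sequence."""
    arr = np.asarray(values)
    if arr.size == 0:
        return arr, np.array([], dtype=int)
    # A run starts at index 0 and wherever an element differs from its predecessor.
    starts = np.flatnonzero(np.r_[True, arr[1:] != arr[:-1]])
    # Each run's length is the distance to the next start (or to the end of the array).
    lengths = np.diff(np.r_[starts, arr.size])
    return arr[starts], lengths

vals, lens = run_length_encode([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12])
print(list(zip(vals.tolist(), lens.tolist())))
# [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
```

The pairs come out in the order the runs appear, which matches the desired output in the question.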






















python pandas count frequency














asked 2 days ago by Mischa (edited 2 days ago by Alex Riley)
























5 Answers






























This is far from the most time- or memory-efficient method in this thread, but here's an iterative approach that is pretty straightforward. Note that it keeps only the longest run of each value, so for the sample data it reports 4 for the value 10. Please feel encouraged to suggest improvements.

import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

# Start every distinct value at a run length of 0
dict_count = {}
for v in df['values'].unique():
    dict_count[v] = 0

curr_val = df.iloc[0]['values']
count = 1
for i in range(1, len(df)):
    if df.iloc[i]['values'] == curr_val:
        count += 1
    else:
        # The run ended: keep its length if it is the longest seen for this value
        if count > dict_count[curr_val]:
            dict_count[curr_val] = count
        curr_val = df.iloc[i]['values']
        count = 1
# Handle the final run
if count > dict_count[curr_val]:
    dict_count[curr_val] = count

df_count = pd.DataFrame(dict_count, index=[0])
print(df_count)
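
One possible improvement, offered only as a sketch: the loop above stores a single number per value, so it cannot report both runs of 10. Appending each finished run to a list keeps every run in order. The name runs is illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

runs = []  # (value, run_length) pairs, in the order the runs appear
values = df['values'].tolist()
if values:
    curr_val, count = values[0], 1
    for v in values[1:]:
        if v == curr_val:
            count += 1
        else:
            # The run ended: record it and start counting the next one.
            runs.append((curr_val, count))
            curr_val, count = v, 1
    # Record the final run, which the loop never closes.
    runs.append((curr_val, count))

print(runs)
# [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
```

Converting the column to a plain list once also avoids the repeated df.iloc lookups inside the loop.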



























Use:

df = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()

Or:

df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()

print (df)
values  values
1       10        2
2       23        2
3       9         3
4       10        4
5       12        1
Name: values, dtype: int64

Finally, remove the first index level:

df = df.reset_index(level=0, drop=True)
print (df)
values
10    2
23    2
9     3
10    4
12    1
dtype: int64

Explanation:

Compare the original column with its shifted copy using not-equal (ne), then take the cumulative sum (cumsum) to get a helper Series that labels each run:

# df here is the original DataFrame, before the reassignments above
a = df['values'].shift()
b = df['values'].ne(a)
c = b.cumsum()
print (pd.concat([df['values'], a, b, c],
                 keys=('orig','shifted', 'not_equal', 'cumsum'), axis=1))
    orig  shifted  not_equal  cumsum
0     10      NaN       True       1
1     10     10.0      False       1
2     23     10.0       True       2
3     23     23.0      False       2
4      9     23.0       True       3
5      9      9.0      False       3
6      9      9.0      False       3
7     10      9.0       True       4
8     10     10.0      False       4
9     10     10.0      False       4
10    10     10.0      False       4
11    12     10.0       True       5






answered 2 days ago by jezrael (edited 2 days ago)












          • I got an error: Duplicated level name: "values", assigned to level 1, is already used for level 0.
            – Mischa
            2 days ago

          • @Mischa - Then add .rename, like df['values'].ne(df['values'].shift()).cumsum().rename('val1')
            – jezrael
            2 days ago

          • @jezrael, +1 for the nice code. Could you please explain it by dividing it into parts: df = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()? It is not clear to me; I would be grateful.
            – RavinderSingh13
            yesterday
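
The two points raised in these comments (the duplicated level name and the request to break the one-liner into parts) can be sketched together. This is a minimal sketch assuming the DataFrame from the question; the names is_new_run, run_id, run_counts and result are illustrative and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

# Step 1: True wherever a value differs from the one before it (the first row is always True).
is_new_run = df['values'].ne(df['values'].shift())

# Step 2: the running total of those flags gives every run its own label: 1, 1, 2, 2, 3, ...
# Renaming the helper avoids having two index levels that are both called 'values'.
run_id = is_new_run.cumsum().rename('run_id')

# Step 3: group by run label and value, then count the rows in each group.
run_counts = df.groupby([run_id, 'values']).size()

# Step 4: drop the helper level so only the value remains in the index.
result = run_counts.reset_index(level='run_id', drop=True)
print(result)
```

Because the run labels increase in order of appearance, the grouped result keeps the runs in their original sequence.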












































You can keep track of where the changes in df['values'] occur:

changes = df['values'].diff().ne(0).cumsum()
print(changes)

0     1
1     1
2     2
3     2
4     3
5     3
6     3
7     4
8     4
9     4
10    4
11    5

Then group by changes and also by df['values'] (to keep the values in the index), computing the size of each group:

df.groupby([changes,'values']).size().reset_index(level=0, drop=True)

values
10    2
23    2
9     3
10    4
12    1
dtype: int64






answered 2 days ago by nixon (edited 2 days ago)
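
A note on the diff-based marker above: diff() relies on subtraction, so it is aimed at numeric columns. For strings or other non-numeric data, comparing with shift() gives the same run labels. The sketch below assumes a hypothetical helper named run_lengths and a made-up Series of labels; neither appears in the thread.

```python
import pandas as pd

def run_lengths(series):
    """Return run lengths indexed by the value of each run (hypothetical helper)."""
    # Label each run: the label increases whenever a value differs from its predecessor.
    run_id = series.ne(series.shift()).cumsum()
    grouped = series.groupby(run_id)
    # size() gives the length of each run, first() gives the value it consists of.
    return pd.Series(grouped.size().to_numpy(), index=grouped.first().to_numpy())

labels = pd.Series(['a', 'a', 'b', 'a', 'a', 'a'])
print(run_lengths(labels))
```

The same helper also works on the numeric column from the question, since ne() and shift() do not depend on the dtype.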




































itertools.groupby

from itertools import groupby

# Each group becomes [run length, value]; zip(*...) splits these into data and index
pd.Series(*zip(*[[len([*v]), k] for k, v in groupby(df['values'])]))

10    2
23    2
9     3
10    4
12    1
dtype: int64

As a generator

def f(x):
    count = 1
    for this, that in zip(x, x[1:]):
        if this == that:
            count += 1
        else:
            # The run ended: emit its length and value, then start a new run
            yield count, this
            count = 1
    # Emit the final run
    yield count, [*x][-1]

pd.Series(*zip(*f(df['values'])))

10    2
23    2
9     3
10    4
12    1
dtype: int64






answered 2 days ago by piRSquared (edited 2 days ago)
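
For readers who find the zip(*...) one-liner above dense, the same itertools.groupby idea can be spelled out as a loop. This is a sketch, not the answerer's code; the names keys, counts and runs are illustrative.

```python
from itertools import groupby

import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

keys, counts = [], []
# groupby yields one (value, iterator) pair per run of consecutive equal values.
for key, group in groupby(df['values']):
    keys.append(key)
    counts.append(sum(1 for _ in group))  # count the run without building a list

# Run lengths as data, run values as the index.
runs = pd.Series(counts, index=keys)
print(runs)
```

Unlike the pandas groupby, itertools.groupby only merges adjacent equal items, which is exactly the behaviour the question asks for.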




































                              Using crosstab



                              df['key']=df['values'].diff().ne(0).cumsum()
                              pd.crosstab(df['key'],df['values'])
                              Out[353]:
                              values 9 10 12 23
                              key
                              1 0 2 0 0
                              2 0 0 0 2
                              3 3 0 0 0
                              4 0 4 0 0
                              5 0 0 1 0


                              Slightly modify the result above



                              pd.crosstab(df['key'],df['values']).stack().loc[lambda x:x.ne(0)]
                              Out[355]:
                              key values
                              1 10 2
                              2 23 2
                              3 9 3
                              4 10 4
                              5 12 1
                              dtype: int64




                              Base on python groupby



                              from itertools import groupby

                              [ (k,len(list(g))) for k,g in groupby(df['values'].tolist())]
                              Out[366]: [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]






                              share|improve this answer














                              share|improve this answer



                              share|improve this answer








                              edited 2 days ago

























                              answered 2 days ago









                              W-B

                              95k72860

















                                  This is far from the most time- or memory-efficient method in this thread, but here is a straightforward iterative approach. Note that it records the longest run of each value rather than every run separately. Suggestions for improving it are welcome.



                                  import pandas as pd

                                  df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

                                  # Longest run seen so far for each distinct value
                                  dict_count = {}
                                  for v in df['values'].unique():
                                      dict_count[v] = 0

                                  curr_val = df.iloc[0]['values']
                                  count = 1
                                  for i in range(1, len(df)):
                                      if df.iloc[i]['values'] == curr_val:
                                          count += 1
                                      else:
                                          # The run ended: keep it if it is the longest for this value
                                          if count > dict_count[curr_val]:
                                              dict_count[curr_val] = count
                                          curr_val = df.iloc[i]['values']
                                          count = 1
                                  # Account for the final run
                                  if count > dict_count[curr_val]:
                                      dict_count[curr_val] = count

                                  df_count = pd.DataFrame(dict_count, index=[0])
                                  print(df_count)
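The loop above keeps one number per distinct value, so the two separate runs of 10 collapse into a single entry. A possible variation that records every run in order, which seems closer to the output requested in the question, is sketched below. The name `count_runs` is hypothetical, and the function works on a plain list rather than on the DataFrame.

```python
def count_runs(values):
    """Return a list of (value, count) tuples, one per run, in order of appearance."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            # Same value as the current run: extend it.
            runs[-1][1] += 1
        else:
            # Value changed (or first item): start a new run.
            runs.append([v, 1])
    return [tuple(r) for r in runs]


result = count_runs([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12])
print(result)
```

For the DataFrame in the question, the input could be obtained with `df['values'].tolist()`.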





                                      answered yesterday









                                      UBears

                                      104111



