What is the best C++ data structure that could be used for storing and managing a collection of integers?












16














this is my first StackOverflow question so please let me know if I didn't follow community guidelines with this question and if I should delete it.



I got my first ever interview question and I got rejected because of my implementation.



The question is:



Design and implement a C++ class that stores a collection of integers. On construction, the collection should be empty. The same number may be stored more than once.



Implement the following methods:




  1. Insert(int x). Insert an entry for the value “x”.


  2. Erase(int x). Remove one entry with the value “x” (if one exists) from the collection.


  3. Erase(int from, int to). Remove all the entries with a value in the range [from, to).


  4. Count(int from, int to). Count how many entries have a value in the range [from, to).



I thought a good implementation would be to use linked lists since it uses non-contiguous memory and removing entries would not require shuffling a lot of data (like in vectors or arrays). However, I got feedback from the company saying my implementation was O(n^2) time complexity and was very inefficient so I was rejected. I don't want to repeat the same mistake if a similar question pops up in another interview so I'd like to know what is the best way to approach this question (a friend suggested using maps but he is also unsure).



My code is:



void IntegerCollector::insert(int x)
{
entries.push_back(x);
}

void IntegerCollector::erase(int x)
{
list<int>::iterator position = find(entries.begin(), entries.end(), x);
if (position != entries.end())
entries.erase(position);
}

void IntegerCollector::erase(int from, int to)
{
list<int>::iterator position = entries.begin();

while (position != entries.end())
{
if (*position >= from && *position <= to)
position = entries.erase(position);
else
position++;
}
}

int IntegerCollector::count(int from, int to)
{
list<int>::iterator position = entries.begin();
int count = 0;

while (position != entries.end())
{
if (*position >= from && *position <= to)
count++;

position++;
}

return count;
}


The feedback mentioned that they would only hire candidates that can implement solutions with O(nlogn) complexity.










share|improve this question






















  • FWIW, std::list is normally not the data structure you want to use. For the use case you have described a std::multiset sounds like what you want. Almost all the operation will be O(logN) which isn't to bad.
    – NathanOliver
    Dec 17 at 14:20






  • 5




    Yours appears to be O(n) which is better than O(nlogn)
    – Ben
    Dec 17 at 14:22






  • 1




    @ben Yes but imagine he has an IntegerCollection of n items and he wants to erase all of that by calling erase for each element. That's a linear operation for each element which makes the time complexity quadratic.
    – 0x499602D2
    Dec 17 at 14:26








  • 1




    This is a well-composed question, but I suspect that it is primarily opinion based. I'm not sure any of us could tell you what implementation your interviewer would have considered "best".
    – Drew Dormann
    Dec 17 at 14:35








  • 1




    I mean, big O is misleading here because it's a toy example, but the point is I suspect that the answer was rejected because it's naïve rather than because it's actually wrong as such. using a list to store integers is inefficient in terms of memory and in modern architectures that's likely to dominate the big O.
    – Ben
    Dec 17 at 14:35


















16














this is my first StackOverflow question so please let me know if I didn't follow community guidelines with this question and if I should delete it.



I got my first ever interview question and I got rejected because of my implementation.



The question is:



Design and implement a C++ class that stores a collection of integers. On construction, the collection should be empty. The same number may be stored more than once.



Implement the following methods:




  1. Insert(int x). Insert an entry for the value “x”.


  2. Erase(int x). Remove one entry with the value “x” (if one exists) from the collection.


  3. Erase(int from, int to). Remove all the entries with a value in the range [from, to).


  4. Count(int from, int to). Count how many entries have a value in the range [from, to).



I thought a good implementation would be to use linked lists since it uses non-contiguous memory and removing entries would not require shuffling a lot of data (like in vectors or arrays). However, I got feedback from the company saying my implementation was O(n^2) time complexity and was very inefficient so I was rejected. I don't want to repeat the same mistake if a similar question pops up in another interview so I'd like to know what is the best way to approach this question (a friend suggested using maps but he is also unsure).



My code is:



void IntegerCollector::insert(int x)
{
entries.push_back(x);
}

void IntegerCollector::erase(int x)
{
list<int>::iterator position = find(entries.begin(), entries.end(), x);
if (position != entries.end())
entries.erase(position);
}

void IntegerCollector::erase(int from, int to)
{
list<int>::iterator position = entries.begin();

while (position != entries.end())
{
if (*position >= from && *position <= to)
position = entries.erase(position);
else
position++;
}
}

int IntegerCollector::count(int from, int to)
{
list<int>::iterator position = entries.begin();
int count = 0;

while (position != entries.end())
{
if (*position >= from && *position <= to)
count++;

position++;
}

return count;
}


The feedback mentioned that they would only hire candidates that can implement solutions with O(nlogn) complexity.










share|improve this question






















  • FWIW, std::list is normally not the data structure you want to use. For the use case you have described a std::multiset sounds like what you want. Almost all the operation will be O(logN) which isn't to bad.
    – NathanOliver
    Dec 17 at 14:20






  • 5




    Yours appears to be O(n) which is better than O(nlogn)
    – Ben
    Dec 17 at 14:22






  • 1




    @ben Yes but imagine he has an IntegerCollection of n items and he wants to erase all of that by calling erase for each element. That's a linear operation for each element which makes the time complexity quadratic.
    – 0x499602D2
    Dec 17 at 14:26








  • 1




    This is a well-composed question, but I suspect that it is primarily opinion based. I'm not sure any of us could tell you what implementation your interviewer would have considered "best".
    – Drew Dormann
    Dec 17 at 14:35








  • 1




    I mean, big O is misleading here because it's a toy example, but the point is I suspect that the answer was rejected because it's naïve rather than because it's actually wrong as such. using a list to store integers is inefficient in terms of memory and in modern architectures that's likely to dominate the big O.
    – Ben
    Dec 17 at 14:35
















16












16








16


5





this is my first StackOverflow question so please let me know if I didn't follow community guidelines with this question and if I should delete it.



I got my first ever interview question and I got rejected because of my implementation.



The question is:



Design and implement a C++ class that stores a collection of integers. On construction, the collection should be empty. The same number may be stored more than once.



Implement the following methods:




  1. Insert(int x). Insert an entry for the value “x”.


  2. Erase(int x). Remove one entry with the value “x” (if one exists) from the collection.


  3. Erase(int from, int to). Remove all the entries with a value in the range [from, to).


  4. Count(int from, int to). Count how many entries have a value in the range [from, to).



I thought a good implementation would be to use linked lists since it uses non-contiguous memory and removing entries would not require shuffling a lot of data (like in vectors or arrays). However, I got feedback from the company saying my implementation was O(n^2) time complexity and was very inefficient so I was rejected. I don't want to repeat the same mistake if a similar question pops up in another interview so I'd like to know what is the best way to approach this question (a friend suggested using maps but he is also unsure).



My code is:



void IntegerCollector::insert(int x)
{
entries.push_back(x);
}

void IntegerCollector::erase(int x)
{
list<int>::iterator position = find(entries.begin(), entries.end(), x);
if (position != entries.end())
entries.erase(position);
}

void IntegerCollector::erase(int from, int to)
{
list<int>::iterator position = entries.begin();

while (position != entries.end())
{
if (*position >= from && *position <= to)
position = entries.erase(position);
else
position++;
}
}

int IntegerCollector::count(int from, int to)
{
list<int>::iterator position = entries.begin();
int count = 0;

while (position != entries.end())
{
if (*position >= from && *position <= to)
count++;

position++;
}

return count;
}


The feedback mentioned that they would only hire candidates that can implement solutions with O(nlogn) complexity.










share|improve this question













this is my first StackOverflow question so please let me know if I didn't follow community guidelines with this question and if I should delete it.



I got my first ever interview question and I got rejected because of my implementation.



The question is:



Design and implement a C++ class that stores a collection of integers. On construction, the collection should be empty. The same number may be stored more than once.



Implement the following methods:




  1. Insert(int x). Insert an entry for the value “x”.


  2. Erase(int x). Remove one entry with the value “x” (if one exists) from the collection.


  3. Erase(int from, int to). Remove all the entries with a value in the range [from, to).


  4. Count(int from, int to). Count how many entries have a value in the range [from, to).



I thought a good implementation would be to use linked lists since it uses non-contiguous memory and removing entries would not require shuffling a lot of data (like in vectors or arrays). However, I got feedback from the company saying my implementation was O(n^2) time complexity and was very inefficient so I was rejected. I don't want to repeat the same mistake if a similar question pops up in another interview so I'd like to know what is the best way to approach this question (a friend suggested using maps but he is also unsure).



My code is:



void IntegerCollector::insert(int x)
{
entries.push_back(x);
}

void IntegerCollector::erase(int x)
{
list<int>::iterator position = find(entries.begin(), entries.end(), x);
if (position != entries.end())
entries.erase(position);
}

void IntegerCollector::erase(int from, int to)
{
list<int>::iterator position = entries.begin();

while (position != entries.end())
{
if (*position >= from && *position <= to)
position = entries.erase(position);
else
position++;
}
}

int IntegerCollector::count(int from, int to)
{
list<int>::iterator position = entries.begin();
int count = 0;

while (position != entries.end())
{
if (*position >= from && *position <= to)
count++;

position++;
}

return count;
}


The feedback mentioned that they would only hire candidates that can implement solutions with O(nlogn) complexity.







c++ collections integer






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Dec 17 at 14:15









Muhamad Gafar

865




865












  • FWIW, std::list is normally not the data structure you want to use. For the use case you have described a std::multiset sounds like what you want. Almost all the operation will be O(logN) which isn't to bad.
    – NathanOliver
    Dec 17 at 14:20






  • 5




    Yours appears to be O(n) which is better than O(nlogn)
    – Ben
    Dec 17 at 14:22






  • 1




    @ben Yes but imagine he has an IntegerCollection of n items and he wants to erase all of that by calling erase for each element. That's a linear operation for each element which makes the time complexity quadratic.
    – 0x499602D2
    Dec 17 at 14:26








  • 1




    This is a well-composed question, but I suspect that it is primarily opinion based. I'm not sure any of us could tell you what implementation your interviewer would have considered "best".
    – Drew Dormann
    Dec 17 at 14:35








  • 1




    I mean, big O is misleading here because it's a toy example, but the point is I suspect that the answer was rejected because it's naïve rather than because it's actually wrong as such. using a list to store integers is inefficient in terms of memory and in modern architectures that's likely to dominate the big O.
    – Ben
    Dec 17 at 14:35




















  • FWIW, std::list is normally not the data structure you want to use. For the use case you have described a std::multiset sounds like what you want. Almost all the operation will be O(logN) which isn't to bad.
    – NathanOliver
    Dec 17 at 14:20






  • 5




    Yours appears to be O(n) which is better than O(nlogn)
    – Ben
    Dec 17 at 14:22






  • 1




    @ben Yes but imagine he has an IntegerCollection of n items and he wants to erase all of that by calling erase for each element. That's a linear operation for each element which makes the time complexity quadratic.
    – 0x499602D2
    Dec 17 at 14:26








  • 1




    This is a well-composed question, but I suspect that it is primarily opinion based. I'm not sure any of us could tell you what implementation your interviewer would have considered "best".
    – Drew Dormann
    Dec 17 at 14:35








  • 1




    I mean, big O is misleading here because it's a toy example, but the point is I suspect that the answer was rejected because it's naïve rather than because it's actually wrong as such. using a list to store integers is inefficient in terms of memory and in modern architectures that's likely to dominate the big O.
    – Ben
    Dec 17 at 14:35


















FWIW, std::list is normally not the data structure you want to use. For the use case you have described a std::multiset sounds like what you want. Almost all the operation will be O(logN) which isn't to bad.
– NathanOliver
Dec 17 at 14:20




FWIW, std::list is normally not the data structure you want to use. For the use case you have described a std::multiset sounds like what you want. Almost all the operation will be O(logN) which isn't to bad.
– NathanOliver
Dec 17 at 14:20




5




5




Yours appears to be O(n) which is better than O(nlogn)
– Ben
Dec 17 at 14:22




Yours appears to be O(n) which is better than O(nlogn)
– Ben
Dec 17 at 14:22




1




1




@ben Yes but imagine he has an IntegerCollection of n items and he wants to erase all of that by calling erase for each element. That's a linear operation for each element which makes the time complexity quadratic.
– 0x499602D2
Dec 17 at 14:26






@ben Yes but imagine he has an IntegerCollection of n items and he wants to erase all of that by calling erase for each element. That's a linear operation for each element which makes the time complexity quadratic.
– 0x499602D2
Dec 17 at 14:26






1




1




This is a well-composed question, but I suspect that it is primarily opinion based. I'm not sure any of us could tell you what implementation your interviewer would have considered "best".
– Drew Dormann
Dec 17 at 14:35






This is a well-composed question, but I suspect that it is primarily opinion based. I'm not sure any of us could tell you what implementation your interviewer would have considered "best".
– Drew Dormann
Dec 17 at 14:35






1




1




I mean, big O is misleading here because it's a toy example, but the point is I suspect that the answer was rejected because it's naïve rather than because it's actually wrong as such. using a list to store integers is inefficient in terms of memory and in modern architectures that's likely to dominate the big O.
– Ben
Dec 17 at 14:35






I mean, big O is misleading here because it's a toy example, but the point is I suspect that the answer was rejected because it's naïve rather than because it's actually wrong as such. using a list to store integers is inefficient in terms of memory and in modern architectures that's likely to dominate the big O.
– Ben
Dec 17 at 14:35














3 Answers
3






active

oldest

votes


















21














The key consideration here is that integers of the same value are indistinguishable. Thus, all you need to do is store a count of each distinct value in the container.



Then, you can just use a std::map<int, size_t> as backing structure that maps each integer (key) to the number of times it exists in your data structure (value = count).



Inserting and erasing single elements is just incrementing and decrementing (possibly removing in the latter case) values for the given key (both O(log(distinct_values_in_container)) for finding the key).



Since std::map is ordered, you can use lower_bound and upper_bound to do binary search, so finding the keys in [from, to) is very efficient (finding the range is also O(log(distinct_values_in_container))). Erasing them or summing their counts is easy then (runtime is more complicated here).





If you want to gain extra credit, it will pay to understand the limitations of asymptotic runtimes. Consider these points:



What these asymptotic runtimes mean in practice depends a lot on the usage pattern. If no duplicates are ever inserted, we are at O(n), but you can also get arbitrarily good times (in terms of n = number of insertions) if there are lots of identical elements (for example, if each key has O(exp(n)) values then O(distinct_values_in_container) = O(log(n))). In the extreme case that all involved integers are the same, all operations are O(1).



As an interviewee, I would also talk about whether these asymptotic runtimes are meaningful in practice. It may be that the map's tree structure (which is toxic for the cache and branch predictor) loses to a simple std::vector<std::pair<int, size_t>> (if erasure is always in bulk) or even a std::vector<size_t> (if the keys are "dense") for the given application.





I think your main mistake (and why you were rejected) is not realizing that there is no need to store each inserted integer separately. You unfortunately also seem to have missed the possibility of keeping the list sorted, but I don't see where the O(n^2) comes from.






share|improve this answer































    10














    If you were being hired for a role that didn't require any previous programming experience then I would not have rejected you on that code sample alone.



    Using a std::list was an interesting choice and showed you had thought about this. The fact that you used a C++ standard library container rather than trying to build this from a lower level is a yes-hire flag for me. With your approach (1) is fast, but (2), (3), and (4) will be slow. In the absence of any other information you ought to arrange things so that reading (including querying) data is faster than writing. Your approach has this the other way round. Sometimes though that is what you want - for example when taking measurements real-time you’d want the data dump stage to be as fast as possible at the expense of anything else. For that application your solution would be difficult to beat!



    Reservations, but by no means red lines:



    An integer does not mean an int. In the absence of being able to clarify, build your class on



    template<typename Y> std::map<Y, std::size_t>


    where Y is an integral type. Note the use of std::size_t for the counter. It counts the number of times a particular Y is present.



    Include some program comments next time.



    Don't use using namespace std;. Although tutorials do for clarity, professional programmers don't.






    share|improve this answer























    • Thanks for the advice. I didn't include comments here but I included them in the code I submitted. I won't use namespace std from now on. Could you clarify what you mean by "big hire flag"? Should I have done a linked list using structs?
      – Muhamad Gafar
      Dec 17 at 15:42








    • 3




      Also, see this answer for why using namespace std; is bad practice.
      – CK.
      Dec 17 at 16:00










    • Can you qualify why using an STL container is a flag? Seems like the smart thing to do here. From the problem statement, nothing other than operational parameters are given, so why re-invent the wheel? You save the company time and write better code with this. I would only reinvent the wheel if I knew I could save resources on a limited system or make it faster for a niche use case.
      – TechnoSam
      Dec 17 at 16:23






    • 1




      @TechnoSam it says 'hire flag', that's a good thing not a bad.
      – Jared Smith
      Dec 17 at 16:31










    • Yes indeed: a flag for hire is a good thing! Sorry to have caused confusion.
      – Bathsheba
      Dec 17 at 17:15





















    4














    You should use map<int,size_t> the int is for value, the size_t is for count.



    If you need to implement the code, you should choose a Balanced Binary Tree to get to 'log N' complexity.



    so you have the following node:



    struct Node
    {
    int key;
    size_t count;
    Node *left, *right;
    size_t leftNodesCount, rightNodesCount;
    };


    leftNodesCount and rightNodesCount are ment to indicate how good is the balance, so any insertion and deletion is changing it all the way to the root. a balanced tree is when all over the tree leftNodesCount and rightNodesCount are almost equal (mean difference is not more than 1. but you can set the tolerance to some higher value like 2 or 3)



    Now you should implement Insert, Delete, and Balance methods.



    To balance a Balanced Tree, you should rotate the unbalanced nodes, Rotate left means replace node by the node's right, and add the node to left, Rotate right is the same operation in the other direction.



    Balance complicity is also 'log N'. note that after insertions and deletions you should call to balance in manner to keep the complicity of the tree something about 'log N'






    share|improve this answer























      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53817062%2fwhat-is-the-best-c-data-structure-that-could-be-used-for-storing-and-managing%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      21














      The key consideration here is that integers of the same value are indistinguishable. Thus, all you need to do is store a count of each distinct value in the container.



      Then, you can just use a std::map<int, size_t> as backing structure that maps each integer (key) to the number of times it exists in your data structure (value = count).



      Inserting and erasing single elements is just incrementing and decrementing (possibly removing in the latter case) values for the given key (both O(log(distinct_values_in_container)) for finding the key).



      Since std::map is ordered, you can use lower_bound and upper_bound to do binary search, so finding the keys in [from, to) is very efficient (finding the range is also O(log(distinct_values_in_container))). Erasing them or summing their counts is easy then (runtime is more complicated here).





      If you want to gain extra credit, it will pay to understand the limitations of asymptotic runtimes. Consider these points:



      What these asymptotic runtimes mean in practice depends a lot on the usage pattern. If no duplicates are ever inserted, we are at O(n), but you can also get arbitrarily good times (in terms of n = number of insertions) if there are lots of identical elements (for example, if each key has O(exp(n)) values then O(distinct_values_in_container) = O(log(n))). In the extreme case that all involved integers are the same, all operations are O(1).



      As an interviewee, I would also talk about whether these asymptotic runtimes are meaningful in practice. It may be that the map's tree structure (which is toxic for the cache and branch predictor) loses to a simple std::vector<std::pair<int, size_t>> (if erasure is always in bulk) or even a std::vector<size_t> (if the keys are "dense") for the given application.





      I think your main mistake (and why you were rejected) is not realizing that there is no need to store each inserted integer separately. You unfortunately also seem to have missed the possibility of keeping the list sorted, but I don't see where the O(n^2) comes from.






      share|improve this answer




























        21














        The key consideration here is that integers of the same value are indistinguishable. Thus, all you need to do is store a count of each distinct value in the container.



        Then, you can just use a std::map<int, size_t> as backing structure that maps each integer (key) to the number of times it exists in your data structure (value = count).



        Inserting and erasing single elements is just incrementing and decrementing (possibly removing in the latter case) values for the given key (both O(log(distinct_values_in_container)) for finding the key).



        Since std::map is ordered, you can use lower_bound and upper_bound to do binary search, so finding the keys in [from, to) is very efficient (finding the range is also O(log(distinct_values_in_container))). Erasing them or summing their counts is easy then (runtime is more complicated here).





        If you want to gain extra credit, it will pay to understand the limitations of asymptotic runtimes. Consider these points:



        What these asymptotic runtimes mean in practice depends a lot on the usage pattern. If no duplicates are ever inserted, we are at O(n), but you can also get arbitrarily good times (in terms of n = number of insertions) if there are lots of identical elements (for example, if each key has O(exp(n)) values then O(distinct_values_in_container) = O(log(n))). In the extreme case that all involved integers are the same, all operations are O(1).



        As an interviewee, I would also talk about whether these asymptotic runtimes are meaningful in practice. It may be that the map's tree structure (which is toxic for the cache and branch predictor) loses to a simple std::vector<std::pair<int, size_t>> (if erasure is always in bulk) or even a std::vector<size_t> (if the keys are "dense") for the given application.





        I think your main mistake (and why you were rejected) is not realizing that there is no need to store each inserted integer separately. You unfortunately also seem to have missed the possibility of keeping the list sorted, but I don't see where the O(n^2) comes from.






        share|improve this answer


























          21












          21








          21






          The key consideration here is that integers of the same value are indistinguishable. Thus, all you need to do is store a count of each distinct value in the container.



          Then, you can just use a std::map<int, size_t> as backing structure that maps each integer (key) to the number of times it exists in your data structure (value = count).



          Inserting and erasing single elements is just incrementing and decrementing (possibly removing in the latter case) values for the given key (both O(log(distinct_values_in_container)) for finding the key).



          Since std::map is ordered, you can use lower_bound and upper_bound to do binary search, so finding the keys in [from, to) is very efficient (finding the range is also O(log(distinct_values_in_container))). Erasing them or summing their counts is easy then (runtime is more complicated here).





          If you want to gain extra credit, it will pay to understand the limitations of asymptotic runtimes. Consider these points:



          What these asymptotic runtimes mean in practice depends a lot on the usage pattern. If no duplicates are ever inserted, we are at O(n), but you can also get arbitrarily good times (in terms of n = number of insertions) if there are lots of identical elements (for example, if each key has O(exp(n)) values then O(distinct_values_in_container) = O(log(n))). In the extreme case that all involved integers are the same, all operations are O(1).



          As an interviewee, I would also talk about whether these asymptotic runtimes are meaningful in practice. It may be that the map's tree structure (which is toxic for the cache and branch predictor) loses to a simple std::vector<std::pair<int, size_t>> (if erasure is always in bulk) or even a std::vector<size_t> (if the keys are "dense") for the given application.





          I think your main mistake (and why you were rejected) is not realizing that there is no need to store each inserted integer separately. You unfortunately also seem to have missed the possibility of keeping the list sorted, but I don't see where the O(n^2) comes from.






          share|improve this answer














          The key consideration here is that integers of the same value are indistinguishable. Thus, all you need to do is store a count of each distinct value in the container.



          Then, you can just use a std::map<int, size_t> as backing structure that maps each integer (key) to the number of times it exists in your data structure (value = count).



          Inserting and erasing single elements is just incrementing and decrementing (possibly removing in the latter case) values for the given key (both O(log(distinct_values_in_container)) for finding the key).



          Since std::map is ordered, you can use lower_bound and upper_bound to do binary search, so finding the keys in [from, to) is very efficient (finding the range is also O(log(distinct_values_in_container))). Erasing them or summing their counts is easy then (runtime is more complicated here).





          If you want to gain extra credit, it will pay to understand the limitations of asymptotic runtimes. Consider these points:



          What these asymptotic runtimes mean in practice depends a lot on the usage pattern. If no duplicates are ever inserted, we are at O(n), but you can also get arbitrarily good times (in terms of n = number of insertions) if there are lots of identical elements (for example, if each key has O(exp(n)) values then O(distinct_values_in_container) = O(log(n))). In the extreme case that all involved integers are the same, all operations are O(1).



          As an interviewee, I would also talk about whether these asymptotic runtimes are meaningful in practice. It may be that the map's tree structure (which is toxic for the cache and branch predictor) loses to a simple std::vector<std::pair<int, size_t>> (if erasure is always in bulk) or even a std::vector<size_t> (if the keys are "dense") for the given application.





          I think your main mistake (and why you were rejected) is not realizing that there is no need to store each inserted integer separately. You unfortunately also seem to have missed the possibility of keeping the list sorted, but I don't see where the O(n^2) comes from.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Dec 19 at 5:41









          T.C.

          106k13216321




          106k13216321










          answered Dec 17 at 14:40









          Max Langhof

          8,8071436




          8,8071436

























              10














              If you were being hired for a role that didn't require any previous programming experience then I would not have rejected you on that code sample alone.



              Using a std::list was an interesting choice and showed you had thought about this. The fact that you used a C++ standard library container rather than trying to build this from a lower level is a yes-hire flag for me. With your approach (1) is fast, but (2), (3), and (4) will be slow. In the absence of any other information you ought to arrange things so that reading (including querying) data is faster than writing. Your approach has this the other way round. Sometimes though that is what you want - for example when taking measurements real-time you’d want the data dump stage to be as fast as possible at the expense of anything else. For that application your solution would be difficult to beat!



              Reservations, but by no means red lines:



              An integer does not mean an int. In the absence of being able to clarify, build your class on



              template<typename Y> std::map<Y, std::size_t>


              where Y is an integral type. Note the use of std::size_t for the counter. It counts the number of times a particular Y is present.



              Include some program comments next time.



              Don't use using namespace std;. Although tutorials do for clarity, professional programmers don't.






              share|improve this answer























              • Thanks for the advice. I didn't include comments here but I included them in the code I submitted. I won't use namespace std from now on. Could you clarify what you mean by "big hire flag"? Should I have done a linked list using structs?
                – Muhamad Gafar
                Dec 17 at 15:42








              • 3




                Also, see this answer for why using namespace std; is bad practice.
                – CK.
                Dec 17 at 16:00










              • Can you qualify why using an STL container is a flag? Seems like the smart thing to do here. From the problem statement, nothing other than operational parameters are given, so why re-invent the wheel? You save the company time and write better code with this. I would only reinvent the wheel if I knew I could save resources on a limited system or make it faster for a niche use case.
                – TechnoSam
                Dec 17 at 16:23






              • 1




                @TechnoSam it says 'hire flag', that's a good thing not a bad.
                – Jared Smith
                Dec 17 at 16:31










              • Yes indeed: a flag for hire is a good thing! Sorry to have caused confusion.
                – Bathsheba
                Dec 17 at 17:15


















              10














              If you were being hired for a role that didn't require any previous programming experience then I would not have rejected you on that code sample alone.



              Using a std::list was an interesting choice and showed you had thought about this. The fact that you used a C++ standard library container rather than trying to build this from a lower level is a yes-hire flag for me. With your approach (1) is fast, but (2), (3), and (4) will be slow. In the absence of any other information you ought to arrange things so that reading (including querying) data is faster than writing. Your approach has this the other way round. Sometimes though that is what you want - for example when taking measurements real-time you’d want the data dump stage to be as fast as possible at the expense of anything else. For that application your solution would be difficult to beat!



              Reservations, but by no means red lines:



              An integer does not mean an int. In the absence of being able to clarify, build your class on



              template<typename Y> std::map<Y, std::size_t>


              where Y is an integral type. Note the use of std::size_t for the counter. It counts the number of times a particular Y is present.



              Include some program comments next time.



              Don't use using namespace std;. Although tutorials do for clarity, professional programmers don't.






              share|improve this answer























              • Thanks for the advice. I didn't include comments here but I included them in the code I submitted. I won't use namespace std from now on. Could you clarify what you mean by "big hire flag"? Should I have done a linked list using structs?
                – Muhamad Gafar
                Dec 17 at 15:42








              • 3




                Also, see this answer for why using namespace std; is bad practice.
                – CK.
                Dec 17 at 16:00










              • Can you qualify why using an STL container is a flag? Seems like the smart thing to do here. From the problem statement, nothing other than operational parameters are given, so why re-invent the wheel? You save the company time and write better code with this. I would only reinvent the wheel if I knew I could save resources on a limited system or make it faster for a niche use case.
                – TechnoSam
                Dec 17 at 16:23






              • 1




                @TechnoSam it says 'hire flag', that's a good thing not a bad.
                – Jared Smith
                Dec 17 at 16:31










              • Yes indeed: a flag for hire is a good thing! Sorry to have caused confusion.
                – Bathsheba
                Dec 17 at 17:15
















              10












              10








              10






              If you were being hired for a role that didn't require any previous programming experience then I would not have rejected you on that code sample alone.



              Using a std::list was an interesting choice and showed you had thought about this. The fact that you used a C++ standard library container rather than trying to build this from a lower level is a yes-hire flag for me. With your approach (1) is fast, but (2), (3), and (4) will be slow. In the absence of any other information you ought to arrange things so that reading (including querying) data is faster than writing. Your approach has this the other way round. Sometimes though that is what you want - for example when taking measurements real-time you’d want the data dump stage to be as fast as possible at the expense of anything else. For that application your solution would be difficult to beat!



              Reservations, but by no means red lines:



              An integer does not mean an int. In the absence of being able to clarify, build your class on



              template<typename Y> std::map<Y, std::size_t>


              where Y is an integral type. Note the use of std::size_t for the counter. It counts the number of times a particular Y is present.



              Include some program comments next time.



              Don't use using namespace std;. Although tutorials do for clarity, professional programmers don't.






              share|improve this answer














              If you were being hired for a role that didn't require any previous programming experience then I would not have rejected you on that code sample alone.



              Using a std::list was an interesting choice and showed you had thought about this. The fact that you used a C++ standard library container rather than trying to build this from a lower level is a yes-hire flag for me. With your approach (1) is fast, but (2), (3), and (4) will be slow. In the absence of any other information you ought to arrange things so that reading (including querying) data is faster than writing. Your approach has this the other way round. Sometimes though that is what you want - for example when taking measurements real-time you’d want the data dump stage to be as fast as possible at the expense of anything else. For that application your solution would be difficult to beat!



              Reservations, but by no means red lines:



              An integer does not mean an int. In the absence of being able to clarify, build your class on



              template<typename Y> std::map<Y, std::size_t>


              where Y is an integral type. Note the use of std::size_t for the counter. It counts the number of times a particular Y is present.



              Include some program comments next time.



              Don't use using namespace std;. Although tutorials do for clarity, professional programmers don't.







              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Dec 17 at 17:22

























              answered Dec 17 at 15:38









              Bathsheba

              174k27247370




              174k27247370












              • Thanks for the advice. I didn't include comments here but I included them in the code I submitted. I won't use namespace std from now on. Could you clarify what you mean by "big hire flag"? Should I have done a linked list using structs?
                – Muhamad Gafar
                Dec 17 at 15:42








              • 3




                Also, see this answer for why using namespace std; is bad practice.
                – CK.
                Dec 17 at 16:00










              • Can you qualify why using an STL container is a flag? Seems like the smart thing to do here. From the problem statement, nothing other than operational parameters are given, so why re-invent the wheel? You save the company time and write better code with this. I would only reinvent the wheel if I knew I could save resources on a limited system or make it faster for a niche use case.
                – TechnoSam
                Dec 17 at 16:23






              • 1




                @TechnoSam it says 'hire flag', that's a good thing not a bad.
                – Jared Smith
                Dec 17 at 16:31










              • Yes indeed: a flag for hire is a good thing! Sorry to have caused confusion.
                – Bathsheba
                Dec 17 at 17:15




















              • Thanks for the advice. I didn't include comments here but I included them in the code I submitted. I won't use namespace std from now on. Could you clarify what you mean by "big hire flag"? Should I have done a linked list using structs?
                – Muhamad Gafar
                Dec 17 at 15:42








              • 3




                Also, see this answer for why using namespace std; is bad practice.
                – CK.
                Dec 17 at 16:00










              • Can you qualify why using an STL container is a flag? Seems like the smart thing to do here. From the problem statement, nothing other than operational parameters are given, so why re-invent the wheel? You save the company time and write better code with this. I would only reinvent the wheel if I knew I could save resources on a limited system or make it faster for a niche use case.
                – TechnoSam
                Dec 17 at 16:23






              • 1




                @TechnoSam it says 'hire flag', that's a good thing not a bad.
                – Jared Smith
                Dec 17 at 16:31










              • Yes indeed: a flag for hire is a good thing! Sorry to have caused confusion.
                – Bathsheba
                Dec 17 at 17:15


















              Thanks for the advice. I didn't include comments here but I included them in the code I submitted. I won't use namespace std from now on. Could you clarify what you mean by "big hire flag"? Should I have done a linked list using structs?
              – Muhamad Gafar
              Dec 17 at 15:42






              Thanks for the advice. I didn't include comments here but I included them in the code I submitted. I won't use namespace std from now on. Could you clarify what you mean by "big hire flag"? Should I have done a linked list using structs?
              – Muhamad Gafar
              Dec 17 at 15:42






              3




              3




              Also, see this answer for why using namespace std; is bad practice.
              – CK.
              Dec 17 at 16:00




              Also, see this answer for why using namespace std; is bad practice.
              – CK.
              Dec 17 at 16:00












              Can you qualify why using an STL container is a flag? Seems like the smart thing to do here. From the problem statement, nothing other than operational parameters are given, so why re-invent the wheel? You save the company time and write better code with this. I would only reinvent the wheel if I knew I could save resources on a limited system or make it faster for a niche use case.
              – TechnoSam
              Dec 17 at 16:23




              Can you qualify why using an STL container is a flag? Seems like the smart thing to do here. From the problem statement, nothing other than operational parameters are given, so why re-invent the wheel? You save the company time and write better code with this. I would only reinvent the wheel if I knew I could save resources on a limited system or make it faster for a niche use case.
              – TechnoSam
              Dec 17 at 16:23




              1




              1




              @TechnoSam it says 'hire flag', that's a good thing not a bad.
              – Jared Smith
              Dec 17 at 16:31




              @TechnoSam it says 'hire flag', that's a good thing not a bad.
              – Jared Smith
              Dec 17 at 16:31












              Yes indeed: a flag for hire is a good thing! Sorry to have caused confusion.
              – Bathsheba
              Dec 17 at 17:15






              Yes indeed: a flag for hire is a good thing! Sorry to have caused confusion.
              – Bathsheba
              Dec 17 at 17:15













              4














              You should use map<int,size_t> the int is for value, the size_t is for count.



              If you need to implement the code, you should choose a Balanced Binary Tree to get to 'log N' complexity.



              so you have the following node:



              struct Node
              {
              int key;
              size_t count;
              Node *left, *right;
              size_t leftNodesCount, rightNodesCount;
              };


              leftNodesCount and rightNodesCount are ment to indicate how good is the balance, so any insertion and deletion is changing it all the way to the root. a balanced tree is when all over the tree leftNodesCount and rightNodesCount are almost equal (mean difference is not more than 1. but you can set the tolerance to some higher value like 2 or 3)



              Now you should implement Insert, Delete, and Balance methods.



              To balance a Balanced Tree, you should rotate the unbalanced nodes, Rotate left means replace node by the node's right, and add the node to left, Rotate right is the same operation in the other direction.



              Balance complicity is also 'log N'. note that after insertions and deletions you should call to balance in manner to keep the complicity of the tree something about 'log N'






              share|improve this answer




























                4














                You should use map<int,size_t> the int is for value, the size_t is for count.



                If you need to implement the code, you should choose a Balanced Binary Tree to get to 'log N' complexity.



                so you have the following node:



                struct Node
                {
                int key;
                size_t count;
                Node *left, *right;
                size_t leftNodesCount, rightNodesCount;
                };


                leftNodesCount and rightNodesCount are ment to indicate how good is the balance, so any insertion and deletion is changing it all the way to the root. a balanced tree is when all over the tree leftNodesCount and rightNodesCount are almost equal (mean difference is not more than 1. but you can set the tolerance to some higher value like 2 or 3)



                Now you should implement Insert, Delete, and Balance methods.



                To balance a Balanced Tree, you should rotate the unbalanced nodes, Rotate left means replace node by the node's right, and add the node to left, Rotate right is the same operation in the other direction.



                Balance complicity is also 'log N'. note that after insertions and deletions you should call to balance in manner to keep the complicity of the tree something about 'log N'






                share|improve this answer


























                  4












                  4








                  4






                  You should use map<int,size_t> the int is for value, the size_t is for count.



                  If you need to implement the code, you should choose a Balanced Binary Tree to get to 'log N' complexity.



                  so you have the following node:



                  struct Node
                  {
                  int key;
                  size_t count;
                  Node *left, *right;
                  size_t leftNodesCount, rightNodesCount;
                  };


                  leftNodesCount and rightNodesCount are ment to indicate how good is the balance, so any insertion and deletion is changing it all the way to the root. a balanced tree is when all over the tree leftNodesCount and rightNodesCount are almost equal (mean difference is not more than 1. but you can set the tolerance to some higher value like 2 or 3)



                  Now you should implement Insert, Delete, and Balance methods.



                  To balance a Balanced Tree, you should rotate the unbalanced nodes, Rotate left means replace node by the node's right, and add the node to left, Rotate right is the same operation in the other direction.



                  Balance complicity is also 'log N'. note that after insertions and deletions you should call to balance in manner to keep the complicity of the tree something about 'log N'






                  share|improve this answer














                  You should use map<int,size_t> the int is for value, the size_t is for count.



                  If you need to implement the code, you should choose a Balanced Binary Tree to get to 'log N' complexity.



                  so you have the following node:



                  struct Node
                  {
                  int key;
                  size_t count;
                  Node *left, *right;
                  size_t leftNodesCount, rightNodesCount;
                  };


                  leftNodesCount and rightNodesCount are ment to indicate how good is the balance, so any insertion and deletion is changing it all the way to the root. a balanced tree is when all over the tree leftNodesCount and rightNodesCount are almost equal (mean difference is not more than 1. but you can set the tolerance to some higher value like 2 or 3)



                  Now you should implement Insert, Delete, and Balance methods.



                  To balance a Balanced Tree, you should rotate the unbalanced nodes, Rotate left means replace node by the node's right, and add the node to left, Rotate right is the same operation in the other direction.



                  Balance complicity is also 'log N'. note that after insertions and deletions you should call to balance in manner to keep the complicity of the tree something about 'log N'







                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Dec 17 at 14:58

























                  answered Dec 17 at 14:52









                  SHR

                  5,73552341




                  5,73552341






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.





                      Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                      Please pay close attention to the following guidance:


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53817062%2fwhat-is-the-best-c-data-structure-that-could-be-used-for-storing-and-managing%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      Morgemoulin

                      Scott Moir

                      Souastre