Getting first n unique elements from Python list












35














I have a python list where elements can repeat.



>>> a = [1,2,2,3,3,4,5,6]


I want to get the first n unique elements from the list.
So, in this case, if i want the first 5 unique elements, they would be:



[1,2,3,4,5]


I have come up with a solution using generators:



def iterate(itr, upper=5):

count = 0
for index, element in enumerate(itr):
if index==0:
count += 1
yield element

elif element not in itr[:index] and count<upper:
count += 1
yield element


In use:



>>> i = iterate(a, 5)
>>> [e for e in i]
[1,2,3,4,5]


I have doubts on this being the most optimal solution. Is there an alternative strategy that i can implement to write it in a more pythonic and efficient
way?










share|improve this question




















  • 5




    Try: set(a)[:n]
    – Tony Pellerin
    Dec 21 '18 at 16:11






  • 9




    @TonyPellerin does not guarantee you get the first 5 elements
    – juanpa.arrivillaga
    Dec 21 '18 at 16:14






  • 2




    Your code is Pythonic enough, it is just inefficient. element not in itr[:index] is not efficient, use a set
    – juanpa.arrivillaga
    Dec 21 '18 at 16:16








  • 2




    Is the list always sorted?
    – user8408080
    Dec 21 '18 at 16:16






  • 5




    for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
    – Azat Ibrakov
    Dec 21 '18 at 16:49
















35














I have a python list where elements can repeat.



>>> a = [1,2,2,3,3,4,5,6]


I want to get the first n unique elements from the list.
So, in this case, if i want the first 5 unique elements, they would be:



[1,2,3,4,5]


I have come up with a solution using generators:



def iterate(itr, upper=5):

count = 0
for index, element in enumerate(itr):
if index==0:
count += 1
yield element

elif element not in itr[:index] and count<upper:
count += 1
yield element


In use:



>>> i = iterate(a, 5)
>>> [e for e in i]
[1,2,3,4,5]


I have doubts on this being the most optimal solution. Is there an alternative strategy that i can implement to write it in a more pythonic and efficient
way?










share|improve this question




















  • 5




    Try: set(a)[:n]
    – Tony Pellerin
    Dec 21 '18 at 16:11






  • 9




    @TonyPellerin does not guarantee you get the first 5 elements
    – juanpa.arrivillaga
    Dec 21 '18 at 16:14






  • 2




    Your code is Pythonic enough, it is just inefficient. element not in itr[:index] is not efficient, use a set
    – juanpa.arrivillaga
    Dec 21 '18 at 16:16








  • 2




    Is the list always sorted?
    – user8408080
    Dec 21 '18 at 16:16






  • 5




    for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
    – Azat Ibrakov
    Dec 21 '18 at 16:49














35












35








35


3





I have a python list where elements can repeat.



>>> a = [1,2,2,3,3,4,5,6]


I want to get the first n unique elements from the list.
So, in this case, if i want the first 5 unique elements, they would be:



[1,2,3,4,5]


I have come up with a solution using generators:



def iterate(itr, upper=5):

count = 0
for index, element in enumerate(itr):
if index==0:
count += 1
yield element

elif element not in itr[:index] and count<upper:
count += 1
yield element


In use:



>>> i = iterate(a, 5)
>>> [e for e in i]
[1,2,3,4,5]


I have doubts on this being the most optimal solution. Is there an alternative strategy that i can implement to write it in a more pythonic and efficient
way?










share|improve this question















I have a python list where elements can repeat.



>>> a = [1,2,2,3,3,4,5,6]


I want to get the first n unique elements from the list.
So, in this case, if i want the first 5 unique elements, they would be:



[1,2,3,4,5]


I have come up with a solution using generators:



def iterate(itr, upper=5):

count = 0
for index, element in enumerate(itr):
if index==0:
count += 1
yield element

elif element not in itr[:index] and count<upper:
count += 1
yield element


In use:



>>> i = iterate(a, 5)
>>> [e for e in i]
[1,2,3,4,5]


I have doubts on this being the most optimal solution. Is there an alternative strategy that i can implement to write it in a more pythonic and efficient
way?







python python-3.x generator






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Dec 27 '18 at 11:59









jorijnsmit

544320




544320










asked Dec 21 '18 at 16:10









xssChauhan

1,361925




1,361925








  • 5




    Try: set(a)[:n]
    – Tony Pellerin
    Dec 21 '18 at 16:11






  • 9




    @TonyPellerin does not guarantee you get the first 5 elements
    – juanpa.arrivillaga
    Dec 21 '18 at 16:14






  • 2




    Your code is Pythonic enough, it is just inefficient. element not in itr[:index] is not efficient, use a set
    – juanpa.arrivillaga
    Dec 21 '18 at 16:16








  • 2




    Is the list always sorted?
    – user8408080
    Dec 21 '18 at 16:16






  • 5




    for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
    – Azat Ibrakov
    Dec 21 '18 at 16:49














  • 5




    Try: set(a)[:n]
    – Tony Pellerin
    Dec 21 '18 at 16:11






  • 9




    @TonyPellerin does not guarantee you get the first 5 elements
    – juanpa.arrivillaga
    Dec 21 '18 at 16:14






  • 2




    Your code is Pythonic enough, it is just inefficient. element not in itr[:index] is not efficient, use a set
    – juanpa.arrivillaga
    Dec 21 '18 at 16:16








  • 2




    Is the list always sorted?
    – user8408080
    Dec 21 '18 at 16:16






  • 5




    for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
    – Azat Ibrakov
    Dec 21 '18 at 16:49








5




5




Try: set(a)[:n]
– Tony Pellerin
Dec 21 '18 at 16:11




Try: set(a)[:n]
– Tony Pellerin
Dec 21 '18 at 16:11




9




9




@TonyPellerin does not guarantee you get the first 5 elements
– juanpa.arrivillaga
Dec 21 '18 at 16:14




@TonyPellerin does not guarantee you get the first 5 elements
– juanpa.arrivillaga
Dec 21 '18 at 16:14




2




2




Your code is Pythonic enough, it is just inefficient. element not in itr[:index] is not efficient, use a set
– juanpa.arrivillaga
Dec 21 '18 at 16:16






Your code is Pythonic enough, it is just inefficient. element not in itr[:index] is not efficient, use a set
– juanpa.arrivillaga
Dec 21 '18 at 16:16






2




2




Is the list always sorted?
– user8408080
Dec 21 '18 at 16:16




Is the list always sorted?
– user8408080
Dec 21 '18 at 16:16




5




5




for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
– Azat Ibrakov
Dec 21 '18 at 16:49




for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
– Azat Ibrakov
Dec 21 '18 at 16:49












10 Answers
10






active

oldest

votes


















32














I would use a set to remember what was seen and return from the generator when you have seen enough:



a = [1,2,2,3,3,4,5,6]

def get_unique_N(iterable, N):
"""Yields (in order) the first N unique elements of iterable.
Might yield less if data too short."""
seen = set()
for e in iterable:
if e in seen:
continue
seen.add(e)
yield e
if len(seen) == N:
return

k = get_unique_N([1,2,2,3,3,4,5,6], 4)
print(list(k))


Output:



[1,2,3,4]




According to PEP-479 you should return from generators, not raise StopIteration - thanks to @khelwood & @iBug for that piece of comment - one never learns out.



With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration





Your solution using elif element not in itr[:index] and count<upper: uses O(k) lookups - with k being the length of the slice - using a set reduces this to O(1) lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.



Consider [1,2,3,4,4,4,4,5] vs [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]:



For 6 uniques (in longer list):




  • you would have lookups of O(1)+O(2)+...+O(5001)

  • mine would have 5001*O(1) lookup + memory for set( {1,2,3,4,5,6})






share|improve this answer



















  • 5




    You can just return. You don't need to raise anything.
    – khelwood
    Dec 21 '18 at 16:15






  • 1




    See PEP 479.
    – khelwood
    Dec 21 '18 at 16:17








  • 1




    Instead of if e in seen: continue, yield e and return, you could also just return list(seen) at the end.
    – mkrieger1
    Dec 21 '18 at 16:19








  • 2




    @mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
    – khelwood
    Dec 21 '18 at 16:20






  • 2




    yielding in order :) list(set) not
    – Patrick Artner
    Dec 21 '18 at 16:22



















19














You can adapt the popular itertools unique_everseen recipe:



def unique_everseen_limit(iterable, limit=5):
seen = set()
seen_add = seen.add
for element in iterable:
if element not in seen:
seen_add(element)
yield element
if len(seen) == limit:
break

a = [1,2,2,3,3,4,5,6]

res = list(unique_everseen_limit(a)) # [1, 2, 3, 4, 5]


Alternatively, as suggested by @Chris_Rands, you can use itertools.islice to extract a fixed number of values from a non-limited generator:



from itertools import islice

def unique_everseen(iterable):
seen = set()
seen_add = seen.add
for element in iterable:
if element not in seen:
seen_add(element)
yield element

res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]


Note the unique_everseen recipe is available in 3rd party libraries via more_itertools.unique_everseen or toolz.unique, so you could use:



from itertools import islice
from more_itertools import unique_everseen
from toolz import unique

res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
res = list(islice(unique(a), 5)) # [1, 2, 3, 4, 5]





share|improve this answer



















  • 1




    The alternative would be creating an infinite generator and then itertools.islice(gen, limit)
    – Chris_Rands
    Dec 21 '18 at 16:33










  • Why not drop line 3 in you first block of code and do seen.add(element) instead?
    – jorijnsmit
    Dec 27 '18 at 11:04












  • @jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
    – jpp
    Dec 27 '18 at 11:28



















7














If your objects are hashable (ints are hashable) you can write utility function using fromkeys method of collections.OrderedDict class (or starting from Python3.7 a plain dict, since they became officially ordered) like



from collections import OrderedDict


def nub(iterable):
"""Returns unique elements preserving order."""
return OrderedDict.fromkeys(iterable).keys()


and then implementation of iterate can be simplified to



from itertools import islice


def iterate(itr, upper=5):
return islice(nub(itr), upper)


or if you want always a list as an output



def iterate(itr, upper=5):
return list(nub(itr))[:upper]


Improvements



As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub utility in a form of generator like others already did:



def nub(iterable):
seen = set()
add_seen = seen.add
for element in iterable:
if element in seen:
continue
yield element
add_seen(element)





share|improve this answer



















  • 1




    I was thinking of this, definitely short, but it is O(N)
    – Chris_Rands
    Dec 21 '18 at 16:21










  • @Chris_Rands: updated
    – Azat Ibrakov
    Dec 21 '18 at 16:38



















5














You can use OrderedDict or, since Python 3.7, an ordinary dict, since they are implemented to preserve the insertion order. Note that this won't work with sets.



N = 3
a = [1, 2, 2, 3, 3, 3, 4]
d = {x: True for x in a}
list(d.keys())[:N]





share|improve this answer



















  • 1




    In 3.6 order-preserving dicts were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
    – glibdud
    Dec 21 '18 at 16:32





















5














Here is a Pythonic approach using itertools.takewhile():



In [95]: from itertools import takewhile

In [96]: seen = set()

In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))
Out[97]: {1, 2, 3, 4}





share|improve this answer

















  • 4




    By which definition is this abuse of the or operator considered Pythonic?
    – cdlane
    Dec 21 '18 at 16:42






  • 2




    @cdlane By the definition in which this use of or is misuse.
    – Kasrâmvd
    Dec 21 '18 at 16:47










  • I think a proper function should be used instead of a lambda. Here the seen.add is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But the predicate function should be more explicit.
    – xssChauhan
    Dec 21 '18 at 16:50








  • 3




    We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
    – cdlane
    Dec 21 '18 at 17:08






  • 1




    I disagree this is Pythonic, seen.add or len(seen) <= 4 shouldn't be used in a function like takewhile, for the smae reasons you wouldn't use it in map or filter
    – juanpa.arrivillaga
    Dec 21 '18 at 17:38



















4














Using set with sorted+ key



sorted(set(a), key=list(a).index)[:5]
Out[136]: [1, 2, 3, 4, 5]





share|improve this answer

















  • 1




    This is inefficient.
    – juanpa.arrivillaga
    Dec 21 '18 at 16:25






  • 3




    @xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
    – juanpa.arrivillaga
    Dec 21 '18 at 16:29



















3














Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby function in itertools:



from itertools import groupby, islice

def first_unique(data, upper):
return islice((key for (key, _) in groupby(data)), 0, upper)

a = [1, 2, 2, 3, 3, 4, 5, 6]

print(list(first_unique(a, 5)))


Updated to use islice instead of enumerate per @juanpa.arrivillaga. You don't even need a set to keep track of duplicates.






share|improve this answer























  • You might as well use islice
    – juanpa.arrivillaga
    Dec 21 '18 at 17:39










  • So groupby retains order, nice, but is it an implementation detail or a feature?
    – kubanczyk
    Dec 22 '18 at 9:39










  • @kubanczyk, yes groupby is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted, groupby wouldn't work for this problem. However, groupy can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
    – cdlane
    Dec 22 '18 at 22:15



















2














There are really amazing answers for this question, which are fast, compact and brilliant! The reason I am putting here this code is that I believe there are plenty of cases when you don't care about 1 microsecond time loose nor you want additional libraries in your code for one-time solving a simple task.



a = [1,2,2,3,3,4,5,6]
res =
for x in a:
if x not in res: # yes, not optimal, but doesnt need additional dict
res.append(x)
if len(res) == 5:
break
print(res)





share|improve this answer



















  • 2




    Use set rather than list for O(1) lookup.
    – jpp
    Dec 21 '18 at 16:20






  • 3




    @teng ... inefficient.
    – juanpa.arrivillaga
    Dec 21 '18 at 16:20






  • 1




    @teng similarly inefficient.
    – juanpa.arrivillaga
    Dec 21 '18 at 16:23






  • 1




    in is no easier to read than in {}, bad justification
    – Chris_Rands
    Dec 21 '18 at 16:25






  • 1




    @grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
    – juanpa.arrivillaga
    Dec 21 '18 at 16:30





















2














Given



import itertools as it


a = [1, 2, 2, 3, 3, 4, 5, 6]


Code



A simple list comprehension (similar to @cdlane's answer).



[k for k, _ in it.groupby(a)][:5]
# [1, 2, 3, 4, 5]


Alternatively, in Python 3.6+:



list(dict.fromkeys(a))[:5]
# [1, 2, 3, 4, 5]





share|improve this answer































    1














    Why not use something like this?



    >>> a = [1, 2, 2, 3, 3, 4, 5, 6]
    >>> list(set(a))[:5]
    [1, 2, 3, 4, 5]





    share|improve this answer





















    • If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
      – pylang
      Dec 25 '18 at 2:17










    • This is wrong as it may or may not return the first five unique elements.
      – Baum mit Augen
      yesterday











    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53887803%2fgetting-first-n-unique-elements-from-python-list%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    10 Answers
    10






    active

    oldest

    votes








    10 Answers
    10






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    32














    I would use a set to remember what was seen and return from the generator when you have seen enough:



    a = [1,2,2,3,3,4,5,6]

    def get_unique_N(iterable, N):
    """Yields (in order) the first N unique elements of iterable.
    Might yield less if data too short."""
    seen = set()
    for e in iterable:
    if e in seen:
    continue
    seen.add(e)
    yield e
    if len(seen) == N:
    return

    k = get_unique_N([1,2,2,3,3,4,5,6], 4)
    print(list(k))


    Output:



    [1,2,3,4]




    According to PEP-479 you should return from generators, not raise StopIteration - thanks to @khelwood & @iBug for that piece of comment - one never learns out.



    With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration





    Your solution using elif element not in itr[:index] and count<upper: uses O(k) lookups - with k being the length of the slice - using a set reduces this to O(1) lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.



    Consider [1,2,3,4,4,4,4,5] vs [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]:



    For 6 uniques (in longer list):




    • you would have lookups of O(1)+O(2)+...+O(5001)

    • mine would have 5001*O(1) lookup + memory for set( {1,2,3,4,5,6})






    share|improve this answer



















    • 5




      You can just return. You don't need to raise anything.
      – khelwood
      Dec 21 '18 at 16:15






    • 1




      See PEP 479.
      – khelwood
      Dec 21 '18 at 16:17








    • 1




      Instead of if e in seen: continue, yield e and return, you could also just return list(seen) at the end.
      – mkrieger1
      Dec 21 '18 at 16:19








    • 2




      @mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
      – khelwood
      Dec 21 '18 at 16:20






    • 2




      yielding in order :) list(set) not
      – Patrick Artner
      Dec 21 '18 at 16:22
















    32














    I would use a set to remember what was seen and return from the generator when you have seen enough:



    a = [1,2,2,3,3,4,5,6]

    def get_unique_N(iterable, N):
    """Yields (in order) the first N unique elements of iterable.
    Might yield less if data too short."""
    seen = set()
    for e in iterable:
    if e in seen:
    continue
    seen.add(e)
    yield e
    if len(seen) == N:
    return

    k = get_unique_N([1,2,2,3,3,4,5,6], 4)
    print(list(k))


    Output:



    [1,2,3,4]




    According to PEP-479 you should return from generators, not raise StopIteration - thanks to @khelwood & @iBug for that piece of comment - one never learns out.



    With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration





    Your solution using elif element not in itr[:index] and count<upper: uses O(k) lookups - with k being the length of the slice - using a set reduces this to O(1) lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.



    Consider [1,2,3,4,4,4,4,5] vs [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]:



    For 6 uniques (in longer list):




    • you would have lookups of O(1)+O(2)+...+O(5001)

    • mine would have 5001*O(1) lookup + memory for set( {1,2,3,4,5,6})






    share|improve this answer



















    • 5




      You can just return. You don't need to raise anything.
      – khelwood
      Dec 21 '18 at 16:15






    • 1




      See PEP 479.
      – khelwood
      Dec 21 '18 at 16:17








    • 1




      Instead of if e in seen: continue, yield e and return, you could also just return list(seen) at the end.
      – mkrieger1
      Dec 21 '18 at 16:19








    • 2




      @mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
      – khelwood
      Dec 21 '18 at 16:20






    • 2




      yielding in order :) list(set) not
      – Patrick Artner
      Dec 21 '18 at 16:22














    32












    32








    32






    I would use a set to remember what was seen and return from the generator when you have seen enough:



    a = [1,2,2,3,3,4,5,6]

    def get_unique_N(iterable, N):
    """Yields (in order) the first N unique elements of iterable.
    Might yield less if data too short."""
    seen = set()
    for e in iterable:
    if e in seen:
    continue
    seen.add(e)
    yield e
    if len(seen) == N:
    return

    k = get_unique_N([1,2,2,3,3,4,5,6], 4)
    print(list(k))


    Output:



    [1,2,3,4]




    According to PEP-479 you should return from generators, not raise StopIteration - thanks to @khelwood & @iBug for that piece of comment - one never learns out.



    With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration





    Your solution using elif element not in itr[:index] and count<upper: uses O(k) lookups - with k being the length of the slice - using a set reduces this to O(1) lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.



    Consider [1,2,3,4,4,4,4,5] vs [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]:



    For 6 uniques (in longer list):




    • you would have lookups of O(1)+O(2)+...+O(5001)

    • mine would have 5001*O(1) lookup + memory for set( {1,2,3,4,5,6})






    share|improve this answer














    I would use a set to remember what was seen and return from the generator when you have seen enough:



    a = [1,2,2,3,3,4,5,6]

    def get_unique_N(iterable, N):
    """Yields (in order) the first N unique elements of iterable.
    Might yield less if data too short."""
    seen = set()
    for e in iterable:
    if e in seen:
    continue
    seen.add(e)
    yield e
    if len(seen) == N:
    return

    k = get_unique_N([1,2,2,3,3,4,5,6], 4)
    print(list(k))


    Output:



    [1,2,3,4]




    According to PEP-479 you should return from generators, not raise StopIteration - thanks to @khelwood & @iBug for that piece of comment - one never learns out.



    With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration





    Your solution using elif element not in itr[:index] and count<upper: uses O(k) lookups - with k being the length of the slice - using a set reduces this to O(1) lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.



    Consider [1,2,3,4,4,4,4,5] vs [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]:



    For 6 uniques (in longer list):




    • you would have lookups of O(1)+O(2)+...+O(5001)

    • mine would have 5001*O(1) lookup + memory for set( {1,2,3,4,5,6})







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Dec 21 '18 at 16:42

























    answered Dec 21 '18 at 16:14









    Patrick Artner

    21.7k62143




    21.7k62143








    • 5




      You can just return. You don't need to raise anything.
      – khelwood
      Dec 21 '18 at 16:15






    • 1




      See PEP 479.
      – khelwood
      Dec 21 '18 at 16:17








    • 1




      Instead of if e in seen: continue, yield e and return, you could also just return list(seen) at the end.
      – mkrieger1
      Dec 21 '18 at 16:19








    • 2




      @mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
      – khelwood
      Dec 21 '18 at 16:20






    • 2




      yielding in order :) list(set) not
      – Patrick Artner
      Dec 21 '18 at 16:22














    • 5




      You can just return. You don't need to raise anything.
      – khelwood
      Dec 21 '18 at 16:15






    • 1




      See PEP 479.
      – khelwood
      Dec 21 '18 at 16:17








    • 1




      Instead of if e in seen: continue, yield e and return, you could also just return list(seen) at the end.
      – mkrieger1
      Dec 21 '18 at 16:19








    • 2




      @mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
      – khelwood
      Dec 21 '18 at 16:20






    • 2




      yielding in order :) list(set) not
      – Patrick Artner
      Dec 21 '18 at 16:22








    5




    5




    You can just return. You don't need to raise anything.
    – khelwood
    Dec 21 '18 at 16:15




    You can just return. You don't need to raise anything.
    – khelwood
    Dec 21 '18 at 16:15




    1




    1




    See PEP 479.
    – khelwood
    Dec 21 '18 at 16:17






    See PEP 479.
    – khelwood
    Dec 21 '18 at 16:17






    1




    1




    Instead of if e in seen: continue, yield e and return, you could also just return list(seen) at the end.
    – mkrieger1
    Dec 21 '18 at 16:19






    Instead of if e in seen: continue, yield e and return, you could also just return list(seen) at the end.
    – mkrieger1
    Dec 21 '18 at 16:19






    2




    2




    @mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
    – khelwood
    Dec 21 '18 at 16:20




    @mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
    – khelwood
    Dec 21 '18 at 16:20




    2




    2




    yielding in order :) list(set) not
    – Patrick Artner
    Dec 21 '18 at 16:22




    yielding in order :) list(set) not
    – Patrick Artner
    Dec 21 '18 at 16:22













    19














    You can adapt the popular itertools unique_everseen recipe:



    def unique_everseen_limit(iterable, limit=5):
    seen = set()
    seen_add = seen.add
    for element in iterable:
    if element not in seen:
    seen_add(element)
    yield element
    if len(seen) == limit:
    break

    a = [1,2,2,3,3,4,5,6]

    res = list(unique_everseen_limit(a)) # [1, 2, 3, 4, 5]


    Alternatively, as suggested by @Chris_Rands, you can use itertools.islice to extract a fixed number of values from a non-limited generator:



    from itertools import islice

    def unique_everseen(iterable):
    seen = set()
    seen_add = seen.add
    for element in iterable:
    if element not in seen:
    seen_add(element)
    yield element

    res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]


    Note the unique_everseen recipe is available in 3rd party libraries via more_itertools.unique_everseen or toolz.unique, so you could use:



    from itertools import islice
    from more_itertools import unique_everseen
    from toolz import unique

    res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
    res = list(islice(unique(a), 5)) # [1, 2, 3, 4, 5]





    share|improve this answer



















    • 1




      The alternative would be creating an infinite generator and then itertools.islice(gen, limit)
      – Chris_Rands
      Dec 21 '18 at 16:33










    • Why not drop line 3 in you first block of code and do seen.add(element) instead?
      – jorijnsmit
      Dec 27 '18 at 11:04












    • @jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
      – jpp
      Dec 27 '18 at 11:28
















    19














    You can adapt the popular itertools unique_everseen recipe:



    def unique_everseen_limit(iterable, limit=5):
    seen = set()
    seen_add = seen.add
    for element in iterable:
    if element not in seen:
    seen_add(element)
    yield element
    if len(seen) == limit:
    break

    a = [1,2,2,3,3,4,5,6]

    res = list(unique_everseen_limit(a)) # [1, 2, 3, 4, 5]


    Alternatively, as suggested by @Chris_Rands, you can use itertools.islice to extract a fixed number of values from a non-limited generator:



    from itertools import islice

    def unique_everseen(iterable):
    seen = set()
    seen_add = seen.add
    for element in iterable:
    if element not in seen:
    seen_add(element)
    yield element

    res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]


    Note the unique_everseen recipe is available in 3rd party libraries via more_itertools.unique_everseen or toolz.unique, so you could use:



    from itertools import islice
    from more_itertools import unique_everseen
    from toolz import unique

    res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
    res = list(islice(unique(a), 5)) # [1, 2, 3, 4, 5]





    share|improve this answer



















    • 1




      The alternative would be creating an infinite generator and then itertools.islice(gen, limit)
      – Chris_Rands
      Dec 21 '18 at 16:33










    • Why not drop line 3 in you first block of code and do seen.add(element) instead?
      – jorijnsmit
      Dec 27 '18 at 11:04












    • @jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
      – jpp
      Dec 27 '18 at 11:28














    19












    19








    19






    You can adapt the popular itertools unique_everseen recipe:



    def unique_everseen_limit(iterable, limit=5):
    seen = set()
    seen_add = seen.add
    for element in iterable:
    if element not in seen:
    seen_add(element)
    yield element
    if len(seen) == limit:
    break

    a = [1,2,2,3,3,4,5,6]

    res = list(unique_everseen_limit(a)) # [1, 2, 3, 4, 5]


    Alternatively, as suggested by @Chris_Rands, you can use itertools.islice to extract a fixed number of values from a non-limited generator:



    from itertools import islice

    def unique_everseen(iterable):
    seen = set()
    seen_add = seen.add
    for element in iterable:
    if element not in seen:
    seen_add(element)
    yield element

    res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]


    Note the unique_everseen recipe is available in 3rd party libraries via more_itertools.unique_everseen or toolz.unique, so you could use:



    from itertools import islice
    from more_itertools import unique_everseen
    from toolz import unique

    res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
    res = list(islice(unique(a), 5)) # [1, 2, 3, 4, 5]





    share|improve this answer














    You can adapt the popular itertools unique_everseen recipe:



    def unique_everseen_limit(iterable, limit=5):
    seen = set()
    seen_add = seen.add
    for element in iterable:
    if element not in seen:
    seen_add(element)
    yield element
    if len(seen) == limit:
    break

    a = [1,2,2,3,3,4,5,6]

    res = list(unique_everseen_limit(a)) # [1, 2, 3, 4, 5]


    Alternatively, as suggested by @Chris_Rands, you can use itertools.islice to extract a fixed number of values from a non-limited generator:



    from itertools import islice

    def unique_everseen(iterable):
    seen = set()
    seen_add = seen.add
    for element in iterable:
    if element not in seen:
    seen_add(element)
    yield element

    res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]


    Note the unique_everseen recipe is available in 3rd party libraries via more_itertools.unique_everseen or toolz.unique, so you could use:



    from itertools import islice
    from more_itertools import unique_everseen
    from toolz import unique

    res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
    res = list(islice(unique(a), 5)) # [1, 2, 3, 4, 5]






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Dec 21 '18 at 17:01

























    answered Dec 21 '18 at 16:16









    jpp

    92.2k2053103




    92.2k2053103








    • 1




      The alternative would be creating an infinite generator and then itertools.islice(gen, limit)
      – Chris_Rands
      Dec 21 '18 at 16:33










    • Why not drop line 3 in you first block of code and do seen.add(element) instead?
      – jorijnsmit
      Dec 27 '18 at 11:04












    • @jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
      – jpp
      Dec 27 '18 at 11:28














    • 1




      The alternative would be creating an infinite generator and then itertools.islice(gen, limit)
      – Chris_Rands
      Dec 21 '18 at 16:33










    • Why not drop line 3 in you first block of code and do seen.add(element) instead?
      – jorijnsmit
      Dec 27 '18 at 11:04












    • @jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
      – jpp
      Dec 27 '18 at 11:28








    1




    1




    The alternative would be creating an infinite generator and then itertools.islice(gen, limit)
    – Chris_Rands
    Dec 21 '18 at 16:33




    The alternative would be creating an infinite generator and then itertools.islice(gen, limit)
    – Chris_Rands
    Dec 21 '18 at 16:33












    Why not drop line 3 in you first block of code and do seen.add(element) instead?
    – jorijnsmit
    Dec 27 '18 at 11:04






    Why not drop line 3 in you first block of code and do seen.add(element) instead?
    – jorijnsmit
    Dec 27 '18 at 11:04














    @jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
    – jpp
    Dec 27 '18 at 11:28




    @jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
    – jpp
    Dec 27 '18 at 11:28











    7














    If your objects are hashable (ints are hashable) you can write utility function using fromkeys method of collections.OrderedDict class (or starting from Python3.7 a plain dict, since they became officially ordered) like



    from collections import OrderedDict


    def nub(iterable):
    """Returns unique elements preserving order."""
    return OrderedDict.fromkeys(iterable).keys()


    and then implementation of iterate can be simplified to



    from itertools import islice


    def iterate(itr, upper=5):
    return islice(nub(itr), upper)


    or if you want always a list as an output



    def iterate(itr, upper=5):
    return list(nub(itr))[:upper]


    Improvements



    As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub utility in a form of generator like others already did:



    def nub(iterable):
    seen = set()
    add_seen = seen.add
    for element in iterable:
    if element in seen:
    continue
    yield element
    add_seen(element)





    share|improve this answer



















    • 1




      I was thinking of this, definitely short, but it is O(N)
      – Chris_Rands
      Dec 21 '18 at 16:21










    • @Chris_Rands: updated
      – Azat Ibrakov
      Dec 21 '18 at 16:38
















    7














    If your objects are hashable (ints are hashable) you can write utility function using fromkeys method of collections.OrderedDict class (or starting from Python3.7 a plain dict, since they became officially ordered) like



    from collections import OrderedDict


    def nub(iterable):
    """Returns unique elements preserving order."""
    return OrderedDict.fromkeys(iterable).keys()


    and then implementation of iterate can be simplified to



    from itertools import islice


    def iterate(itr, upper=5):
    return islice(nub(itr), upper)


    or if you want always a list as an output



    def iterate(itr, upper=5):
    return list(nub(itr))[:upper]


    Improvements



    As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub utility in a form of generator like others already did:



    def nub(iterable):
    seen = set()
    add_seen = seen.add
    for element in iterable:
    if element in seen:
    continue
    yield element
    add_seen(element)





    share|improve this answer



















    • 1




      I was thinking of this, definitely short, but it is O(N)
      – Chris_Rands
      Dec 21 '18 at 16:21










    • @Chris_Rands: updated
      – Azat Ibrakov
      Dec 21 '18 at 16:38














    7












    7








    7






    If your objects are hashable (ints are hashable) you can write utility function using fromkeys method of collections.OrderedDict class (or starting from Python3.7 a plain dict, since they became officially ordered) like



    from collections import OrderedDict


    def nub(iterable):
    """Returns unique elements preserving order."""
    return OrderedDict.fromkeys(iterable).keys()


    and then implementation of iterate can be simplified to



    from itertools import islice


    def iterate(itr, upper=5):
    return islice(nub(itr), upper)


    or if you want always a list as an output



    def iterate(itr, upper=5):
    return list(nub(itr))[:upper]


    Improvements



    As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub utility in a form of generator like others already did:



    def nub(iterable):
    seen = set()
    add_seen = seen.add
    for element in iterable:
    if element in seen:
    continue
    yield element
    add_seen(element)





    share|improve this answer














    If your objects are hashable (ints are hashable) you can write utility function using fromkeys method of collections.OrderedDict class (or starting from Python3.7 a plain dict, since they became officially ordered) like



    from collections import OrderedDict


    def nub(iterable):
    """Returns unique elements preserving order."""
    return OrderedDict.fromkeys(iterable).keys()


    and then implementation of iterate can be simplified to



    from itertools import islice


    def iterate(itr, upper=5):
    return islice(nub(itr), upper)


    or if you want always a list as an output



    def iterate(itr, upper=5):
    return list(nub(itr))[:upper]


    Improvements



    As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub utility in a form of generator like others already did:



    def nub(iterable):
    seen = set()
    add_seen = seen.add
    for element in iterable:
    if element in seen:
    continue
    yield element
    add_seen(element)






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Dec 21 '18 at 16:44

























    answered Dec 21 '18 at 16:20









    Azat Ibrakov

    3,91171330




    3,91171330








    • 1




      I was thinking of this, definitely short, but it is O(N)
      – Chris_Rands
      Dec 21 '18 at 16:21










    • @Chris_Rands: updated
      – Azat Ibrakov
      Dec 21 '18 at 16:38














    • 1




      I was thinking of this, definitely short, but it is O(N)
      – Chris_Rands
      Dec 21 '18 at 16:21










    • @Chris_Rands: updated
      – Azat Ibrakov
      Dec 21 '18 at 16:38








    1




    1




    I was thinking of this, definitely short, but it is O(N)
    – Chris_Rands
    Dec 21 '18 at 16:21




    I was thinking of this, definitely short, but it is O(N)
    – Chris_Rands
    Dec 21 '18 at 16:21












    @Chris_Rands: updated
    – Azat Ibrakov
    Dec 21 '18 at 16:38




    @Chris_Rands: updated
    – Azat Ibrakov
    Dec 21 '18 at 16:38











    5














    You can use OrderedDict or, since Python 3.7, an ordinary dict, since they are implemented to preserve the insertion order. Note that this won't work with sets.



    N = 3
    a = [1, 2, 2, 3, 3, 3, 4]
    d = {x: True for x in a}
    list(d.keys())[:N]





    share|improve this answer



















    • 1




      In 3.6 order-preserving dicts were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
      – glibdud
      Dec 21 '18 at 16:32


















    5














    You can use OrderedDict or, since Python 3.7, an ordinary dict, since they are implemented to preserve the insertion order. Note that this won't work with sets.



    N = 3
    a = [1, 2, 2, 3, 3, 3, 4]
    d = {x: True for x in a}
    list(d.keys())[:N]





    share|improve this answer



















    • 1




      In 3.6 order-preserving dicts were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
      – glibdud
      Dec 21 '18 at 16:32
















    5












    5








    5






    You can use OrderedDict or, since Python 3.7, an ordinary dict, since they are implemented to preserve the insertion order. Note that this won't work with sets.



    N = 3
    a = [1, 2, 2, 3, 3, 3, 4]
    d = {x: True for x in a}
    list(d.keys())[:N]





    share|improve this answer














    You can use OrderedDict or, since Python 3.7, an ordinary dict, since they are implemented to preserve the insertion order. Note that this won't work with sets.



    N = 3
    a = [1, 2, 2, 3, 3, 3, 4]
    d = {x: True for x in a}
    list(d.keys())[:N]






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Dec 21 '18 at 16:33

























    answered Dec 21 '18 at 16:19









    Jindra Helcl

    1,6391218




    1,6391218








    • 1




      In 3.6 order-preserving dicts were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
      – glibdud
      Dec 21 '18 at 16:32
















    • 1




      In 3.6 order-preserving dicts were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
      – glibdud
      Dec 21 '18 at 16:32










    1




    1




    In 3.6 order-preserving dicts were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
    – glibdud
    Dec 21 '18 at 16:32






    In 3.6 order-preserving dicts were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
    – glibdud
    Dec 21 '18 at 16:32













    5














    Here is a Pythonic approach using itertools.takewhile():



    In [95]: from itertools import takewhile

    In [96]: seen = set()

    In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))
    Out[97]: {1, 2, 3, 4}





    share|improve this answer

















    • 4




      By which definition is this abuse of the or operator considered Pythonic?
      – cdlane
      Dec 21 '18 at 16:42






    • 2




      @cdlane By the definition in which this use of or is misuse.
      – Kasrâmvd
      Dec 21 '18 at 16:47










    • I think a proper function should be used instead of a lambda. Here the seen.add is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But the predicate function should be more explicit.
      – xssChauhan
      Dec 21 '18 at 16:50








    • 3




      We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
      – cdlane
      Dec 21 '18 at 17:08






    • 1




      I disagree this is Pythonic, seen.add or len(seen) <= 4 shouldn't be used in a function like takewhile, for the smae reasons you wouldn't use it in map or filter
      – juanpa.arrivillaga
      Dec 21 '18 at 17:38
















    5














    Here is a Pythonic approach using itertools.takewhile():



    In [95]: from itertools import takewhile

    In [96]: seen = set()

    In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))
    Out[97]: {1, 2, 3, 4}





    share|improve this answer

















    • 4




      By which definition is this abuse of the or operator considered Pythonic?
      – cdlane
      Dec 21 '18 at 16:42






    • 2




      @cdlane By the definition in which this use of or is misuse.
      – Kasrâmvd
      Dec 21 '18 at 16:47










    • I think a proper function should be used instead of a lambda. Here the seen.add is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But the predicate function should be more explicit.
      – xssChauhan
      Dec 21 '18 at 16:50








    • 3




      We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
      – cdlane
      Dec 21 '18 at 17:08






    • 1




      I disagree this is Pythonic, seen.add or len(seen) <= 4 shouldn't be used in a function like takewhile, for the smae reasons you wouldn't use it in map or filter
      – juanpa.arrivillaga
      Dec 21 '18 at 17:38














    5












    5








    5






    Here is a Pythonic approach using itertools.takewhile():



    In [95]: from itertools import takewhile

    In [96]: seen = set()

    In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))
    Out[97]: {1, 2, 3, 4}





    share|improve this answer












    Here is a Pythonic approach using itertools.takewhile():



    In [95]: from itertools import takewhile

    In [96]: seen = set()

    In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))
    Out[97]: {1, 2, 3, 4}






    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Dec 21 '18 at 16:37









    Kasrâmvd

    77.8k1089124




    77.8k1089124








    • 4




      By which definition is this abuse of the or operator considered Pythonic?
      – cdlane
      Dec 21 '18 at 16:42






    • 2




      @cdlane By the definition in which this use of or is misuse.
      – Kasrâmvd
      Dec 21 '18 at 16:47










    • I think a proper function should be used instead of a lambda. Here the seen.add is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But the predicate function should be more explicit.
      – xssChauhan
      Dec 21 '18 at 16:50








    • 3




      We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
      – cdlane
      Dec 21 '18 at 17:08






    • 1




      I disagree this is Pythonic, seen.add or len(seen) <= 4 shouldn't be used in a function like takewhile, for the smae reasons you wouldn't use it in map or filter
      – juanpa.arrivillaga
      Dec 21 '18 at 17:38














    • 4




      By which definition is this abuse of the or operator considered Pythonic?
      – cdlane
      Dec 21 '18 at 16:42






    • 2




      @cdlane By the definition in which this use of or is misuse.
      – Kasrâmvd
      Dec 21 '18 at 16:47










    • I think a proper function should be used instead of a lambda. Here the seen.add is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But the predicate function should be more explicit.
      – xssChauhan
      Dec 21 '18 at 16:50








    • 3




      We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
      – cdlane
      Dec 21 '18 at 17:08






    • 1




      I disagree this is Pythonic, seen.add or len(seen) <= 4 shouldn't be used in a function like takewhile, for the smae reasons you wouldn't use it in map or filter
      – juanpa.arrivillaga
      Dec 21 '18 at 17:38








    4




    4




    By which definition is this abuse of the or operator considered Pythonic?
    – cdlane
    Dec 21 '18 at 16:42




    By which definition is this abuse of the or operator considered Pythonic?
    – cdlane
    Dec 21 '18 at 16:42




    2




    2




    @cdlane By the definition in which this use of or is misuse.
    – Kasrâmvd
    Dec 21 '18 at 16:47




    @cdlane By the definition in which this use of or is misuse.
    – Kasrâmvd
    Dec 21 '18 at 16:47












    I think a proper function should be used instead of a lambda. Here the seen.add is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But the predicate function should be more explicit.
    – xssChauhan
    Dec 21 '18 at 16:50






    I think a proper function should be used instead of a lambda. Here the seen.add is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But the predicate function should be more explicit.
    – xssChauhan
    Dec 21 '18 at 16:50






    3




    3




    We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
    – cdlane
    Dec 21 '18 at 17:08




    We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
    – cdlane
    Dec 21 '18 at 17:08




    1




    1




    I disagree this is Pythonic, seen.add or len(seen) <= 4 shouldn't be used in a function like takewhile, for the smae reasons you wouldn't use it in map or filter
    – juanpa.arrivillaga
    Dec 21 '18 at 17:38




    I disagree this is Pythonic, seen.add or len(seen) <= 4 shouldn't be used in a function like takewhile, for the smae reasons you wouldn't use it in map or filter
    – juanpa.arrivillaga
    Dec 21 '18 at 17:38











    4














    Using set with sorted+ key



    sorted(set(a), key=list(a).index)[:5]
    Out[136]: [1, 2, 3, 4, 5]





    share|improve this answer

















    • 1




      This is inefficient.
      – juanpa.arrivillaga
      Dec 21 '18 at 16:25






    • 3




      @xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
      – juanpa.arrivillaga
      Dec 21 '18 at 16:29
















    4














    Using set with sorted+ key



    sorted(set(a), key=list(a).index)[:5]
    Out[136]: [1, 2, 3, 4, 5]





    share|improve this answer

















    • 1




      This is inefficient.
      – juanpa.arrivillaga
      Dec 21 '18 at 16:25






    • 3




      @xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
      – juanpa.arrivillaga
      Dec 21 '18 at 16:29














    4












    4








    4






    Using set with sorted+ key



    sorted(set(a), key=list(a).index)[:5]
    Out[136]: [1, 2, 3, 4, 5]





    share|improve this answer












    Using set with sorted+ key



    sorted(set(a), key=list(a).index)[:5]
    Out[136]: [1, 2, 3, 4, 5]






    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Dec 21 '18 at 16:17









    W-B

    102k73163




    102k73163








    • 1




      This is inefficient.
      – juanpa.arrivillaga
      Dec 21 '18 at 16:25






    • 3




      @xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
      – juanpa.arrivillaga
      Dec 21 '18 at 16:29














    • 1




      This is inefficient.
      – juanpa.arrivillaga
      Dec 21 '18 at 16:25






    • 3




      @xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
      – juanpa.arrivillaga
      Dec 21 '18 at 16:29








    1




    1




    This is inefficient.
    – juanpa.arrivillaga
    Dec 21 '18 at 16:25




    This is inefficient.
    – juanpa.arrivillaga
    Dec 21 '18 at 16:25




    3




    3




    @xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
    – juanpa.arrivillaga
    Dec 21 '18 at 16:29




    @xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
    – juanpa.arrivillaga
    Dec 21 '18 at 16:29











    3














    Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby function in itertools:



    from itertools import groupby, islice

    def first_unique(data, upper):
    return islice((key for (key, _) in groupby(data)), 0, upper)

    a = [1, 2, 2, 3, 3, 4, 5, 6]

    print(list(first_unique(a, 5)))


    Updated to use islice instead of enumerate per @juanpa.arrivillaga. You don't even need a set to keep track of duplicates.






    share|improve this answer























    • You might as well use islice
      – juanpa.arrivillaga
      Dec 21 '18 at 17:39










    • So groupby retains order, nice, but is it an implementation detail or a feature?
      – kubanczyk
      Dec 22 '18 at 9:39










    • @kubanczyk, yes groupby is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted, groupby wouldn't work for this problem. However, groupy can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
      – cdlane
      Dec 22 '18 at 22:15
















    3














    Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby function in itertools:



    from itertools import groupby, islice

    def first_unique(data, upper):
    return islice((key for (key, _) in groupby(data)), 0, upper)

    a = [1, 2, 2, 3, 3, 4, 5, 6]

    print(list(first_unique(a, 5)))


    Updated to use islice instead of enumerate per @juanpa.arrivillaga. You don't even need a set to keep track of duplicates.






    share|improve this answer























    • You might as well use islice
      – juanpa.arrivillaga
      Dec 21 '18 at 17:39










    • So groupby retains order, nice, but is it an implementation detail or a feature?
      – kubanczyk
      Dec 22 '18 at 9:39










    • @kubanczyk, yes groupby is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted, groupby wouldn't work for this problem. However, groupy can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
      – cdlane
      Dec 22 '18 at 22:15














    3












    3








    3






    Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby function in itertools:



    from itertools import groupby, islice

    def first_unique(data, upper):
    return islice((key for (key, _) in groupby(data)), 0, upper)

    a = [1, 2, 2, 3, 3, 4, 5, 6]

    print(list(first_unique(a, 5)))


    Updated to use islice instead of enumerate per @juanpa.arrivillaga. You don't even need a set to keep track of duplicates.






    share|improve this answer














    Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby function in itertools:



    from itertools import groupby, islice

    def first_unique(data, upper):
    return islice((key for (key, _) in groupby(data)), 0, upper)

    a = [1, 2, 2, 3, 3, 4, 5, 6]

    print(list(first_unique(a, 5)))


    Updated to use islice instead of enumerate per @juanpa.arrivillaga. You don't even need a set to keep track of duplicates.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Dec 21 '18 at 17:49

























    answered Dec 21 '18 at 17:01









    cdlane

    17.3k21043




    17.3k21043












    • You might as well use islice
      – juanpa.arrivillaga
      Dec 21 '18 at 17:39










    • So groupby retains order, nice, but is it an implementation detail or a feature?
      – kubanczyk
      Dec 22 '18 at 9:39










    • @kubanczyk, yes groupby is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted, groupby wouldn't work for this problem. However, groupy can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
      – cdlane
      Dec 22 '18 at 22:15


















    • You might as well use islice
      – juanpa.arrivillaga
      Dec 21 '18 at 17:39










    • So groupby retains order, nice, but is it an implementation detail or a feature?
      – kubanczyk
      Dec 22 '18 at 9:39










    • @kubanczyk, yes groupby is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted, groupby wouldn't work for this problem. However, groupy can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
      – cdlane
      Dec 22 '18 at 22:15
















    You might as well use islice
    – juanpa.arrivillaga
    Dec 21 '18 at 17:39




    You might as well use islice
    – juanpa.arrivillaga
    Dec 21 '18 at 17:39












    So groupby retains order, nice, but is it an implementation detail or a feature?
    – kubanczyk
    Dec 22 '18 at 9:39




    So groupby retains order, nice, but is it an implementation detail or a feature?
    – kubanczyk
    Dec 22 '18 at 9:39












    @kubanczyk, yes groupby is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted, groupby wouldn't work for this problem. However, groupy can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
    – cdlane
    Dec 22 '18 at 22:15




    @kubanczyk, yes groupby is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted, groupby wouldn't work for this problem. However, groupy can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
    – cdlane
    Dec 22 '18 at 22:15











    2














    There are really amazing answers for this question, which are fast, compact and brilliant! The reason I am putting here this code is that I believe there are plenty of cases when you don't care about 1 microsecond time loose nor you want additional libraries in your code for one-time solving a simple task.



    a = [1,2,2,3,3,4,5,6]
    res =
    for x in a:
    if x not in res: # yes, not optimal, but doesnt need additional dict
    res.append(x)
    if len(res) == 5:
    break
    print(res)





    share|improve this answer



















    • 2




      Use set rather than list for O(1) lookup.
      – jpp
      Dec 21 '18 at 16:20






    • 3




      @teng ... inefficient.
      – juanpa.arrivillaga
      Dec 21 '18 at 16:20






    • 1




      @teng similarly inefficient.
      – juanpa.arrivillaga
      Dec 21 '18 at 16:23






    • 1




      in is no easier to read than in {}, bad justification
      – Chris_Rands
      Dec 21 '18 at 16:25






    • 1




      @grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
      – juanpa.arrivillaga
      Dec 21 '18 at 16:30


















    2














    There are really amazing answers for this question, which are fast, compact and brilliant! The reason I am putting here this code is that I believe there are plenty of cases when you don't care about 1 microsecond time loose nor you want additional libraries in your code for one-time solving a simple task.



    a = [1,2,2,3,3,4,5,6]
    res =
    for x in a:
    if x not in res: # yes, not optimal, but doesnt need additional dict
    res.append(x)
    if len(res) == 5:
    break
    print(res)





    share|improve this answer



















    • 2




      Use set rather than list for O(1) lookup.
      – jpp
      Dec 21 '18 at 16:20






    • 3




      @teng ... inefficient.
      – juanpa.arrivillaga
      Dec 21 '18 at 16:20






    • 1




      @teng similarly inefficient.
      – juanpa.arrivillaga
      Dec 21 '18 at 16:23






    • 1




      in is no easier to read than in {}, bad justification
      – Chris_Rands
      Dec 21 '18 at 16:25






    • 1




      @grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
      – juanpa.arrivillaga
      Dec 21 '18 at 16:30
















    2












    2








    2






    There are really amazing answers for this question, which are fast, compact and brilliant! The reason I am putting here this code is that I believe there are plenty of cases when you don't care about 1 microsecond time loose nor you want additional libraries in your code for one-time solving a simple task.



    a = [1,2,2,3,3,4,5,6]
    res =
    for x in a:
    if x not in res: # yes, not optimal, but doesnt need additional dict
    res.append(x)
    if len(res) == 5:
    break
    print(res)





    share|improve this answer














    There are really amazing answers for this question, which are fast, compact and brilliant! The reason I am putting here this code is that I believe there are plenty of cases when you don't care about 1 microsecond time loose nor you want additional libraries in your code for one-time solving a simple task.



    a = [1,2,2,3,3,4,5,6]
    res =
    for x in a:
    if x not in res: # yes, not optimal, but doesnt need additional dict
    res.append(x)
    if len(res) == 5:
    break
    print(res)






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Dec 21 '18 at 16:47

























    answered Dec 21 '18 at 16:18









    grapes

    2,6321115




    2,6321115








    • 2




      Use set rather than list for O(1) lookup.
      – jpp
      Dec 21 '18 at 16:20






    • 3




      @teng ... inefficient.
      – juanpa.arrivillaga
      Dec 21 '18 at 16:20






    • 1




      @teng similarly inefficient.
      – juanpa.arrivillaga
      Dec 21 '18 at 16:23






    • 1




      in is no easier to read than in {}, bad justification
      – Chris_Rands
      Dec 21 '18 at 16:25






    • 1




      @grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
      – juanpa.arrivillaga
      Dec 21 '18 at 16:30
















    • 2




      Use set rather than list for O(1) lookup.
      – jpp
      Dec 21 '18 at 16:20






    • 3




      @teng ... inefficient.
      – juanpa.arrivillaga
      Dec 21 '18 at 16:20






    • 1




      @teng similarly inefficient.
      – juanpa.arrivillaga
      Dec 21 '18 at 16:23






    • 1




      in is no easier to read than in {}, bad justification
      – Chris_Rands
      Dec 21 '18 at 16:25






    • 1




      @grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
      – juanpa.arrivillaga
      Dec 21 '18 at 16:30










    2




    2




    Use set rather than list for O(1) lookup.
    – jpp
    Dec 21 '18 at 16:20




    Use set rather than list for O(1) lookup.
    – jpp
    Dec 21 '18 at 16:20




    3




    3




    @teng ... inefficient.
    – juanpa.arrivillaga
    Dec 21 '18 at 16:20




    @teng ... inefficient.
    – juanpa.arrivillaga
    Dec 21 '18 at 16:20




    1




    1




    @teng similarly inefficient.
    – juanpa.arrivillaga
    Dec 21 '18 at 16:23




    @teng similarly inefficient.
    – juanpa.arrivillaga
    Dec 21 '18 at 16:23




    1




    1




    in is no easier to read than in {}, bad justification
    – Chris_Rands
    Dec 21 '18 at 16:25




    in is no easier to read than in {}, bad justification
    – Chris_Rands
    Dec 21 '18 at 16:25




    1




    1




    @grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
    – juanpa.arrivillaga
    Dec 21 '18 at 16:30






    @grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
    – juanpa.arrivillaga
    Dec 21 '18 at 16:30













    2














    Given



    import itertools as it


    a = [1, 2, 2, 3, 3, 4, 5, 6]


    Code



    A simple list comprehension (similar to @cdlane's answer).



    [k for k, _ in it.groupby(a)][:5]
    # [1, 2, 3, 4, 5]


    Alternatively, in Python 3.6+:



    list(dict.fromkeys(a))[:5]
    # [1, 2, 3, 4, 5]





    share|improve this answer




























      2














      Given



      import itertools as it


      a = [1, 2, 2, 3, 3, 4, 5, 6]


      Code



      A simple list comprehension (similar to @cdlane's answer).



      [k for k, _ in it.groupby(a)][:5]
      # [1, 2, 3, 4, 5]


      Alternatively, in Python 3.6+:



      list(dict.fromkeys(a))[:5]
      # [1, 2, 3, 4, 5]





      share|improve this answer


























        2












        2








        2






        Given



        import itertools as it


        a = [1, 2, 2, 3, 3, 4, 5, 6]


        Code



        A simple list comprehension (similar to @cdlane's answer).



        [k for k, _ in it.groupby(a)][:5]
        # [1, 2, 3, 4, 5]


        Alternatively, in Python 3.6+:



        list(dict.fromkeys(a))[:5]
        # [1, 2, 3, 4, 5]





        share|improve this answer














        Given



        import itertools as it


        a = [1, 2, 2, 3, 3, 4, 5, 6]


        Code



        A simple list comprehension (similar to @cdlane's answer).



        [k for k, _ in it.groupby(a)][:5]
        # [1, 2, 3, 4, 5]


        Alternatively, in Python 3.6+:



        list(dict.fromkeys(a))[:5]
        # [1, 2, 3, 4, 5]






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Dec 22 '18 at 8:59

























        answered Dec 21 '18 at 23:32









        pylang

        13.1k23752




        13.1k23752























            1














            Why not use something like this?



            >>> a = [1, 2, 2, 3, 3, 4, 5, 6]
            >>> list(set(a))[:5]
            [1, 2, 3, 4, 5]





            share|improve this answer





















            • If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
              – pylang
              Dec 25 '18 at 2:17










            • This is wrong as it may or may not return the first five unique elements.
              – Baum mit Augen
              yesterday
















            1














            Why not use something like this?



            >>> a = [1, 2, 2, 3, 3, 4, 5, 6]
            >>> list(set(a))[:5]
            [1, 2, 3, 4, 5]





            share|improve this answer





















            • If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
              – pylang
              Dec 25 '18 at 2:17










            • This is wrong as it may or may not return the first five unique elements.
              – Baum mit Augen
              yesterday














            1












            1








            1






            Why not use something like this?



            >>> a = [1, 2, 2, 3, 3, 4, 5, 6]
            >>> list(set(a))[:5]
            [1, 2, 3, 4, 5]





            share|improve this answer












            Why not use something like this?



            >>> a = [1, 2, 2, 3, 3, 4, 5, 6]
            >>> list(set(a))[:5]
            [1, 2, 3, 4, 5]






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Dec 22 '18 at 12:01









            Александр Трубилин

            191




            191












            • If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
              – pylang
              Dec 25 '18 at 2:17










            • This is wrong as it may or may not return the first five unique elements.
              – Baum mit Augen
              yesterday


















            • If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
              – pylang
              Dec 25 '18 at 2:17










            • This is wrong as it may or may not return the first five unique elements.
              – Baum mit Augen
              yesterday
















            If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
            – pylang
            Dec 25 '18 at 2:17




            If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
            – pylang
            Dec 25 '18 at 2:17












            This is wrong as it may or may not return the first five unique elements.
            – Baum mit Augen
            yesterday




            This is wrong as it may or may not return the first five unique elements.
            – Baum mit Augen
            yesterday


















            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53887803%2fgetting-first-n-unique-elements-from-python-list%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Morgemoulin

            Scott Moir

            Souastre