Getting first n unique elements from Python list

I have a python list where elements can repeat.

>>> a = [1,2,2,3,3,4,5,6]

I want to get the first n unique elements from the list.
So, in this case, if i want the first 5 unique elements, they would be:

[1,2,3,4,5]

I have come up with a solution using generators:

def iterate(itr, upper=5):



    count = 0

    for index, element in enumerate(itr):

        if index==0:

            count += 1

            yield element



        elif element not in itr[:index] and count<upper:

            count += 1

            yield element

In use:

>>> i = iterate(a, 5)

>>> [e for e in i]

[1,2,3,4,5]

I have doubts on this being the most optimal solution. Is there an alternative strategy that i can implement to write it in a more pythonic and efficient
way?

edited Dec 27 '18 at 11:59

jorijnsmit

544320

asked Dec 21 '18 at 16:10

xssChauhan

1,361925

5

Try: set(a)[:n]
– Tony Pellerin
Dec 21 '18 at 16:11

9

@TonyPellerin does not guarantee you get the first 5 elements
– juanpa.arrivillaga
Dec 21 '18 at 16:14

2

Your code is Pythonic enough, it is just inefficient. element not in itr[:index] is not efficient, use a set
– juanpa.arrivillaga
Dec 21 '18 at 16:16

2

Is the list always sorted?
– user8408080
Dec 21 '18 at 16:16

5

for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
– Azat Ibrakov
Dec 21 '18 at 16:49

|
show 5 more comments

I have a python list where elements can repeat.

>>> a = [1,2,2,3,3,4,5,6]

I want to get the first n unique elements from the list.
So, in this case, if i want the first 5 unique elements, they would be:

[1,2,3,4,5]

I have come up with a solution using generators:

def iterate(itr, upper=5):



    count = 0

    for index, element in enumerate(itr):

        if index==0:

            count += 1

            yield element



        elif element not in itr[:index] and count<upper:

            count += 1

            yield element

In use:

>>> i = iterate(a, 5)

>>> [e for e in i]

[1,2,3,4,5]

I have doubts on this being the most optimal solution. Is there an alternative strategy that i can implement to write it in a more pythonic and efficient
way?

edited Dec 27 '18 at 11:59

jorijnsmit

544320

asked Dec 21 '18 at 16:10

xssChauhan

1,361925

5

Try: set(a)[:n]
– Tony Pellerin
Dec 21 '18 at 16:11

9

@TonyPellerin does not guarantee you get the first 5 elements
– juanpa.arrivillaga
Dec 21 '18 at 16:14

2

Your code is Pythonic enough, it is just inefficient. element not in itr[:index] is not efficient, use a set
– juanpa.arrivillaga
Dec 21 '18 at 16:16

2

Is the list always sorted?
– user8408080
Dec 21 '18 at 16:16

5

for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
– Azat Ibrakov
Dec 21 '18 at 16:49

|
show 5 more comments

I have a python list where elements can repeat.

>>> a = [1,2,2,3,3,4,5,6]

I want to get the first n unique elements from the list.
So, in this case, if i want the first 5 unique elements, they would be:

[1,2,3,4,5]

I have come up with a solution using generators:

def iterate(itr, upper=5):



    count = 0

    for index, element in enumerate(itr):

        if index==0:

            count += 1

            yield element



        elif element not in itr[:index] and count<upper:

            count += 1

            yield element

In use:

>>> i = iterate(a, 5)

>>> [e for e in i]

[1,2,3,4,5]

I have doubts on this being the most optimal solution. Is there an alternative strategy that i can implement to write it in a more pythonic and efficient
way?

edited Dec 27 '18 at 11:59

jorijnsmit

544320

asked Dec 21 '18 at 16:10

xssChauhan

1,361925

I have a python list where elements can repeat.

>>> a = [1,2,2,3,3,4,5,6]

I want to get the first n unique elements from the list.
So, in this case, if i want the first 5 unique elements, they would be:

[1,2,3,4,5]

I have come up with a solution using generators:

def iterate(itr, upper=5):



    count = 0

    for index, element in enumerate(itr):

        if index==0:

            count += 1

            yield element



        elif element not in itr[:index] and count<upper:

            count += 1

            yield element

In use:

>>> i = iterate(a, 5)

>>> [e for e in i]

[1,2,3,4,5]

I have doubts on this being the most optimal solution. Is there an alternative strategy that i can implement to write it in a more pythonic and efficient
way?

python python-3.x generator

edited Dec 27 '18 at 11:59

jorijnsmit

544320

asked Dec 21 '18 at 16:10

xssChauhan

1,361925

edited Dec 27 '18 at 11:59

jorijnsmit

544320

asked Dec 21 '18 at 16:10

xssChauhan

1,361925

edited Dec 27 '18 at 11:59

jorijnsmit

544320

edited Dec 27 '18 at 11:59

jorijnsmit

544320

edited Dec 27 '18 at 11:59

jorijnsmit

544320

asked Dec 21 '18 at 16:10

xssChauhan

1,361925

asked Dec 21 '18 at 16:10

xssChauhan

1,361925

asked Dec 21 '18 at 16:10

xssChauhan

1,361925

5

Try: set(a)[:n]
– Tony Pellerin
Dec 21 '18 at 16:11

9

@TonyPellerin does not guarantee you get the first 5 elements
– juanpa.arrivillaga
Dec 21 '18 at 16:14

2

Your code is Pythonic enough, it is just inefficient. element not in itr[:index] is not efficient, use a set
– juanpa.arrivillaga
Dec 21 '18 at 16:16

2

Is the list always sorted?
– user8408080
Dec 21 '18 at 16:16

5

for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
– Azat Ibrakov
Dec 21 '18 at 16:49

|
show 5 more comments

5

Try: set(a)[:n]
– Tony Pellerin
Dec 21 '18 at 16:11

9

@TonyPellerin does not guarantee you get the first 5 elements
– juanpa.arrivillaga
Dec 21 '18 at 16:14

2

Your code is Pythonic enough, it is just inefficient. element not in itr[:index] is not efficient, use a set
– juanpa.arrivillaga
Dec 21 '18 at 16:16

2

Is the list always sorted?
– user8408080
Dec 21 '18 at 16:16

5

for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
– Azat Ibrakov
Dec 21 '18 at 16:49

Try: set(a)[:n]
– Tony Pellerin
Dec 21 '18 at 16:11

@TonyPellerin does not guarantee you get the first 5 elements
– juanpa.arrivillaga
Dec 21 '18 at 16:14

Your code is Pythonic enough, it is just inefficient. element not in itr[:index] is not efficient, use a set
– juanpa.arrivillaga
Dec 21 '18 at 16:16

Is the list always sorted?
– user8408080
Dec 21 '18 at 16:16

for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
– Azat Ibrakov
Dec 21 '18 at 16:49

|
show 5 more comments

10 Answers
10

active

oldest

votes

I would use a set to remember what was seen and return from the generator when you have seen enough:

a = [1,2,2,3,3,4,5,6]



def get_unique_N(iterable, N):

    """Yields (in order) the first N unique elements of iterable. 

    Might yield less if data too short."""

    seen = set()

    for e in iterable:

        if e in seen:

            continue

        seen.add(e)

        yield e

        if len(seen) == N:

            return



k = get_unique_N([1,2,2,3,3,4,5,6], 4)

print(list(k))

Output:

[1,2,3,4]

According to PEP-479 you should return from generators, not raise StopIteration - thanks to @khelwood & @iBug for that piece of comment - one never learns out.

With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration

Your solution using elif element not in itr[:index] and count<upper: uses O(k) lookups - with k being the length of the slice - using a set reduces this to O(1) lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.

Consider [1,2,3,4,4,4,4,5] vs [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]:

For 6 uniques (in longer list):

you would have lookups of O(1)+O(2)+...+O(5001)

mine would have 5001*O(1) lookup + memory for set( {1,2,3,4,5,6})

edited Dec 21 '18 at 16:42

answered Dec 21 '18 at 16:14

Patrick Artner

21.7k62143

5

You can just return. You don't need to raise anything.
– khelwood
Dec 21 '18 at 16:15

1

See PEP 479.
– khelwood
Dec 21 '18 at 16:17

1

Instead of if e in seen: continue, yield e and return, you could also just return list(seen) at the end.
– mkrieger1
Dec 21 '18 at 16:19

2

@mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
– khelwood
Dec 21 '18 at 16:20

2

yielding in order :) list(set) not
– Patrick Artner
Dec 21 '18 at 16:22

|
show 4 more comments

You can adapt the popular itertools unique_everseen recipe:

def unique_everseen_limit(iterable, limit=5):

    seen = set()

    seen_add = seen.add

    for element in iterable:

        if element not in seen:

            seen_add(element)

            yield element

        if len(seen) == limit:

            break



a = [1,2,2,3,3,4,5,6]



res = list(unique_everseen_limit(a))  # [1, 2, 3, 4, 5]

Alternatively, as suggested by @Chris_Rands, you can use itertools.islice to extract a fixed number of values from a non-limited generator:

from itertools import islice



def unique_everseen(iterable):

    seen = set()

    seen_add = seen.add

    for element in iterable:

        if element not in seen:

            seen_add(element)

            yield element



res = list(islice(unique_everseen(a), 5))  # [1, 2, 3, 4, 5]

Note the unique_everseen recipe is available in 3rd party libraries via more_itertools.unique_everseen or toolz.unique, so you could use:

from itertools import islice

from more_itertools import unique_everseen

from toolz import unique



res = list(islice(unique_everseen(a), 5))  # [1, 2, 3, 4, 5]

res = list(islice(unique(a), 5))           # [1, 2, 3, 4, 5]

edited Dec 21 '18 at 17:01

answered Dec 21 '18 at 16:16

jpp

92.2k2053103

1

The alternative would be creating an infinite generator and then itertools.islice(gen, limit)
– Chris_Rands
Dec 21 '18 at 16:33

Why not drop line 3 in you first block of code and do seen.add(element) instead?
– jorijnsmit
Dec 27 '18 at 11:04

@jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
– jpp
Dec 27 '18 at 11:28

add a comment |

If your objects are hashable (ints are hashable) you can write utility function using fromkeys method of collections.OrderedDict class (or starting from Python3.7 a plain dict, since they became officially ordered) like

from collections import OrderedDict





def nub(iterable):

    """Returns unique elements preserving order."""

    return OrderedDict.fromkeys(iterable).keys()

and then implementation of iterate can be simplified to

from itertools import islice





def iterate(itr, upper=5):

    return islice(nub(itr), upper)

or if you want always a list as an output

def iterate(itr, upper=5):

    return list(nub(itr))[:upper]

Improvements

As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub utility in a form of generator like others already did:

def nub(iterable):

    seen = set()

    add_seen = seen.add

    for element in iterable:

        if element in seen:

            continue

        yield element

        add_seen(element)

edited Dec 21 '18 at 16:44

answered Dec 21 '18 at 16:20

Azat Ibrakov

3,91171330

1

I was thinking of this, definitely short, but it is O(N)
– Chris_Rands
Dec 21 '18 at 16:21

@Chris_Rands: updated
– Azat Ibrakov
Dec 21 '18 at 16:38

add a comment |

You can use OrderedDict or, since Python 3.7, an ordinary dict, since they are implemented to preserve the insertion order. Note that this won't work with sets.

N = 3

a = [1, 2, 2, 3, 3, 3, 4]

d = {x: True for x in a}

list(d.keys())[:N]

edited Dec 21 '18 at 16:33

answered Dec 21 '18 at 16:19

Jindra Helcl

1,6391218

1

In 3.6 order-preserving dicts were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
– glibdud
Dec 21 '18 at 16:32

add a comment |

Here is a Pythonic approach using itertools.takewhile():

In [95]: from itertools import takewhile



In [96]: seen = set()



In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))

Out[97]: {1, 2, 3, 4}

answered Dec 21 '18 at 16:37

Kasrâmvd

77.8k1089124

4

By which definition is this abuse of the or operator considered Pythonic?
– cdlane
Dec 21 '18 at 16:42

2

@cdlane By the definition in which this use of or is misuse.
– Kasrâmvd
Dec 21 '18 at 16:47

I think a proper function should be used instead of a lambda. Here the seen.add is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But the predicate function should be more explicit.
– xssChauhan
Dec 21 '18 at 16:50

3

We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
– cdlane
Dec 21 '18 at 17:08

1

I disagree this is Pythonic, seen.add or len(seen) <= 4 shouldn't be used in a function like takewhile, for the smae reasons you wouldn't use it in map or filter
– juanpa.arrivillaga
Dec 21 '18 at 17:38

add a comment |

Using set with sorted+ key

sorted(set(a), key=list(a).index)[:5]

Out[136]: [1, 2, 3, 4, 5]

answered Dec 21 '18 at 16:17

W-B

102k73163

1

This is inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:25

3

@xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
– juanpa.arrivillaga
Dec 21 '18 at 16:29

add a comment |

Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby function in itertools:

from itertools import groupby, islice



def first_unique(data, upper):

    return islice((key for (key, _) in groupby(data)), 0, upper)



a = [1, 2, 2, 3, 3, 4, 5, 6]



print(list(first_unique(a, 5)))

Updated to use islice instead of enumerate per @juanpa.arrivillaga. You don't even need a set to keep track of duplicates.

edited Dec 21 '18 at 17:49

answered Dec 21 '18 at 17:01

cdlane

17.3k21043

You might as well use islice
– juanpa.arrivillaga
Dec 21 '18 at 17:39

So groupby retains order, nice, but is it an implementation detail or a feature?
– kubanczyk
Dec 22 '18 at 9:39

@kubanczyk, yes groupby is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted, groupby wouldn't work for this problem. However, groupy can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
– cdlane
Dec 22 '18 at 22:15

add a comment |

There are really amazing answers for this question, which are fast, compact and brilliant! The reason I am putting here this code is that I believe there are plenty of cases when you don't care about 1 microsecond time loose nor you want additional libraries in your code for one-time solving a simple task.

a = [1,2,2,3,3,4,5,6]

res = 

for x in a:

    if x not in res:  # yes, not optimal, but doesnt need additional dict

        res.append(x)

        if len(res) == 5:

            break

print(res)

edited Dec 21 '18 at 16:47

answered Dec 21 '18 at 16:18

grapes

2,6321115

2

Use set rather than list for O(1) lookup.
– jpp
Dec 21 '18 at 16:20

3

@teng ... inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:20

1

@teng similarly inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:23

1

in is no easier to read than in {}, bad justification
– Chris_Rands
Dec 21 '18 at 16:25

1

@grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
– juanpa.arrivillaga
Dec 21 '18 at 16:30

|
show 7 more comments

Given

import itertools as it





a = [1, 2, 2, 3, 3, 4, 5, 6]

Code

A simple list comprehension (similar to @cdlane's answer).

[k for k, _ in it.groupby(a)][:5]

# [1, 2, 3, 4, 5]

Alternatively, in Python 3.6+:

list(dict.fromkeys(a))[:5]

# [1, 2, 3, 4, 5]

edited Dec 22 '18 at 8:59

answered Dec 21 '18 at 23:32

pylang

13.1k23752

add a comment |

Why not use something like this?

>>> a = [1, 2, 2, 3, 3, 4, 5, 6]

>>> list(set(a))[:5]

[1, 2, 3, 4, 5]

answered Dec 22 '18 at 12:01

Александр Трубилин

191

If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
– pylang
Dec 25 '18 at 2:17

This is wrong as it may or may not return the first five unique elements.
– Baum mit Augen
yesterday

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53887803%2fgetting-first-n-unique-elements-from-python-list%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

10 Answers
10

active

oldest

votes

10 Answers
10

active

oldest

votes

I would use a set to remember what was seen and return from the generator when you have seen enough:

a = [1,2,2,3,3,4,5,6]



def get_unique_N(iterable, N):

    """Yields (in order) the first N unique elements of iterable. 

    Might yield less if data too short."""

    seen = set()

    for e in iterable:

        if e in seen:

            continue

        seen.add(e)

        yield e

        if len(seen) == N:

            return



k = get_unique_N([1,2,2,3,3,4,5,6], 4)

print(list(k))

Output:

[1,2,3,4]

According to PEP-479 you should return from generators, not raise StopIteration - thanks to @khelwood & @iBug for that piece of comment - one never learns out.

With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration

Consider [1,2,3,4,4,4,4,5] vs [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]:

For 6 uniques (in longer list):

you would have lookups of O(1)+O(2)+...+O(5001)

mine would have 5001*O(1) lookup + memory for set( {1,2,3,4,5,6})

edited Dec 21 '18 at 16:42

answered Dec 21 '18 at 16:14

Patrick Artner

21.7k62143

5

You can just return. You don't need to raise anything.
– khelwood
Dec 21 '18 at 16:15

1

See PEP 479.
– khelwood
Dec 21 '18 at 16:17

1

Instead of if e in seen: continue, yield e and return, you could also just return list(seen) at the end.
– mkrieger1
Dec 21 '18 at 16:19

2

@mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
– khelwood
Dec 21 '18 at 16:20

2

yielding in order :) list(set) not
– Patrick Artner
Dec 21 '18 at 16:22

|
show 4 more comments

I would use a set to remember what was seen and return from the generator when you have seen enough:

a = [1,2,2,3,3,4,5,6]



def get_unique_N(iterable, N):

    """Yields (in order) the first N unique elements of iterable. 

    Might yield less if data too short."""

    seen = set()

    for e in iterable:

        if e in seen:

            continue

        seen.add(e)

        yield e

        if len(seen) == N:

            return



k = get_unique_N([1,2,2,3,3,4,5,6], 4)

print(list(k))

Output:

[1,2,3,4]

According to PEP-479 you should return from generators, not raise StopIteration - thanks to @khelwood & @iBug for that piece of comment - one never learns out.

With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration

Consider [1,2,3,4,4,4,4,5] vs [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]:

For 6 uniques (in longer list):

you would have lookups of O(1)+O(2)+...+O(5001)

mine would have 5001*O(1) lookup + memory for set( {1,2,3,4,5,6})

edited Dec 21 '18 at 16:42

answered Dec 21 '18 at 16:14

Patrick Artner

21.7k62143

5

You can just return. You don't need to raise anything.
– khelwood
Dec 21 '18 at 16:15

1

See PEP 479.
– khelwood
Dec 21 '18 at 16:17

1

Instead of if e in seen: continue, yield e and return, you could also just return list(seen) at the end.
– mkrieger1
Dec 21 '18 at 16:19

2

@mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
– khelwood
Dec 21 '18 at 16:20

2

yielding in order :) list(set) not
– Patrick Artner
Dec 21 '18 at 16:22

|
show 4 more comments

I would use a set to remember what was seen and return from the generator when you have seen enough:

a = [1,2,2,3,3,4,5,6]



def get_unique_N(iterable, N):

    """Yields (in order) the first N unique elements of iterable. 

    Might yield less if data too short."""

    seen = set()

    for e in iterable:

        if e in seen:

            continue

        seen.add(e)

        yield e

        if len(seen) == N:

            return



k = get_unique_N([1,2,2,3,3,4,5,6], 4)

print(list(k))

Output:

[1,2,3,4]

According to PEP-479 you should return from generators, not raise StopIteration - thanks to @khelwood & @iBug for that piece of comment - one never learns out.

With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration

Consider [1,2,3,4,4,4,4,5] vs [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]:

For 6 uniques (in longer list):

you would have lookups of O(1)+O(2)+...+O(5001)

mine would have 5001*O(1) lookup + memory for set( {1,2,3,4,5,6})

edited Dec 21 '18 at 16:42

answered Dec 21 '18 at 16:14

Patrick Artner

21.7k62143

I would use a set to remember what was seen and return from the generator when you have seen enough:

a = [1,2,2,3,3,4,5,6]



def get_unique_N(iterable, N):

    """Yields (in order) the first N unique elements of iterable. 

    Might yield less if data too short."""

    seen = set()

    for e in iterable:

        if e in seen:

            continue

        seen.add(e)

        yield e

        if len(seen) == N:

            return



k = get_unique_N([1,2,2,3,3,4,5,6], 4)

print(list(k))

Output:

[1,2,3,4]

According to PEP-479 you should return from generators, not raise StopIteration - thanks to @khelwood & @iBug for that piece of comment - one never learns out.

With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration

Consider [1,2,3,4,4,4,4,5] vs [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]:

For 6 uniques (in longer list):

you would have lookups of O(1)+O(2)+...+O(5001)

mine would have 5001*O(1) lookup + memory for set( {1,2,3,4,5,6})

edited Dec 21 '18 at 16:42

answered Dec 21 '18 at 16:14

Patrick Artner

21.7k62143

edited Dec 21 '18 at 16:42

answered Dec 21 '18 at 16:14

Patrick Artner

21.7k62143

answered Dec 21 '18 at 16:14

Patrick Artner

21.7k62143

answered Dec 21 '18 at 16:14

Patrick Artner

21.7k62143

5

You can just return. You don't need to raise anything.
– khelwood
Dec 21 '18 at 16:15

1

See PEP 479.
– khelwood
Dec 21 '18 at 16:17

1

Instead of if e in seen: continue, yield e and return, you could also just return list(seen) at the end.
– mkrieger1
Dec 21 '18 at 16:19

2

@mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
– khelwood
Dec 21 '18 at 16:20

2

yielding in order :) list(set) not
– Patrick Artner
Dec 21 '18 at 16:22

|
show 4 more comments

5

You can just return. You don't need to raise anything.
– khelwood
Dec 21 '18 at 16:15

1

See PEP 479.
– khelwood
Dec 21 '18 at 16:17

1

Instead of if e in seen: continue, yield e and return, you could also just return list(seen) at the end.
– mkrieger1
Dec 21 '18 at 16:19

2

@mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
– khelwood
Dec 21 '18 at 16:20

2

yielding in order :) list(set) not
– Patrick Artner
Dec 21 '18 at 16:22

You can just return. You don't need to raise anything.
– khelwood
Dec 21 '18 at 16:15

See PEP 479.
– khelwood
Dec 21 '18 at 16:17

Instead of if e in seen: continue, yield e and return, you could also just return list(seen) at the end.
– mkrieger1
Dec 21 '18 at 16:19

@mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
– khelwood
Dec 21 '18 at 16:20

yielding in order :) list(set) not
– Patrick Artner
Dec 21 '18 at 16:22

|
show 4 more comments

You can adapt the popular itertools unique_everseen recipe:

def unique_everseen_limit(iterable, limit=5):

    seen = set()

    seen_add = seen.add

    for element in iterable:

        if element not in seen:

            seen_add(element)

            yield element

        if len(seen) == limit:

            break



a = [1,2,2,3,3,4,5,6]



res = list(unique_everseen_limit(a))  # [1, 2, 3, 4, 5]

Alternatively, as suggested by @Chris_Rands, you can use itertools.islice to extract a fixed number of values from a non-limited generator:

from itertools import islice



def unique_everseen(iterable):

    seen = set()

    seen_add = seen.add

    for element in iterable:

        if element not in seen:

            seen_add(element)

            yield element



res = list(islice(unique_everseen(a), 5))  # [1, 2, 3, 4, 5]

Note the unique_everseen recipe is available in 3rd party libraries via more_itertools.unique_everseen or toolz.unique, so you could use:

from itertools import islice

from more_itertools import unique_everseen

from toolz import unique



res = list(islice(unique_everseen(a), 5))  # [1, 2, 3, 4, 5]

res = list(islice(unique(a), 5))           # [1, 2, 3, 4, 5]

edited Dec 21 '18 at 17:01

answered Dec 21 '18 at 16:16

jpp

92.2k2053103

1

The alternative would be creating an infinite generator and then itertools.islice(gen, limit)
– Chris_Rands
Dec 21 '18 at 16:33

Why not drop line 3 in you first block of code and do seen.add(element) instead?
– jorijnsmit
Dec 27 '18 at 11:04

@jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
– jpp
Dec 27 '18 at 11:28

add a comment |

You can adapt the popular itertools unique_everseen recipe:

def unique_everseen_limit(iterable, limit=5):

    seen = set()

    seen_add = seen.add

    for element in iterable:

        if element not in seen:

            seen_add(element)

            yield element

        if len(seen) == limit:

            break



a = [1,2,2,3,3,4,5,6]



res = list(unique_everseen_limit(a))  # [1, 2, 3, 4, 5]

Alternatively, as suggested by @Chris_Rands, you can use itertools.islice to extract a fixed number of values from a non-limited generator:

from itertools import islice



def unique_everseen(iterable):

    seen = set()

    seen_add = seen.add

    for element in iterable:

        if element not in seen:

            seen_add(element)

            yield element



res = list(islice(unique_everseen(a), 5))  # [1, 2, 3, 4, 5]

Note the unique_everseen recipe is available in 3rd party libraries via more_itertools.unique_everseen or toolz.unique, so you could use:

from itertools import islice

from more_itertools import unique_everseen

from toolz import unique



res = list(islice(unique_everseen(a), 5))  # [1, 2, 3, 4, 5]

res = list(islice(unique(a), 5))           # [1, 2, 3, 4, 5]

edited Dec 21 '18 at 17:01

answered Dec 21 '18 at 16:16

jpp

92.2k2053103

1

The alternative would be creating an infinite generator and then itertools.islice(gen, limit)
– Chris_Rands
Dec 21 '18 at 16:33

Why not drop line 3 in you first block of code and do seen.add(element) instead?
– jorijnsmit
Dec 27 '18 at 11:04

@jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
– jpp
Dec 27 '18 at 11:28

add a comment |

You can adapt the popular itertools unique_everseen recipe:

def unique_everseen_limit(iterable, limit=5):

    seen = set()

    seen_add = seen.add

    for element in iterable:

        if element not in seen:

            seen_add(element)

            yield element

        if len(seen) == limit:

            break



a = [1,2,2,3,3,4,5,6]



res = list(unique_everseen_limit(a))  # [1, 2, 3, 4, 5]

Alternatively, as suggested by @Chris_Rands, you can use itertools.islice to extract a fixed number of values from a non-limited generator:

from itertools import islice



def unique_everseen(iterable):

    seen = set()

    seen_add = seen.add

    for element in iterable:

        if element not in seen:

            seen_add(element)

            yield element



res = list(islice(unique_everseen(a), 5))  # [1, 2, 3, 4, 5]

Note the unique_everseen recipe is available in 3rd party libraries via more_itertools.unique_everseen or toolz.unique, so you could use:

from itertools import islice

from more_itertools import unique_everseen

from toolz import unique



res = list(islice(unique_everseen(a), 5))  # [1, 2, 3, 4, 5]

res = list(islice(unique(a), 5))           # [1, 2, 3, 4, 5]

edited Dec 21 '18 at 17:01

answered Dec 21 '18 at 16:16

jpp

92.2k2053103

You can adapt the popular itertools unique_everseen recipe:

def unique_everseen_limit(iterable, limit=5):

    seen = set()

    seen_add = seen.add

    for element in iterable:

        if element not in seen:

            seen_add(element)

            yield element

        if len(seen) == limit:

            break



a = [1,2,2,3,3,4,5,6]



res = list(unique_everseen_limit(a))  # [1, 2, 3, 4, 5]

Alternatively, as suggested by @Chris_Rands, you can use itertools.islice to extract a fixed number of values from a non-limited generator:

from itertools import islice



def unique_everseen(iterable):

    seen = set()

    seen_add = seen.add

    for element in iterable:

        if element not in seen:

            seen_add(element)

            yield element



res = list(islice(unique_everseen(a), 5))  # [1, 2, 3, 4, 5]

Note the unique_everseen recipe is available in 3rd party libraries via more_itertools.unique_everseen or toolz.unique, so you could use:

from itertools import islice

from more_itertools import unique_everseen

from toolz import unique



res = list(islice(unique_everseen(a), 5))  # [1, 2, 3, 4, 5]

res = list(islice(unique(a), 5))           # [1, 2, 3, 4, 5]

edited Dec 21 '18 at 17:01

answered Dec 21 '18 at 16:16

jpp

92.2k2053103

edited Dec 21 '18 at 17:01

answered Dec 21 '18 at 16:16

jpp

92.2k2053103

answered Dec 21 '18 at 16:16

jpp

92.2k2053103

answered Dec 21 '18 at 16:16

jpp

92.2k2053103

1

The alternative would be creating an infinite generator and then itertools.islice(gen, limit)
– Chris_Rands
Dec 21 '18 at 16:33

Why not drop line 3 in you first block of code and do seen.add(element) instead?
– jorijnsmit
Dec 27 '18 at 11:04

@jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
– jpp
Dec 27 '18 at 11:28

add a comment |

1

The alternative would be creating an infinite generator and then itertools.islice(gen, limit)
– Chris_Rands
Dec 21 '18 at 16:33

Why not drop line 3 in you first block of code and do seen.add(element) instead?
– jorijnsmit
Dec 27 '18 at 11:04

@jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
– jpp
Dec 27 '18 at 11:28

The alternative would be creating an infinite generator and then itertools.islice(gen, limit)
– Chris_Rands
Dec 21 '18 at 16:33

Why not drop line 3 in you first block of code and do seen.add(element) instead?
– jorijnsmit
Dec 27 '18 at 11:04

@jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
– jpp
Dec 27 '18 at 11:28

add a comment |

from collections import OrderedDict





def nub(iterable):

    """Returns unique elements preserving order."""

    return OrderedDict.fromkeys(iterable).keys()

and then implementation of iterate can be simplified to

from itertools import islice





def iterate(itr, upper=5):

    return islice(nub(itr), upper)

or if you want always a list as an output

def iterate(itr, upper=5):

    return list(nub(itr))[:upper]

Improvements

As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub utility in a form of generator like others already did:

def nub(iterable):

    seen = set()

    add_seen = seen.add

    for element in iterable:

        if element in seen:

            continue

        yield element

        add_seen(element)

edited Dec 21 '18 at 16:44

answered Dec 21 '18 at 16:20

Azat Ibrakov

3,91171330

1

I was thinking of this, definitely short, but it is O(N)
– Chris_Rands
Dec 21 '18 at 16:21

@Chris_Rands: updated
– Azat Ibrakov
Dec 21 '18 at 16:38

add a comment |

from collections import OrderedDict





def nub(iterable):

    """Returns unique elements preserving order."""

    return OrderedDict.fromkeys(iterable).keys()

and then implementation of iterate can be simplified to

from itertools import islice





def iterate(itr, upper=5):

    return islice(nub(itr), upper)

or if you want always a list as an output

def iterate(itr, upper=5):

    return list(nub(itr))[:upper]

Improvements

As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub utility in a form of generator like others already did:

def nub(iterable):

    seen = set()

    add_seen = seen.add

    for element in iterable:

        if element in seen:

            continue

        yield element

        add_seen(element)

edited Dec 21 '18 at 16:44

answered Dec 21 '18 at 16:20

Azat Ibrakov

3,91171330

1

I was thinking of this, definitely short, but it is O(N)
– Chris_Rands
Dec 21 '18 at 16:21

@Chris_Rands: updated
– Azat Ibrakov
Dec 21 '18 at 16:38

add a comment |

from collections import OrderedDict





def nub(iterable):

    """Returns unique elements preserving order."""

    return OrderedDict.fromkeys(iterable).keys()

and then implementation of iterate can be simplified to

from itertools import islice





def iterate(itr, upper=5):

    return islice(nub(itr), upper)

or if you want always a list as an output

def iterate(itr, upper=5):

    return list(nub(itr))[:upper]

Improvements

As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub utility in a form of generator like others already did:

def nub(iterable):

    seen = set()

    add_seen = seen.add

    for element in iterable:

        if element in seen:

            continue

        yield element

        add_seen(element)

edited Dec 21 '18 at 16:44

answered Dec 21 '18 at 16:20

Azat Ibrakov

3,91171330

from collections import OrderedDict





def nub(iterable):

    """Returns unique elements preserving order."""

    return OrderedDict.fromkeys(iterable).keys()

and then implementation of iterate can be simplified to

from itertools import islice





def iterate(itr, upper=5):

    return islice(nub(itr), upper)

or if you want always a list as an output

def iterate(itr, upper=5):

    return list(nub(itr))[:upper]

Improvements

As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub utility in a form of generator like others already did:

def nub(iterable):

    seen = set()

    add_seen = seen.add

    for element in iterable:

        if element in seen:

            continue

        yield element

        add_seen(element)

edited Dec 21 '18 at 16:44

answered Dec 21 '18 at 16:20

Azat Ibrakov

3,91171330

edited Dec 21 '18 at 16:44

answered Dec 21 '18 at 16:20

Azat Ibrakov

3,91171330

answered Dec 21 '18 at 16:20

Azat Ibrakov

3,91171330

answered Dec 21 '18 at 16:20

Azat Ibrakov

3,91171330

1

I was thinking of this, definitely short, but it is O(N)
– Chris_Rands
Dec 21 '18 at 16:21

@Chris_Rands: updated
– Azat Ibrakov
Dec 21 '18 at 16:38

add a comment |

1

I was thinking of this, definitely short, but it is O(N)
– Chris_Rands
Dec 21 '18 at 16:21

@Chris_Rands: updated
– Azat Ibrakov
Dec 21 '18 at 16:38

I was thinking of this, definitely short, but it is O(N)
– Chris_Rands
Dec 21 '18 at 16:21

@Chris_Rands: updated
– Azat Ibrakov
Dec 21 '18 at 16:38

add a comment |

You can use OrderedDict or, since Python 3.7, an ordinary dict, since they are implemented to preserve the insertion order. Note that this won't work with sets.

N = 3

a = [1, 2, 2, 3, 3, 3, 4]

d = {x: True for x in a}

list(d.keys())[:N]

edited Dec 21 '18 at 16:33

answered Dec 21 '18 at 16:19

Jindra Helcl

1,6391218

1

In 3.6 order-preserving dicts were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
– glibdud
Dec 21 '18 at 16:32

add a comment |

You can use OrderedDict or, since Python 3.7, an ordinary dict, since they are implemented to preserve the insertion order. Note that this won't work with sets.

N = 3

a = [1, 2, 2, 3, 3, 3, 4]

d = {x: True for x in a}

list(d.keys())[:N]

edited Dec 21 '18 at 16:33

answered Dec 21 '18 at 16:19

Jindra Helcl

1,6391218

1

In 3.6 order-preserving dicts were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
– glibdud
Dec 21 '18 at 16:32

add a comment |

You can use OrderedDict or, since Python 3.7, an ordinary dict, since they are implemented to preserve the insertion order. Note that this won't work with sets.

N = 3

a = [1, 2, 2, 3, 3, 3, 4]

d = {x: True for x in a}

list(d.keys())[:N]

edited Dec 21 '18 at 16:33

answered Dec 21 '18 at 16:19

Jindra Helcl

1,6391218

You can use OrderedDict or, since Python 3.7, an ordinary dict, since they are implemented to preserve the insertion order. Note that this won't work with sets.

N = 3

a = [1, 2, 2, 3, 3, 3, 4]

d = {x: True for x in a}

list(d.keys())[:N]

edited Dec 21 '18 at 16:33

answered Dec 21 '18 at 16:19

Jindra Helcl

1,6391218

edited Dec 21 '18 at 16:33

answered Dec 21 '18 at 16:19

Jindra Helcl

1,6391218

answered Dec 21 '18 at 16:19

Jindra Helcl

1,6391218

answered Dec 21 '18 at 16:19

Jindra Helcl

1,6391218

1

In 3.6 order-preserving dicts were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
– glibdud
Dec 21 '18 at 16:32

add a comment |

1

In 3.6 order-preserving dicts were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
– glibdud
Dec 21 '18 at 16:32

In 3.6 order-preserving dicts were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
– glibdud
Dec 21 '18 at 16:32

add a comment |

Here is a Pythonic approach using itertools.takewhile():

In [95]: from itertools import takewhile



In [96]: seen = set()



In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))

Out[97]: {1, 2, 3, 4}

answered Dec 21 '18 at 16:37

Kasrâmvd

77.8k1089124

4

By which definition is this abuse of the or operator considered Pythonic?
– cdlane
Dec 21 '18 at 16:42

2

@cdlane By the definition in which this use of or is misuse.
– Kasrâmvd
Dec 21 '18 at 16:47

I think a proper function should be used instead of a lambda. Here the seen.add is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But the predicate function should be more explicit.
– xssChauhan
Dec 21 '18 at 16:50

3

We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
– cdlane
Dec 21 '18 at 17:08

1

I disagree this is Pythonic, seen.add or len(seen) <= 4 shouldn't be used in a function like takewhile, for the smae reasons you wouldn't use it in map or filter
– juanpa.arrivillaga
Dec 21 '18 at 17:38

add a comment |

Here is a Pythonic approach using itertools.takewhile():

In [95]: from itertools import takewhile



In [96]: seen = set()



In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))

Out[97]: {1, 2, 3, 4}

answered Dec 21 '18 at 16:37

Kasrâmvd

77.8k1089124

4

By which definition is this abuse of the or operator considered Pythonic?
– cdlane
Dec 21 '18 at 16:42

2

@cdlane By the definition in which this use of or is misuse.
– Kasrâmvd
Dec 21 '18 at 16:47

I think a proper function should be used instead of a lambda. Here the seen.add is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But the predicate function should be more explicit.
– xssChauhan
Dec 21 '18 at 16:50

3

We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
– cdlane
Dec 21 '18 at 17:08

1

I disagree this is Pythonic, seen.add or len(seen) <= 4 shouldn't be used in a function like takewhile, for the smae reasons you wouldn't use it in map or filter
– juanpa.arrivillaga
Dec 21 '18 at 17:38

add a comment |

Here is a Pythonic approach using itertools.takewhile():

In [95]: from itertools import takewhile



In [96]: seen = set()



In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))

Out[97]: {1, 2, 3, 4}

answered Dec 21 '18 at 16:37

Kasrâmvd

77.8k1089124

Here is a Pythonic approach using itertools.takewhile():

In [95]: from itertools import takewhile



In [96]: seen = set()



In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))

Out[97]: {1, 2, 3, 4}

answered Dec 21 '18 at 16:37

Kasrâmvd

77.8k1089124

answered Dec 21 '18 at 16:37

Kasrâmvd

77.8k1089124

answered Dec 21 '18 at 16:37

Kasrâmvd

77.8k1089124

answered Dec 21 '18 at 16:37

Kasrâmvd

77.8k1089124

4

By which definition is this abuse of the or operator considered Pythonic?
– cdlane
Dec 21 '18 at 16:42

2

@cdlane By the definition in which this use of or is misuse.
– Kasrâmvd
Dec 21 '18 at 16:47

I think a proper function should be used instead of a lambda. Here the seen.add is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But the predicate function should be more explicit.
– xssChauhan
Dec 21 '18 at 16:50

3

We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
– cdlane
Dec 21 '18 at 17:08

1

I disagree this is Pythonic, seen.add or len(seen) <= 4 shouldn't be used in a function like takewhile, for the smae reasons you wouldn't use it in map or filter
– juanpa.arrivillaga
Dec 21 '18 at 17:38

add a comment |

4

By which definition is this abuse of the or operator considered Pythonic?
– cdlane
Dec 21 '18 at 16:42

2

@cdlane By the definition in which this use of or is misuse.
– Kasrâmvd
Dec 21 '18 at 16:47

I think a proper function should be used instead of a lambda. Here the seen.add is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But the predicate function should be more explicit.
– xssChauhan
Dec 21 '18 at 16:50

3

We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
– cdlane
Dec 21 '18 at 17:08

1

I disagree this is Pythonic, seen.add or len(seen) <= 4 shouldn't be used in a function like takewhile, for the smae reasons you wouldn't use it in map or filter
– juanpa.arrivillaga
Dec 21 '18 at 17:38

By which definition is this abuse of the or operator considered Pythonic?
– cdlane
Dec 21 '18 at 16:42

@cdlane By the definition in which this use of or is misuse.
– Kasrâmvd
Dec 21 '18 at 16:47

I think a proper function should be used instead of a lambda. Here the seen.add is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But the predicate function should be more explicit.
– xssChauhan
Dec 21 '18 at 16:50

We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
– cdlane
Dec 21 '18 at 17:08

I disagree this is Pythonic, seen.add or len(seen) <= 4 shouldn't be used in a function like takewhile, for the smae reasons you wouldn't use it in map or filter
– juanpa.arrivillaga
Dec 21 '18 at 17:38

add a comment |

Using set with sorted+ key

sorted(set(a), key=list(a).index)[:5]

Out[136]: [1, 2, 3, 4, 5]

answered Dec 21 '18 at 16:17

W-B

102k73163

1

This is inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:25

3

@xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
– juanpa.arrivillaga
Dec 21 '18 at 16:29

add a comment |

Using set with sorted+ key

sorted(set(a), key=list(a).index)[:5]

Out[136]: [1, 2, 3, 4, 5]

answered Dec 21 '18 at 16:17

W-B

102k73163

1

This is inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:25

3

@xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
– juanpa.arrivillaga
Dec 21 '18 at 16:29

add a comment |

Using set with sorted+ key

sorted(set(a), key=list(a).index)[:5]

Out[136]: [1, 2, 3, 4, 5]

answered Dec 21 '18 at 16:17

W-B

102k73163

Using set with sorted+ key

sorted(set(a), key=list(a).index)[:5]

Out[136]: [1, 2, 3, 4, 5]

answered Dec 21 '18 at 16:17

W-B

102k73163

answered Dec 21 '18 at 16:17

W-B

102k73163

answered Dec 21 '18 at 16:17

W-B

102k73163

answered Dec 21 '18 at 16:17

W-B

102k73163

1

This is inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:25

3

@xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
– juanpa.arrivillaga
Dec 21 '18 at 16:29

add a comment |

1

This is inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:25

3

@xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
– juanpa.arrivillaga
Dec 21 '18 at 16:29

This is inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:25

@xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
– juanpa.arrivillaga
Dec 21 '18 at 16:29

add a comment |

Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby function in itertools:

from itertools import groupby, islice



def first_unique(data, upper):

    return islice((key for (key, _) in groupby(data)), 0, upper)



a = [1, 2, 2, 3, 3, 4, 5, 6]



print(list(first_unique(a, 5)))

Updated to use islice instead of enumerate per @juanpa.arrivillaga. You don't even need a set to keep track of duplicates.

edited Dec 21 '18 at 17:49

answered Dec 21 '18 at 17:01

cdlane

17.3k21043

You might as well use islice
– juanpa.arrivillaga
Dec 21 '18 at 17:39

So groupby retains order, nice, but is it an implementation detail or a feature?
– kubanczyk
Dec 22 '18 at 9:39

@kubanczyk, yes groupby is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted, groupby wouldn't work for this problem. However, groupy can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
– cdlane
Dec 22 '18 at 22:15

add a comment |

Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby function in itertools:

from itertools import groupby, islice



def first_unique(data, upper):

    return islice((key for (key, _) in groupby(data)), 0, upper)



a = [1, 2, 2, 3, 3, 4, 5, 6]



print(list(first_unique(a, 5)))

Updated to use islice instead of enumerate per @juanpa.arrivillaga. You don't even need a set to keep track of duplicates.

edited Dec 21 '18 at 17:49

answered Dec 21 '18 at 17:01

cdlane

17.3k21043

You might as well use islice
– juanpa.arrivillaga
Dec 21 '18 at 17:39

So groupby retains order, nice, but is it an implementation detail or a feature?
– kubanczyk
Dec 22 '18 at 9:39

@kubanczyk, yes groupby is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted, groupby wouldn't work for this problem. However, groupy can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
– cdlane
Dec 22 '18 at 22:15

add a comment |

Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby function in itertools:

from itertools import groupby, islice



def first_unique(data, upper):

    return islice((key for (key, _) in groupby(data)), 0, upper)



a = [1, 2, 2, 3, 3, 4, 5, 6]



print(list(first_unique(a, 5)))

Updated to use islice instead of enumerate per @juanpa.arrivillaga. You don't even need a set to keep track of duplicates.

edited Dec 21 '18 at 17:49

answered Dec 21 '18 at 17:01

cdlane

17.3k21043

Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby function in itertools:

from itertools import groupby, islice



def first_unique(data, upper):

    return islice((key for (key, _) in groupby(data)), 0, upper)



a = [1, 2, 2, 3, 3, 4, 5, 6]



print(list(first_unique(a, 5)))

Updated to use islice instead of enumerate per @juanpa.arrivillaga. You don't even need a set to keep track of duplicates.

edited Dec 21 '18 at 17:49

answered Dec 21 '18 at 17:01

cdlane

17.3k21043

edited Dec 21 '18 at 17:49

answered Dec 21 '18 at 17:01

cdlane

17.3k21043

answered Dec 21 '18 at 17:01

cdlane

17.3k21043

answered Dec 21 '18 at 17:01

cdlane

17.3k21043

You might as well use islice
– juanpa.arrivillaga
Dec 21 '18 at 17:39

So groupby retains order, nice, but is it an implementation detail or a feature?
– kubanczyk
Dec 22 '18 at 9:39

@kubanczyk, yes groupby is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted, groupby wouldn't work for this problem. However, groupy can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
– cdlane
Dec 22 '18 at 22:15

add a comment |

You might as well use islice
– juanpa.arrivillaga
Dec 21 '18 at 17:39

So groupby retains order, nice, but is it an implementation detail or a feature?
– kubanczyk
Dec 22 '18 at 9:39

@kubanczyk, yes groupby is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted, groupby wouldn't work for this problem. However, groupy can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
– cdlane
Dec 22 '18 at 22:15

You might as well use islice
– juanpa.arrivillaga
Dec 21 '18 at 17:39

So groupby retains order, nice, but is it an implementation detail or a feature?
– kubanczyk
Dec 22 '18 at 9:39

@kubanczyk, yes groupby is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted, groupby wouldn't work for this problem. However, groupy can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
– cdlane
Dec 22 '18 at 22:15

add a comment |

a = [1,2,2,3,3,4,5,6]

res = 

for x in a:

    if x not in res:  # yes, not optimal, but doesnt need additional dict

        res.append(x)

        if len(res) == 5:

            break

print(res)

edited Dec 21 '18 at 16:47

answered Dec 21 '18 at 16:18

grapes

2,6321115

2

Use set rather than list for O(1) lookup.
– jpp
Dec 21 '18 at 16:20

3

@teng ... inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:20

1

@teng similarly inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:23

1

in is no easier to read than in {}, bad justification
– Chris_Rands
Dec 21 '18 at 16:25

1

@grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
– juanpa.arrivillaga
Dec 21 '18 at 16:30

|
show 7 more comments

a = [1,2,2,3,3,4,5,6]

res = 

for x in a:

    if x not in res:  # yes, not optimal, but doesnt need additional dict

        res.append(x)

        if len(res) == 5:

            break

print(res)

edited Dec 21 '18 at 16:47

answered Dec 21 '18 at 16:18

grapes

2,6321115

2

Use set rather than list for O(1) lookup.
– jpp
Dec 21 '18 at 16:20

3

@teng ... inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:20

1

@teng similarly inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:23

1

in is no easier to read than in {}, bad justification
– Chris_Rands
Dec 21 '18 at 16:25

1

@grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
– juanpa.arrivillaga
Dec 21 '18 at 16:30

|
show 7 more comments

a = [1,2,2,3,3,4,5,6]

res = 

for x in a:

    if x not in res:  # yes, not optimal, but doesnt need additional dict

        res.append(x)

        if len(res) == 5:

            break

print(res)

edited Dec 21 '18 at 16:47

answered Dec 21 '18 at 16:18

grapes

2,6321115

a = [1,2,2,3,3,4,5,6]

res = 

for x in a:

    if x not in res:  # yes, not optimal, but doesnt need additional dict

        res.append(x)

        if len(res) == 5:

            break

print(res)

edited Dec 21 '18 at 16:47

answered Dec 21 '18 at 16:18

grapes

2,6321115

edited Dec 21 '18 at 16:47

answered Dec 21 '18 at 16:18

grapes

2,6321115

answered Dec 21 '18 at 16:18

grapes

2,6321115

answered Dec 21 '18 at 16:18

grapes

2,6321115

2

Use set rather than list for O(1) lookup.
– jpp
Dec 21 '18 at 16:20

3

@teng ... inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:20

1

@teng similarly inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:23

1

in is no easier to read than in {}, bad justification
– Chris_Rands
Dec 21 '18 at 16:25

1

@grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
– juanpa.arrivillaga
Dec 21 '18 at 16:30

|
show 7 more comments

2

Use set rather than list for O(1) lookup.
– jpp
Dec 21 '18 at 16:20

3

@teng ... inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:20

1

@teng similarly inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:23

1

in is no easier to read than in {}, bad justification
– Chris_Rands
Dec 21 '18 at 16:25

1

@grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
– juanpa.arrivillaga
Dec 21 '18 at 16:30

Use set rather than list for O(1) lookup.
– jpp
Dec 21 '18 at 16:20

@teng ... inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:20

@teng similarly inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:23

in is no easier to read than in {}, bad justification
– Chris_Rands
Dec 21 '18 at 16:25

@grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
– juanpa.arrivillaga
Dec 21 '18 at 16:30

|
show 7 more comments

Given

import itertools as it





a = [1, 2, 2, 3, 3, 4, 5, 6]

Code

A simple list comprehension (similar to @cdlane's answer).

[k for k, _ in it.groupby(a)][:5]

# [1, 2, 3, 4, 5]

Alternatively, in Python 3.6+:

list(dict.fromkeys(a))[:5]

# [1, 2, 3, 4, 5]

edited Dec 22 '18 at 8:59

answered Dec 21 '18 at 23:32

pylang

13.1k23752

add a comment |

Given

import itertools as it





a = [1, 2, 2, 3, 3, 4, 5, 6]

Code

A simple list comprehension (similar to @cdlane's answer).

[k for k, _ in it.groupby(a)][:5]

# [1, 2, 3, 4, 5]

Alternatively, in Python 3.6+:

list(dict.fromkeys(a))[:5]

# [1, 2, 3, 4, 5]

edited Dec 22 '18 at 8:59

answered Dec 21 '18 at 23:32

pylang

13.1k23752

add a comment |

Given

import itertools as it





a = [1, 2, 2, 3, 3, 4, 5, 6]

Code

A simple list comprehension (similar to @cdlane's answer).

[k for k, _ in it.groupby(a)][:5]

# [1, 2, 3, 4, 5]

Alternatively, in Python 3.6+:

list(dict.fromkeys(a))[:5]

# [1, 2, 3, 4, 5]

edited Dec 22 '18 at 8:59

answered Dec 21 '18 at 23:32

pylang

13.1k23752

Given

import itertools as it





a = [1, 2, 2, 3, 3, 4, 5, 6]

Code

A simple list comprehension (similar to @cdlane's answer).

[k for k, _ in it.groupby(a)][:5]

# [1, 2, 3, 4, 5]

Alternatively, in Python 3.6+:

list(dict.fromkeys(a))[:5]

# [1, 2, 3, 4, 5]

edited Dec 22 '18 at 8:59

answered Dec 21 '18 at 23:32

pylang

13.1k23752

edited Dec 22 '18 at 8:59

answered Dec 21 '18 at 23:32

pylang

13.1k23752

answered Dec 21 '18 at 23:32

pylang

13.1k23752

answered Dec 21 '18 at 23:32

pylang

13.1k23752

add a comment |

Why not use something like this?

>>> a = [1, 2, 2, 3, 3, 4, 5, 6]

>>> list(set(a))[:5]

[1, 2, 3, 4, 5]

answered Dec 22 '18 at 12:01

Александр Трубилин

191

If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
– pylang
Dec 25 '18 at 2:17

This is wrong as it may or may not return the first five unique elements.
– Baum mit Augen
yesterday

add a comment |

Why not use something like this?

>>> a = [1, 2, 2, 3, 3, 4, 5, 6]

>>> list(set(a))[:5]

[1, 2, 3, 4, 5]

answered Dec 22 '18 at 12:01

Александр Трубилин

191

If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
– pylang
Dec 25 '18 at 2:17

This is wrong as it may or may not return the first five unique elements.
– Baum mit Augen
yesterday

add a comment |

Why not use something like this?

>>> a = [1, 2, 2, 3, 3, 4, 5, 6]

>>> list(set(a))[:5]

[1, 2, 3, 4, 5]

answered Dec 22 '18 at 12:01

Александр Трубилин

191

Why not use something like this?

>>> a = [1, 2, 2, 3, 3, 4, 5, 6]

>>> list(set(a))[:5]

[1, 2, 3, 4, 5]

answered Dec 22 '18 at 12:01

Александр Трубилин

191

answered Dec 22 '18 at 12:01

Александр Трубилин

191

answered Dec 22 '18 at 12:01

Александр Трубилин

191

answered Dec 22 '18 at 12:01

Александр Трубилин

191

If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
– pylang
Dec 25 '18 at 2:17

This is wrong as it may or may not return the first five unique elements.
– Baum mit Augen
yesterday

add a comment |

If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
– pylang
Dec 25 '18 at 2:17

This is wrong as it may or may not return the first five unique elements.
– Baum mit Augen
yesterday

If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
– pylang
Dec 25 '18 at 2:17

This is wrong as it may or may not return the first five unique elements.
– Baum mit Augen
yesterday

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Cfrtjryk

Getting first n unique elements from Python list

10 Answers
10

Improvements

Your Answer

Post as a guest

10 Answers
10

10 Answers
10

Improvements

Improvements

Improvements

Improvements

Post as a guest

Popular posts from this blog

Scott Moir

Województwo

What dialect is “You wants I should do it for ya?”

Getting first n unique elements from Python list

10 Answers 10

Improvements

Your Answer

Sign up or log in

Post as a guest

Post as a guest

10 Answers 10

10 Answers 10

Improvements

Improvements

Improvements

Improvements

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Scott Moir

Województwo

What dialect is “You wants I should do it for ya?”

10 Answers
10

10 Answers
10

10 Answers
10