Getting first n unique elements from Python list
I have a python list where elements can repeat.
>>> a = [1,2,2,3,3,4,5,6]
I want to get the first n
unique elements from the list.
So, in this case, if i want the first 5 unique elements, they would be:
[1,2,3,4,5]
I have come up with a solution using generators:
def iterate(itr, upper=5):
count = 0
for index, element in enumerate(itr):
if index==0:
count += 1
yield element
elif element not in itr[:index] and count<upper:
count += 1
yield element
In use:
>>> i = iterate(a, 5)
>>> [e for e in i]
[1,2,3,4,5]
I have doubts on this being the most optimal solution. Is there an alternative strategy that i can implement to write it in a more pythonic and efficient
way?
python python-3.x generator
|
show 5 more comments
I have a python list where elements can repeat.
>>> a = [1,2,2,3,3,4,5,6]
I want to get the first n
unique elements from the list.
So, in this case, if i want the first 5 unique elements, they would be:
[1,2,3,4,5]
I have come up with a solution using generators:
def iterate(itr, upper=5):
count = 0
for index, element in enumerate(itr):
if index==0:
count += 1
yield element
elif element not in itr[:index] and count<upper:
count += 1
yield element
In use:
>>> i = iterate(a, 5)
>>> [e for e in i]
[1,2,3,4,5]
I have doubts on this being the most optimal solution. Is there an alternative strategy that i can implement to write it in a more pythonic and efficient
way?
python python-3.x generator
5
Try:set(a)[:n]
– Tony Pellerin
Dec 21 '18 at 16:11
9
@TonyPellerin does not guarantee you get the first 5 elements
– juanpa.arrivillaga
Dec 21 '18 at 16:14
2
Your code is Pythonic enough, it is just inefficient.element not in itr[:index]
is not efficient, use a set
– juanpa.arrivillaga
Dec 21 '18 at 16:16
2
Is the list always sorted?
– user8408080
Dec 21 '18 at 16:16
5
for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
– Azat Ibrakov
Dec 21 '18 at 16:49
|
show 5 more comments
I have a python list where elements can repeat.
>>> a = [1,2,2,3,3,4,5,6]
I want to get the first n
unique elements from the list.
So, in this case, if i want the first 5 unique elements, they would be:
[1,2,3,4,5]
I have come up with a solution using generators:
def iterate(itr, upper=5):
count = 0
for index, element in enumerate(itr):
if index==0:
count += 1
yield element
elif element not in itr[:index] and count<upper:
count += 1
yield element
In use:
>>> i = iterate(a, 5)
>>> [e for e in i]
[1,2,3,4,5]
I have doubts on this being the most optimal solution. Is there an alternative strategy that i can implement to write it in a more pythonic and efficient
way?
python python-3.x generator
I have a python list where elements can repeat.
>>> a = [1,2,2,3,3,4,5,6]
I want to get the first n
unique elements from the list.
So, in this case, if i want the first 5 unique elements, they would be:
[1,2,3,4,5]
I have come up with a solution using generators:
def iterate(itr, upper=5):
count = 0
for index, element in enumerate(itr):
if index==0:
count += 1
yield element
elif element not in itr[:index] and count<upper:
count += 1
yield element
In use:
>>> i = iterate(a, 5)
>>> [e for e in i]
[1,2,3,4,5]
I have doubts on this being the most optimal solution. Is there an alternative strategy that i can implement to write it in a more pythonic and efficient
way?
python python-3.x generator
python python-3.x generator
edited Dec 27 '18 at 11:59
jorijnsmit
544320
544320
asked Dec 21 '18 at 16:10
xssChauhan
1,361925
1,361925
5
Try:set(a)[:n]
– Tony Pellerin
Dec 21 '18 at 16:11
9
@TonyPellerin does not guarantee you get the first 5 elements
– juanpa.arrivillaga
Dec 21 '18 at 16:14
2
Your code is Pythonic enough, it is just inefficient.element not in itr[:index]
is not efficient, use a set
– juanpa.arrivillaga
Dec 21 '18 at 16:16
2
Is the list always sorted?
– user8408080
Dec 21 '18 at 16:16
5
for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
– Azat Ibrakov
Dec 21 '18 at 16:49
|
show 5 more comments
5
Try:set(a)[:n]
– Tony Pellerin
Dec 21 '18 at 16:11
9
@TonyPellerin does not guarantee you get the first 5 elements
– juanpa.arrivillaga
Dec 21 '18 at 16:14
2
Your code is Pythonic enough, it is just inefficient.element not in itr[:index]
is not efficient, use a set
– juanpa.arrivillaga
Dec 21 '18 at 16:16
2
Is the list always sorted?
– user8408080
Dec 21 '18 at 16:16
5
for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
– Azat Ibrakov
Dec 21 '18 at 16:49
5
5
Try:
set(a)[:n]
– Tony Pellerin
Dec 21 '18 at 16:11
Try:
set(a)[:n]
– Tony Pellerin
Dec 21 '18 at 16:11
9
9
@TonyPellerin does not guarantee you get the first 5 elements
– juanpa.arrivillaga
Dec 21 '18 at 16:14
@TonyPellerin does not guarantee you get the first 5 elements
– juanpa.arrivillaga
Dec 21 '18 at 16:14
2
2
Your code is Pythonic enough, it is just inefficient.
element not in itr[:index]
is not efficient, use a set– juanpa.arrivillaga
Dec 21 '18 at 16:16
Your code is Pythonic enough, it is just inefficient.
element not in itr[:index]
is not efficient, use a set– juanpa.arrivillaga
Dec 21 '18 at 16:16
2
2
Is the list always sorted?
– user8408080
Dec 21 '18 at 16:16
Is the list always sorted?
– user8408080
Dec 21 '18 at 16:16
5
5
for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
– Azat Ibrakov
Dec 21 '18 at 16:49
for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
– Azat Ibrakov
Dec 21 '18 at 16:49
|
show 5 more comments
10 Answers
10
active
oldest
votes
I would use a set
to remember what was seen and return from the generator when you have seen
enough:
a = [1,2,2,3,3,4,5,6]
def get_unique_N(iterable, N):
"""Yields (in order) the first N unique elements of iterable.
Might yield less if data too short."""
seen = set()
for e in iterable:
if e in seen:
continue
seen.add(e)
yield e
if len(seen) == N:
return
k = get_unique_N([1,2,2,3,3,4,5,6], 4)
print(list(k))
Output:
[1,2,3,4]
According to PEP-479 you should return
from generators, not raise StopIteration
- thanks to @khelwood & @iBug for that piece of comment - one never learns out.
With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration
Your solution using elif element not in itr[:index] and count<upper:
uses O(k)
lookups - with k
being the length of the slice - using a set reduces this to O(1)
lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.
Consider [1,2,3,4,4,4,4,5]
vs [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]
:
For 6 uniques (in longer list):
- you would have lookups of
O(1)+O(2)+...+O(5001)
- mine would have
5001*O(1)
lookup + memory forset( {1,2,3,4,5,6})
5
You can justreturn
. You don't need to raise anything.
– khelwood
Dec 21 '18 at 16:15
1
See PEP 479.
– khelwood
Dec 21 '18 at 16:17
1
Instead ofif e in seen: continue
,yield e
andreturn
, you could also justreturn list(seen)
at the end.
– mkrieger1
Dec 21 '18 at 16:19
2
@mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
– khelwood
Dec 21 '18 at 16:20
2
yielding in order :) list(set) not
– Patrick Artner
Dec 21 '18 at 16:22
|
show 4 more comments
You can adapt the popular itertools
unique_everseen
recipe:
def unique_everseen_limit(iterable, limit=5):
seen = set()
seen_add = seen.add
for element in iterable:
if element not in seen:
seen_add(element)
yield element
if len(seen) == limit:
break
a = [1,2,2,3,3,4,5,6]
res = list(unique_everseen_limit(a)) # [1, 2, 3, 4, 5]
Alternatively, as suggested by @Chris_Rands, you can use itertools.islice
to extract a fixed number of values from a non-limited generator:
from itertools import islice
def unique_everseen(iterable):
seen = set()
seen_add = seen.add
for element in iterable:
if element not in seen:
seen_add(element)
yield element
res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
Note the unique_everseen
recipe is available in 3rd party libraries via more_itertools.unique_everseen
or toolz.unique
, so you could use:
from itertools import islice
from more_itertools import unique_everseen
from toolz import unique
res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
res = list(islice(unique(a), 5)) # [1, 2, 3, 4, 5]
1
The alternative would be creating an infinite generator and thenitertools.islice(gen, limit)
– Chris_Rands
Dec 21 '18 at 16:33
Why not drop line 3 in you first block of code and doseen.add(element)
instead?
– jorijnsmit
Dec 27 '18 at 11:04
@jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
– jpp
Dec 27 '18 at 11:28
add a comment |
If your objects are hashable (int
s are hashable) you can write utility function using fromkeys
method of collections.OrderedDict
class (or starting from Python3.7 a plain dict
, since they became officially ordered) like
from collections import OrderedDict
def nub(iterable):
"""Returns unique elements preserving order."""
return OrderedDict.fromkeys(iterable).keys()
and then implementation of iterate
can be simplified to
from itertools import islice
def iterate(itr, upper=5):
return islice(nub(itr), upper)
or if you want always a list
as an output
def iterate(itr, upper=5):
return list(nub(itr))[:upper]
Improvements
As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub
utility in a form of generator like others already did:
def nub(iterable):
seen = set()
add_seen = seen.add
for element in iterable:
if element in seen:
continue
yield element
add_seen(element)
1
I was thinking of this, definitely short, but it is O(N)
– Chris_Rands
Dec 21 '18 at 16:21
@Chris_Rands: updated
– Azat Ibrakov
Dec 21 '18 at 16:38
add a comment |
You can use OrderedDict
or, since Python 3.7, an ordinary dict
, since they are implemented to preserve the insertion order. Note that this won't work with sets.
N = 3
a = [1, 2, 2, 3, 3, 3, 4]
d = {x: True for x in a}
list(d.keys())[:N]
1
In 3.6 order-preservingdict
s were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
– glibdud
Dec 21 '18 at 16:32
add a comment |
Here is a Pythonic approach using itertools.takewhile()
:
In [95]: from itertools import takewhile
In [96]: seen = set()
In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))
Out[97]: {1, 2, 3, 4}
4
By which definition is this abuse of theor
operator considered Pythonic?
– cdlane
Dec 21 '18 at 16:42
2
@cdlane By the definition in which this use ofor
is misuse.
– Kasrâmvd
Dec 21 '18 at 16:47
I think a proper function should be used instead of a lambda. Here theseen.add
is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But thepredicate
function should be more explicit.
– xssChauhan
Dec 21 '18 at 16:50
3
We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
– cdlane
Dec 21 '18 at 17:08
1
I disagree this is Pythonic,seen.add or len(seen) <= 4
shouldn't be used in a function liketakewhile
, for the smae reasons you wouldn't use it inmap
orfilter
– juanpa.arrivillaga
Dec 21 '18 at 17:38
add a comment |
Using set
with sorted+ key
sorted(set(a), key=list(a).index)[:5]
Out[136]: [1, 2, 3, 4, 5]
1
This is inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:25
3
@xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
– juanpa.arrivillaga
Dec 21 '18 at 16:29
add a comment |
Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby
function in itertools:
from itertools import groupby, islice
def first_unique(data, upper):
return islice((key for (key, _) in groupby(data)), 0, upper)
a = [1, 2, 2, 3, 3, 4, 5, 6]
print(list(first_unique(a, 5)))
Updated to use islice
instead of enumerate
per @juanpa.arrivillaga. You don't even need a set
to keep track of duplicates.
You might as well useislice
– juanpa.arrivillaga
Dec 21 '18 at 17:39
Sogroupby
retains order, nice, but is it an implementation detail or a feature?
– kubanczyk
Dec 22 '18 at 9:39
@kubanczyk, yesgroupby
is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted,groupby
wouldn't work for this problem. However,groupy
can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
– cdlane
Dec 22 '18 at 22:15
add a comment |
There are really amazing answers for this question, which are fast, compact and brilliant! The reason I am putting here this code is that I believe there are plenty of cases when you don't care about 1 microsecond time loose nor you want additional libraries in your code for one-time solving a simple task.
a = [1,2,2,3,3,4,5,6]
res =
for x in a:
if x not in res: # yes, not optimal, but doesnt need additional dict
res.append(x)
if len(res) == 5:
break
print(res)
2
Useset
rather thanlist
for O(1) lookup.
– jpp
Dec 21 '18 at 16:20
3
@teng ... inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:20
1
@teng similarly inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:23
1
in
is no easier to read thanin {}
, bad justification
– Chris_Rands
Dec 21 '18 at 16:25
1
@grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
– juanpa.arrivillaga
Dec 21 '18 at 16:30
|
show 7 more comments
Given
import itertools as it
a = [1, 2, 2, 3, 3, 4, 5, 6]
Code
A simple list comprehension (similar to @cdlane's answer).
[k for k, _ in it.groupby(a)][:5]
# [1, 2, 3, 4, 5]
Alternatively, in Python 3.6+:
list(dict.fromkeys(a))[:5]
# [1, 2, 3, 4, 5]
add a comment |
Why not use something like this?
>>> a = [1, 2, 2, 3, 3, 4, 5, 6]
>>> list(set(a))[:5]
[1, 2, 3, 4, 5]
If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
– pylang
Dec 25 '18 at 2:17
This is wrong as it may or may not return the first five unique elements.
– Baum mit Augen
yesterday
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53887803%2fgetting-first-n-unique-elements-from-python-list%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
10 Answers
10
active
oldest
votes
10 Answers
10
active
oldest
votes
active
oldest
votes
active
oldest
votes
I would use a set
to remember what was seen and return from the generator when you have seen
enough:
a = [1,2,2,3,3,4,5,6]
def get_unique_N(iterable, N):
"""Yields (in order) the first N unique elements of iterable.
Might yield less if data too short."""
seen = set()
for e in iterable:
if e in seen:
continue
seen.add(e)
yield e
if len(seen) == N:
return
k = get_unique_N([1,2,2,3,3,4,5,6], 4)
print(list(k))
Output:
[1,2,3,4]
According to PEP-479 you should return
from generators, not raise StopIteration
- thanks to @khelwood & @iBug for that piece of comment - one never learns out.
With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration
Your solution using elif element not in itr[:index] and count<upper:
uses O(k)
lookups - with k
being the length of the slice - using a set reduces this to O(1)
lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.
Consider [1,2,3,4,4,4,4,5]
vs [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]
:
For 6 uniques (in longer list):
- you would have lookups of
O(1)+O(2)+...+O(5001)
- mine would have
5001*O(1)
lookup + memory forset( {1,2,3,4,5,6})
5
You can justreturn
. You don't need to raise anything.
– khelwood
Dec 21 '18 at 16:15
1
See PEP 479.
– khelwood
Dec 21 '18 at 16:17
1
Instead ofif e in seen: continue
,yield e
andreturn
, you could also justreturn list(seen)
at the end.
– mkrieger1
Dec 21 '18 at 16:19
2
@mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
– khelwood
Dec 21 '18 at 16:20
2
yielding in order :) list(set) not
– Patrick Artner
Dec 21 '18 at 16:22
|
show 4 more comments
I would use a set
to remember what was seen and return from the generator when you have seen
enough:
a = [1,2,2,3,3,4,5,6]
def get_unique_N(iterable, N):
"""Yields (in order) the first N unique elements of iterable.
Might yield less if data too short."""
seen = set()
for e in iterable:
if e in seen:
continue
seen.add(e)
yield e
if len(seen) == N:
return
k = get_unique_N([1,2,2,3,3,4,5,6], 4)
print(list(k))
Output:
[1,2,3,4]
According to PEP-479 you should return
from generators, not raise StopIteration
- thanks to @khelwood & @iBug for that piece of comment - one never learns out.
With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration
Your solution using elif element not in itr[:index] and count<upper:
uses O(k)
lookups - with k
being the length of the slice - using a set reduces this to O(1)
lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.
Consider [1,2,3,4,4,4,4,5]
vs [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]
:
For 6 uniques (in longer list):
- you would have lookups of
O(1)+O(2)+...+O(5001)
- mine would have
5001*O(1)
lookup + memory forset( {1,2,3,4,5,6})
5
You can justreturn
. You don't need to raise anything.
– khelwood
Dec 21 '18 at 16:15
1
See PEP 479.
– khelwood
Dec 21 '18 at 16:17
1
Instead ofif e in seen: continue
,yield e
andreturn
, you could also justreturn list(seen)
at the end.
– mkrieger1
Dec 21 '18 at 16:19
2
@mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
– khelwood
Dec 21 '18 at 16:20
2
yielding in order :) list(set) not
– Patrick Artner
Dec 21 '18 at 16:22
|
show 4 more comments
I would use a set
to remember what was seen and return from the generator when you have seen
enough:
a = [1,2,2,3,3,4,5,6]
def get_unique_N(iterable, N):
"""Yields (in order) the first N unique elements of iterable.
Might yield less if data too short."""
seen = set()
for e in iterable:
if e in seen:
continue
seen.add(e)
yield e
if len(seen) == N:
return
k = get_unique_N([1,2,2,3,3,4,5,6], 4)
print(list(k))
Output:
[1,2,3,4]
According to PEP-479 you should return
from generators, not raise StopIteration
- thanks to @khelwood & @iBug for that piece of comment - one never learns out.
With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration
Your solution using elif element not in itr[:index] and count<upper:
uses O(k)
lookups - with k
being the length of the slice - using a set reduces this to O(1)
lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.
Consider [1,2,3,4,4,4,4,5]
vs [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]
:
For 6 uniques (in longer list):
- you would have lookups of
O(1)+O(2)+...+O(5001)
- mine would have
5001*O(1)
lookup + memory forset( {1,2,3,4,5,6})
I would use a set
to remember what was seen and return from the generator when you have seen
enough:
a = [1,2,2,3,3,4,5,6]
def get_unique_N(iterable, N):
"""Yields (in order) the first N unique elements of iterable.
Might yield less if data too short."""
seen = set()
for e in iterable:
if e in seen:
continue
seen.add(e)
yield e
if len(seen) == N:
return
k = get_unique_N([1,2,2,3,3,4,5,6], 4)
print(list(k))
Output:
[1,2,3,4]
According to PEP-479 you should return
from generators, not raise StopIteration
- thanks to @khelwood & @iBug for that piece of comment - one never learns out.
With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration
Your solution using elif element not in itr[:index] and count<upper:
uses O(k)
lookups - with k
being the length of the slice - using a set reduces this to O(1)
lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.
Consider [1,2,3,4,4,4,4,5]
vs [1]*1000+[2]*1000+[3]*1000+[4]*1000+[5]*1000+[6]
:
For 6 uniques (in longer list):
- you would have lookups of
O(1)+O(2)+...+O(5001)
- mine would have
5001*O(1)
lookup + memory forset( {1,2,3,4,5,6})
edited Dec 21 '18 at 16:42
answered Dec 21 '18 at 16:14
Patrick Artner
21.7k62143
21.7k62143
5
You can justreturn
. You don't need to raise anything.
– khelwood
Dec 21 '18 at 16:15
1
See PEP 479.
– khelwood
Dec 21 '18 at 16:17
1
Instead ofif e in seen: continue
,yield e
andreturn
, you could also justreturn list(seen)
at the end.
– mkrieger1
Dec 21 '18 at 16:19
2
@mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
– khelwood
Dec 21 '18 at 16:20
2
yielding in order :) list(set) not
– Patrick Artner
Dec 21 '18 at 16:22
|
show 4 more comments
5
You can justreturn
. You don't need to raise anything.
– khelwood
Dec 21 '18 at 16:15
1
See PEP 479.
– khelwood
Dec 21 '18 at 16:17
1
Instead ofif e in seen: continue
,yield e
andreturn
, you could also justreturn list(seen)
at the end.
– mkrieger1
Dec 21 '18 at 16:19
2
@mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
– khelwood
Dec 21 '18 at 16:20
2
yielding in order :) list(set) not
– Patrick Artner
Dec 21 '18 at 16:22
5
5
You can just
return
. You don't need to raise anything.– khelwood
Dec 21 '18 at 16:15
You can just
return
. You don't need to raise anything.– khelwood
Dec 21 '18 at 16:15
1
1
See PEP 479.
– khelwood
Dec 21 '18 at 16:17
See PEP 479.
– khelwood
Dec 21 '18 at 16:17
1
1
Instead of
if e in seen: continue
, yield e
and return
, you could also just return list(seen)
at the end.– mkrieger1
Dec 21 '18 at 16:19
Instead of
if e in seen: continue
, yield e
and return
, you could also just return list(seen)
at the end.– mkrieger1
Dec 21 '18 at 16:19
2
2
@mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
– khelwood
Dec 21 '18 at 16:20
@mkrieger1 That would not guarantee that the items returned would be in the same order they were encountered.
– khelwood
Dec 21 '18 at 16:20
2
2
yielding in order :) list(set) not
– Patrick Artner
Dec 21 '18 at 16:22
yielding in order :) list(set) not
– Patrick Artner
Dec 21 '18 at 16:22
|
show 4 more comments
You can adapt the popular itertools
unique_everseen
recipe:
def unique_everseen_limit(iterable, limit=5):
seen = set()
seen_add = seen.add
for element in iterable:
if element not in seen:
seen_add(element)
yield element
if len(seen) == limit:
break
a = [1,2,2,3,3,4,5,6]
res = list(unique_everseen_limit(a)) # [1, 2, 3, 4, 5]
Alternatively, as suggested by @Chris_Rands, you can use itertools.islice
to extract a fixed number of values from a non-limited generator:
from itertools import islice
def unique_everseen(iterable):
seen = set()
seen_add = seen.add
for element in iterable:
if element not in seen:
seen_add(element)
yield element
res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
Note the unique_everseen
recipe is available in 3rd party libraries via more_itertools.unique_everseen
or toolz.unique
, so you could use:
from itertools import islice
from more_itertools import unique_everseen
from toolz import unique
res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
res = list(islice(unique(a), 5)) # [1, 2, 3, 4, 5]
1
The alternative would be creating an infinite generator and thenitertools.islice(gen, limit)
– Chris_Rands
Dec 21 '18 at 16:33
Why not drop line 3 in you first block of code and doseen.add(element)
instead?
– jorijnsmit
Dec 27 '18 at 11:04
@jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
– jpp
Dec 27 '18 at 11:28
add a comment |
You can adapt the popular itertools
unique_everseen
recipe:
def unique_everseen_limit(iterable, limit=5):
seen = set()
seen_add = seen.add
for element in iterable:
if element not in seen:
seen_add(element)
yield element
if len(seen) == limit:
break
a = [1,2,2,3,3,4,5,6]
res = list(unique_everseen_limit(a)) # [1, 2, 3, 4, 5]
Alternatively, as suggested by @Chris_Rands, you can use itertools.islice
to extract a fixed number of values from a non-limited generator:
from itertools import islice
def unique_everseen(iterable):
seen = set()
seen_add = seen.add
for element in iterable:
if element not in seen:
seen_add(element)
yield element
res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
Note the unique_everseen
recipe is available in 3rd party libraries via more_itertools.unique_everseen
or toolz.unique
, so you could use:
from itertools import islice
from more_itertools import unique_everseen
from toolz import unique
res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
res = list(islice(unique(a), 5)) # [1, 2, 3, 4, 5]
1
The alternative would be creating an infinite generator and thenitertools.islice(gen, limit)
– Chris_Rands
Dec 21 '18 at 16:33
Why not drop line 3 in you first block of code and doseen.add(element)
instead?
– jorijnsmit
Dec 27 '18 at 11:04
@jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
– jpp
Dec 27 '18 at 11:28
add a comment |
You can adapt the popular itertools
unique_everseen
recipe:
def unique_everseen_limit(iterable, limit=5):
seen = set()
seen_add = seen.add
for element in iterable:
if element not in seen:
seen_add(element)
yield element
if len(seen) == limit:
break
a = [1,2,2,3,3,4,5,6]
res = list(unique_everseen_limit(a)) # [1, 2, 3, 4, 5]
Alternatively, as suggested by @Chris_Rands, you can use itertools.islice
to extract a fixed number of values from a non-limited generator:
from itertools import islice
def unique_everseen(iterable):
seen = set()
seen_add = seen.add
for element in iterable:
if element not in seen:
seen_add(element)
yield element
res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
Note the unique_everseen
recipe is available in 3rd party libraries via more_itertools.unique_everseen
or toolz.unique
, so you could use:
from itertools import islice
from more_itertools import unique_everseen
from toolz import unique
res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
res = list(islice(unique(a), 5)) # [1, 2, 3, 4, 5]
You can adapt the popular itertools
unique_everseen
recipe:
def unique_everseen_limit(iterable, limit=5):
seen = set()
seen_add = seen.add
for element in iterable:
if element not in seen:
seen_add(element)
yield element
if len(seen) == limit:
break
a = [1,2,2,3,3,4,5,6]
res = list(unique_everseen_limit(a)) # [1, 2, 3, 4, 5]
Alternatively, as suggested by @Chris_Rands, you can use itertools.islice
to extract a fixed number of values from a non-limited generator:
from itertools import islice
def unique_everseen(iterable):
seen = set()
seen_add = seen.add
for element in iterable:
if element not in seen:
seen_add(element)
yield element
res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
Note the unique_everseen
recipe is available in 3rd party libraries via more_itertools.unique_everseen
or toolz.unique
, so you could use:
from itertools import islice
from more_itertools import unique_everseen
from toolz import unique
res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
res = list(islice(unique(a), 5)) # [1, 2, 3, 4, 5]
edited Dec 21 '18 at 17:01
answered Dec 21 '18 at 16:16
jpp
92.2k2053103
92.2k2053103
1
The alternative would be creating an infinite generator and thenitertools.islice(gen, limit)
– Chris_Rands
Dec 21 '18 at 16:33
Why not drop line 3 in you first block of code and doseen.add(element)
instead?
– jorijnsmit
Dec 27 '18 at 11:04
@jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
– jpp
Dec 27 '18 at 11:28
add a comment |
1
The alternative would be creating an infinite generator and thenitertools.islice(gen, limit)
– Chris_Rands
Dec 21 '18 at 16:33
Why not drop line 3 in you first block of code and doseen.add(element)
instead?
– jorijnsmit
Dec 27 '18 at 11:04
@jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
– jpp
Dec 27 '18 at 11:28
1
1
The alternative would be creating an infinite generator and then
itertools.islice(gen, limit)
– Chris_Rands
Dec 21 '18 at 16:33
The alternative would be creating an infinite generator and then
itertools.islice(gen, limit)
– Chris_Rands
Dec 21 '18 at 16:33
Why not drop line 3 in you first block of code and do
seen.add(element)
instead?– jorijnsmit
Dec 27 '18 at 11:04
Why not drop line 3 in you first block of code and do
seen.add(element)
instead?– jorijnsmit
Dec 27 '18 at 11:04
@jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
– jpp
Dec 27 '18 at 11:28
@jorijnsmit, It's an optimosation. One less lookup in each iteration of the for loop. You should notice the difference in very large loops.
– jpp
Dec 27 '18 at 11:28
add a comment |
If your objects are hashable (int
s are hashable) you can write utility function using fromkeys
method of collections.OrderedDict
class (or starting from Python3.7 a plain dict
, since they became officially ordered) like
from collections import OrderedDict
def nub(iterable):
"""Returns unique elements preserving order."""
return OrderedDict.fromkeys(iterable).keys()
and then implementation of iterate
can be simplified to
from itertools import islice
def iterate(itr, upper=5):
return islice(nub(itr), upper)
or if you want always a list
as an output
def iterate(itr, upper=5):
return list(nub(itr))[:upper]
Improvements
As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub
utility in a form of generator like others already did:
def nub(iterable):
seen = set()
add_seen = seen.add
for element in iterable:
if element in seen:
continue
yield element
add_seen(element)
1
I was thinking of this, definitely short, but it is O(N)
– Chris_Rands
Dec 21 '18 at 16:21
@Chris_Rands: updated
– Azat Ibrakov
Dec 21 '18 at 16:38
add a comment |
If your objects are hashable (int
s are hashable) you can write utility function using fromkeys
method of collections.OrderedDict
class (or starting from Python3.7 a plain dict
, since they became officially ordered) like
from collections import OrderedDict
def nub(iterable):
"""Returns unique elements preserving order."""
return OrderedDict.fromkeys(iterable).keys()
and then implementation of iterate
can be simplified to
from itertools import islice
def iterate(itr, upper=5):
return islice(nub(itr), upper)
or if you want always a list
as an output
def iterate(itr, upper=5):
return list(nub(itr))[:upper]
Improvements
As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub
utility in a form of generator like others already did:
def nub(iterable):
seen = set()
add_seen = seen.add
for element in iterable:
if element in seen:
continue
yield element
add_seen(element)
1
I was thinking of this, definitely short, but it is O(N)
– Chris_Rands
Dec 21 '18 at 16:21
@Chris_Rands: updated
– Azat Ibrakov
Dec 21 '18 at 16:38
add a comment |
If your objects are hashable (int
s are hashable) you can write utility function using fromkeys
method of collections.OrderedDict
class (or starting from Python3.7 a plain dict
, since they became officially ordered) like
from collections import OrderedDict
def nub(iterable):
"""Returns unique elements preserving order."""
return OrderedDict.fromkeys(iterable).keys()
and then implementation of iterate
can be simplified to
from itertools import islice
def iterate(itr, upper=5):
return islice(nub(itr), upper)
or if you want always a list
as an output
def iterate(itr, upper=5):
return list(nub(itr))[:upper]
Improvements
As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub
utility in a form of generator like others already did:
def nub(iterable):
seen = set()
add_seen = seen.add
for element in iterable:
if element in seen:
continue
yield element
add_seen(element)
If your objects are hashable (int
s are hashable) you can write utility function using fromkeys
method of collections.OrderedDict
class (or starting from Python3.7 a plain dict
, since they became officially ordered) like
from collections import OrderedDict
def nub(iterable):
"""Returns unique elements preserving order."""
return OrderedDict.fromkeys(iterable).keys()
and then implementation of iterate
can be simplified to
from itertools import islice
def iterate(itr, upper=5):
return islice(nub(itr), upper)
or if you want always a list
as an output
def iterate(itr, upper=5):
return list(nub(itr))[:upper]
Improvements
As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub
utility in a form of generator like others already did:
def nub(iterable):
seen = set()
add_seen = seen.add
for element in iterable:
if element in seen:
continue
yield element
add_seen(element)
edited Dec 21 '18 at 16:44
answered Dec 21 '18 at 16:20
Azat Ibrakov
3,91171330
3,91171330
1
I was thinking of this, definitely short, but it is O(N)
– Chris_Rands
Dec 21 '18 at 16:21
@Chris_Rands: updated
– Azat Ibrakov
Dec 21 '18 at 16:38
add a comment |
1
I was thinking of this, definitely short, but it is O(N)
– Chris_Rands
Dec 21 '18 at 16:21
@Chris_Rands: updated
– Azat Ibrakov
Dec 21 '18 at 16:38
1
1
I was thinking of this, definitely short, but it is O(N)
– Chris_Rands
Dec 21 '18 at 16:21
I was thinking of this, definitely short, but it is O(N)
– Chris_Rands
Dec 21 '18 at 16:21
@Chris_Rands: updated
– Azat Ibrakov
Dec 21 '18 at 16:38
@Chris_Rands: updated
– Azat Ibrakov
Dec 21 '18 at 16:38
add a comment |
You can use OrderedDict
or, since Python 3.7, an ordinary dict
, since they are implemented to preserve the insertion order. Note that this won't work with sets.
N = 3
a = [1, 2, 2, 3, 3, 3, 4]
d = {x: True for x in a}
list(d.keys())[:N]
1
In 3.6 order-preservingdict
s were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
– glibdud
Dec 21 '18 at 16:32
add a comment |
You can use OrderedDict
or, since Python 3.7, an ordinary dict
, since they are implemented to preserve the insertion order. Note that this won't work with sets.
N = 3
a = [1, 2, 2, 3, 3, 3, 4]
d = {x: True for x in a}
list(d.keys())[:N]
1
In 3.6 order-preservingdict
s were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
– glibdud
Dec 21 '18 at 16:32
add a comment |
You can use OrderedDict
or, since Python 3.7, an ordinary dict
, since they are implemented to preserve the insertion order. Note that this won't work with sets.
N = 3
a = [1, 2, 2, 3, 3, 3, 4]
d = {x: True for x in a}
list(d.keys())[:N]
You can use OrderedDict
or, since Python 3.7, an ordinary dict
, since they are implemented to preserve the insertion order. Note that this won't work with sets.
N = 3
a = [1, 2, 2, 3, 3, 3, 4]
d = {x: True for x in a}
list(d.keys())[:N]
edited Dec 21 '18 at 16:33
answered Dec 21 '18 at 16:19
Jindra Helcl
1,6391218
1,6391218
1
In 3.6 order-preservingdict
s were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
– glibdud
Dec 21 '18 at 16:32
add a comment |
1
In 3.6 order-preservingdict
s were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.
– glibdud
Dec 21 '18 at 16:32
1
1
In 3.6 order-preserving
dict
s were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.– glibdud
Dec 21 '18 at 16:32
In 3.6 order-preserving
dict
s were an implementation detail (in the reference implementation... not sure how alternative interpreters handled it). It wasn't official until 3.7.– glibdud
Dec 21 '18 at 16:32
add a comment |
Here is a Pythonic approach using itertools.takewhile()
:
In [95]: from itertools import takewhile
In [96]: seen = set()
In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))
Out[97]: {1, 2, 3, 4}
4
By which definition is this abuse of theor
operator considered Pythonic?
– cdlane
Dec 21 '18 at 16:42
2
@cdlane By the definition in which this use ofor
is misuse.
– Kasrâmvd
Dec 21 '18 at 16:47
I think a proper function should be used instead of a lambda. Here theseen.add
is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But thepredicate
function should be more explicit.
– xssChauhan
Dec 21 '18 at 16:50
3
We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
– cdlane
Dec 21 '18 at 17:08
1
I disagree this is Pythonic,seen.add or len(seen) <= 4
shouldn't be used in a function liketakewhile
, for the smae reasons you wouldn't use it inmap
orfilter
– juanpa.arrivillaga
Dec 21 '18 at 17:38
add a comment |
Here is a Pythonic approach using itertools.takewhile()
:
In [95]: from itertools import takewhile
In [96]: seen = set()
In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))
Out[97]: {1, 2, 3, 4}
4
By which definition is this abuse of theor
operator considered Pythonic?
– cdlane
Dec 21 '18 at 16:42
2
@cdlane By the definition in which this use ofor
is misuse.
– Kasrâmvd
Dec 21 '18 at 16:47
I think a proper function should be used instead of a lambda. Here theseen.add
is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But thepredicate
function should be more explicit.
– xssChauhan
Dec 21 '18 at 16:50
3
We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
– cdlane
Dec 21 '18 at 17:08
1
I disagree this is Pythonic,seen.add or len(seen) <= 4
shouldn't be used in a function liketakewhile
, for the smae reasons you wouldn't use it inmap
orfilter
– juanpa.arrivillaga
Dec 21 '18 at 17:38
add a comment |
Here is a Pythonic approach using itertools.takewhile()
:
In [95]: from itertools import takewhile
In [96]: seen = set()
In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))
Out[97]: {1, 2, 3, 4}
Here is a Pythonic approach using itertools.takewhile()
:
In [95]: from itertools import takewhile
In [96]: seen = set()
In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))
Out[97]: {1, 2, 3, 4}
answered Dec 21 '18 at 16:37
Kasrâmvd
77.8k1089124
77.8k1089124
4
By which definition is this abuse of theor
operator considered Pythonic?
– cdlane
Dec 21 '18 at 16:42
2
@cdlane By the definition in which this use ofor
is misuse.
– Kasrâmvd
Dec 21 '18 at 16:47
I think a proper function should be used instead of a lambda. Here theseen.add
is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But thepredicate
function should be more explicit.
– xssChauhan
Dec 21 '18 at 16:50
3
We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
– cdlane
Dec 21 '18 at 17:08
1
I disagree this is Pythonic,seen.add or len(seen) <= 4
shouldn't be used in a function liketakewhile
, for the smae reasons you wouldn't use it inmap
orfilter
– juanpa.arrivillaga
Dec 21 '18 at 17:38
add a comment |
4
By which definition is this abuse of theor
operator considered Pythonic?
– cdlane
Dec 21 '18 at 16:42
2
@cdlane By the definition in which this use ofor
is misuse.
– Kasrâmvd
Dec 21 '18 at 16:47
I think a proper function should be used instead of a lambda. Here theseen.add
is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But thepredicate
function should be more explicit.
– xssChauhan
Dec 21 '18 at 16:50
3
We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
– cdlane
Dec 21 '18 at 17:08
1
I disagree this is Pythonic,seen.add or len(seen) <= 4
shouldn't be used in a function liketakewhile
, for the smae reasons you wouldn't use it inmap
orfilter
– juanpa.arrivillaga
Dec 21 '18 at 17:38
4
4
By which definition is this abuse of the
or
operator considered Pythonic?– cdlane
Dec 21 '18 at 16:42
By which definition is this abuse of the
or
operator considered Pythonic?– cdlane
Dec 21 '18 at 16:42
2
2
@cdlane By the definition in which this use of
or
is misuse.– Kasrâmvd
Dec 21 '18 at 16:47
@cdlane By the definition in which this use of
or
is misuse.– Kasrâmvd
Dec 21 '18 at 16:47
I think a proper function should be used instead of a lambda. Here the
seen.add
is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But the predicate
function should be more explicit.– xssChauhan
Dec 21 '18 at 16:50
I think a proper function should be used instead of a lambda. Here the
seen.add
is not returning a boolean value, and still being used for truth checking. Your implementation saves us writing a generator function, which is welcome suggestion. But the predicate
function should be more explicit.– xssChauhan
Dec 21 '18 at 16:50
3
3
We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
– cdlane
Dec 21 '18 at 17:08
We've different concepts of Pythonic: To be Pythonic is to use the Python constructs and data structures with clean, readable idioms.
– cdlane
Dec 21 '18 at 17:08
1
1
I disagree this is Pythonic,
seen.add or len(seen) <= 4
shouldn't be used in a function like takewhile
, for the smae reasons you wouldn't use it in map
or filter
– juanpa.arrivillaga
Dec 21 '18 at 17:38
I disagree this is Pythonic,
seen.add or len(seen) <= 4
shouldn't be used in a function like takewhile
, for the smae reasons you wouldn't use it in map
or filter
– juanpa.arrivillaga
Dec 21 '18 at 17:38
add a comment |
Using set
with sorted+ key
sorted(set(a), key=list(a).index)[:5]
Out[136]: [1, 2, 3, 4, 5]
1
This is inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:25
3
@xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
– juanpa.arrivillaga
Dec 21 '18 at 16:29
add a comment |
Using set
with sorted+ key
sorted(set(a), key=list(a).index)[:5]
Out[136]: [1, 2, 3, 4, 5]
1
This is inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:25
3
@xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
– juanpa.arrivillaga
Dec 21 '18 at 16:29
add a comment |
Using set
with sorted+ key
sorted(set(a), key=list(a).index)[:5]
Out[136]: [1, 2, 3, 4, 5]
Using set
with sorted+ key
sorted(set(a), key=list(a).index)[:5]
Out[136]: [1, 2, 3, 4, 5]
answered Dec 21 '18 at 16:17
W-B
102k73163
102k73163
1
This is inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:25
3
@xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
– juanpa.arrivillaga
Dec 21 '18 at 16:29
add a comment |
1
This is inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:25
3
@xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
– juanpa.arrivillaga
Dec 21 '18 at 16:29
1
1
This is inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:25
This is inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:25
3
3
@xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
– juanpa.arrivillaga
Dec 21 '18 at 16:29
@xssChauhan this will return it in order, but this is inefficient O(n^2 * log n) I believe. You can do this in O(N)
– juanpa.arrivillaga
Dec 21 '18 at 16:29
add a comment |
Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby
function in itertools:
from itertools import groupby, islice
def first_unique(data, upper):
return islice((key for (key, _) in groupby(data)), 0, upper)
a = [1, 2, 2, 3, 3, 4, 5, 6]
print(list(first_unique(a, 5)))
Updated to use islice
instead of enumerate
per @juanpa.arrivillaga. You don't even need a set
to keep track of duplicates.
You might as well useislice
– juanpa.arrivillaga
Dec 21 '18 at 17:39
Sogroupby
retains order, nice, but is it an implementation detail or a feature?
– kubanczyk
Dec 22 '18 at 9:39
@kubanczyk, yesgroupby
is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted,groupby
wouldn't work for this problem. However,groupy
can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
– cdlane
Dec 22 '18 at 22:15
add a comment |
Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby
function in itertools:
from itertools import groupby, islice
def first_unique(data, upper):
return islice((key for (key, _) in groupby(data)), 0, upper)
a = [1, 2, 2, 3, 3, 4, 5, 6]
print(list(first_unique(a, 5)))
Updated to use islice
instead of enumerate
per @juanpa.arrivillaga. You don't even need a set
to keep track of duplicates.
You might as well useislice
– juanpa.arrivillaga
Dec 21 '18 at 17:39
Sogroupby
retains order, nice, but is it an implementation detail or a feature?
– kubanczyk
Dec 22 '18 at 9:39
@kubanczyk, yesgroupby
is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted,groupby
wouldn't work for this problem. However,groupy
can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
– cdlane
Dec 22 '18 at 22:15
add a comment |
Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby
function in itertools:
from itertools import groupby, islice
def first_unique(data, upper):
return islice((key for (key, _) in groupby(data)), 0, upper)
a = [1, 2, 2, 3, 3, 4, 5, 6]
print(list(first_unique(a, 5)))
Updated to use islice
instead of enumerate
per @juanpa.arrivillaga. You don't even need a set
to keep track of duplicates.
Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby
function in itertools:
from itertools import groupby, islice
def first_unique(data, upper):
return islice((key for (key, _) in groupby(data)), 0, upper)
a = [1, 2, 2, 3, 3, 4, 5, 6]
print(list(first_unique(a, 5)))
Updated to use islice
instead of enumerate
per @juanpa.arrivillaga. You don't even need a set
to keep track of duplicates.
edited Dec 21 '18 at 17:49
answered Dec 21 '18 at 17:01
cdlane
17.3k21043
17.3k21043
You might as well useislice
– juanpa.arrivillaga
Dec 21 '18 at 17:39
Sogroupby
retains order, nice, but is it an implementation detail or a feature?
– kubanczyk
Dec 22 '18 at 9:39
@kubanczyk, yesgroupby
is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted,groupby
wouldn't work for this problem. However,groupy
can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
– cdlane
Dec 22 '18 at 22:15
add a comment |
You might as well useislice
– juanpa.arrivillaga
Dec 21 '18 at 17:39
Sogroupby
retains order, nice, but is it an implementation detail or a feature?
– kubanczyk
Dec 22 '18 at 9:39
@kubanczyk, yesgroupby
is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted,groupby
wouldn't work for this problem. However,groupy
can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.
– cdlane
Dec 22 '18 at 22:15
You might as well use
islice
– juanpa.arrivillaga
Dec 21 '18 at 17:39
You might as well use
islice
– juanpa.arrivillaga
Dec 21 '18 at 17:39
So
groupby
retains order, nice, but is it an implementation detail or a feature?– kubanczyk
Dec 22 '18 at 9:39
So
groupby
retains order, nice, but is it an implementation detail or a feature?– kubanczyk
Dec 22 '18 at 9:39
@kubanczyk, yes
groupby
is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted, groupby
wouldn't work for this problem. However, groupy
can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.– cdlane
Dec 22 '18 at 22:15
@kubanczyk, yes
groupby
is mostly used with sorted data, where it becomes an aggregator. If the OP's data weren't sorted, groupby
wouldn't work for this problem. However, groupy
can be used with unsorted data to solve some other problems. In that case it can be used to detect when data changes.– cdlane
Dec 22 '18 at 22:15
add a comment |
There are really amazing answers for this question, which are fast, compact and brilliant! The reason I am putting here this code is that I believe there are plenty of cases when you don't care about 1 microsecond time loose nor you want additional libraries in your code for one-time solving a simple task.
a = [1,2,2,3,3,4,5,6]
res =
for x in a:
if x not in res: # yes, not optimal, but doesnt need additional dict
res.append(x)
if len(res) == 5:
break
print(res)
2
Useset
rather thanlist
for O(1) lookup.
– jpp
Dec 21 '18 at 16:20
3
@teng ... inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:20
1
@teng similarly inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:23
1
in
is no easier to read thanin {}
, bad justification
– Chris_Rands
Dec 21 '18 at 16:25
1
@grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
– juanpa.arrivillaga
Dec 21 '18 at 16:30
|
show 7 more comments
There are really amazing answers for this question, which are fast, compact and brilliant! The reason I am putting here this code is that I believe there are plenty of cases when you don't care about 1 microsecond time loose nor you want additional libraries in your code for one-time solving a simple task.
a = [1,2,2,3,3,4,5,6]
res =
for x in a:
if x not in res: # yes, not optimal, but doesnt need additional dict
res.append(x)
if len(res) == 5:
break
print(res)
2
Useset
rather thanlist
for O(1) lookup.
– jpp
Dec 21 '18 at 16:20
3
@teng ... inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:20
1
@teng similarly inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:23
1
in
is no easier to read thanin {}
, bad justification
– Chris_Rands
Dec 21 '18 at 16:25
1
@grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
– juanpa.arrivillaga
Dec 21 '18 at 16:30
|
show 7 more comments
There are really amazing answers for this question, which are fast, compact and brilliant! The reason I am putting here this code is that I believe there are plenty of cases when you don't care about 1 microsecond time loose nor you want additional libraries in your code for one-time solving a simple task.
a = [1,2,2,3,3,4,5,6]
res =
for x in a:
if x not in res: # yes, not optimal, but doesnt need additional dict
res.append(x)
if len(res) == 5:
break
print(res)
There are really amazing answers for this question, which are fast, compact and brilliant! The reason I am putting here this code is that I believe there are plenty of cases when you don't care about 1 microsecond time loose nor you want additional libraries in your code for one-time solving a simple task.
a = [1,2,2,3,3,4,5,6]
res =
for x in a:
if x not in res: # yes, not optimal, but doesnt need additional dict
res.append(x)
if len(res) == 5:
break
print(res)
edited Dec 21 '18 at 16:47
answered Dec 21 '18 at 16:18
grapes
2,6321115
2,6321115
2
Useset
rather thanlist
for O(1) lookup.
– jpp
Dec 21 '18 at 16:20
3
@teng ... inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:20
1
@teng similarly inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:23
1
in
is no easier to read thanin {}
, bad justification
– Chris_Rands
Dec 21 '18 at 16:25
1
@grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
– juanpa.arrivillaga
Dec 21 '18 at 16:30
|
show 7 more comments
2
Useset
rather thanlist
for O(1) lookup.
– jpp
Dec 21 '18 at 16:20
3
@teng ... inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:20
1
@teng similarly inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:23
1
in
is no easier to read thanin {}
, bad justification
– Chris_Rands
Dec 21 '18 at 16:25
1
@grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
– juanpa.arrivillaga
Dec 21 '18 at 16:30
2
2
Use
set
rather than list
for O(1) lookup.– jpp
Dec 21 '18 at 16:20
Use
set
rather than list
for O(1) lookup.– jpp
Dec 21 '18 at 16:20
3
3
@teng ... inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:20
@teng ... inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:20
1
1
@teng similarly inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:23
@teng similarly inefficient.
– juanpa.arrivillaga
Dec 21 '18 at 16:23
1
1
in
is no easier to read than in {}
, bad justification– Chris_Rands
Dec 21 '18 at 16:25
in
is no easier to read than in {}
, bad justification– Chris_Rands
Dec 21 '18 at 16:25
1
1
@grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
– juanpa.arrivillaga
Dec 21 '18 at 16:30
@grapes but this is time-inefficient. Also, who cares about the line numbers? Do you suffer from a lack of lines? Didn'tsee your response to me. Yes, I agree, this implementation would work and is at least correct. I didn't downvote, btw.
– juanpa.arrivillaga
Dec 21 '18 at 16:30
|
show 7 more comments
Given
import itertools as it
a = [1, 2, 2, 3, 3, 4, 5, 6]
Code
A simple list comprehension (similar to @cdlane's answer).
[k for k, _ in it.groupby(a)][:5]
# [1, 2, 3, 4, 5]
Alternatively, in Python 3.6+:
list(dict.fromkeys(a))[:5]
# [1, 2, 3, 4, 5]
add a comment |
Given
import itertools as it
a = [1, 2, 2, 3, 3, 4, 5, 6]
Code
A simple list comprehension (similar to @cdlane's answer).
[k for k, _ in it.groupby(a)][:5]
# [1, 2, 3, 4, 5]
Alternatively, in Python 3.6+:
list(dict.fromkeys(a))[:5]
# [1, 2, 3, 4, 5]
add a comment |
Given
import itertools as it
a = [1, 2, 2, 3, 3, 4, 5, 6]
Code
A simple list comprehension (similar to @cdlane's answer).
[k for k, _ in it.groupby(a)][:5]
# [1, 2, 3, 4, 5]
Alternatively, in Python 3.6+:
list(dict.fromkeys(a))[:5]
# [1, 2, 3, 4, 5]
Given
import itertools as it
a = [1, 2, 2, 3, 3, 4, 5, 6]
Code
A simple list comprehension (similar to @cdlane's answer).
[k for k, _ in it.groupby(a)][:5]
# [1, 2, 3, 4, 5]
Alternatively, in Python 3.6+:
list(dict.fromkeys(a))[:5]
# [1, 2, 3, 4, 5]
edited Dec 22 '18 at 8:59
answered Dec 21 '18 at 23:32
pylang
13.1k23752
13.1k23752
add a comment |
add a comment |
Why not use something like this?
>>> a = [1, 2, 2, 3, 3, 4, 5, 6]
>>> list(set(a))[:5]
[1, 2, 3, 4, 5]
If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
– pylang
Dec 25 '18 at 2:17
This is wrong as it may or may not return the first five unique elements.
– Baum mit Augen
yesterday
add a comment |
Why not use something like this?
>>> a = [1, 2, 2, 3, 3, 4, 5, 6]
>>> list(set(a))[:5]
[1, 2, 3, 4, 5]
If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
– pylang
Dec 25 '18 at 2:17
This is wrong as it may or may not return the first five unique elements.
– Baum mit Augen
yesterday
add a comment |
Why not use something like this?
>>> a = [1, 2, 2, 3, 3, 4, 5, 6]
>>> list(set(a))[:5]
[1, 2, 3, 4, 5]
Why not use something like this?
>>> a = [1, 2, 2, 3, 3, 4, 5, 6]
>>> list(set(a))[:5]
[1, 2, 3, 4, 5]
answered Dec 22 '18 at 12:01
Александр Трубилин
191
191
If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
– pylang
Dec 25 '18 at 2:17
This is wrong as it may or may not return the first five unique elements.
– Baum mit Augen
yesterday
add a comment |
If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
– pylang
Dec 25 '18 at 2:17
This is wrong as it may or may not return the first five unique elements.
– Baum mit Augen
yesterday
If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
– pylang
Dec 25 '18 at 2:17
If order is not a strict requirement, then this works. Keep in mind, sets are unordered.
– pylang
Dec 25 '18 at 2:17
This is wrong as it may or may not return the first five unique elements.
– Baum mit Augen
yesterday
This is wrong as it may or may not return the first five unique elements.
– Baum mit Augen
yesterday
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53887803%2fgetting-first-n-unique-elements-from-python-list%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
5
Try:
set(a)[:n]
– Tony Pellerin
Dec 21 '18 at 16:11
9
@TonyPellerin does not guarantee you get the first 5 elements
– juanpa.arrivillaga
Dec 21 '18 at 16:14
2
Your code is Pythonic enough, it is just inefficient.
element not in itr[:index]
is not efficient, use a set– juanpa.arrivillaga
Dec 21 '18 at 16:16
2
Is the list always sorted?
– user8408080
Dec 21 '18 at 16:16
5
for the future: if your code works and you need to improve it, it is better to post it on codereview.stackexchange.com
– Azat Ibrakov
Dec 21 '18 at 16:49