Custom implementation of the linq Zip operator for different length lists
up vote
6
down vote
favorite
Based on my answer I have my implementation of linq Zip operator which operates on different length lists, and loops shortest list.
My implementation:
using System;
using System.Collections.Generic;
using System.Linq;
namespace SO
{
internal class Program
{
public static void Main(string args)
{
List<String> listA = new List<string> {"a", "b", "c", "d", "e", "f", "g"};
List<String> listB = new List<string> {"1", "2", "3"};
var mix = listA.ZipNew(listB, (l1, l2) => new {l1, l2}).SelectMany(x => x);
foreach (var m in mix)
{
Console.WriteLine(m);
}
}
}
public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;
while(true)
{
i1 = iterator1.MoveNext();
i2 = iterator2.MoveNext();
if (!i1 && (i1Shorter || firstRun))
{
iterator1.Reset();
i1 = iterator1.MoveNext();
i1Shorter = true;
firstRun = false;
}
if (!i2 && (i2Shorter || firstRun))
{
iterator2.Reset();
i2 = iterator2.MoveNext();
i2Shorter = true;
firstRun = false;
}
if (!(i1 && i2))
{
break;
}
yield return resultSelector(iterator1.Current, iterator2.Current);
}
}
}
}
}
And I wonder if this implementation could be improved somehow, what could be improved, for better readability or speed.
c# linq
add a comment |
up vote
6
down vote
favorite
Based on my answer I have my implementation of linq Zip operator which operates on different length lists, and loops shortest list.
My implementation:
using System;
using System.Collections.Generic;
using System.Linq;
namespace SO
{
internal class Program
{
public static void Main(string args)
{
List<String> listA = new List<string> {"a", "b", "c", "d", "e", "f", "g"};
List<String> listB = new List<string> {"1", "2", "3"};
var mix = listA.ZipNew(listB, (l1, l2) => new {l1, l2}).SelectMany(x => x);
foreach (var m in mix)
{
Console.WriteLine(m);
}
}
}
public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;
while(true)
{
i1 = iterator1.MoveNext();
i2 = iterator2.MoveNext();
if (!i1 && (i1Shorter || firstRun))
{
iterator1.Reset();
i1 = iterator1.MoveNext();
i1Shorter = true;
firstRun = false;
}
if (!i2 && (i2Shorter || firstRun))
{
iterator2.Reset();
i2 = iterator2.MoveNext();
i2Shorter = true;
firstRun = false;
}
if (!(i1 && i2))
{
break;
}
yield return resultSelector(iterator1.Current, iterator2.Current);
}
}
}
}
}
And I wonder if this implementation could be improved somehow, what could be improved, for better readability or speed.
c# linq
2
Your question would benefit from some example usage within the question itself, so that we can see how you intend the method to be consumed, and quickly verify that it is working as intended.
– VisualMelon
Nov 21 at 14:33
@VisualMelon improved, full working code
– BWA
Nov 21 at 14:35
4
NeverReset
an enumerator. Just get a new enumerator.Reset
was a misfeature intended for COM interop scenarios.
– Eric Lippert
Nov 21 at 19:45
add a comment |
up vote
6
down vote
favorite
up vote
6
down vote
favorite
Based on my answer I have my implementation of linq Zip operator which operates on different length lists, and loops shortest list.
My implementation:
using System;
using System.Collections.Generic;
using System.Linq;
namespace SO
{
internal class Program
{
public static void Main(string args)
{
List<String> listA = new List<string> {"a", "b", "c", "d", "e", "f", "g"};
List<String> listB = new List<string> {"1", "2", "3"};
var mix = listA.ZipNew(listB, (l1, l2) => new {l1, l2}).SelectMany(x => x);
foreach (var m in mix)
{
Console.WriteLine(m);
}
}
}
public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;
while(true)
{
i1 = iterator1.MoveNext();
i2 = iterator2.MoveNext();
if (!i1 && (i1Shorter || firstRun))
{
iterator1.Reset();
i1 = iterator1.MoveNext();
i1Shorter = true;
firstRun = false;
}
if (!i2 && (i2Shorter || firstRun))
{
iterator2.Reset();
i2 = iterator2.MoveNext();
i2Shorter = true;
firstRun = false;
}
if (!(i1 && i2))
{
break;
}
yield return resultSelector(iterator1.Current, iterator2.Current);
}
}
}
}
}
And I wonder if this implementation could be improved somehow, what could be improved, for better readability or speed.
c# linq
Based on my answer I have my implementation of linq Zip operator which operates on different length lists, and loops shortest list.
My implementation:
using System;
using System.Collections.Generic;
using System.Linq;
namespace SO
{
internal class Program
{
public static void Main(string args)
{
List<String> listA = new List<string> {"a", "b", "c", "d", "e", "f", "g"};
List<String> listB = new List<string> {"1", "2", "3"};
var mix = listA.ZipNew(listB, (l1, l2) => new {l1, l2}).SelectMany(x => x);
foreach (var m in mix)
{
Console.WriteLine(m);
}
}
}
public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;
while(true)
{
i1 = iterator1.MoveNext();
i2 = iterator2.MoveNext();
if (!i1 && (i1Shorter || firstRun))
{
iterator1.Reset();
i1 = iterator1.MoveNext();
i1Shorter = true;
firstRun = false;
}
if (!i2 && (i2Shorter || firstRun))
{
iterator2.Reset();
i2 = iterator2.MoveNext();
i2Shorter = true;
firstRun = false;
}
if (!(i1 && i2))
{
break;
}
yield return resultSelector(iterator1.Current, iterator2.Current);
}
}
}
}
}
And I wonder if this implementation could be improved somehow, what could be improved, for better readability or speed.
c# linq
c# linq
edited Nov 21 at 14:34
asked Nov 21 at 14:24
BWA
16627
16627
2
Your question would benefit from some example usage within the question itself, so that we can see how you intend the method to be consumed, and quickly verify that it is working as intended.
– VisualMelon
Nov 21 at 14:33
@VisualMelon improved, full working code
– BWA
Nov 21 at 14:35
4
NeverReset
an enumerator. Just get a new enumerator.Reset
was a misfeature intended for COM interop scenarios.
– Eric Lippert
Nov 21 at 19:45
add a comment |
2
Your question would benefit from some example usage within the question itself, so that we can see how you intend the method to be consumed, and quickly verify that it is working as intended.
– VisualMelon
Nov 21 at 14:33
@VisualMelon improved, full working code
– BWA
Nov 21 at 14:35
4
NeverReset
an enumerator. Just get a new enumerator.Reset
was a misfeature intended for COM interop scenarios.
– Eric Lippert
Nov 21 at 19:45
2
2
Your question would benefit from some example usage within the question itself, so that we can see how you intend the method to be consumed, and quickly verify that it is working as intended.
– VisualMelon
Nov 21 at 14:33
Your question would benefit from some example usage within the question itself, so that we can see how you intend the method to be consumed, and quickly verify that it is working as intended.
– VisualMelon
Nov 21 at 14:33
@VisualMelon improved, full working code
– BWA
Nov 21 at 14:35
@VisualMelon improved, full working code
– BWA
Nov 21 at 14:35
4
4
Never
Reset
an enumerator. Just get a new enumerator. Reset
was a misfeature intended for COM interop scenarios.– Eric Lippert
Nov 21 at 19:45
Never
Reset
an enumerator. Just get a new enumerator. Reset
was a misfeature intended for COM interop scenarios.– Eric Lippert
Nov 21 at 19:45
add a comment |
3 Answers
3
active
oldest
votes
up vote
8
down vote
I noticed a few things that can be improved:
- Not all enumerators support
Reset
. Generator methods don't, for example, so callingZipNew
on the result of aZipNew
call will fail with aNotSupportedException
. Obtaining a new enumerator should work, at the cost of having to replace the convenientusing
statements withtry/finally
constructions. Edit: As Eric pointed out,Reset
should not be used at all. It's been abandoned. - There's no need to call
Reset
(or rather, to get a new enumerator) when a collection is empty. I'd probably add a special case for that. - Passing
null
causes either an unspecificNullReferenceException
or anArgumentNullException
with parameter namesource
to be thrown. ThrowingArgumentNullException
s with accurate parameter names would be more helpful. Edit: As JAD pointed out, this is trickier than it looks. You'll have to split the method into an eager non-yielding method and a lazy yielding method. A local function should be useful here.
i1
andi2
can be declared inside the while loop.
Addendum:
As Henrik's answer shows, a helper class can be useful for properly repeating enumerators without having to give up on using
. I would take a slightly different approach by creating a repeatable enumerator class:
class RepeatableEnumerator<T> : IDisposable
{
private IEnumerable<T> _enumerable;
private IEnumerator<T> _enumerator;
public bool IsRepeating { get; private set; }
public T Current => _enumerator.Current;
public RepeatableEnumerator(IEnumerable<T> enumerable)
{
_enumerable = enumerable;
_enumerator = enumerable.GetEnumerator();
}
public void Dispose()
{
_enumerator.Dispose();
_enumerator = null;
}
public bool MoveNext() => _enumerator.MoveNext();
public bool Repeat()
{
IsRepeating = true;
_enumerator.Dispose();
_enumerator = _enumerable.GetEnumerator();
return _enumerator.MoveNext();
}
}
Which can then be used for both enumerables (and possibly in other extension methods as well):
public static IEnumerable<TResult> ZipLongest<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
// Eager parameter validation:
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
// Local function for lazy zipping:
IEnumerable<TResult> ZipLongestImpl()
{
using (var enum1 = new RepeatableEnumerator<TFirst>(first))
using (var enum2 = new RepeatableEnumerator<TSecond>(second))
{
// Up-front check for empty collections:
if (!enum1.MoveNext() || !enum2.MoveNext())
yield break;
while (true)
{
yield return resultSelector(enum1.Current, enum2.Current);
var is1Empty = !enum1.MoveNext();
var is2Empty = !enum2.MoveNext();
if (is1Empty)
{
if (enum2.IsRepeating || is2Empty || !enum1.Repeat())
yield break;
}
else if (is2Empty)
{
if (enum1.IsRepeating || !enum2.Repeat())
yield break;
}
}
}
}
return ZipLongestImpl();
}
At this point it would be a good idea to add some documentation...
1
Worth adding that if you're going to throwArgumentNullExceptions
, make sure they're thrown eagerly, not only when iteration has started.
– JAD
2 days ago
That's a good point! Much trickier than you'd expect it to be... looks like a good place to use a local function.
– Pieter Witvoet
2 days ago
Ah, so that's what you meant with a local function. If you look at whatSystem.Linq
does, they put the lazy part in a separate private method. But I don't think there's much difference between those two.
– JAD
yesterday
Also, maybe it's worth adding a repetition counter to the repeater class. I can imagine some instances where it's worth knowing howmany time it has looped.
– JAD
yesterday
Local functions get turned into private static methods, so it's basically the same thing. I think they would've been used in Linq if they had been available at that time. I'll update the example to use capturing though - contrary to what I expected, in this case it's actually slightly faster, and it simplifies the code a little.
– Pieter Witvoet
yesterday
|
show 1 more comment
up vote
7
down vote
public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
Names? The class would be more descriptive as something like LinqExtensions
; the method something like ZipLooped
.
using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;
The iterators have useful names, but what does i1
mean? And why five variables to track the state of two iterators? IMO it would be simpler as
var firstEnded = false;
var secondEnded = false;
while (true)
{
if (!iterator1.MoveNext())
{
if (secondEnded) yield break;
firstEnded = true;
iterator1.Reset();
if (!iterator1.MoveNext()) yield break;
}
if (!iterator2.MoveNext())
{
if (firstEnded) yield break;
secondEnded = true;
iterator2.Reset();
if (!iterator2.MoveNext()) yield break;
}
yield return resultSelector(iterator1.Current, iterator2.Current);
}
and the almost repeated code might be worth pulling out as an inner method:
var firstEnded = false;
var secondEnded = false;
bool advance<T>(IEnumerator<T> it, ref bool thisEnded, bool otherEnded)
{
if (it.MoveNext()) return true;
// `it` has done a full cycle; if the other one has too, we've finished
if (otherEnded) return false;
thisEnded = true;
// Start again, although if `it` is empty we need to abort
it.Reset();
return it.MoveNext();
}
while (true)
{
if (!advance(iterator1, ref firstEnded, secondEnded)) yield break;
if (!advance(iterator2, ref secondEnded, firstEnded)) yield break;
yield return resultSelector(iterator1.Current, iterator2.Current);
}
I notice that you've decided to yield break
if either of the enumerables is empty. Would an exception be a better choice?
I'd renameadvance
inTryMoveNextOrLoop
– t3chb0t
Nov 21 at 16:44
I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-)
– t3chb0t
Nov 21 at 16:51
@t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (yield break
) if both are empty.
– VisualMelon
Nov 21 at 17:10
@VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare thisnew { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump();
The result is an empty collection.
– t3chb0t
Nov 21 at 17:13
@t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as perAverage
). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P
– VisualMelon
Nov 21 at 17:19
|
show 7 more comments
up vote
5
down vote
If you should respect that not all Enumerators
implement Reset()
then it is not possible to use using
statements for the two IEnumerators
. But you could introduce an IEnumerator<TResult>
for the zipped result and use it like this:
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
return InnerZipNew(first, second, resultSelector);
}
private static IEnumerable<TResult> InnerZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (ZipEnumerator<TFirst, TSecond, TResult> zipEnumerator = new ZipEnumerator<TFirst, TSecond, TResult>(first, second, resultSelector))
{
while (zipEnumerator.MoveNext())
{
yield return zipEnumerator.Current;
}
}
}
As JAD writes in his comment it is necessary to catch possible invalid input as the first thing and then call a private shadow method to do the actual iteration in order to make the exceptions be thrown when the extension is called rather than when the enumeration is performed.
In this way you're back on the using track.
The ZipEnumerator
it self could be something like:
public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumerT;
IEnumerator<S> m_enumerS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_first = true;
public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));
}
public TResult Current => m_selector(m_enumerT.Current, m_enumerS.Current);
object IEnumerator.Current => Current;
public void Dispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}
private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}
private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}
public bool MoveNext()
{
m_enumerT = m_enumerT ?? GetTEnumerator();
m_enumerS = m_enumerS ?? GetSEnumerator();
if (m_first)
{
if (m_enumerT.MoveNext())
{
if (!m_enumerS.MoveNext())
{
m_enumerS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumerS.MoveNext())
return false;
}
return true;
}
else
{
m_first = false;
}
}
if (!m_first && !m_secondReloaded)
{
if (m_enumerS.MoveNext())
{
if (!m_enumerT.MoveNext())
{
m_enumerT = GetTEnumerator();
if (!m_enumerT.MoveNext())
return false;
}
return true;
}
}
return false;
}
public void Reset()
{
m_secondReloaded = false;
m_first = true;
m_enumerT = null;
m_enumerS = null;
Dispose();
}
}
It's a little more code than other suggestions, but it encapsulates the problems with the disposal of intermediate enumerators without the necessity of a try-catch-statement. You could discuss if the disposal should be immediately when the enumerator is done or as I do collect them for disposal when the ZipEnumerator
itself is disposed off?
The MoveNext()
method went a little more complicated than I like, so feel free to edit or suggest improvements.
Edit
A refactored version of ZipEnumerator
:
public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumeratorT;
IEnumerator<S> m_enumeratorS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_isInitilized = false;
public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));
}
public TResult Current => m_selector(m_enumeratorT.Current, m_enumeratorS.Current);
object IEnumerator.Current => Current;
public void Dispose()
{
DoDispose();
}
private void RegisterDisposable(IDisposable disposable)
{
m_disposables.Add(disposable);
if (m_disposables.Count > 10)
{
DoDispose();
}
}
private void DoDispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}
private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}
private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}
private Func<bool> CurrentMover = null;
private bool FirstMover()
{
if (m_enumeratorT.MoveNext())
{
if (!m_enumeratorS.MoveNext())
{
m_enumeratorS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumeratorS.MoveNext())
return false;
}
return true;
}
else if (!m_secondReloaded)
{
CurrentMover = SecondMover;
return CurrentMover();
}
return false;
}
private bool SecondMover()
{
if (m_enumeratorS.MoveNext())
{
if (!m_enumeratorT.MoveNext())
{
m_enumeratorT = GetTEnumerator();
if (!m_enumeratorT.MoveNext())
return false;
}
return true;
}
return false;
}
private void Initialize()
{
m_enumeratorT = GetTEnumerator();
m_enumeratorS = GetSEnumerator();
CurrentMover = FirstMover;
m_isInitilized = true;
}
public bool MoveNext()
{
if (!m_isInitilized)
{
Initialize();
}
return CurrentMover();
}
public void Reset()
{
m_isInitilized = false;
m_secondReloaded = false;
CurrentMover = null;
m_enumeratorT = null;
m_enumeratorS = null;
DoDispose();
}
}
I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-)
– t3chb0t
Nov 21 at 19:22
I think the word unconventional describes them pretty good... althoughts
andss
were below that level ;-]
– t3chb0t
Nov 21 at 19:34
I can explain that ;-P It's local and in a very small scope, this is allowed in my world,tt
andss
on the other hand werepublic
.
– t3chb0t
Nov 21 at 19:56
You might want to refactor this so that the null-checks are eagerly performed instead of when iteration start. Doing so would be in line with other Linq implementations. Calling any Linq method on a null enumerable will throw immediately, not when iterating starts. To do this, have the public extension methods check for nulls, then from that call a private method that's using theyield return
lazy evaluation. Example.
– JAD
2 days ago
1
@HenrikHansen I saw that. The problem is that if a method is using theyield
method of returning aIEnumerable
, the entire method is treated as lazy. So nothing in the method starts executing before the first element in the result sequence is accessed. This means that only at that point theZipEnumerator
is constructed, and only then nullchecks are performed. Take a look at this blog by Jon Skeet for another (probably better) explanation.
– JAD
2 days ago
|
show 2 more comments
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
8
down vote
I noticed a few things that can be improved:
- Not all enumerators support
Reset
. Generator methods don't, for example, so callingZipNew
on the result of aZipNew
call will fail with aNotSupportedException
. Obtaining a new enumerator should work, at the cost of having to replace the convenientusing
statements withtry/finally
constructions. Edit: As Eric pointed out,Reset
should not be used at all. It's been abandoned. - There's no need to call
Reset
(or rather, to get a new enumerator) when a collection is empty. I'd probably add a special case for that. - Passing
null
causes either an unspecificNullReferenceException
or anArgumentNullException
with parameter namesource
to be thrown. ThrowingArgumentNullException
s with accurate parameter names would be more helpful. Edit: As JAD pointed out, this is trickier than it looks. You'll have to split the method into an eager non-yielding method and a lazy yielding method. A local function should be useful here.
i1
andi2
can be declared inside the while loop.
Addendum:
As Henrik's answer shows, a helper class can be useful for properly repeating enumerators without having to give up on using
. I would take a slightly different approach by creating a repeatable enumerator class:
class RepeatableEnumerator<T> : IDisposable
{
private IEnumerable<T> _enumerable;
private IEnumerator<T> _enumerator;
public bool IsRepeating { get; private set; }
public T Current => _enumerator.Current;
public RepeatableEnumerator(IEnumerable<T> enumerable)
{
_enumerable = enumerable;
_enumerator = enumerable.GetEnumerator();
}
public void Dispose()
{
_enumerator.Dispose();
_enumerator = null;
}
public bool MoveNext() => _enumerator.MoveNext();
public bool Repeat()
{
IsRepeating = true;
_enumerator.Dispose();
_enumerator = _enumerable.GetEnumerator();
return _enumerator.MoveNext();
}
}
Which can then be used for both enumerables (and possibly in other extension methods as well):
public static IEnumerable<TResult> ZipLongest<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
// Eager parameter validation:
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
// Local function for lazy zipping:
IEnumerable<TResult> ZipLongestImpl()
{
using (var enum1 = new RepeatableEnumerator<TFirst>(first))
using (var enum2 = new RepeatableEnumerator<TSecond>(second))
{
// Up-front check for empty collections:
if (!enum1.MoveNext() || !enum2.MoveNext())
yield break;
while (true)
{
yield return resultSelector(enum1.Current, enum2.Current);
var is1Empty = !enum1.MoveNext();
var is2Empty = !enum2.MoveNext();
if (is1Empty)
{
if (enum2.IsRepeating || is2Empty || !enum1.Repeat())
yield break;
}
else if (is2Empty)
{
if (enum1.IsRepeating || !enum2.Repeat())
yield break;
}
}
}
}
return ZipLongestImpl();
}
At this point it would be a good idea to add some documentation...
1
Worth adding that if you're going to throwArgumentNullExceptions
, make sure they're thrown eagerly, not only when iteration has started.
– JAD
2 days ago
That's a good point! Much trickier than you'd expect it to be... looks like a good place to use a local function.
– Pieter Witvoet
2 days ago
Ah, so that's what you meant with a local function. If you look at whatSystem.Linq
does, they put the lazy part in a separate private method. But I don't think there's much difference between those two.
– JAD
yesterday
Also, maybe it's worth adding a repetition counter to the repeater class. I can imagine some instances where it's worth knowing howmany time it has looped.
– JAD
yesterday
Local functions get turned into private static methods, so it's basically the same thing. I think they would've been used in Linq if they had been available at that time. I'll update the example to use capturing though - contrary to what I expected, in this case it's actually slightly faster, and it simplifies the code a little.
– Pieter Witvoet
yesterday
|
show 1 more comment
up vote
8
down vote
I noticed a few things that can be improved:
- Not all enumerators support
Reset
. Generator methods don't, for example, so callingZipNew
on the result of aZipNew
call will fail with aNotSupportedException
. Obtaining a new enumerator should work, at the cost of having to replace the convenientusing
statements withtry/finally
constructions. Edit: As Eric pointed out,Reset
should not be used at all. It's been abandoned. - There's no need to call
Reset
(or rather, to get a new enumerator) when a collection is empty. I'd probably add a special case for that. - Passing
null
causes either an unspecificNullReferenceException
or anArgumentNullException
with parameter namesource
to be thrown. ThrowingArgumentNullException
s with accurate parameter names would be more helpful. Edit: As JAD pointed out, this is trickier than it looks. You'll have to split the method into an eager non-yielding method and a lazy yielding method. A local function should be useful here.
i1
andi2
can be declared inside the while loop.
Addendum:
As Henrik's answer shows, a helper class can be useful for properly repeating enumerators without having to give up on using
. I would take a slightly different approach by creating a repeatable enumerator class:
class RepeatableEnumerator<T> : IDisposable
{
private IEnumerable<T> _enumerable;
private IEnumerator<T> _enumerator;
public bool IsRepeating { get; private set; }
public T Current => _enumerator.Current;
public RepeatableEnumerator(IEnumerable<T> enumerable)
{
_enumerable = enumerable;
_enumerator = enumerable.GetEnumerator();
}
public void Dispose()
{
_enumerator.Dispose();
_enumerator = null;
}
public bool MoveNext() => _enumerator.MoveNext();
public bool Repeat()
{
IsRepeating = true;
_enumerator.Dispose();
_enumerator = _enumerable.GetEnumerator();
return _enumerator.MoveNext();
}
}
Which can then be used for both enumerables (and possibly in other extension methods as well):
public static IEnumerable<TResult> ZipLongest<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
// Eager parameter validation:
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
// Local function for lazy zipping:
IEnumerable<TResult> ZipLongestImpl()
{
using (var enum1 = new RepeatableEnumerator<TFirst>(first))
using (var enum2 = new RepeatableEnumerator<TSecond>(second))
{
// Up-front check for empty collections:
if (!enum1.MoveNext() || !enum2.MoveNext())
yield break;
while (true)
{
yield return resultSelector(enum1.Current, enum2.Current);
var is1Empty = !enum1.MoveNext();
var is2Empty = !enum2.MoveNext();
if (is1Empty)
{
if (enum2.IsRepeating || is2Empty || !enum1.Repeat())
yield break;
}
else if (is2Empty)
{
if (enum1.IsRepeating || !enum2.Repeat())
yield break;
}
}
}
}
return ZipLongestImpl();
}
At this point it would be a good idea to add some documentation...
1
Worth adding that if you're going to throwArgumentNullExceptions
, make sure they're thrown eagerly, not only when iteration has started.
– JAD
2 days ago
That's a good point! Much trickier than you'd expect it to be... looks like a good place to use a local function.
– Pieter Witvoet
2 days ago
Ah, so that's what you meant with a local function. If you look at whatSystem.Linq
does, they put the lazy part in a separate private method. But I don't think there's much difference between those two.
– JAD
yesterday
Also, maybe it's worth adding a repetition counter to the repeater class. I can imagine some instances where it's worth knowing howmany time it has looped.
– JAD
yesterday
Local functions get turned into private static methods, so it's basically the same thing. I think they would've been used in Linq if they had been available at that time. I'll update the example to use capturing though - contrary to what I expected, in this case it's actually slightly faster, and it simplifies the code a little.
– Pieter Witvoet
yesterday
|
show 1 more comment
up vote
8
down vote
up vote
8
down vote
I noticed a few things that can be improved:
- Not all enumerators support
Reset
. Generator methods don't, for example, so callingZipNew
on the result of aZipNew
call will fail with aNotSupportedException
. Obtaining a new enumerator should work, at the cost of having to replace the convenientusing
statements withtry/finally
constructions. Edit: As Eric pointed out,Reset
should not be used at all. It's been abandoned. - There's no need to call
Reset
(or rather, to get a new enumerator) when a collection is empty. I'd probably add a special case for that. - Passing
null
causes either an unspecificNullReferenceException
or anArgumentNullException
with parameter namesource
to be thrown. ThrowingArgumentNullException
s with accurate parameter names would be more helpful. Edit: As JAD pointed out, this is trickier than it looks. You'll have to split the method into an eager non-yielding method and a lazy yielding method. A local function should be useful here.
i1
andi2
can be declared inside the while loop.
Addendum:
As Henrik's answer shows, a helper class can be useful for properly repeating enumerators without having to give up on using
. I would take a slightly different approach by creating a repeatable enumerator class:
class RepeatableEnumerator<T> : IDisposable
{
private IEnumerable<T> _enumerable;
private IEnumerator<T> _enumerator;
public bool IsRepeating { get; private set; }
public T Current => _enumerator.Current;
public RepeatableEnumerator(IEnumerable<T> enumerable)
{
_enumerable = enumerable;
_enumerator = enumerable.GetEnumerator();
}
public void Dispose()
{
_enumerator.Dispose();
_enumerator = null;
}
public bool MoveNext() => _enumerator.MoveNext();
public bool Repeat()
{
IsRepeating = true;
_enumerator.Dispose();
_enumerator = _enumerable.GetEnumerator();
return _enumerator.MoveNext();
}
}
Which can then be used for both enumerables (and possibly in other extension methods as well):
public static IEnumerable<TResult> ZipLongest<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
// Eager parameter validation:
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
// Local function for lazy zipping:
IEnumerable<TResult> ZipLongestImpl()
{
using (var enum1 = new RepeatableEnumerator<TFirst>(first))
using (var enum2 = new RepeatableEnumerator<TSecond>(second))
{
// Up-front check for empty collections:
if (!enum1.MoveNext() || !enum2.MoveNext())
yield break;
while (true)
{
yield return resultSelector(enum1.Current, enum2.Current);
var is1Empty = !enum1.MoveNext();
var is2Empty = !enum2.MoveNext();
if (is1Empty)
{
if (enum2.IsRepeating || is2Empty || !enum1.Repeat())
yield break;
}
else if (is2Empty)
{
if (enum1.IsRepeating || !enum2.Repeat())
yield break;
}
}
}
}
return ZipLongestImpl();
}
At this point it would be a good idea to add some documentation...
I noticed a few things that can be improved:
- Not all enumerators support
Reset
. Generator methods don't, for example, so callingZipNew
on the result of aZipNew
call will fail with aNotSupportedException
. Obtaining a new enumerator should work, at the cost of having to replace the convenientusing
statements withtry/finally
constructions. Edit: As Eric pointed out,Reset
should not be used at all. It's been abandoned. - There's no need to call
Reset
(or rather, to get a new enumerator) when a collection is empty. I'd probably add a special case for that. - Passing
null
causes either an unspecificNullReferenceException
or anArgumentNullException
with parameter namesource
to be thrown. ThrowingArgumentNullException
s with accurate parameter names would be more helpful. Edit: As JAD pointed out, this is trickier than it looks. You'll have to split the method into an eager non-yielding method and a lazy yielding method. A local function should be useful here.
i1
andi2
can be declared inside the while loop.
Addendum:
As Henrik's answer shows, a helper class can be useful for properly repeating enumerators without having to give up on using
. I would take a slightly different approach by creating a repeatable enumerator class:
class RepeatableEnumerator<T> : IDisposable
{
private IEnumerable<T> _enumerable;
private IEnumerator<T> _enumerator;
public bool IsRepeating { get; private set; }
public T Current => _enumerator.Current;
public RepeatableEnumerator(IEnumerable<T> enumerable)
{
_enumerable = enumerable;
_enumerator = enumerable.GetEnumerator();
}
public void Dispose()
{
_enumerator.Dispose();
_enumerator = null;
}
public bool MoveNext() => _enumerator.MoveNext();
public bool Repeat()
{
IsRepeating = true;
_enumerator.Dispose();
_enumerator = _enumerable.GetEnumerator();
return _enumerator.MoveNext();
}
}
Which can then be used for both enumerables (and possibly in other extension methods as well):
public static IEnumerable<TResult> ZipLongest<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
// Eager parameter validation:
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
// Local function for lazy zipping:
IEnumerable<TResult> ZipLongestImpl()
{
using (var enum1 = new RepeatableEnumerator<TFirst>(first))
using (var enum2 = new RepeatableEnumerator<TSecond>(second))
{
// Up-front check for empty collections:
if (!enum1.MoveNext() || !enum2.MoveNext())
yield break;
while (true)
{
yield return resultSelector(enum1.Current, enum2.Current);
var is1Empty = !enum1.MoveNext();
var is2Empty = !enum2.MoveNext();
if (is1Empty)
{
if (enum2.IsRepeating || is2Empty || !enum1.Repeat())
yield break;
}
else if (is2Empty)
{
if (enum1.IsRepeating || !enum2.Repeat())
yield break;
}
}
}
}
return ZipLongestImpl();
}
At this point it would be a good idea to add some documentation...
edited yesterday
answered Nov 21 at 15:12
Pieter Witvoet
4,971723
4,971723
1
Worth adding that if you're going to throwArgumentNullExceptions
, make sure they're thrown eagerly, not only when iteration has started.
– JAD
2 days ago
That's a good point! Much trickier than you'd expect it to be... looks like a good place to use a local function.
– Pieter Witvoet
2 days ago
Ah, so that's what you meant with a local function. If you look at whatSystem.Linq
does, they put the lazy part in a separate private method. But I don't think there's much difference between those two.
– JAD
yesterday
Also, maybe it's worth adding a repetition counter to the repeater class. I can imagine some instances where it's worth knowing howmany time it has looped.
– JAD
yesterday
Local functions get turned into private static methods, so it's basically the same thing. I think they would've been used in Linq if they had been available at that time. I'll update the example to use capturing though - contrary to what I expected, in this case it's actually slightly faster, and it simplifies the code a little.
– Pieter Witvoet
yesterday
|
show 1 more comment
1
Worth adding that if you're going to throwArgumentNullExceptions
, make sure they're thrown eagerly, not only when iteration has started.
– JAD
2 days ago
That's a good point! Much trickier than you'd expect it to be... looks like a good place to use a local function.
– Pieter Witvoet
2 days ago
Ah, so that's what you meant with a local function. If you look at whatSystem.Linq
does, they put the lazy part in a separate private method. But I don't think there's much difference between those two.
– JAD
yesterday
Also, maybe it's worth adding a repetition counter to the repeater class. I can imagine some instances where it's worth knowing howmany time it has looped.
– JAD
yesterday
Local functions get turned into private static methods, so it's basically the same thing. I think they would've been used in Linq if they had been available at that time. I'll update the example to use capturing though - contrary to what I expected, in this case it's actually slightly faster, and it simplifies the code a little.
– Pieter Witvoet
yesterday
1
1
Worth adding that if you're going to throw
ArgumentNullExceptions
, make sure they're thrown eagerly, not only when iteration has started.– JAD
2 days ago
Worth adding that if you're going to throw
ArgumentNullExceptions
, make sure they're thrown eagerly, not only when iteration has started.– JAD
2 days ago
That's a good point! Much trickier than you'd expect it to be... looks like a good place to use a local function.
– Pieter Witvoet
2 days ago
That's a good point! Much trickier than you'd expect it to be... looks like a good place to use a local function.
– Pieter Witvoet
2 days ago
Ah, so that's what you meant with a local function. If you look at what
System.Linq
does, they put the lazy part in a separate private method. But I don't think there's much difference between those two.– JAD
yesterday
Ah, so that's what you meant with a local function. If you look at what
System.Linq
does, they put the lazy part in a separate private method. But I don't think there's much difference between those two.– JAD
yesterday
Also, maybe it's worth adding a repetition counter to the repeater class. I can imagine some instances where it's worth knowing howmany time it has looped.
– JAD
yesterday
Also, maybe it's worth adding a repetition counter to the repeater class. I can imagine some instances where it's worth knowing howmany time it has looped.
– JAD
yesterday
Local functions get turned into private static methods, so it's basically the same thing. I think they would've been used in Linq if they had been available at that time. I'll update the example to use capturing though - contrary to what I expected, in this case it's actually slightly faster, and it simplifies the code a little.
– Pieter Witvoet
yesterday
Local functions get turned into private static methods, so it's basically the same thing. I think they would've been used in Linq if they had been available at that time. I'll update the example to use capturing though - contrary to what I expected, in this case it's actually slightly faster, and it simplifies the code a little.
– Pieter Witvoet
yesterday
|
show 1 more comment
up vote
7
down vote
public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
Names? The class would be more descriptive as something like LinqExtensions
; the method something like ZipLooped
.
using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;
The iterators have useful names, but what does i1
mean? And why five variables to track the state of two iterators? IMO it would be simpler as
var firstEnded = false;
var secondEnded = false;
while (true)
{
if (!iterator1.MoveNext())
{
if (secondEnded) yield break;
firstEnded = true;
iterator1.Reset();
if (!iterator1.MoveNext()) yield break;
}
if (!iterator2.MoveNext())
{
if (firstEnded) yield break;
secondEnded = true;
iterator2.Reset();
if (!iterator2.MoveNext()) yield break;
}
yield return resultSelector(iterator1.Current, iterator2.Current);
}
and the almost repeated code might be worth pulling out as an inner method:
var firstEnded = false;
var secondEnded = false;
bool advance<T>(IEnumerator<T> it, ref bool thisEnded, bool otherEnded)
{
if (it.MoveNext()) return true;
// `it` has done a full cycle; if the other one has too, we've finished
if (otherEnded) return false;
thisEnded = true;
// Start again, although if `it` is empty we need to abort
it.Reset();
return it.MoveNext();
}
while (true)
{
if (!advance(iterator1, ref firstEnded, secondEnded)) yield break;
if (!advance(iterator2, ref secondEnded, firstEnded)) yield break;
yield return resultSelector(iterator1.Current, iterator2.Current);
}
I notice that you've decided to yield break
if either of the enumerables is empty. Would an exception be a better choice?
I'd renameadvance
inTryMoveNextOrLoop
– t3chb0t
Nov 21 at 16:44
I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-)
– t3chb0t
Nov 21 at 16:51
@t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (yield break
) if both are empty.
– VisualMelon
Nov 21 at 17:10
@VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare thisnew { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump();
The result is an empty collection.
– t3chb0t
Nov 21 at 17:13
@t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as perAverage
). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P
– VisualMelon
Nov 21 at 17:19
|
show 7 more comments
up vote
7
down vote
public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
Names? The class would be more descriptive as something like LinqExtensions
; the method something like ZipLooped
.
using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;
The iterators have useful names, but what does i1
mean? And why five variables to track the state of two iterators? IMO it would be simpler as
var firstEnded = false;
var secondEnded = false;
while (true)
{
if (!iterator1.MoveNext())
{
if (secondEnded) yield break;
firstEnded = true;
iterator1.Reset();
if (!iterator1.MoveNext()) yield break;
}
if (!iterator2.MoveNext())
{
if (firstEnded) yield break;
secondEnded = true;
iterator2.Reset();
if (!iterator2.MoveNext()) yield break;
}
yield return resultSelector(iterator1.Current, iterator2.Current);
}
and the almost repeated code might be worth pulling out as an inner method:
var firstEnded = false;
var secondEnded = false;
bool advance<T>(IEnumerator<T> it, ref bool thisEnded, bool otherEnded)
{
if (it.MoveNext()) return true;
// `it` has done a full cycle; if the other one has too, we've finished
if (otherEnded) return false;
thisEnded = true;
// Start again, although if `it` is empty we need to abort
it.Reset();
return it.MoveNext();
}
while (true)
{
if (!advance(iterator1, ref firstEnded, secondEnded)) yield break;
if (!advance(iterator2, ref secondEnded, firstEnded)) yield break;
yield return resultSelector(iterator1.Current, iterator2.Current);
}
I notice that you've decided to yield break
if either of the enumerables is empty. Would an exception be a better choice?
I'd renameadvance
inTryMoveNextOrLoop
– t3chb0t
Nov 21 at 16:44
I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-)
– t3chb0t
Nov 21 at 16:51
@t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (yield break
) if both are empty.
– VisualMelon
Nov 21 at 17:10
@VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare thisnew { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump();
The result is an empty collection.
– t3chb0t
Nov 21 at 17:13
@t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as perAverage
). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P
– VisualMelon
Nov 21 at 17:19
|
show 7 more comments
up vote
7
down vote
up vote
7
down vote
public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
Names? The class would be more descriptive as something like LinqExtensions
; the method something like ZipLooped
.
using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;
The iterators have useful names, but what does i1
mean? And why five variables to track the state of two iterators? IMO it would be simpler as
var firstEnded = false;
var secondEnded = false;
while (true)
{
if (!iterator1.MoveNext())
{
if (secondEnded) yield break;
firstEnded = true;
iterator1.Reset();
if (!iterator1.MoveNext()) yield break;
}
if (!iterator2.MoveNext())
{
if (firstEnded) yield break;
secondEnded = true;
iterator2.Reset();
if (!iterator2.MoveNext()) yield break;
}
yield return resultSelector(iterator1.Current, iterator2.Current);
}
and the almost repeated code might be worth pulling out as an inner method:
var firstEnded = false;
var secondEnded = false;
bool advance<T>(IEnumerator<T> it, ref bool thisEnded, bool otherEnded)
{
if (it.MoveNext()) return true;
// `it` has done a full cycle; if the other one has too, we've finished
if (otherEnded) return false;
thisEnded = true;
// Start again, although if `it` is empty we need to abort
it.Reset();
return it.MoveNext();
}
while (true)
{
if (!advance(iterator1, ref firstEnded, secondEnded)) yield break;
if (!advance(iterator2, ref secondEnded, firstEnded)) yield break;
yield return resultSelector(iterator1.Current, iterator2.Current);
}
I notice that you've decided to yield break
if either of the enumerables is empty. Would an exception be a better choice?
public static class Impl
{
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
Names? The class would be more descriptive as something like LinqExtensions
; the method something like ZipLooped
.
using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
{
var i1 = true;
var i2 = true;
var i1Shorter = false;
var i2Shorter = false;
var firstRun = true;
The iterators have useful names, but what does i1
mean? And why five variables to track the state of two iterators? IMO it would be simpler as
var firstEnded = false;
var secondEnded = false;
while (true)
{
if (!iterator1.MoveNext())
{
if (secondEnded) yield break;
firstEnded = true;
iterator1.Reset();
if (!iterator1.MoveNext()) yield break;
}
if (!iterator2.MoveNext())
{
if (firstEnded) yield break;
secondEnded = true;
iterator2.Reset();
if (!iterator2.MoveNext()) yield break;
}
yield return resultSelector(iterator1.Current, iterator2.Current);
}
and the almost repeated code might be worth pulling out as an inner method:
var firstEnded = false;
var secondEnded = false;
bool advance<T>(IEnumerator<T> it, ref bool thisEnded, bool otherEnded)
{
if (it.MoveNext()) return true;
// `it` has done a full cycle; if the other one has too, we've finished
if (otherEnded) return false;
thisEnded = true;
// Start again, although if `it` is empty we need to abort
it.Reset();
return it.MoveNext();
}
while (true)
{
if (!advance(iterator1, ref firstEnded, secondEnded)) yield break;
if (!advance(iterator2, ref secondEnded, firstEnded)) yield break;
yield return resultSelector(iterator1.Current, iterator2.Current);
}
I notice that you've decided to yield break
if either of the enumerables is empty. Would an exception be a better choice?
answered Nov 21 at 14:54
Peter Taylor
15.4k2657
15.4k2657
I'd renameadvance
inTryMoveNextOrLoop
– t3chb0t
Nov 21 at 16:44
I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-)
– t3chb0t
Nov 21 at 16:51
@t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (yield break
) if both are empty.
– VisualMelon
Nov 21 at 17:10
@VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare thisnew { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump();
The result is an empty collection.
– t3chb0t
Nov 21 at 17:13
@t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as perAverage
). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P
– VisualMelon
Nov 21 at 17:19
|
show 7 more comments
I'd renameadvance
inTryMoveNextOrLoop
– t3chb0t
Nov 21 at 16:44
I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-)
– t3chb0t
Nov 21 at 16:51
@t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (yield break
) if both are empty.
– VisualMelon
Nov 21 at 17:10
@VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare thisnew { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump();
The result is an empty collection.
– t3chb0t
Nov 21 at 17:13
@t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as perAverage
). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P
– VisualMelon
Nov 21 at 17:19
I'd rename
advance
in TryMoveNextOrLoop
– t3chb0t
Nov 21 at 16:44
I'd rename
advance
in TryMoveNextOrLoop
– t3chb0t
Nov 21 at 16:44
I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-)
– t3chb0t
Nov 21 at 16:51
I notice that you've decided to yield break if either of the enumerables is empty. - This is the expected behaviour. I'd be surprised if it was something else and this makes linq so reliable - nothing there so nothing happens - otherwise you would need to check everything for emptyness, not pretty ;-)
– t3chb0t
Nov 21 at 16:51
@t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (
yield break
) if both are empty.– VisualMelon
Nov 21 at 17:10
@t3chb0t only most LINQ methods don't explicitly loop over one of the inputs as it consumes the other. I would probably expect an exception here if only one of them was empty, and an empty enumerable (
yield break
) if both are empty.– VisualMelon
Nov 21 at 17:10
@VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare this
new { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump();
The result is an empty collection.– t3chb0t
Nov 21 at 17:13
@VisualMelon no way :-P this is not how it should work. No collection returning LINQ extensions throw exceptions if the source is empty. Compare this
new { 1 }.Zip(Enumerable.Empty<int>(), (x, y) => (x, y)).Dump();
The result is an empty collection.– t3chb0t
Nov 21 at 17:13
@t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as per
Average
). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P– VisualMelon
Nov 21 at 17:19
@t3chb0t indeed, but that's because it's defined as zipping as far as the shortest. I'd argue that extracting values repeatedly from an empty collection is meaningless, and so it should throw (as per
Average
). Anyhow, I can see your argument, so I think we'll have to agree to disagree :P– VisualMelon
Nov 21 at 17:19
|
show 7 more comments
up vote
5
down vote
If you should respect that not all Enumerators
implement Reset()
then it is not possible to use using
statements for the two IEnumerators
. But you could introduce an IEnumerator<TResult>
for the zipped result and use it like this:
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
return InnerZipNew(first, second, resultSelector);
}
private static IEnumerable<TResult> InnerZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (ZipEnumerator<TFirst, TSecond, TResult> zipEnumerator = new ZipEnumerator<TFirst, TSecond, TResult>(first, second, resultSelector))
{
while (zipEnumerator.MoveNext())
{
yield return zipEnumerator.Current;
}
}
}
As JAD writes in his comment it is necessary to catch possible invalid input as the first thing and then call a private shadow method to do the actual iteration in order to make the exceptions be thrown when the extension is called rather than when the enumeration is performed.
In this way you're back on the using track.
The ZipEnumerator
it self could be something like:
public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumerT;
IEnumerator<S> m_enumerS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_first = true;
public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));
}
public TResult Current => m_selector(m_enumerT.Current, m_enumerS.Current);
object IEnumerator.Current => Current;
public void Dispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}
private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}
private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}
public bool MoveNext()
{
m_enumerT = m_enumerT ?? GetTEnumerator();
m_enumerS = m_enumerS ?? GetSEnumerator();
if (m_first)
{
if (m_enumerT.MoveNext())
{
if (!m_enumerS.MoveNext())
{
m_enumerS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumerS.MoveNext())
return false;
}
return true;
}
else
{
m_first = false;
}
}
if (!m_first && !m_secondReloaded)
{
if (m_enumerS.MoveNext())
{
if (!m_enumerT.MoveNext())
{
m_enumerT = GetTEnumerator();
if (!m_enumerT.MoveNext())
return false;
}
return true;
}
}
return false;
}
public void Reset()
{
m_secondReloaded = false;
m_first = true;
m_enumerT = null;
m_enumerS = null;
Dispose();
}
}
It's a little more code than other suggestions, but it encapsulates the problems with the disposal of intermediate enumerators without the necessity of a try-catch-statement. You could discuss if the disposal should be immediately when the enumerator is done or as I do collect them for disposal when the ZipEnumerator
itself is disposed off?
The MoveNext()
method went a little more complicated than I like, so feel free to edit or suggest improvements.
Edit
A refactored version of ZipEnumerator
:
public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumeratorT;
IEnumerator<S> m_enumeratorS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_isInitilized = false;
public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));
}
public TResult Current => m_selector(m_enumeratorT.Current, m_enumeratorS.Current);
object IEnumerator.Current => Current;
public void Dispose()
{
DoDispose();
}
private void RegisterDisposable(IDisposable disposable)
{
m_disposables.Add(disposable);
if (m_disposables.Count > 10)
{
DoDispose();
}
}
private void DoDispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}
private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}
private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}
private Func<bool> CurrentMover = null;
private bool FirstMover()
{
if (m_enumeratorT.MoveNext())
{
if (!m_enumeratorS.MoveNext())
{
m_enumeratorS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumeratorS.MoveNext())
return false;
}
return true;
}
else if (!m_secondReloaded)
{
CurrentMover = SecondMover;
return CurrentMover();
}
return false;
}
private bool SecondMover()
{
if (m_enumeratorS.MoveNext())
{
if (!m_enumeratorT.MoveNext())
{
m_enumeratorT = GetTEnumerator();
if (!m_enumeratorT.MoveNext())
return false;
}
return true;
}
return false;
}
private void Initialize()
{
m_enumeratorT = GetTEnumerator();
m_enumeratorS = GetSEnumerator();
CurrentMover = FirstMover;
m_isInitilized = true;
}
public bool MoveNext()
{
if (!m_isInitilized)
{
Initialize();
}
return CurrentMover();
}
public void Reset()
{
m_isInitilized = false;
m_secondReloaded = false;
CurrentMover = null;
m_enumeratorT = null;
m_enumeratorS = null;
DoDispose();
}
}
I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-)
– t3chb0t
Nov 21 at 19:22
I think the word unconventional describes them pretty good... althoughts
andss
were below that level ;-]
– t3chb0t
Nov 21 at 19:34
I can explain that ;-P It's local and in a very small scope, this is allowed in my world,tt
andss
on the other hand werepublic
.
– t3chb0t
Nov 21 at 19:56
You might want to refactor this so that the null-checks are eagerly performed instead of when iteration start. Doing so would be in line with other Linq implementations. Calling any Linq method on a null enumerable will throw immediately, not when iterating starts. To do this, have the public extension methods check for nulls, then from that call a private method that's using theyield return
lazy evaluation. Example.
– JAD
2 days ago
1
@HenrikHansen I saw that. The problem is that if a method is using theyield
method of returning aIEnumerable
, the entire method is treated as lazy. So nothing in the method starts executing before the first element in the result sequence is accessed. This means that only at that point theZipEnumerator
is constructed, and only then nullchecks are performed. Take a look at this blog by Jon Skeet for another (probably better) explanation.
– JAD
2 days ago
|
show 2 more comments
up vote
5
down vote
If you should respect that not all Enumerators
implement Reset()
then it is not possible to use using
statements for the two IEnumerators
. But you could introduce an IEnumerator<TResult>
for the zipped result and use it like this:
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
return InnerZipNew(first, second, resultSelector);
}
private static IEnumerable<TResult> InnerZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (ZipEnumerator<TFirst, TSecond, TResult> zipEnumerator = new ZipEnumerator<TFirst, TSecond, TResult>(first, second, resultSelector))
{
while (zipEnumerator.MoveNext())
{
yield return zipEnumerator.Current;
}
}
}
As JAD writes in his comment it is necessary to catch possible invalid input as the first thing and then call a private shadow method to do the actual iteration in order to make the exceptions be thrown when the extension is called rather than when the enumeration is performed.
In this way you're back on the using track.
The ZipEnumerator
it self could be something like:
public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumerT;
IEnumerator<S> m_enumerS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_first = true;
public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));
}
public TResult Current => m_selector(m_enumerT.Current, m_enumerS.Current);
object IEnumerator.Current => Current;
public void Dispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}
private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}
private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}
public bool MoveNext()
{
m_enumerT = m_enumerT ?? GetTEnumerator();
m_enumerS = m_enumerS ?? GetSEnumerator();
if (m_first)
{
if (m_enumerT.MoveNext())
{
if (!m_enumerS.MoveNext())
{
m_enumerS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumerS.MoveNext())
return false;
}
return true;
}
else
{
m_first = false;
}
}
if (!m_first && !m_secondReloaded)
{
if (m_enumerS.MoveNext())
{
if (!m_enumerT.MoveNext())
{
m_enumerT = GetTEnumerator();
if (!m_enumerT.MoveNext())
return false;
}
return true;
}
}
return false;
}
public void Reset()
{
m_secondReloaded = false;
m_first = true;
m_enumerT = null;
m_enumerS = null;
Dispose();
}
}
It's a little more code than other suggestions, but it encapsulates the problems with the disposal of intermediate enumerators without the necessity of a try-catch-statement. You could discuss if the disposal should be immediately when the enumerator is done or as I do collect them for disposal when the ZipEnumerator
itself is disposed off?
The MoveNext()
method went a little more complicated than I like, so feel free to edit or suggest improvements.
Edit
A refactored version of ZipEnumerator
:
public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumeratorT;
IEnumerator<S> m_enumeratorS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_isInitilized = false;
public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));
}
public TResult Current => m_selector(m_enumeratorT.Current, m_enumeratorS.Current);
object IEnumerator.Current => Current;
public void Dispose()
{
DoDispose();
}
private void RegisterDisposable(IDisposable disposable)
{
m_disposables.Add(disposable);
if (m_disposables.Count > 10)
{
DoDispose();
}
}
private void DoDispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}
private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}
private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}
private Func<bool> CurrentMover = null;
private bool FirstMover()
{
if (m_enumeratorT.MoveNext())
{
if (!m_enumeratorS.MoveNext())
{
m_enumeratorS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumeratorS.MoveNext())
return false;
}
return true;
}
else if (!m_secondReloaded)
{
CurrentMover = SecondMover;
return CurrentMover();
}
return false;
}
private bool SecondMover()
{
if (m_enumeratorS.MoveNext())
{
if (!m_enumeratorT.MoveNext())
{
m_enumeratorT = GetTEnumerator();
if (!m_enumeratorT.MoveNext())
return false;
}
return true;
}
return false;
}
private void Initialize()
{
m_enumeratorT = GetTEnumerator();
m_enumeratorS = GetSEnumerator();
CurrentMover = FirstMover;
m_isInitilized = true;
}
public bool MoveNext()
{
if (!m_isInitilized)
{
Initialize();
}
return CurrentMover();
}
public void Reset()
{
m_isInitilized = false;
m_secondReloaded = false;
CurrentMover = null;
m_enumeratorT = null;
m_enumeratorS = null;
DoDispose();
}
}
I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-)
– t3chb0t
Nov 21 at 19:22
I think the word unconventional describes them pretty good... althoughts
andss
were below that level ;-]
– t3chb0t
Nov 21 at 19:34
I can explain that ;-P It's local and in a very small scope, this is allowed in my world,tt
andss
on the other hand werepublic
.
– t3chb0t
Nov 21 at 19:56
You might want to refactor this so that the null-checks are eagerly performed instead of when iteration start. Doing so would be in line with other Linq implementations. Calling any Linq method on a null enumerable will throw immediately, not when iterating starts. To do this, have the public extension methods check for nulls, then from that call a private method that's using theyield return
lazy evaluation. Example.
– JAD
2 days ago
1
@HenrikHansen I saw that. The problem is that if a method is using theyield
method of returning aIEnumerable
, the entire method is treated as lazy. So nothing in the method starts executing before the first element in the result sequence is accessed. This means that only at that point theZipEnumerator
is constructed, and only then nullchecks are performed. Take a look at this blog by Jon Skeet for another (probably better) explanation.
– JAD
2 days ago
|
show 2 more comments
up vote
5
down vote
up vote
5
down vote
If you should respect that not all Enumerators
implement Reset()
then it is not possible to use using
statements for the two IEnumerators
. But you could introduce an IEnumerator<TResult>
for the zipped result and use it like this:
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
return InnerZipNew(first, second, resultSelector);
}
private static IEnumerable<TResult> InnerZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (ZipEnumerator<TFirst, TSecond, TResult> zipEnumerator = new ZipEnumerator<TFirst, TSecond, TResult>(first, second, resultSelector))
{
while (zipEnumerator.MoveNext())
{
yield return zipEnumerator.Current;
}
}
}
As JAD writes in his comment it is necessary to catch possible invalid input as the first thing and then call a private shadow method to do the actual iteration in order to make the exceptions be thrown when the extension is called rather than when the enumeration is performed.
In this way you're back on the using track.
The ZipEnumerator
it self could be something like:
public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumerT;
IEnumerator<S> m_enumerS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_first = true;
public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));
}
public TResult Current => m_selector(m_enumerT.Current, m_enumerS.Current);
object IEnumerator.Current => Current;
public void Dispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}
private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}
private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}
public bool MoveNext()
{
m_enumerT = m_enumerT ?? GetTEnumerator();
m_enumerS = m_enumerS ?? GetSEnumerator();
if (m_first)
{
if (m_enumerT.MoveNext())
{
if (!m_enumerS.MoveNext())
{
m_enumerS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumerS.MoveNext())
return false;
}
return true;
}
else
{
m_first = false;
}
}
if (!m_first && !m_secondReloaded)
{
if (m_enumerS.MoveNext())
{
if (!m_enumerT.MoveNext())
{
m_enumerT = GetTEnumerator();
if (!m_enumerT.MoveNext())
return false;
}
return true;
}
}
return false;
}
public void Reset()
{
m_secondReloaded = false;
m_first = true;
m_enumerT = null;
m_enumerS = null;
Dispose();
}
}
It's a little more code than other suggestions, but it encapsulates the problems with the disposal of intermediate enumerators without the necessity of a try-catch-statement. You could discuss if the disposal should be immediately when the enumerator is done or as I do collect them for disposal when the ZipEnumerator
itself is disposed off?
The MoveNext()
method went a little more complicated than I like, so feel free to edit or suggest improvements.
Edit
A refactored version of ZipEnumerator
:
public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumeratorT;
IEnumerator<S> m_enumeratorS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_isInitilized = false;
public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));
}
public TResult Current => m_selector(m_enumeratorT.Current, m_enumeratorS.Current);
object IEnumerator.Current => Current;
public void Dispose()
{
DoDispose();
}
private void RegisterDisposable(IDisposable disposable)
{
m_disposables.Add(disposable);
if (m_disposables.Count > 10)
{
DoDispose();
}
}
private void DoDispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}
private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}
private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}
private Func<bool> CurrentMover = null;
private bool FirstMover()
{
if (m_enumeratorT.MoveNext())
{
if (!m_enumeratorS.MoveNext())
{
m_enumeratorS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumeratorS.MoveNext())
return false;
}
return true;
}
else if (!m_secondReloaded)
{
CurrentMover = SecondMover;
return CurrentMover();
}
return false;
}
private bool SecondMover()
{
if (m_enumeratorS.MoveNext())
{
if (!m_enumeratorT.MoveNext())
{
m_enumeratorT = GetTEnumerator();
if (!m_enumeratorT.MoveNext())
return false;
}
return true;
}
return false;
}
private void Initialize()
{
m_enumeratorT = GetTEnumerator();
m_enumeratorS = GetSEnumerator();
CurrentMover = FirstMover;
m_isInitilized = true;
}
public bool MoveNext()
{
if (!m_isInitilized)
{
Initialize();
}
return CurrentMover();
}
public void Reset()
{
m_isInitilized = false;
m_secondReloaded = false;
CurrentMover = null;
m_enumeratorT = null;
m_enumeratorS = null;
DoDispose();
}
}
If you should respect that not all Enumerators
implement Reset()
then it is not possible to use using
statements for the two IEnumerators
. But you could introduce an IEnumerator<TResult>
for the zipped result and use it like this:
public static IEnumerable<TResult> ZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
if (first == null) throw new ArgumentNullException(nameof(first));
if (second == null) throw new ArgumentNullException(nameof(second));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
return InnerZipNew(first, second, resultSelector);
}
private static IEnumerable<TResult> InnerZipNew<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> resultSelector)
{
using (ZipEnumerator<TFirst, TSecond, TResult> zipEnumerator = new ZipEnumerator<TFirst, TSecond, TResult>(first, second, resultSelector))
{
while (zipEnumerator.MoveNext())
{
yield return zipEnumerator.Current;
}
}
}
As JAD writes in his comment it is necessary to catch possible invalid input as the first thing and then call a private shadow method to do the actual iteration in order to make the exceptions be thrown when the extension is called rather than when the enumeration is performed.
In this way you're back on the using track.
The ZipEnumerator
it self could be something like:
public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumerT;
IEnumerator<S> m_enumerS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_first = true;
public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));
}
public TResult Current => m_selector(m_enumerT.Current, m_enumerS.Current);
object IEnumerator.Current => Current;
public void Dispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}
private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}
private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
m_disposables.Add(enumerator);
return enumerator;
}
public bool MoveNext()
{
m_enumerT = m_enumerT ?? GetTEnumerator();
m_enumerS = m_enumerS ?? GetSEnumerator();
if (m_first)
{
if (m_enumerT.MoveNext())
{
if (!m_enumerS.MoveNext())
{
m_enumerS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumerS.MoveNext())
return false;
}
return true;
}
else
{
m_first = false;
}
}
if (!m_first && !m_secondReloaded)
{
if (m_enumerS.MoveNext())
{
if (!m_enumerT.MoveNext())
{
m_enumerT = GetTEnumerator();
if (!m_enumerT.MoveNext())
return false;
}
return true;
}
}
return false;
}
public void Reset()
{
m_secondReloaded = false;
m_first = true;
m_enumerT = null;
m_enumerS = null;
Dispose();
}
}
It's a little more code than other suggestions, but it encapsulates the problems with the disposal of intermediate enumerators without the necessity of a try-catch-statement. You could discuss if the disposal should be immediately when the enumerator is done or as I do collect them for disposal when the ZipEnumerator
itself is disposed off?
The MoveNext()
method went a little more complicated than I like, so feel free to edit or suggest improvements.
Edit
A refactored version of ZipEnumerator
:
public class ZipEnumerator<T, S, TResult> : IEnumerator<TResult>
{
IEnumerable<T> m_dataT;
IEnumerable<S> m_dataS;
IEnumerator<T> m_enumeratorT;
IEnumerator<S> m_enumeratorS;
List<IDisposable> m_disposables = new List<IDisposable>();
Func<T, S, TResult> m_selector;
bool m_secondReloaded = false;
bool m_isInitilized = false;
public ZipEnumerator(IEnumerable<T> dataT, IEnumerable<S> dataS, Func<T, S, TResult> selector)
{
m_dataT = dataT ?? throw new ArgumentNullException(nameof(dataT));
m_dataS = dataS ?? throw new ArgumentNullException(nameof(dataS));
m_selector = selector ?? throw new ArgumentNullException(nameof(selector));
}
public TResult Current => m_selector(m_enumeratorT.Current, m_enumeratorS.Current);
object IEnumerator.Current => Current;
public void Dispose()
{
DoDispose();
}
private void RegisterDisposable(IDisposable disposable)
{
m_disposables.Add(disposable);
if (m_disposables.Count > 10)
{
DoDispose();
}
}
private void DoDispose()
{
foreach (IDisposable disposable in m_disposables)
{
disposable.Dispose();
}
m_disposables.Clear();
}
private IEnumerator<T> GetTEnumerator()
{
var enumerator = m_dataT.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}
private IEnumerator<S> GetSEnumerator()
{
var enumerator = m_dataS.GetEnumerator();
RegisterDisposable(enumerator);
return enumerator;
}
private Func<bool> CurrentMover = null;
private bool FirstMover()
{
if (m_enumeratorT.MoveNext())
{
if (!m_enumeratorS.MoveNext())
{
m_enumeratorS = GetSEnumerator();
m_secondReloaded = true;
if (!m_enumeratorS.MoveNext())
return false;
}
return true;
}
else if (!m_secondReloaded)
{
CurrentMover = SecondMover;
return CurrentMover();
}
return false;
}
private bool SecondMover()
{
if (m_enumeratorS.MoveNext())
{
if (!m_enumeratorT.MoveNext())
{
m_enumeratorT = GetTEnumerator();
if (!m_enumeratorT.MoveNext())
return false;
}
return true;
}
return false;
}
private void Initialize()
{
m_enumeratorT = GetTEnumerator();
m_enumeratorS = GetSEnumerator();
CurrentMover = FirstMover;
m_isInitilized = true;
}
public bool MoveNext()
{
if (!m_isInitilized)
{
Initialize();
}
return CurrentMover();
}
public void Reset()
{
m_isInitilized = false;
m_secondReloaded = false;
CurrentMover = null;
m_enumeratorT = null;
m_enumeratorS = null;
DoDispose();
}
}
edited 2 days ago
answered Nov 21 at 19:15
Henrik Hansen
6,2731722
6,2731722
I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-)
– t3chb0t
Nov 21 at 19:22
I think the word unconventional describes them pretty good... althoughts
andss
were below that level ;-]
– t3chb0t
Nov 21 at 19:34
I can explain that ;-P It's local and in a very small scope, this is allowed in my world,tt
andss
on the other hand werepublic
.
– t3chb0t
Nov 21 at 19:56
You might want to refactor this so that the null-checks are eagerly performed instead of when iteration start. Doing so would be in line with other Linq implementations. Calling any Linq method on a null enumerable will throw immediately, not when iterating starts. To do this, have the public extension methods check for nulls, then from that call a private method that's using theyield return
lazy evaluation. Example.
– JAD
2 days ago
1
@HenrikHansen I saw that. The problem is that if a method is using theyield
method of returning aIEnumerable
, the entire method is treated as lazy. So nothing in the method starts executing before the first element in the result sequence is accessed. This means that only at that point theZipEnumerator
is constructed, and only then nullchecks are performed. Take a look at this blog by Jon Skeet for another (probably better) explanation.
– JAD
2 days ago
|
show 2 more comments
I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-)
– t3chb0t
Nov 21 at 19:22
I think the word unconventional describes them pretty good... althoughts
andss
were below that level ;-]
– t3chb0t
Nov 21 at 19:34
I can explain that ;-P It's local and in a very small scope, this is allowed in my world,tt
andss
on the other hand werepublic
.
– t3chb0t
Nov 21 at 19:56
You might want to refactor this so that the null-checks are eagerly performed instead of when iteration start. Doing so would be in line with other Linq implementations. Calling any Linq method on a null enumerable will throw immediately, not when iterating starts. To do this, have the public extension methods check for nulls, then from that call a private method that's using theyield return
lazy evaluation. Example.
– JAD
2 days ago
1
@HenrikHansen I saw that. The problem is that if a method is using theyield
method of returning aIEnumerable
, the entire method is treated as lazy. So nothing in the method starts executing before the first element in the result sequence is accessed. This means that only at that point theZipEnumerator
is constructed, and only then nullchecks are performed. Take a look at this blog by Jon Skeet for another (probably better) explanation.
– JAD
2 days ago
I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-)
– t3chb0t
Nov 21 at 19:22
I'd like to click +1 but when I see the variable names it says -1 so at the and it's a 0 ;-)
– t3chb0t
Nov 21 at 19:22
I think the word unconventional describes them pretty good... although
ts
and ss
were below that level ;-]– t3chb0t
Nov 21 at 19:34
I think the word unconventional describes them pretty good... although
ts
and ss
were below that level ;-]– t3chb0t
Nov 21 at 19:34
I can explain that ;-P It's local and in a very small scope, this is allowed in my world,
tt
and ss
on the other hand were public
.– t3chb0t
Nov 21 at 19:56
I can explain that ;-P It's local and in a very small scope, this is allowed in my world,
tt
and ss
on the other hand were public
.– t3chb0t
Nov 21 at 19:56
You might want to refactor this so that the null-checks are eagerly performed instead of when iteration start. Doing so would be in line with other Linq implementations. Calling any Linq method on a null enumerable will throw immediately, not when iterating starts. To do this, have the public extension methods check for nulls, then from that call a private method that's using the
yield return
lazy evaluation. Example.– JAD
2 days ago
You might want to refactor this so that the null-checks are eagerly performed instead of when iteration start. Doing so would be in line with other Linq implementations. Calling any Linq method on a null enumerable will throw immediately, not when iterating starts. To do this, have the public extension methods check for nulls, then from that call a private method that's using the
yield return
lazy evaluation. Example.– JAD
2 days ago
1
1
@HenrikHansen I saw that. The problem is that if a method is using the
yield
method of returning a IEnumerable
, the entire method is treated as lazy. So nothing in the method starts executing before the first element in the result sequence is accessed. This means that only at that point the ZipEnumerator
is constructed, and only then nullchecks are performed. Take a look at this blog by Jon Skeet for another (probably better) explanation.– JAD
2 days ago
@HenrikHansen I saw that. The problem is that if a method is using the
yield
method of returning a IEnumerable
, the entire method is treated as lazy. So nothing in the method starts executing before the first element in the result sequence is accessed. This means that only at that point the ZipEnumerator
is constructed, and only then nullchecks are performed. Take a look at this blog by Jon Skeet for another (probably better) explanation.– JAD
2 days ago
|
show 2 more comments
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208152%2fcustom-implementation-of-the-linq-zip-operator-for-different-length-lists%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
Your question would benefit from some example usage within the question itself, so that we can see how you intend the method to be consumed, and quickly verify that it is working as intended.
– VisualMelon
Nov 21 at 14:33
@VisualMelon improved, full working code
– BWA
Nov 21 at 14:35
4
Never
Reset
an enumerator. Just get a new enumerator.Reset
was a misfeature intended for COM interop scenarios.– Eric Lippert
Nov 21 at 19:45