The Specialties of IEnumerable

The IEnumerable interface is used a lot in C#, usually in its generic form IEnumerable<out T>. An IEnumerable instance is not just an abstraction of a list; it is better described as a view on a data source, and it can even be an infinite sequence of values. This article goes into the details of the interface and discusses the specialties it brings.

What is the IEnumerable interface?

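The IEnumerable interface has just a single method, GetEnumerator(), which returns an IEnumerator instance. The IEnumerator manages the navigation through the enumeration: MoveNext() advances to the next element and returns false at the end, and the Current property returns the element at the current position. (It also has a Reset() method, but it is rarely used, and the enumerators generated for yield methods throw a NotSupportedException from it.) The generic IEnumerator<T> additionally implements IDisposable.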

The generic IEnumerable<out T> is covariant, which is marked by the out keyword on the type parameter. This allows an instance with a more derived type argument to be assigned to a variable with a less derived type argument, for example an IEnumerable<string> to an IEnumerable<object>:

IEnumerable<string> listOfStrings = new List<string>();
IEnumerable<string> enumerableOfStrings = listOfStrings;

// not possible without the out keyword
IEnumerable<object> enumerableOfObjects = listOfStrings;
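Covariance only applies to reference types. The following minimal sketch uses a hypothetical CountItems() helper to show which conversions work, and how a sequence of value types can still be passed by boxing its elements with Cast<object>():

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<string> words = new List<string> { "alpha", "beta" };
IEnumerable<int> numbers = new List<int> { 1, 2, 3 };

// OK: string is a reference type, so IEnumerable<string> converts to IEnumerable<object>.
int wordCount = CountItems(words);

// Does not compile: int is a value type, so covariance does not apply.
// int numberCount = CountItems(numbers);

// Cast<object>() boxes every element in a new lazy sequence.
int numberCount = CountItems(numbers.Cast<object>());

// Runtime type checks follow the same variance rules.
bool wordsAreObjects = words is IEnumerable<object>;     // true
bool numbersAreObjects = numbers is IEnumerable<object>; // false

Console.WriteLine($"{wordCount} {numberCount} {wordsAreObjects} {numbersAreObjects}");

// Hypothetical helper that accepts any sequence of objects.
static int CountItems(IEnumerable<object> items) => items.Count();
```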

The Magic Behind the Foreach Statement

To iterate over an enumeration, we can use the foreach statement:

List<int> numbers = new List<int>() { 1, 2, 3, 4, 5 };

foreach (int number in numbers)
{
    DoSomething(number);
}

This foreach statement is syntactic sugar. The compiler translates it into a call to GetEnumerator(), a while loop that calls MoveNext() and reads Current, and a finally block that disposes the enumerator:

List<int> numbers = new List<int>() { 1, 2, 3, 4, 5 };

List<int>.Enumerator enumerator = numbers.GetEnumerator();

try
{
    while (enumerator.MoveNext())
    {
        int current = enumerator.Current;
        DoSomething(current);
    }
}
finally
{
    ((IDisposable)enumerator).Dispose();
}

Managing the enumerator manually like this is possible and sometimes required (see my post about Enumerable.Index). Interestingly, the foreach statement does not even need the IEnumerable interface: it works with any object that has an accessible GetEnumerator() method whose return type provides a MoveNext() method and a Current property. Implementing IEnumerable is still recommended, of course, for clarity and because LINQ and most other APIs expect it.
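One typical case for manual enumeration is a method that treats the first element differently, for example to use it as the starting value. The following is a minimal sketch with a hypothetical MaxOf() helper; the using declaration disposes the enumerator just like the code generated for foreach:

```csharp
using System;
using System.Collections.Generic;

int max = MaxOf(new List<int> { 3, 9, 4 });
Console.WriteLine(max);

// Hypothetical helper: the first element seeds the result, so the
// enumerator is driven by hand instead of with foreach.
static int MaxOf(IEnumerable<int> source)
{
    // 'using' disposes the enumerator, like the finally block foreach generates.
    using IEnumerator<int> enumerator = source.GetEnumerator();

    if (!enumerator.MoveNext())
    {
        throw new InvalidOperationException("Sequence contains no elements.");
    }

    int result = enumerator.Current;
    while (enumerator.MoveNext())
    {
        if (enumerator.Current > result)
        {
            result = enumerator.Current;
        }
    }

    return result;
}
```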

How to Implement IEnumerable?

Collection types usually implement IEnumerable to allow iterating over their items. List<T>, for example, returns its own enumerator type, List<T>.Enumerator, which holds a reference to the list. Since the list increments an internal version number with every change, the enumerator can detect modifications and throws an InvalidOperationException if the list was changed while iterating it. Implementing a custom IEnumerator class is not required for regular use cases. List<T> only does it because it has special requirements and focuses heavily on performance; its enumerator is a struct, for example, which avoids a heap allocation when the list is iterated directly with foreach.
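The version check is easy to observe. This minimal sketch modifies a List<T> inside a foreach loop, catches the resulting exception, and then shows one common workaround: iterating over a snapshot created with List<T>.ToArray(), which costs an extra copy.

```csharp
using System;
using System.Collections.Generic;

List<int> numbers = new List<int> { 1, 2, 3 };
bool modificationDetected = false;

try
{
    foreach (int number in numbers)
    {
        if (number == 1)
        {
            // Add() increments the list's internal version number.
            numbers.Add(4);
        }
    }
}
catch (InvalidOperationException ex)
{
    // The next MoveNext() call detects the version change and throws.
    modificationDetected = true;
    Console.WriteLine(ex.Message);
}

// Workaround: iterate over a snapshot, so changes to the list do not affect the loop.
foreach (int number in numbers.ToArray())
{
    if (number == 1)
    {
        numbers.Add(5);
    }
}

Console.WriteLine(string.Join(", ", numbers));
```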

The simpler option is to use the yield keyword. A method whose return type is IEnumerable, IEnumerable<T>, IEnumerator or IEnumerator<T> can contain the yield keyword:

public static IEnumerable<int> GetEvenNumbers(List<int>? numbers)
{
    if (numbers == null)
    {
        yield break;
    }

    foreach (int number in numbers)
    {
        if (int.IsEvenInteger(number))
        {
            yield return number;
        }
    }
}

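The yield break; statement ends the iteration and leaves the method (basically like return;). The yield return statement provides a single value of the iteration. For methods that contain yield, the compiler generates a state machine class in the background that implements the enumerator. The method body does not run when the method is called; it runs piece by piece, from one yield return to the next, each time MoveNext() is called.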
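The step-by-step execution of an iterator can be made visible with a log. This minimal sketch uses a hypothetical Numbers() iterator that records when each part of its body runs:

```csharp
using System;
using System.Collections.Generic;

List<string> log = new List<string>();

IEnumerable<int> sequence = Numbers(log);
log.Add("iterator created");

foreach (int number in sequence)
{
    log.Add($"consumer got {number}");
}

Console.WriteLine(string.Join(Environment.NewLine, log));

// Hypothetical iterator that records when each part of its body runs.
static IEnumerable<int> Numbers(List<string> log)
{
    log.Add("body started");
    yield return 1;
    log.Add("resumed after 1");
    yield return 2;
    log.Add("body finished");
}
```

The log contains "iterator created" before "body started", so calling Numbers() executes none of its body, and the iterator and the consumer then take turns: "body started", "consumer got 1", "resumed after 1", "consumer got 2", "body finished".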

The Problem With the Deferred Execution

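Because the actual work only happens when GetEnumerator() and MoveNext() are called, the execution is deferred: code inside a lazy enumeration might not get executed at all if nothing enumerates it, or it might get executed multiple times if it is enumerated multiple times: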

private static T IncreaseCounterAndReturn<T>(T value, ref int counter)
{
    counter++;
    return value;
}

public static void EnumerableCountTest()
{
    int counter = 0;

    List<int> numbers = new List<int>() { 1, 2, 3, 4, 5 };

    IEnumerable<int> enumerable = numbers.Select(x => IncreaseCounterAndReturn(x, ref counter));
    Console.WriteLine(counter); // prints 0

    List<int> list1 = enumerable.ToList();
    Console.WriteLine(counter); // prints 5

    List<int> list2 = enumerable.ToList();
    Console.WriteLine(counter); // prints 10

    numbers.Remove(4);
    numbers.Remove(5);
    List<int> list3 = enumerable.ToList();
    Console.WriteLine(counter); // prints 13 (only 3 items are left)
}

The example shows that calling Select() alone does not execute the given delegate; it only runs once something enumerates the result, for example ToList(). Calling ToList() multiple times executes the delegate multiple times, and after the source list has changed, the same enumerable even produces a different result. Depending on the use case, this can lead to performance problems (the same work is done multiple times with the same result) or to inconsistent results (the data source changes between the executions). A common mistake is to check whether the enumeration contains any items with Any() and, if so, to iterate it again. This does not reduce the work but increases it, because Any() already starts an enumeration of its own.

When an IEnumerable instance is used multiple times, it can be materialized once, for example with ToList(), and the resulting list used instead. This allocates memory, but the enumeration is only executed once. To detect such cases, the analyzer rule CA1851 (Possible multiple enumerations of IEnumerable collection) in Microsoft.CodeAnalysis.CSharp.NetAnalyzers can be activated.
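The Any() mistake and the fix can be compared directly. In this minimal sketch, the hypothetical ProduceNumbers() iterator stands in for an expensive lazy source and counts how many items it actually produces:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int produced = 0;

// Anti-pattern: Any() starts one enumeration, foreach starts a second one.
IEnumerable<int> lazy = ProduceNumbers(5);
int lazySum = 0;
if (lazy.Any())
{
    foreach (int value in lazy)
    {
        lazySum += value;
    }
}
int producedLazy = produced; // 1 item for Any() + 5 items for foreach = 6

produced = 0;

// Fix: materialize once, then work with the list as often as needed.
List<int> materialized = ProduceNumbers(5).ToList();
int listSum = 0;
if (materialized.Count > 0)
{
    foreach (int value in materialized)
    {
        listSum += value;
    }
}
int producedMaterialized = produced; // 5 items, all produced by ToList()

Console.WriteLine($"lazy: {producedLazy} items produced, materialized: {producedMaterialized}");

// Hypothetical stand-in for an expensive lazy source, such as a database query.
IEnumerable<int> ProduceNumbers(int count)
{
    for (int i = 1; i <= count; i++)
    {
        produced++; // counts every item that is actually produced
        yield return i;
    }
}
```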

Checking Arguments in Extension Methods

Well-written public methods should validate their arguments. This also applies to methods with the yield keyword. But since the body of such a method, including the argument checks, only runs on the first MoveNext() call, the argument checks are deferred as well.

In the following example, the ArgumentNullException is thrown too late: we get the exception on the call to ToList() and not where WhereIsInSearchValues() was called:

public static IEnumerable<T> WhereIsInSearchValues<T>(this IEnumerable<T> items, List<T> searchValues)
{
    ArgumentNullException.ThrowIfNull(items);
    ArgumentNullException.ThrowIfNull(searchValues);

    foreach (T item in items)
    {
        if (searchValues.Contains(item))
        {
            yield return item;
        }
    }
}

public static void ArgumentChecksTest()
{
    List<int> numbers = new List<int>() { 1, 2, 3, 4, 5 };

    // no ArgumentNullException
    IEnumerable<int> result = numbers.WhereIsInSearchValues(null!);

    // ArgumentNullException
    List<int> resultList = result.ToList();
}

There is an easy way to improve the WhereIsInSearchValues() method: we split the argument checks and the iteration into two separate methods. The public method no longer contains yield, so its checks run immediately. The iteration method can have the same name as the public method, with an Iterator suffix:

public static IEnumerable<T> WhereIsInSearchValues<T>(this IEnumerable<T> items, List<T> searchValues)
{
    ArgumentNullException.ThrowIfNull(items);
    ArgumentNullException.ThrowIfNull(searchValues);

    return WhereIsInSearchValuesIterator(items, searchValues);
}

private static IEnumerable<T> WhereIsInSearchValuesIterator<T>(IEnumerable<T> items, List<T> searchValues)
{
    foreach (T item in items)
    {
        if (searchValues.Contains(item))
        {
            yield return item;
        }
    }
}

With this change, the ArgumentNullException is thrown directly where WhereIsInSearchValues() is called, while the iteration itself is still deferred.
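Since C# 7, the iterator part can also be a local function inside the validating method, which keeps both parts together and prevents other code from calling the unchecked iterator directly. The following is a minimal sketch of that variant; to fit into a single file it is written as a plain static local function rather than an extension method, and the inner Iterator() name is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<int> numbers = new List<int> { 1, 2, 3, 4, 5 };

// The null check now runs on the call itself, before anything is enumerated.
bool thrownOnCall = false;
try
{
    IEnumerable<int> result = WhereIsInSearchValues(numbers, null!);
}
catch (ArgumentNullException)
{
    thrownOnCall = true;
}

// With valid arguments, the iteration itself is still deferred until ToList().
List<int> matches = WhereIsInSearchValues(numbers, new List<int> { 2, 4, 6 }).ToList();
Console.WriteLine($"{thrownOnCall} {string.Join(", ", matches)}");

// No yield in this method, so the argument checks run eagerly.
static IEnumerable<T> WhereIsInSearchValues<T>(IEnumerable<T> items, List<T> searchValues)
{
    ArgumentNullException.ThrowIfNull(items);
    ArgumentNullException.ThrowIfNull(searchValues);

    return Iterator();

    // Local iterator function that uses the already validated arguments.
    IEnumerable<T> Iterator()
    {
        foreach (T item in items)
        {
            if (searchValues.Contains(item))
            {
                yield return item;
            }
        }
    }
}
```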

Infinite Enumerations

It is possible to create infinite enumerations, and there are actual use cases for them. The following GetRandomNumbers() method produces an endless sequence of random numbers:

public static IEnumerable<int> GetRandomNumbers(int minValue, int maxValue)
{
    Random random = new Random();

    while (true)
    {
        yield return random.Next(minValue, maxValue);
    }
}

While it looks like an endless loop, it does end when it is used correctly, because the loop only continues when the caller requests the next value:

foreach (int number in GetRandomNumbers(1, 10).Take(10))
{
    Console.WriteLine(number);
}

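In this case, Take(10) stops requesting values after ten items, so the while (true) loop inside GetRandomNumbers() is simply not resumed anymore. Note that the upper bound of Random.Next() is exclusive, so this prints numbers from 1 to 9.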
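Once Take() has enough items, it stops calling MoveNext() and the enumerator is disposed, so even a finally block inside an infinite iterator still runs. This minimal sketch uses a hypothetical NaturalNumbers() iterator instead of random numbers so the result is deterministic:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

bool cleanedUp = false;

// Take(3) stops after three items; disposing the enumerator
// runs the iterator's finally block although the loop is infinite.
List<int> firstThree = NaturalNumbers().Take(3).ToList();

Console.WriteLine($"{string.Join(", ", firstThree)} cleanedUp={cleanedUp}");

// Hypothetical infinite sequence 1, 2, 3, ...
IEnumerable<int> NaturalNumbers()
{
    try
    {
        for (int i = 1; ; i++)
        {
            yield return i;
        }
    }
    finally
    {
        // Runs when the consumer disposes the enumerator early.
        cleanedUp = true;
    }
}
```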

Although it is possible, infinite enumerations are rarely a useful solution, and they can easily lead to unwanted endless loops, since most methods assume a finite sequence. For example, what do you think ToList() would do? It would never finish and would keep allocating memory until the process fails.

Conclusion

The IEnumerable interface is an important feature of C#. The language integrates it directly through the foreach and yield keywords. Because of deferred execution, using IEnumerable can lead to unexpected behavior, like multiple executions or late argument checks. Therefore, it is important to know how foreach and yield actually work.