Comparing LINQ’s maintainability

LINQ hides the gory details of how algorithms work, making code measurably more maintainable. Maintainability is correlated with both cost[pdf] and defects[pdf]. In the following example, we’ll use Visual Studio 2010 to measure maintainability.

Set Intersection

Imagine we have two sets of integers, and we need to find the intersection – that is, numbers that are in both sets. In addition, the intersection set should not contain duplicates. Below are four C# functions, followed by a comparison of their maintainability indexes.

Function 1

A junior developer might begin by taking each number in set A, and searching set B for that number. The C# code for this might look like:

int[] ForEach(List SetA, List SetB)
{
  var intersections = new List();
  foreach (int a in SetA)
    foreach (int b in SetB)
      if (a == b)
        if (Contains(intersections, a))
          intersections.Add(a);
  return intersections.ToArray();
}
bool Contains(List set, int target)
{
  foreach (int i in set)
    if (i == target)
      return true;
  return false;
}

Function 2

A more senior developer might review Function 1 and notice that it is slow when operating on large sets. There are three nested loops, including the .Contains, which is great than O(n2) complexity.
A more efficient solution would be to sort both sets first, so that we can find the intersection using a single loop. For example:

int[] SortFirst(List SetA, List SetB)
{
  //sort sets and use parallel pointers O(n)
  var sortedA = new List(SetA);
  var sortedB = new List(SetB);
  var intersect = new List();
  sortedA.Sort();
  sortedB.Sort();
  int indexA = 0;
  int indexB = 0;
  while (indexA < sortedA.Count && indexB < sortedB.Count)
  {
    if (sortedA[indexA] < sortedB[indexB])
      indexA++;
    else if (sortedB[indexB] < sortedA[indexA])
      indexB++;
    else
    {
      intersect.Add(sortedA[indexA]);
      indexA++;
      indexB++;
    }
  }
  return intersect.ToArray();
}

This is the type of code that gives the author pride, and the maintainer a headache. It may be fast, but the intent is not obvious.

Function 3

Now let’s try a query using LINQ.

int[] LinqQuery(List SetA, List SetB)
{
  return (from a in SetA
          where SetB.Contains(a)
          select a).Distinct().ToArray();
}

Note that this function creates a nested loop. For each number in set A, we must search for it in set B.

Function 4

Fortunately, LINQ already contains a number of basic math operations, including intersect. For example:

int[] LinqCall(List SetA, List SetB)
{
  return SetA.Intersect(SetB).ToArray();
}

Maintainability

The LINQ functions appear to be the most maintainble. However, we need a more systematic approach. Starting with Visual Studio 2008, the Visual Studio now includes a static code analysis tool, which we can use to compute a maintainability index. These numbers range from 0 to 100, where 100 is the most maintainable. I’ll discuss how this works in a future post.

Using Visual Studio 2010 Ultimate RC1, we get the following results:

Function Maintainability Index
1 64
2 54
3 73
3 83

These metrics show that LINQ does produce more maintainable code. In the next post, we’ll measure the performance of each algorithm.

Tags: ,