Use Appropriate Collections

.Net makes using fancy collections very easy. In fact, it can be almost too easy. It is so simple to just throw in a List<T> or a ConcurrentDictionary<K, T> that it’s tempting to do it at every opportunity.

Today’s tip is to stop and think critically about the type of collections you need.

Some examples:

  • I was doing a code review recently and saw that this person was using Dictionary<string, bool> where every value was true. This is not a dictionary—it’s a set with complicated accessors. So use HashSet<T> – it has a simpler API, which will lead to your own code being simpler and more correct semantically. Dictionary<K, T> has a meaning, and if you’re abusing that meaning, then your code is likely incorrect, or in the best case, misleading (and thus a maintenance problem, and thus incorrect!).
  • Performance characteristics are often non-intuitive. For example, if you’re doing a lot of searching for values, which would you use? Dictionary<K, T> or List<T>? Most people would say the Dictionary, and perhaps in 95% of cases, they’re right. On the other hand, if you only have a few values, you may get better performance because of cache locality with a binary search over a List (or even a linear search), which is usually implemented as an array.
  • Speaking of arrays, if you have a read-only vector of a known size, use arrays instead of Lists. It’s semantically more correct, simpler, and usually more performant.
  • Another code review I was doing had usage of ConcurrentDictionary<K, T>. This sounds like a great type to use when you need to modify and/or read a dictionary with multiple threads, but the usage of this type is not that straightforward, and in fact the official documentation is unclear on some things. In this case, it was better to redesign at a higher level to avoid use of this type.
  • I’ve seen this type of code in reviews a few times:
var list = new List<MyObject>();
for (int i =0;i < source.Count; ++i)

        In most apps maybe this doesn’t matter. If you care about performance, then you should care that there are going to be an unknown number of pointless memory allocations, plus multiple copies of the old data into the new arrays. In a high-performance system, this matters. Use the constructor of the collection that takes an initial size. (Guess how many items are in the default List<T> internal item array: None.)

Some questions you can ask yourself:

  • Is the collection semantically correct? Or am I abusing it because it can do what I want? Do I have restrictions on the use of the collection that are not obeyed by the API of the collection? Is there a more appropriate data structure I can use? If not, should I wrap this collection into an API with suitable restrictions?
  • Am I using the simplest collection possible?
  • Is this collection type performant for how it will be used? How can I measure to make sure? Am I making assumptions that may be unfounded because of usage patterns or hardware optimizations? Am I initializing the data structure correctly, to avoid unnecessary memory reallocations?
  • Is the collection type I’m considering too complex to use effectively? Would it be better to redesign something at a higher level to avoid needing to use this collection type?

Collections are usually the core of any application (if you don’t have data, what you are acting upon?). Getting these right means simpler code, better performance, higher readability, and long-term maintainability. It doesn’t take any more work (usually) – it just takes a few moments to think about what you’re really doing.

Like this tip? Check out my book, C # 4.0 How-To, where you’ll finds hundreds of great tips like this one.