Category Archives: Tips

4 Essential Tips for High-Performance Garbage Collection on Servers

Update: If you find this useful, you can read a much more complete treatment of garbage collection and performance in my book Writing High-Performance .NET Code.

Update: Part 2 – How to Debug GC Issues with PerfView is now available.

On this blog, I’ve alluded to the fact that I work on high-performance server applications, most recently in .Net. Writing these in .Net is just as possible as it is in native code, but it does come with its own set of challenges. In particular, one of the biggest things you need to learn how to deal with is garbage collection.

There is a lot out there already written about the CLR’s garbage collector, so I’m not going to go over many of the details. If you need a primer on it, MSDN has some documentation:

Garbage Collection

Read that first. For the rest of this article series, I will assume that you understand how the GC basically works.

In this and future articles, I’ll cover a lot of the stuff I’ve learned to improve application performance in the face of garbage collection.

Tip 1: Use Server GC

There are two modes of garbage collection (GC): workstation and server. As long as you’re running multiple processors, you almost certainly want server mode collection. With workstation mode, a GC happens on the thread that makes the allocation that causes the GC. The collection happens at normal priority.

With server GC, a thread for every core is created just for doing GC. There is also a small object heap and a large object heap created for each GC thread. All of the program’s allocations are spread among these heaps (more on large object heaps later). When no GC is happening, these threads are blocked and do nothing. When a GC is triggered, all of the user threads get paused, and all the GC threads wake up at highest priority and do collection in parallel. All of these optimizations lead to server GC usually being much faster than workstation GC.

A word about concurrent collections: In workstation GC, concurrent collections are enabled by default. However, this only applies to generation 2 collections. Generations 0 and 1 are always blocking. However, given that it’s concurrent, that means that it will compete with your own threads that are trying to get actual work done. In a high-performance server scenario, that may not be acceptable. A better strategy is to ensure that generation 2 collections never (or extremely rarely) happen.

You enable server GC by putting this in app.config:

<configuration>
   <runtime>
      <gcServer enabled="true"/>
   </runtime>
</configuration>

Tip 2: Objects Live Briefly or Forever

A histogram of object lifetimes in your app should look essentially like this:

Object last either a vanishingly brief amount of time, or they last forever – it’s the stuff in the middle that will kill your performance.

This has everything to do with the generations of garbage collection and object survivorship. There are three generations: 0, 1, and 2. Generation 0 happens most often and is the fastest—ideally lasting only a couple of milliseconds, if that. Objects that didn’t get cleaned up in generation 0 are put into generation 1. Generation 1 collections are also very fast, usually as fast as generation 0. The problem, though, is that objects that make it to generation 1 have a fair chance of surviving this generation, and being put into generation 2.

Generation 2 is the problem. A generation 2 collection is much slower than 0 or 1—often on the order of hundreds of milliseconds or even seconds—that means your process is paused completely for that time. You do not want objects to survive to generation 2.

So how often do collections happen? There is no hard-and-fast rule: it all depends on your allocation rate, memory pressure, and patterns that you’ve trained into the GC. The GC will adapt over time, training itself on your memory usage patterns. All of this completely depends on your application and I’ll look at ways to measure all of this in a future article.

Tip 3: All Long-Lived Objects Must Be Pooled

It may be that you can’t ensure all objects for a given request are cleaned up in the first generation 0 collection that occurs. If requests are in memory longer than the time between collections, then you’re guaranteed to have survivorship.

For these types of objects, first see if you can factor them so that not all parts them have to live that long. Control object lifetime very closely and null out references once you’re done.

Once you’ve done that, hopefully there are only a handful of objects that really must last the entire length of a request. For those, create a pool of them with reinitialization semantics—effectively move them to the far end of that histogram above, where they live forever.

This works because of the adaptive nature of the garbage collector – it learns over time that if it does a collection and doesn’t free up much memory, it will schedule that generation of collection to happen less frequently. In my own case, at one point, our server had trained the GC to do a generation 2 collection less often than once per day, under a constant load. With enough work, we could probably get that to essentially never.

You may be able to get quite far without the need to implement object pooling. Or you may need to pool only a small number of objects, and the survivorship of the remaining objects is not enough to cause problematic garbage collections—only measurement and observation will tell you for sure.

Tip 4: All Large Objects Must Be Pooled

There is a way to cause an object to automatically be in generation 2: make it at least 85000 bytes in size. Anything at least that size gets put into a Large Object Heap. Only generation 2 collections service that type of heap.

Want to cause a generation 2 collection? Do this:

byte[] buffer = new byte[85000];

If you want high-performance, you absolutely cannot do this per request on a server. These types of buffers, or other large objects, must be pooled. There is no built-in pooling mechanism in .Net—you must write your own. There are usually not too many large objects you’ll need to pool: strings and byte buffers are the usual suspects, if you need to do much serialization/deserialization, but also look out for collections of any type.

If you want to know more about the Large Object Heap and why 85000 bytes is the threshold, read this great article: Large Object Heap Uncovered.

Pooling collection objects comes with its own set of challenges:

You can’t assume the full collection is valid (the difference between length and capacity). If you use pooled arrays, for example, you have to track the length separately, since only a small portion of the array may be valid. This can drastically affect the interfaces between components.
Pooled collections that can grow over time will cause your memory to rise indefinitely unless you put limits on the size of the pool and/or the size of collections within the pool.
Large Object Heaps are not compacted during collection, which means that you can fragment the heap such that it’s wasting a lot of memory. It all depends on your allocation and collection pattern. I may talk about heap fragmentation in another article.

Once you solve those, you’re good to go… no more generation 2 collections!

Next Time…

In my next article, I’ll cover tools you can use to measure garbage collection statistics, and how you can use that knowledge to improve your performance.

Alternative to Double-Checked Locking: Lazy<T>

A common pattern for lazy-initialization of a possibly-expensive type is double-checked locking.

The pattern is essentially this:

// member field
private volatile MyType myObject = null;
private object myLock = new object();

public MyType MyObject
{
    get
    {
        if (myObject == null)
        {
            lock(myLock)
            {
                if (myObject == null)
                {
                    myObject = new MyType();
                }
            }
        }
    }   
}

It’s quite common, and it’s also common to get it subtly wrong (not using volatile, for example).

However, in .Net 4, there’s a simpler way to do this: use Lazy<T>. This type uses a delegate to initialize the value with double-checked locking (there are other possible thread safety modes as well).

private Lazy<MyType> myObject = new Lazy<MyType>( () => new MyType );

public MyType MyObject
{
    get
    {
        return myObject.Value;
    }   
}

With Lazy<T>, the delegate you pass in will not be called until the Value property is accessed. Depending on the options you set, only a single thread will call your delegate and create the real type. I’d much rather do this than have to look at double-checked locking patterns scattered throughout a code base.

Like this tip? Check out my book, C # 4.0 How-To, where you’ll finds hundreds of great tips like this one.

Use Appropriate Collections

.Net makes using fancy collections very easy. In fact, it can be almost too easy. It is so simple to just throw in a List<T> or a ConcurrentDictionary<K, T> that it’s tempting to do it at every opportunity.

Today’s tip is to stop and think critically about the type of collections you need.

Some examples:

I was doing a code review recently and saw that this person was using Dictionary<string, bool> where every value was true. This is not a dictionary—it’s a set with complicated accessors. So use HashSet<T> – it has a simpler API, which will lead to your own code being simpler and more correct semantically. Dictionary<K, T> has a meaning, and if you’re abusing that meaning, then your code is likely incorrect, or in the best case, misleading (and thus a maintenance problem, and thus incorrect!).
Performance characteristics are often non-intuitive. For example, if you’re doing a lot of searching for values, which would you use? Dictionary<K, T> or List<T>? Most people would say the Dictionary, and perhaps in 95% of cases, they’re right. On the other hand, if you only have a few values, you may get better performance because of cache locality with a binary search over a List (or even a linear search), which is usually implemented as an array.
Speaking of arrays, if you have a read-only vector of a known size, use arrays instead of Lists. It’s semantically more correct, simpler, and usually more performant.
Another code review I was doing had usage of ConcurrentDictionary<K, T>. This sounds like a great type to use when you need to modify and/or read a dictionary with multiple threads, but the usage of this type is not that straightforward, and in fact the official documentation is unclear on some things. In this case, it was better to redesign at a higher level to avoid use of this type.
I’ve seen this type of code in reviews a few times:

var list = new List<MyObject>();
for (int i =0;i < source.Count; ++i)
{
    list.Add(source.Get(i));
}

In most apps maybe this doesn’t matter. If you care about performance, then you should care that there are going to be an unknown number of pointless memory allocations, plus multiple copies of the old data into the new arrays. In a high-performance system, this matters. Use the constructor of the collection that takes an initial size. (Guess how many items are in the default List<T> internal item array: None.)

Some questions you can ask yourself:

Is the collection semantically correct? Or am I abusing it because it can do what I want? Do I have restrictions on the use of the collection that are not obeyed by the API of the collection? Is there a more appropriate data structure I can use? If not, should I wrap this collection into an API with suitable restrictions?
Am I using the simplest collection possible?
Is this collection type performant for how it will be used? How can I measure to make sure? Am I making assumptions that may be unfounded because of usage patterns or hardware optimizations? Am I initializing the data structure correctly, to avoid unnecessary memory reallocations?
Is the collection type I’m considering too complex to use effectively? Would it be better to redesign something at a higher level to avoid needing to use this collection type?

Collections are usually the core of any application (if you don’t have data, what you are acting upon?). Getting these right means simpler code, better performance, higher readability, and long-term maintainability. It doesn’t take any more work (usually) – it just takes a few moments to think about what you’re really doing.

Like this tip? Check out my book, C # 4.0 How-To, where you’ll finds hundreds of great tips like this one.

Don’t Log Exception Stacks Unless You Can Afford It

A short, simple tip for this week: Don’t log exception stacks in managed applications unless you understand the performance of your system.

If you have a standard desktop client, or some other app where you can tolerate many-millisecond disruptions or a spike in CPU usage, then you don’t have to care about this.

If you are building a high-performance managed system, then you definitely do have to care about this.

The system I work on needs to handle exceptions coming from 3rd-party components. We can’t let the exceptions kills the process, and we can’t swallow them, ignoring the component failure, so we need to log them. The question is—what information do we log?

For managed exceptions, there are three properties that are most generally useful: type of the exception, Message, and StackTrace.

Getting the type and the Message are nearly free, but accessing StackTrace or calling ToString() on the exception object will cause a bunch of reflection to happen to build up a user-friendly stack trace string. If you can do without, go for it. It may be possible to augment the Message property of an exception to give some clues to the problem. Usually, however, this will not be possible.

Since getting the stack trace for an exception is relatively expensive, especially for a program that shouldn’t miss a beat and needs to continue running, many of the managed applications that I work on have a configuration setting to be able to turn off stack trace logging for exceptions. This enables us to either run it turned on until we see a performance problem, or keep it off until there is a reproable problem, when we can turn it on selectively.

Like this tip? Check out my book, C # 4.0 How-To, where you’ll finds hundreds of great tips like this one.

Concurrency, Performance, Arrays and when Dirty Writes are OK

This article will show how to increase the performance of a rather simple algorithm up to 80%, by showing how we can accept a loss in absolutely accuracy, and also taking advantage of how processors work.

Introduction

When we talk about multiple threads using a shared resource, often the first thing we think of is how to synchronize access to that resource so that it is always in a valid, deterministic state. There are, however, many times when the exact opposite approach is desired. Sometimes it’s appropriate to sacrifice accuracy in favor of performance.

To demonstrate the examples in this article, I developed a small test driver that will put our various test classes through their paces on multiple threads and measure a couple of key factors: elapsed time and error rate. To see the code for this driver, download the accompanying sample project.

Histogram

Recently, I needed to implement something that, reduced to essentials, is a histogram. It needed an array of counts and an integer tracking how many samples had been added. Here’s a simplified version of just the histogram functionality:

class Histogram
{
    protected long sampleCount;
    protected ushort[] sampleValueCounts;

    public virtual long SampleCount { get { return sampleCount; } }
    public ushort[] SampleValueCounts { get { return sampleValueCounts; } }

    public Histogram(int range)
    {
        // + 1 to allow 0 - range, inclusive
        this.sampleValueCounts = new ushort[range + 1];
    }

    public virtual void AddSample(int value)
    {
        ++sampleValueCounts[value];
        ++sampleCount;
    }
}

Pretty simple, right?

Now let’s add the requirement that this needs to be access from multiple threads. How is that going to affect the implementation? How do we ensure that sampleCount and the sum of sampleValueCounts[] stay in sync?

The obvious solution is to add a lock in AddSample:

lock(sampleLock)
{
    ++sampleValueCounts[value];
    ++sampleCount;
}

Let’s add another requirement: performance. Does the locked version of AddSample perform well enough? Maybe—we should absolutely measure and find out. In my case, I knew some additional details in how the system worked and had past experience to think that I may want to reconsider the lock.

I knew that there would be many thousands of instances of the Histogram and that every second, hundreds to thousands of them would need to be updated. This could be very hot code.

AddSample is a very cheap function—two increments! By adding a lock, even one that most likely runs completely in user mode, I may have significantly increased the time it takes to execute this method. We’ll see an actual measurement below, and it’s not quite that bad in this case, but we can do better.

Lock first to Ensure Correctness

You should always try it first with a lock—make it correct first, then optimize. If your performance is fine with the lock, then you don’t need to go any further—just stop and work on something more important.

Before we add a lock, though, let’s see how the basic histogram performs with no locking and multiple threads:

Type: Histogram

Adding 100,000,000 samples on 32 threads

Elapsed: 00:00:10.0959830

SampleCount: 13,641,046 (Error: -86.359%)

Sum: 99,969,354 (Error: -0.031%)

On my 4 core machine, this took about 10 seconds. The sum of the histogram values was very close to the number of values we tried to add—definitely within our tolerance. Unfortunately, SampleCount is way off.

Intuitively, this makes sense. SampleCount is updated on every single sample, so between all the threads, there is ample opportunity to race and obliterate this value. But the array—not so much. Since I’m testing with a random distribution, there is very low likelihood that there will be a conflict in a particular slot.

Let’s see what happens when we add a lock:

class HistogramLocked : Histogram
{
    private object addLock = new object();

    public HistogramLocked(int range) : base(range) { }

    public override void AddSample(int value)
    {
        lock (addLock)
        {
            ++sampleValueCounts[value];
            ++sampleCount;
        }
    }
}

Type: HistogramLocked

Adding 100,000,000 samples on 32 threads

Elapsed: 00:00:08.0839173

SampleCount: 100,000,000 (Error: 0.000%)

Sum: 100,000,000 (Error: 0.000%)

Woah, our time decreased! Actually, on my home machine it decreased. On my work machine (which has faster and more processors) the time did actually increase by about 10% (proving the point above: you know nothing until you measure).

In most cases, we should just stop here. We’ve ensured a correct system for very little cost. You would need to have a good reason to try to optimize this further.

Since we saw that there weren’t that many conflicts on the array, what if we just protect the sampleCount increment, and do that with an Interlocked.Increment() method?

class HistogramInterlocked : Histogram
{
    public HistogramInterlocked(int range) : base(range) { }

    public override void AddSample(int value)
    {
        ++sampleValueCounts[value];
        Interlocked.Increment(ref this.sampleCount);
    }
}

Type: HistogramInterlocked

Adding 100,000,000 samples on 32 threads

Elapsed: 00:00:10.7659328

SampleCount: 100,000,000 (Error: 0.000%)

Sum: 99,957,294 (Error: -0.043%)

We may as well just use a lock and get 100% accuracy in both counts.

Approximation can be Good Enough

Want to eke out even more performance? To do this without a lock, you need to give up one attribute: correctness. If there are enough samples, then missing a some is fine. You just need to do two things:

Ensure your algorithm can work in the face of slightly incorrect data
Minimize the error as cheaply as possible.

Once you develop an algorithm that approximates a result, you should be able to measure the error rate to validate that the performance gain is worth the loss of accuracy.

Now let’s see if we can get an approximate SampleCount that’s “good enough.”

Instead of having all threads increment a single value, they can all increment their own value. We “bucketize” the sampleCount variable. When we need the real total, we can just add them up, and, in effect, transfer some of the CPU cost to a more rare operation.

class HistogramThreadBuckets : Histogram
{
    private long[] valueBuckets;

    public long[] ValueBuckets { get { return this.valueBuckets; } }

    public override long SampleCount
    {
        get
        {
            long sum = 0;
            for (int i = 0; i < this.valueBuckets.Length; ++i)
            {
                sum += this.valueBuckets[i];
            }
            return sum;
        }
    }

    public HistogramThreadBuckets(int range, int valueBuckets) : base(range) 
    {
        this.valueBuckets = new long[valueBuckets];
    }

    public override void AddSample(int value)
    {
        ++sampleValueCounts[value];

        int bucket = Thread.CurrentThread.ManagedThreadId % this.valueBuckets.Length;
        ++valueBuckets[bucket];            
    }
}

I run this algorithm with multiple bucket counts to see the difference:

Type: HistogramThreadBuckets

Adding 100,000,000 samples on 32 threads

Elapsed: 00:00:09.1901657

ValueCount Buckets: 4

SampleCount: 92,941,626 (Error: -7.058%)

Sum: 99,934,033 (Error: -0.066%)

Type: HistogramThreadBuckets

Adding 100,000,000 samples on 32 threads

Elapsed: 00:00:06.9367826

ValueCount Buckets: 8

SampleCount: 96,263,770 (Error: -3.736%)

Sum: 99,938,926 (Error: -0.061%)

Type: HistogramThreadBuckets

Adding 100,000,000 samples on 32 threads

Elapsed: 00:00:06.6882437

ValueCount Buckets: 16

SampleCount: 98,274,025 (Error: -1.726%)

Sum: 99,954,021 (Error: -0.046%)

Type: HistogramThreadBuckets

Adding 100,000,000 samples on 32 threads

Elapsed: 00:00:04.6969067

ValueCount Buckets: 32

SampleCount: 99,934,543 (Error: -0.065%)

Sum: 99,936,526 (Error: -0.063%)

Cool, we just cut our time in half! Something about this should seem weird, though. How can this be significantly faster than the naïve algorithm when neither of them use locks? There’s no contention anyway!

Well, actually, there is. These buckets likely exist in the various caches for multiple processors, and when you modify one, a certain amount of communication goes on between CPUs to resolve conflicts and invalidate the caches. The more variables there are, the less likely this will happen.

Know Your Cache Line Size

Can we do even better?

If we know that all our CPUs are going to need to coordinate among cache lines, can we ensure we have NO conflicts between our threads? If we know the size of a cache line and absolutely guaranteed that each thread went to a different cache line, then we could take complete advantage of the CPU caches.

To get the size of a cache line, use one of many CPU-information tools available on the Internet. I used one freeware utility called CPU-Z. In my case, the cache line is 64-bytes, so I just need to ensure that the buckets holding the values are at least 64 bytes long. Since a long is 8 bytes, I can just pad out a struct to do this (or just multiply the size of the array and use only every nth entry).

The other thing we need to do is ensure there are no collisions between threads on the buckets. In the previous code, I just hashed the ManagedThreadId into the buckets, but this isn’t reliable since we don’t know the thread ids that we’ll get. The solution is just to send in our own “thread id” that we can reliably map to a unique bucket.

class HistogramThreadBucketsCacheLine : Histogram
{
    public struct ValueBucket
    {
        private long _junk0;
        private long _junk1;
        private long _junk2;
        
        public long value;

        private long _junk4;
        private long _junk5;
        private long _junk6;
        private long _junk7;            
    };

    private ValueBucket[] valueBuckets;

    public ValueBucket[] ValueBuckets { get { return this.valueBuckets; } }

    public override long SampleCount
    {
        get
        {
            long sum = 0;
            for (int i = 0; i < this.valueBuckets.Length; ++i)
            {
                sum += this.valueBuckets[i].value;
            }
            return sum;
        }
    }

    public HistogramThreadBucketsCacheLine(int range, int valueBuckets)
        : base(range) 
    {
        this.valueBuckets = new ValueBucket[valueBuckets];
    }

    public override void AddSample(int value, int threadId)
    {
        ++sampleValueCounts[value];

        ++valueBuckets[threadId].value;
    }
}

And the output is:

Type: HistogramThreadBucketsCacheLine

Adding 100,000,000 samples on 32 threads

Elapsed: 00:00:02.0215371

SampleCount: 100,000,000 (Error: 0.000%)

Sum: 99,761,090 (Error: -0.239%)

We more than halved it again! Not too shabby! Smile

Summary

By being willing to accept some loss of accuracy, and taking advantage of how processors work, we were able to reduce the running time of a fairly simple and superficially efficient algorithm by 80%. As with all performance investigations, your mileage may vary, and lots depends on the data your manipulating and the hardware it runs on. Optimizations like these should be quite rare, and only in places you absolutely need them.

Download the sample code project.

Like this tip? Check out my book, C # 4.0 How-To, where you’ll finds hundreds of great tips like this one.

Keep API Usage Simple

One lesson that I’m appreciating more every week is that old acronym, K.I.S.S. – Keep It Simple, Stupid. Simple code is best. The highest cost of most software is maintenance—and you don’t want maintenance to include a lot of “how the heck does this work?”

This simplicity does not apply to just your own code. Most API frameworks provide multiple ways of doing things, but if you don’t need it, don’t use it.

For example, I’ve been working a lot with the Task Parallel Library, and it’s great – it really is an awesome way of doing asynchronous programming. However, when you get to continuations, there are so many options, it can quickly make your code a bewildering mix of usage patterns, if you’re not careful.

You can have continuations called only when the task fails, or is canceled, or only when it succeeds, or only when it does not succeed. You can chain continuations, cancel them, and there are all sorts of scheduling options, and much more.

This API is powerful, and I’m grateful it’s all there, but when you need to have a large team work on a single code base, it’s helpful necessary to enforce some simplifying constraints. One thing we’ve done, specifically with the Task Parallel Library, is to require everyone to use just one of the many possible patterns. This way, no matter what part of the code we’re looking at, we can understand exactly how Tasks work. It takes the guesswork out, and reduces maintenance costs. This kind of thing is especially critical for complex areas of the code – i.e., asynchronous programming.

Takeaway: you don’t have to use every possible API feature or pattern—just pick one and enforce consistency and simplicity.

Activator.CreateInstance: Slow vs. Less-Slow

In the system I’m working on, there is a lot of runtime type resolution, e.g., lines like this:

var obj = Activator.CreateInstance(type, param1, param2, param3)

The params are arguments to the constructor of the type. The type itself is ultimately given through some configuration or otherwise-dynamic means.

Since we do a lot of this (many times per second), we’d like to ensure that instance creation is as fast as it can be. The method given above is not fast.

When this overload of CreateInstance is called, .Net has to do a lot of work to make sure it works: find the right constructor, verify that all the argument types match up, etc. This can be painfully slow, especially when, in a closed system, you can do a lot of offline validation to ensure that types are correctly specified.

Fortunately, Activator has another overload that is quite a bit faster, if you can use it.

Activator.CreateInstance(type)

This will call the default constructor of the type. However, we still have the problem of passing in those arguments. There are a few ways we could do this, but one of the simplest is to ensure that all the types you’re doing this for implement a common initialization interface, as in this sample program:

interface IPerson
{
    void Initialize(string name, DateTime dob, int weight);
}

class Employee : IPerson
{
    public Employee()
    {
    }

    public Employee(string name, DateTime dob, int weight)
    {
        Initialize(name, dob, weight);
    }

    public Employee(string name, DateTime dob)
        : this(name, dob, -1)
    {
    }

    public void Initialize(string name, DateTime dob, int weight)
    {
        this.name = name;
        this.dob = dob;
        this.weight = weight;
    }

    private string name;
    private DateTime dob;
    private int weight;
}

class Program
{
    static void Main(string[] args)
    {
        const int iterations = 1000000;

        Console.WriteLine("Timing Activator.CreateInstance(type, params)...");
        var employees = new IPerson[iterations];

        var stopwatch = Stopwatch.StartNew();

        for (int i = 0; i < iterations; ++i)
        {
            employees[i] = Activator.CreateInstance(typeof(Employee), 
                "Bob", DateTime.Now, 220) as IPerson;
        }

        stopwatch.Stop();

        Console.WriteLine("Total time: {0} for {1:N0} employees", 
            stopwatch.Elapsed.ToString(), iterations);
        Console.WriteLine();

        Console.WriteLine("Timing Activator.CreateInstance(type) + Initialization...");
        
        var employees2 = new IPerson[iterations];

        var stopwatch2 = Stopwatch.StartNew();            

        for (int i = 0; i < iterations; ++i)
        {
            employees2[i] = Activator.CreateInstance(typeof(Employee)) as IPerson;
            employees2[i].Initialize("Bob", DateTime.Now, 220);
        }

        stopwatch2.Stop();

        Console.WriteLine("Total time: {0} for {1:N0} employees", 
            stopwatch2.Elapsed.ToString(), iterations);
    }
}

What do you think the time difference in the two approaches is?

Timing Activator.CreateInstance(type, params)…

Total time: 00:00:05.4765664 for 1,000,000 employees

Timing Activator.CreateInstance(type) + Initialization…

Total time: 00:00:01.6007687 for 1,000,000 employees

About 3.5x faster. Pretty good. In practice, the improvement will vary. In some of our own timing experiments, we actually saw a much better speedup.

Like this tip? Check out my book, C # 4.0 How-To, where you’ll finds hundreds of great tips like this one.

Performance and Other Issues with System.Enum

I’ve been doing a lot of very intense .Net coding at work for the last few months and it’s very gratifying to see that .Net can handle some pretty serious high-load scalable server work. We are doing some pretty amazing things. However, there are all sorts of gotchas—you do have to learn how to do it correctly (but that’s true of any technology).

One of the things we’re most concerned about right now is performance, and where performance is concerned, one of the big gotchas is the .Net Framework Class Library. Don’t get me wrong, for 99.99% of all .Net projects, the performance is perfectly adequate and you don’t have to care (much). On the other hand, if you’re dealing with highly-scalable systems, you should definitely validate the performance of any general-purpose library you use.

As with all performance question, the only 100% solid advice in all situations is: measure, measure, measure.

General Purpose Methods May Not Be Fast

I did a profiler run on one of our components the other day and a few methods showed up that I would never expect to see in a profiler trace:

We were calling these methods a lot. In my mind, they both basically boiled down to logical AND (&) statements. Unfortunately that’s not the case.

If you use an IL decompiler like ILSpy to examine the code of these methods, you’ll see they do quite a bit more. For example, HasFlag also checks whether the types are equal—useful, but that’s a lot more expensive than a simple & check.

IsDefined is even worse. There are half a dozen checks for specific types, followed by an allocation of an array containing all the enum values, and finally a binary search. If your enums are string types, then it’s just a linear search with string comparisons.

It’s easy to understand why these methods are so heavy—they’re being extra cautious for correctness-sake. That’s probably what you want in a general-purpose framework.

However, if your goal is performance and you have a closed system, complete control of your code, and adequate tests, then you can make better decisions.

Rather than do this:

if (bookFlags.HasFlag(BookTypes.Fiction))

Do this instead:

if ((bookFlags & BookTypes.Fiction) != 0)

As for IsDefined, just don’t use it. Or use it in debug-only assertions to validate functionality.

Another Enum method to avoid if you can is ToString(), especially on enumerations that have [Flags].

HasFlag Gotcha

Suppose you have this code:

[Flags]

enum NodeFlags

    Exception = 1,

    Failure = 2,

    Canceled = 4,

    TimedOut = 8,

    AnyFailure = Exception | Canceled | TimedOut

if (node.Flags.HasFlag(NodeFlags.Failure)

    || node.Flags.HasFlag(NodeFlags.TimedOut)

    || node.Flags.HasFlag(NodeFlags.Exception))

You may be tempted to do this:

if (node.Flags.HasFlag(NodeFlags.AnyFailure))

This will not work. It will check that ALL of the bits are set, not just one of them.

You will need to do this:

if ((node.Flags & NodeFlags.AnyFailure) != 0)

We saw that around 5% of our CPU time was being spent just in Enum.IsDefined and Enum.HasFlag. With these changes, we reduce that to approximately 0.

If you find tips like these helpful, you’ll love my book, C# 4.0 How-To.

A WPF Numeric Entry Control

When WPF first shipped, there was a noticeable lack of certain controls we’ve become used to in Win32 and WinForms: Calendar, DateTimePicker, and NumericUpDown. WPF 4 adds Calendar and DatePicker, but not anything for numeric entry.

For my solution I wanted something that behaved very similarly to the WinForms NumericUpdown control.

Some of the specifications:

Allows user to set Value, MaxValue, MinValue, Increment, and LargeIncrement.
Text directly entered is limited to numbers
Pasted text is not intercepted, but when the control has lost focus it will be validated and reset to the previous value if necessary
Two buttons, for increment and decrement
Holding down the buttons with the mouse causes the number to increment continuously
Up and down increment and decrement by Interval
Page Up and Page Down increment and decrement by LargeInterval
This version only supports integers

Creating the control

To begin, create a new WPF project and add a new User Control called NumericEntryControl. This will create a pair of .cs and .xaml files.

In the XAML file, change the <Grid> root element to be a <DockPanel>.

<UserControl

    x:Class="NumericEntryDemo.NumericEntryControl"

    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"

    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"

    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"

    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"

    mc:Ignorable="d"

    xmlns:my="clr-namespace:NumericEntryDemo" Width="200" Height="26">

    <DockPanel>

    </DockPanel>

</UserControl>

Before we add the controls, let’s add some properties to our user control to hold the values the controls will use. These are dependency properties in order to take advantage of all the WPF goodness like data binding and animation. Let’s also add standard .Net property wrappers.

public partial class NumericEntryControl : UserControl

    public static readonly DependencyProperty ValueProperty =

        DependencyProperty.Register("Value",

            typeof(Int32), typeof(NumericEntryControl),

            new PropertyMetadata(0));

    public static readonly DependencyProperty MaxValueProperty =

        DependencyProperty.Register("MaxValue",

            typeof(Int32), typeof(NumericEntryControl),

            new PropertyMetadata(100));

    public static readonly DependencyProperty MinValueProperty =

        DependencyProperty.Register("MinValue",

            typeof(Int32), typeof(NumericEntryControl),

            new PropertyMetadata(0));

    public static readonly DependencyProperty IncrementProperty =

        DependencyProperty.Register("Increment",

            typeof(Int32), typeof(NumericEntryControl),

            new PropertyMetadata(1));

    public static readonly DependencyProperty LargeIncrementProperty =

        DependencyProperty.Register("LargeIncrement",

            typeof(Int32), typeof(NumericEntryControl),

            new PropertyMetadata(5));

    public Int32 Value

get

            return (Int32)GetValue(ValueProperty);

set

            SetValue(ValueProperty, value);

    public Int32 MaxValue

get

            return (Int32)GetValue(MaxValueProperty);

set

            SetValue(MaxValueProperty, value);

    public Int32 MinValue

get

            return (Int32)GetValue(MinValueProperty);

set

            SetValue(MinValueProperty, value);

    public Int32 Increment

get

            return (Int32)GetValue(IncrementProperty);

set

            SetValue(IncrementProperty, value);

    public Int32 LargeIncrement

get

            return (Int32)GetValue(LargeIncrementProperty);

set

            SetValue(LargeIncrementProperty, value);

Creating an incrementing TextBox

Add a TextBox inside the <DockPanel> and bind its text to the value we created in our control:

<DockPanel d:LayoutOverrides="Width">

    <TextBox

        x:Name="_textbox"

        Margin="2,0"

        Text="{Binding Value,

            Mode=TwoWay,

            RelativeSource={RelativeSource FindAncestor,

                AncestorLevel=1,

                AncestorType={x:Type my:NumericEntryControl}},

            UpdateSourceTrigger=PropertyChanged}"

        HorizontalAlignment="Stretch"

        HorizontalContentAlignment="Right"

        VerticalContentAlignment="Center" />

</DockPanel>

This will create a TextBox that sizes itself with its parent (a feature I wanted, but is not strictly necessary) and its text will be bound to the Value in our UserControl.

Handling text input

It used to be that you pointed with a mouse and entered text with a keyboard. However, it is common now to enter text with a stylus, gestures, or some future method not invented yet. Thankfully, WPF supports generic text input handling so you don’t have to concern yourself with the specific hardware.

public NumericEntryControl()

   InitializeComponent();

   _textbox.PreviewTextInput +=

        new TextCompositionEventHandler(_textbox_PreviewTextInput);

void _textbox_PreviewTextInput(object sender,

                   TextCompositionEventArgs e)

    if (!IsNumericInput(e.Text))

        e.Handled = true;

        return;

private bool IsNumericInput(string text)

    foreach (char c in text)

        if (!char.IsDigit(c))

            return false;

    return true;

This prevents anything except numbers from being entered, whether via character recognition or keyboard. It does not, however, prevent the user from pasting non-numeric text into the box. We’ll handle that later.

Validating Text input

It’s problematic to validate and correct user input as they are entering it. For example, if you set the MaxValue to 100, then every time you enter 1000, it jumps to 100, it can be jarring. It’s a similar situation with text pasted into the control. What the NumericUpDown control does is handle these sort of situations when the control loses focus.

To prepare for this, when the control gains focus, we need to save the last valid value so we have something to restore to.

When the control loses focus, we need to first verify that it is a number and if so, clip it to the bounds of our MinValue and MaxValue. If anything fails, set it back to the previous value.

public partial class NumericEntryControl : UserControl

    private int _previousValue = 0;

    public NumericEntryControl()

        InitializeComponent();

        _textbox.PreviewTextInput +=

            new TextCompositionEventHandler(

                _textbox_PreviewTextInput);

        _textbox.GotFocus +=

            new RoutedEventHandler(_textbox_GotFocus);

        _textbox.LostFocus +=

            new RoutedEventHandler(_textbox_LostFocus);

    void _textbox_GotFocus(object sender, RoutedEventArgs e)

        _previousValue = Value;

    void _textbox_LostFocus(object sender, RoutedEventArgs e)

        int newValue = 0;

        if (Int32.TryParse(_textbox.Text, out newValue))

            if (newValue > MaxValue)

                newValue = MaxValue;

            else if (newValue < MinValue)

                newValue = MinValue;

        else

            newValue = _previousValue;

        _textbox.Text = newValue.ToString();

Handle arrow keys

Just because WPF can handle text input from a variety of sources in a hardware-agnostic way doesn’t mean we should ignore the particular strengths of the keyboard. Specifically, we should handle the up and down arrows.

public NumericEntryControl()

...

    _textbox.PreviewKeyDown +=

        new KeyEventHandler(_textbox_PreviewKeyDown);

void _textbox_PreviewKeyDown(object sender, KeyEventArgs e)

    switch (e.Key)

        case Key.Up:

            IncrementValue();

            break;

        case Key.Down:

            DecrementValue();

            break;

        case Key.PageUp:

            Value = Math.Min(Value + LargeIncrement, MaxValue);

            break;

        case Key.PageDown:

            Value = Math.Max(Value - LargeIncrement, MinValue);

            break;

        default:

            //do nothing

            break;

private void IncrementValue()

    Value = Math.Min(Value + Increment, MaxValue);

private void DecrementValue()

    Value = Math.Max(Value - Increment, MinValue);

IncrementValue() and DecrementValue() are pulled out as their own method because they’re used in the button-handling code as well (see below).

The code so far is a perfectly usable textbox that accepts only numbers and can be incremented using the keyboard. Typically, however, we also need to support the mouse, and for that we need buttons (unless you want to do something exotic like programs like Photoshop and Lightroom do, where text boxes have support for incrementing gestures—that’s another article).

A button you can hold down

Adding buttons to increment once per click is pretty easy, but we really want to be able to hold down the buttons and have the TextBox increment. Let’s start by adding the XAML for the buttons:

<UserControl

    x:Class="NumericEntryDemo.NumericEntryControl"

    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"

    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"

    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"

    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"

    mc:Ignorable="d"

    xmlns:my="clr-namespace:NumericEntryDemo"

    Width="200" Height="26"

    <DockPanel d:LayoutOverrides="Width">

        <Button x:Name="buttonDecrement"

                DockPanel.Dock="Left"

                Content="-"

                Width="{Binding ActualHeight,

                    ElementName=buttonDecrement, Mode=Default}"

                Height="{Binding ActualHeight,

                    ElementName=_textbox, Mode=Default}"/>

        <Button x:Name="buttonIncrement"

                DockPanel.Dock="Right"

                Content="+"

                Width="{Binding ActualHeight,

                    ElementName=buttonDecrement, Mode=Default}"

                Height="{Binding ActualHeight,

                    ElementName=_textbox, Mode=Default}"/>

        <TextBox

            x:Name="_textbox"

            Margin="2,0"

            Text="{Binding Value,

                Mode=TwoWay,

                RelativeSource={RelativeSource FindAncestor,

                    AncestorLevel=1,

                    AncestorType={x:Type my:NumericEntryControl}},

                UpdateSourceTrigger=PropertyChanged}"

            HorizontalAlignment="Stretch"

            HorizontalContentAlignment="Right"

            VerticalContentAlignment="Center" />

    </DockPanel>

</UserControl>

Note that the button Width property is bound to its own ActualHeight property, and the Height property is bound to the TextBox’s ActualHeight. This has the effect of keeping the buttons square, the same height as the TextBox. It’s an effect I wanted, but you can easily dispose of it. With these buttons, our control finally takes shape:

How fast should we increment?

Before writing the code to do the incrementing as we hold the button down, it’s worth considering how fast the incrementing should occur. Ideally, we would want it to increment at about the same rate as if we were holding down the up key on the keyboard. The keyboard repeat rate is an operating system value that we can retrieve.

There are actually two values:

private static int _delayRate =

    System.Windows.SystemParameters.KeyboardDelay;

private static int _repeatSpeed =

    Math.Max(1, System.Windows.SystemParameters.KeyboardSpeed);

The delay rate is how long we should wait before starting the repetition, and is given in multiples of 250ms. Roughly speaking, humans can determine and control actions with lengths of time of about 200ms, so 250ms is a good value to start with. Any shorter and the repetition might start when it was not intended (say, if they just click the button instead of holding it down).

The keyboard speed is the number of times per second we should repeat—sort of. The value can be 0, so because of the way I use it below I want to sure it’s at least 1.

To allow us to hold the button down, we need to override the default mouse handling of a button which is to disable the standard LeftMouseButtonDown/Up messages. Instead, we need to handle the PreviewMouseLeftButtonDown message and its corresponding Up message.

When we handle the down message, we need to set a timer for the keyboard delay value. When we handle the timer’s tick, we need to increment (or decrement) and change the timer’s interval to the repeat speed. This repeat speed is calculated merely by dividing 1000ms (1s) by the rate per second. There may be better ways, but this gets pretty close to the rate experienced by the keyboard on my computer. Finally, when the mouse button is released we need to stop the timer. We also do a final increment, which will cover the case where the user clicks instead of holds.

We also need to capture the mouse in case the user moves it off the button—otherwise the timer will just keep incrementing forever.

Here’s the code:

private DispatcherTimer _timer =

    new DispatcherTimer();

private static int _delayRate =

    System.Windows.SystemParameters.KeyboardDelay;

private static int _repeatSpeed =

    Math.Max(1, System.Windows.SystemParameters.KeyboardSpeed);

private bool _isIncrementing = false;

public NumericEntryControl()

...

    buttonIncrement.PreviewMouseLeftButtonDown +=

        new MouseButtonEventHandler(

            buttonIncrement_PreviewMouseLeftButtonDown);

    buttonIncrement.PreviewMouseLeftButtonUp +=

        new MouseButtonEventHandler(

            buttonIncrement_PreviewMouseLeftButtonUp);

    buttonDecrement.PreviewMouseLeftButtonDown +=

        new MouseButtonEventHandler(

            buttonDecrement_PreviewMouseLeftButtonDown);

    buttonDecrement.PreviewMouseLeftButtonUp +=

        new MouseButtonEventHandler(

            buttonDecrement_PreviewMouseLeftButtonUp);

    _timer.Tick += new EventHandler(_timer_Tick);

void buttonIncrement_PreviewMouseLeftButtonDown(

    object sender, MouseButtonEventArgs e)

    buttonIncrement.CaptureMouse();

    _timer.Interval =

        TimeSpan.FromMilliseconds(_delayRate * 250);

    _timer.Start();

    _isIncrementing = true;

void buttonIncrement_PreviewMouseLeftButtonUp(

    object sender, MouseButtonEventArgs e)

    _timer.Stop();

    buttonIncrement.ReleaseMouseCapture();

    IncrementValue();

void buttonDecrement_PreviewMouseLeftButtonDown(

    object sender, MouseButtonEventArgs e)

    buttonDecrement.CaptureMouse();

    _timer.Interval =

        TimeSpan.FromMilliseconds(_delayRate * 250);

    _timer.Start();

    _isIncrementing = false;

void buttonDecrement_PreviewMouseLeftButtonUp(

    object sender, MouseButtonEventArgs e)

    _timer.Stop();

    buttonDecrement.ReleaseMouseCapture();

    DecrementValue();

void _timer_Tick(object sender, EventArgs e)

    if (_isIncrementing)

        IncrementValue();

    else

        DecrementValue();

    _timer.Interval =

        TimeSpan.FromMilliseconds(1000.0 / _repeatSpeed);

And voila! A NumericEntryControl that’s basic and easy-to-use for both keyboard and mouse.

Further improvements

This isn’t the last word in numeric entry controls, by any means. There are many ways to accomplish it, and this is one that worked well for me. There are a number of further enhancements you could do (and perhaps should do):

More validation
Ensure that MaxValue >= MinValue
Set focus to the TextBox when the control gains focus (maybe)
Define strokes to increment and decrement with a stylus
The user can change the keyboard repeat rate through Control Panel. This program could be modified to listen for updates to this value.

There’s also another, really cool feature that I will add to this in a future post.

Download full source.

Found this article helpful? Want a resource of hundreds of similar, how-to tips? I’ve written a book that covers topics like this and more in C# 4. Check out C# 4.0 How-To!

How to learn WPF (or anything else)

I’ve recently been learning WPF. This is a huge topic that is uncontainable by any single book, tutorial, or web-site. The complexity and breadth of this framework is nearly oppressive, but the results are incredible. Or rather, I should say, potentially incredible.

Like this? Please check out my latest book, Writing High-Performance .NET Code.

From everything I’ve read, people who have suffered through the WPF learning curve have this to say, more or less:

yeah, it was really tough going for a few months. But now I can create awesome apps in a fraction of the time it would take with older technologies.

So with that in mind, I really do want to learn WPF. I have a number of C# references, weighty tomes that bend my shelves, but the main book I use is Programming WPFby Chris Sells and Ian Griffiths. I really like this book—it goes in deep. However, I realized that reading through it cover to cover and doing all the sample apps wasn’t going to work—it gets boring, no matter how good the book is. So here is my recommendation on how to learn WPF (and it probably applies to any programming technology):

Start reading the book, do the code, type stuff in, copy it, tweak it. Do this for as long as you can.
Once step 1 becomes boring, STOP. It is not productive to force yourself through the whole thing like this.
Find a sample project in your target technology. I used Family.Show. There are plenty out there.
Think of a project YOU find interesting that would be good in [WPF|other]. Start doing this. Even if you don’t know where to start at all.
While getting started, every step will be a challenge. Figure it out step-by-step, going back to the book and online resources.

You might be tempted to skip steps 1-2. I think this is a bad idea. You need at least some foundational understanding. Only when you can’t take it any more and you’re in danger of quitting, move on.

This has worked well for me in learning WPF. I decided to implement a game (if it ever gets into a polished state, I’ll share it).

Don’t underestimate the challenges in step 4, though. I had to think about how to even start, going back to the book numerous times, reading large sections. I looked up articles online about patterns and WPF, user controls, and more. Many seemingly-small steps in just displaying windows took hours to figure out. Figuring out data binding (really figuring it out in the context of my app) took hours. The point of doing your own project isn’t because it’s easier than following the book—it’s because it’s fun and you have more motivation to learn.

Philosophical Geek

Code and musings by Ben Watson

Category Archives: Tips

4 Essential Tips for High-Performance Garbage Collection on Servers

Tip 1: Use Server GC

Tip 2: Objects Live Briefly or Forever

Tip 3: All Long-Lived Objects Must Be Pooled

Tip 4: All Large Objects Must Be Pooled

Next Time…

Alternative to Double-Checked Locking: Lazy<T>

Don’t Log Exception Stacks Unless You Can Afford It

Concurrency, Performance, Arrays and when Dirty Writes are OK

Introduction

Histogram

Lock first to Ensure Correctness

Approximation can be Good Enough

Know Your Cache Line Size

Summary

Keep API Usage Simple

Activator.CreateInstance: Slow vs. Less-Slow

Performance and Other Issues with System.Enum

General Purpose Methods May Not Be Fast

HasFlag Gotcha

A WPF Numeric Entry Control

Creating the control

Creating an incrementing TextBox

Handling text input

Validating Text input

Handle arrow keys

A button you can hold down

How fast should we increment?

Further improvements

How to learn WPF (or anything else)