Tag Archives: bug

Get Your Thread Synchronization Right the First Time

I was recently debugging a problem that just didn’t make any sense at first.

The code looked like this:

class App
{
    public bool IsRunning {get; set;}
    private Thread houseKeepingThread;
    
    public void Start()
    {
        this.IsRunning = true;
        this.houseKeepingThread = new Thread(ThreadFunc);
        this.houseKeepingThread.Start();
    }
    
    private void ThreadFunc()
    {
        while (this.IsRunning)
        {
            DoWork();
            // wait for 30 seconds
        }
    }
};

The actual code was of course a bit more complicated, but this demonstrates the essence of the problem.

The outward manifestation of the bug was that there was evidence that DoWork wasn’t being called over time as it should have.

To debug this, I first concentrated on reasons that the thread could end early, and none of them made any sense.

To finally figure it out, I attached a debugger to a process known to be in a suspect state and discovered evidence in the state of the App object that DoWork had never run, not even once.

I stared at the code for five seconds and then said out loud: “IsRunning isn’t volatile!”

Despite setting IsRunning to true before starting the thread, it is completely possible that the thread starts running before the memory is coherent across all of the threads.

What’s amazing about this bug is that it existed from the very beginning of this application. As in checkin #1. It has probably been causing problems for a while at some low level, but it recently must have gotten worse—it could be for any reason, some change in the pattern of execution, the load on the system, who knows.

The fix was quite easy:

Make a volatile backing field for the IsRunning property.

By using volatile, there will be a memory barrier that will enforce coherency across all threads that access this variable. Problem solved.

The lesson here isn’t about any particular debugging skills. The real lesson is to make sure you get this right in the first place, because debugging it takes far longer and is much trickier than just reasoning about it in the first place. These types of problems can stay hidden and extremely rare for a very long time until some other innocuous change causes them to start manifesting in strange, seemingly impossible ways.

If you want to learn more about memory barriers and when they’re required, you can check out Chapter 4 of Writing High-Performance .NET Code, and there is also this tutorial by Joseph Albahari.