Digging Into .NET Loop Performance, Bounds-checking, Iteration, and Unrolling

Introduction

In this article I am going to go into detail about how .NET treats loops involving array and collection access and what kinds of optimizations you can expect. I mentioned some of these in a previous article, but here we're going to go deep and see many more examples. We're going to dig into the IL that gets generated as well as the x64 assembly code that actually executes.

Will these optimizations matter to your program? Only if your program is CPU bound and collection iteration is a core part of your processing. As you'll see, there are ways you can hurt your performance if you're not careful, but that only matters if iteration was a significant part of your program to begin with. My philosophy is this: most people never need to know this, but if you do, then understanding every layer of the system is important so that you can make intelligent choices. That's the reason I wrote my book, Writing High-Performance .NET Code, and why I write these articles–to give you the information about what's actually going on when you compile and execute a .NET program.

Some of this information you can read elsewhere, but I want to show you how to get at it yourself so that you don't have to rely on someone publicly coming out with this information, especially as new versions of the JIT are released.

Tools I used: ILSpy to view IL, and Visual Studio 2013 with .NET 4.5 to debug and view assembly. I compiled the code in Release mode, targeting x64 explicitly. When debugging, don't start the program under the Visual Studio debugger–that will not give you an accurate picture of your code, because the JIT generates less-optimized code when a program is launched with a debugger attached. Instead, start the program separately and attach the debugger. I accomplish this easily by putting a call to Debugger.Launch() at the beginning of Main(), and then executing the program with Ctrl-F5.
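
Here is a minimal sketch of that setup (the program structure is just for illustration):

using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        // Prompts to attach a debugger when the program is started
        // without one (e.g., via Ctrl-F5).
        Debugger.Launch();

        // ...rest of the program
    }
}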

A note on assembly: I chose x64 because it should be the default for development these days. For the register names mentioned below, understand that names like eax and rax refer to the same register–rax is the full 64-bit register and eax refers to its lower 32 bits, but they both refer to the same physical location. Same with edx/rdx, ecx/rcx, etc.

Bounds Checks

Bounds checks exist in .NET because the CLR must ensure memory safety at all times in managed code. It simply needs to be impossible for managed code to corrupt memory outside of the defined data structures. This means that code like the following cannot be allowed to execute:

int[] array = new int[100];
array[100] = 1;

Attempting to execute this should result in an exception rather than allow the memory corruption to occur. That means that the CLR has to inject bounds checking into array access. This can make repeated array access twice as slow if some optimizations aren’t done. This knowledge makes some people leery of using .NET for high-performance code, but we’ll see that the JIT compiler does indeed make these optimizations in some cases (and falls short in others). However, remember what I said in the introduction: these kinds of optimizations only matter if you’re CPU bound and this code is in your innermost, most called-upon code. That said, even if you don’t have to worry about performance on this level, knowing how to investigate these types of details is instructive.
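
Conceptually, every array access is guarded as if the JIT had injected code like the following (a sketch in C#; the real check is emitted directly as machine code, and the single unsigned comparison handles negative indices as well):

// Pseudo-version of the injected check for array[i]:
if ((uint)i >= (uint)array.Length)
{
    throw new IndexOutOfRangeException();
}
int value = array[i]; // Now known to be safe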

Baseline Loop

To start, let’s take a look at the most basic of loops:

[MethodImpl(MethodImplOptions.NoInlining)]
private static int GetSum(int[] array)
{
    int sum = 0;
    for (int i = 0; i < array.Length; i++)
    {
        sum += array[i];
    }
    return sum;
}

It simply loops through the entire array and sums all the elements. I’m explicitly disabling inlining because that can muddy the analysis a bit, and we want to focus strictly on loop optimization, not other improvements that the JIT compiler can and will do here.

In IL, this method looks like this (with my own annotations):

.method private hidebysig static 
	int32 GetSum (
		int32[] 'array'
	) cil managed noinlining
{
	// Method begins at RVA 0x20a4
	// Code size 24 (0x18)
	.maxstack 3
	.locals init (
		[0] int32 sum,
		[1] int32 i
	)

	IL_0000: ldc.i4.0    // Push 0 onto evaluation stack as a 4-byte integer
	IL_0001: stloc.0     // Pop from evaluation stack and store into local variable at location 0, sum.
	IL_0002: ldc.i4.0    // Push 0 to evaluation stack
	IL_0003: stloc.1     // Pop and store in local variable i.
	IL_0004: br.s IL_0010     //  Jump to IL_0010 (unconditionally)
	// loop start (head: IL_0010)
		IL_0006: ldloc.0    // Push sum onto stack
		IL_0007: ldarg.0    // Push array onto stack
		IL_0008: ldloc.1    // Push i onto stack
		IL_0009: ldelem.i4  // Pop array and i off the stack and push array[i] onto the stack
		IL_000a: add        // Pop array[i] and sum off the stack, add, and push result to stack
		IL_000b: stloc.0    // Pop result off stack and store in sum
		IL_000c: ldloc.1    // Push i onto stack
		IL_000d: ldc.i4.1   // Push 1 onto stack
		IL_000e: add        // Pop i and 1 off of stack, add, and push result to stack
		IL_000f: stloc.1    // Pop result off stack and store in i

		IL_0010: ldloc.1    // Push i onto stack
		IL_0011: ldarg.0    // Push array onto stack
		IL_0012: ldlen      // Pop array off of stack, get length, and push length onto stack
		IL_0013: conv.i4    // Convert length to 4-byte integer
		IL_0014: blt.s IL_0006    // Pop i and length off of stack. If i < length, jump to IL_0006
	// end loop

	IL_0016: ldloc.0    // Push sum onto stack
	IL_0017: ret        // Return
} // end of method Program::GetSum

That looks like an awful lot of work to execute a loop. IL is a simple, stack-based language that ends up being quite verbose, but this gets translated by the JIT compiler into real machine code that takes advantage of efficient CPU instructions and registers. The resulting assembly is this:

00007FFADFCC02D0  mov         rdx,rcx                 // Copy array address to rdx
00007FFADFCC02D3  xor         ecx,ecx                 // Set ecx (sum) to 0
00007FFADFCC02D5  mov         r10,qword ptr [rdx+8]   // Copy array length to r10
00007FFADFCC02D9  test        r10d,r10d               // See if length is 0
00007FFADFCC02DC  jle         00007FFADFCC0303        // If so, skip the loop
00007FFADFCC02DE  mov         r8d,ecx                 // Set r8d (i) to 0 (ecx happens to hold 0)
00007FFADFCC02E1  xor         r9d,r9d                 // Set r9 to 0 (memory offset into array)
00007FFADFCC02E4  nop         word ptr [rax+rax]      // Multi-byte NOP (pads to align the loop start)
00007FFADFCC02F0  mov         eax,dword ptr [rdx+r9+10h]  // Copy array[i] to eax
00007FFADFCC02F5  add         ecx,eax                 // Add value to sum
00007FFADFCC02F7  inc         r8d                     // i++
00007FFADFCC02FA  add         r9,4                    // Increment memory address by 4 bytes
00007FFADFCC02FE  cmp         r8d,r10d                // i < length ?
00007FFADFCC0301  jl          00007FFADFCC02F0        // If so, loop back a few statements for another iteration
00007FFADFCC0303  mov         eax,ecx                 // Copy sum to return register
00007FFADFCC0305  ret                                 // Return

There are a few things to notice here:

  1. The code is tracking both the index value i as well as the direct memory address of the array values, which is far more efficient than doing a multiplication and offset on every loop iteration.
  2. There are no array bounds checks! This is sometimes a worry for people new to .NET–how can we maintain memory safety without bounds checks? In this case, the JIT compiler determined that the bounds checks weren't necessary because it knew all of the constraints of this loop.

Bringing Back the Bounds Check

Now let’s introduce some variability. If, instead, we pass in boundaries to iterate between, does this change the situation?

[MethodImpl(MethodImplOptions.NoInlining)]
private static int GetSum(int[] array, int start, int end)
{
    int sum = 0;
    for (int i = start; i < end; i++)
    {
        sum += array[i];
    }
    return sum;
}
.
.
.
// Called with:
GetSum(array, 25, 75);

The IL differences aren’t very interesting, so I won’t show them, but the machine instructions definitely change:

00007FFADFCC0320  sub         rsp,28h               // Adjust stack pointer
00007FFADFCC0324  mov         r9,rcx                // Copy array address to r9
00007FFADFCC0327  xor         ecx,ecx               // Set ecx (sum) to 0
00007FFADFCC0329  cmp         edx,r8d               // edx has i in it, initialized with start (25). Compare that to r8, which has the end value (75)
00007FFADFCC032C  jge         00007FFADFCC0348      // If start >= end, skip loop.
00007FFADFCC032E  mov         r10,qword ptr [r9+8]  // Copy array length to r10
00007FFADFCC0332  movsxd      rax,edx               // Copy i to rax and sign extend
00007FFADFCC0335  cmp         rax,r10               // Compare i to array length
00007FFADFCC0338  jae         00007FFADFCC0350      // If i >= length, jump to this address (below)
00007FFADFCC033A  mov         eax,dword ptr [r9+rax*4+10h]  // Copy array[i] to eax
00007FFADFCC033F  add         ecx,eax               // sum += array[i]
00007FFADFCC0341  inc         edx                   // i++
00007FFADFCC0343  cmp         edx,r8d               // Compare i to end value
00007FFADFCC0346  jl          00007FFADFCC0332      // If i < end, jump to beginning of loop (above)
00007FFADFCC0348  mov         eax,ecx               // Copy sum to return register
00007FFADFCC034A  add         rsp,28h               // Fix up stack pointer
00007FFADFCC034E  ret                               // Return

00007FFADFCC0350  call        00007FFB3F7E6590      // Throw an exception!

The most significant difference is that now on every single iteration of the loop, it’s comparing the array index to the length of the array, and if it goes over we’ll get an exception. With this change, we’ve doubled the running time of the loop. The bounds check is there.
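
If you want to measure the difference yourself, a minimal harness like this works (a sketch; the iteration count is arbitrary and absolute numbers will vary by machine and JIT version):

// Inside Main(), for example (requires using System.Diagnostics):
int[] array = new int[100];
const int Iterations = 10000000;

var sw = Stopwatch.StartNew();
for (int n = 0; n < Iterations; n++)
{
    GetSum(array);                   // Baseline: no bounds check
}
Console.WriteLine("Baseline: {0}", sw.Elapsed);

sw.Restart();
for (int n = 0; n < Iterations; n++)
{
    GetSum(array, 0, array.Length);  // Bounded version: per-access check
}
Console.WriteLine("Bounded:  {0}", sw.Elapsed);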

So why does it do this now? Is it because we’re iterating over a subarray? That’s easy to test. You can modify the original function at the top of this article to iterate from 25 to 75 with constant bounds, and you’ll see that the compiler does not insert a bounds check.

No, the reason for the bounds check in this case is that the compiler can’t prove that the input values are 100% safe.

Ok, then. What about checking the start and end variables before the loop? I suspect that quickly devolves into a general code-understanding problem. As soon as the constraints are not constant, you have to deal with variable aliasing, side effects, and tracking the usage of each value–and that’s not really practical for a JIT compiler. Sure, in my simple method, maybe it would be easy, but the general case might be extremely difficult if not impossible.
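
For example, as far as I can tell, even hoisting an explicit range check above the loop does not convince the JIT to drop the per-iteration check (a sketch; you can verify this yourself with the debugging technique described above):

[MethodImpl(MethodImplOptions.NoInlining)]
private static int GetSumChecked(int[] array, int start, int end)
{
    // Explicit validation up front...
    if (start < 0 || end > array.Length)
    {
        throw new ArgumentOutOfRangeException();
    }
    int sum = 0;
    for (int i = start; i < end; i++)
    {
        sum += array[i]; // ...but the bounds check remains here
    }
    return sum;
}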

Other Ways of Getting Bounds Checks

Even simple modifications to the indexing variable that you can logically prove cannot exceed the array's bounds may cause the bounds check to appear:

[MethodImpl(MethodImplOptions.NoInlining)]
private static void Manipulate(int[] array)
{
    for (int i = 0; i < array.Length; i++)
    {
        array[i] = array[i / 2];
    }
}

There is no way i / 2 can reach or exceed array.Length, but the bounds check is still there:

00007FFADFCE03B0  sub         rsp,28h              // Adjust stack pointer  
00007FFADFCE03B4  mov         r8,qword ptr [rcx+8] // Copy array length to r8
00007FFADFCE03B8  test        r8d,r8d              // Test if length is 0
00007FFADFCE03BB  jle         00007FFADFCE03E8     // If 0, skip loop
00007FFADFCE03BD  xor         r9d,r9d              // Set i to 0
00007FFADFCE03C0  xor         r10d,r10d            // Set destination index to 0
00007FFADFCE03C3  mov         eax,r9d              // Set eax (source array index) to 0
00007FFADFCE03C6  cdq                              // sign extends eax into edx (needed for division)
00007FFADFCE03C7  sub         eax,edx              // eax = eax - edx (adjusts negative values so the shift rounds toward zero)
00007FFADFCE03C9  sar         eax,1                // Shift right (divide by two)
00007FFADFCE03CB  movsxd      rax,eax              // copy 32-bit eax into 64-bit rax and sign extend
00007FFADFCE03CE  cmp         rax,r8               // Compare source index against array length
00007FFADFCE03D1  jae         00007FFADFCE03F0     // If greater than or equal, jump and throw exception
00007FFADFCE03D3  mov         eax,dword ptr [rcx+rax*4+10h]  // Copy value from array[i/2] to eax
00007FFADFCE03D7  mov         dword ptr [rcx+r10+10h],eax    // Copy value from eax to array[i]
            
00007FFADFCE03DC  inc         r9d                  // Increment i
00007FFADFCE03DF  add         r10,4                // Increment destination address by 4 bytes
00007FFADFCE03E3  cmp         r9d,r8d              // Compare i against array length
00007FFADFCE03E6  jl          00007FFADFCE03C3     // If less, loop again
00007FFADFCE03E8  add         rsp,28h              // Adjust stack
00007FFADFCE03EC  ret                              // return

00007FFADFCE03F0  call        00007FFB3F7E6590     // Throw exception

So even though we can prove that it’s safe, the compiler won’t even try.

There’s another way that’s a bit more annoying. Remember the very first baseline example, the one that did NOT have a bounds check? What if we had that exact code, but it operated on a static array rather than an argument?
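
The code for that might look something like this (a sketch; the field name is mine):

private static int[] s_array = new int[100];

[MethodImpl(MethodImplOptions.NoInlining)]
private static int GetSumStatic()
{
    int sum = 0;
    for (int i = 0; i < s_array.Length; i++)
    {
        sum += s_array[i];
    }
    return sum;
}

Here is the loop portion of the resulting assembly: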

00007FFADFCE0500  mov         rax,0A42A235780h  
00007FFADFCE050A  mov         rax,qword ptr [rax]  
00007FFADFCE050D  movsxd      r9,ecx  
00007FFADFCE0510  mov         r8,qword ptr [rax+8]  
00007FFADFCE0514  cmp         r9,r8                 // bounds check!
00007FFADFCE0517  jae         00007FFADFCE0530  
00007FFADFCE0519  mov         eax,dword ptr [rax+r9*4+10h]  
00007FFADFCE051E  add         edx,eax  
00007FFADFCE0520  inc         ecx  
00007FFADFCE0522  cmp         ecx,r8d  
00007FFADFCE0525  jl          00007FFADFCE0500  
00007FFADFCE0527  mov         eax,edx  
00007FFADFCE0529  add         rsp,28h  
00007FFADFCE052D  ret  
                            
00007FFADFCE0530  call        00007FFB3F7E6590  

The bounds check is there.

Ok, so what if we’re clever and we pass the static array to our baseline method, which doesn’t know it’s static? There is no bounds check in this case–it’s exactly the same as the baseline case.

Another case where the JIT fails to optimize when it could is iterating through an array backwards:

[MethodImpl(MethodImplOptions.NoInlining)]
private static int GetBackwardsSum(int[] array)
{
    int sum = 0;
    for (int i = array.Length - 1; i >= 0; i--)
    {
        sum += array[i];
    }
    return sum;
}

Sad, but true–as much as this looks like the baseline example, it does not have the optimization.

Finally, consider the case of repeated access:

for (int i = 2; i < array.Length - 1; i++)
{
    int x = array[i-2];
    int y = array[i-2];
}

In this case, the compiler will emit a single bounds check that suffices for both accesses.

Using fixed to Eliminate Bounds Checks

Another way to eliminate bounds checks is to use the fixed keyword. I don’t really recommend it, because it has safety and security implications that you should weigh carefully. However, if you really have a need for optimal array access with some nontrivial access pattern that would otherwise trigger a bounds check, this may work for you. In order to use fixed, you must enable unsafe code for your project.

Here are two options for using fixed to iterate through an array:

[MethodImpl(MethodImplOptions.NoInlining)]
unsafe private static int GetBackwardsSumFixedSubscript(int[] array)
{
    int sum = 0;
    fixed (int* p = array)
    {
        for (int i = array.Length - 1; i >= 0; i--)
        {
            sum += p[i];
        }
    }
    return sum;
}

[MethodImpl(MethodImplOptions.NoInlining)]
unsafe private static int GetBackwardsSumFixedPointer(int[] array)
{
    int sum = 0;
    fixed (int* p = &array[array.Length - 1])
    {
        int* v = p;
        for (int i = array.Length - 1; i >= 0; i--)
        {
            sum += *v;
            v--;
        }
    }
    return sum;
}

Either one will work, but keep in mind that unsafe really means unsafe! You can trample your memory with this. I recommend keeping all unsafe code segregated to its own module, class, or otherwise limited scope, and giving it extra scrutiny during code reviews. You may also have to deal with organizational security issues around deploying unsafe code. If you can eliminate the bounds checking with pure managed code, strongly prefer that approach instead.

Foreach vs. For

Now, let’s shift our focus up one level of abstraction. In .NET, you can iterate over most collections with the foreach keyword. foreach is in most cases syntactic sugar for calling methods like GetEnumerator(), MoveNext(), and so forth. In this respect, it would be fair to assume that foreach has worse performance overall than a simple for or while loop.

However, that’s not quite the case. In some instances, the compiler will recognize iteration over an array for the simple for loop it is, regardless of the syntax used.
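
As a reminder, for a general collection, foreach expands into roughly this pattern (a sketch, assuming collection is an IEnumerable<int>; the actual compiler output differs in the details):

int sum = 0;
using (IEnumerator<int> e = collection.GetEnumerator())
{
    while (e.MoveNext())
    {
        int val = e.Current;   // The loop variable
        sum += val;            // The loop body
    }
}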

Foreach on Arrays

If we take our baseline loop from above and convert it to foreach, like this:

private static int GetSumForeach(int[] array)
{
    int sum = 0;
    foreach(var val in array)
    {
        sum += val;
    }
    return sum;
}

What is the generated IL in this case?

.method private hidebysig static 
	int32 GetSumForeach (
		int32[] 'array'
	) cil managed 
{
	// Method begins at RVA 0x2098
	// Code size 28 (0x1c)
	.maxstack 2
	.locals init (
		[0] int32 sum,
		[1] int32 val,
		[2] int32[] CS$6$0000,
		[3] int32 CS$7$0001
	)

	IL_0000: ldc.i4.0
	IL_0001: stloc.0
	IL_0002: ldarg.0
	IL_0003: stloc.2
	IL_0004: ldc.i4.0
	IL_0005: stloc.3
	IL_0006: br.s IL_0014
	// loop start (head: IL_0014)
		IL_0008: ldloc.2
		IL_0009: ldloc.3
		IL_000a: ldelem.i4
		IL_000b: stloc.1
		IL_000c: ldloc.0
		IL_000d: ldloc.1
		IL_000e: add
		IL_000f: stloc.0
		IL_0010: ldloc.3
		IL_0011: ldc.i4.1
		IL_0012: add
		IL_0013: stloc.3

		IL_0014: ldloc.3
		IL_0015: ldloc.2
		IL_0016: ldlen
		IL_0017: conv.i4
		IL_0018: blt.s IL_0008
	// end loop

	IL_001a: ldloc.0
	IL_001b: ret
} // end of method Program::GetSumForeach

Compare that to the IL from the baseline loop. The only difference is the presence of some additional compiler-generated local variables to store the array and the index. Otherwise, it looks exactly like the for loop. And, in fact, the assembly code also looks exactly the same (including elimination of the bounds check):

00007FFADFCB01F0  mov         rdx,rcx  
00007FFADFCB01F3  xor         ecx,ecx  
00007FFADFCB01F5  mov         r10,qword ptr [rdx+8]  
00007FFADFCB01F9  test        r10d,r10d  
00007FFADFCB01FC  jle         00007FFADFCB0223  
00007FFADFCB01FE  mov         r8d,ecx  
00007FFADFCB0201  xor         r9d,r9d  
00007FFADFCB0204  nop         word ptr [rax+rax]  
00007FFADFCB0210  mov         eax,dword ptr [rdx+r9+10h]  
00007FFADFCB0215  add         ecx,eax  
00007FFADFCB0217  inc         r8d  
00007FFADFCB021A  add         r9,4  
00007FFADFCB021E  cmp         r8d,r10d  
00007FFADFCB0221  jl          00007FFADFCB0210  
00007FFADFCB0223  mov         eax,ecx  
00007FFADFCB0225  ret  

For loop on List<T>

Before seeing foreach operate on List<T>, we should see what for looks like:

private static int GetSumFor(List<int> array)
{
    int sum = 0;
    for (int i = 0; i < array.Count; i++)
    {
        sum += array[i];
    }
    return sum;
}

And the corresponding IL:

.method private hidebysig static 
	int32 GetSumFor (
		class [mscorlib]System.Collections.Generic.List`1 'array'
	) cil managed 
{
	// Method begins at RVA 0x20a0
	// Code size 31 (0x1f)
	.maxstack 3
	.locals init (
		[0] int32 sum,
		[1] int32 i
	)

	IL_0000: ldc.i4.0
	IL_0001: stloc.0
	IL_0002: ldc.i4.0
	IL_0003: stloc.1
	IL_0004: br.s IL_0014
	// loop start (head: IL_0014)
		IL_0006: ldloc.0
		IL_0007: ldarg.0
		IL_0008: ldloc.1
		IL_0009: callvirt instance !0 class [mscorlib]System.Collections.Generic.List`1::get_Item(int32)
		IL_000e: add
		IL_000f: stloc.0
		IL_0010: ldloc.1
		IL_0011: ldc.i4.1
		IL_0012: add
		IL_0013: stloc.1

		IL_0014: ldloc.1
		IL_0015: ldarg.0
		IL_0016: callvirt instance int32 class [mscorlib]System.Collections.Generic.List`1::get_Count()
		IL_001b: blt.s IL_0006
	// end loop

	IL_001d: ldloc.0
	IL_001e: ret
} // end of method Program::GetSumFor

It looks pretty much like the for loop on an array, but with some method calls instead of direct memory accesses to retrieve the values and the length (Count).

However, the assembly code generated from this is more interesting and shows some subtle and important differences:

00007FFADFCD0BE0  push        rbx  	// Stack frame setup
00007FFADFCD0BE1  push        rsi  
00007FFADFCD0BE2  push        rdi  
00007FFADFCD0BE3  sub         rsp,20h  
00007FFADFCD0BE7  mov         rdx,rcx    // Copy array address to rdx
00007FFADFCD0BEA  xor         ecx,ecx    // Set i to 0. i is used only for the loop condition.
00007FFADFCD0BEC  xor         r11d,r11d  // Set j to 0. j is used as the iterator on the internal array inside List
00007FFADFCD0BEF  mov         r8d,ecx    // Set sum to 0
00007FFADFCD0BF2  mov         ebx,dword ptr [rdx+18h]    // Copy length to ebx
// LOOP START
00007FFADFCD0BF5  cmp         ecx,ebx                    // Compare i to length
00007FFADFCD0BF7  jl          00007FFADFCD0C00           // if less, jump below
00007FFADFCD0BF9  mov         eax,r8d                    // else copy sum to return value
00007FFADFCD0BFC  jmp         00007FFADFCD0C26           // jump to end of function
00007FFADFCD0BFE  xchg        ax,ax                      // No-op

00007FFADFCD0C00  mov         eax,dword ptr [rdx+18h]    // Copy list's length to eax
00007FFADFCD0C03  cmp         ecx,eax                    // Compare i to length
00007FFADFCD0C05  jae         00007FFADFCD0C35           // if greater than or equal, jump and throw exception
00007FFADFCD0C07  mov         r9,qword ptr [rdx+8]       // Copy List's array address to r9
00007FFADFCD0C0B  movsxd      r10,r11d                   // Copy j to r10 and sign extend
00007FFADFCD0C0E  mov         rax,qword ptr [r9+8]       // Copy array's length to rax
00007FFADFCD0C12  cmp         r10,rax                    // Compare j to array's length
00007FFADFCD0C15  jae         00007FFADFCD0C30           // If greater than or equal, jump and throw exception
00007FFADFCD0C17  mov         eax,dword ptr [r9+r10*4+10h]  // copy array[j] (list[i]) to eax
00007FFADFCD0C1C  add         r8d,eax                    // sum += array[j]
00007FFADFCD0C1F  inc         ecx                        // i++
00007FFADFCD0C21  inc         r11d                       // j++
00007FFADFCD0C24  jmp         00007FFADFCD0BF5           // Jump to beginning of loop 
00007FFADFCD0C26  add         rsp,20h                    // Clean up stack
00007FFADFCD0C2A  pop         rdi  
00007FFADFCD0C2B  pop         rsi  
00007FFADFCD0C2C  pop         rbx  
00007FFADFCD0C2D  ret                                    // return

00007FFADFCD0C30  call        00007FFB3F7E6590           // Throw exception

00007FFADFCD0C35  mov         ecx,0Dh                    
00007FFADFCD0C3A  call        00007FFB3CD990F8           // Throw exception

Here’s what’s going on:

  • The method calls on List<int> have been inlined away—we get direct memory access to the underlying array that List<int> maintains.
  • There is an iteration on i that compares it to the length of the List<int>.
  • There is a separate iteration variable (which I call j in the annotations) that is compared against the length of the internal array that List<int> wraps.
    • Failing either check will cause an exception to be thrown.

The duplicate iteration doesn’t cause the loop to be twice as slow as a pure array loop–there is still only one element access per iteration, and the extra bookkeeping happens in registers, which is extremely efficient. But if you have the choice between arrays and List<T>, well, it’s hard to beat an array for pure efficiency.

Foreach on List<T>

Finally, let’s turn our attention to foreach on List<T>. Will the compiler (or the JIT) turn this into a straightforward for loop?

No. Here’s the IL:

.method private hidebysig static 
	int32 GetSumForeach (
		class [mscorlib]System.Collections.Generic.List`1 list
	) cil managed 
{
	// Method begins at RVA 0x20f4
	// Code size 50 (0x32)
	.maxstack 2
	.locals init (
		[0] int32 sum,
		[1] int32 val,
		[2] valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator CS$5$0000
	)

	IL_0000: ldc.i4.0
	IL_0001: stloc.0
	IL_0002: ldarg.0
	IL_0003: callvirt instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<!0> class [mscorlib]System.Collections.Generic.List`1::GetEnumerator()
	IL_0008: stloc.2
	.try
	{
		IL_0009: br.s IL_0017
		// loop start (head: IL_0017)
			IL_000b: ldloca.s CS$5$0000
			IL_000d: call instance !0 valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator::get_Current()
			IL_0012: stloc.1
			IL_0013: ldloc.0
			IL_0014: ldloc.1
			IL_0015: add
			IL_0016: stloc.0

			IL_0017: ldloca.s CS$5$0000
			IL_0019: call instance bool valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator::MoveNext()
			IL_001e: brtrue.s IL_000b
		// end loop

		IL_0020: leave.s IL_0030
	} // end .try
	finally
	{
		IL_0022: ldloca.s CS$5$0000
		IL_0024: constrained. valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator
		IL_002a: callvirt instance void [mscorlib]System.IDisposable::Dispose()
		IL_002f: endfinally
	} // end handler

	IL_0030: ldloc.0
	IL_0031: ret
} // end of method Program::GetSumForeach
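
In C# terms, that IL corresponds to roughly the following (a sketch; note that List<T>'s enumerator is a struct, which is why the IL calls Dispose() through a constrained callvirt rather than boxing as the cast below implies):

int sum = 0;
List<int>.Enumerator e = list.GetEnumerator();
try
{
    while (e.MoveNext())
    {
        sum += e.Current;
    }
}
finally
{
    ((IDisposable)e).Dispose();
}
return sum;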

I won’t show the assembly code for this, but suffice it to say, it’s also quite different.

Foreach on (IEnumerable<T>)array

What would you expect to happen with code like this?

private static int GetSumForeach(int[] array)
{
    var collection = array as IEnumerable<int>;
    int sum = 0;
    foreach (var val in collection)
    {
        sum += val;
    }
    return sum;
}

In this case, the array optimization will be lost and you’ll get the same thing you did with foreach over List<T>.
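
If you have a hot path that takes an IEnumerable<int> but usually receives arrays in practice, one option is to special-case the array before falling back to the enumerator (a sketch, reusing the baseline GetSum from the top of this article):

private static int GetSumFlexible(IEnumerable<int> collection)
{
    // Recover the optimized array loop when possible.
    int[] array = collection as int[];
    if (array != null)
    {
        return GetSum(array);
    }

    // Otherwise, fall back to enumerator-based iteration.
    int sum = 0;
    foreach (var val in collection)
    {
        sum += val;
    }
    return sum;
}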

Loop Unrolling

Loop unrolling is the practice of reducing the overhead of a loop by causing fewer iterations, but more work per iteration. For loop-heavy code, it can make a big difference.

You can do this unrolling yourself, such as with the following example:

private static int GetSum(int[] array)
{
    int sum = 0;
    int i = 0;
    // Process four elements per iteration.
    for (; i <= array.Length - 4; i += 4)
    {
        sum += array[i];
        sum += array[i + 1];
        sum += array[i + 2];
        sum += array[i + 3];
    }
    // Pick up the 0-3 leftover elements.
    for (; i < array.Length; i++)
    {
        sum += array[i];
    }
    return sum;
}

In some cases, the compiler can do this kind of optimization for you, but for loop unrolling to take place, there is one big constraint that must be obeyed: The loop constraints must be statically determinable so that the compiler can know how to modify the loop.

This is why we don’t see any loop unrolling in the assembly code for any of our previous examples. The length of the loop was always variable. What if, instead, we have something like this:

const int ArrayLength = 100;

private static int GetSum(int[] array)
{
    int sum = 0;
    for (int i = 0; i < ArrayLength; i++)
    {
        sum += array[i];
    }
    return sum;
}

The IL for this method looks about the same as before, but the assembly shows the optimization:

00007FFADFCE01C0  sub         rsp,28h              // Setup stack frame
00007FFADFCE01C4  mov         rdx,rcx              // Copy array address to rdx
00007FFADFCE01C7  xor         ecx,ecx              // Set sum to 0
00007FFADFCE01C9  xor         r8d,r8d              // Set address offset to 0
00007FFADFCE01CC  mov         rax,qword ptr [rdx+8]// Copy array length to rax
00007FFADFCE01D0  mov         r9d,60h              // Copy value 96 to r9
00007FFADFCE01D6  cmp         r9,rax               // Compare 96 to length
00007FFADFCE01D9  jae         00007FFADFCE0230     // If greater than or equal, jump and throw exception
00007FFADFCE01DB  mov         r9d,61h              // Copy value 97 to r9
00007FFADFCE01E1  cmp         r9,rax               // Compare to length
00007FFADFCE01E4  jae         00007FFADFCE0230     // If greater than or equal, jump and throw exception
00007FFADFCE01E6  mov         r9d,62h              // Copy value 98 to r9
00007FFADFCE01EC  cmp         r9,rax               // Compare to length
00007FFADFCE01EF  jae         00007FFADFCE0230     // If greater than or equal, jump and throw exception
00007FFADFCE01F1  mov         r9d,63h              // Copy value 99 to r9
00007FFADFCE01F7  cmp         r9,rax               // Compare to length
00007FFADFCE01FA  jae         00007FFADFCE0230     // If greater than or equal, jump and throw exception
00007FFADFCE01FC  nop         dword ptr [rax]      // Nop
// LOOP START
00007FFADFCE0200  mov         eax,dword ptr [rdx+r8+10h]  // Copy array[i] to eax
00007FFADFCE0205  add         ecx,eax                     //sum += array[i]
00007FFADFCE0207  mov         eax,dword ptr [rdx+r8+14h]  // Copy array[i+1] to eax
00007FFADFCE020C  add         ecx,eax                     // sum += array[i+1]
00007FFADFCE020E  mov         eax,dword ptr [rdx+r8+18h]  // Copy array[i+2] to eax
00007FFADFCE0213  add         ecx,eax                     // sum += array[i+2]
00007FFADFCE0215  mov         eax,dword ptr [rdx+r8+1Ch]  // Copy array[i+3] to eax
00007FFADFCE021A  add         ecx,eax                     // sum += array[i+3]
00007FFADFCE021C  add         r8,10h                      // Increment address offset by 16 bytes (4 integers)
00007FFADFCE0220  cmp         r8,190h                     // Compare address offset to 400
00007FFADFCE0227  jl          00007FFADFCE0200            // If less, loop around
00007FFADFCE0229  mov         eax,ecx                     // Copy sum to return value
00007FFADFCE022B  add         rsp,28h                     // Fix up stack
00007FFADFCE022F  ret                                     // return

00007FFADFCE0230  call        00007FFB3F7E6590            // throw exception

The basic operation of this is:

  1. Check up front that every access in one unrolled iteration stays inside the array (N being 4 in this specific case), and throw an exception before the loop if not.
  2. Do N operations per loop body.

As soon as you deviate from static determinism, however, you will not be able to rely on this optimization. That said, if you have special knowledge of your arrays (such as that their length is always a multiple of 4), you can do the loop unrolling yourself in the source code.

As far as the current JIT compiler goes, you will lose this optimization if you change the array size in this example to 101. Some compilers will split this into an unrolled loop with the leftovers handled at the end; the JIT compiler doesn’t do this at this time (remember, it’s under a severe time constraint).

The ultimate loop unrolling, of course, is just eliminating the loop altogether and listing out every operation individually. For small loops, this could be advisable–it’s just a tradeoff between code maintenance and performance that you have to make.
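
For example, if some invariant guarantees that an array always has exactly four elements (an assumption for this sketch), the loop can disappear entirely:

private static int GetSumOfFour(int[] array)
{
    // No loop overhead at all; the JIT may still emit a length
    // check up front for the highest index used.
    return array[0] + array[1] + array[2] + array[3];
}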

Summary

Understanding how small differences in code translate into more or less efficient machine code is important if you need to get the last small measure of performance on your more performance-critical components, especially if you are CPU bound.

Of all of the collections, arrays are the most efficient to access, but List<T> is very close. When iterating over arrays, you want to try to eliminate the bounds check if possible. When deciding between for and foreach, keep in mind that foreach can involve method calls for anything except pure arrays.

Understand the conditions that allow loop unrolling. If you can formulate your code to allow it and it makes a measurable difference, go for it–or manually unroll loops yourself if you get a performance gain out of it.

Most of all, figure out how to answer these types of questions for yourself so that the next time deep performance questions arise, you know how to quickly determine the answer.


Check out my latest book, the essential, in-depth guide to performance for all .NET developers: Writing High-Performance .NET Code by Ben Watson, available now in print and as an eBook.

Tips for Writing a Programming Book

In this article, I want to go through the process of writing a book, to give others who are thinking of writing a general idea of what kinds of things you will need to do. This is the article I wish I had read before starting out. Some of this will be applicable to fiction writing, but there are better resources if that’s what you’re interested in. I can only share what I personally know.

Before-You-Start Checklist

Here’s a short list-based summary of many of the things I’ll discuss in this article. These are all things to think about before you start writing and throughout the process:

  1. Do you have enough material for a book? Write out an outline to make sure. Iterate over it until it’s quite detailed.
  2. Quash self-doubt! If no one else has written this book, then it’s up to you!
  3. Do you want or need to go with self-publishing?
  4. Who will be your editor? (You do need one.)
  5. Should you do a print edition? What’s the market for that?
  6. When are you going to make the time to write?
  7. Do you have a place you can write uninterrupted?
  8. Will your family support the time commitment?
  9. Does your day job have a moonlighting policy? Follow it!
  10. Don’t use DRM!
  11. Familiarize yourself with the various publishers’ processes. Do some dry runs. Figure out what’s going to work and what’s not.
  12. How are you going to market your book? Get started on laying the groundwork for that early. Start tweeting and blogging in anticipation.
  13. When marketing your book, remember to Always Be Providing Value.
  14. Price sanely and fairly. Chances are that unless you’re filling a unique, in-demand niche, you’re not going to make a lot of money–certainly not enough to quit your day job. Optimize for the number of sales, and be competitive with others in your genre. The satisfaction of producing something meaningful and sharing your knowledge with others is worth the effort you are putting in.

Writing a book is not easy. Just because you know stuff does not mean you can write a book about it–but knowing your subject is a prerequisite. I’m the author of two non-fiction computer programming books. The first, C# 4 How-To, was published in March 2010 by Sams. My most recent book is Writing High-Performance .NET Code. This was self-published, for reasons I’ll get to later.

After the experience of writing my first book, I promised myself and whoever would listen that I would never write another book. The process was too grueling for the payoff. The money just isn’t there. Sure, you do get some, but if you have a well-paying full-time job, it’s a drop in the bucket. You should not write books for riches–chances are you won’t get them.

The rest of this article is part advice, part journal. Get out of this what you will.

Inspiration

At my day job in Bing, I’ve spent the last few years becoming something of an expert in the area of .NET performance. I wrote a very significant part of the Bing query-processing stack in .NET, and performance is obviously a vital consideration in everything it does. I had the benefit of an amazing mentor (see my book’s acknowledgements [PDF]) for a number of years, and the experience we gained as a team was considerable. Many of the bits of knowledge we had to apply were not available via Internet search, or if they were, they lacked so much context that they were worse than useless.

One day, at a dinner with my brother-in-law, we got to talking tech and our work projects in generalities, and he casually mentioned that I should write a book about these performance lessons. This wasn’t the first time the thought had come to me, but this time I actually thought it through for a while. I mulled on it for a few weeks before writing out a simple outline, at which point I knew that I had enough material for a book.

I wasn’t sure how long it would take, or what my coworkers would think, so I kept it to myself and close family.

Doubts

Everyone has self-doubt. I certainly did. Through much of the writing process, I constantly wondered to myself whether I should be the person writing this book. I am not on the CLR team; I work on a specialized project; I’m fairly young compared to the experts I look up to: who am *I* to be writing a book on .NET Performance? Et cetera.

There is a fairly common psychological condition called Imposter Syndrome, which explains these doubts. Basically, most people at one point or another feel like a fraud in their success.

The fact is, I realized, that someone ought to write the book I was writing, and that someone might as well be me. As long as I got good editing, feedback, and technical help, there was no reason I could not produce a very high-quality textbook on .NET performance, despite my worries.

So I pressed ahead.

Another doubt was the scope of the book. My knowledge is heavily weighted towards vanilla servers and pure .NET. No WCF, ASP.NET, WPF, etc. I am familiar with those, but they’re not my bread and butter, so to speak. Would people read a performance book that didn’t cover the trendy buzzwords?

I decided that they would. There was no book that covered the fundamentals of .NET performance engineering in the way I was going to. This was an important niche, and it was actually an advantage to not cover the higher-layer technologies. It gave the book laser-like focus, and allowed it to explain the fundamental techniques in a way that a widely scoped manual would lose. There was no book that explained the fundamental costs and benefits of using .NET from a performance point of view, regardless of code library.

It would be like a really good cookbook that teaches you fundamental cooking techniques but doesn’t contain thousands of recipes. Once you had the techniques, you could apply them to any recipe from other books.

Why Self-Publishing

I probably could have gotten a publisher if I had asked. I had already published a book via Sams, and while it wasn’t a best-seller, I did earn back my advance and a little more. However, royalty rates truly are abysmal. 10% of the wholesale price is fairly typical. Wholesale is usually 50% of the cover price. So for a $40 book, you can expect to earn about $2 per copy, less on an eBook. The advantage of having a publisher is that they will get you into physical stores, which is where I saw most of my sales for my first book. However, that was 5 years ago. The market has changed dramatically in that time. Amazon is the king of book selling, by a LONG shot.

So I went with self-publishing for a number of reasons:

  • I could earn much higher royalties, even if I priced the book much lower.
  • I knew someone who could be technical editor.
  • I have a couple of editors in the family who could help with grammar and style.
  • I had experience and so knew basically what kinds of formatting I would need, what parts of the book I would need–all the technical details.
  • I was willing to do all of the pre-publication grunt work myself.
  • I have an artist in the family who could do a cover for me. Heck, even if I didn’t, I know Photoshop. I could probably whip up something better than most of the bland covers that seem de rigueur for programming books these days. Admittedly, my sister did a much better job on the cover than I could have.

Self-publishing has definitely paid off for me. Benefits I’ve noticed:

  • A bigger sense of accomplishment. I didn’t just write the book. I managed it from beginning to end. I made it a business.
  • I feel much more in control of my own product. I can update it at will. I have more direct contact with many readers on Twitter and other places.
  • I’m not going to quit my day job anytime soon, but in the first three months of my book being out, I made more money than in all 4 years my previous book was on the market.

Writing

The hardest part of any large project is just getting started. The first thing I did was flesh out my draft outline to list all of the chapters I thought I would have, then within those chapters, individual topics.

Then I just started writing. I don’t remember if I started with the Introduction or Chapter 1, but I did start more-or-less at the beginning. At the start, I did not concern myself with grammatical correctness, precise descriptions, or even necessarily getting all the details I wanted. I just tried to get all of the ideas onto the page. As I thought of related ideas, I would write them into my outline, or later in the document itself with a TODO prefix and an explanation of what I was thinking.

I pretty much wrote the whole book linearly like this. Not perfectly, of course. I jumped around a bit as necessary. Sometimes, sections belonged in a different order, or in a different chapter entirely. When I got to a part where I knew I needed a code sample, sometimes I just wouldn’t be in the mood to work on code or the debugger, so I would put a TODO in the chapter with an explanation of what the code needed to do. Then I moved on to writing something else.

One of the things I had to decide fairly early on was my writing style. Was it going to be “folksy” or “clinical”? How much like a textbook did I want it to read? My natural style is fairly informal. I started writing the book just like I write a blog entry, but I realized, especially later during editing, that while this is mostly ok, you do need to tighten up the writing a little bit. Some things just require more clarity or a precise explanation.

Another important decision I made early on was to take a stand and be opinionated in what I recommend. So much programming documentation is clinical and does not take a firm stand on what you should do in a given situation. I wanted to make sure my own personal voice rang through the text.

This isn’t just about the language and grammar being used–it applies especially to the content. In a blog entry, if you don’t want to explain a piece of prerequisite knowledge, you can just link to an existing source somewhere on the Internet. Book readers do not want that. If there’s something they need to know, you need to provide it to them (within reason of course–it’s fair to state knowledge assumptions up front).

The increased need for formality also forces you to really get your facts right. There is nothing like teaching others to really get you to learn the material well. This is true for EVERYTHING, but especially if you’re writing a book. There is a huge difference between knowing in your head how something works, and being able to write it down in a coherent way. Of all the chapters in my book that I knew I had to get right, Chapter 2 – Garbage Collection is the one that needed to be the most perfected. I knew garbage collection details pretty well. I have nearly weekly interaction with the people responsible for maintaining and developing it, but it was still surprising how many nit-picky details I needed to tweak after going through the content with the owner of the GC.

I did not have daily or weekly writing goals at first, but as I got towards the end, I did start challenging myself to write a thousand words per writing session. This can actually be pretty tough when you realize you need to do more research or write a code sample. Those take time. My only hard goal was the rough timeline. I knew I wanted to publish in the summer of 2014, so I planned out what had to be done by when, and gave myself about two months for editing and finalization.

Resist the urge to pad your text. This is really easy to do in a programming text. Just throw some more code on the page, add some extra topics, and you can create pages out of thin air! Don’t do this! Yes, find good, relevant content that should be in your book, but only add it with the right motivation. It’s unlikely your outline will contain everything that needs to be in the book—you will certainly forget something. Do research, brainstorming, take a step away from the project, ask your technical editor what’s missing, etc. Add content for good reasons that fit the scope of your book and leave it at that.

Tools

I used Word 2013 for the entire process. This has its pros and cons.

Pros:

  • Advanced formatting abilities. (It’s not perfect for advanced layout requirements, but for a book like mine, it was sufficient.)
  • Great for producing a print-ready PDF.
  • Great collaboration abilities, especially via OneDrive and Office Online.
  • Ability to edit anywhere I was, on any computer.

Cons:

  • eBooks support very limited formatting. In particular, the Amazon Kindle conversion was a nightmare. I had to remove a LOT of formatting. (Tables and multiple fonts per paragraph are the two big ones that come to mind.)
  • When it came time for the EPUB conversion, which is used for every online retailer except Amazon, it required an entire day to clean up the HTML from the conversion process. I had to fix a lot of styles and make tons of tweaks, to the point where it is now easier to make corrections in the multiple converted documents than to make them in the master document and redo the conversions.

I stored the book on OneDrive while I was actively working on it, which allowed me to edit anywhere I was. When it came time to collaborate with editors, this made it very easy to share with others who could add comments directly in the document. I made weekly, sometimes daily, backups to another drive at home, which was further backed up by Carbonite.

For screenshots, I used the built-in Windows utility Snipping Tool. This mostly worked, but you have to keep in mind that print is 300 DPI while most screens are 96 DPI, which means screen-captured images will print much smaller than they appear on screen. In some cases, I used Photoshop Elements to resize and resample small images to make them suitable for printing.

For code samples, I used Visual Studio 2012 Ultimate. I used Subversion on my local Synology NAS to store my code samples. Once the book was done, I also added all graphics, manuscripts, and all other book artifacts as well. The Synology has a publicly accessible DNS that I could use from my laptop away from home.

For eBook editing, I used Calibre, which has both library management and eBook editing tools.

When it came time to upload to online retailers, I first ran EPUBCHECK on the final EPUB files to ensure they passed with 0 errors.

Time Scheduling

Given that I have a full-time job and a family with a small daughter, as well as other commitments (including a musical, where I was playing in the orchestra), finding time to work on the book was a problem. My wife and I instituted a “night off” every week to work on personal projects–after dinner, we have no more responsibilities for the rest of the night. We don’t have to help with dishes, bath time, reading stories, or interacting at all. I utilized my night for working on the book. I would say goodnight, kiss and hug my daughter, and then isolate myself in the house or go somewhere else. The time was 100% my own. Typically this amounted to about 4 hours. For the rest of the week, I would be lucky to get an hour or two a night. But the 4-hour block really helped–it was enough time that I could focus on big issues.

Towards the end, perhaps the last month and a half, one night a week wasn’t sufficient either. There was so much to do that I had to start taking most of my Saturdays to work on the book or I would never get done.

In the end, the book took about 10 months, start to end, and that was mostly working nights and weekends. And the book isn’t even that long! Don’t underestimate how much work it is.

Setting

For writing, by far the best place for me to work was somewhere outside the home that had some kind of ambient noise. I worked at the local library, Starbucks, and even McDonald’s a couple of times (they were open later than the others). There is something about the noise level that can help you concentrate more than being at home. If you have to work at home, you can try any number of ambient noise tracks on YouTube.

The times where I did work at home, I usually put in headphones and listened to classical music.

For code and debugging samples, it was much easier to just do that at home where I have two large screens.

Editing

I am lucky. I knew someone who would be a great technical editor. He was on the CLR team before joining my team and he became a great mentor. I asked if he would be my technical editor, and he readily agreed. This meant that my technical content was in good hands. He wouldn’t let anything egregious slip by.

Since I also had a relationship with people on the CLR team, I asked some of them to proofread smaller portions of the book, in particular Chapter 2, which I consider the most important part.

Once I was done with the first draft, I gave the whole thing to my technical editor. After he had it for a few weeks, we got together in person and walked through a major portion of the manuscript. I came away with about three pages of notes: everything from minor technical errors, the need for a source, or better wording, to better ways of summarizing content in the chapters and at the end of the book. Not all of them were big changes, but some were. A couple involved major restructuring. This was about a month’s worth of work.

For grammar, style, and formatting, I asked my wife and father to review. My wife is an amazing editor. She has worked at a couple of jobs that required very precise abilities in analyzing documents, so she was the perfect person for this job. I also asked my dad to take a look as well, and between them I got a lot of good feedback.

To allow them to edit, I shared the document with them via OneDrive and asked them to only leave comments—not change the body text. Once I resolved their comment, I deleted it.

One other thing I did was a bit crazy. I turned on ALL of Word’s grammar suggestions and analyzed the final manuscript. Don’t do this unless you like pain. If you do like pain, then go to Options | Proofing | When correcting spelling and grammar in Word | Settings…, which brings up this window:

(Screenshot: Word’s grammar and style settings window.)

Turn on the settings you want and prepare to be nitpicked to death. I turned it off after a while, but it was helpful, and helped forestall some comments from my wife.

One of the most common things my editors corrected was the use of contractions. If you look in the final book text, you will find very, very few contractions. Contractions are a very informal way of writing–fine for a blog entry, but in a book they stand out a bit. I only left them in when they sounded better.

There are dozens of nitpicky details you have to take care of. It’s hard to think of them all right now, but I strongly suggest you look at another book in the same genre you’re thinking of writing and analyze all aspects of it, especially the parts that aren’t just paragraphs of text. Things like:

  • Heading Capitalization.
  • Number of heading levels.
  • Structure of lists. Sentences or not? Periods at the end or not? My rule was that each list had to be internally consistent, but not necessarily consistent with other lists. Catching all of these can be hard.
  • Sentences that are too long, too short, oddly phrased, confusing, etc.
  • Colloquialisms.

A lot of this stuff you will be absolutely blind to. You MUST have another person help you find a lot of it. It’s also helpful if that person is familiar with the genre or is at least well-read and understands the language very well. Don’t ask your buddy who reads fewer than one book a year to proofread.

Formatting

If you’re writing a book that is guaranteed to never be printed, then your primary concern with formatting should be how little of it you can get away with. Use one or two fonts. Use as few header styles as possible. There are plenty of formatting guides out there. One that I read and was very helpful was Mark Coker’s Smashwords Style Guide.

Right up until the first week of publication, I was planning on keeping this eBook-only. As soon as I told people it was available, though, some people asked about a print edition, and I realized I would need to do that.

Formatting for print is significantly more work. I wanted a number of things:

  • Bold, obvious styles for chapter titles, section headers, and more. Some of this includes underlining.
  • Graphics with captions throughout the book.
  • Different font and background shading for code samples.
  • Different font for code elements cited in paragraphs.
  • Appropriate white space after tables, code samples, section titles.
  • Appropriate white space before section titles.
  • Special callouts and borders for anecdotes and tips.
  • Page numbers, with alternating sides for even/odd.
  • Page headers consisting of the chapter number and title.
  • Chapters always start on the right-hand page. (I had originally planned on some graphics that bled to the edge, but that would have required a more expensive printing option, so I abandoned it.)

Word was great for all of this. If you’re not familiar with these features, it can take a bit to make it work, and it probably doesn’t have the power of a true desktop publishing application, but it worked great for me.

There are a lot more things I could have done. Take a look at another programming book and check out the advanced formatting and features they include. Most of it is unnecessary, but it can add a certain flair.

Cover Design

I knew almost from the beginning that I wanted the cover to have gears on it (see the introduction to the book for an explanation why), and have a very distinctive look. Most programming books are very bland. I wanted mine to stand out and still be professional-looking.

I am fortunate to have an artist sister, Claire Watson, who was willing to do some Photoshop work for me. She has a great eye for design, and after I told her my general ideas and gave her some sample stock photos, she produced no fewer than 30 variations of multiple cover designs. The one I ultimately chose was a variation of her very first design, and I love it.

You can find artwork and jewelry by my sister Claire at http://www.bluekittycreations.co.uk/.

Code Samples

I used Visual Studio 2012 Ultimate for the code samples. I could have used the 2013 version, but decided 2012 was a safer bet for minimum hassle for the majority of readers.

There were a few challenges with code samples:

  • Keeping them extremely short while still demonstrating the point I’m trying to convey.
  • Line length, especially on an eBook, is an issue. Small screens and limited resolution play havoc with code formatting. If you’re on a really small or old device like an iPhone or original Kindle, then sorry, I did not bother trying to format for your device–it doesn’t even make sense at that point. I optimized for devices like the Kindle Fire and better.

Moonlighting Policy

Before I completed the book, I checked and double-checked the current moonlighting policy of Microsoft, which does allow me to write and publish books without prior approval. The only thing that can get me in trouble is revealing proprietary information, and I was very careful about this, especially when talking about .NET and GC internals (I had the GC owner review the whole chapter, and she did ask me to remove some small details that are prone to change in the future). So no worries here about publishing.

Prepping for Publish

If you thought you were done when the manuscript was complete, edited, and finalized, then you are in for a surprise. Depending on how complicated your manuscript is, you still have days’ worth of work ahead of you, so plan accordingly.

At some point I set a hard deadline for myself: July 12. That was the day it was going to go live on at least Amazon.com. The weeks before that, I basically worked every night and all weekend to complete as many of the final edits as I could and get final feedback from my editors, both technical and grammatical. Then I took off work on Thursday and Friday to work on the eBook conversion. I knew this was going to be a chore, but I had no idea how grueling it would be. I understand now the value that professional publishers, editors, typesetters, etc. provide. I decided to do the Amazon Kindle conversion first. For this, I just needed to upload the Word document to Amazon, and their system would check it and provide a list of errors to me. I could also proofread it on various Kindle simulators, which would attempt to show me what it would look like. This was very helpful.

The single biggest problem from this process was that paragraphs with different fonts in them did not render properly. I used different font styles for code keywords within body text and this was throwing off the whole system. So I had to go fix a bunch of styles in a copy of my manuscript. I didn’t want to modify the original document because I knew that I would want those different styles for EPUB and print.

I also quickly realized that tables, despite being supported in all eBook formats, will not work except for the most trivial of tables. They just don’t look good. This was too bad, because I had spent quite a while converting some bullets to tables. I spent a few hours converting back.

It was invaluable going through the book on the various Kindle simulators. I got to see a lot of things that just didn’t work on a small screen. It’s always a challenge putting a well-formatted book into an eBook, especially one with tons of code samples. Some Kindles displayed the shaded backgrounds, others didn’t. None will use different fonts, so you don’t get the benefits of uniform spacing for code samples. All of this I just let slide–I figure if you’re reading a coding book on a Kindle, you kind of know what you’re getting. I didn’t even bother proofreading (beyond curiosity) on the older Kindles or iPhone Kindle App. You deserve what you get if you try it. Sorry.

By Friday night, I hit the Publish button on Amazon KDP. A few hours later, it was up for sale.

On Saturday, I went through the EPUB conversion process. This went a bit faster than the Kindle process, but was still a little bit labor intensive. I used Calibre to convert the Word document to EPUB format, which is really just a ZIP file containing HTML, styles, and images. Then I used the Calibre eBook editor to modify the HTML as necessary to fix up styles and formatting inconsistencies. This took a few hours and required a bit of hand-editing of styles and individual locations throughout each chapter.

Once all of that was done, the EPUB could be uploaded to all of the retailers besides Amazon.

By the end of Saturday, I was published on all major eBook platforms.

On Monday when I announced it to my work colleagues, a bunch of people said they wanted a print edition, which, up until that point, I was not sure I was going to do. But given that response, I instantly realized that a dead tree version of a programming book is still in demand. It’s the code samples—some people just prefer a physical book for code.

So I spent the next week getting it ready for publication via CreateSpace. Basically, this was just ensuring that all of the formatting I wanted (see above) was present and correct. My first proof copy actually had some bad fonts for most of the code samples–I had selected Courier New, and it just didn’t look as good as Consolas, so that was the major fix before publication.

I ordered a single proof copy, made the font change (and a few other minor fixes as well), and then hit the publish button. Within a day or so it was on Amazon for sale. A couple of days after that, it was matched up with the existing Kindle book so people could easily see both editions. (The matching process is normally automatic, but if it doesn’t happen for you, you can contact KDP support and let them know the ISBN/ASIN numbers and they can match them up for you.)

A note on DRM (Digital Rights Management): Avoid it. Unfortunately, your book is going to be hacked, stolen, put on Pirate Bay, whatever, anyway. DRM only punishes the law-abiding consumer. Just don’t, and make peace with the fact that people will steal things. If you sell to organizations, then it is fair to ask them to pay for multiple copies for multiple people, but don’t try to enforce this. It’s a losing battle and isn’t going to help your bottom line much anyway.

Pricing

By the end of Friday, I had the Kindle edition ready to go. I just had to decide on markets and pricing. I went with $9.99 because that’s the maximum price you’re allowed to charge on KDP and still get a 70% royalty rate. This was actually disappointing to me. For me, the sweet spot in pricing would have been around $15. This is competitive with existing programming eBooks while being slightly cheaper. I had a fear that if I priced it too low, I wouldn’t have credibility compared to more expensive offerings. I don’t worry about this now. The book has taken off and gotten enough reviews that it can stand on its own, and the price now only helps.

For the print edition, I set the price much more in line with current offerings (slightly cheaper, of course). I also made EU prices similar to other books in those markets, but I have since changed them to be tied to US price and the exchange rate. This makes my book VERY competitive in those markets, and it has definitely netted me more sales.

Marketing

I tried to have a comprehensive marketing plan, but I don’t know how well I followed it. There were so many things to do that a lot of things just fell through the cracks for weeks. However, when you self-publish, time doesn’t really matter. Yes, more sales in a short period of time can lead to follow-on sales as the book will get featured in some places, especially on Amazon, but in the long run, you can try lots of things and let your book’s demand naturally grow.

Here is a short summary of what I did:

  • Reddit – I had no idea how much good this would do for me, but it really did. I didn’t just post a link to my book and beg people to buy it—that’s guaranteed to fail on any social media site, but especially on Reddit. Instead, I discussed my own background and motivation for the book. Just those facts contained enough interesting tidbits that people wanted to know more. My biggest single-day spike in sales came from that thread. Plus it led to a lot of interesting discussion. The point here is that I tried to provide value external to my book.
  • Twitter – I was never a big Twitter user, and I’m still not, but I decided to make some efforts and reach out to people, participate in discussions, comment on tech news, share personal tidbits, and more. Yes, I plug my book, especially by retweeting other people’s comments, but I don’t make my Twitter feed exclusively book plugs—that’s annoying. Remember, try to provide value.
  • Free Samples – The book’s site contains a PDF of the book up through Chapter 1. People can see the full Table of Contents and get a feel for what’s in the book, without me giving it all away. I believe a free sample is critical. I even went further, releasing the entirety of Chapter 5 (Class Design and General Coding) as an article on CodeProject. It won Best C# Article for August and is ranked as one of the Top 5 articles posted on the site.
  • More articles on CodeProject, such as this one about object allocation. That article starts with something that’s in the book, but I provide a lot more information and examples that are not in the book. This article won Best C# Article of September. (Sadly, I didn’t have an article for October, but I’ll work on another one soon.)
  • Letters to influential bloggers and podcasters. I knew this was a long shot. They probably get product and media pitches all the time, right? But a few did respond. Some written reviews will hopefully show up on some influential blogs. I was a guest on the fabulous .NET podcast .NET Rocks! in episode 1041. My book was also mentioned in This Week on Channel 9.
  • Facebook Page – an easy way for people to follow updates on the book, reviews, articles, blog entries. It’s not a super busy page, but it’s a useful central place for making announcements.
  • Ads – I tried advertisements on Google, Bing, Facebook, and Twitter, with very limited effect. I mostly tried out of curiosity, and shut down these experiments fairly quickly after it was clear they weren’t doing much.
  • Letters to family and friends – I sent an email about my book to literally everyone in my address book, not asking them to buy it, but to forward to programmer friends or colleagues of theirs.
  • Ask for reviews – I’m not shy about it. If I know somebody read it and loved it, I will ask them to leave a review on Amazon. These make a difference. I hear from a lot of people on Twitter, and I’ll give them a few weeks or so, and then ask them to write a review. I never tell people to leave a five-star review or give any other guidance. The more reviews, the better.
  • Book’s website – Where to get all information about the book, a complete retail location listing, a place to buy the PDF or EPUB directly from me, errata page, download the book’s source code samples, and more. This site is important.
  • Posters – I printed 50 copies of a small poster of my book and put it up on every floor of my building and two others near me. I’m not sure what effect this had. Probably not worth it.
  • GoodReads – I made sure my book was in GoodReads, with the right cover, editions, descriptions, etc. You can add all of this yourself.
  • Blogging – I started blogging again, before the book came out. This provides another outlet for providing extra value as well as places to mention the book.

Remember to go beyond just “Please buy my book” – Always Be Providing Value.

In the end, the buzz from the podcasts, and other online reviews that have popped up have definitely helped, but most people are finding this book just through searching Amazon. My book comes up when you search for “.NET Performance” and that’s probably the most effective thing for lasting sales performance.

Sales Results

I put my book up for sale literally everywhere I could: Kindle, CreateSpace (for print), Kobo, Nook, Smashwords, Google Play, Apple iBooks, and my own site in EPUB and PDF. Smashwords, in turn, distributes to other retailers. I wanted it available everywhere, and it mostly is. Through CreateSpace’s extended distribution network, I believe it even goes to some physical stores–I’ve seen some fairly decent-sized orders for that network.

But looking at the numbers, it’s interesting just how much Amazon dominates the field here. Here are the percentages:

image

If you add Kindle and Print (both primarily on Amazon), that’s nearly 90% of all my sales coming from Amazon. The next-best thing is the PDF from the book’s site. You might be tempted then to only publish on Amazon, but I think this is a mistake in the long run. Even though the percentages are small, I do get sales from the other channels. There is a psychological benefit as well—many people are happy just to see that it’s available everywhere, even if they buy from one of the usual places.

Also, don’t forget the international market. Nearly all of my sales via my own website are from overseas customers who can’t easily use one of the big eBook retailers. By providing an EPUB and PDF to them, they can use the document however they need.

Contact

If you have any questions about this whole process, please just let me know either in the comments or on Twitter.



Book Review: High-Performance Windows Store Apps

I recently read Brian Rasmussen’s book High-Performance Windows Store Apps, and I think it’s an excellent companion to my own Writing High-Performance .NET Code.

It’s a good companion because while my book is all about the nitty-gritty details of .NET and how they affect performance, I don’t go too much into things at the UI layer. My perspective is much more systems and servers based, while Brian is coming at this from a UI/XAML focus. Brian covers a bunch of topics in a very good way.

The title specifically mentions “Windows Store” apps, but the principles and techniques described here apply to any desktop/UI application, not just the types you typically see on tablets or phones. It’s worthwhile even if you develop just desktop apps.

Particular things I liked:

  • Chapter 3 – Designing for Performance – It presents a number of interesting performance scenarios, many of which aren’t obvious if you’re new to the new world of multi-device, cloud-connected apps we’re in. The discussion on resource management and prioritization was good too.
  • Chapter 4 – Instrumentation – A very good walkthrough of Event Tracing for Windows, which is what every developer needs to know these days, for tracking app events for performance, debugging, or just plain logging. Lots of good stuff here, and critical for all performance analyses.
  • Chapter 5 – Performance testing – Lots of good stuff in here I didn’t know about, particularly around building an automated performance testing environment for UI apps. Automation is critical, and this gives some good examples to get you on your way.

The book uses the Windows Performance Recorder for most of its examples. It’s an excellent tutorial on how to make this tool useful to you, without getting bogged down in the complexities.

Overall, highly recommended. There are not very many guides on how to do performance engineering for XAML apps, and Brian writes a great one.

You can follow Brian on Twitter @kodehoved. Check out High-Performance Windows Store Apps at Amazon.



Using MemoryStream to Wrap Existing Buffers: Gotchas and Tips

A MemoryStream is a useful thing to have around when you want to process an array of bytes as a stream, but there are a few gotchas you need to be aware of, and some alternatives that may be better in a few cases. In Writing High-Performance .NET Code, I mention some situations where you may want to use pooled buffers, but in this article I will talk specifically about using MemoryStream to wrap existing buffers to avoid additional memory allocations.

There are essentially two ways to use a MemoryStream:

  1. MemoryStream creates and manages a resizable buffer for you. You can write to and read from it however you want.
  2. MemoryStream wraps an existing buffer. You can choose how the underlying buffer is treated.
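
For example, here is a trivial sketch of both modes:

// 1. Stream-managed: MemoryStream owns the buffer and resizes it as needed.
var growable = new MemoryStream();
growable.Write(new byte[] { 1, 2, 3 }, 0, 3);

// 2. Wrapping: the stream reads and writes your existing buffer,
//    but can never resize it.
byte[] existing = new byte[1024];
var wrapped = new MemoryStream(existing);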

Let’s look at the constructors of MemoryStream and how they lead to one of those situations.

  • MemoryStream() – The default constructor. MemoryStream owns the buffer and resizes it as needed. The initial capacity (buffer size) is 0.
  • MemoryStream(int capacity) – Same as default, but initial capacity is what you pass in.
  • MemoryStream(byte[] buffer) – MemoryStream wraps the given buffer. You can write to it, but not change the size—basically, this buffer is all the space you will have. You cannot call GetBuffer() to retrieve the original array.
  • MemoryStream(byte[] buffer, bool writable) – MemoryStream wraps the given buffer, but you can choose whether to make the stream writable at all. You could make it a pure read-only stream. You cannot call GetBuffer() to retrieve the original array.
  • MemoryStream(byte[] buffer, int index, int count) – Wraps an existing buffer, allowing writes, but allows you to specify an offset (aka origin) into the buffer that the stream will consider position 0. It also allows you to specify how many bytes to use after that origin as part of the stream. The stream is writable, but the buffer is not resizable. You cannot call GetBuffer() to retrieve the original array.
  • MemoryStream(byte[] buffer, int index, int count, bool writable) – Same as previous, but you can choose whether the stream is read-only. The buffer is still not resizable, and you cannot call GetBuffer() to retrieve the original array.
  • MemoryStream(byte[] buffer, int index, int count, bool writable, bool exposable) – Same as previous, but now you can specify whether the buffer should be exposed via GetBuffer(). This is the ultimate control you’re given here, but using it comes with an unfortunate catch, which we’ll see later.

Stream-Managed Buffer

If MemoryStream is allowed to manage the buffer size itself (the first two constructors above), then understanding the growth algorithm is important. The algorithm as currently coded looks like this:

  1. If requested buffer size is less than the current size, do nothing.
  2. If requested buffer size is less than 256 bytes, set new size to 256 bytes.
  3. If requested buffer size is less than twice the current buffer size, set the new size to twice the current size.
  4. Otherwise set capacity to exactly what was requested.

Essentially, if you’re not careful, you will start doubling the amount of memory you’re using, which may be overkill for some situations.
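
To make the doubling concrete, here is the growth policy restated as a small C# helper. This is my own sketch of the algorithm described above, not the actual Framework source, and ComputeNewCapacity is a name I made up:

static int ComputeNewCapacity(int requestedSize, int currentCapacity)
{
    // 1. Already fits: do nothing.
    if (requestedSize <= currentCapacity)
        return currentCapacity;

    int newCapacity = requestedSize;

    // 2. Enforce the 256-byte minimum.
    if (newCapacity < 256)
        newCapacity = 256;

    // 3. Less than double the current size? Then double it...
    if (newCapacity < currentCapacity * 2)
        newCapacity = currentCapacity * 2;

    // 4. ...otherwise use exactly what was requested.
    return newCapacity;
}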

Wrapping an Existing Buffer

You would want to wrap an existing buffer in any situation where you have an existing array of bytes and don’t want to needlessly copy them, causing more memory allocations and thus a higher rate of garbage collections. For example, suppose you’ve read a bunch of bytes from the wire via HTTP, so you’ve got an existing buffer. You need to pass those bytes to a parser that expects a Stream. So far, so good.

However, there is a gotcha here. To illustrate, let’s see a hypothetical example.

You pull some bytes off the wire. The first few bytes are a standard header that all input has, and can be ignored. For the actual content, you have a parser that expects a Stream. Within that stream, suppose there are subsections of data that have their own parsers. In other words, you have a buffer that looks something like this:

image

To deal with this, you wrap a MemoryStream around the content section, like this:

// comes from network
byte[] buffer = …

// Content starts at byte 24
MemoryStream ms = new MemoryStream(buffer, 24, buffer.Length - 24, writable: false, publiclyVisible: true);

So far so good, but what if the parser for the sub-section really needs to operate on the raw bytes instead of as a Stream? This should be possible, right? After all, publiclyVisible was set to true, so you can call GetBuffer(), which returns the original buffer. There’s just one (major) problem: You don’t know where you are in that buffer.

This may sound like a contrived situation, but it’s completely possible and I’ve run into it multiple times.

See, when you wrapped that buffer and told MemoryStream to consider byte 24 as the start, it set a private field called origin to 24. If you set the stream’s Position to 0, the index into the underlying array is actually 24. Unfortunately, MemoryStream will not surface the origin to you. You can’t even deduce it from other properties like Capacity, Length, and Position. The origin just disappears, which means that the buffer you get back from GetBuffer() is useless. I consider this a bug in the .NET Framework—why have the ability to retrieve the original buffer if you don’t surface the offset being used? Perhaps it was deemed too confusing, an additional property that many people won’t understand.
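
To see the gotcha in code, here is a contrived sketch of my own that continues the example above:

byte[] buffer = new byte[100]; // stand-in for the bytes read off the wire

var ms = new MemoryStream(buffer, 24, buffer.Length - 24,
                          writable: false, publiclyVisible: true);

byte[] raw = ms.GetBuffer(); // returns the ORIGINAL buffer; index 0 is byte 0

// Position is relative to the hidden origin (24), so raw[ms.Position]
// is NOT the byte the stream is currently looking at, and nothing on
// the stream tells you that you need to add 24.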

There are a few ways you can choose to handle this:

  1. Derive a child class from MemoryStream that does nothing except mimic the underlying constructors, store the offset itself, and make it available via a property (a sketch of this appears below).
  2. Pass down the offset separately. Works, but feels very annoying.
  3. Instead of using MemoryStream, use an ArraySegment<byte> and wrap that (or parts of it) as needed in a MemoryStream.

Probably the cleanest option is #1. If you can’t do that, convert the deserialization methods to take ArraySegment<byte>, which wraps a buffer, an offset, and a length into a cheap struct, which you can then pass to all deserialization functions. If you need a Stream, you can easily construct it from the segment:

byte[] buffer = …
var segment = new ArraySegment<byte>(buffer, 24, buffer.Length - 24);
var stream = new MemoryStream(segment.Array, segment.Offset, segment.Count);

Essentially, make your master buffer source the ArraySegment, not the Stream.
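
Here is a minimal sketch of what option #1 might look like, assuming you only need one of the wrapping constructors (mirror whichever others you use). OffsetMemoryStream is my own hypothetical name:

class OffsetMemoryStream : MemoryStream
{
    public OffsetMemoryStream(byte[] buffer, int index, int count,
                              bool writable, bool publiclyVisible)
        : base(buffer, index, count, writable, publiclyVisible)
    {
        this.Origin = index; // remember the offset that MemoryStream hides
    }

    // The index into the array returned by GetBuffer() that corresponds
    // to stream position 0.
    public int Origin { get; private set; }
}

With this, a parser that needs the raw bytes can call GetBuffer() and index from stream.Origin + stream.Position.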

If you find this kind of tip useful, you will love my book Writing High-Performance .NET Code, which will teach you hundreds of optimizations like this as well as the fundamentals of .NET performance engineering.



50 Reasons You Should Be Using Bing

I frequently get asked by family, friends, and acquaintances why they should use Bing over our competitors. This post is a comprehensive answer to that question, as much as it can be.

Note that I’m hardly an unbiased observer. I work for Bing, but I work deep in the layers that do the bulk of query serving, not on user-facing features. If you want a technical peek at the kind of work I typically do, you can read my book about Writing High-Performance .NET Code.

This post isn’t about that. It’s about all of the things I love about Bing. I haven’t been asked to write this. I’m doing it completely on my own, without the knowledge of people whose job it typically is to write about this kind of stuff. But I don’t see many blogs talking about these things consistently, or organizing them into a coherent list. My hope is that this will be that list, updated over time.

So standard disclaimer applies: This blog post is my opinion and may not represent those of my employer.

Second disclaimer: I will not claim that all of the things I list are unique to Bing. Some are, some aren’t, but taken together, they add up to an impressive whole that I believe is better overall.

Third disclaimer: new features come online all the time, and sometimes features disappear. Tweaks are always being made. Some of the described features may work differently in your market, or may change in the future.

To try these examples out for yourself, you can click on most of the images below to go to the actual results page on Bing.com.

On to the list:

1. Search Engine Result Quality

Bing’s results are every bit as relevant as Google’s and often more so. While relevance is a science that does have objective measures, I do not know of any publicly available reports that compare Bing and Google with any degree of scientific precision. However, reputable sources such as SearchEngineLand have weighed in and found Bing superior in many areas.

Not too surprisingly, there was not a massive disparity in the results of my little test. In fact, Bing came out on top. Some queries performed very differently than others; for example, Bing was able to tell that my query for “Attorney Tom Brady” was looking for an attorney and not the pictures of the hunky Patriots quarterback served up by Google.

Bing also did well with date nuances, unlike Google…

For the layperson, the relevance of a search engine is often personal and subjective. I enthusiastically recommend the BingItOn challenge that allows you to perform brand-blind search comparisons.

image

Bing certainly did not start out on an equal footing. Even when I started working at Bing over 6 years ago (before it was Bing), I would sometimes have to jump to Google to find something a little less obvious. The only reason I visit Google now is to verify the appearance of my blog and websites in their index.

2. The Home Page Photo

This was the killer feature that “launched” Bing in the minds of many people. Not just any photo, but exceptional works of art from Getty, 500px, and more. (At one time, photo submissions were accepted from Bing employees. I should see if that’s still possible…)

image

These photos showcase the beauty of our world and culture. Localized versions of Bing.com often have different photos on the same day, giving a meaningful interaction to users all over the world.

You can also view pictures from previous days.

3. Dive Into the Photo to Explore the World

In the bottom-right of the photo, there is an Info button that will take you to a search result page with more information about the subject of the photo.

image

In addition, as you move the mouse over the image, four different “hotspots,” marked with semi-transparent squares, are highlighted. Clicking on them takes you to search results pages with more information about related topics, such as the location, similar topics, and more.

image

4. Use the Photo as your Windows Desktop Wallpaper

image

Click on the Download button in the lower-right corner to download the image to your computer for use as your wallpaper. A light Bing watermark will be embedded.

5. See Every Amazing Photo from the last 5 Years

Just visit the Bing Homepage Gallery and view every photo in one list, or filter them to try to find your favorites.

image

6. Bing Desktop Brings it all to your Windows Desktop

Bing Desktop takes that gorgeous photo and automatically applies it as your Windows Desktop wallpaper image each day. From the floating toolbar or the task bar, it provides instant access to searching Bing or your entire computer, including inside documents. Bing Desktop also shows you feeds of news, photos, videos, weather, and your Facebook news feed. All of this is configurable. You could use it just to change the wallpaper if you like.

image

7. More Attractive Results Page

Beauty is in the eye of the beholder, but I don’t know if anyone can look at Google’s search results page and claim they’re attractive, especially when laid next to Bing’s. It’s true for nearly all results, but especially true for product searches, technical queries, and many other structured-data type results.

Here is one for a camera. Bing breaks out a bunch of information and review sources on the right, which is already a great head start, but even in the normal web results in the main part of the page, Bing has an edge. The titles, links, and subheadings with ratings all look better.

image

image

Try it out on a bunch of different types of queries. Bing just looks better, in addition to giving great results.

8. Freedom to Innovate

Bing tries a lot of experiments, throws a lot of new features out there to see what works and what doesn’t. We have that freedom to try all sorts of new things that more established players may not enjoy. Bing can change its look and functionality drastically over time to attract new types of users. You will see a lot of new things on Bing if you start paying attention. As a challenger, we are less beholden to advertisers than other search engines.

9. News Carousel

Alongside the home page photo is a carousel of topics, including top news (customizable to topics you are interested in), weather, trending searches, and more. This list is related to your Cortana entries as well (more on Cortana later).

image

You can collapse this bar in two stages. The first click will remove pictures and reduce the number of headlines.

image

A second click will remove all traces of it, other than the button bringing it back.

image

10. Image Search

Bing’s image search functionality is unparalleled. It presents related searches as well as a ton of filtering mechanisms in an attractive, compact grid that maximizes the screen real estate so you can find what you need faster. Bing also removes duplicates from this list so it doesn’t end up being just a long selection of the exact same image.

image

The filtering mechanism even includes niceties like license type:

image

If you’re searching for people, then you can filter by pose as well.

image

Clicking on an image brings up a pop-up with the larger version of the image. This screen allows you to view the image directly, visit the page it came from, find similar images, Pin it to Pinterest, among other things. Along the bottom is a carousel of the original image search results.

image

11. Video Search

Like image search, video search is far better than the competition. Holding the mouse over a thumbnail results in a playing preview with sound.

image

Clicking on a video brings up a larger preview version with a similar layout as the image search.

Like images, the list of videos also has duplicates removed. While YouTube may command the lion’s share of video these days, Bing will show videos from all over the web.

12. It Does Math For You

Yeah, it will do 4+4 for you, and then bring up an interactive calculator:

image

Big. Deal.

However, it can do advanced math too! It can also solve quadratics and other equations:

image

Handles imaginary numbers to boot. Ok, that’s pretty cool. Does the competition do this? No.

13. Unit Conversion

Yes, it will convert all sorts of lengths, areas, volumes, temperatures, and more, even ridiculous ones:

image

14. Currency Conversion

When I look at my book royalties in Amazon, it displays them in the native currency rather than what I actually care about: good ol’ USD. Bing to the constant rescue:

image

More natural phrasing also works, such as “100 euros in dollars”

15. Finds the Cheapest Flights For You

Searching for: “seattle to los angeles flights” yields this little widget:

image

Update your details and click Find flights and you get:

image

16. Get At-a-Glance University Information

The most useful information about universities is displayed for you right on the results page, including the mailing address, national ranking, enrollment, acceptance rates, tuition, and more, with links to more information.

image

17. Find Online Courses

This is one of the coolest features. Notice the online courses list in the previous image? Those are free, online courses offered by the school. Clicking on them takes you directly to the course page where you can sign up.

18. Great Personality and Celebrity Results

You can get a great summary and portal to more information for many, many people. It’s not just limited to the usual actors and recording artists. Athletes, authors, and more are included too, such as one of my favorite authors, the late, great Robert Jordan (AKA James Oliver Rigney, Jr.):

image

For singers, you can sample some of their tracks right from Bing:

image

Clicking on a song does another search in Bing which gives more information about the song itself, including lyrics, and links to retailers to purchase the song.

19. Song and Lyric Information

Searching for a song title will give you a sidebar similar to that for artists:

image

Searching for lyrics specifically will show those as well:

image

20. Great for Finding “Normals” Too

I don’t have the most popular name (at least in the U.S. – I suspect it’s more common in the U.K.), but if I do a search for it, I get this:

image

That’s…not me.

But if I qualify my name with my job, “Ben Watson Microsoft Bing”, then I get something about me, admittedly not much (hey, I’m not that famous!):

image

Clicking on my name brings up results including this blog and LinkedIn profile.

21. Find Product Buying Guides

This is probably one of the most popular types of searches. I research things both small and large and while there isn’t always a card for each item, often there is.

Try the search “vacuum cleaner recommendations”:

image

Or better, try “Windows Phone reviews” and you get a similar sidebar:

image

If you click on a manufacturer, it brings up a carousel for phones of that brand:

image

Clicking on those in turn brings up web search results for each one.

22. The Best Search Engine for Programmers

I have so far abstained from showing many Bing vs. Google head-to-head comparisons, but I just have to for this one. The query is “GC.WaitForPendingFinalizers”, a .NET Framework method.

Here are the first two results of Bing on the left, compared to Google on the right:

image

Bing has a much more attractive and useful layout, links to various .NET versions, and a code sample! For the StackOverflow result, it shows related questions grouped under the most relevant question it found.

23. Time Around the World

My family is spread out throughout the world, from all over the US to Europe. I can generally figure out what time it is on the East coast, but what about my family in Arizona, where they don’t have Daylight Saving Time—are they in the same time zone as me right now or not? What time is it in Sweden?

image

24. Package Tracking

Just copy & paste the tracking number from your product order into Bing, and you’ve got a link directly to the carrier’s tracking page:

image

(The tracking number in that screen shot has been censored by me, and there is no link to a live results page.)

25. Search Within the Site, from Bing

Many web sites have search functions built-in to them. You can take advantage of these directly from Bing. For example, search for the popular book social media site Goodreads.com:

image

If you then type something into that search box and click “Search” you will be taken to a results page on that website directly. Not a huge feature, but it saves you a few clicks.

26. Recipes

You can get top-rated recipes directly in Bing, with enough of a preview to know if you want to read more details.

Here’s a search for “chicken parmesan”:

image

27. Nutritional Information

Highlighted recipes will have nutrition information, but so will plain foods, such as “pork tenderloin”:

image

28. Local Searches

The go-to query for this is “pizza”, but around here pho is nearly as important (to me, anyway). If you search for “pho Redmond” you get a carousel showing the top restaurants. This carousel interacts with the map of the local area:

image

There is an alternate format that shows up for things without pictures, such as piano stores:

image

But some topics just have so much information about them. Revisiting “pizza”, these results will include the restaurant carousel, nutrition facts, a map, images, and more.

29. City Information

If you do a search for a city, you will get images, some top links for tourist information, and a sidebar containing a map, facts, the current weather, points of interest, and even live webcams!

I lived in Rome for a year, and loved it. There is enough there to fill a month of sight-seeing and still not cover nearly everything. Bing conveys a bit of that:

image

30. Advanced Search Keywords

Sure, Bing tries to guess what you mean just from the words and phrases you type, but there are ambiguous scenarios that require some more finesse. The ones I use most often are “” (quotes), + (must have), and - (must not have). For example:

“Ben Watson” +Bing -football

This indicates that the phrase “Ben Watson” must be included, Bing must be included, and football must NOT be present.

Start with these advanced search options, and then move on to some more advanced keywords that give you even more power, like:

.NET ext:docx

This will find documents ending with the docx extension that contain the word .NET.

There is also site:, which restricts results to pages from a specific site, and language:, which limits results to a particular language.

31. Bing Maps and Bird’s Eye View

Bing Maps by itself is great. It provides all the standard features you expect: directions, live traffic, accident notification, satellite imagery, and more.

But the really cool thing is Bird’s Eye View, which offers an up-close, detailed, isometric view of an area. Check out this shot:

image

You’ll get that view automatically if you keep zooming in, but you can switch to a standard aerial view as well.

32. Mall Maps

Search for a mall near you and see if it has this information. Here’s one from Tysons Corner Center in Virginia:

image

Now click on Mall Map, which will take you to Bing Maps, but show you a map of the inside of the mall!

image

You can even switch levels:

image

33. Airport Maps

The same inside map feature exists for airports too. Here’s a detail of SeaTac:

image

You can see individual carrier counters, escalators, kiosks, stores, and more.

34. Venue Maps

I think you get the idea…

image

35. Answers Demographic Questions

image

36. Awesome Special Pages

Did you see the Halloween Bing page from 2013? No? I could not find a way to access it now, but someone did put a video of it up:

Bing’s 2013 Halloween home page was interactive

37. Predictions

Bing analyzes historical trends, expert analysis, and popular opinion to predict the outcomes of all sorts of events, including a near-perfect record for the World Cup. Bing can also predict the outcomes of NFL games, Dancing with the Stars, and more.

image

image

To see all of the topics Bing can predict (with more coming soon), head over to http://www.bing.com/explore/predicts.

38. Election Results and Predictions

Bing has had real-time election results for a while, but new this time are predictions. Check it out at http://www.bing.com/elections (or just search for elections). It breaks down results and predictions state-by-state, showing elections for the House, Senate, and Governorship.

Bing Elections

39. Find Seats to Local Events

Go to http://www.bing.com/events to find a list of major events in your area. You can filter by type of event (Music, Sports, etc.), city, date, and distance.

You can even submit your own events!

image

40. Language Translation

Bing can translate text for you. For example, I typed the query “translate thank you to italian” and it resulted in:

image

You can also go to http://www.bing.com/translator for more control over what you want translated.

Fun fact: Translation on Facebook is done via Bing:

image

41. Provide feedback on the current query

On the result page, at the bottom, you can provide Feedback about the current query.

image

This goes into a database that gets analyzed to suggest improvements for that query, or the system as a whole.

42. Links to Libraries

When you search for a book, you get a great sidebar, similar to that for songs or artists. Besides the usual links to buy the book, you also get a link to your local library to borrow an eBook version.

image

Other interesting tidbits it gives you are the reading level and other books in the series.

43. Integration Into Windows

Performing a search on your Windows 8 computer shows results from your local computer as well as Bing, all in a seamless, integrated interface. It’s particularly effective on a Surface, with the touch interface.

44. Bing From Xbox One

You can use Bing without a keyboard. Without a mouse. Without a controller. Just your voice! If you have an Xbox One, you can do searches using voice commands, with natural, plain language, e.g., “Show me comedies starring Leslie Nielsen.”

Check out some examples and try them for yourself.

45. Bing is in Office

Bing is integrated into Microsoft Office. You can add Apps into Office that utilize Bing. You do this from the Insert tab on the Ribbon, under My Apps:

image

You can also insert images into your document, directly from Bing (via Insert | Online Pictures):

image

46. Cortana

image

Cortana is awesome. You can have her record notes for you, remind you to do something at a specific time, search for something, even tell you a corny joke or sing a song.

The best part is that Cortana is powered by Bing and learns from your interests.

If you have Windows Phone, then you should definitely take some time to learn about how you can interact with Cortana.

(Sidebar: Do you wonder who Cortana really is?)

47. Bing Rewards

You can get points for each query you do, for specific tasks, for inviting friends, and more. With the points, you can redeem small prizes. I usually get enough Amazon gift certificates to get something decent every few months.

Go to http://www.bing.com/rewards to sign up, view your status, or redeem the points. Some of the gifts:

  • Xbox Live Gold Membership
  • Xbox Music Pass
  • Amazon gift cards
  • Windows Store gift cards
  • Skype credits
  • OneDrive storage
  • Ad-free Outlook.com
  • Flowers
  • Restaurants
  • Movie tickets

By itself, the program isn’t going to change your life significantly, but it’s a nice little perk.

48. Bing is the Portal for your Microsoft Services

Above the photo, there is a bar with links to the most popular Microsoft destinations, including MSN, Outlook.Com, and Office Online.

image137[2]

49. High-Quality Partners

Microsoft has partnerships with many, many companies to ingest structured data from all over the world in many contexts. Some of the most obvious are Yelp and Trip Advisor. We also have partnerships with Twitter, Apple, Facebook, and many more.

50. Bing is the Portal to Your Day and the World

Yes, Bing is a search engine, and a GREAT one at that, but as demonstrated throughout this entire article, it does a lot more. By organizing and presenting the information in an attractive format, it aspires to be a lot more than just 10 blue links. You can learn and grow, find your information faster, explore related topics, answer your questions, and solve more problems. Bing is for people who want to experience the world.

It looks better, it performs better, it is awesome.



Appearance on .NET Rocks! Podcast

Carl and Richard put together a great podcast. .NET Rocks! has existed for years now and it’s amazing how many episodes they’ve published.

A couple of weeks ago, I had the privilege of recording their latest episode with them, #1041. We talked about a ton of interesting things like the importance of memory management, precise measurement, using the correct tools, not being afraid of the debugger, a little bit about Microsoft culture, and even LEGO!

Here is their description:

Carl and Richard talk to Ben Watson about his work around writing high performance .NET code. Ben talks about how the Bing team decided to use .NET code internally, which seems like an obvious choice for a Microsoft group, but it isn’t really – when milliseconds count, does .NET make sense? Ben says it does, and he’s done the work to prove it. Ben’s book “Writing High Performance .NET Code” focuses not only on coding techniques, but also the larger practice of having a deep understanding of how .NET works, and the processes that take place to turn .NET code into machine code. The conversation also digs deeply into the need for performance measurement, especially Event Tracing for Windows. .NET can be fast when you do it right!

Give a listen. Subscribe in iTunes or listen on the web. Let me know what you think!



Digging Into .NET Object Allocation Fundamentals

[Note: this article also appeared on CodeProject]

Introduction

While understanding garbage collection fundamentals is vital to working with .NET, it is also important to understand how object allocation works. Examining it shows you just how simple and performant it is, especially compared to the potentially blocking nature of native heap allocations. In a large, native, multi-threaded application, heap allocations can be a major performance bottleneck which requires you to perform all sorts of custom heap management techniques. It’s also harder to measure when this is happening because many of those details are hidden behind the OS’s allocation APIs. More importantly, understanding this will give you clues to how you can mess up and make object allocation far less efficient.

In this article, I want to go through an example taken from Chapter 2 of Writing High-Performance .NET Code and then take it further with some additional examples that weren’t covered in the book.

Viewing Object Allocation in a Debugger

Let’s start with a simple object definition: completely empty.

class MyObject 
{
}

static void Main(string[] args)
{
    var x = new MyObject();
}

In order to examine what happens during allocation, we need to use a “real” debugger, like Windbg. Don’t be afraid of this. If you need a quick primer on how to get started, look at the free sample chapter on this page, which will get you up and running in no time. It’s not nearly as bad as you think.

Build the above program in Release mode for x86 (you can do x64 if you’d like, but the samples below are x86).

In Windbg, follow these steps to start and debug the program:

  1. Ctrl+E to execute a program. Navigate to and open the built executable file.
  2. Run command: sxe ld clrjit (this tells the debugger to break on loading any assembly with clrjit in the name, which you need loaded before the next steps)
  3. Run command: g (continues execution)
  4. When it breaks, run command: .loadby sos clr (loads .NET debugging tools)
  5. Run command: !bpmd ObjectAllocationFundamentals Program.Main (Sets a breakpoint at the beginning of a method. The first argument is the name of the assembly. The second is the name of the method, including the class it is in.)
  6. Run command: g

Execution will break at the beginning of the Main method, right before new() is called. Open the Disassembly window to see the code.

Here is the Main method’s code, annotated for clarity:

; Copy method table pointer for the class into
; ecx as argument to new()
; You can use !dumpmt to examine this value.
mov ecx,006f3864h
; Call new
call 006e2100 
; Copy return value (address of object) into a register
mov edi,eax

Note that the actual addresses will be different each time you execute the program. Step over (F10, or toolbar) a few times until call 006e2100 (or your equivalent) is highlighted. Then Step Into that (F11). Now you will see the primary allocation mechanism in .NET. It’s extremely simple. Essentially, at the end of the current gen0 segment, there is a reserved bit of space which I will call the allocation buffer. If the allocation we’re attempting can fit in there, we can update a couple of values and return immediately without more complicated work.

If I were to outline this in pseudocode, it would look like this:

if (object fits in current allocation buffer)
{
   Increment a pointer, return address;
}
else
{
   call JIT_New to do more complicated work in CLR
}

The actual assembly looks like this:

; Set eax to value 0x0c, the size of the object to
; allocate, which comes from the method table
006e2100 8b4104          mov     eax,dword ptr [ecx+4] ds:002b:006f3868=0000000c
; Put allocation buffer information into edx
006e2103 648b15300e0000  mov     edx,dword ptr fs:[0E30h]
; edx+40 contains the address of the next available byte
; for allocation. Add that value to the desired size.
006e210a 034240          add     eax,dword ptr [edx+40h]
; Compare the intended allocation against the
; end of the allocation buffer.
006e210d 3b4244          cmp     eax,dword ptr [edx+44h]
; If we spill over the allocation buffer,
; jump to the slow path
006e2110 7709            ja      006e211b
; update the pointer to the next free
; byte (0x0c bytes past old value)
006e2112 894240          mov     dword ptr [edx+40h],eax
; Subtract the object size from the pointer to
; get to the start of the new obj
006e2115 2b4104          sub     eax,dword ptr [ecx+4]
; Put the method table pointer into the
; first 4 bytes of the object.
; eax now points to new object
006e2118 8908            mov     dword ptr [eax],ecx
; Return to caller
006e211a c3              ret
; Slow Path - call into CLR method
006e211b e914145f71      jmp     clr!JIT_New (71cd3534)

In the fast path, there are only 9 instructions, including the return. That’s incredibly efficient, especially compared to something like malloc. Yes, that complexity is traded for time at the end of object lifetime, but so far, this is looking pretty good!

What happens in the slow path? The short answer is a lot. The following could all happen:

  • A free slot somewhere in gen0 needs to be located
  • A gen0 GC is triggered
  • A full GC is triggered
  • A new memory segment needs to be allocated from the operating system and assigned to the GC heap
  • Objects with finalizers need extra bookkeeping
  • Possibly more…

Another thing to notice is the size of the object: 0x0c (12 decimal) bytes. As covered elsewhere, this is the minimum size for an object in a 32-bit process, even if there are no fields.

Now let’s do the same experiment with an object that has a single int field.

class MyObjectWithInt { int x; }

Follow the same steps as above to get into the allocation code.

The first line of the allocator on my run is:

00882100 8b4104          mov     eax,dword ptr [ecx+4] ds:002b:00893874=0000000c

The only interesting thing is that the size of the object (0x0c) is exactly the same as before. The new int field fit into the minimum size. You can see this by examining the object with the !DumpObject command (or the abbreviated version: !do). To get the address of the object after it has been allocated, step over instructions until you get to the ret instruction. The address of the object is now in the eax register, so open up the Registers view and see the value. On my computer, it has a value of 2372770. Now execute the command: !do 2372770

You should see similar output to this:

0:000> !do 2372770
Name:        ConsoleApplication1.MyObjectWithInt
MethodTable: 00893870
EEClass:     008913dc
Size:        12(0xc) bytes
File:        D:\Ben\My Documents\Visual Studio 2013\Projects\ConsoleApplication1\ConsoleApplication1\bin\Release\ConsoleApplication1.exe
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
70f63b04  4000001        4         System.Int32  1 instance        0 x

This is curious. The field is at offset 4 (and an int has a length of 4), so that only accounts for 8 bytes (range 0-7). Offset 0 (i.e., the object’s address) contains the method table pointer, so where are the other 4 bytes? They are the sync block, which actually lives at offset -4, just before the object’s address. Those three pieces together account for the 12 bytes.
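
Putting that together, the 32-bit layout of MyObjectWithInt looks like this (offsets relative to the address that the object reference points to):

; offset -4: sync block index      (4 bytes)
; offset  0: method table pointer  (4 bytes) <-- the reference points here
; offset +4: int x                 (4 bytes)
; Total: 12 (0xc) bytes, matching the size reported by !do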

Try it with a long.

class MyObjectWithLong { long x; }

The first line of the allocator is now:

00f22100 8b4104          mov     eax,dword ptr [ecx+4] ds:002b:00f33874=00000010

That shows a size of 0x10 (16 decimal) bytes, which is what we would expect now: the 12-byte minimum object size already includes 4 bytes of field space, so the 8-byte long requires an extra 4 bytes. An examination of the allocated object shows an object size of 16 bytes as well.

0:000> !do 2932770
Name:        ConsoleApplication1.MyObjectWithLong
MethodTable: 00f33870
EEClass:     00f313dc
Size:        16(0x10) bytes
File:        D:\Ben\My Documents\Visual Studio 2013\Projects\ConsoleApplication1\ConsoleApplication1\bin\Release\ConsoleApplication1.exe
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
70f5b524  4000002        4         System.Int64  1 instance 0 x

If you put an object reference into the test class, you’ll see the same thing as you did with the int.

Finalizers

Now let’s make it more interesting. What happens if the object has a finalizer? You may have heard that objects with finalizers have more overhead during GC. This is true–they will survive longer, require more CPU cycles, and generally cause things to be less efficient. But do finalizers also affect object allocation?

Recall that our Main method above looked like this:

mov ecx,006f3864h
call 006e2100 
mov edi,eax

If the object has a finalizer, however, it looks like this:

mov     ecx,119386Ch
call    clr!JIT_New (71cd3534)
mov     esi,eax

We’ve lost our nifty allocation helper! We now have to jump directly to JIT_New. Allocating an object that has a finalizer is a LOT slower than allocating a normal object. More internal CLR structures need to be modified to track this object’s lifetime. The cost isn’t just at the end of the object’s lifetime.

How much slower is it? In my own testing, it appears to be about 8-10x worse than the fast path of allocating a normal object. If you allocate a lot of objects, this difference is considerable. For this, and other reasons, just don’t add a finalizer unless it really is required.
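
If you want to verify the difference yourself, a rough harness like the following will show it. This is my own sketch, not a sample from the book, and the exact numbers will vary by machine and GC activity:

using System;
using System.Diagnostics;

class Plain { }
class Finalizable { ~Finalizable() { } }

class AllocationCost
{
    static void Main()
    {
        const int Count = 10000000;

        // The JIT does not eliminate unused allocations, so every
        // iteration really does hit the allocator.
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < Count; i++) { new Plain(); }
        sw.Stop();
        Console.WriteLine("Plain:       {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < Count; i++) { new Finalizable(); }
        sw.Stop();
        Console.WriteLine("Finalizable: {0} ms", sw.ElapsedMilliseconds);
    }
}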

Calling the Constructor

If you are particularly eagle-eyed, you may have noticed that there was no call to a constructor to initialize the object once allocated. The allocator is changing some pointers, returning you an object, and there is no further function call on that object. This is because memory that belongs to a class field is always pre-initialized to 0 for you and these objects had no further initialization requirements. Let’s see what happens if we change to the following definition:

class MyObjectWithInt { int x = 13; }

Now the Main function looks like this:

mov     ecx,0A43834h
; Allocate memory
call    00a32100
; Copy object address to esi
mov     esi,eax
; Set object + 4 to value 0x0D (13 decimal)
mov     dword ptr [esi+4],0Dh

The field initialization was inlined into the caller!

Note that this code is exactly equivalent:

class MyObjectWithInt { int x; public MyObjectWithInt() { this.x = 13; } }

But what if we do this?

class MyObjectWithInt 
{ 
    int x; 

    [MethodImpl(MethodImplOptions.NoInlining)]  
    public MyObjectWithInt() 
    { 
        this.x = 13; 
    } 
}

This explicitly disables inlining for the object constructor. There are other ways of preventing inlining, but this is the most direct.

Now we can see the call to the constructor happening after the memory allocation:

mov     ecx,0F43834h
call    00f32100
mov     esi,eax
mov     ecx,esi
call    dword ptr ds:[0F43854h]

Exercise for the Reader

Can you get the allocator shown above to jump to the slow path? How big does the allocation request have to be to trigger this? (Hint: Try allocating arrays of various sizes.) Can you figure this out by examining the registers and other values from the running code?
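
If you want a starting point for that experiment, something like this works (a hint, not a full answer):

// Step into the allocation call for each of these sizes and watch
// which ones take the ja branch to the slow path. As one data point,
// arrays of 85,000 bytes or more go on the large object heap and
// skip the fast path entirely.
for (int size = 16; size <= 256 * 1024; size *= 2)
{
    byte[] array = new byte[size]; // set a breakpoint here
}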

Summary

You can see that in most cases, allocation of objects in .NET is extremely fast and efficient, requiring no calls into the CLR and no complicated algorithms in the simple case. Avoid finalizers unless absolutely needed. Not only are they less efficient during cleanup in a garbage collection, but they are slower to allocate as well.

Play around with the sample code in the debugger to get a feel for this yourself. If you wish to learn more about .NET memory handling, especially garbage collection, take a look at the book Writing High-Performance .NET Code.



iTunes 11.4 not syncing/refreshing podcasts? How I resolved it

In general, I try to avoid Apple products when I can, but I do use and enjoy an iPod Nano for podcasts.

With the recent update to 11.4 I noticed that my podcasts were not refreshing, either on a schedule or on-demand. I tried restarting iTunes, unplugging the iPod, restarting the computer – nothing. Until I looked in the settings.

Look at this setting:

(Windows version: Edit | Preferences | Store)

image

Uncheck the highlighted setting. I’m not sure how this feature is supposed to work, but once disabled, podcasts started refreshing correctly again.



5 More Attributes of Highly Effective Programmers

Nearly 7 years ago, I wrote a little article called The Top 5 Attributes of Highly Effective Programmers that got some good feedback and has proven popular over time.

One matures as a developer, of course. I wrote that last article quite close to the beginning of my career. Over the last few years, especially at Microsoft, I’ve had the opportunity to witness a much wider range of behaviors. I’ve been able to develop a much better sense of what differentiates the novice from the truly effective developer.

The difference in skills can be truly staggering if you’re not used to seeing it. A new programmer, or one who has not learned much from experience, can often be an order of magnitude or more less productive than a good, experienced developer. You don’t want to spend very long at the bottom of this kind of ranking. Some of this is just experience, but in many cases it’s just a mindset–there are plenty of “experienced” developers who haven’t actually learned to improve. It’s true in many professions, but especially so in programming–you can’t plateau. You have to keep learning. The world changes, programming changes, and what was true 10 years ago is laughably outdated.

The attributes I listed in the previous article are still applicable. They are still valuable, but there is more. Note that I am not claiming in this article that I’ve mastered these. I still aspire to meet higher standards in each of these areas. Remember that it is not hypocrisy to espouse good ideas, even while struggling to live up to them. These are standards to live up to, not descriptions of any one person I know (though I do know plenty of people who are solid in at least one of these areas).

Sense of Ownership

Ownership means a lot of things, but mainly that you don’t wait for problems to find you. It means that if you see a problem, you assume it’s your job to either fix it or find someone else who can, and then to make sure it happens. It means not ignoring emails because, hey, not my problem! It means taking issues seriously and making sure they are dealt with. Someone with a sense of ownership would never sweep a problem under the rug or blithely hope that someone else will deal with it.

You could equate ownership with responsibility, but I think it goes beyond that. “Responsibility” often takes on the hue of a burden or delegation of an unwelcome task, while “ownership” implies that you are invested in the outcome.

Ownership often means stepping outside of your comfort zone. You may think you’re not the best person to deal with something, but if no one else is doing it, then you absolutely are. Just step up, own the problem, and get it done.

Ownership does not mean that you do all the work–that would be draining, debilitating, and ultimately impossible. It does not mean that you specify bounds for your responsibility and forbid others to encroach. It especially does not mean code ownership in the sense that only you are allowed to change your code.

Ownership is a mentality that defies strict hierarchies of control in favor of a more egalitarian opportunism.

Closely related to the idea of ownership is taking responsibility for your mistakes. This means you don’t try to excuse yourself, shift blame, or minimize the issue unnecessarily. If there’s a problem you caused, be straight about it, explain what happened, what you’re going to do to prevent it, and move on.

Together, these ideas on ownership will gain you a reputation as someone who wants the best for the team or product. You want to be that person.

Remember, if you are ever having the thought, Someone ought to…–stop! That someone is you.

Data-Driven

A good developer does not make assumptions. Experience is good, yes, but data is better. Far, far better. Knowing how to measure things is far more important than being able to change them. If you make changes without measuring, then you’re just a random-coding monkey, guessing that you’re doing something useful. Especially when it comes to performance, building a system to automatically measure performance is more important than the performance changes themselves. This is because if you don’t have that system, you will spend far more time on manual measurement than on actual development. See the section on Automation below.

Measurement can be simple. For some bugs, the measurement is merely, does the bug repro or not? For performance tuning of data center server applications, it will likely be orders of magnitude more complicated and involve systems dedicated to measurement.
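To make the simple end of that spectrum concrete, here is a minimal sketch using Stopwatch. The method being timed is a placeholder for whatever operation you care about:

using System;
using System.Diagnostics;

static class MeasurementExample
{
    static void Main()
    {
        var watch = Stopwatch.StartNew();

        DoTheOperationYouCareAbout(); // hypothetical; substitute the real work

        watch.Stop();
        Console.WriteLine("Elapsed: {0:F2} ms", watch.Elapsed.TotalMilliseconds);
    }

    static void DoTheOperationYouCareAbout()
    {
        // Stand-in for real work.
        System.Threading.Thread.Sleep(100);
    }
}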

Determining the right amount of data to make a decision is not always easy. You do have to balance this with expediency, and you don’t want to hold good ideas hostage to more measurement than necessary. However, there is very little that you should do completely blind, with no data at all. As a developer, your every action should be independently justifiable.

The mantra of performance optimization is Measure, Measure, Measure. This should be the mantra of all software development. Are things improving or not? Faster or not? How much? Are customers happier or not? Can tasks be completed easier? Are we saving more money? Does it use less memory? Is our capacity larger? Is the UI more responsive? How much, exactly?

The degree to which you measure the answer to those questions is in large part dependent on how important it is to your bottom line.

My day job involves working on an application that runs on thousands of servers, powering a large part of Bing. With something like this, even seemingly small decisions can have a drastic effect in the end. If I make something a bit more inefficient, it could translate into us needing to buy more machines. Great, now my little coding change that I didn’t adequately measure is costing the company hundreds of thousands of extra dollars per year. Oops.

Even for smaller applications, this can be a big deal. For example, a change that makes the UI 20% more sluggish in some cases may go unnoticed if you don’t have adequate measurement in place. But if it leads to a bad review from someone who did notice, and there are capable competitors waiting, that one decision could cost you significant revenue.

Solid Tests

Notice that I don’t say “tests”, unqualified. Good tests, solid, repeatable tests. Those are the only ones worth having.

If you see a code change that doesn’t have accompanying test changes, don’t be afraid to ask the question, “Where are the tests?” The answer might be that existing tests cover the change, or that tests at a larger scope, or in a different change, will cover it. The point is to ask the question and make sure there is a satisfactory answer. “Manual test” is sometimes a valid response, but it should be rare, and justifiable.

I cannot say how many times I’ve been saved due to the hundreds of unit tests that exercise my code, especially when I’m attempting a big internal refactor, usually for performance reasons.

As important as good tests are, it’s also important to get rid of bad tests. Don’t waste resources on things that aren’t helpful. Insist on a clean, reliable test suite. I’m not sure which is worse: no tests, or tests you can’t rely on. Eventually, unreliable tests become the same as having no tests at all.

Automation

An effective developer is always trying to put themselves out of a job. Seriously. There is more work than you can possibly fit in the time allotted. Automate the heck out of the stuff that annoys you, trips you up, is repetitive, is frequent, is error-prone. Once you can break down a process into something so deterministic that you could write a script for someone else to follow and get the same result, then make sure that someone else is a program.

This is more than just simple maintenance scripts for server management. This is ANY part of your job. Collecting data? Get it automatically ingested into the systems that need it. Generating reports? If you’ve generated the same report more than twice, don’t do it a third time. Your build system requires more than a single step? What’s wrong with you?

You have to free yourself up for more interesting, more creative work. You’re a highly paid programmer. Act like it.

Example: One of my jobs in the last year has been to run regular performance profiling, analyze the results, and send them to my team, making suggestions for future focus. This involved a bunch of steps:

  1. Log onto a random machine in the datacenter.
  2. Start a 120-second CPU profile.
  3. Wait for 120-seconds plus a few minutes for processing, symbol resolution, etc.
  4. Compress the file and copy it to my machine.
  5. Analyze the file–group, filter, and sort the data according to various rules.
    1. Look for a bunch of standard things that I always report on.
  6. Do the same thing for a 900-second memory/exception/thread/etc. profile.

This took about an hour each time, sometimes more.

I realized that every single part of this could happen automatically. I wrote a service that gets deployed to every datacenter machine. A couple of times per day, it checks to see whether we need a profile, whether the machine is in a good state to profile, etc. It then runs the profiler, collects the data, and even analyzes the data automatically (see Chapter 8 of Writing High-Performance .NET Code for a hint about how I did this). This all gets uploaded to a file server, and the analysis gets displayed on a website. No intervention whatsoever. Not only do I not have to do this work myself anymore, but others are empowered to look at the data for themselves, and we can easily add more analysis components over time.

Unafraid of Communication

The final thing I want to talk about is communication. This has been a challenge for me. I definitely have the personality type that really likes to disappear into a cave and pound on a keyboard for a few days, to emerge at the end with some magical piece of code. I would delete Outlook from my computer if I could.

This kind of attitude might serve you well for a while, but it’s ultimately limiting.

As you get more senior, communication becomes key. Effective communication skills are one of the things you can use to distinguish yourself to advance your career.

Effective communication can begin with a simple acknowledgement of someone’s issue, or an explanation that you’re working on something, with a follow-up to everyone involved at regular points. Nobody likes to be kept in the dark, especially for burning issues. For time-critical issues, a “next update in XX hours” can be vital.

Effective communication also means being able to say what you’re working on and why it’s cool.

Eventually, it means a lot more–being able to present complicated ideas to many other people in a simple, understandable, logical way.

Good communication skills enable you to be able to move beyond implementing software all by yourself to helping teams as a whole do better software. You can have a much wider impact by helping and teaching others. This is good for your team, your company, and your career.

Do you have a good engineering culture?

I assume one big prerequisite to all of these attributes: You must have a solid engineering environment to operate in. If management gives short shrift to employee happiness or sound software engineering principles, or the workplace is otherwise toxic, then perhaps you need to focus on changing that first.

If your leaders are so short-sighted that they can’t stand the thought of you automating your work instead of just getting the job done, that’s a problem.

If bringing up problems or admitting fault for a mistake is a career-limiting move, then you need to get out soon. That’s a team that will eventually implode under the weight of cumulative failure that no one wants to address.

Don’t settle for this kind of workplace. Either work to change it or find some place better.



Check out my latest book, the essential, in-depth guide to performance for all .NET developers:

Writing High-Performance.NET Code by Ben Watson. Available now in print and as an eBook at:

Practical uses of WeakReference

In Part 1, I discussed the basics of WeakReference and WeakReference<T>. Part 2 introduced short and long weak references as well as the concept of resurrection. I also covered how to use the debugger to inspect your memory for the presence of weak references. This article will complete this miniseries with a discussion of when to use weak references at all and a small, practical example.

When to Use

Short answer: rarely. Most applications won’t require this.

Long answer: If all of the following criteria are met, then you may want to consider it:

  1. Memory use needs to be tightly restricted – this probably means mobile devices these days. If you’re running on Windows RT or Windows Phone, then your memory is restricted.
  2. Object lifetime is highly variable – if you can predict the lifetime of your objects well, then using WeakReference doesn’t really make sense. In that case, you should just control their lifetime directly.
  3. Objects are relatively large, but easy to create – WeakReference is really ideal for that large object that would be nice to have around, but if not, you could easily regenerate it as needed (or just do without).
  4. The object’s size is significantly more than the overhead of using WeakReference<T> – Using WeakReference<T> adds an additional object, which means more memory pressure and an extra dereference step. It would be a complete waste of time and memory to use WeakReference<T> to store an object that’s barely larger than WeakReference<T> itself. However, there are some caveats to this, below.

There is another scenario in which WeakReference may make sense. I call this the “secondary index” feature. Suppose you have an in-memory cache of objects, all indexed by some key. This could be as simple as Dictionary<string, Person>, for example. This is the primary index, and represents the most common lookup pattern, the master table, if you will.

However, you also want to look up these objects with another key, say a last name. Maybe you want a dozen other indexes. Using standard strong references, you could have additional indexes, such as Dictionary<DateTime, Person> for birthdays, etc. When it comes time to update the cache, you then have to modify all of these indexes to ensure that the Person object gets garbage collected when no longer needed.

This might be a pretty big performance hit to do this every time there is an update. Instead, you could spread that cost around by having all of the secondary indexes use WeakReference instead: Dictionary<DateTime, WeakReference<Person>>, or, if the index has non-unique keys (likely), Dictionary<DateTime, List<WeakReference<Person>>>.

By doing this, the cleanup process becomes much easier: you just update the master cache, which removes the only strong reference to the object. The next time a garbage collection runs (of the appropriate generation), the Person object will be cleaned up. If you ever access a secondary index looking for those objects, you’ll discover the object has been cleaned up, and you can clean up those indexes right then. This spreads out the cost of cleanup of the index overhead, while allowing the expensive cached objects to be cleaned up earlier.
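Here is a minimal sketch of the secondary-index idea. The Person class, its fields, and the birthday index are hypothetical stand-ins:

using System;
using System.Collections.Generic;

class Person
{
    public string Id;
    public DateTime Birthday;
    // ...presumably a large payload...
}

class PersonCache
{
    // Primary index: the only strong references to the cached objects.
    private readonly Dictionary<string, Person> byId = new Dictionary<string, Person>();

    // Secondary index: weak references, with non-unique keys.
    private readonly Dictionary<DateTime, List<WeakReference<Person>>> byBirthday =
        new Dictionary<DateTime, List<WeakReference<Person>>>();

    public void Add(Person person)
    {
        byId[person.Id] = person;

        List<WeakReference<Person>> list;
        if (!byBirthday.TryGetValue(person.Birthday, out list))
        {
            list = new List<WeakReference<Person>>();
            byBirthday[person.Birthday] = list;
        }
        list.Add(new WeakReference<Person>(person));
    }

    public void Remove(string id)
    {
        // Only the master table needs immediate cleanup. The secondary
        // entries die with the object and are pruned lazily on lookup.
        byId.Remove(id);
    }

    public List<Person> FindByBirthday(DateTime birthday)
    {
        var results = new List<Person>();
        List<WeakReference<Person>> list;
        if (byBirthday.TryGetValue(birthday, out list))
        {
            // Prune entries whose targets have been collected.
            list.RemoveAll(wr => { Person dead; return !wr.TryGetTarget(out dead); });

            foreach (var wr in list)
            {
                Person person;
                if (wr.TryGetTarget(out person))
                {
                    results.Add(person);
                }
            }
        }
        return results;
    }
}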

Other Uses

This Stack Overflow thread has some additional thoughts, with some variations of the example below and other uses.

A rather famous and involved example is using WeakReferences to prevent the dangling event handler problem (where failure to unregister an event handler keeps objects in memory, despite them having no explicit references anywhere in your code).
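To make the problem itself concrete, here is a minimal illustration (the types and sizes are hypothetical) of how a forgotten unsubscription keeps an object alive:

using System;

class Publisher
{
    public event EventHandler SomethingHappened;

    public void Raise()
    {
        var handler = SomethingHappened;
        if (handler != null)
        {
            handler(this, EventArgs.Empty);
        }
    }
}

class Subscriber
{
    // A large payload to make the leak noticeable.
    private readonly byte[] bigBuffer = new byte[10 * 1024 * 1024];

    public void Attach(Publisher publisher)
    {
        publisher.SomethingHappened += OnSomethingHappened;
        // Unless we later call
        //   publisher.SomethingHappened -= OnSomethingHappened;
        // the publisher's delegate holds a strong reference to this
        // Subscriber (and its buffer) for as long as the publisher lives.
    }

    private void OnSomethingHappened(object sender, EventArgs e)
    {
        Console.WriteLine(bigBuffer.Length);
    }
}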

Practical Example

I had mentioned in Chapter 2 (Garbage Collection) of Writing High-Performance .NET Code that WeakReference could be used in a multilevel caching system to allow objects to gracefully fall out of memory when pressure increases. You can start with strong references and then demote them to weak references according to some criteria you choose.

That is the example I’ll show here. Note that this is not production-quality code. It’s only about 5% of the code you would actually need, even assuming this algorithm makes sense in your scenario. At a minimum, you would probably want to implement IDictionary<TKey, TValue> on it, tighten up some of the temporary memory allocations, and more.

This is a very simple implementation. When you add items to the cache, it adds them as strong references (removing any existing weak references for that key). When you attempt to read a value from the cache, it tries the strong references first, before attempting the weak references.

Objects are demoted from strong to weak references based simply on a maximum age. This is admittedly rather simplistic, but it gets the point across.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;

namespace WeakReferenceCache
{
    sealed class HybridCache<TKey, TValue>
        where TValue : class
    {
        class ValueContainer<T>
        {
            public T value;
            public long additionTime;
            public long demoteTime;
        }

        private readonly TimeSpan maxAgeBeforeDemotion;

        // Recently-added items are held strongly; demoted items are held weakly.
        private readonly ConcurrentDictionary<TKey, ValueContainer<TValue>> strongReferences =
            new ConcurrentDictionary<TKey, ValueContainer<TValue>>();
        private readonly ConcurrentDictionary<TKey, WeakReference<ValueContainer<TValue>>> weakReferences =
            new ConcurrentDictionary<TKey, WeakReference<ValueContainer<TValue>>>();

        public int Count { get { return this.strongReferences.Count; } }
        public int WeakCount { get { return this.weakReferences.Count; } }

        public HybridCache(TimeSpan maxAgeBeforeDemotion)
        {
            this.maxAgeBeforeDemotion = maxAgeBeforeDemotion;
        }

        public void Add(TKey key, TValue value)
        {
            // A fresh add always starts life as a strong reference.
            RemoveFromWeak(key);
            var container = new ValueContainer<TValue>();
            container.value = value;
            container.additionTime = Stopwatch.GetTimestamp();
            container.demoteTime = 0;
            this.strongReferences.AddOrUpdate(key, container, (k, existingValue) => container);
        }

        private void RemoveFromWeak(TKey key)
        {
            WeakReference<ValueContainer<TValue>> oldValue;
            weakReferences.TryRemove(key, out oldValue);
        }

        public bool TryGetValue(TKey key, out TValue value)
        {
            value = null;

            // Try the strong references first.
            ValueContainer<TValue> container;
            if (this.strongReferences.TryGetValue(key, out container))
            {
                AttemptDemotion(key, container);
                value = container.value;
                return true;
            }

            // Fall back to the weak references; the target may already be gone.
            WeakReference<ValueContainer<TValue>> weakRef;
            if (this.weakReferences.TryGetValue(key, out weakRef))
            {
                if (weakRef.TryGetTarget(out container))
                {
                    value = container.value;
                    return true;
                }
                else
                {
                    RemoveFromWeak(key);
                }
            }

            return false;
        }

        public void DemoteOldObjects()
        {
            var demotionList = new List<KeyValuePair<TKey, ValueContainer<TValue>>>();
            long now = Stopwatch.GetTimestamp();
            foreach (var kvp in this.strongReferences)
            {
                var age = CalculateTimeSpan(kvp.Value.additionTime, now);
                if (age > this.maxAgeBeforeDemotion)
                {
                    demotionList.Add(kvp);
                }
            }

            foreach (var kvp in demotionList)
            {
                Demote(kvp.Key, kvp.Value);
            }
        }

        private void AttemptDemotion(TKey key, ValueContainer<TValue> container)
        {
            long now = Stopwatch.GetTimestamp();
            var age = CalculateTimeSpan(container.additionTime, now);
            if (age > this.maxAgeBeforeDemotion)
            {
                Demote(key, container);
            }
        }

        private void Demote(TKey key, ValueContainer<TValue> container)
        {
            ValueContainer<TValue> oldContainer;
            this.strongReferences.TryRemove(key, out oldContainer);
            container.demoteTime = Stopwatch.GetTimestamp();
            var weakRef = new WeakReference<ValueContainer<TValue>>(container);
            this.weakReferences.AddOrUpdate(key, weakRef, (k, oldRef) => weakRef);
        }

        private TimeSpan CalculateTimeSpan(long offsetA, long offsetB)
        {
            long diff = offsetB - offsetA;
            double seconds = (double)diff / Stopwatch.Frequency;
            return TimeSpan.FromSeconds(seconds);
        }
    }
}
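Usage looks something like this. The key type, value type, and demotion age here are arbitrary choices for illustration:

var cache = new HybridCache<string, byte[]>(TimeSpan.FromSeconds(30));

cache.Add("key1", new byte[1024 * 1024]);

byte[] data;
if (cache.TryGetValue("key1", out data))
{
    // Found, via either the strong or the weak dictionary.
}

// Call this periodically (from a timer, say) to demote entries older
// than 30 seconds to weak references.
cache.DemoteOldObjects();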

That’s it for the series on Weak References–I hope you enjoyed it! You may never need them, but when you do, you should understand how they work in detail to make the smartest decisions.

