Category Archives: Code

Announcing Writing High-Performance .NET Code, 2nd Edition!

~~The new book will be available for sale on or around May 1, 2018. You can pre-order right now for Kindle in many Amazon markets, and for print as well in the US.~~

The book is out! Get it in the US, or go here for more markets and options.

The new book is a mix of greatly expanded coverage of old topics, as well as new material throughout. The fundamentals of .NET and how the GC and JIT work have not changed, not even in the face of .NET Core, but there have been many improvements, new APIs, and new ways of doing things that necessitated some updating and coverage of new topics.

It was also a great opportunity to incorporate a lot of the feedback I’ve received on the first edition.

Here’s a visualization of the book’s structure. I wrote the book in Scrivener and marked each section with a label depending on whether it was unchanged (green), modified (yellow), or completely new (purple). Click to expand to full size.

This is a good overview, but it doesn’t really convey what is exactly “new”–the newness is spread throughout the book in many forms. Here are some highlights:

Meta-content:

50% increase in content overall (from 68K words to over 100K words, or over 200 new pages).
Fixed all known errata (quite a bit more than on that list, honestly).
Incorporated feedback from hundreds of readers.
A new foreword by Vance Morrison, .NET Performance Architect at Microsoft.
Better-looking graphics.
Completely new typesetting system to give a more professional look. (You’ll only notice this in print and PDF formats.)
Automated build process that can produce new revisions of all formats much easier than before, allowing me to publish updates as needed.
Cut the Windows Phone chapter. (No one will miss this, right?)
Updated cover.

Content:

List of CLR performance improvements over time.
Significant increase in examples of using Visual Studio to diagnose CPU usage, heap state, memory allocations, and more, throughout entire book.
Introduced a new tool for analyzing .NET processes: Microsoft.Diagnostics.Runtime (“CLR MD”). It’s kind of like having programmatic access to a debugger, and it’s very powerful. I have many examples of its usage throughout the book, tied to specific diagnostic scenarios.
Benchmarking! I include actual benchmark results in many sections. Many code samples use a benchmarking library to show the differences in various approaches.
More on garbage collection, including advanced configuration options; reworked and more detailed explanations of the GC process including less-known information; discussion of recent libraries to help with pooling; examples of using weak references; object resurrection; stackalloc; more on finalization; more ways to diagnose problems; and much, much more. This was already the biggest chapter, and it got bigger.
More details about JIT, how to do custom warmup, how to figure out which code is jitted, and many more improvements.
Greatly expanded coverage of TPL, including the TPL Dataflow library, how to use tasks effectively, avoiding lock convoys, and many other smaller improvements to existing topics.
More guidance on struct and class design, such as immutability, thread safety, and more. When to use tuples, the new ValueTuple type.
Ref-returns and locals and when to use.
Much more content on collections, including effectively taking advantage of initial capacity; sorting; key comparisons; jagged vs. multi-dimensional arrays; analyzing the performance of various collection types; and more.
Discussion of SIMD commands, including examples and benchmarks.
A detailed analysis of LINQ and why it’s more expensive than you think.
More ways to consume and process ETW events.
A detailed example of building a Roslyn-style Code Analyzer and automatic fixer (FxCop replacement).
A new appendix with high-level tips for ADO.NET, ASP.NET, and WPF.
And a lot more…

See the book’s web-site for up-to-date details on where to buy, and other news.

Don’t Make This Dumb Locking Mistake

What’s wrong with this code:

try
{
    if (!Monitor.TryEnter(this.lockObj))
    {
        return;
    }
    
    DoWork();
}
finally
{
    Monitor.Exit(this.lockObj);
}

This is a rookie mistake, but sadly I made it the other day when I was fixing a bug in haste. The problem is that the Monitor.Exit in the finally block will try to exit the lock, even if the lock was never taken. This will cause an exception. Use a different TryEnter overload instead:

bool lockTaken = false;
try
{
    Monitor.TryEnter(this.lockObj, ref lockTaken))
    if (!lockTaken)
    {
        return;
    }
    
    DoWork();
}
finally
{
    if (lockTaken)
    {
        Monitor.Exit(this.lockObj);
    }
}

You might be tempted to use the same TryEnter overload we used before and rely on the return value to set lockTaken. That might be fine in this case, but it’s better to use the version shown here. It guarantees that lockTaken is always set, even in cases where an exception is thrown, in which case it will be set to false.

Free Kindle version of Writing High-Performance .NET Code when you buy the print version

As of last week, when you buy the print version of Writing High-Performance .NET Code from Amazon, you can get the Kindle version for free.

Get Your Thread Synchronization Right the First Time

I was recently debugging a problem that just didn’t make any sense at first.

The code looked like this:

class App
{
    public bool IsRunning {get; set;}
    private Thread houseKeepingThread;
    
    public void Start()
    {
        this.IsRunning = true;
        this.houseKeepingThread = new Thread(ThreadFunc);
        this.houseKeepingThread.Start();
    }
    
    private void ThreadFunc()
    {
        while (this.IsRunning)
        {
            DoWork();
            // wait for 30 seconds
        }
    }
};

The actual code was of course a bit more complicated, but this demonstrates the essence of the problem.

The outward manifestation of the bug was that there was evidence that DoWork wasn’t being called over time as it should have.

To debug this, I first concentrated on reasons that the thread could end early, and none of them made any sense.

To finally figure it out, I attached a debugger to a process known to be in a suspect state and discovered evidence in the state of the App object that DoWork had never run, not even once.

I stared at the code for five seconds and then said out loud: “IsRunning isn’t volatile!”

Despite setting IsRunning to true before starting the thread, it is completely possible that the thread starts running before the memory is coherent across all of the threads.

What’s amazing about this bug is that it existed from the very beginning of this application. As in checkin #1. It has probably been causing problems for a while at some low level, but it recently must have gotten worse—it could be for any reason, some change in the pattern of execution, the load on the system, who knows.

The fix was quite easy:

Make a volatile backing field for the IsRunning property.

By using volatile, there will be a memory barrier that will enforce coherency across all threads that access this variable. Problem solved.

The lesson here isn’t about any particular debugging skills. The real lesson is to make sure you get this right in the first place, because debugging it takes far longer and is much trickier than just reasoning about it in the first place. These types of problems can stay hidden and extremely rare for a very long time until some other innocuous change causes them to start manifesting in strange, seemingly impossible ways.

If you want to learn more about memory barriers and when they’re required, you can check out Chapter 4 of Writing High-Performance .NET Code, and there is also this tutorial by Joseph Albahari.

Announcing Microsoft.IO.RecycableMemoryStream

It is with great pleasure that I announce the latest open source release from Microsoft. This time it’s coming from Bing.

Before explaining what it is and how it works, I have to mention that nearly all of the work for actually getting this setup on GitHub and preparing it for public release was done by by Chip Locke [Twitter | Blog], one of our superstars.

What It Is

Microsoft.IO.RecyclableMemoryStream is a MemoryStream replacement that offers superior behavior for performance-critical systems. In particular it is optimized to do the following:

Eliminate Large Object Heap allocations by using pooled buffers
Incur far fewer gen 2 GCs, and spend far less time paused due to GC
Avoid memory leaks by having a bounded pool size
Avoid memory fragmentation
Provide excellent debuggability
Provide metrics for performance tracking

In my book Writing High-Performance .NET Code, I had this anecdote:

In one application that suffered from too many LOH allocations, we discovered that if we pooled a single type of object, we could eliminate 99% of all problems with the LOH. This was MemoryStream, which we used for serialization and transmitting bits over the network. The actual implementation is more complex than just keeping a queue of MemoryStream objects because of the need to avoid fragmentation, but conceptually, that is exactly what it is. Every time a MemoryStream object was disposed, it was put back in the pool for reuse.

-Writing High-Performance .NET Code, p. 65

The exact code that I’m talking about is what is being released.

How It Works

Here are some more details about the features:

A drop-in replacement for System.IO.MemoryStream. It has exactly the same semantics, as close as possible.
Rather than pooling the streams themselves, the underlying buffers are pooled. This allows you to use the simple Dispose pattern to release the buffers back to the pool, as well as detect invalid usage patterns (such as reusing a stream after it’s been disposed).
Completely thread-safe. That is, the MemoryManager is thread safe. Streams themselves are inherently NOT thread safe.
Each stream can be tagged with an identifying string that is used in logging. This can help you find bugs and memory leaks in your code relating to incorrect pool use.
Debug features like recording the call stack of the stream allocation to track down pool leaks
Maximum free pool size to handle spikes in usage without using too much memory.
Flexible and adjustable limits to the pooling algorithm.
Metrics tracking and events so that you can see the impact on the system.
Multiple internal pools: a default “small” buffer (default of 128 KB) and additional, “large” pools (default: in 1 MB chunks). The pools look kind of like this:

In normal operation, only the small pool is used. The stream abstracts away the use of multiple buffers for you. This makes the memory use extremely efficient (much better than MemoryStream’s default doubling of capacity).

The large pool is only used when you need a contiguous byte[] buffer, via a call to GetBuffer or (let’s hope not) ToArray. When this happens, the buffers belonging to the small pool are released and replaced with a single buffer at least as large as what was requested. The size of the objects in the large pool are completely configurable, but if a buffer greater than the maximum size is requested then one will be created (it just won’t be pooled upon Dispose).

Examples

You can jump right in with no fuss by just doing a simple replacement of MemoryStream with something like this:

var sourceBuffer = new byte[]{0,1,2,3,4,5,6,7}; 
var manager = new RecyclableMemoryStreamManager(); 
using (var stream = manager.GetStream()) 
{ 
    stream.Write(sourceBuffer, 0, sourceBuffer.Length); 
}

Note that RecyclableMemoryStreamManager should be declared once and it will live for the entire process–this is the pool. It is perfectly fine to use multiple pools if you desire.

To facilitate easier debugging, you can optionally provide a string tag, which serves as a human-readable identifier for the stream. In practice, I’ve usually used something like “ClassName.MethodName” for this, but it can be whatever you want. Each stream also has a GUID to provide absolute identity if needed, but the tag is usually sufficient.

using (var stream = manager.GetStream("Program.Main"))
{
    stream.Write(sourceBuffer, 0, sourceBuffer.Length);
}

You can also provide an existing buffer. It’s important to note that this buffer will be copied into the pooled buffer:

var stream = manager.GetStream("Program.Main", sourceBuffer, 
                                    0, sourceBuffer.Length);

You can also change the parameters of the pool itself:

int blockSize = 1024;
int largeBufferMultiple = 1024 * 1024;
int maxBufferSize = 16 * largeBufferMultiple;

var manager = new RecyclableMemoryStreamManager(blockSize, 
                                                largeBufferMultiple, 
                                                maxBufferSize);

manager.GenerateCallStacks = true;
manager.AggressiveBufferReturn = true;
manager.MaximumFreeLargePoolBytes = maxBufferSize * 4;
manager.MaximumFreeSmallPoolBytes = 100 * blockSize;

Is this library for everybody? No, definitely not. This library was designed with some specific performance characteristics in mind. Most applications probably don’t need those. However, if they do, then this library can absolutely help reduce the impact of GC on your software.

Let us know what you think! If you find bugs or want to improve it in some way, then dive right into the code on GitHub.

Buy EPUB and PDF of Writing High-Performance .NET Code and Save 25%

My book Writing High-Performance .NET Code is already pretty inexpensive by programming book standards, but I’ve added a new deal. If you want to do get both the PDF and the EPUB, you can get 25% off the entire order, which basically makes one of them over 50% off.

Just go to http://www.writinghighperf.net/buy, ~~add both formats to your shopping cart, and use the 25PERCENT discount code at checkout to receive the discount~~.

Update: The discount code wasn’t working reliably, so I’ve just added a third buying option that is a bundle of the two formats.

Digging Into .NET Loop Performance, Bounds-checking, Iteration, and Unrolling

Introduction

In this article I am going to go into detail about how .NET treats loops involving array and collection access and what kinds of optimizations you can expect. I had mentioned some of these in a previous article, but in this article, we’re going to go deep and see many more examples. We’re going to dig into the IL that gets generated as well the x64 assembly code that actually executes.

Will these optimization matter to your program? Only if your program is CPU bound and collection iteration is a core part of your processing. As you’ll see, there are ways you can hurt your performance if you’re not careful, but that only matters if it was a significant part of your program to begin with. My philosophy is this: most people never need to know this, but if you do, then understanding every layer of the system is important so that you can make intelligent choices. That’s the reason I wrote my book, Writing High-Performance .NET Code, and why I write these articles—to give you the information about what’s actually going on when you compile and execute a .NET program.

Some of this information you can read elsewhere, but I want to show you how to get at this yourself so that you don’t have to rely on someone publically coming out with this information, especially as new version of the JIT are released.

Tools I used: ILSpy to view IL, Visual Studio 2013 to debug and view assembly with .NET 4.5. I compiled the code in Release mode, targeting x64 explicitly. When debugging, don’t start with debugging from Visual Studio–that will not give you an accurate picture of your code. Instead, start the program separately and attach the debugger. I accomplish this easily by putting a call to Debugger.Launch() in the beginning of Main(), and then executing the program with Ctrl-F5.

A note on assembly. I chose x64 because it should be the default for development these days. For register names mentioned below, understand that things like eax and rax refer to the same register–rax is the 64-bit version of this register, and is essentially a sign-extended version of eax, but they both refer to the same physical location. Same with ecx/rdx, ecx/rcx, etc.

Bounds Checks

Bounds checks exist in .NET because the CLR must ensure memory safety at all times in managed code. It needs to simply be impossible for managed code to corrupt memory outside of the defined data structures. This means that code like the following cannot be allowed to execute:

int[] array = new int[100];
array[100] = 1;

Attempting to execute this should result in an exception rather than allow the memory corruption to occur. That means that the CLR has to inject bounds checking into array access. This can make repeated array access twice as slow if some optimizations aren’t done. This knowledge makes some people leery of using .NET for high-performance code, but we’ll see that the JIT compiler does indeed make these optimizations in some cases (and falls short in others). However, remember what I said in the introduction: these kinds of optimizations only matter if you’re CPU bound and this code is in your innermost, most called-upon code. That said, even if you don’t have to worry about performance on this level, knowing how to investigate these types of details is instructive.

Baseline Loop

To start, let’s take a look at the most basic of loops:

[MethodImpl(MethodImplOptions.NoInlining)]
private static long GetSum(int[] array)
{ 
    long sum = 0; 
    for (int i =0; i < array.Length; i++)
    { 
        sum += array[i]; 
    } 
    return sum; 
}

It simply loops through the entire array and sums all the elements. I’m explicitly disabling inlining because that can muddy the analysis a bit, and we want to focus strictly on loop optimization, not other improvements that the JIT compiler can and will do here.

In IL, this method looks like this (with my own annotations):

.method private hidebysig static 
	int64 GetSum (
		int32[] 'array'
	) cil managed noinlining
{
	// Method begins at RVA 0x20a4
	// Code size 26 (0x1a)
	.maxstack 3
	.locals init (
		[0] int64 sum,
		[1] int32 i
	)

	IL_0000: ldc.i4.0    // Push 0 onto evaluation stack as a 4-byte integer
	IL_0001: stloc.0     // Pop from evaluation stack and store into local variable at location 0, sum.
	IL_0002: ldc.i4.0    // Push 0 to evaluation stack
	IL_0003: stloc.1     // Pop and store in local variable i.
	IL_0004: br.s IL_0012     //  Jump to IL_0012 (unconditionally)
	// loop start (head: IL_0010)
		IL_0006: ldloc.0    // Push sum onto stack
		IL_0007: ldarg.0    // Push array onto stack
		IL_0008: ldloc.1    // Push i onto stack
		IL_0009: ldelem.i4  // Pop array and i off the stack and push array[i] onto the stack
		IL_000a: add        // Pop array[i] and sum off the stack, add, and push result to stack
		IL_000b: stloc.0    // Pop result off stack and store in sum
		IL_000c: ldloc.1    // Push i onto stack
		IL_000d: ldc.i4.1   // Push 1 onto stack
		IL_000e: add        // Pop i and 1 off of stack, add, and push result to stack
		IL_000f: stloc.1    // Pop result off stack and store in i

		IL_0010: ldloc.1    // Push i onto stack
		IL_0011: ldarg.0    // Push array onto stack
		IL_0012: ldlen      // Pop array off of stack, get length, and push length onto stack
		IL_0013: conv.i4    // Convert length to 4-byte integer
		IL_0014: blt.s IL_0006    // Pop i and length off of stack. If i < length, jump to IL_0006
	// end loop

	IL_0016: ldloc.0    // Push sum onto stack
	IL_0017: ret        // Return
} // end of method Program::GetSum

That looks like an awful lot of work to execute a loop. IL is a simple, stack-based language that ends up being quite verbose, but this gets translated by the JIT compiler into real machine code that takes advantage of efficient CPU instructions and registers. The resulting assembly is this:

00007FFADFCC02D0  mov         rdx,rcx                 // Copy array address to rdx
00007FFADFCC02D3  xor         ecx,ecx                 // Set ecx (sum) to 0
00007FFADFCC02D5  mov         r10,qword ptr [rdx+8]   // Copy array length to r10
00007FFADFCC02D9  test        r10d,r10d               // See if length is 0
00007FFADFCC02DC  jle         00007FFADFCC0303        // If so, skip the loop
00007FFADFCC02DE  mov         r8d,ecx                 // Copy 0 to r8 (source of ecx is not relevant)
00007FFADFCC02E1  xor         r9d,r9d                 // Set r9 to 0 (memory offset into array)
00007FFADFCC02E4  nop         word ptr [rax+rax]      // NOP
00007FFADFCC02F0  mov         eax,dword ptr [rdx+r9+10h]  // Copy array[i] to eax
00007FFADFCC02F5  add         ecx,eax                 // Add value to sum
00007FFADFCC02F7  inc         r8d                     // i++
00007FFADFCC02FA  add         r9,4                    // Increment memory address by 4 bytes
00007FFADFCC02FE  cmp         r8d,r10d                // i < length ?
00007FFADFCC0301  jl          00007FFADFCC02F0        // If so, loop back a few statements for another iteration
00007FFADFCC0303  mov         eax,ecx                 // Copy sum to return register
00007FFADFCC0305  ret                                 // Return

There are a few things to notice here:

The code is tracking both the index value i as well as the direct memory address of the array values, which is far more efficient than doing a multiplication and offset on every loop iteration.
There are no array bounds checks! This is sometimes a worry with people new to .NET—how can we maintain memory safety without boundary checks? In this case, the JIT compiler determined that the boundary checks weren’t necessary because it knew all of the constraints of this loop.

Bringing Back the Bounds Check

Now let’s introduce some variability. If instead, we pass in some boundaries to iterate through, does this change the situation?

[MethodImpl(MethodImplOptions.NoInlining)]
private static int GetSum(int[] array, int start, int end)
{
    int sum = 0;
    for (int i = start; i < end; i++)
    {
        sum += array[i];
    }
    return sum;
}
.
.
.
// Called with:
GetSum(array, 25, 75);

The IL differences aren’t very interesting so I won’t show it, but the machine instructions definitely change:

00007FFADFCC0320  sub         rsp,28h               // Adjust stack pointer
00007FFADFCC0324  mov         r9,rcx                // Copy array address to r9
00007FFADFCC0327  xor         ecx,ecx               // Set ecx (sum) to 0
00007FFADFCC0329  cmp         edx,r8d               // edx has i in it, initialized with start (25). Compare that to r8, which has the end value (75)
00007FFADFCC032C  jge         00007FFADFCC0348      // If start >= end, skip loop.
00007FFADFCC032E  mov         r10,qword ptr [r9+8]  // Copy array length to r10
00007FFADFCC0332  movsxd      rax,edx               // Copy start to rax
00007FFADFCC0335  cmp         rax,r10               // Compare i to array length
00007FFADFCC0338  jae         00007FFADFCC0350      // If i >= length, jump to this address (below)
00007FFADFCC033A  mov         eax,dword ptr [r9+rax*4+10h]  // Copy array[i] to eax
00007FFADFCC033F  add         ecx,eax               // sum += array[i]
00007FFADFCC0341  inc         edx                   // i++
00007FFADFCC0343  cmp         edx,r8d               // Compare i to end value
00007FFADFCC0346  jl          00007FFADFCC0332      // If i < jump to beginning of loop (above)
00007FFADFCC0348  mov         eax,ecx               // Copy sum to return register
00007FFADFCC034A  add         rsp,28h               // Fix up stack pointer
00007FFADFCC034E  ret                               // Return

00007FFADFCC0350  call        00007FFB3F7E6590      // Throw an exception!

The most significant difference is that now on every single iteration of the loop, it’s comparing the array index to the length of the array, and if it goes over we’ll get an exception. With this change, we’ve doubled the running time of the loop. The bounds check is there.

So why does it do this now? Is it because we’re doing a subarray? That’s easy to test. You can modify the original function at the top of this article to iterate from 25 to 75 and you’ll see that the compiler does not insert a boundary check.

No, the reason for the boundary check in this case is because the compiler can’t prove that the input values are 100% safe.

Ok, then. What about checking the start and end variables before the loop? I suspect that quickly leads to the problem of trying to understand the code. As soon as the constraints are not constant, you have to deal with variable aliasing, side effects, and tracking the usage of each value—and that’s not really practical for a JIT compiler. Sure, in my simple method, maybe it would be easy, but the general case might be extremely difficult if not impossible.

Other Ways of Getting Bounds Checks

Even simple modifications to the indexing variable, that you can logically prove cannot exceed the array boundaries may cause the bounds check to appear:

[MethodImpl(MethodImplOptions.NoInlining)]
private static void Manipulate(int[] array)
{
    for (int i=0; i < array.Length; i++)
    {
        array[i] = array[i / 2];
    }
}

There is no way i / 2 can exceed array.Length, but the bounds check is still there:

00007FFADFCE03B0  sub         rsp,28h              // Adjust stack pointer  
00007FFADFCE03B4  mov         r8,qword ptr [rcx+8] // Copy array length to r8
00007FFADFCE03B8  test        r8d,r8d              // Test if length is 0
00007FFADFCE03BB  jle         00007FFADFCE03E8     // If 0, skip loop
00007FFADFCE03BD  xor         r9d,r9d              // Set i to 0
00007FFADFCE03C0  xor         r10d,r10d            // Set destination index to 0
00007FFADFCE03C3  mov         eax,r9d              // Set eax (source array index) to 0
00007FFADFCE03C6  cdq                              // sign extends eax into edx (needed for division)
00007FFADFCE03C7  sub         eax,edx              // eax = eax - edx 
00007FFADFCE03C9  sar         eax,1                // Shift right (divide by two)
00007FFADFCE03CB  movsxd      rax,eax              // copy 32-bit eax into 64-bit rax and sign extend
00007FFADFCE03CE  cmp         rax,r8               // Compare source index against array length
00007FFADFCE03D1  jae         00007FFADFCE03F0     // If greater than or equal, jump and throw exception
00007FFADFCE03D3  mov         eax,dword ptr [rcx+rax*4+10h]  // Copy value from array[i/2] to eax
00007FFADFCE03D7  mov         dword ptr [rcx+r10+10h],eax    // Copy value from eax to array[i]
            
00007FFADFCE03DC  inc         r9d                  // Increment i
00007FFADFCE03DF  add         r10,4                // Increment destination address by 4 bytes
00007FFADFCE03E3  cmp         r9d,r8d              // Compare i against array length
00007FFADFCE03E6  jl          00007FFADFCE03C3     // If less, loop again
00007FFADFCE03E8  add         rsp,28h              // Adjust stack
00007FFADFCE03EC  ret                              // return

00007FFADFCE03F0  call        00007FFB3F7E6590     // Throw exception

So even though we can prove that it’s safe, the compiler won’t.

There’s another way, that’s a bit more annoying. Remember the very first, baseline example, the one that did NOT have a bounds check. What if we had that exact code, but it operated on a static value rather than argument?

00007FFADFCE0500  mov         rax,0A42A235780h  
00007FFADFCE050A  mov         rax,qword ptr [rax]  
00007FFADFCE050D  movsxd      r9,ecx  
00007FFADFCE0510  mov         r8,qword ptr [rax+8]  
00007FFADFCE0514  cmp         r9,r8                 // bounds check!
00007FFADFCE0517  jae         00007FFADFCE0530  
00007FFADFCE0519  mov         eax,dword ptr [rax+r9*4+10h]  
00007FFADFCE051E  add         edx,eax  
00007FFADFCE0520  inc         ecx  
00007FFADFCE0522  cmp         ecx,r8d  
00007FFADFCE0525  jl          00007FFADFCE0500  
00007FFADFCE0527  mov         eax,edx  
00007FFADFCE0529  add         rsp,28h  
00007FFADFCE052D  ret  
                            
00007FFADFCE0530  call        00007FFB3F7E6590

The bounds check is there.

Ok, so what if we’re clever and we pass the static array to our baseline method, which doesn’t know it’s static? There is no bounds check in this case–it’s exactly the same as the baseline case.

Another case where the JIT fails to optimize where it could is iterating an array backwards:

[MethodImpl(MethodImplOptions.NoInlining)]
private static int GetBackwardsSum(int[] array)
{
    int sum = 0;
    for (int i = array.Length - 1; i >= 0; i--)
    {
        sum += array[i];
    }
    return sum;
}

Sad, but true–as much as this looks like the baseline example, it does not have the optimization.

Finally, consider the case of repeated access:

for (int i = 2;i < array.Length - 1; i++)
{
    int x = array[i-2];
    int y = array[i-2];
}

In this case, the compiler will emit a single bounds check that suffices for both accesses.

Using fixed to Eliminate Bounds Checks

Another way to eliminate bounds checks is to use the fixed keyword. I don’t really recommend it because it has safety and security considerations that you should definitely consider. However, if you really have a need for optimal array access with some nontrivial access pattern that would otherwise trigger a bounds check, this may work for you. In order to use fixed, you must enable unsafe code for your project.

Here are two options for using fixed to iterate through an array:

[MethodImpl(MethodImplOptions.NoInlining)]
unsafe private static int GetBackwardsSumFixedSubscript(int[] array)
{
    int sum = 0;
    fixed (int* p = array)
    {
        for (int i = array.Length - 1; i >= 0; i--)
        {
            sum += p[i];
        }
    }
    return sum;
}

[MethodImpl(MethodImplOptions.NoInlining)]
unsafe private static int GetBackwardsSumFixedPointer(int[] array)
{
    int sum = 0;
    fixed (int* p = &array[array.Length - 1])
    {
        int* v = p;
        for (int i = array.Length - 1; i >= 0; i--)
        {
            sum += *v;
            v--;
        }
    }
    return sum;
}

Either one will work, but keep in mind that unsafe really means unsafe! You can trample your memory in this method. I recommend keeping all unsafe code segregated to its own module, class, or otherwise limited scope and that it receives extra scrutiny during code reviews. You may also have to deal with organizational security issues with deploying unsafe code. If you can eliminate the bounds checking with pure managed code, strongly prefer that approach instead.

Foreach vs. For

Now, let’s shift our focus up one level of abstraction. In .NET, you can iterate over most collections with the foreach keyword. foreach is in most cases syntactic sugar for calling methods like GetEnumerator(), MoveNext(), and so forth. In this respect, it would be fair to assume that foreach has worse performance overall than a simple for or while loop.

However, that’s not quite the case. In some instances, the compiler will recognize a simple for loop on an array for what it is, regardless of the syntax used.

Foreach on Arrays

If we take our baseline loop from above and convert it to foreach, like this:

private static int GetSumForeach(int[] array)
{
    int sum = 0;
    foreach(var val in array)
    {
        sum += val;
    }
    return sum;
}

What is the generated IL in this case?

.method private hidebysig static 
	int32 GetSumForeach (
		int32[] 'array'
	) cil managed 
{
	// Method begins at RVA 0x2098
	// Code size 28 (0x1c)
	.maxstack 2
	.locals init (
		[0] int32 sum,
		[1] int32 val,
		[2] int32[] CS$6$0000,
		[3] int32 CS$7$0001
	)

	IL_0000: ldc.i4.0
	IL_0001: stloc.0
	IL_0002: ldarg.0
	IL_0003: stloc.2
	IL_0004: ldc.i4.0
	IL_0005: stloc.3
	IL_0006: br.s IL_0014
	// loop start (head: IL_0014)
		IL_0008: ldloc.2
		IL_0009: ldloc.3
		IL_000a: ldelem.i4
		IL_000b: stloc.1
		IL_000c: ldloc.0
		IL_000d: ldloc.1
		IL_000e: add
		IL_000f: stloc.0
		IL_0010: ldloc.3
		IL_0011: ldc.i4.1
		IL_0012: add
		IL_0013: stloc.3

		IL_0014: ldloc.3
		IL_0015: ldloc.2
		IL_0016: ldlen
		IL_0017: conv.i4
		IL_0018: blt.s IL_0008
	// end loop

	IL_001a: ldloc.0
	IL_001b: ret
} // end of method Program::GetSumForeach

Compare that to the IL from the baseline loop. The only difference is the presence of some additional, generated local variables to store the array and index variable. Otherwise, it looks exactly like a loop. And, in fact, the assembly code also looks exactly the same (including elimination of the bounds check):

00007FFADFCB01F0  mov         rdx,rcx  
00007FFADFCB01F3  xor         ecx,ecx  
00007FFADFCB01F5  mov         r10,qword ptr [rdx+8]  
00007FFADFCB01F9  test        r10d,r10d  
00007FFADFCB01FC  jle         00007FFADFCB0223  
00007FFADFCB01FE  mov         r8d,ecx  
00007FFADFCB0201  xor         r9d,r9d  
00007FFADFCB0204  nop         word ptr [rax+rax]  
00007FFADFCB0210  mov         eax,dword ptr [rdx+r9+10h]  
00007FFADFCB0215  add         ecx,eax  
00007FFADFCB0217  inc         r8d  
00007FFADFCB021A  add         r9,4  
00007FFADFCB021E  cmp         r8d,r10d  
00007FFADFCB0221  jl          00007FFADFCB0210  
00007FFADFCB0223  mov         eax,ecx  
00007FFADFCB0225  ret

For loop on List<T>

Before seeing foreach operate on List<T>, we should see what for looks like:

private static int GetSumFor(List<int> array)
{
    int sum = 0;
    for (int i = 0; i < array.Count; i++)
    {
        sum += array[i];
    }
    return sum;
}

And the corresponding IL:

.method private hidebysig static 
	int32 GetSumFor (
		class [mscorlib]System.Collections.Generic.List`1 'array'
	) cil managed 
{
	// Method begins at RVA 0x20a0
	// Code size 31 (0x1f)
	.maxstack 3
	.locals init (
		[0] int32 sum,
		[1] int32 i
	)

	IL_0000: ldc.i4.0
	IL_0001: stloc.0
	IL_0002: ldc.i4.0
	IL_0003: stloc.1
	IL_0004: br.s IL_0014
	// loop start (head: IL_0014)
		IL_0006: ldloc.0
		IL_0007: ldarg.0
		IL_0008: ldloc.1
		IL_0009: callvirt instance !0 class [mscorlib]System.Collections.Generic.List`1::get_Item(int32)
		IL_000e: add
		IL_000f: stloc.0
		IL_0010: ldloc.1
		IL_0011: ldc.i4.1
		IL_0012: add
		IL_0013: stloc.1

		IL_0014: ldloc.1
		IL_0015: ldarg.0
		IL_0016: callvirt instance int32 class [mscorlib]System.Collections.Generic.List`1::get_Count()
		IL_001b: blt.s IL_0006
	// end loop

	IL_001d: ldloc.0
	IL_001e: ret
} // end of method Program::GetSumFor

It looks pretty much like the for loop on an array, but with some method calls instead of direct memory accesses to retrieve the values and the length (Count).

However, the assembly code generated from this is more interesting and shows some subtle and important differences:

00007FFADFCD0BE0  push        rbx  	// Stack frame setup
00007FFADFCD0BE1  push        rsi  
00007FFADFCD0BE2  push        rdi  
00007FFADFCD0BE3  sub         rsp,20h  
00007FFADFCD0BE7  mov         rdx,rcx    // Copy array address to rdx
00007FFADFCD0BEA  xor         ecx,ecx    // Set i to 0. i is used only for the loop condition.
00007FFADFCD0BEC  xor         r11d,r11d  // Set j to 0. j is used as the iterator on the internal array inside List
00007FFADFCD0BEF  mov         r8d,ecx    // Set sum to 0
00007FFADFCD0BF2  mov         ebx,dword ptr [rdx+18h]    // Copy length to ebx
// LOOP START
00007FFADFCD0BF5  cmp         ecx,ebx                    // Compare i to length
00007FFADFCD0BF7  jl          00007FFADFCD0C00           // if less, jump below
00007FFADFCD0BF9  mov         eax,r8d                    // else copy sum to return value
00007FFADFCD0BFC  jmp         00007FFADFCD0C26           // jump to end of function
00007FFADFCD0BFE  xchg        ax,ax                      // No-op

00007FFADFCD0C00  mov         eax,dword ptr [rdx+18h]    // Copy list's length to eax
00007FFADFCD0C03  cmp         ecx,eax                    // Compare i to length
00007FFADFCD0C05  jae         00007FFADFCD0C35           // if greater than or equal, jump and throw exception
00007FFADFCD0C07  mov         r9,qword ptr [rdx+8]       // Copy List's array address to r9
00007FFADFCD0C0B  movsxd      r10,r11d                   // Copy j  to r10 and sign extend.
00007FFADFCD0C0E  mov         rax,qword ptr [r9+8]       // Copy array's length to rax
00007FFADFCD0C12  cmp         r10,rax                    // Compare j to array' length
00007FFADFCD0C15  jae         00007FFADFCD0C30           // If greater than or equal, jump and throw exception
00007FFADFCD0C17  mov         eax,dword ptr [r9+r10*4+10h]  // copy array[j] (list[i]) to eax
00007FFADFCD0C1C  add         r8d,eax                    // sum += array[j]
00007FFADFCD0C1F  inc         ecx                        // i++
00007FFADFCD0C21  inc         r11d                       // j++
00007FFADFCD0C24  jmp         00007FFADFCD0BF5           // Jump to beginning of loop 
00007FFADFCD0C26  add         rsp,20h                    // Clean up stack
00007FFADFCD0C2A  pop         rdi  
00007FFADFCD0C2B  pop         rsi  
00007FFADFCD0C2C  pop         rbx  
00007FFADFCD0C2D  ret                                    // return

00007FFADFCD0C30  call        00007FFB3F7E6590           // Throw exception

00007FFADFCD0C35  mov         ecx,0Dh                    
00007FFADFCD0C3A  call        00007FFB3CD990F8           // Throw exception

Here’s what’s going on:

The method calls on List<int> have been inlined away—we get direct memory access to the underlying array that List<int> maintains.
There is an iteration on i that compares it to the length of the List<int>.
There is a separate iteration on j (which I call j in the annotations), which compares it against the length of the internal array that List<int> wraps.
- Failing either iteration will cause an exception to be thrown.

The duplicate iterations don’t really cause the loop to be twice as slow as a pure array would be—there is still only one memory access and every thing else is operating on registers, which is extremely efficient. But if you have the choice between arrays and List<T>, well, it’s hard to beat an array for pure efficiency.

Foreach on List<T>

Finally, let’s turn our attention to foreach on List<T>. Will the compiler (or the JIT) turn this into a straightforward for loop?

No. Here’s the IL:

.method private hidebysig static 
	int32 GetSumForeach (
		class [mscorlib]System.Collections.Generic.List`1 list
	) cil managed 
{
	// Method begins at RVA 0x20f4
	// Code size 50 (0x32)
	.maxstack 2
	.locals init (
		[0] int32 sum,
		[1] int32 val,
		[2] valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator CS$5$0000
	)

	IL_0000: ldc.i4.0
	IL_0001: stloc.0
	IL_0002: ldarg.0
	IL_0003: callvirt instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator< !0> class [mscorlib]System.Collections.Generic.List`1::GetEnumerator()
	IL_0008: stloc.2
	.try
	{
		IL_0009: br.s IL_0017
		// loop start (head: IL_0017)
			IL_000b: ldloca.s CS$5$0000
			IL_000d: call instance !0 valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator::get_Current()
			IL_0012: stloc.1
			IL_0013: ldloc.0
			IL_0014: ldloc.1
			IL_0015: add
			IL_0016: stloc.0

			IL_0017: ldloca.s CS$5$0000
			IL_0019: call instance bool valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator::MoveNext()
			IL_001e: brtrue.s IL_000b
		// end loop

		IL_0020: leave.s IL_0030
	} // end .try
	finally
	{
		IL_0022: ldloca.s CS$5$0000
		IL_0024: constrained. valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator
		IL_002a: callvirt instance void [mscorlib]System.IDisposable::Dispose()
		IL_002f: endfinally
	} // end handler

	IL_0030: ldloc.0
	IL_0031: ret
} // end of method Program::GetSumForeach

I won’t show the assembly code for this, but suffice it to say, it’s also quite different.

Foreach on (IEnumerable<T>)array

What would you expect to happen with code like this?

private static int GetSumForeach(int[] array)
{
    var collection = array as IEnumerable<int>;
    int sum = 0;
    foreach (var val in collection)
    {
        sum += val;
    }
    return sum;
}

In this case, the array optimization will be lost and you’ll get the same thing you did with foreach over List<T>.

Loop Unrolling

Loop unrolling is the practice of reducing the overhead of a loop by causing fewer iterations, but more work per iteration. For loop-heavy code, it can make a big difference.

You can do this unrolling yourself, such as with the following example:

private static int GetSum(int[] array)
{
    int sum = 0;
    for (int i = 0; i < array.Length-4; i+=4)
    {
        sum += array[i];
        sum += array[i+1];
        sum += array[i+2];
        sum += array[i+3];
    }
    return sum;
}

In some cases, the compiler can do this kind of optimization for you, but for loop unrolling to take place, there is one big constraint that must be obeyed: The loop constraints must be statically determinable so that the compiler can know how to modify the loop.

This is why we don’t see any loop unrolling in the assembly code for any of our previous examples. The length of the loop was always variable. What if, instead, we have something like this:

const int ArrayLength = 100;

private static int GetSum(int[] array)
{
    int sum = 0;
    for (int i = 0; i < ArrayLength; i++)
    {
        sum += array[i];
    }
    return sum;
}

The IL for this method looks about the same as before, but the assembly shows the optimization:

00007FFADFCE01C0  sub         rsp,28h              // Setup stack frame
00007FFADFCE01C4  mov         rdx,rcx              // Copy array address to rdx
00007FFADFCE01C7  xor         ecx,ecx              // Set sum to 0
00007FFADFCE01C9  xor         r8d,r8d              // Set address offset to 0
00007FFADFCE01CC  mov         rax,qword ptr [rdx+8]// Copy array length to rax
00007FFADFCE01D0  mov         r9d,60h              // Copy value 96 to r9
00007FFADFCE01D6  cmp         r9,rax               // Compare 96 to length
00007FFADFCE01D9  jae         00007FFADFCE0230     // If greater than or equal, jump and throw exception
00007FFADFCE01DB  mov         r9d,61h              // Copy value 97 to r9
00007FFADFCE01E1  cmp         r9,rax               // Compare to length
00007FFADFCE01E4  jae         00007FFADFCE0230     // If greater than or equal, jump and throw exception
00007FFADFCE01E6  mov         r9d,62h              // Copy value 98 to r9
00007FFADFCE01EC  cmp         r9,rax               // Compare to length
00007FFADFCE01EF  jae         00007FFADFCE0230     // If greater than or equal, jump and throw exception
00007FFADFCE01F1  mov         r9d,63h              // Copy value 99 to r9
00007FFADFCE01F7  cmp         r9,rax               // Compare to length
00007FFADFCE01FA  jae         00007FFADFCE0230     // If greater than or equal, jump and throw exception
00007FFADFCE01FC  nop         dword ptr [rax]      // Nop
// LOOP START
00007FFADFCE0200  mov         eax,dword ptr [rdx+r8+10h]  // Copy array[i] to eax
00007FFADFCE0205  add         ecx,eax                     //sum += array[i]
00007FFADFCE0207  mov         eax,dword ptr [rdx+r8+14h]  // Copy array[i+1] to eax
00007FFADFCE020C  add         ecx,eax                     // sum += array[i+1]
00007FFADFCE020E  mov         eax,dword ptr [rdx+r8+18h]  // Copy array[i+2] to eax
00007FFADFCE0213  add         ecx,eax                     // sum += array[i+2]
00007FFADFCE0215  mov         eax,dword ptr [rdx+r8+1Ch]  // Copy array[i+3] to eax
00007FFADFCE021A  add         ecx,eax                     // sum += array[i+3]
00007FFADFCE021C  add         r8,10h                      // Increment address offset by 16 bytes (4 integers)
00007FFADFCE0220  cmp         r8,190h                     // Compare address offset to 400
00007FFADFCE0227  jl          00007FFADFCE0200            // If less, loop around
00007FFADFCE0229  mov         eax,ecx                     // Copy sum to return value
00007FFADFCE022B  add         rsp,28h                     // Fix up stack
00007FFADFCE022F  ret                                     // return

00007FFADFCE0230  call        00007FFB3F7E6590            // throw exception

The basic operation of this is:

Check that we can safely do increments of N values and overflow the array (N being 4 in this specific case).
Do N operations per loop body.

As soon as you deviate from the static determinism, however, you will not be able to rely on this optimization. However, if you have special knowledge of your arrays (such as that their length is always a multiple of 4), then you can do the loop unrolling yourself in the source code.

As far as the current JIT compiler goes, you will lose this optimization if you change the array size in this example to 101. Some compilers will split this–an unrolled loop, with the leftovers tacked onto the end. The JIT compiler doesn’t do this at this time (remember, it’s under a severe time constraint).

The ultimate loop unrolling of course is just eliminating the loop altogether and listing out every operation individually. For small operations, this could be advisable–it’s just a tradeoff between code maintenance and performance that you have to make.

Summary

Understanding how small differences in code translate into more or less efficient machine code is important if you need to get the last small measure of performance on your more performance-critical components, especially if you are CPU bound.

Of all of the collections, arrays are the most efficient to access, but List<T> is very close. When iterating over arrays, you want to try to eliminate the bounds check if possible. When deciding between for and foreach, keep in mind that foreach can involve method calls for anything except pure arrays.

Understand the conditions that allow for loop unrolling. If you can formulate your code to allow it, and it makes a difference, go for it, or manually unroll loops if you get a performance gain out of it.

Most of all, figure out how to answer these types of questions for yourself so that the next time deep performance questions arise, you know how to quickly determine the answer.

Tips for Writing a Programming Book

In this article, I want to go through the process of writing a book, to give others who are thinking of writing a general idea of what kinds of things you will need to do. This is the article I wish I had read before starting out. Some of this will be applicable to fiction writing, but there are better resources if that’s what you’re interested in. I can only share what I personally know.

Before-You-Start Checklist

Here’s a short list-based summary of many of the things I’ll discuss in this article. These are all things to think about before you start writing and throughout the process:

Do you have enough material for a book? Write out an outline to make sure. Iterate over it until it’s quite detailed.
Quash self-doubt! If no one else has written this book, then it’s up to you!
Do you want or need to go with self-publishing?
Who will be your editor? (You do need one.)
Should you do a print edition? What’s the market for that?
When are you going to make the time to write?
Do you have a place you can write uninterrupted?
Will your family support the time commitment?
Does your day job have a moonlighting policy? Follow it!
Don’t use DRM!
Familiarize yourself with the various publishers’ processes. Do some dry runs. Figure out what’s going to work and what’s not.
How are you going to market your book? Get started on laying the groundwork for that early. Start tweeting and blogging in anticipation.
When marketing your book, remember to Always Be Providing Value.
Price sanely and fairly. Chances are that in unless you’re fulfilling a unique, high-in-demand niche, you’re not going to make a lot of money, certainly not enough to quit your day job. Optimize for the number of sales, and be competitive with others in your genre. The satisfaction of producing something meaningful and sharing your knowledge with others is worth the effort you are putting in.

Writing a book is not easy. Just because you know stuff does not mean you can write a book about it. But it is a prerequisite. I’m the author of two non-fiction computer programming books. The first, C# 4 How-To was published in March 2010 by Sams. My most recent book is Writing High-Performance .NET Code. This was self-published, for reasons I’ll get to later.

After the experience of writing my first book, I promised myself and whoever would listen that I would never write another book. The process was too grueling for the payoff. The money just isn’t there. Sure, you do get some, but if you have a well-paying full-time job, it’s a drop in the bucket. You should not write books for riches–chances are you won’t get them.

The rest of this article is part advice, part journal. Get out of this what you will.

Inspiration

At my day job in Bing, I’ve spent the last few years becoming something of an expert in the area of .NET performance. I wrote a very significant part of the Bing query-processing stack in .NET and performance is obviously a vital consideration in everything it does. I had the benefit of an amazing mentor (see my book’s acknowledgements [PDF]) for a number of years, and the experience we gained as a team was considerable. Many of the bits of knowledge we had to apply were things that were not available via Internet search, or if they were, lacked so much context that it was worse than useless.

One day, at a dinner with my brother-in-law, we got to talking tech and our work projects in generalities, and he casually mentioned that I should write a book about these performance lessons. This wasn’t the first time the thought had come to me, but this time I actually thought it through for a while. I mulled on it for a few weeks before writing out a simple outline, at which point I knew that I had enough material for a book.

I wasn’t sure how long it would take, or what my coworkers would think, so I kept it to myself and close family.

Doubts

Everyone has self-doubt. I certainly did. Through much of the writing process, I constantly wondered to myself whether I should be the person writing this book. I am not on the CLR team; I work on a specialized project; I’m fairly young compared to the experts I look up to: who am *I* to be writing a book on .NET Performance? Et cetera.

There is a fairly common psychological condition called Imposter Syndrome, which explains these doubts. Basically, most people at one point or another feel like a fraud in their success.

The fact is, I realized, that someone ought to have written a book like I was, and that someone might as well be me. As long as I got good editing, feedback, and technical help, there was no reason I could not produce a very high-quality textbook on .NET performance, despite my worries.

So I pressed ahead.

Another doubt was the scope of the book. My knowledge is heavily weighted towards vanilla servers, pure-.NET. No WCF, ASP.Net, WPF, etc. I am familiar with those, but they’re not my bread-and-butter, so to speak. Would people read a performance book that didn’t cover the trendy buzzwords?

I decided that they would. There was no book that covered the fundamentals of .NET performance engineering in the way I was going to. This was an important niche, and it was actually an advantage to not cover the higher-layer technologies. It gave the book laser-like focus, and allowed it to explain the fundamental techniques in a way that a widely scoped manual would lose. There was no book that explained the fundamental costs and benefits of using .NET from a performance point of view, regardless of code library.

It was like a really good cook book that would teach you fundamental cooking techniques, but the book would not have thousands of recipes. Once you had the techniques, you could apply them on any recipe from other books.

Why Self-Publishing

I probably could have gotten a publisher if I had asked. I had already published a book via Sams, and while it wasn’t a best-seller, I did earn back my advance and a little more. However, royalty rates truly are abysmal. 10% of the wholesale price is fairly typical. Wholesale is usually 50% of the cover price. So for a $40 book, you can expect to earn about $2 per copy, less on an eBook. The advantage of having a publisher is that they will get you into physical stores, which is where I saw most of my sales for my first book. However, that was 5 years ago. The market has changed dramatically in that time. Amazon is the king of book selling, by a LONG shot.

So I went with self-publishing for a number of reasons:

I could earn much higher royalties, even if I priced the book much lower.
I knew someone who could be technical editor.
I have a couple of editors in the family who could help with grammar and style.
I had experience and so knew basically what kinds of formatting I would need, what parts of the book I would need–all the technical details.
I was willing to do all of the pre-publication grunt work myself.
I have an artist in the family who could do a cover for me. Heck, even if I didn’t, I know Photoshop. I could probably whip up something better than most of the bland covers that seem de riguer for programming books these days. Admittedly, my sister did a much better job on the cover than I could have.

Self-publishing has definitely paid off for me. Benefits I’ve noticed:

A bigger sense of accomplishment. I didn’t just write the book. I managed it from beginning to end. I made it a business.
I feel much more in control of my own product. I can update it at will. I have more direct contact with many readers on Twitter and other places.
I’m not going to quit my day job anytime soon, but in the first three months of my book being out, I made more money than in all 4 years my previous book was on the market.

Writing

The hardest part of any large project is just getting started. The first thing I did was flesh out my draft outline to list all of the chapters I thought I would have, then within those chapters, individual topics.

Then I just started writing. I don’t remember if I started with the Introduction or Chapter 1, but I did start more-or-less at the beginning. At the start, I did not concern myself with grammatical correctness, precise descriptions, or even necessarily getting all the details I wanted. I just tried to get all of the ideas onto the page. As I thought of related ideas, I would write them into my outline, or later in the document itself with a TODO prefix and an explanation of what I was thinking.

I pretty much wrote the whole book linearly like this. Not perfectly, of course. I jumped around a bit as necessary. Sometimes, sections belonged in a different order, or in a different chapter entirely. When I got to a part where I knew I needed a code sample, sometimes I just wouldn’t be in the mood to work on code or the debugger, so I would put a TODO in the chapter with an explanation of what the code needed to do. Then I moved onto writing something else.

One of the things I had to decide fairly early on was my writing style. Was it going to be “folksy” or “clinical”? How much like a textbook did I want it to read? My natural style is fairly informal. I started writing the book just like I write a blog entry, but I realized, especially later during editing, that while this is mostly ok, you do need to tighten up the writing a little bit. Some things just require more clarity or a precise explanation.

Another important decision I made early on was to take a stand, be opinionated in what I recommend. So much programming documentation is clinical and does not take a firm stand on what you should do in a given situation. I wanted to make sure my own personal voice rang through the text.

This isn’t just about the language and grammar being used–it applies especially to the content. In a blog entry, if you don’t want to explain a piece of prerequisite knowledge, you can just link to an existing source somewhere on the Internet. Book readers do not want that. If there’s something they need to know, you need to provide it to them (within reason of course–it’s fair to state knowledge assumptions up front).

The increased need for formality also forces you to really get your facts right. There is nothing like teaching others to really get you to learn the material well. This is true for EVERYTHING, but especially if you’re writing a book. There is a huge difference between knowing in your head how something works, and being able to write it down in a coherent way. Of all the chapters in my book that I knew I had to get right, Chapter 2 – Garbage Collection is the one that needed to be the most perfected. I knew garbage collection details pretty well. I have nearly weekly interaction with the people responsible for maintaining and developing it, but it was still surprising how many nit-picky details I needed to tweak after going through the content with the owner of the GC.

I did not have daily or weekly writing goals at first, but as I got towards the end, I did start challenging myself to writing a thousand words per writing session. This can actually be pretty tough when you realize you need to do more research, or write a code sample. Those take time. My only goal was the rough timeline. I knew I wanted to publish in the summer of 2014, so I planned out what had to be done by when, and gave myself about two months for editing and finalization.

Resist the urge to pad your text. This is really easy to do in a programming text. Just throw some more code on the page, add some extra topics, and you can create pages out of thin air! Don’t do this! Yes, find good, relevant content that should be in your book, but only add it with the right motivation. It’s unlikely your outline will contain everything that needs to be in the book—you will certainly forget something. Do research, brainstorming, take a step away from the project, ask your technical editor what’s missing, etc. Add content for good reasons that fit the scope of your book and leave it at that.

Tools

I used Word 2013 for the entire process. This has its pros and cons.

Pros:

Advanced formatting abilities. (It’s not perfect for advanced layout requirements, but for a book like mine, it was sufficient.)
Great for producing a print-ready PDF.
Great collaboration abilities, especially via OneDrive and Office Online.
Ability to edit anywhere I was, on any computer.

Cons:

eBooks support very limited formatting. In particular, the Amazon Kindle conversion was a nightmare. I had to remove a LOT of formatting. (Tables and multiple fonts per paragraph are the two big ones that come to mind.)
When it came time for the EPUB conversion, which is used for every online retailer except Amazon, it required an entire day to clean up the HTML from the conversion process. I had to fix a lot of styles and makes tons of tweaks. To the point where it is now easier to make corrections in multiple documents rather than make them in the master document and then redo the conversions.

I stored the book on OneDrive while I was actively working on it, which allowed me to edit anywhere I was. When it came time to collaborate with editors, this made it very easy to share with others who could add comments directly in the document. I made weekly, sometimes daily, backups to another drive at home, which was further backed up by Carbonite.

For screenshots, I used the built-in Windows utility Snipping Tool. This mostly worked, but you have to keep in mind that print is 300 DPI, while most screens are 96 DPI, which means the images are smaller than you think. In some cases, I used Photoshop Elements to increase the size of and resample small images to make them suitable for printing.

For code samples, I used Visual Studio 2012 Ultimate. I used Subversion on my local Synology NAS to store my code samples. Once the book was done, I also added all graphics, manuscripts, and all other book artifacts as well. The Synology has a publicly accessible DNS that I could use from my laptop away from home.

For eBook editing, I used Calibre, which has both library management and eBook editing tools.

When it came time to upload to online retailers, I first ran EPUBCHECK on the final EPUB files to ensure they passed with 0 errors.

Time Scheduling

Given that I have a full-time job and a family with a small daughter, as well as other commitments (including a musical, where I was playing in the orchestra), finding time to work on the book was a problem. My wife and I instituted a “night off” every week to work on personal projects—after dinner, we have no more responsibilities for the rest of the night. We don’t have to help with dishes, bath time, reading stories, or interacting at all. I utilized my night for working on the book. I say goodnight, kiss and hug my daughter, and then I can isolate myself in the house or go somewhere else. The time was 100% my own. Typically this amounted to about 4 hours. For the rest of the week, I would be lucky to get an hour or two a night. But the 4 hour block really helped–it was enough time that I could focus on big issues.

Towards the end, perhaps the last month and a half, one night a week wasn’t sufficient either. There was so much to do that I had to start taking most of all my Saturdays to work on the book or I would never get done.

In the end, the book took about 10 months, start to end, and that was mostly working nights and weekends. And the book isn’t even that long! Don’t underestimate how much work it is.

Setting

For writing, by far the best place for me to work was somewhere outside the home that had some kind of ambient noise. I worked at the local library, Starbucks, and even McDonald’s a couple of times (they were open later than the others). There is something about the noise level that can help you concentrate more than being at home. If you have to work at home, you can try any number of ambient noise tracks on YouTube.

The times where I did work at home, I usually put in headphones and listened to classical music.

For code and debugging samples, it was much easier to just do that at home where I have two large screens.

Editing

I am lucky. I knew someone who would be a great technical editor. He was on the CLR team before joining my team and he became a great mentor. I asked if he would be my technical editor, and he readily agreed. This meant that my technical content was in good hands. He wouldn’t let anything egregious slip by.

Since I also had a relationship with people on the CLR team, I asked some of them to proofread smaller portions of the book, in particular Chapter 2, which I consider the most important part.

Once I was done with the first draft, I gave the whole thing to my TE. After he had it for a few weeks, we got together in person and walked through a major portion of the manuscript. I came away with about three pages of notes. Everything from minor technical errors, need for a source, a better wording, to a better way of summarizing content in the chapters and at the end of the book. Not all of them were big changes, but some were. A couple involved major restructuring. This was about a month worth of work.

For grammar, style, and formatting, I asked my wife and father to review. My wife is an amazing editor. She has worked at a couple of jobs that required very precise abilities in analyzing documents, so she was the perfect person for this job. I also asked my dad to take a look as well, and between them I got a lot of good feedback.

To allow them to edit, I shared the document with them via OneDrive and asked them to only leave comments—not change the body text. Once I resolved their comment, I deleted it.

One other thing I did was a bit a crazy. I turned on ALL of Word’s grammar suggestions and analyzed the final manuscript. Don’t do this unless you like pain. If you do like pain, then go to Options | Proofing | When correcting spelling and grammar in Word | Settings…, which brings up this window:

Turn on the settings you want and prepare to be nitpicked to death. I turned it off after a while, but it was helpful, and helped forestall some comments from my wife.

One of the most common things that was corrected by my editors was the use of contractions. If you look in the final book text, you will find very, very few contractions. Contractions are a very informal way of writing. Fine for a blog entry, but in a book, they do stand out a bit. I only left them in when they sounded better.

There are dozens of nitpicky details you have to take care of. It’s hard to think of them all right now, but I strongly suggest you look at another book in the same genre you’re thinking of writing and analyze all aspects of it, especially the parts that aren’t just paragraphs of text. Things like:

Heading Capitalization.
Number of heading levels.
Structure of lists. Sentences or not? Periods at the end or not? My rule was that each list had to be internally consistent, but not necessarily consistent with other lists. Catching all of these can be hard.
Sentences that are too long, too short, oddly phrased, confusing, etc.
Colloquialisms.

A lot of this stuff you will be absolutely blind to. You MUST have another person help you find a lot of it. It’s also helpful if that person is familiar with the genre or is at least well-read and understands the language very well. Don’t ask your buddy who reads < 1 book a year to proofread.

Formatting

If you’re writing a book that is guaranteed to never be printed, then your primary concern with formatting should be how little of it you can get away with. Use one or two fonts. Use as few header styles as possible. There are plenty of formatting guides out there. One that I read and was very helpful was Mark Coker’s Smashwords Style Guide.

Right up until the first week of publication, I was planning on keeping this eBook-only. As soon as I told people it was available, though, some people asked about a print edition, and I realized I would need to do that.

Formatting for print is significantly more work. I wanted a number of things:

Bold, obvious styles for chapter titles, section headers, and more. Some of this includes underlining.
Graphics with captions throughout the book.
Different font and background shading for code samples.
Different font for code elements cited in paragraphs.
Appropriate white space after tables, code samples, section titles.
Appropriate white space before section titles.
Special callouts and borders for anecdotes and tips.
Page numbers, with alternating sides for even/odd.
Page headers consisting of the chapter number and title.
Chapters always start on the right side (I had originally planned on some graphics that bled to the edge, but this would require a more expensive printing option, so I abandoned it)

Word was great for all of this. If you’re not familiar with these features, it can take a bit to make it work, and it probably doesn’t have the power of a true desktop publishing application, but it worked great for me.

There are a lot more things I could have done. Take a look at another programming book and check out the advanced formatting and features they include. Most of it is unnecessary, but it can add a certain flair.

Cover Design

I knew almost from the beginning that I wanted the cover to have gears on it (see the introduction to the book for an explanation why), and have a very distinctive look. Most programming books are very bland. I wanted mine to stand out and still be professional-looking.

I am fortunate to have an artist sister, Claire Watson, who was willing to do some Photoshop work for me. She has a great eye for design, and after I told her my general ideas, gave her some sample stock photos, she produced no less than 30 variations of multiple cover designs. The one I ultimately chose was a variation of her very first design, and I love it.

You can find artwork and jewelry by my sister Claire at http://www.bluekittycreations.co.uk/.

Code Samples

I used Visual Studio 2012 Ultimate for the code samples. I could have used the 2013 version, but decided 2012 was a safer bet for minimum hassle for the majority of readers.

There were a few challenges with code samples:

Keeping them extremely short while still demonstrating the point I’m trying to convey.
Line length, especially on an eBook is an issue. Small screens and limited resolution play havoc with code formatting. If you’re on a really small or old device like an iPhone or original Kindle, then sorry, I did not bother trying to format for your device—it doesn’t even make sense at that point. I optimized for devices like the Kindle Fire and better.

Moonlighting Policy

Before I completed the book, I checked and double-checked the current moonlighting policy of Microsoft, which does allow me to write and publish books without prior approval. The only thing that can get me in trouble is revealing proprietary information, and I was very careful about this, especially when talking about .NET and GC internals (I had the GC owner review the whole chapter, and she did ask me to remove some small details that are prone to change in the future). So no worries here about publishing.

Prepping for Publish

If you thought you were done when the manuscript was complete, edited, and finalized, then you are in for a surprise. Depending on how complicated your manuscript is, you still have days worth of work ahead of you, so plan accordingly.

At some point I set a hard deadline for myself: July 12. That was the day it was going to go live on at least Amazon.com. The weeks before that, I basically worked every night and all weekend to complete as much of the final edits that I could and get final feedback from my editors, both technical and grammar. Then I took off work on Thursday and Friday to work on the eBook conversion. I knew this was going to be a chore, but I had no idea how grueling it would be. I understand now the value that professional publishers, editors, typesetters, etc. have. I decided to do the Amazon Kindle conversion first. For this, I just needed to upload the Word document to Amazon and their system would check it and provide a list of errors to me. I could also proofread it on various Kindle simulators, which would attempt to show me what it would look like. This was very helpful.

The single biggest problem from this process was that paragraphs with different fonts in them did not render properly. I used different font styles for code keywords within body text and this was throwing off the whole system. So I had to go fix a bunch of styles in a copy of my manuscript. I didn’t want to modify the original document because I knew that I would want those different styles for EPUB and print.

I also quickly realized that tables, despite being supported in all eBook formats will not work except for the most trivial of tables. It just doesn’t look good. This was too bad, because I had spent quite a while converting some bullets to tables. I spent a few hours converting back.

It was invaluable going through the book on the various Kindle simulators. I got to see a lot of things that just didn’t work on a small screen. It’s always a challenge putting a well-formatted book into an eBook, especially one with tons of code samples. Some Kindles displayed the shaded backgrounds, others didn’t. None will use different fonts, so you don’t get the benefits of uniform spacing for code samples. All of this I just let slide–I figure if you’re reading a coding book on a Kindle, you kind of know what you’re getting. I didn’t even bother proofreading (beyond curiosity) on the older Kindles or iPhone Kindle App. You deserve what you get if you try it. Sorry.

By Friday night, I hit the Publish button on Amazon KDP. A few hours later, it was up for sale.

On Saturday, I went through the EPUB conversion process. This went a bit faster than the Kindle process, but was still a little bit labor intensive. I used Calibre to convert the Word document to EPUB format, which is really just a ZIP file containing HTML, styles, and images. Then I used in the Calibre eBook editor to modify the HTML as necessary to fix up styles and formatting inconsistencies. This took a few hours and required a bit of hand-editing of styles and individual locations throughout each chapter.

Once all of that was done, the EPUB could be uploaded to all of the retailers besides Amazon.

By the end of Saturday, I was published on all major eBook platforms.

On Monday when I announced it to my work colleagues, a bunch of people said they wanted a print edition, which up until this point, I was not sure I was going to do. But given that response, I instantly realized that a dead tree version of a programming book is still in-demand. It’s the code samples—some people just prefer a physical book for code.

So I spent the next week getting it ready for publication via CreateSpace. Basically, this was just ensuring that all of the formatting I wanted (see above) was present and correct. My first proof copy actually had some bad fonts for most of the code samples–I had selected Courier New, and it just didn’t look as good as Consolas, so that was the major fix before publication.

I ordered a single proof copy, made the font change (and a few other minor fixes as well), and then hit the publish button. Within a day or so it was on Amazon for sale. A couple of days after that, it was matched up with the existing Kindle book so people could easily see both editions. (The matching process is normally automatic, but if it doesn’t happen for you, you can contact KDP support and let them know the ISBN/ASIN numbers and they can match them up for you.)

A note on DRM (Digital Rights Management): Avoid it. Unfortunately, your book is going to be hacked, stolen, put on Pirate Bay, whatever, anyway. DRM only punishes the law-abiding consumer. Just don’t, and make peace with the fact that people will steal things. If you sell to organizations, then it is fair to ask them to pay for multiple copies for multiple people, but don’t try to enforce this. It’s a losing battle and isn’t going to help your bottom line much anyway.

Pricing

By the end of Friday, I had the Kindle ready to go. I just had to decide on markets and pricing. I went with $9.99 because that’s the maximum price you’re allowed to do on KDP and still get a 70% royalty rate. This was actually disappointing to me. For me, the sweet spot in pricing would have been around $15. This is competitive with existing programming eBooks while being slightly cheaper. I had a fear that if I priced it too low, I wouldn’t have credibility compared to more expensive offerings. I don’t worry about this now. The book has taken off and gotten enough reviews that it can stand on its own, and the price now only helps.

For the print edition, I set the price much more in line with current offerings (slightly cheaper, of course). I also made EU prices similar to other books in those markets, but I have since changed them to be tied to US price and the exchange rate. This makes my book VERY competitive in those markets, and it has definitely netted me more sales.

Marketing

I tried to have a comprehensive marketing plan, but I don’t know how well I followed it. There were so many things to do that a lot of things just fell through the cracks for weeks. However, when you self-publish, time doesn’t really matter. Yes, more sales in a short period of time can lead to follow-on sales as the book will get featured in some places, especially on Amazon, but in the long run, you can try lots of things and let your book’s demand naturally grow.

Here is a short summary of what I did:

Reddit – I had no idea how much good this would do for me, but it really did. I didn’t just post a link to my book and beg people to buy it—that’s guaranteed to fail on any social media site, but especially on Reddit. Instead, I discussed my own background and motivation for the book. Just those facts contained enough interesting tidbits that people wanted to know more. My biggest single-day spike in sales came from that thread. Plus it led to a lot of interesting discussion. The point here is that I tried to provide value external to my book.
Twitter – I was never a big Twitter user, and I’m still not, but I decided to make some efforts and reach out to people, participate in discussions, comment on tech news, share personal tidbits, and more. Yes, I plug my book, especially retweeting other people’s comments, but I don’t make my Twitter feed exclusively book plugs—that’s annoying. Remember, try to provide value.
Free Samples – The book’s site contains a PDF of the book up through Chapter 1. People can see the full Table of Contents and get a feel for what’s in the book, without me giving it all away. I believe a free sample is critical. I even went further, releasing the entirety of Chapter 5 (Class Design and General Coding) as an article on CodeProject. It won Best C# Article for August and is ranked as one of the Top 5 articles posted on the site.
More articles on CodeProject, such as this one about object allocation. That article starts with something that’s in the book, but I provide a lot more information and examples that are not in the book. This article won Best C# Article of September. (Sadly, I didn’t have an article for October, but I’ll work on another one soon.)
Letters to influential bloggers and podcasters. I knew this was a long shot. They probably get product and media pitches all the time, right? But a few did respond. Some written reviews will hopefully show up on some influential blogs. I was a guest on the fabulous .NET podcast, .NET Rocks in episode 1041. My book was also mentioned in This Week on Channel 9.
Facebook Page – an easy way for people to follow updates on the book, reviews, articles, blog entries. It’s not a super busy page, but it’s a useful central place for making announcements.
Ads – I tried advertisements on Google, Bing, Facebook, and Twitter, with very limited effect. I mostly tried out of curiosity, and shut down these experiments fairly quickly after it was clear they weren’t doing much.
Letters to family and friends – I sent an email about my book to literally everyone in my address book, not asking them to buy it, but to forward to programmer friends or colleagues of theirs.
Ask for reviews – I’m not shy about it. Anybody who I know read it and loved it, I will ask them to leave a review on Amazon. These make a difference. I hear from a lot of people on Twitter, and I’ll give them a few weeks or so, and then ask them to write a review. I never tell people to leave a five star review or give any other guidance. More reviews the better.
Book’s website – Where to get all information about the book, a complete retail location listing, a place to buy the PDF or EPUB directly from me, errata page, download the book’s source code samples, and more. This site is important.
Posters – I printed 50 copies of a small poster of my book and put it up on every floor of my building and two others near me. I’m not sure what effect this had. Probably not worth it.
GoodReads – I made sure my book was in GoodReads, with the right cover, editions, descriptions, etc. You can add all of this yourself.
Blogging – I started blogging again, before the book came out. This provides another outlet for providing extra value as well as places to mention the book.

Remember to go beyond just “Please buy my book” – Always Be Providing Value.

In the end, the buzz from the podcasts, and other online reviews that have popped up have definitely helped, but most people are finding this book just through searching Amazon. My book comes up when you search for “.NET Performance” and that’s probably the most effective thing for lasting sales performance.

Sales Results

I put my book up for sale literally everywhere I could: Kindle, CreateSpace (for print), Kobo, Nook, Smashwords, Google Play, Apple iBooks, and my own site in EPUB and PDF. Smashwords, in turn, distributes to other retailers. I wanted it available everywhere, and it mostly is. Through CreateSpace’s extended distribution network, it even goes to some physical stores I believe–I know I’ve seen some fairly decent-sized orders for that network.

But looking at the numbers, it’s interesting just how much Amazon dominates the field here. Here are the percentages:

If you add Kindle and Print (both primarily on Amazon), that’s nearly 90% of all my sales coming from Amazon. The next-best thing is the PDF from the book’s site. You might be tempted then to only publish on Amazon, but I think this is a mistake in the long run. Even though the percentages are small, I do get sales from the other channels. There is a psychological benefit as well—many people are happy just to see that it’s available everywhere, even if they buy from one of the usual places.

Also, don’t forget the international market. Nearly all of my sales via my own website are from overseas customers who can’t easily use one of the big eBook retailers. By providing an EPUB and PDF to them, they can use the document however they need.

Contact

If you have any questions about this whole process, please just let me know either in the comments or on Twitter.

Book Review: High-Performance Windows Store Apps

I recently read Brian Rasmussen’s book High-Performance Windows Store Apps, and I think it’s an excellent companion to my own Writing High-Performance .NET Code.

It’s a good companion because while my book is all about the nitty-gritty details of .NET and how they effect performance, I don’t go too much into things at the UI layer. My perspective is much more systems and servers based, while Brian is coming at this from a UI/XAML focus. Brian covers a bunch of topics in a very good way.

The title specifically mentions “Windows Store” apps, but the principles and techniques described in here apply to any desktop/UI application, not just the types you see typically on tablets or phones. It’s worthwhile even if you develop just desktop apps.

Particular things I liked:

Chapter 3 – Designing for Performance – It presents a number of interesting performance scenarios, many of which aren’t obvious if you’re new to the new world of multi-device, cloud-connected apps we’re in. The discussion on resource management and prioritization was good too.
Chapter 4 – Instrumentation – A very good walkthrough of Event Tracing for Windows, which is what every developer needs to know these days, for tracking app events for performance, debugging, or just plain logging. Lots of good stuff here, and critical for all performance analyses these days.
Chapter 5 – Performance testing – Lots of good stuff in here I didn’t know about, particularly around building an automated performance testing environment for UI apps. Automation is critical, and this gives some good examples to get you on your way.

The book used the Windows Performance Recorder for most of its examples. It’s an excellent tutorial on how to make this tool useful to you, without getting bogged down into the complexities.

Overall, highly recommended. There are not very many guides on how to do performance engineering for XAML apps, and Brian writes a great one.

You can follow Brian on Twitter @kodehoved. Check out High-Performance Windows Store Apps at Amazon.

Using MemoryStream to Wrap Existing Buffers: Gotchas and Tips

A MemoryStream is a useful thing to have around when you want to process an array of bytes as a stream, but there are a few gotchas you need to be aware of, and some alternatives that may be better in a few cases. In Writing High-Performance .NET Code, I mention some situations where you may want to use pooled buffers, but in this article I will talk specifically about using MemoryStream specifically to wrap existing buffers to avoid additional memory allocations.

There are essentially two ways to use a MemoryStream:

MemoryStream creates and manages a resizable buffer for you. You can write to and read from it however you want.
MemoryStream wraps an existing buffer. You can choose how the underlying buffer is treated.

Let’s look at the constructors of MemoryStream and how they lead to one of those situations.

MemoryStream() – The default constructor. MemoryStream owns the buffer and resizes it as needed. The initial capacity (buffer size) is 0.
MemoryStream(int capacity) – Same as default, but initial capacity is what you pass in.
MemoryStream(byte[] buffer) – MemoryStream wraps the given buffer. You can write to it, but not change the size—basically, this buffer is all the space you will have. You cannot call GetBuffer() to retrieve the original array.
MemoryStream(byte[] buffer, bool writable) – MemoryStream wraps the given buffer, but you you can choose whether to make the stream writable at all. You could make it a pure read-only stream. You cannot call GetBuffer() to retrieve the original array.
MemoryStream(byte[] buffer, int index, int count) – Wraps an existing buffer, allowing writes, but allows you to specify an offset (aka origin) into the buffer that the stream will consider position 0. It also allows you to specify how many bytes to use after that origin as part of the stream. The stream is read-only. You cannot call GetBuffer() to retrieve the original array.
MemoryStream(byte[] buffer, int index, int count, bool writable) – Same as previous, but you can choose whether the stream is read-only. The buffer is still not resizable, and you cannot call GetBuffer() to retrieve the original array.
MemoryStream(byte[] buffer, int index, int count, bool writable, bool exposable)– Same as previous, but now you can specify whether the buffer should be exposed via GetBuffer(). This is the ultimate control you’re given here, but using it comes with an unfortunate catch, which we’ll see later.

Stream-Managed Buffer

If MemoryStream is allowed to manage the buffer size itself (the first two constructors above), then understanding the growth algorithm is important. The algorithm as currently coded looks like this:

If requested buffer size is less than the current size, do nothing.
If requested buffer size is less than 256 bytes, set new size to 256 bytes.
If requested buffer size is less than twice the current buffer size, set the new size to twice the current size.
Otherwise set capacity to exactly what was requested.

Essentially, if you’re not careful, you will start doubling the amount of memory you’re using, which may be overkill for some situations.

Wrapping an Existing Buffer

You would want to wrap an existing buffer in any situation where you have an existing array of bytes and don’t want to needlessly copy them, causing more memory allocations and thus a higher rate of garbage collections. For example, you’ve read a bunch of bytes from the wire via HTTP, you’ve got an existing buffer. You need to pass those bytes to a parser that expects a Stream. So far, so good.

However, there is a gotcha here. To illustrate, let’s see a hypothetical example.

You pull some bytes off the wire. The first few bytes are a standard header that all input has, and can be ignored. For the actual content, you have a parser that expects a Stream. Within that stream, suppose there are a subsections of data that have their own parsers. In other words, you have a buffer that looks something like this:

To deal with this, you wrap a MemoryStream around the content section, like this:

// comes from network 
byte[] buffer = …// Content starts at byte 24
MemoryStream ms = new MemoryStream(buffer, 24, buffer.Length – 24, writable:false, publiclyVisible:true);

So far so good, but what if the parser for the sub-section really needs to operate on the raw bytes instead of as a Stream? This should be possible, right? After all, publiclyVisible was set to true, so you can call GetBuffer(), which returns the original buffer. There’s just one (major) problem: You don’t know where you are in that buffer.

This may sound like a contrived situation, but it’s completely possible and I’ve run into it multiple times.

See, when you wrapped that buffer and told MemoryStream to consider byte 24 as the start, it set a private field called origin to 24. If you set the stream’s Position to 24, the index into the array is set to 24. That’s position 0. Unfortunately, MemoryStream will not surface the origin to you. You can’t even deduce it from other properties like Capacity, Length, and Position. The origin just disappears, which means that the buffer you get back from GetBuffer() is useless. I consider this a bug in the .NET Framework—why have the ability to retrieve the original buffer if you don’t surface the offset being used? It may be that it would be more confusing, with an additional property that many people won’t understand.

There are a few ways you can choose to handle this:

Derive a child class from MemoryStream that doesn’t do anything except mimic the underlying constructors and store this offset itself and make it available via a property.
Pass down the offset separately. Works, but feels very annoying.
Instead of using MemoryStream, use an ArraySegment<byte> and wrap that (or parts of it) as needed in a MemoryStream.

Probably the cleanest option is #1. If you can’t do that, convert the deserialization methods to take ArraySegment<byte>, which wraps a buffer, an offset, and a length into a cheap struct, which you can then pass to all deserialization functions. If you need a Stream, you can easily construct it from the segment:

byte[] buffer = …
var segment = new ArraySegment<byte>(buffer, 24, buffer.Length – 24);
…
var stream = new MemoryStream(segment.Array, segment.Offset, segment.Count);

Essentially, make your master buffer source the ArraySegment, not the Stream.

If you find this kind of tip useful, you will love my book Writing High-Performance .NET Code, which will teach you hundreds of optimizations like this as well as the fundamentals of .NET performance engineering.