How to Write and Publish a Technical Book and Not Lose Your Sanity

Earlier this year, I published my third book, Writing High-Performance .NET Code (2nd Edition), my second that was fully self-published. In this article I want to detail a topic that I found incredibly under-documented as I went through the process of writing this revision: the technical process of managing a book’s manuscript.

This won’t cover how to actually write content, or how to work with Amazon, CreateSpace, Smashwords, or the other publishers out there. It’s just about how you go from manuscript to PDF (or EPUB) in a way that is infinitely and quickly repeatable.

This can be more complicated than you think. To illustrate this, let me explain my (very broken) process for the first edition of this book.

First Edition

The first edition of Writing High-Performance .NET Code was written entirely in Microsoft Word. While easy to get started because of its familiarity (as well as the great reviewing tools when it came time for proofreading from my editor), it ended up being a curse in the long run.

  • Reorganization of content was awful. Trying to move entire chapters, or even just sections, can be very difficult once you have a lot of layout already done. I often had to rework page headings, page numbering, and various style issues after the move.
  • Adding formatting is very tedious. Since this is a programming book that will actually be printed, there are a huge number of stylistic factors to consider: different fonts for code blocks, inline code, and tons of other elements. Code samples need to be shaded and boxed. There needs to be correct spacing in lists, around lists, after paragraphs, before and after headings, and much more. A point-and-click/WYSIWYG interface becomes a severe bottleneck here. Then, as mentioned in the first bullet–once you reorganize something, there’s a good chance you’ll need to double-check a lot of formatting. However nit-picky you think you’re being, you need to be even more so. Doing this in Word is excruciating.
  • Control over PDF creation is very limited. Word only gives you a few knobs. I didn’t feel like springing for the full version of Acrobat.
  • Exporting to EPUB/HTML creates extremely messy output: hundreds of styles and very bloated HTML. The HTML actually needed to be hand-edited in many cases. And in the end, I actually needed three versions of the EPUB file, so making post-publication edits became triply awful.
  • Every time I needed to make a correction, the editing process was tedious beyond belief. Re-exporting from Word was a non-starter because of how much manual cleanup I had to do. Instead, I made fixes in up to 6 individual files (Master copy, EPUB, EPUB for Smashwords, EPUB for Kindle conversion, Word file for online PDF, Word file for print PDF). Yuck.

2nd Edition

When I decided to update the book for the 2nd edition, I knew I wasn’t going to repeat this awful process. I needed a completely new way of working.

My main requirement was: Trivial to go from a single source document to all publishable formats. I wanted one source of truth that generated all destination formats. I wanted to type “build” and be able to upload the resulting file immediately.

Enter Scrivener. During the review process for the first edition, a friend’s wife mentioned Scrivener to me. So it was the first thing I tried. Scrivener has a lot of features for writers of all sorts, but the things that I liked were:

  • Trivial reorganization of sections and chapters. Documents are organized in a tree hierarchy which becomes your chapters and sections. Reordering content is merely a matter of dragging nodes around.
  • Separates formatting from content. There is no WYSIWYG editing here. (You can change the text formatting–but that’s just for your comfort while editing. It has nothing to do with the final product.) Admittedly, I’m fudging this distinction a bit because I’m using a Markdown syntax.
  • Integration with Multimarkdown. This formed the basis for both print and electronic editions. Since everything is just text and Multimarkdown supports the inclusion of Latex commands, the same source could drive both the e-book and the print formatting.
  • Simple export process. I just tell it to compile my book and it combines all the nodes in a tree (you have a lot of control over how this happens) and produces a single file that can either be published directly or processed more.

Setting up the right process took quite a bit of trial and error (months), but I ended up with something that worked really well. The basic process was this:

  • Top-level nodes are chapters, with multiple sections underneath.
  • First-level formatting in Multimarkdown syntax.
  • Second-level formatting in Latex syntax, hidden inside Multimarkdown comments.
  • Using Scrivener, export book to Multimarkdown (MMD) file.
  • EPUB conversion happens directly from MMD file.
  • PDF conversion happens by converting MMD file to a TEX file, then from TEX to PDF.

Let’s look at a few of these pieces in more detail.

Multimarkdown

Multimarkdown is one variant in a family of Markdown formats, which represent simple formatting options in plain text.
For example, this line:

* **# Induced GC**: Number of times `GC.Collect` was called to explicitly start garbage collection.

indicates that this is a bullet point with some bolded text, and GC.Collect is in a different font, indicating a code-level name.

There is Multimarkdown syntax for headers, links, images, tables, and much more.
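
A few more illustrative snippets of that syntax (the heading, link target, image file name, and table contents here are just placeholders, not content from the book):

# Chapter Title

A [link to the book's site](http://www.writinghighperf.net) and an image: ![GC diagram](images/gc-diagram.png)

| Collection | Lookup cost    |
| ---------- | -------------- |
| Array      | O(1)           |
| Dictionary | O(1) amortized |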

However, it doesn’t have everything. If you’re doing a print book, it likely won’t be enough if you use anything but the most basic of formatting.

Latex

For advanced formatting, the industry standard is Latex. It is its own programming language and can get quite complex. However, I found getting started fairly easy. One of the biggest advantages of using Latex is that it has sensible defaults for everything–it just makes a good book automatically. You only change the things you notice need changing. There are tons of examples and communities out there that can help. See the section at the end for useful links.

Latex was included by putting commands inside the manuscript, inside HTML comments (which Multimarkdown would ignore), like this:

<!-- \medskip -->

The most important part of using Latex is setting up the environment–this is basically where you set up how you want all the various elements in your book to look by default. Here’s an excerpt:

<!--
\documentclass[12pt,openright]{book}
\usepackage[paperwidth=7.5in, paperheight=9.25in, left=1.0in, right=0.75in,
            top=1.0in, bottom=1.0in, headheight=15pt]{geometry}
\usepackage{fancyhdr}
\usepackage[pdftex,
pdfauthor={Ben Watson},
pdftitle={Writing High-Performance .NET Code},
pdfsubject={Programming/Microsoft/.NET},
pdfkeywords={Microsoft, .NET, JIT, Garbage Collection},
pdfproducer={pdflatex},
pdfcreator={pdflatex}]{hyperref}

\usepackage[numbered]{bookmark}

% Setup Document
% Setup Headings

\pagestyle{fancy}
\fancyhead[LE]{\leftmark}
\fancyhead[RE]{}
\fancyhead[LO]{}
\fancyhead[RO]{\rightmark}

... hundreds more lines...added as needed over time...
-->

By doing all of that, I get so many things for free:

  • Default formatting for almost everything, including code listings, table of contents, index, headers
  • Page setup–size, header layout and content
  • Fonts
  • Colors for various things (EPUB and PDF editions)
  • Paragraph, image, code spacing
  • Tons more.
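
To give a concrete flavor, the code-listing and spacing defaults in that list are the kind of thing you configure once in the preamble. Here is a hedged sketch of what such settings might look like (illustrative only, not the book's actual values):

<!--
\usepackage{listings}
\lstset{
  basicstyle=\ttfamily\small,   % monospaced font for code blocks
  frame=single,                 % box drawn around code samples
  breaklines=true,              % wrap long code lines
  aboveskip=1em, belowskip=1em  % spacing before and after listings
}
-->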

One problem that vexed me for a few days was dealing with equations. Latex has equations, and there is even a Latex equation translator for EPUB, but it doesn’t work in Kindle. Since I only had three equations in the whole book, I decided not to do anything special for Kindle, but instead to simplify the EPUB formatting significantly. The other option is to convert the equations to images, but this can be tricky, and it would not be automated. I still wanted a fancy-looking equation in the PDF and print editions. So I came up with something like this:

<!--\iffalse-->
0.01 * P * N
<!--\fi-->
<!--\[ \frac{P}{100}{N} \]-->

When Latex is doing the document processing, the first statement is ignored because it’s inside the \iffalse block, and the equation in the second comment is used. When Latex isn’t involved (the e-book path), the plain-text version is used and the comments are ignored.

I used similar tricks elsewhere when I needed the formatting in the printed edition to differ from the simpler electronic versions.

MMD –> Latex Problems

There were still some issues that don’t automatically translate between MMD and Latex–fundamental incompatibilities between the EPUB and PDF translation processes–so I had to write a custom preprocessor that replaced some MMD code with something more suitable for print. For example, when mmd.exe translates an MMD file to a TEX file, it uses certain Latex styles by default. I could never figure out how to customize these styles, so I just did it myself with a preprocessor.

Benefits

So what have I gained now?

  • All my content is just text–no fiddling with visual styles.
  • Formatting control is trivial with Multimarkdown syntax.
  • Additional specificity for PDFs is achieved with Latex inline in the document (or in the document header).
  • A single command exports the whole book to a single MMD file.

The MMD file is the output of the writing process. But it’s just the input to the build system.

Build System

Yep, I wrote a build system for a book. This allowed me to do things like “build pdf” and 30 seconds later out pops a PDF that is ready to go online.

Confession: the build system is just a Windows batch file. And a lot of cobbled tools.

The top of the file was filled with setting up variables like this:

@echo off

REM Setup variables

SET PRINTTRUEFALSE=false{print}
SET KINDLETRUEFALSE=false{kindle}
SET VALIDEDITION=false


IF /I "%1"=="Kindle" (
set ISBN13=978-0-990-58348-6
set ISBN10=0-990-58348-1
set EDITION=Kindle
set EXT=epub
SET KINDLETRUEFALSE=true{kindle}
set VALIDEDITION=true)
…

Then comes setting up the build output and locating my tools:

set outputdir=%bookroot%\output
SET intermediate=%outputdir%\intermediate
SET metadata=%bookroot%\metadata
SET imageSource=%bookroot%\image
IF /I "%EDITION%" == "Print" (SET imageSource=%bookroot%\image\hires)
SET bin=%bookroot%\bin
SET pandoc=%bin%\pandoc.exe
SET pp=%bin%\pp.exe
SET preprocessor=%bin%\preprocessor.exe
SET mmd=%bin%\mmd54\multimarkdown.exe
SET pdflatex=%bin%\texlive\bin\win32\pdflatex.exe
SET makeindex=%bin%\texlive\bin\win32\makeindex.exe
SET kindlegen=%bin%\kindlegen.exe
…

%bookroot% is defined in the shortcut that launches the console.

Next is preprocessing to put the right edition label and ISBN in there. It also controls some formatting by setting a variable if this is the print edition (I turn off colors, for example).

IF EXIST %intermediate%\%mmdfile% (%preprocessor% %intermediate%\%mmdfile% !EDITION=%EDITION% !ISBN13=%ISBN13% !ISBN10=%ISBN10% !EBOOKLICENSE="%EBOOKLICENSE%" !PRINTTRUEFALSE=%PRINTTRUEFALSE% !KINDLETRUEFALSE=%KINDLETRUEFALSE% > %intermediate%\%mmdfile_edition%)

Then comes the building. EPUB is simple:

IF /I "%EXT%" == "epub" (%pandoc% --smart -o %outputfilename% --epub-stylesheet styles.css --epub-cover-image=cover-epub.jpg -f %formatextensions% --highlight-style pygments --toc --toc-depth=2 --mathjax -t epub3 %intermediate%\%mmdfile_edition%)

 

Kindle also generates an EPUB, which it then passes to kindlegen.exe:

IF /I "%EDITION%" == "Kindle" (%pandoc% --smart -o %outputfilename% --epub-stylesheet styles.css --epub-cover-image=cover-epub.jpg -f %formatextensions% --highlight-style haddock --toc --toc-depth=2  -t epub3 %intermediate%\%mmdfile_edition%

%kindlegen% %outputfilename%

PDF (whether for online or print) is significantly more complicated. PDFs need a table of contents and an index, which have to be generated from the TEX file. But first we need the TEX file, and before that, the custom preprocessor has to change a couple of things in the MMD file. The steps are:

  • Preprocess MMD file to replace some formatting tokens for better Latex code.
  • Convert MMD file to TEX file using mmd.exe
  • Convert TEX file to PDF using pdflatex.
  • Extract index from the PDF using makeindex.
  • Re-run conversion, using the index.

IF /I "%EXT%" == "pdf" (copy %intermediate%\%mmdfile_edition% %intermediate%\temp.mmd
%preprocessor% %intermediate%\temp.mmd ```cs=```{[Sharp]C} ```fasm=```{[x86masm]Assembler} > %intermediate%\%mmdfile_edition%
%mmd% -t latex --smart --labels --output=%intermediate%\%latexfile_edition% %intermediate%\%mmdfile_edition%
%pdflatex% %intermediate%\%latexfile_edition% > %intermediate%\pdflatex1.log
%makeindex% %intermediate%\%indexinputfile% > %intermediate%\makeindex.log
%pdflatex% %intermediate%\%latexfile_edition% %intermediate%\%indextexfile% > %intermediate%\pdflatex2.log)

Notice the dumping of output to log files. I used those to diagnose when I got either the MMD or Latex wrong.

There is more in the batch file. The point is to hide the complexity–there are NO manual steps in creating the publishable files. Once I export that MMD file from Scrivener, I just type “build.bat [edition]” and away I go.
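
For example (Kindle and Print are edition names the batch file checks for; other editions follow the same pattern):

build.bat Kindle
build.bat Print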

Source Control

Everything is stored in a Subversion repository which I host on my own server. I would have used Git since that’s all the rage these days, but I already had this repo from the first edition. The top level of the 2nd edition directory structure looks like this:

  • BetaReaders – instructions and resources for my beta readers
  • Bin – Location of build.bat as well as all the tools used to build the book.
  • Business – Contracts and other documents around publishing.
  • Cover – Sources and Photoshop file for the cover.
  • CustomToolsSrc – Source code for my preprocessor and other tools that helped me build the book.
  • Graphics – Visio and other source files for diagrams.
  • HighPerfDotNet2ndEd.scriv – Where Scrivener keeps its files (RTF format for the content, plus a project file).
  • Image – All images for the book, screenshots or exported from diagrams.
  • Marketing – Some miscellaneous images I used for blog posts, Facebook stuff.
  • Metadata – License text for eBook and styles.css for EPUB formatting.
  • mmd – output folder for MMD file (which I also check into source control–so I can do a build without having Scrivener installed, if necessary).
  • Output – Where all output of the build goes (working files go to output\intermediate).
  • Resources – Documents, help files, guides, etc. that I found that were helpful, e.g., a Latex cheat sheet.
  • SampleChapter – PDF of sample chapter.
  • SampleCode – Source code for all the code in the book.

Tools

  • Scrivener – What I manage the book’s manuscript in. It has a bit of a learning curve, but once you get it set up, it’s a breeze. The only thing I wish for is a command-line version that I could integrate into my build.bat.
  • PdfLatex.exe – You need to get a Latex distribution. I used texlive, but there are others–probably some much more minimal than the one I got. It’s big. This is what creates the PDF files.
  • Pp.exe – An open-source preprocessor I used for a while, but it ended up not being able to handle one situation.
  • Preprocessor.exe – Custom tool I wrote to use instead of pp.exe.
  • Multimarkdown – Scrivener has an old version built-in, but I ended up using the new version outside of it for some more control.
  • Pandoc – I use this for the conversion from MMD to EPUB (and Word for internal use).
  • Adobe Digital Editions 3.0 –  This is useful for testing EPUB files for correct display. I couldn’t get the latest version to work correctly.
  • Calibre – In particular, the EPUB editor. For the first edition, I actually edited all the EPUB editions using this to clean up and correct the garbage Word output. For the second edition, I just used it for double-checking and debugging conversion issues.

Useful Web Sites



Sample Chapter of Writing High-Performance .NET Code, 2nd Edition

Head on over to http://www.writinghighperf.net to read a sample chapter of Writing High-Performance .NET Code, 2nd Edition. The sample PDF also includes some of the front and back material to give you a taste of the rest of the book.



Writing High-Performance .NET Code on the No Dogma podcast

I was recently interviewed by the wonderful Bryan Hogan for his No Dogma podcast. Check out the episode here. If you want to subscribe, which you should, the podcast feed is here. You can follow Bryan or me on Twitter.

Check it out! We had a great discussion about performance, Bing, Azure, and more.



Announcing Writing High-Performance .NET Code, 2nd Edition!

The new book will be available for sale on or around May 1, 2018. You can pre-order right now for Kindle in many Amazon markets, and for print as well in the US. 

The book is out! Get it in the US, or go here for more markets and options.

The new book is a mix of greatly expanded coverage of old topics, as well as new material throughout. The fundamentals of .NET and how the GC and JIT work have not changed, not even in the face of .NET Core, but there have been many improvements, new APIs, and new ways of doing things that necessitated some updating and coverage of new topics.

It was also a great opportunity to incorporate a lot of the feedback I’ve received on the first edition.

Here’s a visualization of the book’s structure. I wrote the book in Scrivener and marked each section with a label depending on whether it was unchanged (green), modified (yellow), or completely new (purple).

This is a good overview, but it doesn’t really convey what is exactly “new”–the newness is spread throughout the book in many forms. Here are some highlights:

Meta-content:

  • 50% increase in content overall (from 68K words to over 100K words, or over 200 new pages).
  • Fixed all known errata (quite a bit more than on that list, honestly).
  • Incorporated feedback from hundreds of readers.
  • A new foreword by Vance Morrison, .NET Performance Architect at Microsoft.
  • Better-looking graphics.
  • Completely new typesetting system to give a more professional look. (You’ll only notice this in print and PDF formats.)
  • Automated build process that can produce new revisions of all formats much more easily than before, allowing me to publish updates as needed.
  • Cut the Windows Phone chapter. (No one will miss this, right?)
  • Updated cover.

Content:

  • List of CLR performance improvements over time.
  • Significant increase in examples of using Visual Studio to diagnose CPU usage, heap state, memory allocations, and more, throughout the entire book.
  • Introduced a new tool for analyzing .NET processes: Microsoft.Diagnostics.Runtime (“CLR MD”). It’s kind of like having programmatic access to a debugger, and it’s very powerful. I have many examples of its usage throughout the book, tied to specific diagnostic scenarios.
  • Benchmarking! I include actual benchmark results in many sections. Many code samples use a benchmarking library to show the differences in various approaches.
  • More on garbage collection, including advanced configuration options; reworked and more detailed explanations of the GC process including less-known information; discussion of recent libraries to help with pooling; examples of using weak references; object resurrection; stackalloc; more on finalization; more ways to diagnose problems; and much, much more. This was already the biggest chapter, and it got bigger.
  • More details about JIT, how to do custom warmup, how to figure out which code is jitted, and many more improvements.
  • Greatly expanded coverage of TPL, including the TPL Dataflow library, how to use tasks effectively, avoiding lock convoys, and many other smaller improvements to existing topics.
  • More guidance on struct and class design, such as immutability, thread safety, and more, including when to use tuples and the new ValueTuple type.
  • Ref-returns and ref locals, and when to use them.
  • Much more content on collections, including effectively taking advantage of initial capacity; sorting; key comparisons; jagged vs. multi-dimensional arrays; analyzing the performance of various collection types; and more.
  • Discussion of SIMD commands, including examples and benchmarks.
  • A detailed analysis of LINQ and why it’s more expensive than you think.
  • More ways to consume and process ETW events.
  • A detailed example of building a Roslyn-style Code Analyzer and automatic fixer (FxCop replacement).
  • A new appendix with high-level tips for ADO.NET, ASP.NET, and WPF.
  • And a lot more…

See the book’s web-site for up-to-date details on where to buy, and other news.



Don’t Make This Dumb Locking Mistake

What’s wrong with this code:

try
{
    if (!Monitor.TryEnter(this.lockObj))
    {
        return;
    }
    
    DoWork();
}
finally
{
    Monitor.Exit(this.lockObj);
}

This is a rookie mistake, but sadly I made it the other day when I was fixing a bug in haste. The problem is that the Monitor.Exit in the finally block will try to exit the lock even if the lock was never taken, which will throw a SynchronizationLockException. Use a different TryEnter overload instead:

bool lockTaken = false;
try
{
    Monitor.TryEnter(this.lockObj, ref lockTaken);
    if (!lockTaken)
    {
        return;
    }
    
    DoWork();
}
finally
{
    if (lockTaken)
    {
        Monitor.Exit(this.lockObj);
    }
}

You might be tempted to use the same TryEnter overload we used before and rely on the return value to set lockTaken. That might be fine in this case, but it’s better to use the version shown here. It guarantees that lockTaken is always set, even in cases where an exception is thrown, in which case it will be set to false.
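
For reference, the tempting version that the paragraph above warns against looks like this (shown only to illustrate the point):

// Risky: if the thread is interrupted after the lock is acquired inside
// TryEnter but before the return value is assigned, lockTaken won't reflect
// reality and the finally block may fail to release a held lock.
bool lockTaken = Monitor.TryEnter(this.lockObj);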



Free Kindle version of Writing High-Performance .NET Code when you buy the print version

As of last week, when you buy the print version of Writing High-Performance .NET Code from Amazon, you can get the Kindle version for free.



Get Your Thread Synchronization Right the First Time

I was recently debugging a problem that just didn’t make any sense at first.

The code looked like this:

class App
{
    public bool IsRunning {get; set;}
    private Thread houseKeepingThread;
    
    public void Start()
    {
        this.IsRunning = true;
        this.houseKeepingThread = new Thread(ThreadFunc);
        this.houseKeepingThread.Start();
    }
    
    private void ThreadFunc()
    {
        while (this.IsRunning)
        {
            DoWork();
            // wait for 30 seconds
        }
    }
};

The actual code was of course a bit more complicated, but this demonstrates the essence of the problem.

The outward manifestation of the bug was evidence that DoWork wasn’t being called over time as it should have been.

To debug this, I first concentrated on reasons that the thread could end early, and none of them made any sense.

To finally figure it out, I attached a debugger to a process known to be in a suspect state and discovered evidence in the state of the App object that DoWork had never run, not even once.

I stared at the code for five seconds and then said out loud: “IsRunning isn’t volatile!”

Despite setting IsRunning to true before starting the thread, it is completely possible that the thread starts running before the memory is coherent across all of the threads.

What’s amazing about this bug is that it existed from the very beginning of this application. As in checkin #1. It has probably been causing problems for a while at some low level, but it recently must have gotten worse—it could be for any reason, some change in the pattern of execution, the load on the system, who knows.

The fix was quite easy:

Make a volatile backing field for the IsRunning property.
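
Here is a minimal sketch of that fix (the backing-field name is just illustrative):

class App
{
    // volatile: reads of this field always observe the most recent write
    // from any thread.
    private volatile bool isRunning;

    public bool IsRunning
    {
        get { return this.isRunning; }
        set { this.isRunning = value; }
    }

    // ...the rest of the class is unchanged
}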

By using volatile, there will be a memory barrier that will enforce coherency across all threads that access this variable. Problem solved.

The lesson here isn’t about any particular debugging skills. The real lesson is to make sure you get this right in the first place, because debugging it takes far longer and is much trickier than just reasoning about it in the first place. These types of problems can stay hidden and extremely rare for a very long time until some other innocuous change causes them to start manifesting in strange, seemingly impossible ways.

If you want to learn more about memory barriers and when they’re required, you can check out Chapter 4 of Writing High-Performance .NET Code, and there is also this tutorial by Joseph Albahari.



Announcing Microsoft.IO.RecyclableMemoryStream

It is with great pleasure that I announce the latest open source release from Microsoft. This time it’s coming from Bing.

Before explaining what it is and how it works, I have to mention that nearly all of the work for actually getting this set up on GitHub and preparing it for public release was done by Chip Locke [Twitter | Blog], one of our superstars.

What It Is

Microsoft.IO.RecyclableMemoryStream is a MemoryStream replacement that offers superior behavior for performance-critical systems. In particular it is optimized to do the following:

  • Eliminate Large Object Heap allocations by using pooled buffers
  • Incur far fewer gen 2 GCs, and spend far less time paused due to GC
  • Avoid memory leaks by having a bounded pool size
  • Avoid memory fragmentation
  • Provide excellent debuggability
  • Provide metrics for performance tracking

In my book Writing High-Performance .NET Code, I had this anecdote:

In one application that suffered from too many LOH allocations, we discovered that if we pooled a single type of object, we could eliminate 99% of all problems with the LOH. This was MemoryStream, which we used for serialization and transmitting bits over the network. The actual implementation is more complex than just keeping a queue of MemoryStream objects because of the need to avoid fragmentation, but conceptually, that is exactly what it is. Every time a MemoryStream object was disposed, it was put back in the pool for reuse.

-Writing High-Performance .NET Code, p. 65

The exact code that I’m talking about is what is being released.

How It Works

Here are some more details about the features:

  • A drop-in replacement for System.IO.MemoryStream. It has the same semantics, as close as possible.
  • Rather than pooling the streams themselves, the underlying buffers are pooled. This allows you to use the simple Dispose pattern to release the buffers back to the pool, as well as detect invalid usage patterns (such as reusing a stream after it’s been disposed).
  • Completely thread-safe. That is, the manager (RecyclableMemoryStreamManager) is thread-safe. Streams themselves are inherently NOT thread-safe.
  • Each stream can be tagged with an identifying string that is used in logging. This can help you find bugs and memory leaks in your code relating to incorrect pool use.
  • Debug features like recording the call stack of the stream allocation to track down pool leaks
  • Maximum free pool size to handle spikes in usage without using too much memory.
  • Flexible and adjustable limits to the pooling algorithm.
  • Metrics tracking and events so that you can see the impact on the system.
  • Multiple internal pools: a default “small” buffer (default of 128 KB) and additional, “large” pools (default: in 1 MB chunks). The pools look kind of like this:

[Diagram: the small block pool and the large buffer pools]

In normal operation, only the small pool is used. The stream abstracts away the use of multiple buffers for you. This makes the memory use extremely efficient (much better than MemoryStream’s default doubling of capacity).

The large pool is only used when you need a contiguous byte[] buffer, via a call to GetBuffer or (let’s hope not) ToArray. When this happens, the buffers belonging to the small pool are released and replaced with a single buffer at least as large as what was requested. The size of the objects in the large pool is completely configurable, but if a buffer greater than the maximum size is requested then one will be created (it just won’t be pooled upon Dispose).

Examples

You can jump right in with no fuss by just doing a simple replacement of MemoryStream with something like this:

var sourceBuffer = new byte[]{0,1,2,3,4,5,6,7}; 
var manager = new RecyclableMemoryStreamManager(); 
using (var stream = manager.GetStream()) 
{ 
    stream.Write(sourceBuffer, 0, sourceBuffer.Length); 
}

Note that RecyclableMemoryStreamManager should be declared once and it will live for the entire process–this is the pool. It is perfectly fine to use multiple pools if you desire.

To facilitate easier debugging, you can optionally provide a string tag, which serves as a human-readable identifier for the stream. In practice, I’ve usually used something like “ClassName.MethodName” for this, but it can be whatever you want. Each stream also has a GUID to provide absolute identity if needed, but the tag is usually sufficient.

using (var stream = manager.GetStream("Program.Main"))
{
    stream.Write(sourceBuffer, 0, sourceBuffer.Length);
}

You can also provide an existing buffer. It’s important to note that this buffer will be copied into the pooled buffer:

var stream = manager.GetStream("Program.Main", sourceBuffer, 
                                    0, sourceBuffer.Length);
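
And when you genuinely do need a single contiguous byte[] (the large-pool case described earlier), GetBuffer provides it. A brief sketch, reusing the manager and sourceBuffer from the examples above:

using (var stream = manager.GetStream("Program.Main"))
{
    stream.Write(sourceBuffer, 0, sourceBuffer.Length);

    // Consolidates the contents into a single buffer drawn from the large pool.
    // The array may be longer than the data written, so use stream.Length for
    // the valid byte count, and use the buffer only while the stream is alive.
    byte[] buffer = stream.GetBuffer();
}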

You can also change the parameters of the pool itself:

int blockSize = 1024;
int largeBufferMultiple = 1024 * 1024;
int maxBufferSize = 16 * largeBufferMultiple;

var manager = new RecyclableMemoryStreamManager(blockSize, 
                                                largeBufferMultiple, 
                                                maxBufferSize);

manager.GenerateCallStacks = true;
manager.AggressiveBufferReturn = true;
manager.MaximumFreeLargePoolBytes = maxBufferSize * 4;
manager.MaximumFreeSmallPoolBytes = 100 * blockSize;

Is this library for everybody? No, definitely not. This library was designed with some specific performance characteristics in mind. Most applications probably don’t need those. However, if they do, then this library can absolutely help reduce the impact of GC on your software.

Let us know what you think! If you find bugs or want to improve it in some way, then dive right into the code on GitHub.

Links



2014 Year In Review

Just a recap of the things that have occurred this year. I have been truly blessed, and I have a lot to be thankful for. Just a small part of them here…

Writing High-Performance .NET Code was published!


I spent the first half of the year heads down and working on the book until it was finally published in mid-July.

Since then, I have been amazed at the positive response it has gotten. You can read more at the book’s site.

One choice review: “It’s engaging, and reads like a technical memoirs of the lessons learned from the genesis of a very large scale .NET service. Readability is great, but the technical content is invaluable…. I personally felt that the chapter on GC alone is easily worth the price of the book and the time to read it….it’s an excellent resource to fully understand what you need to know about GC…. This book does a very good job and breaking down the myths of poor performing managed code, and the optimal strengths managed code has over native…. I gave this book 5 stars for several reasons. First, it delivers on the content and readability the book is selling. The price is also very generous for the content you’re getting.” 5 stars – I. Paradiso

6 Years at Microsoft

I hit 6 years at Microsoft, all in Bing. Still loving it. Google and Facebook seem to get all the press these days, but Microsoft is a truly amazing place to work and I’m looking forward to many more years of great work.

In-Depth Technical Articles

Blog Entries

Some of the highlights of 2014:

Road Trip!

We drove down to northern California and spent a few days among the redwood forests. Truly amazing.


Looking forward to 2015

There is a lot to look forward to in 2015. I’ll continue to blog. My book should see some new international developments that I’m excited about. More road trips are in order… Things on my list to visit with family around the PNW:

  • Mt. St. Helens
  • Leavenworth
  • Cascade loop
  • The Olympics
  • Oregon Coast
