MemoryStream | Philosophical Geek

It is with great pleasure that I announce the latest open source release from Microsoft. This time it’s coming from Bing.

Before explaining what it is and how it works, I have to mention that nearly all of the work for actually getting this setup on GitHub and preparing it for public release was done by by Chip Locke [Twitter | Blog], one of our superstars.

What It Is

Microsoft.IO.RecyclableMemoryStream is a MemoryStream replacement that offers superior behavior for performance-critical systems. In particular it is optimized to do the following:

Eliminate Large Object Heap allocations by using pooled buffers
Incur far fewer gen 2 GCs, and spend far less time paused due to GC
Avoid memory leaks by having a bounded pool size
Avoid memory fragmentation
Provide excellent debuggability
Provide metrics for performance tracking

In my book Writing High-Performance .NET Code, I had this anecdote:

In one application that suffered from too many LOH allocations, we discovered that if we pooled a single type of object, we could eliminate 99% of all problems with the LOH. This was MemoryStream, which we used for serialization and transmitting bits over the network. The actual implementation is more complex than just keeping a queue of MemoryStream objects because of the need to avoid fragmentation, but conceptually, that is exactly what it is. Every time a MemoryStream object was disposed, it was put back in the pool for reuse.

-Writing High-Performance .NET Code, p. 65

The exact code that I’m talking about is what is being released.

How It Works

Here are some more details about the features:

A drop-in replacement for System.IO.MemoryStream. It has exactly the same semantics, as close as possible.
Rather than pooling the streams themselves, the underlying buffers are pooled. This allows you to use the simple Dispose pattern to release the buffers back to the pool, as well as detect invalid usage patterns (such as reusing a stream after it’s been disposed).
Completely thread-safe. That is, the MemoryManager is thread safe. Streams themselves are inherently NOT thread safe.
Each stream can be tagged with an identifying string that is used in logging. This can help you find bugs and memory leaks in your code relating to incorrect pool use.
Debug features like recording the call stack of the stream allocation to track down pool leaks
Maximum free pool size to handle spikes in usage without using too much memory.
Flexible and adjustable limits to the pooling algorithm.
Metrics tracking and events so that you can see the impact on the system.
Multiple internal pools: a default “small” buffer (default of 128 KB) and additional, “large” pools (default: in 1 MB chunks). The pools look kind of like this:

In normal operation, only the small pool is used. The stream abstracts away the use of multiple buffers for you. This makes the memory use extremely efficient (much better than MemoryStream’s default doubling of capacity).

The large pool is only used when you need a contiguous byte[] buffer, via a call to GetBuffer or (let’s hope not) ToArray. When this happens, the buffers belonging to the small pool are released and replaced with a single buffer at least as large as what was requested. The size of the objects in the large pool are completely configurable, but if a buffer greater than the maximum size is requested then one will be created (it just won’t be pooled upon Dispose).

Examples

You can jump right in with no fuss by just doing a simple replacement of MemoryStream with something like this:

var sourceBuffer = new byte[]{0,1,2,3,4,5,6,7}; 
var manager = new RecyclableMemoryStreamManager(); 
using (var stream = manager.GetStream()) 
{ 
    stream.Write(sourceBuffer, 0, sourceBuffer.Length); 
}

Note that RecyclableMemoryStreamManager should be declared once and it will live for the entire process–this is the pool. It is perfectly fine to use multiple pools if you desire.

To facilitate easier debugging, you can optionally provide a string tag, which serves as a human-readable identifier for the stream. In practice, I’ve usually used something like “ClassName.MethodName” for this, but it can be whatever you want. Each stream also has a GUID to provide absolute identity if needed, but the tag is usually sufficient.

using (var stream = manager.GetStream("Program.Main"))
{
    stream.Write(sourceBuffer, 0, sourceBuffer.Length);
}

You can also provide an existing buffer. It’s important to note that this buffer will be copied into the pooled buffer:

var stream = manager.GetStream("Program.Main", sourceBuffer, 
                                    0, sourceBuffer.Length);

You can also change the parameters of the pool itself:

int blockSize = 1024;
int largeBufferMultiple = 1024 * 1024;
int maxBufferSize = 16 * largeBufferMultiple;

var manager = new RecyclableMemoryStreamManager(blockSize, 
                                                largeBufferMultiple, 
                                                maxBufferSize);

manager.GenerateCallStacks = true;
manager.AggressiveBufferReturn = true;
manager.MaximumFreeLargePoolBytes = maxBufferSize * 4;
manager.MaximumFreeSmallPoolBytes = 100 * blockSize;

Is this library for everybody? No, definitely not. This library was designed with some specific performance characteristics in mind. Most applications probably don’t need those. However, if they do, then this library can absolutely help reduce the impact of GC on your software.

Let us know what you think! If you find bugs or want to improve it in some way, then dive right into the code on GitHub.

Links

Microsoft.IO.RecyclableMemoryStream on GitHub
NuGet package
Writing High-Performance .NET Code [Book that will explain the situations in which you may need something like this in a high amount of detail]

A MemoryStream is a useful thing to have around when you want to process an array of bytes as a stream, but there are a few gotchas you need to be aware of, and some alternatives that may be better in a few cases. In Writing High-Performance .NET Code, I mention some situations where you may want to use pooled buffers, but in this article I will talk specifically about using MemoryStream specifically to wrap existing buffers to avoid additional memory allocations.

There are essentially two ways to use a MemoryStream:

MemoryStream creates and manages a resizable buffer for you. You can write to and read from it however you want.
MemoryStream wraps an existing buffer. You can choose how the underlying buffer is treated.

Let’s look at the constructors of MemoryStream and how they lead to one of those situations.

MemoryStream() – The default constructor. MemoryStream owns the buffer and resizes it as needed. The initial capacity (buffer size) is 0.
MemoryStream(int capacity) – Same as default, but initial capacity is what you pass in.
MemoryStream(byte[] buffer) – MemoryStream wraps the given buffer. You can write to it, but not change the size—basically, this buffer is all the space you will have. You cannot call GetBuffer() to retrieve the original array.
MemoryStream(byte[] buffer, bool writable) – MemoryStream wraps the given buffer, but you you can choose whether to make the stream writable at all. You could make it a pure read-only stream. You cannot call GetBuffer() to retrieve the original array.
MemoryStream(byte[] buffer, int index, int count) – Wraps an existing buffer, allowing writes, but allows you to specify an offset (aka origin) into the buffer that the stream will consider position 0. It also allows you to specify how many bytes to use after that origin as part of the stream. The stream is read-only. You cannot call GetBuffer() to retrieve the original array.
MemoryStream(byte[] buffer, int index, int count, bool writable) – Same as previous, but you can choose whether the stream is read-only. The buffer is still not resizable, and you cannot call GetBuffer() to retrieve the original array.
MemoryStream(byte[] buffer, int index, int count, bool writable, bool exposable)– Same as previous, but now you can specify whether the buffer should be exposed via GetBuffer(). This is the ultimate control you’re given here, but using it comes with an unfortunate catch, which we’ll see later.

Stream-Managed Buffer

If MemoryStream is allowed to manage the buffer size itself (the first two constructors above), then understanding the growth algorithm is important. The algorithm as currently coded looks like this:

If requested buffer size is less than the current size, do nothing.
If requested buffer size is less than 256 bytes, set new size to 256 bytes.
If requested buffer size is less than twice the current buffer size, set the new size to twice the current size.
Otherwise set capacity to exactly what was requested.

Essentially, if you’re not careful, you will start doubling the amount of memory you’re using, which may be overkill for some situations.

Wrapping an Existing Buffer

You would want to wrap an existing buffer in any situation where you have an existing array of bytes and don’t want to needlessly copy them, causing more memory allocations and thus a higher rate of garbage collections. For example, you’ve read a bunch of bytes from the wire via HTTP, you’ve got an existing buffer. You need to pass those bytes to a parser that expects a Stream. So far, so good.

However, there is a gotcha here. To illustrate, let’s see a hypothetical example.

You pull some bytes off the wire. The first few bytes are a standard header that all input has, and can be ignored. For the actual content, you have a parser that expects a Stream. Within that stream, suppose there are a subsections of data that have their own parsers. In other words, you have a buffer that looks something like this:

To deal with this, you wrap a MemoryStream around the content section, like this:

// comes from network 
byte[] buffer = …// Content starts at byte 24
MemoryStream ms = new MemoryStream(buffer, 24, buffer.Length – 24, writable:false, publiclyVisible:true);

So far so good, but what if the parser for the sub-section really needs to operate on the raw bytes instead of as a Stream? This should be possible, right? After all, publiclyVisible was set to true, so you can call GetBuffer(), which returns the original buffer. There’s just one (major) problem: You don’t know where you are in that buffer.

This may sound like a contrived situation, but it’s completely possible and I’ve run into it multiple times.

See, when you wrapped that buffer and told MemoryStream to consider byte 24 as the start, it set a private field called origin to 24. If you set the stream’s Position to 24, the index into the array is set to 24. That’s position 0. Unfortunately, MemoryStream will not surface the origin to you. You can’t even deduce it from other properties like Capacity, Length, and Position. The origin just disappears, which means that the buffer you get back from GetBuffer() is useless. I consider this a bug in the .NET Framework—why have the ability to retrieve the original buffer if you don’t surface the offset being used? It may be that it would be more confusing, with an additional property that many people won’t understand.

There are a few ways you can choose to handle this:

Derive a child class from MemoryStream that doesn’t do anything except mimic the underlying constructors and store this offset itself and make it available via a property.
Pass down the offset separately. Works, but feels very annoying.
Instead of using MemoryStream, use an ArraySegment<byte> and wrap that (or parts of it) as needed in a MemoryStream.

Probably the cleanest option is #1. If you can’t do that, convert the deserialization methods to take ArraySegment<byte>, which wraps a buffer, an offset, and a length into a cheap struct, which you can then pass to all deserialization functions. If you need a Stream, you can easily construct it from the segment:

byte[] buffer = …
var segment = new ArraySegment<byte>(buffer, 24, buffer.Length – 24);
…
var stream = new MemoryStream(segment.Array, segment.Offset, segment.Count);

Essentially, make your master buffer source the ArraySegment, not the Stream.

If you find this kind of tip useful, you will love my book Writing High-Performance .NET Code, which will teach you hundreds of optimizations like this as well as the fundamentals of .NET performance engineering.

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31

Philosophical Geek

Code and musings by Ben Watson

Tag Archives: MemoryStream

Announcing Microsoft.IO.RecycableMemoryStream