Using MemoryStream to Wrap Existing Buffers: Gotchas and Tips

A MemoryStream is a useful thing to have around when you want to process an array of bytes as a stream, but there are a few gotchas you need to be aware of, and some alternatives that may be better in some cases. In Writing High-Performance .NET Code, I mention some situations where you may want to use pooled buffers, but in this article I will talk specifically about using MemoryStream to wrap existing buffers to avoid additional memory allocations.

There are essentially two ways to use a MemoryStream:

  1. MemoryStream creates and manages a resizable buffer for you. You can write to and read from it however you want.
  2. MemoryStream wraps an existing buffer. You can choose how the underlying buffer is treated.

Let’s look at the constructors of MemoryStream and how they lead to one of those situations.

  • MemoryStream() – The default constructor. MemoryStream owns the buffer and resizes it as needed. The initial capacity (buffer size) is 0.
  • MemoryStream(int capacity) – Same as default, but initial capacity is what you pass in.
  • MemoryStream(byte[] buffer) – MemoryStream wraps the given buffer. You can write to it, but not change its size: this buffer is all the space you will have. You cannot call GetBuffer() to retrieve the original array.
  • MemoryStream(byte[] buffer, bool writable) – MemoryStream wraps the given buffer, but you can choose whether the stream is writable at all, so you can make it a pure read-only stream. You cannot call GetBuffer() to retrieve the original array.
  • MemoryStream(byte[] buffer, int index, int count) – Wraps an existing buffer, allowing writes, but lets you specify an offset (aka origin) into the buffer that the stream will consider position 0. It also lets you specify how many bytes after that origin belong to the stream. The stream is writable but not resizable. You cannot call GetBuffer() to retrieve the original array.
  • MemoryStream(byte[] buffer, int index, int count, bool writable) – Same as previous, but you can choose whether the stream is read-only. The buffer is still not resizable, and you cannot call GetBuffer() to retrieve the original array.
  • MemoryStream(byte[] buffer, int index, int count, bool writable, bool publiclyVisible) – Same as previous, but now you can specify whether the buffer should be exposed via GetBuffer(). This is the most control you're given here, but using it comes with an unfortunate catch, which we'll see later.
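One quick way to check these constructor rules is to try each kind of stream and see which operations throw. The following is a minimal C# sketch, assuming a plain console app. The ConstructorDemo class and its Try helper are hypothetical names used only for illustration. On the .NET implementations I'm aware of, writing past the end of a wrapped buffer and writing to a read-only stream both throw NotSupportedException, and calling GetBuffer() on a stream that is not publicly visible throws UnauthorizedAccessException.

```csharp
using System;
using System.IO;

static class ConstructorDemo
{
    // Runs the action and returns the name of the exception it throws, or "none".
    internal static string Try(Action action)
    {
        try { action(); return "none"; }
        catch (Exception ex) { return ex.GetType().Name; }
    }

    static void Main()
    {
        byte[] data = new byte[4];

        // Wraps the array: writable by default, but fixed at 4 bytes.
        var fixedSize = new MemoryStream(data);
        Console.WriteLine(Try(() => fixedSize.Write(new byte[4], 0, 4))); // fills the buffer exactly
        Console.WriteLine(Try(() => fixedSize.WriteByte(1)));              // one byte too many

        // Wraps the same array as a read-only stream.
        var readOnly = new MemoryStream(data, writable: false);
        Console.WriteLine(readOnly.CanWrite);
        Console.WriteLine(Try(() => readOnly.WriteByte(1)));

        // GetBuffer() is only allowed when publiclyVisible is true.
        var hidden = new MemoryStream(data, 0, data.Length, writable: true, publiclyVisible: false);
        var exposed = new MemoryStream(data, 0, data.Length, writable: true, publiclyVisible: true);
        Console.WriteLine(Try(() => hidden.GetBuffer()));
        Console.WriteLine(ReferenceEquals(exposed.GetBuffer(), data)); // same array, no copy
    }
}
```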

Stream-Managed Buffer

If MemoryStream is allowed to manage the buffer size itself (the first two constructors above), then understanding the growth algorithm is important. The algorithm as currently coded looks like this:

  1. If the requested size is less than or equal to the current capacity, do nothing.
  2. If the requested size is less than 256 bytes, bump it up to 256 bytes.
  3. If the result is less than twice the current capacity, set the new capacity to twice the current capacity.
  4. Otherwise, set the capacity to that result (usually exactly what was requested).
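To see these rules in action, you can watch Capacity as you write. This is a minimal C# sketch. The growth thresholds are an implementation detail rather than a documented contract, so treat the values in the comments as what the rules above predict, not a guarantee for every runtime. GrowthDemo and WriteAndReportCapacity are hypothetical names.

```csharp
using System;
using System.IO;

static class GrowthDemo
{
    // Writes `count` zero bytes at the current position and reports the resulting Capacity.
    internal static int WriteAndReportCapacity(MemoryStream ms, int count)
    {
        ms.Write(new byte[count], 0, count);
        return ms.Capacity;
    }

    static void Main()
    {
        var ms = new MemoryStream();                          // Capacity starts at 0
        Console.WriteLine(WriteAndReportCapacity(ms, 1));     // needs 1: rule 2 gives 256
        Console.WriteLine(WriteAndReportCapacity(ms, 256));   // needs 257: rule 3 doubles to 512
        Console.WriteLine(WriteAndReportCapacity(ms, 1000));  // needs 1257 (> 2 x 512): rule 4 gives 1257

        // If you know the final size up front, the stream never has to grow.
        var presized = new MemoryStream(1257);
        Console.WriteLine(WriteAndReportCapacity(presized, 1257)); // stays at 1257
    }
}
```

Passing the expected size to the MemoryStream(int capacity) constructor, as in the last call, avoids both the intermediate arrays and the copying between them.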

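Essentially, if you're not careful, you can end up using nearly twice the memory you actually need, which may be overkill for some situations. On top of that, each time the stream outgrows its buffer, it allocates a new, larger array and copies the old contents into it, leaving the old array for the garbage collector.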

Wrapping an Existing Buffer

You would want to wrap an existing buffer whenever you already have an array of bytes and don't want to copy them needlessly, which would cause more memory allocations and thus more frequent garbage collections. For example, suppose you've read a bunch of bytes from the wire via HTTP, so you already have a buffer, and you need to pass those bytes to a parser that expects a Stream. So far, so good.

However, there is a gotcha here. To illustrate, let's look at a hypothetical example.

You pull some bytes off the wire. The first few bytes are a standard header that every message carries, and they can be ignored. For the actual content, you have a parser that expects a Stream. Within that stream, suppose there are subsections of data that have their own parsers. In other words, you have a buffer that looks something like this:

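[Figure: buffer layout, a fixed-size header followed by the content section, which contains its own subsections]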

To deal with this, you wrap a MemoryStream around the content section, like this:

// comes from network
byte[] buffer = …
// Content starts at byte 24
MemoryStream ms = new MemoryStream(buffer, 24, buffer.Length - 24, writable: false, publiclyVisible: true);
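Here is a self-contained C# version of that setup, with a fake message standing in for the network bytes. The WrapDemo class, the MakeFakeMessage helper, and the 0xFF header filler are hypothetical. The sketch shows that the stream's position 0 maps to byte 24 of the array, that Length covers only the content, and that no copy was made: a change to the original array is visible through the stream.

```csharp
using System;
using System.IO;

static class WrapDemo
{
    // Hypothetical stand-in for bytes read from the network:
    // a 24-byte header (filled with 0xFF) followed by content bytes 1, 2, 3, ...
    internal static byte[] MakeFakeMessage(int contentLength)
    {
        var buffer = new byte[24 + contentLength];
        for (int i = 0; i < 24; i++) buffer[i] = 0xFF;
        for (int i = 0; i < contentLength; i++) buffer[24 + i] = (byte)(i + 1);
        return buffer;
    }

    static void Main()
    {
        byte[] buffer = MakeFakeMessage(8);

        // Content starts at byte 24; the stream wraps the array without copying it.
        var ms = new MemoryStream(buffer, 24, buffer.Length - 24, writable: false, publiclyVisible: true);

        Console.WriteLine(ms.Length);     // only the 8 content bytes are visible
        Console.WriteLine(ms.Position);   // 0
        Console.WriteLine(ms.ReadByte()); // reads buffer[24], which is 1

        // The stream reads straight from the original array.
        buffer[25] = 99;
        Console.WriteLine(ms.ReadByte()); // reads buffer[25], now 99
    }
}
```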

So far so good, but what if the parser for a subsection really needs to operate on the raw bytes instead of on a Stream? This should be possible, right? After all, publiclyVisible was set to true, so you can call GetBuffer(), which returns the original buffer. There's just one (major) problem: you don't know where in that buffer the stream's content starts.

This may sound like a contrived situation, but it’s completely possible and I’ve run into it multiple times.

See, when you wrapped that buffer and told MemoryStream to consider byte 24 as the start, it set a private field called origin to 24. When the stream's Position is 0, it is actually reading from index 24 of the array. Unfortunately, MemoryStream will not surface the origin to you. In general, you can't deduce it from other properties like Capacity, Length, and Position either, because they are all reported relative to the origin (you can only back it out of GetBuffer().Length when the wrapped region happens to run to the end of the array). The origin just disappears, which means the buffer you get back from GetBuffer() is useless on its own. I consider this a bug in the .NET Framework: why have the ability to retrieve the original buffer if you don't surface the offset being used? Perhaps the designers felt that an additional property would confuse more people than it would help.
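The following C# sketch reproduces the problem. The wrapped region deliberately stops short of the end of the array, so nothing in GetBuffer(), Capacity, or Length gives the origin away. As an aside that goes beyond the original discussion: .NET Framework 4.6 and later, as well as .NET Core, added MemoryStream.TryGetBuffer(out ArraySegment<byte>), which does report the origin as the segment's Offset when the stream was created with publiclyVisible set to true. If you can target those frameworks, it is worth checking before building a workaround. OriginDemo is a hypothetical name.

```csharp
using System;
using System.IO;

static class OriginDemo
{
    static void Main()
    {
        byte[] buffer = new byte[100];

        // Wrap bytes 24..63 only, so the region does not run to the end of the array.
        var ms = new MemoryStream(buffer, 24, 40, writable: false, publiclyVisible: true);

        byte[] raw = ms.GetBuffer();
        Console.WriteLine(raw.Length);  // 100: the whole array, with no hint of the origin
        Console.WriteLine(ms.Capacity); // 40: reported relative to the origin
        Console.WriteLine(ms.Length);   // 40: also relative to the origin

        // Newer frameworks only: TryGetBuffer returns the origin as the segment's Offset.
        if (ms.TryGetBuffer(out ArraySegment<byte> segment))
        {
            Console.WriteLine(segment.Offset); // 24
            Console.WriteLine(segment.Count);  // 40
        }
    }
}
```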

There are a few ways you can choose to handle this:

  1. Derive a child class from MemoryStream that does nothing except mirror the base constructors, store the offset itself, and expose it via a property.
  2. Pass the offset down separately. This works, but it's annoying.
  3. Instead of using MemoryStream, use an ArraySegment<byte> and wrap that (or parts of it) as needed in a MemoryStream.
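Option #1 might look something like the following C# sketch. OffsetMemoryStream, Origin, and BufferIndex are hypothetical names, and only the constructors that take an index are mirrored here; the constructors without one would simply set Origin to 0.

```csharp
using System;
using System.IO;

// Hypothetical subclass: mirrors MemoryStream's offset-taking constructors
// and remembers the origin, which MemoryStream itself keeps private.
class OffsetMemoryStream : MemoryStream
{
    public int Origin { get; }

    public OffsetMemoryStream(byte[] buffer, int index, int count)
        : base(buffer, index, count) => Origin = index;

    public OffsetMemoryStream(byte[] buffer, int index, int count, bool writable)
        : base(buffer, index, count, writable) => Origin = index;

    public OffsetMemoryStream(byte[] buffer, int index, int count, bool writable, bool publiclyVisible)
        : base(buffer, index, count, writable, publiclyVisible) => Origin = index;

    // Index into GetBuffer() that corresponds to the stream's current Position.
    public int BufferIndex => Origin + (int)Position;
}

static class Program
{
    static void Main()
    {
        byte[] buffer = new byte[64];
        buffer[30] = 7;

        var ms = new OffsetMemoryStream(buffer, 24, 40, writable: false, publiclyVisible: true);
        ms.Position = 6;

        byte[] raw = ms.GetBuffer();
        Console.WriteLine(ms.Origin);           // 24
        Console.WriteLine(ms.BufferIndex);      // 24 + 6 = 30
        Console.WriteLine(raw[ms.BufferIndex]); // 7
    }
}
```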

Probably the cleanest option is #1. If you can't do that, convert your deserialization methods to take an ArraySegment<byte>, a cheap struct that bundles a buffer, an offset, and a count, and pass it to all of your deserialization functions. If you need a Stream, you can easily construct one from the segment:

byte[] buffer = …
var segment = new ArraySegment<byte>(buffer, 24, buffer.Length - 24);
var stream = new MemoryStream(segment.Array, segment.Offset, segment.Count);

Essentially, make your master buffer source the ArraySegment, not the Stream.
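To make that concrete, here is a minimal C# sketch of deserialization functions that take ArraySegment<byte>. The message format (a one-byte subsection length followed by the subsection bytes) and the SegmentDemo, ParseContent, and ParseSubsection names are hypothetical. The content parser works on a Stream built from the segment, then hands the raw-byte parser a new segment whose offset it computes from the known segment offset plus the stream's Position, something it could not do if all it had were the Stream.

```csharp
using System;
using System.IO;

static class SegmentDemo
{
    // Hypothetical content format, for illustration only:
    // [1-byte subsection length N][N subsection bytes][rest of content]
    internal static int ParseContent(ArraySegment<byte> content)
    {
        // Stream-based parsing over the segment; no bytes are copied.
        using (var stream = new MemoryStream(content.Array, content.Offset, content.Count, writable: false))
        {
            int subLength = stream.ReadByte();
            if (subLength < 0) throw new EndOfStreamException();

            // The raw-byte parser gets a segment, so the offset travels with it.
            var sub = new ArraySegment<byte>(content.Array, content.Offset + (int)stream.Position, subLength);
            return ParseSubsection(sub);
        }
    }

    // Hypothetical raw-byte parser: sums the subsection's bytes.
    internal static int ParseSubsection(ArraySegment<byte> sub)
    {
        int sum = 0;
        for (int i = sub.Offset; i < sub.Offset + sub.Count; i++)
            sum += sub.Array[i];
        return sum;
    }

    static void Main()
    {
        byte[] buffer = new byte[24 + 5];   // 24-byte header, then 5 content bytes
        buffer[24] = 3;                     // subsection length
        buffer[25] = 10; buffer[26] = 20; buffer[27] = 30;
        buffer[28] = 99;                    // rest of content, not part of the subsection

        var segment = new ArraySegment<byte>(buffer, 24, buffer.Length - 24);
        Console.WriteLine(ParseContent(segment)); // 10 + 20 + 30 = 60
    }
}
```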

If you find this kind of tip useful, you will love my book Writing High-Performance .NET Code, which will teach you hundreds of optimizations like this as well as the fundamentals of .NET performance engineering.