Category Archives: .NET

Digging Into .NET Object Allocation Fundamentals

[Note: this article also appeared on CodeProject]

Introduction

While understanding garbage collection fundamentals is vital to working with .NET, it is also important to understand how object allocation works. Seeing the mechanism shows you just how simple and performant allocation is, especially compared to the potentially blocking nature of native heap allocations. In a large, native, multi-threaded application, heap allocations can be a major performance bottleneck that requires you to perform all sorts of custom heap management techniques. It’s also harder to measure when this is happening because many of those details are hidden behind the OS’s allocation APIs. More importantly, understanding this will give you clues about how you can mess up and make object allocation far less efficient.

In this article, I want to go through an example taken from Chapter 2 of Writing High-Performance .NET Code and then take it further with some additional examples that weren’t covered in the book.

Viewing Object Allocation in a Debugger

Let’s start with a simple object definition: completely empty.

class MyObject 
{
}

static void Main(string[] args)
{
    var x = new MyObject();
}

In order to examine what happens during allocation, we need to use a “real” debugger, like Windbg. Don’t be afraid of this. If you need a quick primer on how to get started, look at the free sample chapter on this page, which will get you up and running in no time. It’s not nearly as bad as you think.

Build the above program in Release mode for x86 (you can do x64 if you’d like, but the samples below are x86).

In Windbg, follow these steps to start and debug the program:

  1. Ctrl+E to execute a program. Navigate to and open the built executable file.
  2. Run command: sxe ld clrjit (this tells the debugger to break on loading any assembly with clrjit in the name, which you need loaded before the next steps)
  3. Run command: g (continues execution)
  4. When it breaks, run command: .loadby sos clr (loads .NET debugging tools)
  5. Run command: !bpmd ObjectAllocationFundamentals Program.Main (Sets a breakpoint at the beginning of a method. The first argument is the name of the assembly. The second is the name of the method, including the class it is in.)
  6. Run command: g

Execution will break at the beginning of the Main method, right before new() is called. Open the Disassembly window to see the code.

Here is the Main method’s code, annotated for clarity:

; Copy method table pointer for the class into
; ecx as argument to new()
; You can use !dumpmt to examine this value.
mov ecx,006f3864h
; Call new
call 006e2100 
; Copy return value (address of object) into a register
mov edi,eax

Note that the actual addresses will be different each time you execute the program. Step over (F10, or toolbar) a few times until call 006e2100 (or your equivalent) is highlighted. Then Step Into that (F11). Now you will see the primary allocation mechanism in .NET. It’s extremely simple. Essentially, at the end of the current gen0 segment, there is a reserved bit of space which I will call the allocation buffer. If the allocation we’re attempting can fit in there, we can update a couple of values and return immediately without more complicated work.

If I were to outline this in pseudocode, it would look like this:

if (object fits in current allocation buffer)
{
   Increment a pointer, return address;
}
else
{
   call JIT_New to do more complicated work in CLR
}

The actual assembly looks like this:

; Set eax to value 0x0c, the size of the object to
; allocate, which comes from the method table
006e2100 8b4104          mov     eax,dword ptr [ecx+4] ds:002b:006f3868=0000000c
; Put allocation buffer information into edx
006e2103 648b15300e0000  mov     edx,dword ptr fs:[0E30h]
; edx+40h contains the address of the next available byte
; for allocation. Add that value to the desired size.
006e210a 034240          add     eax,dword ptr [edx+40h]
; Compare the intended allocation against the
; end of the allocation buffer.
006e210d 3b4244          cmp     eax,dword ptr [edx+44h]
; If we spill over the allocation buffer,
; jump to the slow path
006e2110 7709            ja      006e211b
; update the pointer to the next free
; byte (0x0c bytes past old value)
006e2112 894240          mov     dword ptr [edx+40h],eax
; Subtract the object size from the pointer to
; get to the start of the new obj
006e2115 2b4104          sub     eax,dword ptr [ecx+4]
; Put the method table pointer into the
; first 4 bytes of the object.
; eax now points to new object
006e2118 8908            mov     dword ptr [eax],ecx
; Return to caller
006e211a c3              ret
; Slow Path - call into CLR method
006e211b e914145f71      jmp     clr!JIT_New (71cd3534)

In the fast path, there are only 9 instructions, including the return. That’s incredibly efficient, especially compared to something like malloc. Yes, that complexity is traded for time at the end of object lifetime, but so far, this is looking pretty good!
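To make the fast path concrete, here is a minimal C# model of the same bump-pointer logic. It is only a sketch: the `AllocationBuffer` class and its members are hypothetical names chosen for illustration, and the real allocator works on raw memory owned by the CLR rather than on a managed object like this one.

```csharp
using System;

// Hypothetical model of the fast path shown in the assembly listing.
sealed class AllocationBuffer
{
    private int next;            // next free byte ([edx+40h] in the listing)
    private readonly int limit;  // end of the buffer ([edx+44h] in the listing)

    public AllocationBuffer(int sizeInBytes)
    {
        this.next = 0;
        this.limit = sizeInBytes;
    }

    // Returns true and the start offset of the new "object" if it fits.
    // Returns false where the real code would jump to the slow path.
    public bool TryAllocate(int size, out int address)
    {
        int newNext = this.next + size;   // add eax, [edx+40h]
        if (newNext > this.limit)         // cmp eax, [edx+44h] / ja
        {
            address = -1;
            return false;
        }

        this.next = newNext;              // mov [edx+40h], eax
        address = newNext - size;         // sub eax, [ecx+4]
        return true;
    }
}

static class Program
{
    static void Main()
    {
        var buffer = new AllocationBuffer(32);
        int address;

        // Two 12-byte requests fit; the third does not, because only
        // 8 bytes remain in the 32-byte buffer.
        for (int i = 0; i < 3; i++)
        {
            bool ok = buffer.TryAllocate(12, out address);
            Console.WriteLine("fits: {0}, address: {1}", ok, address);
        }
    }
}
```

The point of the model is that a successful allocation is one addition, one comparison, and one store. Everything expensive is deferred to the branch that fails the comparison.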

What happens in the slow path? The short answer is a lot. The following could all happen:

  • A free slot somewhere in gen0 needs to be located
  • A gen0 GC is triggered
  • A full GC is triggered
  • A new memory segment needs to be allocated from the operating system and assigned to the GC heap
  • Objects with finalizers need extra bookkeeping
  • Possibly more…

Another thing to notice is the size of the object: 0x0c (12 decimal) bytes. As covered elsewhere, this is the minimum size for an object in a 32-bit process, even if there are no fields.

Now let’s do the same experiment with an object that has a single int field.

class MyObjectWithInt { int x; }

Follow the same steps as above to get into the allocation code.

The first line of the allocator on my run is:

00882100 8b4104          mov     eax,dword ptr [ecx+4] ds:002b:00893874=0000000c

The only interesting thing is that the size of the object (0x0c) is exactly the same as before. The new int field fit into the minimum size. You can see this by examining the object with the !DumpObj command (or the abbreviated version: !do). To get the address of the object after it has been allocated, step over instructions until you get to the ret instruction. The address of the object is now in the eax register, so open up the Registers view and see the value. On my computer, it has a value of 2372770. Now execute the command: !do 2372770

You should see similar output to this:

0:000> !do 2372770
Name:        ConsoleApplication1.MyObjectWithInt
MethodTable: 00893870
EEClass:     008913dc
Size:        12(0xc) bytes
File:        D:\Ben\My Documents\Visual Studio 2013\Projects\ConsoleApplication1\ConsoleApplication1\bin\Release\ConsoleApplication1.exe
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
70f63b04  4000001        4         System.Int32  1 instance        0 x

This is curious. The field is at offset 4 (and an int has a length of 4), so that only accounts for 8 bytes (range 0-7). Offset 0 (i.e., the object’s address) contains the method table pointer, so where are the other 4 bytes? They hold the sync block index, and they actually sit at offset -4, just before the object’s address. Together, these make up the 12 bytes.

Try it with a long.

class MyObjectWithLong { long x; }

The first line of the allocator is now:

00f22100 8b4104          mov     eax,dword ptr [ecx+4] ds:002b:00f33874=00000010

This shows a size of 0x10 (16 decimal) bytes, which is what we would expect. The 12-byte minimum object size already includes 4 bytes of field space, so the 8-byte long needs only 4 more bytes. An examination of the allocated object shows a size of 16 bytes as well.

0:000> !do 2932770
Name:        ConsoleApplication1.MyObjectWithLong
MethodTable: 00f33870
EEClass:     00f313dc
Size:        16(0x10) bytes
File:        D:\Ben\My Documents\Visual Studio 2013\Projects\ConsoleApplication1\ConsoleApplication1\bin\Release\ConsoleApplication1.exe
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
70f5b524  4000002        4         System.Int64  1 instance 0 x

If you put an object reference into the test class, you’ll see the same thing as you did with the int.
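The arithmetic behind these three results can be written down as a small sketch. The `EstimateSize` helper below is hypothetical and models only the x86 cases observed above (4 bytes before the object, a 4-byte method table pointer, then the fields, with a 12-byte minimum). It deliberately ignores padding, alignment of mixed fields, and 64-bit layouts.

```csharp
using System;

static class ObjectSizeSketch
{
    const int HeaderBytes = 4;         // the 4 bytes at offset -4
    const int MethodTableBytes = 4;    // method table pointer at offset 0
    const int MinimumObjectSize = 12;  // smallest size observed on x86

    // Hypothetical helper: covers only the simple cases from the article.
    public static int EstimateSize(int fieldBytes)
    {
        int size = HeaderBytes + MethodTableBytes + fieldBytes;
        return Math.Max(size, MinimumObjectSize);
    }

    static void Main()
    {
        Console.WriteLine("empty class:      {0}", EstimateSize(0));
        Console.WriteLine("one int field:    {0}", EstimateSize(4));
        Console.WriteLine("one long field:   {0}", EstimateSize(8));
        Console.WriteLine("one object field: {0}", EstimateSize(4)); // 4-byte reference on x86
    }
}
```

The empty class and the int case both land on the 12-byte minimum, and the long case lands on 16, which matches the sizes reported by !do above.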

Finalizers

Now let’s make it more interesting. What happens if the object has a finalizer? You may have heard that objects with finalizers have more overhead during GC. This is true–they will survive longer, require more CPU cycles, and generally cause things to be less efficient. But do finalizers also affect object allocation?

Recall that our Main method above looked like this:

mov ecx,006f3864h
call 006e2100 
mov edi,eax

If the object has a finalizer, however, it looks like this:

mov     ecx,119386Ch
call    clr!JIT_New (71cd3534)
mov     esi,eax

We’ve lost our nifty allocation helper! We have to now jump directly to JIT_New. Allocating an object that has a finalizer is a LOT slower than a normal object. More internal CLR structures need to be modified to track this object’s lifetime. The cost isn’t just at the end of object lifetime.

How much slower is it? In my own testing, it appears to be about 8-10x worse than the fast path of allocating a normal object. If you allocate a lot of objects, this difference is considerable. For this, and other reasons, just don’t add a finalizer unless it really is required.
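If you want to get a rough feel for the difference on your own machine, a simple timing loop is enough to start with. The sketch below is not the test behind the numbers above. The type names and the iteration count are arbitrary, and results will vary with runtime version, JIT settings, and GC mode. A dedicated harness such as BenchmarkDotNet would give more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;

class PlainObject { }

class FinalizableObject
{
    ~FinalizableObject() { }
}

static class AllocationTiming
{
    public static long TimePlain(int count)
    {
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            // KeepAlive gives the new object a use so the allocation
            // is less likely to be optimized away.
            GC.KeepAlive(new PlainObject());
        }
        watch.Stop();
        return watch.ElapsedTicks;
    }

    public static long TimeFinalizable(int count)
    {
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            GC.KeepAlive(new FinalizableObject());
        }
        watch.Stop();
        return watch.ElapsedTicks;
    }

    static void Main()
    {
        const int count = 1000000;

        // Warm up so JIT compilation is not part of the measurement.
        TimePlain(1000);
        TimeFinalizable(1000);

        long plain = TimePlain(count);

        // Start the second measurement from a clean heap.
        GC.Collect();
        GC.WaitForPendingFinalizers();

        long finalizable = TimeFinalizable(count);

        Console.WriteLine("Plain:       {0} ticks", plain);
        Console.WriteLine("Finalizable: {0} ticks", finalizable);
    }
}
```

Build in Release mode and run outside the debugger. Note that these timings also include any garbage collections triggered during each loop, so they measure more than the allocation helper alone.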

Calling the Constructor

If you are particularly eagle-eyed, you may have noticed that there was no call to a constructor to initialize the object once allocated. The allocator is changing some pointers, returning you an object, and there is no further function call on that object. This is because memory that belongs to a class field is always pre-initialized to 0 for you and these objects had no further initialization requirements. Let’s see what happens if we change to the following definition:

class MyObjectWithInt { int x = 13; }

Now the Main function looks like this:

mov     ecx,0A43834h
; Allocate memory
call    00a32100
; Copy object address to esi
mov     esi,eax
; Set object + 4 to value 0x0D (13 decimal)
mov     dword ptr [esi+4],0Dh

The field initialization was inlined into the caller!

Note that this code is exactly equivalent:

class MyObjectWithInt { int x; public MyObjectWithInt() { this.x = 13; } }

But what if we do this?

class MyObjectWithInt 
{ 
    int x; 

    [MethodImpl(MethodImplOptions.NoInlining)]  
    public MyObjectWithInt() 
    { 
        this.x = 13; 
    } 
}

This explicitly disables inlining for the object constructor. There are other ways of preventing inlining, but this is the most direct.

Now we can see the call to the constructor happening after the memory allocation:

mov     ecx,0F43834h
call    00f32100
mov     esi,eax
mov     ecx,esi
call    dword ptr ds:[0F43854h]

Exercise for the Reader

Can you get the allocator shown above to jump to the slow path? How big does the allocation request have to be to trigger this? (Hint: Try allocating arrays of various sizes.) Can you figure this out by examining the registers and other values from the running code?

Summary

You can see that in most cases, allocation of objects in .NET is extremely fast and efficient, requiring no calls into the CLR and no complicated algorithms in the simple case. Avoid finalizers unless absolutely needed. Not only are they less efficient during cleanup in a garbage collection, but they are slower to allocate as well.

Play around with the sample code in the debugger to get a feel for this yourself. If you wish to learn more about .NET memory handling, especially garbage collection, take a look at the book Writing High-Performance .NET Code.

Practical uses of WeakReference

In Part 1, I discussed the basics of WeakReference and WeakReference<T>. Part 2 introduced short and long weak references as well as the concept of resurrection. I also covered how to use the debugger to inspect your memory for the presence of weak references. This article will complete this miniseries with a discussion of when to use weak references at all and a small, practical example.

When to Use

Short answer: rarely. Most applications won’t require this.

Long answer: If all of the following criteria are met, then you may want to consider it:

  1. Memory use needs to be tightly restricted – this probably means mobile devices these days. If you’re running on Windows RT or Windows Phone, then your memory is restricted.
  2. Object lifetime is highly variable – if you can predict the lifetime of your objects well, then using WeakReference doesn’t really make sense. In that case, you should just control their lifetime directly.
  3. Objects are relatively large, but easy to create – WeakReference is really ideal for that large object that would be nice to have around, but if not, you could easily regenerate it as needed (or just do without).
  4. The object’s size is significantly more than the overhead of using WeakReference<T> – Using WeakReference<T> adds an additional object, which means more memory pressure and an extra dereference step. It would be a complete waste of time and memory to use WeakReference<T> to store an object that’s barely larger than WeakReference<T> itself. However, there are some caveats to this, below.

There is another scenario in which WeakReference may make sense. I call this the “secondary index” feature. Suppose you have an in-memory cache of objects, all indexed by some key. This could be as simple as Dictionary<string, Person>, for example. This is the primary index, and represents the most common lookup pattern, the master table, if you will.

However, you also want to look up these objects with another key, say a last name. Maybe you want a dozen other indexes. Using standard strong references, you could have additional indexes, such as Dictionary<DateTime, Person> for birthdays, etc. When it comes time to update the cache, you then have to modify all of these indexes to ensure that the Person object gets garbage collected when no longer needed.

This might be a pretty big performance hit to do this every time there is an update. Instead, you could spread that cost around by having all of the secondary indexes use WeakReference instead: Dictionary<DateTime, WeakReference<Person>>, or, if the index has non-unique keys (likely), Dictionary<DateTime, List<WeakReference<Person>>>.

By doing this, the cleanup process becomes much easier: you just update the master cache, which removes the only strong reference to the object. The next time a garbage collection runs (of the appropriate generation), the Person object will be cleaned up. If you ever access a secondary index looking for those objects, you’ll discover the object has been cleaned up, and you can clean up those indexes right then. This spreads out the cost of cleanup of the index overhead, while allowing the expensive cached objects to be cleaned up earlier.
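Here is a minimal sketch of that secondary-index idea. The `Person` and `PersonCache` types and their members are hypothetical, the code is not thread-safe, and it only shows a single secondary index keyed by birthday.

```csharp
using System;
using System.Collections.Generic;

sealed class Person
{
    public string Id;
    public string Name;
    public DateTime Birthday;
}

sealed class PersonCache
{
    // Primary index: holds the only strong references to cached objects.
    private readonly Dictionary<string, Person> byId =
        new Dictionary<string, Person>();

    // Secondary index: weak references, cleaned up lazily on lookup.
    private readonly Dictionary<DateTime, List<WeakReference<Person>>> byBirthday =
        new Dictionary<DateTime, List<WeakReference<Person>>>();

    public void Add(Person person)
    {
        this.byId[person.Id] = person;

        List<WeakReference<Person>> bucket;
        if (!this.byBirthday.TryGetValue(person.Birthday, out bucket))
        {
            bucket = new List<WeakReference<Person>>();
            this.byBirthday[person.Birthday] = bucket;
        }
        bucket.Add(new WeakReference<Person>(person));
    }

    // An update only touches the primary index.
    public bool Remove(string id)
    {
        return this.byId.Remove(id);
    }

    public List<Person> FindByBirthday(DateTime birthday)
    {
        var results = new List<Person>();
        List<WeakReference<Person>> bucket;
        if (!this.byBirthday.TryGetValue(birthday, out bucket))
        {
            return results;
        }

        // Walk backwards so dead entries can be removed in place.
        for (int i = bucket.Count - 1; i >= 0; i--)
        {
            Person person;
            if (bucket[i].TryGetTarget(out person))
            {
                results.Add(person);
            }
            else
            {
                bucket.RemoveAt(i);
            }
        }

        if (bucket.Count == 0)
        {
            this.byBirthday.Remove(birthday);
        }
        return results;
    }
}

static class Program
{
    static void Main()
    {
        var cache = new PersonCache();
        var birthday = new DateTime(1980, 1, 2);
        cache.Add(new Person { Id = "1", Name = "Ann", Birthday = birthday });

        Console.WriteLine("Matches: {0}", cache.FindByBirthday(birthday).Count);
    }
}
```

One consequence of this design is worth keeping in mind: after Remove is called, the secondary index can still return the object until a garbage collection actually reclaims it. If stale results are unacceptable, the lookup would also need to check the primary index.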

Other Uses

This Stack Overflow thread has some additional thoughts, with some variations of the example below and other uses.

A rather famous and involved example is using WeakReferences to prevent the dangling event handler problem (where failure to unregister an event handler keeps objects in memory, despite them having no explicit references anywhere in your code).
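Full weak-event implementations are involved, but the core idea can be sketched with a publisher that holds its listeners only through weak references. This is a simplified variation, not the standard weak event pattern: the `IListener` interface and `WeakPublisher` class are hypothetical, it uses an interface rather than real .NET events and delegates, and it is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

interface IListener
{
    void OnEvent(string message);
}

sealed class WeakPublisher
{
    // The publisher never keeps a listener alive on its own.
    private readonly List<WeakReference<IListener>> listeners =
        new List<WeakReference<IListener>>();

    public void Subscribe(IListener listener)
    {
        this.listeners.Add(new WeakReference<IListener>(listener));
    }

    // Notifies every listener that is still alive and prunes the rest.
    // Returns the number of listeners notified. Listeners are visited
    // in reverse subscription order so entries can be removed in place.
    public int Raise(string message)
    {
        int notified = 0;
        for (int i = this.listeners.Count - 1; i >= 0; i--)
        {
            IListener listener;
            if (this.listeners[i].TryGetTarget(out listener))
            {
                listener.OnEvent(message);
                notified++;
            }
            else
            {
                this.listeners.RemoveAt(i);
            }
        }
        return notified;
    }
}

sealed class CountingListener : IListener
{
    public int Received;

    public void OnEvent(string message)
    {
        this.Received++;
        Console.WriteLine("Received: " + message);
    }
}

static class Program
{
    static void Main()
    {
        var publisher = new WeakPublisher();
        var listener = new CountingListener();
        publisher.Subscribe(listener);

        publisher.Raise("hello");

        // Keep the listener strongly referenced until after Raise.
        GC.KeepAlive(listener);
    }
}
```

A listener that forgets to unsubscribe simply disappears from the list after it has been collected, instead of being kept in memory by the publisher.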

Practical Example

I had mentioned in Chapter 2 (Garbage Collection) of Writing High-Performance .NET Code that WeakReference could be used in a multilevel caching system to allow objects to gracefully fall out of memory when pressure increases. You can start with strong references and then demote them to weak references according to some criteria you choose.

That is the example I’ll show here. Note that this is not production-quality code. It’s only about 5% of the code you would actually need, even assuming this algorithm makes sense in your scenario. At a minimum, you probably want to implement IDictionary<TKey, TValue> on it, perhaps tighten up some of the temporary memory allocations, and more.

This is a very simple implementation. When you add items to the cache, it adds them as strong references (removing any existing weak references for that key). When you attempt to read a value from the cache, it tries the strong references first, before attempting the weak references.

Objects are demoted from strong to weak references based simply on a maximum age. This is admittedly rather simplistic, but it gets the point across.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;

namespace WeakReferenceCache
{
    sealed class HybridCache<TKey, TValue>
        where TValue : class
    {
        class ValueContainer<T>
        {
            public T value;
            public long additionTime;
            public long demoteTime;
        }

        private readonly TimeSpan maxAgeBeforeDemotion;

        private readonly ConcurrentDictionary<TKey, ValueContainer<TValue>> strongReferences =
            new ConcurrentDictionary<TKey, ValueContainer<TValue>>();

        private readonly ConcurrentDictionary<TKey, WeakReference<ValueContainer<TValue>>> weakReferences =
            new ConcurrentDictionary<TKey, WeakReference<ValueContainer<TValue>>>();

        public int Count { get { return this.strongReferences.Count; } }

        public int WeakCount { get { return this.weakReferences.Count; } }

        public HybridCache(TimeSpan maxAgeBeforeDemotion)
        {
            this.maxAgeBeforeDemotion = maxAgeBeforeDemotion;
        }

        public void Add(TKey key, TValue value)
        {
            RemoveFromWeak(key);
            var container = new ValueContainer<TValue>();
            container.value = value;
            container.additionTime = Stopwatch.GetTimestamp();
            container.demoteTime = 0;
            this.strongReferences.AddOrUpdate(key, container, (k, existingValue) => container);
        }

        private void RemoveFromWeak(TKey key)
        {
            WeakReference<ValueContainer<TValue>> oldValue;
            weakReferences.TryRemove(key, out oldValue);
        }

        public bool TryGetValue(TKey key, out TValue value)
        {
            value = null;
            ValueContainer<TValue> container;
            if (this.strongReferences.TryGetValue(key, out container))
            {
                AttemptDemotion(key, container);
                value = container.value;
                return true;
            }

            WeakReference<ValueContainer<TValue>> weakRef;
            if (this.weakReferences.TryGetValue(key, out weakRef))
            {
                if (weakRef.TryGetTarget(out container))
                {
                    value = container.value;
                    return true;
                }
                else
                {
                    RemoveFromWeak(key);
                }
            }
            return false;
        }

        public void DemoteOldObjects()
        {
            var demotionList = new List<KeyValuePair<TKey, ValueContainer<TValue>>>();
            long now = Stopwatch.GetTimestamp();
            foreach (var kvp in this.strongReferences)
            {
                var age = CalculateTimeSpan(kvp.Value.additionTime, now);
                if (age > this.maxAgeBeforeDemotion)
                {
                    demotionList.Add(kvp);
                }
            }

            foreach (var kvp in demotionList)
            {
                Demote(kvp.Key, kvp.Value);
            }
        }

        private void AttemptDemotion(TKey key, ValueContainer<TValue> container)
        {
            long now = Stopwatch.GetTimestamp();
            var age = CalculateTimeSpan(container.additionTime, now);
            if (age > this.maxAgeBeforeDemotion)
            {
                Demote(key, container);
            }
        }

        private void Demote(TKey key, ValueContainer<TValue> container)
        {
            ValueContainer<TValue> oldContainer;
            this.strongReferences.TryRemove(key, out oldContainer);
            container.demoteTime = Stopwatch.GetTimestamp();
            var weakRef = new WeakReference<ValueContainer<TValue>>(container);
            this.weakReferences.AddOrUpdate(key, weakRef, (k, oldRef) => weakRef);
        }

        private TimeSpan CalculateTimeSpan(long offsetA, long offsetB)
        {
            long diff = offsetB - offsetA;
            double seconds = (double)diff / Stopwatch.Frequency;
            return TimeSpan.FromSeconds(seconds);
        }
    }
}

That’s it for the series on Weak References–I hope you enjoyed it! You may never need them, but when you do, you should understand how they work in detail to make the smartest decisions.

Short vs. Long Weak References and Object Resurrection

Last time, I talked about the basics of using WeakReference, what they meant and how the CLR treats them. Today in part 2, I’ll discuss some important subtleties. Part 3 of this series can be found here.

Short vs. Long Weak References

First, there are two types of weak references in the CLR:

  • Short – Once the object is reclaimed by garbage collection, the reference is set to null. All of the examples in the previous article, with WeakReference and WeakReference<T>, were examples of short weak references.
  • Long – If the object has a finalizer AND the reference is created with the correct options, then the reference will point to the object until the finalizer completes.

Short weak references are fairly easy to understand. Once the garbage collection happens and the object has been collected, the reference gets set to null, the end. A short weak reference can only be in one of two states: alive or collected.

Using long weak references is more complicated because the object can be in one of three states:

  1. Object is still fully alive (has not been promoted or garbage collected).
  2. Object has been promoted and the finalizer has been queued to run, but has not yet run.
  3. The object has been cleaned up fully and collected.

With long weak references, you can retrieve a reference to the object during stages 1 and 2. Stage 1 is the same as with short weak references, but stage 2 is tricky. Now the object is in a possibly undefined state. Garbage collection has started, and as soon as the finalizer thread starts running pending finalizers, the object will be cleaned up. This can happen at any time, so using the object is very tricky. The weak reference to the target object remains non-null until the target object’s finalizer completes.

To create a long weak reference, use this constructor:

WeakReference<MyObject> myRefWeakLong 
    = new WeakReference<MyObject>(new MyObject(), true);

The true argument specifies that you want to track resurrection. That’s a new term and it is the whole point of long weak references.

Aside: Resurrection

First, let me say this up front: Don’t do this. You don’t need it. Don’t try it. You’ll see why. I don’t know if there is a special reason why resurrection is allowed in .NET, or it’s just a natural consequence of how garbage collection works, but there is no good reason to do something like this.

So here’s what not to do:

class Program
{
    class MyObject
    {
        ~MyObject()
        {
            myObj = this;
        }
    }

    static MyObject myObj = new MyObject();

    static void Main(string[] args)
    {
        myObj = null;
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }
}

By setting the myObj reference back to an object, you are resurrecting that object. This is bad for a number of reasons:

  • You can only resurrect an object once. Because the object has already been promoted to gen 1 by the garbage collector, it has a guaranteed limited lifetime.
  • The finalizer will not run again, unless you call GC.ReRegisterForFinalize() on the object.
  • The state of the object can be indeterminate. Objects with native resources will have released those resources and they will need to be reinitialized. It can be tricky picking this apart.
  • Any objects that the resurrected object refers to will also be resurrected. If those objects have finalizers they will also have run, leaving you in a questionable state.

So why is this even possible? Some languages consider this a bug, and you should too. Some people use this technique for object pooling, but this is a particularly complex way of doing it, and there are many better ways. If you do happen upon a legitimate use case for this, you should be able to justify it fully enough to override all of the objections here.

Weak vs. Strong vs. Finalizer Behavior

There are two dimensions for specifying a WeakReference<T>: the weak reference’s creation parameters and whether the object has a finalizer. The WeakReference’s behavior based on these is described in this table:

                             No finalizer    Has finalizer
trackResurrection = false    short           short
trackResurrection = true     short           long

An interesting case that isn’t explicitly specified in the documentation is when trackResurrection is false, but the object does have a finalizer. When does the WeakReference get set to null? Well, it follows the rules for short weak references and is set to null when the garbage collection happens. Yes, the object does get promoted to gen 1 and the finalizer gets put into the queue. The finalizer can even resurrect the object if it wants, but the point is that the WeakReference isn’t tracking it–because that’s what you said when you created it. WeakReference’s creation parameters do not affect how the garbage collector treats the target object, only what happens to the WeakReference.

You can see this in practice with the following code:

class MyObjectWithFinalizer 
{ 
    ~MyObjectWithFinalizer() 
    { 
        var target = myRefLong.Target as MyObjectWithFinalizer; 
        Console.WriteLine("In finalizer. target == {0}", 
            target == null ? "null" : "non-null"); 
        Console.WriteLine("~MyObjectWithFinalizer"); 
    } 
} 

static WeakReference myRefLong = 
    new WeakReference(new MyObjectWithFinalizer(), true); 

static void Main(string[] args) 
{ 
    GC.Collect(); 
    MyObjectWithFinalizer myObj2 = myRefLong.Target 
          as MyObjectWithFinalizer; 
    
    Console.WriteLine("myObj2 == {0}", 
          myObj2 == null ? "null" : "non-null"); 
    
    GC.Collect(); 
    GC.WaitForPendingFinalizers(); 
    
    myObj2 = myRefLong.Target as MyObjectWithFinalizer; 
    Console.WriteLine("myObj2 == {0}", 
         myObj2 == null ? "null" : "non-null"); 
}

The output is:

myObj2 == non-null 
In finalizer. target == non-null 
~MyObjectWithFinalizer 
myObj2 == null 

Finding Weak References in a Debugger

Windbg can show you where your weak references are, both short and long.

Here is some sample code to show you what’s going on:

using System; 
using System.Diagnostics; 

namespace WeakReferenceTest 
{ 
    class Program 
    { 
        class MyObject 
        { 
            ~MyObject() 
            { 
            } 
        } 

        static void Main(string[] args) 
        { 
            var strongRef = new MyObject(); 
            WeakReference<MyObject> weakRef = 
                new WeakReference<MyObject>(strongRef, trackResurrection: false); 
            strongRef = null; 

            Debugger.Break(); 

            GC.Collect(); 

            MyObject retrievedRef; 

            // Following exists to prevent the weak references themselves 
            // from being collected before the debugger breaks 
            if (weakRef.TryGetTarget(out retrievedRef)) 
            { 
                Console.WriteLine(retrievedRef); 
            } 
        } 
    } 
} 

Compile this program in Release mode.

In Windbg, do the following:

  1. Ctrl+E to execute. Browse to the compiled program and open it.
  2. Run command: sxe ld clrjit (this tells the debugger to break when the clrjit.dll file is loaded, which you need before you can execute .loadby)
  3. Run command: g
  4. Run command .loadby sos clr
  5. Run command: g
  6. The program should now break at the Debugger.Break() method.
  7. Run command !gchandles

You should see output similar to this:

0:000> !gchandles
  Handle Type          Object     Size     Data Type
011112f4 WeakShort   02d324b4       12          WeakReferenceTest.Program+MyObject
011111d4 Strong      02d31d70       36          System.Security.PermissionSet
011111d8 Strong      02d31238       28          System.SharedStatics
011111dc Strong      02d311c8       84          System.Threading.ThreadAbortException
011111e0 Strong      02d31174       84          System.Threading.ThreadAbortException
011111e4 Strong      02d31120       84          System.ExecutionEngineException
011111e8 Strong      02d310cc       84          System.StackOverflowException
011111ec Strong      02d31078       84          System.OutOfMemoryException
011111f0 Strong      02d31024       84          System.Exception
011111fc Strong      02d3142c      112          System.AppDomain
011113ec Pinned      03d333a8     8176          System.Object[]
011113f0 Pinned      03d32398     4096          System.Object[]
011113f4 Pinned      03d32178      528          System.Object[]
011113f8 Pinned      02d3121c       12          System.Object
011113fc Pinned      03d31020     4424          System.Object[]

Statistics:
      MT    Count    TotalSize Class Name
70e72554        1           12 System.Object
01143814        1           12 WeakReferenceTest.Program+MyObject
70e725a8        1           28 System.SharedStatics
70e72f0c        1           36 System.Security.PermissionSet
70e724d8        1           84 System.ExecutionEngineException
70e72494        1           84 System.StackOverflowException
70e72450        1           84 System.OutOfMemoryException
70e722fc        1           84 System.Exception
70e72624        1          112 System.AppDomain
70e7251c        2          168 System.Threading.ThreadAbortException
70e35738        4        17224 System.Object[]
Total 15 objects

Handles:
    Strong Handles:       9
    Pinned Handles:       5
    Weak Short Handles:   1

The weak short reference is called a “Weak Short Handle” in this output.

Next Time

The first article explained how WeakReference works, and this one explained a few of the subtleties, including some behavior you probably don’t want to use. Next time, I’ll go into why you would want to use WeakReference in the first place, and provide a sample application.

Prefer WeakReference<T> to WeakReference

In Writing High-Performance .NET Code, I mention the WeakReference type briefly in Chapter 2 (Garbage Collection), but I don’t go into it too much. However, for the blog, I want to start a small series of posts going into some more detail about WeakReference, how it works, and when to use it, with some example implementations. In this first post, I’ll just cover what it is, what options there are, and how to use it.

A WeakReference is a reference to an object, but one that still allows the garbage collector to destroy the object and reclaim its memory. This is in contrast to a strong (i.e., normal) reference that does prevent the GC from cleaning up the object.

There are two versions of WeakReference: the original, non-generic WeakReference and the generic WeakReference<T>.

First, let’s take a look at WeakReference, which has been around since .NET 1.1.

You allocate a weak reference like this:

var weakRef = new WeakReference(myObj);
myObj = null;

myObj is an existing object of your choice. Once you assign it to the weakRef variable, you should set the original strong reference to null (otherwise, what’s the point?). Now, whenever there is a garbage collection, the object weakRef is referring to may be collected. To access this object, you may be tempted to make use of WeakReference’s IsAlive property, as in this example:

WeakReference ref1 = new WeakReference(new MyObject());
if (ref1.IsAlive)
{
    // wrong!
    DoSomething(ref1.Target as MyObject);
}

IsAlive is a rather silly property. If it returns false, it’s fine–you know the object has been collected. However, if it returns true, then you don’t actually know anything useful because the object could still be garbage collected before you get a chance to make a strong reference to it!

The correct way to use this is to ignore the IsAlive property completely and just attempt to make a strong reference from Target:

MyObject obj = ref1.Target as MyObject;
if (obj != null)
{
    // correct
    DoSomething(obj);
}

Now there is no race. If obj ends up being non-null, then you’ve got a strong reference that is guaranteed to not be garbage collected (until your own strong reference goes out of scope).

Recognizing some of the silliness and umm…weakness of WeakReference, WeakReference<T> was introduced in .NET 4.5 and it formalizes the above procedure by removing both the Target and IsAlive properties from the class and providing you with these two methods:

  • SetTarget – Set a new target object
  • TryGetTarget – Attempt to retrieve the object and assign it to a strong reference

This example demonstrates the usage, which is essentially the same as the correct way to use WeakReference from above:

WeakReference<MyObject> ref2 = new WeakReference<MyObject>(new MyObject());
MyObject obj2;
if (ref2.TryGetTarget(out obj2))
{
    DoSomething(obj2);
}

You could also ask yourself: Why is there a SetTarget method at all? After all, you could just allocate a new WeakReference<T>.

If you are using WeakReference<T> at all, it likely means you are somewhat memory conscious. In this case, allocating new WeakReference<T> objects will contribute extra, unnecessary memory pressure, potentially worsening the overall performance. Why do this, when you can just reuse the WeakReference<T> object for new targets as needed?
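
As a rough sketch of that reuse, the snippet below points one WeakReference<MyObject> at a second target instead of allocating a second weak reference. SetTargetDemo, Retarget and the Id field are hypothetical names added for illustration. The targets are kept alive with GC.KeepAlive only so that the result is predictable.

```csharp
using System;

public class MyObject
{
    public int Id;
}

public static class SetTargetDemo
{
    // Re-points an existing WeakReference<T> at a new target and reads it back.
    // Returns -1 if the target could not be retrieved.
    public static int Retarget(WeakReference<MyObject> slot, MyObject newTarget)
    {
        slot.SetTarget(newTarget);

        MyObject current;
        return slot.TryGetTarget(out current) ? current.Id : -1;
    }

    public static void Main()
    {
        var first = new MyObject { Id = 1 };
        var second = new MyObject { Id = 2 };

        // One weak reference object, used for two different targets.
        var slot = new WeakReference<MyObject>(first);
        Console.WriteLine(Retarget(slot, second)); // prints 2

        // Keep both targets strongly referenced up to this point.
        GC.KeepAlive(first);
        GC.KeepAlive(second);
    }
}
```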

Next time, more details on weak references, particularly the differences between short and long weak references, and taking a peek at them in the debugger. We’ll also cover when you should actually use WeakReference<T> at all.

Part 2, Short vs. Long Weak References and Object Resurrection, is up.

Part 3, Practical Uses, is up.

Using Windbg to answer implementation questions for yourself (Can a Delegate Invocation be Inlined?)

The other day, a colleague of mine asked me: Can a generated delegate be inlined? Or something similar to this. My answer was that the generated code is going to be JITted and optimized like any other code, but later I started thinking…. “Wait a sec, can the actual call to the delegate be inlined?”

I’m going to give you the answer before I even start this article: no.

I cover the rules of method inlining that the JITter uses in my book, Writing High-Performance .NET Code, but I don’t discuss this specific situation. You could logically make the leap, however, that there are two other rules that imply this:

  • Virtual methods will not be inlined
  • Interface calls with multiple concrete implementations in a single call site will not be inlined.

While neither of those rules is delegate-specific, you can infer that a delegate call might have similar constraints. You could ask around on the Internet. Somebody on stackoverflow.com will surely answer you, but I want to show you how to find out the answer to this for yourself, which is an invaluable skill for harder questions, where you might not be able to find out the answer unless you know people on the CLR team (which I do, but I *still* try to find out answers before I bother them).
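
To make the comparison concrete, here is a small sketch of the kinds of call sites being discussed. IOp, Adder and CallKinds are hypothetical names used only for illustration. The comments describe how the target is normally found, not what any particular JIT version will decide to inline.

```csharp
using System;

public interface IOp
{
    int Apply(int x, int y);
}

public class Adder : IOp
{
    public virtual int Apply(int x, int y) { return x + y; }
}

public delegate int DoOp(int x, int y);

public static class CallKinds
{
    public static int Add(int x, int y) { return x + y; }

    // Direct call: the target is known when the caller is compiled.
    public static int Direct() { return Add(1, 2); }

    // Virtual call: the target depends on the runtime type of 'adder'.
    public static int Virtual(Adder adder) { return adder.Apply(1, 2); }

    // Interface call: the target depends on which implementation is passed in.
    public static int ViaInterface(IOp op) { return op.Apply(1, 2); }

    // Delegate call: the target is whatever method the delegate object holds.
    public static int ViaDelegate(DoOp op) { return op(1, 2); }

    public static void Main()
    {
        var adder = new Adder();
        Console.WriteLine(Direct());
        Console.WriteLine(Virtual(adder));
        Console.WriteLine(ViaInterface(adder));
        Console.WriteLine(ViaDelegate(Add));
    }
}
```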

First, let’s see a test program that will exercise various types of function calls, starting with a simple method call that we would expect to be inlined.

using System;
using System.Runtime.CompilerServices;

namespace DelegateInlining
{
    class Program
    {
        static void Main(string[] args)
        {
            TestNormalFunction();
        }
        
        private static int Add(int x, int y) { return x + y; }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private static void TestNormalFunction()
        {
            int z = Add(1, 2);
            Console.WriteLine(z);
        }
    }
}

The code we’re interested in inlining is the Add method. Don’t confuse that with the NoInlining option on TestNormalFunction, which is there to prevent the test method itself from being inlined. The test method is there to allow breakpoint setting and debugging.

Build this code in Release mode for x86. Then open Windbg.

If you’re not used to using Windbg, I highly encourage you to start. It is far more powerful than Visual Studio’s debugger, especially when it comes to debugging the details of .NET. It is not strictly necessary for this particular exercise, but it is what I recommend.

To get Windbg, install the Windows SDK (there is the option to install only the debugger if you wish).

In Windbg:

  1. Ctrl-E to open an executable program. Navigate to and open the Release build of the above program. It will start executing and immediately break.
  2. What we want to do is set a breakpoint inside TestNormalFunction. To do that, we need to use the SOS debugger extension, which relies on clrjit.dll, which hasn’t been loaded in the process yet. So the first thing to do is set a breakpoint on loading clrjit.dll: sxe ld clrjit
  3. Enter the command g for “go” (or hit F5). The program will then break on the load of clrjit.dll.
  4. Enter the command .loadby sos clr – this will load the SOS debugging helper.
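  5. Enter the command !bpmd DelegateInlining DelegateInlining.Program.TestNormalFunction – this will set a managed breakpoint on this method (the first argument is the module name, the second is the method name qualified with its namespace and class).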
  6. Enter the command g to continue execution. Execution will break when it enters TestNormalFunction.
  7. Now you can see the disassembly for this method (menu View | Disassembly).
00b80068 55              push    ebp
00b80069 8bec            mov     ebp,esp
00b8006b e8e8011b70      call    mscorlib_ni+0x340258 (70d30258)
00b80070 8bc8            mov     ecx,eax
00b80072 ba03000000      mov     edx,3
00b80077 8b01            mov     eax,dword ptr [ecx]
00b80079 8b4038          mov     eax,dword ptr [eax+38h]
00b8007c ff5014          call    dword ptr [eax+14h]
00b8007f 5d              pop     ebp
00b80080 c3              ret

There are some calls there, but none of them are to Add. They are all functions inside of mscorlib, and the call through the dword ptr is a virtual function call. These are all related to calling Console.WriteLine.

The key is the instruction at address 00b80072, which moves the value 3 directly into register edx. This is the inlined Add call. The compiler inlined not only the function call, but the trivial math as well (an easy optimization the compiler will do for constants).
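
If you want a contrast case to compare against, one option is a variant where the callee itself is marked NoInlining. This is only a sketch: AddNoInline and TestNoInlineFunction are names I made up, and I have not included disassembly for it. I would expect to see an explicit call to AddNoInline, with 1 and 2 passed as arguments, instead of the constant 3 being loaded directly.

```csharp
using System;
using System.Runtime.CompilerServices;

namespace DelegateInlining
{
    public class Program
    {
        public static void Main(string[] args)
        {
            TestNoInlineFunction();
        }

        // Same body as Add, but the JIT is told not to inline it.
        [MethodImpl(MethodImplOptions.NoInlining)]
        public static int AddNoInline(int x, int y) { return x + y; }

        // Set the managed breakpoint on this method, as before.
        [MethodImpl(MethodImplOptions.NoInlining)]
        public static void TestNoInlineFunction()
        {
            int z = AddNoInline(1, 2);
            Console.WriteLine(z);
        }
    }
}
```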

So far so good. Now let’s look at the same type of thing through a delegate.

delegate int DoOp(int x, int y);

[MethodImpl(MethodImplOptions.NoInlining)]
private static void TestDelegate()
{
    DoOp op = Add;
    int z = op(1, 2);
    Console.WriteLine(z);
}

Change the Main method above to call TestDelegate instead. Follow the same steps given previously for Windbg, but this time set a breakpoint on TestDelegate.

00610077 42              inc     edx
00610078 00e8            add     al,ch
0061007a 8220d0          and     byte ptr [eax],0D0h
0061007d ff8bc88d5104    dec     dword ptr [ebx+4518DC8h]
00610083 e8481b5671      call    clr!JIT_WriteBarrierECX (71b71bd0)
00610088 c7410cc4053304  mov     dword ptr [ecx+0Ch],43305C4h
0061008f b870c04200      mov     eax,42C070h
00610094 894110          mov     dword ptr [ecx+10h],eax
00610097 6a02            push    2
00610099 ba01000000      mov     edx,1
0061009e 8b410c          mov     eax,dword ptr [ecx+0Ch]
006100a1 8b4904          mov     ecx,dword ptr [ecx+4]
006100a4 ffd0            call    eax
006100a6 8bf0            mov     esi,eax
006100a8 e8ab017270      call    mscorlib_ni+0x340258 (70d30258)
006100ad 8bc8            mov     ecx,eax
006100af 8bd6            mov     edx,esi
006100b1 8b01            mov     eax,dword ptr [ecx]
006100b3 8b4038          mov     eax,dword ptr [eax+38h]
006100b6 ff5014          call    dword ptr [eax+14h]
006100b9 5e              pop     esi
006100ba 5d              pop     ebp
006100bb c3              ret

Things got a bit more complicated. As you’ll read in Writing High-Performance .NET Code, assigning a method to a delegate actually results in a memory allocation. That’s fine as long as that operation is cached and reused. What we’re really interested in here starts at address 00610097, where you can see the value 2 being pushed onto the stack. The next line moves the value 1 to the edx register. There are our two function arguments. Finally, at address 006100a4, we’ve got another function call, which is the call to Add, and the key to this whole thing becomes clear. The address of that function had to be retrieved via pointer, which means it’s essentially like a virtual method call for the purposes of inlining.
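
On the point about caching the delegate, here is a minimal sketch of what that can look like. CachedDelegate, CachedAdd and the two Sum methods are hypothetical names. Note the assumption in the comments: whether the uncached version really allocates on every iteration depends on the compiler, since newer C# compilers may cache this kind of conversion for you.

```csharp
using System;

public delegate int DoOp(int x, int y);

public static class CachedDelegate
{
    public static int Add(int x, int y) { return x + y; }

    // Created once, when the class is initialized, and reused by every call.
    private static readonly DoOp CachedAdd = Add;

    public static int SumWithFreshDelegate(int[] values)
    {
        int total = 0;
        foreach (int v in values)
        {
            // Assumption: this may allocate a new delegate object on each
            // iteration, depending on the compiler version.
            DoOp op = Add;
            total = op(total, v);
        }
        return total;
    }

    public static int SumWithCachedDelegate(int[] values)
    {
        int total = 0;
        foreach (int v in values)
        {
            // No delegate is created here, only the cached one is invoked.
            total = CachedAdd(total, v);
        }
        return total;
    }

    public static void Main()
    {
        int[] values = { 1, 2, 3 };
        Console.WriteLine(SumWithFreshDelegate(values));  // prints 6
        Console.WriteLine(SumWithCachedDelegate(values)); // prints 6
    }
}
```

Either way, the invocation itself still goes through the delegate, so caching helps with allocations, not with inlining.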

You can also do the same exercise with a lambda expression (it will look similar to the delegate disassembly above).
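
A lambda version of the test could look like the sketch below. TestLambda is a name I chose for illustration, and it returns the result only so it is easy to check. Set the breakpoint on it the same way as before.

```csharp
using System;
using System.Runtime.CompilerServices;

namespace DelegateInlining
{
    public delegate int DoOp(int x, int y);

    public class Program
    {
        public static void Main(string[] args)
        {
            TestLambda();
        }

        // NoInlining keeps this method intact so a breakpoint can be set on it.
        [MethodImpl(MethodImplOptions.NoInlining)]
        public static int TestLambda()
        {
            // The addition lives in the lambda body and is invoked through the delegate.
            DoOp op = (x, y) => x + y;
            int z = op(1, 2);
            Console.WriteLine(z);
            return z;
        }
    }
}
```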

So there’s the simple answer.

There is one more interesting case: a delegate that calls into method A that calls method B. We already know that method A won’t be inlined, but can method B be inlined into method A?

[MethodImpl(MethodImplOptions.NoInlining)]
private static void TestDelegateWithFunctionCall()
{
    DoOp op = (x, y) => Add(x, y);
    int z = op(1, 2);
    Console.WriteLine(z);
} 

You can do the same analysis as above. You will see that the call into the delegate/lambda is not inlined, but inside the lambda’s compiled code there is no further function call to Add, so yes, method B can be inlined into method A.

There you have it. Even though the answer was pretty clear from the start, you now have the tools to answer it for yourself. Don’t be afraid of the debugger, or of looking at assembly code, even for .NET programs.