Using Windbg to answer implementation questions for yourself (Can a Delegate Invocation be Inlined?)

The other day, a colleague of mine asked me something along the lines of: can a generated delegate be inlined? My answer was that the generated code is going to be JITted and optimized like any other code, but later I started thinking: “Wait a sec, can the actual call to the delegate be inlined?”

I’m going to give you the answer before I even start this article: no.

I cover the rules of method inlining that the JITter uses in my book, Writing High-Performance .NET Code, but I don’t discuss this specific situation. You could logically make the leap, however, that there are two other rules that imply this:

  • Virtual methods will not be inlined
  • Interface calls with multiple concrete implementations in a single call site will not be inlined.
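To make those two rules concrete, here is a minimal sketch of the kinds of call sites they describe. All of the types here (Animal, Bird, IOp, AddOp, MulOp) are hypothetical and exist only for illustration; whether a given JIT version actually declines to inline these particular calls is something you would confirm in the debugger, as shown later.

```csharp
using System;

namespace DelegateInlining
{
    // Hypothetical types, for illustration only.
    class Animal { public virtual int Legs() { return 4; } }
    class Bird : Animal { public override int Legs() { return 2; } }

    interface IOp { int Apply(int x, int y); }
    class AddOp : IOp { public int Apply(int x, int y) { return x + y; } }
    class MulOp : IOp { public int Apply(int x, int y) { return x * y; } }

    class RulesProgram
    {
        static void Main(string[] args)
        {
            Animal[] animals = { new Animal(), new Bird() };
            foreach (Animal a in animals)
            {
                // Virtual call: the target depends on the runtime type of a.
                Console.WriteLine(a.Legs());
            }

            IOp[] ops = { new AddOp(), new MulOp() };
            foreach (IOp op in ops)
            {
                // Interface call: this single call site sees two concrete implementations.
                Console.WriteLine(op.Apply(3, 4));
            }
        }
    }
}
```

In both loops the method that ends up running is only known at run time, which is the property a delegate call shares.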

While neither of those rules is delegate-specific, you can infer that a delegate call might have similar constraints. You could ask around on the Internet, and somebody on stackoverflow.com will surely answer you. But I want to show you how to find the answer for yourself. That is an invaluable skill for harder questions, where you might not be able to find the answer unless you know people on the CLR team (which I do, but I *still* try to find out answers before I bother them).

First, let’s see a test program that will exercise various types of function calls, starting with a simple method call that we would expect to be inlined.

using System;
using System.Runtime.CompilerServices;

namespace DelegateInlining
{
    class Program
    {
        static void Main(string[] args)
        {
            TestNormalFunction();
        }
        
        private static int Add(int x, int y) { return x + y; }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private static void TestNormalFunction()
        {
            int z = Add(1, 2);
            Console.WriteLine(z);
        }
    }
}

The code we’re interested in inlining is the Add method. Don’t confuse that with the NoInlining option on TestNormalFunction, which is there to prevent the test method itself from being inlined. The test method exists to give us somewhere to set a breakpoint and debug.

Build this code in Release mode for x86. Then open Windbg.

If you’re not used to using Windbg, I highly encourage you to start. It is far more powerful than Visual Studio’s debugger, especially when it comes to debugging the details of .NET. It is not strictly necessary for this particular exercise, but it is what I recommend.

To get Windbg, install the Windows SDK. There is an option to install only the debugger if you wish.

In Windbg:

  1. Ctrl-E to open an executable program. Navigate to and open the Release build of the above program. It will start executing and immediately break.
  2. Type the command: sxe ld clrjit. What we want to do is set a breakpoint inside TestNormalFunction. To do that, we need the SOS debugger extension, which can only be loaded once the CLR is in the process, and it hasn’t been loaded yet. Breaking when clrjit.dll loads is a convenient point, because by then clr.dll is loaded as well.
  3. Enter the command g for “go” (or hit F5). The program will then break on the load of clrjit.dll.
  4. Enter the command .loadby sos clr – this will load the SOS debugging helper.
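  5. Enter the command !bpmd DelegateInlining DelegateInlining.Program.TestNormalFunction – this will set a managed breakpoint on this method. The first argument is the module name and the second is the fully qualified method name (namespace, class, method).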
  6. Enter the command g to continue execution. Execution will break when it enters TestNormalFunction.
  7. Now you can see the disassembly for this method (menu View | Disassembly).
00b80068 55              push    ebp
00b80069 8bec            mov     ebp,esp
00b8006b e8e8011b70      call    mscorlib_ni+0x340258 (70d30258)
00b80070 8bc8            mov     ecx,eax
00b80072 ba03000000      mov     edx,3
00b80077 8b01            mov     eax,dword ptr [ecx]
00b80079 8b4038          mov     eax,dword ptr [eax+38h]
00b8007c ff5014          call    dword ptr [eax+14h]
00b8007f 5d              pop     ebp
00b80080 c3              ret

There are some calls there, but none of them are to Add. They are all to functions inside mscorlib. The call through the dword ptr is a virtual function call. These are all related to calling Console.WriteLine.

The key is the instruction at address 00b80072, which moves the value 3 directly into register edx. This is the inlined Add call. The compiler inlined not only the function call, but the trivial math as well (an easy optimization the compiler will do for constants).
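For comparison, it can help to see what the same call looks like when inlining is deliberately blocked. The following is a minimal sketch, not part of the program above: AddNoInline, TestNonInlinedFunction, and ControlProgram are hypothetical names I’m introducing for this control experiment. With NoInlining on the callee, I would expect the test method’s disassembly to contain an explicit call instruction with the arguments 1 and 2 set up beforehand, instead of the constant 3.

```csharp
using System;
using System.Runtime.CompilerServices;

namespace DelegateInlining
{
    class ControlProgram
    {
        static void Main(string[] args)
        {
            TestNonInlinedFunction();
        }

        // Hypothetical control method: same body as Add, but the JIT is told not to inline it.
        [MethodImpl(MethodImplOptions.NoInlining)]
        private static int AddNoInline(int x, int y) { return x + y; }

        // NoInlining here keeps the test method itself intact so a breakpoint can be set on it.
        [MethodImpl(MethodImplOptions.NoInlining)]
        private static void TestNonInlinedFunction()
        {
            int z = AddNoInline(1, 2);
            Console.WriteLine(z);
        }
    }
}
```

Set the breakpoint on TestNonInlinedFunction with the same Windbg steps and compare the two listings side by side.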

So far so good. Now let’s look at the same type of thing through a delegate.

delegate int DoOp(int x, int y);

[MethodImpl(MethodImplOptions.NoInlining)]
private static void TestDelegate()
{
    DoOp op = Add;
    int z = op(1, 2);
    Console.WriteLine(z);
}

Change the Main method above to call TestDelegate instead. Follow the same steps given previously for Windbg, but this time set a breakpoint on TestDelegate.

00610077 42              inc     edx
00610078 00e8            add     al,ch
0061007a 8220d0          and     byte ptr [eax],0D0h
0061007d ff8bc88d5104    dec     dword ptr [ebx+4518DC8h]
00610083 e8481b5671      call    clr!JIT_WriteBarrierECX (71b71bd0)
00610088 c7410cc4053304  mov     dword ptr [ecx+0Ch],43305C4h
0061008f b870c04200      mov     eax,42C070h
00610094 894110          mov     dword ptr [ecx+10h],eax
00610097 6a02            push    2
00610099 ba01000000      mov     edx,1
0061009e 8b410c          mov     eax,dword ptr [ecx+0Ch]
006100a1 8b4904          mov     ecx,dword ptr [ecx+4]
006100a4 ffd0            call    eax
006100a6 8bf0            mov     esi,eax
006100a8 e8ab017270      call    mscorlib_ni+0x340258 (70d30258)
006100ad 8bc8            mov     ecx,eax
006100af 8bd6            mov     edx,esi
006100b1 8b01            mov     eax,dword ptr [ecx]
006100b3 8b4038          mov     eax,dword ptr [eax+38h]
006100b6 ff5014          call    dword ptr [eax+14h]
006100b9 5e              pop     esi
006100ba 5d              pop     ebp
006100bb c3              ret

Things got a bit more complicated. (You can ignore the first four lines of the listing: the disassembly window started partway through an instruction, so those bytes are decoded out of alignment. They appear to belong to the code that allocates the delegate.) As you’ll read in Writing High-Performance .NET Code, assigning a method to a delegate actually results in a memory allocation. That’s fine as long as that operation is cached and reused. What we’re really interested in here starts at address 00610097, where you can see the value 2 being pushed onto the stack. The next line moves the value 1 to the edx register. Those are our two function arguments. Finally, at address 006100a4, we’ve got another function call, which is the call to Add, and the key to this whole thing becomes clear: the address of that function had to be loaded from the delegate object (the mov from [ecx+0Ch] at 0061009e) and called through a register, which means it’s essentially like a virtual method call for the purposes of inlining.
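As for caching the delegate so the allocation happens only once, one common way is to store it in a static readonly field. This is a minimal sketch under my own assumptions: CachedAdd, TestCachedDelegate, and CachedDelegateProgram are hypothetical names, not part of the test program above.

```csharp
using System;
using System.Runtime.CompilerServices;

namespace DelegateInlining
{
    delegate int DoOp(int x, int y);

    class CachedDelegateProgram
    {
        // Created once, when the class is initialized. Later uses read this field
        // instead of allocating a new delegate object each time.
        private static readonly DoOp CachedAdd = Add;

        static void Main(string[] args)
        {
            for (int i = 0; i < 3; i++)
            {
                TestCachedDelegate(i);
            }
        }

        private static int Add(int x, int y) { return x + y; }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private static void TestCachedDelegate(int i)
        {
            // Still an indirect call through the delegate; caching only removes the allocation.
            int z = CachedAdd(i, 2);
            Console.WriteLine(z);
        }
    }
}
```

Caching removes the repeated allocation, but it does not change the inlining result: the call still goes through a pointer stored in the delegate object.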

You can also do the same exercise with a lambda expression (it will look similar to the delegate disassembly above).
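If you want to try the lambda version, here is a minimal sketch of a test program for it. TestLambda and LambdaProgram are hypothetical names. Expect some differences in the setup code compared to the listing above, since the C# compiler typically caches the delegate for a lambda that captures nothing, but the invocation itself should still be an indirect call.

```csharp
using System;
using System.Runtime.CompilerServices;

namespace DelegateInlining
{
    delegate int DoOp(int x, int y);

    class LambdaProgram
    {
        static void Main(string[] args)
        {
            TestLambda();
        }

        // NoInlining keeps the test method intact so a breakpoint can be set on it.
        [MethodImpl(MethodImplOptions.NoInlining)]
        private static void TestLambda()
        {
            // The addition lives directly in the lambda body; no named method is involved.
            DoOp op = (x, y) => x + y;
            int z = op(1, 2);
            Console.WriteLine(z);
        }
    }
}
```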

So there’s the simple answer.

There is one more interesting case: a delegate that calls into method A that calls method B. We already know that method A won’t be inlined, but can method B be inlined into method A?

[MethodImpl(MethodImplOptions.NoInlining)]
private static void TestDelegateWithFunctionCall()
{
    DoOp op = (x, y) => Add(x, y);
    int z = op(1, 2);
    Console.WriteLine(z);
} 

You can do the same analysis as above. You will see that the call into the delegate/lambda is not inlined, but inside the lambda’s code there is no further function call to Add. So yes, method B can be inlined into method A.

There you have it. Even though the answer was pretty clear from the start, you at least have the tools to answer it for yourself. Don’t be afraid of the debugger, or of looking at assembly code, even for .NET programs.
