Tag Archives: assembler

2-D Arrays versus Structs

I had a situation the other day where I needed an array of two values and the thought occurred to me: which is better? A 2-D array or a 1-D array of structs. I decided to come up with a quick test to see.

Here are my results:

2-D Arrays : 55.9376 cycles/row
Structs : 31.9937 cycles/row

That’s the number of cycles, on average, to add up the numbers in a row. Cycles makes more sense to me than microseconds when we’re talking about this level of code.

It turns out that the struct version is about 74% faster, which confused me slightly. Isn’t the memory layout for both options the same and should thus have the same assembly code?

It turns out…no. Before proceeding, download the test code here. It’s not pretty, but it works. Some of the mess is designed to keep the compiler from optimizing out my test cases. I tested with Visual Studio 2005 beta 2 with default release mode compiler options.

So lets look at two things I discovered:

Here are the addresses of the first two rows (of two elements each) of the two array styles:

sum1: 0x00354860 0x00354868 0x00354878 0x00354880
sum2: 0x026F0020 0x026F0028 0x026F0030 0x026F0038

Take a look at sum2 (the array of structs); each element starts exactly at 8-byte intervals–what we would expect because memory is often aligned on 8-byte boundaries on x86.

But look at sum1; the elements in the first row are 8 bytes apart, but the next row starts 16 bytes later! That means 8 additional bytes are wasted between each row. This means that fewer rows will fit in the CPU’s L1 cache, causing more cache misses, slowing down the program. (Why did this layout occur? I don’t know the answer to that yet)

Second of all, let’s look at the assembly code. This is just for adding the two elements of a row together.

;array addition
mov eax, DWORD PTR _arr$[esp+4] ; 1st operand addr
mov eax, DWORD PTR [eax+ebp*4] ; 2nd operand addr
fld QWORD PTR [eax+8] ; 1st operand
fadd QWORD PTR [eax] ;add 2nd operand
faddp ST(1), ST(0) ;add result to answer

;struct addition
fld QWORD PTR [eax+8];put first operand in register
fadd QWORD PTR [eax-16] ; add other operand
faddp ST(1), ST(0) ;add result to answer

So…the array version has to do two mov’s to fixup the correct addresses.

It seems now that the struct version is slightly more efficient here–both in space and in speed. However, this is really a microcosm of a problem. This kind of thing probably wouldn’t matter unless you were doing a lot of it. Personally, I’d go with the struct more often than not because it would be much easier to update the code to use a 3rd field, for example.

Check out my latest book, the essential, in-depth guide to performance for all .NET developers:

Writing High-Performance.NET Code, 2nd Edition by Ben Watson. Available for pre-order:

Code Security and Typed Assembly Language

Over the summer I’m taking a class called Programming Languages and Security. This is the first time I’ve delved into security at this level before. It’s a seminar style, which means lots of paper reading, and I am going to give two presentations.

My first presentation was this past Thursday. I spoke about typed assembly language and security automata. It was absolutely fascinating, ignoring the formality of proofs, and all the mathematical notations.

The two papers I discussed were:

The TALx86 begins by describing many shortcomings of the Java Virtual Machine Language (bytecode), including such things as:

  • Semantic errors in the bytecode that could have been discovered if a formal model had been used in its design.
  • Difficulty in compiling languages other than Java into bytecode. For example, it’s literally impossible to correctly compile Scheme into bytecode. OK, Scheme is a pretty esoteric language, but…
  • Difficulty even in extending the Java language because of the bytecode limitations
  • Interpretation is slow, and even though JIT is often used these days, that’s not built-in to the VM

My immediate thought on reading this was, “Hey! .Net addresses each and every single one of these points!”

  • The CLR defines a minimal subset of functionality that must be supported by every .Net language–allowing over 40 languages to be compiled to MSIL
  • As a bonus, MSIL is typed (as is Java bytecode)
  • Just-In-Time compilation was designed in from the beginning and generally has superior performance to Java (in my experience)

It also seems that many of the experimental features present in such early research, such as TALx86, has ended up in .Net and compilers these days. Type safety is being pushed lower and lower. Security policies are being implemented into frameworks, operating systems and compilers, and there are other tools that analyze your code for adherence to security best practices.

On platforms such as .Net, type safety is more important because you can have modules written in VB.Net interacting with objects written in C++ or Python, for example. Those languages don’t know about each other’s types, but at the MSIL level you can ensure safety.

If you’d like, a copy of the presentation is available.

Check out my latest book, the essential, in-depth guide to performance for all .NET developers:

Writing High-Performance.NET Code, 2nd Edition by Ben Watson. Available for pre-order: