2-D Arrays versus Structs

I had a situation the other day where I needed an array of two values and the thought occurred to me: which is better? A 2-D array or a 1-D array of structs. I decided to come up with a quick test to see.

Here are my results:

2-D Arrays : 55.9376 cycles/row
Structs : 31.9937 cycles/row

That’s the number of cycles, on average, to add up the numbers in a row. Cycles makes more sense to me than microseconds when we’re talking about this level of code.

It turns out that the struct version is about 74% faster, which confused me slightly. Isn’t the memory layout for both options the same and should thus have the same assembly code?

It turns out…no. Before proceeding, download the test code here. It’s not pretty, but it works. Some of the mess is designed to keep the compiler from optimizing out my test cases. I tested with Visual Studio 2005 beta 2 with default release mode compiler options.

So lets look at two things I discovered:

Here are the addresses of the first two rows (of two elements each) of the two array styles:

sum1: 0x00354860 0x00354868 0x00354878 0x00354880
sum2: 0x026F0020 0x026F0028 0x026F0030 0x026F0038

Take a look at sum2 (the array of structs); each element starts exactly at 8-byte intervals–what we would expect because memory is often aligned on 8-byte boundaries on x86.

But look at sum1; the elements in the first row are 8 bytes apart, but the next row starts 16 bytes later! That means 8 additional bytes are wasted between each row. This means that fewer rows will fit in the CPU’s L1 cache, causing more cache misses, slowing down the program. (Why did this layout occur? I don’t know the answer to that yet)

Second of all, let’s look at the assembly code. This is just for adding the two elements of a row together.

;array addition
mov eax, DWORD PTR _arr$[esp+4] ; 1st operand addr
mov eax, DWORD PTR [eax+ebp*4] ; 2nd operand addr
fld QWORD PTR [eax+8] ; 1st operand
fadd QWORD PTR [eax] ;add 2nd operand
faddp ST(1), ST(0) ;add result to answer

;struct addition
fld QWORD PTR [eax+8];put first operand in register
fadd QWORD PTR [eax-16] ; add other operand
faddp ST(1), ST(0) ;add result to answer

So…the array version has to do two mov’s to fixup the correct addresses.

It seems now that the struct version is slightly more efficient here–both in space and in speed. However, this is really a microcosm of a problem. This kind of thing probably wouldn’t matter unless you were doing a lot of it. Personally, I’d go with the struct more often than not because it would be much easier to update the code to use a 3rd field, for example.