List<> vs. ArrayList

I read Rico Mariani’s latest quiz, and decided to check out the results for myself in BRayTracer.

I already have some simple performance benchmark tests in my NUnit tests, so I ran some before and after. I changed only the ArrayList’s used in the scene object to hold shapes, materials, and lights.

Before:

25.197 opaque spheres/sec (time: 3.969 )

After:

31.841 opaque spheres/sec (time: 3.141 )

Pretty impressive savings with minimal work! Almost a full second! In an enormous scene, that will add up to a LOT of time saved. I still have to change the implementation for polygons and polygonal meshes–this will greatly speed up those operations, which are currently fairly slow. In fact, the way I have polygonal meshes written at the moment, they’re nearly impossible to render in a decent amount of time.

More exciting things coming to BRayTracer soon…

Getting NUnit to work on .Net 2.0 and VS Studio 2005 Beta 2 (Whidbey)

I’ve been playing around VS.Net 2005 Beta 2 for a while now and like it so much that I’ve converted the BRayTracer to use it exclusively. Yes, this means you need .Net 2.0 to run all future versions, but it’s worth it.

I ran into the slight problem of not being able to use NUnit 2.2 with the latest .Net, but I found some help via google.

What worked for me specifically, was:

[code lang=”xml”]

[/code]

Of course, look in your C:\WINDOWS\Microsoft.NET\Framework folder to see what versions you have.

Curvy: Part 1

I’ve been wanting to learn COM lately, and not knowing where to start I began in some of the MFC books I have. Admittedly, MFC hides quite a bit of COM complexity, but I figured it would be a good place to get my feet wet without getting scared.

My first test program is called Curvy (Actually, Curvy2 since I restarted it with the doc/view architecture). It’s a curve drawing program. You can use the left mouse button to create points on a curve, the right mouse button closes the curve, and you can change the control points and move the curves.

The fun part was implementing copy & paste. It uses the OLE clipboard and puts both its native format and a bitmap on the clipboard.

The other fun part was implementing OLE Drag & Drop. This took me a few times to get quite right, but the results are spectacular. You can drag shapes within a window as well as to other windows. Holding down the ctrl key will copy the shape instead.

The project is in VS .Net 2005 format, but the code will compile under previous versions of VS.

Click here to download.

Next stop for this? Implement a full OLE server so other programs can host Curvy components inside them!

Mr. Hatch and the Internet

Phil Windley, commentator, author and IT industry expert (and also a former professor and supervisor of mine) has a wonderful little statement about Senator Hatch.

I try to avoid posting politics on this blog, but I have to mostly-agree with Dr. Windley. I’ve long been very wary of Senator Hatch’s dangerous and lopsided proposals. He seems very much ignorant of the technical aspects of these issues and I think that he, frankly, shouldn’t be allowed to touch anything having to with the Internet or computers.

London

Leticia and I want to express our sorrow and empathy for the brave people of London. While used to terror attacks in the past, this was a particularly egregious act of barbarism. May the families of those who lost loved ones be comforted and may those responsible be brought to certain and swift justice.

Checking if a Directory Exists

I recently had to write a utility that moved hundreds of thousands of files to a new location under a different directory organization. As part of it, I checked to see if the destination directory already existed and if not, created it. At one point I wondered if it would just be faster to try and create it, and if it fails, assume that it already exists (remember, I’m dealing with hundreds of thousands of files here–anything to speed it up is very welcome).

Determining if a directory exists isn’t entirely straightforward. If you use .Net, you can use Directory.Exists(), but that function must use the Win32 API at some point and there is no Win32 API that determines the existence of a directory, so what is it doing?

Ah, but there is an API to get the attributes of a given filename.

[code lang=”cpp”]
BOOL DirectoryExists(const char* dirName)
{
DWORD attribs = ::GetFileAttributesA(dirName);
if (attribs == INVALID_FILE_ATTRIBUTES) {
return false;
}
return (attribs & FILE_ATTRIBUTE_DIRECTORY);
}[/code]

Note that if the function call fails it doesn’t necessarily mean that the directory doesn’t exist–it could be that the device is inaccessible, you don’t have security permissions, or any number of other things. To know for sure what’s going on, you would need to call GetLastError().

So what if you’re creating directories? Why not try to create them no matter what? That’s fine, but is that faster than checking to see if it exists first? Let’s test it.

[code lang=”cpp”]
BOOL CreateDirectory(const char* dirName)
{
return ::CreateDirectoryA(dirName,NULL);
}

for (int i=0;i

CreateDirectory(dirName);

}
[/code]

Results (10,000,000 iterations):
265.227 second(s) total
2.65227e-005 second(s) average iteration

Now let’s try checking first:

[code lang=”cpp”]
for (int i=0;i BOOL bExists = DirectoryExists(dirName);
if (!bExists) {
CreateDirectory(dirName);
}
}
[/code]

Results (10 million iterations):
103.24 second(s) total
1.0324e-005 second(s) average iteration

Over 2 .5 times faster!

Now, my simple test is retrying a single folder over and over, and it never actually creates anything. In my case for the utility I mentioned above, I’m creating far fewer directories than the number of files I’m moving to them (though still in the thousands). In that case, it’s definitely worth my time to check to see if the folder exists before trying to create it.

To me, it appears that unless the number of folders you’re creating is of the same magnitude as the number of files, it definitely makes sense to check first.

This goes to show that you can’t believe anything related to performance until you measure it for your application.

Code Timing functions

When it comes to timing your code, there are a lot of options, and depending on what you need to do, not all of them make sense. There are timing functions which utilize callbacks and other ways to periodically call a designated function at a regular interval–I’m not going to discuss those types of timers.

Instead, I’ll focus on the kinds of timing you can do to assess code performance.

To illustrate the differences between these methods, let’s have a simple C++ program:

[code lang=”cpp”]
#include
#include

using namespace std;

int main(int argc, char** argv)
{
//begin timing

//test code
const int nLimit = 10000000;
double sum = 0;

for (int i=0;i {
double rt = sqrt(static_cast(i));
sum+=rt;
}

//end timing

cout << “sum: “<

cout << elapsed << ” second(s) total”< cout << avg << ” second(s) average iteration”<

return 0;
}[/code]

I deliberately chose something simple and fast. Here’s why: if you’re timing something that always takes an hour–a second doesn’t matter, and the first option will work perfectly fine. On the other hand, if you need to time something small that will potentially be repeated dozens, hundreds, thousands, or millions of times, every microsecond can matter.

time_t and CTime

time() is the good old C standby. It’s simple, it’s easy, it’s historic, and it’ll break after 19:14:07, January 18, 2038, UTC.

To remedy this, Microsoft has the _time64() function, which uses the type __time64_t. It improves the lifespan, but not the precision.

Our code is now:
[code lang=”cpp”]
#include
#include
#include

using namespace std;

int main(int argc, char** argv)
{
//begin timing
time_t start = time(NULL);

//test code
const int nLimit = 10000000;
double sum = 0;

for (int i=0;i {
double rt = sqrt(static_cast(i));
sum+=rt;
}

//end timing
time_t end = time(NULL);
cout << “sum: “< long elapsed = static_cast(end – start);
double avg = (double)elapsed / nLimit;
cout << elapsed << ” second(s) total”< cout << avg << ” second(s) average iteration”<

return 0;
}[/code]

Output:
sum: 2.10819e+013
16 second(s) total
0 second(s) average iteration

MFC’s CTime is just a wrapper for time_t (VC++6.0) or __time64_t (VC++7), so nothing new here.

time_t is good when you don’t need very precise measurements, when you’re just interested in the date and/or time, as in wall-clock time, or when it’s your only option.

For timing code, this is not a great option.

GetTickCount

Windows contains a function called GetTickCount which returns the number of milliseconds since the machine was turned on. This gives us 1,000 times better accuracy then time().

[code lang=”cpp”]
#include
#include
#include

using namespace std;

int main(int argc, char** argv)
{
//begin timing
DWORD start = GetTickCount();

//test code
const int nLimit = 10000000;
double sum = 0;

for (int i=0;i {
double rt = sqrt(static_cast(i));
sum+=rt;
}

//end timing
DWORD end = GetTickCount();
cout << “sum: “< double elapsed = (end – start) / 1000.0;//ms –> s
double avg = ((double)elapsed)/ nLimit;
cout << elapsed << ” second(s) total”< cout << avg << ” second(s) average iteration”<

return 0;
}[/code]

Output:
sum: 2.10818e+010
0.172 second(s) total
1.72e-008 second(s) average iteration

As seen above, this value is returned in a DWORD, which is 32-bits wide. 232 milliseconds comes out to about 49 days, at which time the counter rolls over to 0. If you’re timing very short intervals, this probably won’t be a problem. The MSDN documentation has a code example to detect timer wrap-around.

GetTickCount provides much more precision than time_t, but with processor speeds at multiple gigahertz, a lot can happen in a millisecond. Enter…

QueryPerformanceCounter and QueryPerformanceFrequency

This set of functions is also available in the Win32 API and they are much more accurate than time_t and GetTickCount.

Their usage is often paired. QueryPerformanceCounter returns the current count on the timer, and QueryPerformanceFrequency returns how many counts per second there are. The documentation makes clear that timer frequency is not the same as processor clock frequency. This timer is hardware-based, however, and if your computer doesn’t have a timing device, then calling QueryPerformanceCounter returns the same thing as GetTickCount above, and QueryPerformanceFrequency returns 1000 (1000 ms = 1 s).

The code reads:

[code lang=”cpp”]
#include
#include
#include

using namespace std;

int main(int argc, char** argv)
{
//begin timing
LARGE_INTEGER start;
::QueryPerformanceCounter(&start);

//test code
const int nLimit = 10000000;
double sum = 0;

for (int i=0;i {
double rt = sqrt(static_cast(i));
sum+=rt;
}

//end timing
LARGE_INTEGER end;
LARGE_INTEGER countsPerSecond;
::QueryPerformanceCounter(&end);
::QueryPerformanceFrequency(&countsPerSecond);

double elapsed = (double)(end.QuadPart – start.QuadPart) / countsPerSecond.QuadPart;
double avg = elapsed / nLimit;
cout << “sum: “< cout << elapsed << ” second(s) total”< cout << avg << ” second(s) average iteration”<

return 0;
}[/code]

Output:
sum: 2.10818e+010
0.165298 second(s) total
1.65298e-008 second(s) average iteration

You can easily see the much improved precision.

OK, well these are all well and good if you have Windows, but what if you don’t have Windows, or the time isn’t as important as the number of cycles required to compute something?

Here’s…

GetMachineCycleCount

…a function to read the current CPU clock cycle. Note that there aren’t any API or standard library functions to do this so we need to use assembly. Note also that this wasn’t possible until the Pentium came along, so don’t run this code on a pre-Pentium computer. 🙂

[code lang=”cpp”]
#include
#include

using namespace std;

inline __int64 GetMachineCycleCount()
{
__int64 cycles; //64-bit int

_asm rdtsc; // won’t work on 486 or below – only pentium or above
_asm lea ebx,cycles; //ebx = &cycles
_asm mov [ebx],eax; //get low-order dword
_asm mov [ebx+4],edx; //get hi-order dword

return cycles;
}

int main(int argc, char** argv)
{
//begin timing

__int64 start = GetMachineCycleCount();

//test code
const int nLimit = 10000000;
double sum = 0;

for (int i=0;i {
double rt = sqrt(static_cast(i));
sum+=rt;
}

//end timing
__int64 end = GetMachineCycleCount();

cout << “sum: “< __int64 elapsed = end – start;
double avg = (double)elapsed / nLimit;
cout << elapsed << ” cycles total”< cout << avg << ” cycles average iteration”<

return 0;
}[/code]

Output:
sum: 2.10818e+010
263631085 cycles total
26.3631 cycles average iteration

What are some situations where this is useful? If you want to make a piece of code absolutely the fastest on any piece of hardware, regardless of clock-time, this can give you a truer reflection of the effectiveness of your algorithm. If you don’t have a high-performance timer API (and you don’t want to write one), then this is a quick-and-dirty, but effective, solution.

So what about discovering the speed of your processor? Well, there’s no machine instruction to do this. Basically, you have to combine the cycle count with the performance timers and run through a lot of loops to calculate an average.

I hope this has helped some people wade through the various options of timing performance-critical code.

25 Years

Well, today marks my 25th year of personal reign upon this earth….

I’m still young ‘un. I’m still in grad school, but I’ve got a good job, a wonderful wife, and things are going pretty well with everything else.

One of my favorite presents is a lego set. When I was a kid I played with legos constantly. Whenever I’m done with school and have time I plan to get back into it with legos and model building.

This week also celebrated my dad’s retirement from the U.S. Navy. He has had a long, illustrious and incredibly interesting career spanning the last three decades. I’m intensely proud of him and all his accomplishments. He had to fly into DC to take care of some paperwork, so we got to spend some time with him. I took him to the “other” air and space museum.

Visual Studio 2005 TS Beta 2

I finally received my Visual Studio 2005 Team System Beta 2 in the mail!

It took about 3 weeks to get here. I’m very excited. I plan on converting BRayTracer to use VS 2005.

It’s on track to take about 90 minutes to install. Installation of the compact framework failed, so I’m repairing that component. According to the log, it failed because another installation was in progress.

Features I’m looking forward to?

  • Refactoring – many free IDEs have them, it’s about time VS did too.
  • Improved IntelliSense – I know, IntelliSense is a crutch, but it’s a nice crutch to have–better than memorizing the API for Windows, .Net, MFC, and whatever sub-systems you’re working with.
  • The class designer – the old Add Method/Add Field was too slow and unwieldy to use.

There are a lot of other nice things that are in there. Check out the full list here.

Something I’d like to see that I don’t think is in there: change the type of a variable. I can foresee difficulties in this, but it would be a valuable feature.

[code lang=”cpp”]
float X;
[/code]

could be changed to

[code lang=”cpp”]
Coordinate X;
[/code]

and everywhere that variable is used, all the functions it gets passed to could be changed. As I said, there could be difficulties in this, but it would be powerful functionality.

Cars and the Idiots who Own Them – Part II

Well, I took it in this morning for a full checkup. I should preface this with by saying that for the last month, and especially the last week or so, I’ve been feeling strongly that I needed to take the car in for a serious look. Things just felt different–and not better.

I left it this morning at 8, and at 11 they had a diagnosis–made worse by the fact that my Honda Care extended warranty had run out a few months ago. Basically, new CV joints, new engine mounts, oil change (I just tacked that on), and something else that was fairly expensive.

The mechanic told me that otherwise the car is in great shape. The good news is that this kind of work won’t need to be done for a while.

I should be leaving to pick it up in about 45 minutes or so…