Tag Archives: windows

SM_SECURE

In the system metrics that you can retrieve with GetSystemMetrics, one of the available options is SM_SECURE.

The only explanation provided is “Nonzero if security is present; otherwise, 0.” What does that mean? On my computer, the value is 0. There are a few different ways security can be “present” in my mind–what specifically are they talking about here?

Google, MSN, MSDN yield no useful information.


Check out my latest book, the essential, in-depth guide to performance for all .NET developers:

Writing High-Performance.NET Code, 2nd Edition by Ben Watson. Available for pre-order:

A Debugging Exercise

A few weeks ago I had an interesting debugging problem. A program we develop had a memory leak in it that Visual Studio was catching when it ended. The trace text was something like this:

Detected memory leaks!
Dumping objects ->
{292300} normal block at 0x05590040, 2771928 bytes long.
Data: <> FF FF FF 00 FF FF FF 00 FF FF FF 00 FF FF FF 00
{292294} normal block at 0x05110040, 2613312 bytes long.
Data: <> FF FF FF 00 FF FF FF 00 FF FF FF 00 FF FF FF 00

Visual Studio 2003 is often pretty good at telling you where a memory leak occured, giving you a source code file and line number of where the memory allocation originated. In the case above, I don’t have that. There are two relatively easy ways of solving this.

The first method is to examine the data that isn’t being freed. In this case, I have two large objects of about 2.5 MB each. That’s pretty big–there aren’t too many (if any) individual objects in the software that are that large. So it could be an array. The data begins with a repeated pattern {FF FF FF 00}. Definitely looks like an array of 4-byte segments. Applying a little knowledge of the application itself–it displays large bitmaps–and we realize that the memory comes from an array of RGB values.

Another way to investigate this is to set a breakpoint on the memory allocation that isn’t being freed. The number in braces {292300} is the memory allocation number in debug mode in Visual C++. In order for this technique to work, you must first ensure that the number is the same each time. In my case, the memory leak would happen if I just opened and closed the program (because an image is always being drawn) without doing anything else.

First, set a breakpoint at the beginning of the program (or any place definitely before the memory leak). Start the program and when it breaks, enter the following into the watch window:

{,,msvcr71d.dll}_crtBreakAlloc

And for the value enter 292300.

If you continue the execution, the program will break on the 292,300th memory allocation. It will stop in the memory allocator, and you can then look at your stack frame and see exactly where the memory allocation is taking place.

MSDN has a great article about this.

In my problem it was for a set of memory blocks being used as working space for resizing and smoothing bitmaps. They were allocated once during execution and thus did not really provide a long-term threat. This leads to a possible third solution:

Ignore. All memory is reclaimed once the program exits. Nobody would ever do that, though…. would they?

…of course not…


Check out my latest book, the essential, in-depth guide to performance for all .NET developers:

Writing High-Performance.NET Code, 2nd Edition by Ben Watson. Available for pre-order:

Credibility

One thing I cannot stand that is so prevalent in the computer industry is criticism by people of ideas, products, and technologies that they don’t understand. You see this a lot in the OS wars–especially of Windows, but Linux and Apple are not immune.

In very few cases do people have a well-reasoned and thought out explanation for their feelings. People who bash other ideas for their “religious” reasons are not intelligent–they are freaks who should not be trusted to make good decisions about technology.

Sure, there are horribly bad products out there, but those are mostly ignored and quickly die off. Religious wars start over successful products. A little study and research into the reasons for various design decisions would go a lot towards increasing the intelligence of most of these people.


Check out my latest book, the essential, in-depth guide to performance for all .NET developers:

Writing High-Performance.NET Code, 2nd Edition by Ben Watson. Available for pre-order:

On Fixing Computers

As the family’s computer guy, I often get asked to fix, maintain, look at, improve, or otherwise modify my family’s computers over vacations. This Christmas vacation is no exception. While I was at my brother’s house he asked me to look at his computer–remove spyware, make sure everything was running smoothly. I updated Spybot to 1.4, ran Ad Aware, and removed a bogus web search toolbar. And that’s about it. All-in-all, the computer was running smoothly as-is.

A few hours later, I got called to look at it because a new error message was being displayed–by Norton Internet Security. The anti-virus e-mail checker was disabled. I think this is what happened: It turns out that running Spybot had turned on the Windows Firewall, even though Norton had its own.

On the one hand, I probably could have read the descriptions of the problems Spybot had found and realized what it would do. On the other hand, why did Spybot find a problem when there really wasn’t one?

Computer software, including operating systems and the applications we run on them, has gotten far too complex for us to understand the implications of even simple operations.

Everytime we install an application on Windows, it makes changes to the file system, the registry, the start menu, and more. If it’s a system utility, it can modify much more.

And don’t tell me it’s any better on Linux. You have to deal with the potential of mismatched shared libraries every time an application is upgraded.

There a couple of technologies that I see as easing some of these problems:

  • .Net – xcopy deployment (for the most part), easy to use XML config files. Drop in and play for simple apps. Many Linux apps work well like this, why not Windows?
  • System change monitoring – like MS Anti-spyware, it notifies you whenever a system change occurs and lets you stop it before damage is done. What if we had a system that allows you to set what kind of changes you want to be notified about? What if it monitored a much broader set of features than today? This method presupposes a fairly good knowledge of computers. I like MS anti-spyware, but I wonder what people thing when it brings up a window asking whether to continue with a registry change, or run a script. Most people simply dismiss things like that, which is unfortunate.

Check out my latest book, the essential, in-depth guide to performance for all .NET developers:

Writing High-Performance.NET Code, 2nd Edition by Ben Watson. Available for pre-order:

Code Timing functions

When it comes to timing your code, there are a lot of options, and depending on what you need to do, not all of them make sense. There are timing functions which utilize callbacks and other ways to periodically call a designated function at a regular interval–I’m not going to discuss those types of timers.

Instead, I’ll focus on the kinds of timing you can do to assess code performance.

To illustrate the differences between these methods, let’s have a simple C++ program:

[code lang=”cpp”]
#include
#include

using namespace std;

int main(int argc, char** argv)
{
//begin timing

//test code
const int nLimit = 10000000;
double sum = 0;

for (int i=0;i {
double rt = sqrt(static_cast(i));
sum+=rt;
}

//end timing

cout << “sum: “<

cout << elapsed << ” second(s) total”< cout << avg << ” second(s) average iteration”<

return 0;
}[/code]

I deliberately chose something simple and fast. Here’s why: if you’re timing something that always takes an hour–a second doesn’t matter, and the first option will work perfectly fine. On the other hand, if you need to time something small that will potentially be repeated dozens, hundreds, thousands, or millions of times, every microsecond can matter.

time_t and CTime

time() is the good old C standby. It’s simple, it’s easy, it’s historic, and it’ll break after 19:14:07, January 18, 2038, UTC.

To remedy this, Microsoft has the _time64() function, which uses the type __time64_t. It improves the lifespan, but not the precision.

Our code is now:
[code lang=”cpp”]
#include
#include
#include

using namespace std;

int main(int argc, char** argv)
{
//begin timing
time_t start = time(NULL);

//test code
const int nLimit = 10000000;
double sum = 0;

for (int i=0;i {
double rt = sqrt(static_cast(i));
sum+=rt;
}

//end timing
time_t end = time(NULL);
cout << “sum: “< long elapsed = static_cast(end – start);
double avg = (double)elapsed / nLimit;
cout << elapsed << ” second(s) total”< cout << avg << ” second(s) average iteration”<

return 0;
}[/code]

Output:
sum: 2.10819e+013
16 second(s) total
0 second(s) average iteration

MFC’s CTime is just a wrapper for time_t (VC++6.0) or __time64_t (VC++7), so nothing new here.

time_t is good when you don’t need very precise measurements, when you’re just interested in the date and/or time, as in wall-clock time, or when it’s your only option.

For timing code, this is not a great option.

GetTickCount

Windows contains a function called GetTickCount which returns the number of milliseconds since the machine was turned on. This gives us 1,000 times better accuracy then time().

[code lang=”cpp”]
#include
#include
#include

using namespace std;

int main(int argc, char** argv)
{
//begin timing
DWORD start = GetTickCount();

//test code
const int nLimit = 10000000;
double sum = 0;

for (int i=0;i {
double rt = sqrt(static_cast(i));
sum+=rt;
}

//end timing
DWORD end = GetTickCount();
cout << “sum: “< double elapsed = (end – start) / 1000.0;//ms –> s
double avg = ((double)elapsed)/ nLimit;
cout << elapsed << ” second(s) total”< cout << avg << ” second(s) average iteration”<

return 0;
}[/code]

Output:
sum: 2.10818e+010
0.172 second(s) total
1.72e-008 second(s) average iteration

As seen above, this value is returned in a DWORD, which is 32-bits wide. 232 milliseconds comes out to about 49 days, at which time the counter rolls over to 0. If you’re timing very short intervals, this probably won’t be a problem. The MSDN documentation has a code example to detect timer wrap-around.

GetTickCount provides much more precision than time_t, but with processor speeds at multiple gigahertz, a lot can happen in a millisecond. Enter…

QueryPerformanceCounter and QueryPerformanceFrequency

This set of functions is also available in the Win32 API and they are much more accurate than time_t and GetTickCount.

Their usage is often paired. QueryPerformanceCounter returns the current count on the timer, and QueryPerformanceFrequency returns how many counts per second there are. The documentation makes clear that timer frequency is not the same as processor clock frequency. This timer is hardware-based, however, and if your computer doesn’t have a timing device, then calling QueryPerformanceCounter returns the same thing as GetTickCount above, and QueryPerformanceFrequency returns 1000 (1000 ms = 1 s).

The code reads:

[code lang=”cpp”]
#include
#include
#include

using namespace std;

int main(int argc, char** argv)
{
//begin timing
LARGE_INTEGER start;
::QueryPerformanceCounter(&start);

//test code
const int nLimit = 10000000;
double sum = 0;

for (int i=0;i {
double rt = sqrt(static_cast(i));
sum+=rt;
}

//end timing
LARGE_INTEGER end;
LARGE_INTEGER countsPerSecond;
::QueryPerformanceCounter(&end);
::QueryPerformanceFrequency(&countsPerSecond);

double elapsed = (double)(end.QuadPart – start.QuadPart) / countsPerSecond.QuadPart;
double avg = elapsed / nLimit;
cout << “sum: “< cout << elapsed << ” second(s) total”< cout << avg << ” second(s) average iteration”<

return 0;
}[/code]

Output:
sum: 2.10818e+010
0.165298 second(s) total
1.65298e-008 second(s) average iteration

You can easily see the much improved precision.

OK, well these are all well and good if you have Windows, but what if you don’t have Windows, or the time isn’t as important as the number of cycles required to compute something?

Here’s…

GetMachineCycleCount

…a function to read the current CPU clock cycle. Note that there aren’t any API or standard library functions to do this so we need to use assembly. Note also that this wasn’t possible until the Pentium came along, so don’t run this code on a pre-Pentium computer. 🙂

[code lang=”cpp”]
#include
#include

using namespace std;

inline __int64 GetMachineCycleCount()
{
__int64 cycles; //64-bit int

_asm rdtsc; // won’t work on 486 or below – only pentium or above
_asm lea ebx,cycles; //ebx = &cycles
_asm mov [ebx],eax; //get low-order dword
_asm mov [ebx+4],edx; //get hi-order dword

return cycles;
}

int main(int argc, char** argv)
{
//begin timing

__int64 start = GetMachineCycleCount();

//test code
const int nLimit = 10000000;
double sum = 0;

for (int i=0;i {
double rt = sqrt(static_cast(i));
sum+=rt;
}

//end timing
__int64 end = GetMachineCycleCount();

cout << “sum: “< __int64 elapsed = end – start;
double avg = (double)elapsed / nLimit;
cout << elapsed << ” cycles total”< cout << avg << ” cycles average iteration”<

return 0;
}[/code]

Output:
sum: 2.10818e+010
263631085 cycles total
26.3631 cycles average iteration

What are some situations where this is useful? If you want to make a piece of code absolutely the fastest on any piece of hardware, regardless of clock-time, this can give you a truer reflection of the effectiveness of your algorithm. If you don’t have a high-performance timer API (and you don’t want to write one), then this is a quick-and-dirty, but effective, solution.

So what about discovering the speed of your processor? Well, there’s no machine instruction to do this. Basically, you have to combine the cycle count with the performance timers and run through a lot of loops to calculate an average.

I hope this has helped some people wade through the various options of timing performance-critical code.


Check out my latest book, the essential, in-depth guide to performance for all .NET developers:

Writing High-Performance.NET Code, 2nd Edition by Ben Watson. Available for pre-order:

Yankee Group vs. Linux Zealots

There is a wonderful little article about the Yankee Group’s reaction to criticism of their surveys.

I’m not as forgiving as Laura DiDio. My entire undergraduate education was heavily Linux-based, and as I saw it, there were very few reasonable opinions. It was either love linux above all else and bash Microsoft with idiotic assertions, or be quiet.

Linux might be cool and fun to geek around with, but I mostly want to get work done. Linux doesn’t do it as well as Microsoft.


Check out my latest book, the essential, in-depth guide to performance for all .NET developers:

Writing High-Performance.NET Code, 2nd Edition by Ben Watson. Available for pre-order:

Threads in MFC Part III: Exceptions, Suspense, Murder, and Safety

Exceptions

In the previous tutorial, I described the various synchronization objects you can use to control access to shared objects. In most cases, these will work fine, but consider the following situation:

[code lang=”cpp”]
UINT ThreadFunc(LPVOID lParam)
{
::criticalSection.Lock();
::globalData.DoSomething();
SomeFunctionThatThrowsException();
::criticalSection.Unlock();
return 0;
}[/code]

What’s going to happen when that exception gets thrown? The critical section will never be unlocked. If you start the thread again, it will again try to lock it, and finding it already locked, it will sit there forever waiting. Of course, a mutex will unlock when the thread exits, but a critical section won’t. So MFC has a couple of wrapper classes that can incorporate any of the other basic synchronization classes. These are called CSingleLock and CMultiLock.

Here is how they are used:
[code lang=”cpp”]
UINT ThreadFunc(LPVOID lParam)
{
CSingleLock lock(&(::criticalSection));
lock.Lock();
::globalData.DoSomething();
FunctionThatThrowsException();
lock.Unlock();
return 0;
}
[/code]

You merely pass the address of the “real” synchronization object. CSingleLock lock is created on ThreadFunc’s stack, so when an exception is thrown, and that function exits prematurely without a chance to nicely clean up, CSingleLock’s destructor is called, which unlocks the data. This would not happen to criticalSection because, being a global variable, it will not go out of scope and be destroyed when ThreadFunc exits.

CMultiLock

This class allows you to block, or wait, on up to 64 synchornization objects at once. You create an array of references to the objects, and pass this to the constructor. In addition, you can specify whether you want it to unblock when one object unlocks, or when all of them do.

[code lang=”cpp”]
//let’s pretend these are all global objects, or defined other than in the local function
\tCCriticalSection cs;
CMutex mu;
CEvent ev;
CSemaphore sem[3];

CSynObject* objects[6]={&cs, &mu, &ev,
&sem[0], &sem[1], &sem[2]};
CMultiLock mlock(objects,6);
int result=mlock.Lock(INFINITE, FALSE);

[/code]

Notice you can mix synchronization object types. The two parameters I specified (both optional) specify the time-out period and whether to wait for all the objects to unlock before continuing. I saved the return value of Lock() because that is the index into the array of objects of the one that unblocked, in case I want to do special processing.

Killing a Thread

Generally, murder is very messy. You have blood and guts everywhere that certainly don’t clean up after themselves. But sometimes, sadly, it is necessary (no one call the cops–my metaphor is about to end).

If you start a child thread, and for some reason it is just not exiting when you need it to, and you’ve fixed your code, double-checked all your signaling mechanisms, and then and only then you want to kill it, here’s how. When you create the thread, you need to get its handle and save it for later use in your
class.

[code lang=”cpp”]
HANDLE hThread;//handle to thread
[/code]

A handle is only valid while the thread is running. What if we create a thread, start it off running, and it exits immediately for some reason? Back in our main thread, even if the very next statement after creating the thread is to grab its handle, it could very possibly be too late.

So we create a thread suspended! We just don’t even let it get to first base before we allow ourselves to get to the handle. This is a piece of cake, simply change the last parameter we’ve been giving fxBeginThread() from 0 toCREATE_SUSPENDED:

[code lang=”cpp”]
CWinThread* pThread=AfxBeginThread(ThreadFunc,NULL, THREAD_PRIORITY_NORMAL, 0, CREATE_SUSPENDED);
::DuplicateHandle(GetCurrentProcess(), pThread->m_hThread, GetCurrentProcess(), &hThread, 0, FALSE, DUPLICATE_SAME_ACCESS);
pThread->;
ResumeThread();
[/code]

We start the thread suspended, use an API call to duplicate the thread’s handle, saving it to our class variable, and then resuming the thread.

Then, if we want to commit this heinous crime:

[code lang=”cpp”]::TerminateThread(hThread,0);
[/code]

Don’t say I didn’t warn you.

Thread-Safe Classes

Thread-safety refers to the possibility of calling member functions across thread-boundaries. Their are two types of safety: Class-level and Object-level. Class level means that I can create two CStringT objects called a and b, and access each of them in separate threads, but I cannot safely access just a in
two threads. Object safety means that it’s perfectly ok to access a in two or more threads simultaneously. Thread-safety at the object level generally means using synchronization objects to control access to all internal datamembers. So why not make all classes thread-safe at the object level? Because that would just about kill your performance. You can lock objects yourself outside of the actual object (as shown in Part II) to make it safe.

This is not to say that your program will always crash if you try to access a single object from two threads, but it most likely will. Also, you should not generally lock access to MFC member functions or public variables–you don’t know when the MFC framework is going to need access to them. There really isn’t need to lock on a CWnd* object anyway.

Etc

There are many, many details I have neglected to cover in these three tutorials. You can look in the SDK or .NET documentation for more information on such things as pausing/resuming, scheduling, masks in CMultiLock(), or any of the other member functions of the thread classes. If you want to learn about the internal details of Windows, threads and fibers, (plus a lot of other important subjects) check out Programming Applications for Microsoft Windows by Jeffrey Richter.

I have yet to cover so-called user-interface threads (internally, there is no difference–all threads are created equal). Perhaps in a future tutorial…

Threads are a very powerful tool, but they can quickly increase the complexity of your application by an order of magnitude. Use wisely. As always, it takes some experimentation to get the hang of how to go about it. So have fun!

©2004 Ben Watson


Check out my latest book, the essential, in-depth guide to performance for all .NET developers:

Writing High-Performance.NET Code, 2nd Edition by Ben Watson. Available for pre-order:

Threads in MFC II: Synchronization Objects

Introduction

In part I, I looked at getting threads communicating with each other. Now let’s look at how we can manage how multiple threads operate on single objects.

Let’s take an example. Suppose we have a global variable (or any variable that is accessible to two or more threads via scope, pointers, references, whatever). Let’s say this variable object is a CStringArray called stringArray
.

Now, let’s suppose our main thread wants to add something to the array. Fine enough. We can do that. Then, let’s throw in a second thread which can somehow access this object. It, too, wants to access stringArray . What would happen if both threads tried to simultaneously write to the first position in the array for example? Or even if one were just reading and the other writing? Well, if there is no synchronization between the two threads, you don’t know what would happen. The result is completely unpredicatable. One thread would write some bytes to memory, while another reads it, and you could have the correct answer or the wrong answer or a mix. Or it could crash. Who knows…

You can’t even assume safety when merely reading an object from two threads. Even if it seems like no bytes are changing, and both threads should get valid results, you have to think about a lower level: A single C++ statement compiles to many assembly or machine language instructions. These instructions directly access the processor, including the registers that keep track of where we are, what data we’re looking at. It’s possible to have one of those registers hold a pointer to the current character in the string, so if you have two threads that rely on that pointer in that register–they are obviously not both going to be correct except in a very rare circumstance.

OK, I think I’ve made the case. How do we control access to objects then?

Windows has a number of synchronization objects that you can use to effectively prevent accidents. MFC encapsulates these into CEvent , CCriticalSection, CMutex , and CSemaphore . To use these, include afxmt.hin your project.

CEvent

Let’s start with these so-called triggers. An event in this context is nothing more than a flag, a trigger. Imagine it as cocking a gun (Reset) and then firing it (Set). You can use events for setting of threads. Here’s how.
Remember how we created a structure that contained all the data we wanted to send the thread? Let’s add a new one. First create a CEvent object in the dialog (or any window or non-window) class called m_event . Now, in our [code lang=”cpp”]THREADINFOSTRUCT [/code], let’s add a pointer to an event:

[code lang=”cpp”]
typedef struct THREADINFOSTRUCT {

CEvent* pEvent;

} THREADINFOSTRUCT;
[/code]

When we initialize the structure, we must do the assignment:

[code lang=”cpp”]tis->pEvent=&m_event; [/code]

In our thread function, we call:

[code lang=”cpp”]tis->pEvent->Lock(); [/code]

This will “lock” on the event (the same event that is in our dialog class in the main thread). The thread will effectively stop. It will loop inside of CEvent::Lock() until that event is “Set.” Where do you set it? In the main thread. An event is initially reset–cocked. Create the thread. When you want the thread to unblock itself and continue, you call m_event.Set()–fire the gun.

So what are some practical examples? You could lock a thread before you access a global object. In your main thread, when you’re done using that object, you call Set(). You can also use an event to signal a thread to exit (such as if
you hit an abort button in the main thread). To see an example of this usage, look at the demo project I’ve uploaded to the code tool section.

There are two types of events: ones that automatically reset when you set them, and ones that don’t.

You can use a single event to trigger multiple threads, but the event had better be a manual-reset event or only one thread will be triggered at a time.

CCriticalSection

These are pretty simple to use. You simply surround every usage of the shared object by a lock and an unlock command:

[code lang=”cpp”]CCriticalSection cs;

cs.Lock();
stringArray.DoSomething();
cs.Unlock();
[/code]
Do that in every thread that uses that object. You must use the same critical section variable to lock the same object. If a thread tries to lock an object that’s already locked, it will just sit there waiting for it to unlock so it can safely access the object.
CMutex

A mutex works just like a critical section, but it can also work across different processes. But you don’t want to always use mutexes, because they are slower than critical sections.
You can declare a mutex like this:

[code lang=”cpp”]CMutex m_mutex(FALSE, “MyMutex”); [/code]
The first parameter specifies whether or not the mutex is initially locked or not. The second parameter is the identifier of the mutex so it can be accessed from two different processes.

If you lock a critical section in a thread and then the thread exits without unlocking it, then any other thread waiting on it will be forever blocked. Mutexes, however, will unlock automatically if the thread exits. Mutexes can
also have a time-out value (critical sections can too, but there are some doubts as to whether or not they work–perhaps the bugs are fixed in MFC 7.0).

Otherwise, it works the same:

[code lang=”cpp”]
m_muytex.Lock(60000);//time out in milliseconds
stringArray.DoSomething();
m_mutex.Unlock();
[/code]

CSemaphore

A semaphore is used to limit simultaneous access of a resource to a certain number of threads. Most commonly, this resource is a pool of a certain number of limited resources. If we had ten string arrays, we could set up a semaphore to guard them and let only ten threads at a time access them. Or COM ports, internet connections, or anything else.

It’s declared like this:

[code lang=”cpp”]CSemaphore m_semaphore(10,10); [/code]

The first argument is the initial reference count, while the second is the maximum reference count. Each time we lock the semaphore, it will decrement the reference count by 1, until it reaches zero. If another thread tries to lock the semaphore, then it will just go into a holding pattern until a thread unlocks it.

As with a mutex, you can pass it a time-out value.

It’s used with the same syntax:

[code lang=”cpp”]
m_semaphore.Lock(60000);
stringArray.DoSomething();
m_semaphore.Unlock();
[/code]

Conclusion

These two tutorials, along with the sample projects, should be enough to get you started using threads. There are a couple of other MFC objects and issues that I have yet to cover, so I’ll group all of these into Part III of this
tutorial. These topics include exception-handling and thread-safe classes. Make sure that you examine the documentation of all of these classes: there is more functionality than I could cover in this short tutorial. And if you really want to learn threads, get a good book that covers the Windows kernel (one called Programming Applications for MS Windows comes to mind, published by Microsoft).

The sample project for this tutorial has a time object that it shares between two threads. It’s protected by a critical section. There are also two events: for starting the thread and aborting it. The main thread uses a timer to add the current time to a list box every second, while the thread traces the current time to the debug window and sends a message to the main thread to remove the first time from the list. The thread is only started after the time
hits an even ten-second boundary.

©2004 Ben Watson


Check out my latest book, the essential, in-depth guide to performance for all .NET developers:

Writing High-Performance.NET Code, 2nd Edition by Ben Watson. Available for pre-order: