This is an issue I run into constantly at my job.
In fact, I’m sending this link out to everyone in my group here at work.
The importance of measuring performance changes is a topic that has been covered by others smarter and more experienced than me, but I have a recent simple tale.
I’ve simplified the code quite a bit in order to demonstrate the issue. Suppose I have a wrapper around an image (it has many more attributes):
class Picture
{
    Image _image;
    string _path;

    public Image Photo
    {
        get
        {
            // Lazily load the image from disk on first access.
            if (_image == null && !string.IsNullOrEmpty(_path))
            {
                _image = Bitmap.FromFile(_path);
            }
            return _image;
        }
    }
}
I had this, and a view that loaded about 2,700 of these into a customized ListView control at program startup. On a cold start (where none of the pictures were in the disk’s cache), it would take 27 seconds. Unacceptable.
What to do?
My first thought was to load the pictures asynchronously. I wrapped Bitmap.FromFile() into a function and called it asynchronously. When it was done, it fired an event that percolated up to the top.
Well, I spent about 30 minutes implementing that and ran it–horrible. The list showed up immediately, but it was unusable. The problem? Dumping 2,700 work items into the ThreadPool queue. It doesn’t create 2,700 threads, but it swamps the pool badly enough that it isn’t a viable option.
Asynchronicity is still the answer, though. But it’s at a different level. Instead of loading the individual images asynchronously, I skipped loading the images when creating the list control and instead launched a thread to load all the images and update them when done. The list loads in under a second, and the pictures show up little by little after that.
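In sketch form, the replacement looks something like this, inside the form class with the usual System.Threading, System.Drawing, and System.Windows.Forms usings. The names _pictures, _listView, and OnImageLoaded are illustrative stand-ins, not the actual production code:

void StartLoadingImages()
{
    Thread worker = new Thread(LoadAllImages);
    worker.IsBackground = true; // don't keep the process alive on exit
    worker.Start();
}

void LoadAllImages()
{
    foreach (Picture picture in _pictures)
    {
        Picture p = picture;    // local copy for the anonymous delegate
        Image loaded = p.Photo; // triggers the lazy disk load off the UI thread

        // Marshal the ListView update back to the UI thread.
        _listView.BeginInvoke((MethodInvoker)delegate
        {
            OnImageLoaded(p, loaded);
        });
    }
}

One worker thread keeps the disk reads sequential (which a cold cache is going to force anyway) and leaves the ThreadPool alone.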
Measure, measure, measure. And pay attention.
One of the key requirements for being able to reliably update software is the confidence that the changes you are making are safe. The amount of confidence required increases with the complexity of the system.
In my day job I work on a real-time messaging system that can have very, very little downtime. As the service grows and sees more traffic, the amount of downtime shrinks. We start to worry now if upgrades take longer than 5 minutes. (It’s almost to the point where we’ll need redundant systems in order to do maintenance).
To upgrade this software, I have to have an awful lot of confidence in the code changes made. Sometimes that confidence varies.
What to do to gain confidence:
Having a set of rules to follow is a fundamental requirement of good software engineering. I’m not going to discuss what the process should be, but you should have one that works well.
Why is this important?
Programmers like order. We like well-defined problems where we can see the end from the beginning. We don’t like haziness, indeterminism, or too many choices.
A process nails down the unknowns–it tells you very specifically what the next thing to do is. A good process leaves no room for doubt.
A good development, testing, and deployment process is the first step to building confidence in what you’re doing.
For my messaging system example, here’s a short summary of what our upgrade process is. We didn’t just come up with it–it evolved as our business grew and the requirements grew with it.
At any point, I know where we are in the process and what needs to be done next. Sure, there may be details within these steps that require thought and creativity, but the process guides it all and makes us more confident that we’re not performing ad-hoc operations.
There are other types of testing, but it all starts at the unit level, with simple tests that exercise your code line by line, function by function, feature by feature. I recently wrote a few thoughts about unit testing. Unit testing is where you can see the overall wellness of your code–you want that green bar!
Without unit testing, how do you know the code you’re writing is doing what it should? Do you just run it and push it through its paces? This is highly inefficient for most types of code. You’ll run out of steam before you start getting close to the edge cases.
The fact is that automated unit tests are a baseline for confidence in your code. You need to be able to demonstrate time and again that your code performs well.
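What does that look like in practice? Here’s a minimal NUnit-style sketch against the Picture class from earlier (it assumes the default constructor and the lazy Photo getter shown above):

using NUnit.Framework;

[TestFixture]
public class PictureTests
{
    [Test]
    public void Photo_IsNull_WhenNoPathIsSet()
    {
        // With no path set, the lazy loader should never touch the disk.
        Picture picture = new Picture();
        Assert.IsNull(picture.Photo);
    }
}

Small, fast, and repeatable: run it a thousand times and it gives the same answer, which is exactly the point.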
This all presupposes that you are writing good unit tests. If you’re not sure, start studying. I don’t buy the arguments about lulling developers into a false sense of security–sure, that can happen, but having good developers who understand this is a prerequisite.
If you’re not unit testing–what is your basis for confidence in your code?
Code coverage goes hand-in-hand with unit testing as a good way to automatically discover what areas of your program are in need of more testing. I’ve found that one of the biggest barriers to unit testing a large C++ application we have is that we have no way of easily measuring test coverage. If we had time, we could do the analysis ourselves, or we could spend a lot of money on a C++ instrumentation profiler, but in my experience these are slow and very tedious to use.
In .NET, use the tools to your advantage.
The psychological benefits of seeing 75-, 90-, 95-, even 100-percent coverage are immense. You know that every line of the program has at least been touched.
Of course, most code coverage tools analyze line coverage, not path coverage. Combine complexity analysis with code coverage to determine which functionality should probably have better testing. There are plenty of free and commercial tools that will give you cyclomatic complexity, among other metrics.
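To see why line coverage alone can mislead, consider a hypothetical method like this one (Customer is made up for the example): three decision points give it a cyclomatic complexity of 4, yet a single test with every condition true will touch every line.

public decimal CalculateDiscount(Customer customer)
{
    decimal discount = 0;
    if (customer.IsPreferred)
        discount += 0.05m;   // branch 1
    if (customer.OrderCount > 100)
        discount += 0.05m;   // branch 2
    if (customer.Balance < 0)
        discount = 0;        // branch 3: delinquent accounts get nothing
    return discount;
}

That’s 100-percent line coverage from one test, with most of the interesting paths still untested.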
Use other analysis tools like FxCop to make sure your other ducks are in a row. It can find easy-to-overlook problems like not validating arguments of public methods, which can then lead to more unit tests and more coverage to achieve.
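For instance, the guard clause below is the sort of thing FxCop will flag as missing on a public method; once it’s there, it’s another branch for your unit tests (and your coverage numbers) to exercise. ProcessOrder and Order are made-up names:

public void ProcessOrder(Order order)
{
    if (order == null)
        throw new ArgumentNullException("order");

    // ...the real work goes here...
}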
Take yourself out of the equation as much as possible. The point of a process is to be repeatable–it’s like automating yourself. Not only should unit testing be automated (thankfully, most testing frameworks handle this easily), but so should coverage and quality analyses.
What about deployment? Automate it. Documentation generation? CD master creation? Web upload? E-mail notification? Automate them all. Production builds should be invoked with a single command.
Working on boring, repeatable code? Automate it with code-gen.
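For example, here’s a toy generator for that kind of repetitive code. The field list and output file are made up, but the idea scales to any boilerplate you can describe with a loop:

using System.IO;

class PropertyGenerator
{
    static void Main()
    {
        string[] fields = { "Name", "Path", "Width", "Height" };
        using (StreamWriter writer = new StreamWriter("GeneratedProperties.cs"))
        {
            foreach (string field in fields)
            {
                string member = "_" + char.ToLower(field[0]) + field.Substring(1);
                writer.WriteLine("private string {0};", member);
                writer.WriteLine("public string {0} {{ get {{ return {1}; }} }}", field, member);
                writer.WriteLine();
            }
        }
    }
}

The generator is boring too, but it only has to be written once, and it never typos.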
The bottom line is: Don’t waste your brain cells on stuff that is highly repeatable, especially when it is prone to mistakes.
Last week, a rather serious bug was discovered in some of our software (not released yet, thankfully, but close). The bug was mine, and I knew exactly what the problem was, but instead of designing a solution by myself, I brought a co-worker into the discussion just to bounce ideas off of. He had great suggestions, and made me think of things I might not necessarily have thought of on my own. We both went over the code and came to a solution that was simple and acceptable to both of us. The confidence level was much higher with this than it would have been otherwise.
This story is repeated daily by programmers throughout the world. Code review is a practice based on the simple notion that there is no one person smart enough to get it correct the first time.
Even if you’re working alone, which I often do, it pays huge dividends to regularly review your code with an eye for finding trouble. If you see any weakness at all, don’t ignore it–fix it. If you’re reviewing your own code, it’s a good idea to wait a bit after the time you wrote it. This gives your brain a chance to forget a little bit about it. Then, if you find you can’t understand it anymore, it’s either too complicated, or (if it fundamentally really is complicated) you need better comments.
Reviewing with other people has more benefit, however. Not everybody thinks the same way about problems. People have different experience, different expertise and focus, and you can’t take advantage of that if you don’t let them teach you. Even if the other people have less expertise than you, it is still beneficial (assuming they have some basic competency that they can bring to the discussion).
Once you let other people tear into your code (nicely, I hope), your confidence can be higher because you can add the confidence other people have in it (once your problems are corrected, of course!).
In the end, one of the best ways to increase your confidence in yourself, your code, and your practices is to have the evidence of repeated experiences behind you. You’re always learning, and that learning contributes to improvements in processes, testing, and your personal coding practices. Once you learn what works, especially during tricky upgrades, you can go into the next trial with increased confidence that you’re doing something right.
Have any other ideas on increasing confidence? Leave them in the comments!
Getting a weird COM Exception with the cryptic ID 0x8055001E?
We’ve been struggling with this problem for over a year now, and we finally have a solution.
We have some critical code that is contacting Exchange server via COM Interop and CDOEX.DLL to read some inboxes and process e-mails. About once a month or so, we get this error:
System.Runtime.InteropServices.COMException (0x8055001E): Unexpected store error: %1!d! (0x%1!8.8x!)
   at ADODB.RecordsetClass.Open(Object Source, Object ActiveConnection, CursorTypeEnum CursorType, LockTypeEnum LockType,
   at MessageService.Exchange.ExchangeClient.Connect(String folderUrl, String userId, String password, Boolean useHttp)
After this point, restarting our software does not help. The only recourse is to restart the Exchange store completely. Did I mention that our software needs to run 24/7/365 with no downtime (a few minutes here and there are acceptable)?
So about once a month, I get a message on my phone, I log into the server, reboot Exchange, and all is well.
Searching on Google revealed nothing at all. Until recently.
I now believe the problem was we were checking two e-mail accounts back-to-back, in a loop like this (highly simplified):
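(This is a reconstruction of the shape of it: Connect matches the signature in the stack trace above, while Disconnect, ProcessInbox, and the Account fields are stand-ins.)

foreach (Account account in accounts)
{
    client.Connect(account.FolderUrl, account.UserId, account.Password, false);
    ProcessInbox(client);
    client.Disconnect();
    // ...and immediately connect to the next account, with no pause at all.
}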
Apparently, there is some bug in the CDO COM components’ code that will cause errors if you reconnect too fast. Occasionally, the Exchange code must have completed so quickly that it didn’t provide enough time for the COM components to clean up properly before the next connection attempt. Solution?
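Give the COM layer breathing room between connections. In sketch form (same stand-in names as above, and the five-second pause is illustrative, not a magic number):

foreach (Account account in accounts)
{
    client.Connect(account.FolderUrl, account.UserId, account.Password, false);
    ProcessInbox(client);
    client.Disconnect();
    Thread.Sleep(5000); // let the CDO COM objects finish cleaning up
}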
We implemented that change on a staging server that was also experiencing this problem and haven’t had a single reoccurrence since. The fix will be going into production very soon. No more 2AM alerts!
I’ve been meaning to write about this software for a while. When I started my current job, all software development was done by an outside contractor. I quickly took over, and that necessitated implementing a lot of tools and procedures to handle our large C++ and C# code base.
Choosing Subversion for source control was easy–free, open source, better than VSS and CVS.
Bug tracking software was a little harder. There are a lot of packages out there. I eventually decided on a great little package called BugTracker.NET. It’s written by a gentleman named Corey Trager who does it in his spare time. It’s a very simple system, and doesn’t provide a lot of the heavyweight features of more complete packages, but if you’re a small team (like the one I’m in), it could be perfect. I really appreciate Corey’s website, because he acknowledges that it’s not written with every scenario in mind. In fact, he even publicizes comparisons of his system with other popular tracking systems out there.
That said, there is a good degree of customizability in it, and it really was easy to set up, upgrade, configure, and customize.
Some of the features:
Suitable for tracking helpdesk customer support tickets as well as software bugs.
Sending and receiving emails is integrated with the tracker, so that the email thread about a bug is tracked WITH the bug.
Allows incoming emails to be recorded as bugs. So, for example, an email from your customer could automatically be turned into a bug/ticket in the tracker.
Allows you to attach files and screenshots to bugs. There is even a custom screen capture utility that lets you take a screenshot, annotate it, and post it as a bug with just a few clicks. (Inspired by FogBugz.)
Add your own custom fields.
Custom bug lists, filtered and sorted the way you want, with the columns that you want.
You can display bugs of a certain priority and/or status in a different color, so that the most important items grab your attention.
Configure different user roles to see different lists of bugs. For example, a developer might see a list of open bugs. A QA analyst might want to see a list of bugs ready for testing.
Like I said, if you’re a small team that just needs to coordinate on issues, this platform could be perfect.
(BTW, this is not a sponsored post–I just want to point out some software that I like).
A colleague at work recently got a second video card–a bottom of the barrel (or close to it) nVidia MX 4000 (PCI). He had an existing AGP nVidia Vanta. Well…the installation did not go well. It did something to Windows so that it consistently blue-screened during the driver load process (the progress bar moving in the startup splash screen).
Windows would start in safe mode, but removing the non-working drivers for the new card did not work. Removing both drivers did not work. Choosing last-known good configuration got us up and running in Windows (finally), but with only the bare VGA driver. Installing a driver from either CD or nVidia’s site ended in the strange error “Access Denied.”
Then I remembered what I had read in Windows Internals about the location of driver configuration information in the registry. Driver info is stored with service configuration in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services.
First we removed all hints of nVidia apps and video drivers using Add/Remove Programs. Then we went into regedit, into the above key, and deleted the keys “nv”, “nv4”, and “nvsvc” (I think it was those, but looking on my own machine at home, they’re a bit different, so I’m half-guessing). I’m sure there are similar keys for ATI chips.
In the meantime, we had found an unused AGP version of the MX 4000 just lying around (no joke), and replaced the Vanta with this. We reinstalled the drivers and everything worked great.
Much is being made lately about vulnerabilities in Mac OS X, with some people haughtily denigrating the Mac while others pooh-pooh the results with bad logic.
All of the ridiculous claims of “My OS is [better | more secure | safer] than your OS” are getting old. All these problems really do is show us, once again, that there really is no silver bullet in software design.
Yesterday was a golden day. Everything I touched turned to gold. I solved all the problems that came up, fixed bugs right and left, and even figured out the root cause of a bug that’s been plaguing us for a month or so.
Some days are like that. I like days like this, because I feel like I’m on top of the world and that on the one hand, I’m not getting paid enough, but on the other it’s so fun I’d do it for free! (If any of my bosses are reading this, concentrate on the first part of that! 😉)
Today was merely a silver day. Thankfully, nothing went wrong, and I did quite a bit of good stuff. Not quite golden. Maybe I should have a calendar and put gold and silver stars on the days. That’s probably a bit much. But I could have rust-covered frown-faces for those unspeakable days.