Tag Archives: performance

Don’t Log Exception Stacks Unless You Can Afford It

A short, simple tip for this week: Don’t log exception stacks in managed applications unless you understand the performance of your system.

If you have a standard desktop client, or some other app where you can tolerate many-millisecond disruptions or a spike in CPU usage, then you don’t have to care about this.

If you are building a high-performance managed system, then you definitely do have to care about this.

The system I work on needs to handle exceptions coming from 3rd-party components. We can’t let the exceptions kill the process, and we can’t swallow them and ignore the component failure, so we need to log them. The question is—what information do we log?

For managed exceptions, three pieces of information are generally the most useful: the type of the exception, the Message, and the StackTrace.

Getting the type and the Message is nearly free, but accessing StackTrace or calling ToString() on the exception object will cause a bunch of reflection to happen to build up a user-friendly stack trace string. If you can do without it, go for it. It may be possible to augment the Message property of an exception to give some clues to the problem. Usually, however, this will not be possible.

Since getting the stack trace for an exception is relatively expensive, especially in a program that shouldn’t miss a beat, many of the managed applications I work on have a configuration setting to turn off stack trace logging for exceptions. This lets us either run with it turned on until we see a performance problem, or keep it off until there is a reproducible problem, at which point we can turn it on selectively.
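Here’s a minimal sketch of what that looks like; the component.DoWork call, the Config.LogExceptionStacks flag, and the Log.Error method are hypothetical stand-ins for whatever components, configuration, and logging your system uses:

try
{
    component.DoWork();
}
catch (Exception ex)
{
    // Type and Message are nearly free to retrieve.
    string message = string.Format("{0}: {1}", ex.GetType().Name, ex.Message);

    // StackTrace is the expensive part--only pay for it when configured to.
    if (Config.LogExceptionStacks)
    {
        message += Environment.NewLine + ex.StackTrace;
    }

    Log.Error(message);
}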

Like this tip? Check out my book, C# 4.0 How-To, where you’ll find hundreds of great tips like this one.

Activator.CreateInstance: Slow vs. Less-Slow

In the system I’m working on, there is a lot of runtime type resolution, e.g., lines like this:

var obj = Activator.CreateInstance(type, param1, param2, param3);

The params are arguments to the constructor of the type. The type itself is ultimately given through some configuration or otherwise-dynamic means.
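For illustration, the resolution might look something like this (the "HandlerType" setting name is hypothetical):

// Read the type name from configuration (System.Configuration).
string typeName = ConfigurationManager.AppSettings["HandlerType"];
Type type = Type.GetType(typeName, true); // throw if the type can't be found

var obj = Activator.CreateInstance(type, param1, param2, param3);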

Since we do a lot of this (many times per second), we’d like to ensure that instance creation is as fast as it can be. The method given above is not fast.

When this overload of CreateInstance is called, .Net has to do a lot of work to satisfy it: find the right constructor, verify that all the argument types match up, etc. This can be painfully slow, especially when, in a closed system, you can do a lot of offline validation to ensure that types are correctly specified.

Fortunately, Activator has another overload that is quite a bit faster, if you can use it.

Activator.CreateInstance(type)

This will call the default constructor of the type. However, we still have the problem of passing in those arguments. There are a few ways we could do this, but one of the simplest is to ensure that all the types you’re doing this for implement a common initialization interface, as in this sample program:

using System;
using System.Diagnostics;

interface IPerson
{
    void Initialize(string name, DateTime dob, int weight);
}

class Employee : IPerson
{
    public Employee()
    {
    }

    public Employee(string name, DateTime dob, int weight)
    {
        Initialize(name, dob, weight);
    }

    public Employee(string name, DateTime dob)
        : this(name, dob, -1)
    {
    }

    public void Initialize(string name, DateTime dob, int weight)
    {
        this.name = name;
        this.dob = dob;
        this.weight = weight;
    }

    private string name;
    private DateTime dob;
    private int weight;
}

class Program
{
    static void Main(string[] args)
    {
        const int iterations = 1000000;

        Console.WriteLine("Timing Activator.CreateInstance(type, params)...");
        var employees = new IPerson[iterations];

        var stopwatch = Stopwatch.StartNew();

        for (int i = 0; i < iterations; ++i)
        {
            employees[i] = Activator.CreateInstance(typeof(Employee), 
                "Bob", DateTime.Now, 220) as IPerson;
        }

        stopwatch.Stop();

        Console.WriteLine("Total time: {0} for {1:N0} employees", 
            stopwatch.Elapsed.ToString(), iterations);
        Console.WriteLine();

        Console.WriteLine("Timing Activator.CreateInstance(type) + Initialization...");
        
        var employees2 = new IPerson[iterations];

        var stopwatch2 = Stopwatch.StartNew();            

        for (int i = 0; i < iterations; ++i)
        {
            employees2[i] = Activator.CreateInstance(typeof(Employee)) as IPerson;
            employees2[i].Initialize("Bob", DateTime.Now, 220);
        }

        stopwatch2.Stop();

        Console.WriteLine("Total time: {0} for {1:N0} employees", 
            stopwatch2.Elapsed.ToString(), iterations);
    }
}

What do you think the time difference in the two approaches is?

Timing Activator.CreateInstance(type, params)…

Total time: 00:00:05.4765664 for 1,000,000 employees

Timing Activator.CreateInstance(type) + Initialization…

Total time: 00:00:01.6007687 for 1,000,000 employees

About 3.5x faster. Pretty good. In practice, the improvement will vary. In some of our own timing experiments, we actually saw a much better speedup.

Like this tip? Check out my book, C# 4.0 How-To, where you’ll find hundreds of great tips like this one.

Performance and Other Issues with System.Enum

I’ve been doing a lot of very intense .Net coding at work for the last few months and it’s very gratifying to see that .Net can handle some pretty serious high-load scalable server work. We are doing some pretty amazing things. However, there are all sorts of gotchas—you do have to learn how to do it correctly (but that’s true of any technology).

One of the things we’re most concerned about right now is performance, and where performance is concerned, one of the big gotchas is the .Net Framework Class Library. Don’t get me wrong, for 99.99% of all .Net projects, the performance is perfectly adequate and you don’t have to care (much). On the other hand, if you’re dealing with highly-scalable systems, you should definitely validate the performance of any general-purpose library you use.

As with all performance questions, the only 100% solid advice that applies in all situations is: measure, measure, measure.

General Purpose Methods May Not Be Fast

I did a profiler run on one of our components the other day, and two methods showed up that I would never expect to see in a profiler trace: Enum.IsDefined and Enum.HasFlag.

We were calling these methods a lot. In my mind, they both basically boiled down to logical AND (&) statements. Unfortunately, that’s not the case.

If you use an IL decompiler like ILSpy to examine the code of these methods, you’ll see they do quite a bit more. For example, HasFlag also checks whether the types are equal—useful, but that’s a lot more expensive than a simple & check.

IsDefined is even worse. There are half a dozen checks for specific types, followed by an allocation of an array containing all the enum values, and finally a binary search. If the value you pass is a string, it’s just a linear search with string comparisons.

It’s easy to understand why these methods are so heavy—they’re being extra cautious for correctness’ sake. That’s probably what you want in a general-purpose framework.

However, if your goal is performance and you have a closed system, complete control of your code, and adequate tests, then you can make better decisions.

Rather than do this:

if (bookFlags.HasFlag(BookTypes.Fiction))

Do this instead:

if ((bookFlags & BookTypes.Fiction) != 0)


As for IsDefined, just don’t use it. Or use it in debug-only assertions to validate functionality.
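For example, a debug-only assertion costs nothing in release builds because Debug.Assert is compiled out; here’s a sketch using the built-in DayOfWeek enum and a hypothetical day variable:

// Compiled out of release builds, so IsDefined's cost is debug-only.
Debug.Assert(Enum.IsDefined(typeof(DayOfWeek), day),
    "Unexpected DayOfWeek value: " + day);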

Another Enum method to avoid if you can is ToString(), especially on enumerations that have [Flags].
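If you do need enum names on a hot path, one workaround is to pay the reflection cost once and cache the names; a sketch, reusing the BookTypes enum from above:

// Pay Enum.ToString()'s reflection cost once, up front.
private static readonly Dictionary<BookTypes, string> BookTypeNames =
    new Dictionary<BookTypes, string>();

static void BuildNameCache()
{
    foreach (BookTypes value in Enum.GetValues(typeof(BookTypes)))
    {
        BookTypeNames[value] = value.ToString();
    }
}

Note that this only covers individually defined values; arbitrary flag combinations would still need special handling.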

HasFlag Gotcha

Suppose you have this code:

[Flags]
enum NodeFlags
{
    Exception = 1,
    Failure = 2,
    Canceled = 4,
    TimedOut = 8,
 
    AnyFailure = Exception | Canceled | TimedOut
}
 
if (node.Flags.HasFlag(NodeFlags.Failure)
    || node.Flags.HasFlag(NodeFlags.TimedOut)
    || node.Flags.HasFlag(NodeFlags.Exception))

You may be tempted to do this:

if (node.Flags.HasFlag(NodeFlags.AnyFailure))

This will not work. It will check that ALL of the bits are set, not just one of them.

You will need to do this:

if ((node.Flags & NodeFlags.AnyFailure) != 0)
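If the raw bit test reads poorly, thin extension methods keep the intent clear without giving up the cheap & check; a sketch:

static class NodeFlagsExtensions
{
    // True if ANY of the test bits are set.
    public static bool HasAnyFlag(this NodeFlags flags, NodeFlags test)
    {
        return (flags & test) != 0;
    }

    // True only if ALL of the test bits are set (HasFlag's behavior).
    public static bool HasAllFlags(this NodeFlags flags, NodeFlags test)
    {
        return (flags & test) == test;
    }
}

// The failure check then reads naturally:
if (node.Flags.HasAnyFlag(NodeFlags.AnyFailure))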

We saw that around 5% of our CPU time was being spent just in Enum.IsDefined and Enum.HasFlag. With these changes, we reduced that to approximately zero.

If you find tips like these helpful, you’ll love my book, C# 4.0 How-To.

Never make assumptions about performance

The importance of measuring performance changes is a topic that has been covered by others smarter and more experienced than me, but I have a recent simple tale.

I’ve simplified the code quite a bit in order to demonstrate the issue. Suppose I have a wrapper around an image (it has many more attributes):

class Picture
{
    Image _image;
    string _path;

    public Image Photo
    {
        get
        {
            if (_image == null && !string.IsNullOrEmpty(_path))
            {
                _image = Bitmap.FromFile(_path);
            }
            return _image;
        }
    }
}

I had this, and a view that loaded about 2,700 of these into a customized ListView control at program startup. On a cold start (where none of the pictures were in the disk’s cache), it would take 27 seconds. Unacceptable.

What to do?

My first thought was to load the pictures asynchronously. I wrapped Bitmap.FromFile() into a function and called it asynchronously. When it was done, it fired an event that percolated up to the top.

Well, I spent about 30 minutes implementing that and ran it: horrible. The list showed up immediately, but it was unusable. The problem? Dumping 2,700 items into the ThreadPool queue. It doesn’t create 2,700 threads, but it causes enough problems that it isn’t a viable option.

Asynchronicity is still the answer, though. But it’s at a different level. Instead of loading the individual images asynchronously, I skipped loading the images when creating the list control and instead launched a thread to load all the images and update them when done. The list loads in under a second, and the pictures show up little by little after that.
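A rough sketch of that final approach; the _pictures collection and _listView control are hypothetical names:

private void LoadImagesInBackground()
{
    // One background thread loads every image, instead of one
    // ThreadPool work item per image.
    Thread loader = new Thread(delegate()
    {
        foreach (Picture picture in _pictures)
        {
            // Touching Photo triggers the lazy load off the UI thread.
            Image image = picture.Photo;

            // Marshal the update back to the UI thread. (In practice,
            // you'd batch these rather than invalidate per picture.)
            _listView.BeginInvoke(new MethodInvoker(_listView.Invalidate));
        }
    });
    loader.IsBackground = true;
    loader.Start();
}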

Measure, measure, measure. And pay attention.


Tip: Mouse back and forward work in Visual Studio 2005 too

You know how you can use the extra mouse buttons to move back and forward in Internet Explorer?

The same shortcuts work in Visual Studio. Suppose you right-click on a function call, and select Go To Definition. Once you’re done looking at the definition, hit the back button on your mouse: You’re taken right back to where you came from!

(The keyboard equivalents are Ctrl+- to navigate backward and Ctrl+Shift+- to navigate forward.)


Instant Searching and Filtering in .Net – Part 3

This is part three of my series on fast searching and filtering of text using C#.

The previous article developed an indexing method using a hash table. This article develops a method using a trie structure. If you don’t know tries, I highly encourage you to go read about them before continuing.

This filtering method is much more complex than previous versions, but we’ll take it one step at a time.

Trie Overview

Our data structure is more complex than a hash table, and definitely more involved than a simple list. We start with a data structure called a trie that contains a list of results, and links to the next nodes, indexed by letter. The key represented by a trie is determined by the path to that trie from the root. The root represents all keys and values (or none, depending on how you look at it). The trie is built up as we index items, beginning with an empty root. An illustration would be helpful about now:

[Figure: TrieStructure]

This diagram shows the top portion of an example trie structure. The root node has no results (since there are no values in the path to that node). The _next variable indexes the next links in the chain. Here, there is only one link–to ‘h’. The node with the value “h” has two further links–to ‘e’ and ‘a’.

So far, so good. However, in our implementation, we limit the depth of this tree to three so the results will potentially contain many entries. For example, if we followed the ‘l’ link in the “he” node, we would get to a node with the value “hel”. If we then indexed the word “hello”, the same node would then contain “hel” and “hello” because we’ve stopped the tree growth here. You can experiment with different values, but I found limited value beyond 3.

Adding Overview

To add a new item, we need to get substrings of the key, as usual; but unlike before, we don’t need to retrieve all substrings, just the longest substrings that aren’t the initial portions of a substring already found. Clear?

No? Here’s an example:

With the previous indexer (using the hash table), the substrings of “hello” would be:

h, he, hel, e, el, ell, l, ll, llo, lo, o

Now, think about our trie structure. Is there any reason to consider ‘h’ if we’re going to consider ‘he’ anyway? Why not just store our value under the results for “he”, and then if we filter on ‘h’, just return the unified results of the ‘h’ node and every subnode under it? We’ve just cut our memory requirements substantially.

For a trie, we only need substrings up to our tree height in length (or shorter if a longer one doesn’t exist–i.e., at the end of keys). If our maximum key length/tree height is 3, then the substrings required for “hello” are now just:

hel, ell, llo, lo, o

Nice.

Once we have these substrings, we can generate the trie, inserting our value (and full key) into the results of the bottom-most trie node we can reach, creating new trie nodes as necessary.

Looking up Overview

To find all values pertaining to a filter text, we start with the root node, and look up the filter text character by character, traversing the trie. If we run out of nodes, then there are no results. If we run out of characters, the results will include all the results in the current node as well as the results in subnodes.*

(*FYI: if we stored all the results in all applicable nodes, i.e., if we stored the value “Hello” in the nodes “hel”, “he” and “h”, we could speed up searches, BUT we’d use a lot more memory…and it would be exactly equivalent to the hash table implementation–we wouldn’t gain anything.)

Data Structures

With that overview out of the way, let’s cover the basic data structures we’ll need to pull this off.

The first is something we’re familiar with, the IndexStruct–which is actually a class like in the previous article:

private class IndexStruct
{
    public UInt32 sortOrder;
    public string key;
    public T val;

    public IndexStruct(string key, T val, UInt32 sortOrder)
    {
        this.key = key;
        this.val = val;
        this.sortOrder = sortOrder;
    }
};

This time, the sort order will be important as we’ll see below.

Trie

The next data structure we need is a trie. Technically, a trie is the whole tree, but since you can define a tree as a node with subtrees, it works to define a node too.

Here are the fields and properties we’ll need for the Trie class:

private class Trie
{
    private char _val;
    private Dictionary<char, Trie> _next = new Dictionary<char, Trie>();
    private IList<IndexStruct> _results;

    #region Properties
    public char Value
    {
        get { return _val; }
        set { _val = value; }
    }

    public IList<IndexStruct> Results
    {
        get { return _results; }
    }

    public Dictionary<char, Trie> Next
    {
        get { return _next; }
    }
    #endregion

    //...methods to follow...
}

The _val field simply refers to the character this trie node represents, though we don’t actually need it. I included it just for completeness. The _next dictionary associates the next character with another Trie object. The _results list stores all the items that need to be stored in the current Trie object. The properties just wrap the fields.

The constructor is simple:

public Trie(char val)
{
    _val = val;
}

We need a function to add a result to the trie’s list of results:

public void AddValue(IndexStruct indexStruct)
{
    if (_results == null)
    {
        _results = new List<IndexStruct>();
    }
    _results.Add(indexStruct);
}

Notice that the node only adds results to itself–it doesn’t try to figure out whether a result belongs in itself or in a sub-node in _next. To do that would require that each node have knowledge about keys as well as its place in the entire tree. This logic is better kept at a higher level.

We also need to associate a new trie node with a character:

public void AddNextTrie(char c, Trie nextTrie)
{
    this.Next[c] = nextTrie;
}

OK, now we need to get into something more complex. We need a function to return all the results in the current trie, plus all the results in subnodes. Also, they should all be sorted. We could just return a huge list and sort it at the highest level with something like quicksort, but I have some additional knowledge about the data: I know that it’s being added in sorted order (since that’s how I define it; if it weren’t, we could presort each node once indexing was done). So let’s assume that each node’s results are already sorted. If that’s the case, we have an ideal setup for a merge sort!

Now, it’s entirely possible that just doing a quicksort at the end would be faster, but in my tests at least, the merging seemed a little faster, especially since the results were mostly sorted, which is a worst-case scenario for a naive quicksort.

So let’s define a helper function that takes two lists of IndexStruct and returns a single merged list:

 

 1: private IList<IndexStruct> Merge(IList<IndexStruct> a, IList<IndexStruct> b)
 2: {
 3:     if (a == null || a.Count == 0)
 4:     {
 5:         return b;
 6:     }
 7:     if (b == null || b.Count == 0)
 8:     {
 9:         return a;
 10:     }
 11:     int iA = 0, iB = 0;
 12:     List<IndexStruct> results = new List<IndexStruct>(a.Count + b.Count);
 13:     while (iA < a.Count || iB < b.Count)
 14:     {
 15:         if (iA < a.Count &&
 16:             (iB == b.Count || a[iA].sortOrder < b[iB].sortOrder))
 17:         {
 18:             results.Add(a[iA]);
 19:             iA++;
 20:         }
 21:         else if (iA < a.Count && iB < b.Count &&
 22:             a[iA].sortOrder == b[iB].sortOrder)
 23:         {
 24:             //if they're equal, make sure we skip the other
 25:             //one so we don't add it later
 26:             results.Add(a[iA]);
 27:             iA++;
 28:             iB++;
 29:         }
 30:         else
 31:         {
 32:             results.Add(b[iB]);
 33:             iB++;
 34:         }
 35:     }
 36:     return results;
 37: }

This function interweaves the two lists into a single list by always grabbing the smallest item from the next positions in the two lists. Look at lines 21-28. This is critical. We need to be on the lookout for identical sort orders, which indicate the same value. We don’t want to include those twice. This situation comes up when our value is “hi-ho” and we filter on “h”–results for both “hi” and “ho” will appear, and we don’t need both.

OK, so we have a generic merge function. Now let’s make it work on the Trie:

 

public IList<IndexStruct> GetCombinedResults()
{
    Queue<IList<IndexStruct>> queue = new Queue<IList<IndexStruct>>();
    //first enqueue items in this node
    if (_results != null && _results.Count > 0)
    {
        queue.Enqueue(_results);
    }

    //get items from sub-nodes
    foreach (Trie t in _next.Values)
    {
        IList<IndexStruct> r = t.GetCombinedResults();
        queue.Enqueue(r);
    }

    //merge all items together
    while (queue.Count > 1)
    {
        IList<IndexStruct> a = queue.Dequeue();
        IList<IndexStruct> b = queue.Dequeue();
        queue.Enqueue(Merge(a, b));
    }
    return queue.Dequeue();
}

 

We maintain the list of items to merge together with a queue. We first enqueue the items in this node, then we call GetCombinedResults() for all subnodes, and enqueue those results. Finally, we merge the queued lists together, enqueueing the result, until a single list is formed.

Phew! OK, now let’s look at the rest of the indexer.

Some Helper Methods

RemoveUnneededCharacters is the same as before:

 

private string RemoveUnneededCharacters(string original)
{
    char[] array = new char[original.Length];
    int destIndex = 0;
    for (int i = 0; i < original.Length; i++)
    {
        char c = original[i];
        if (char.IsLetterOrDigit(c))
        {
            array[destIndex] = c;
            destIndex++;
        }
    }
    return new string(array, 0, destIndex);
}

We talked about our new version of GetSubStrings() above, and here it is:

 

private List<string> GetSubStrings(string key)
{
    List<string> results = new List<string>();
    //we only need to return substrings that
    //themselves don't begin other substrings
    //easy to do--return first _maxKeyLength characters
    //or whatever's left, if shorter
    for (int start = 0; start < key.Length; start++)
    {
        int len = Math.Min(_maxKeyLength, key.Length - start);
        string sub = key.Substring(start, len);
        //remove this if() to speed up index creation
        //at the cost of slightly longer lookup time
        if (key.IndexOf(sub) == start)
        {
            results.Add(sub);
        }
    }
    return results;
}

As you can see, it’s mostly comments. Before we add a substring, we make sure it’s the first occurrence of that particular sequence of characters in the string (“don’t begin other substrings”). This prevents us from indexing “he” twice in the word “hehe”.

TrieIndexer

Now, to the real stuff! Actually, most of our work is done for us. So let’s declare our indexer:

class TrieIndexer<T> : IIndexer<T>
{
    private Trie _rootTrie = new Trie('\0'); //root node holds the null character
    private int _maxKeyLength = 3;

    public TrieIndexer(int maxKeyLength)
    {
        _maxKeyLength = maxKeyLength;
    }
    //..methods to follow...
}

We have our root trie node with a null character, and the constructor takes the maximum key length (which corresponds directly to the height of the trie).

AddItem

 

 1: public void AddItem(string key, T value, UInt32 sortOrder)
 2: {
 3:     string toAdd = key.ToLower();
 4:
 5:     toAdd = RemoveUnneededCharacters(toAdd);
 6:
 7:     IndexStruct indexStruct = new IndexStruct(toAdd, value, sortOrder);
 8:
 9:     List<string> subStrings = GetSubStrings(toAdd);
 10:
 11:     foreach (string ss in subStrings)
 12:     {
 13:         Trie currentTrie = _rootTrie;
 14:         for (int i = 0; i < ss.Length; i++)
 15:         {
 16:             char c = ss[i];
 17:             Trie nextTrie = null;
 18:             if (!currentTrie.Next.TryGetValue(c, out nextTrie))
 19:             {
 20:                 nextTrie = new Trie(c);
 21:                 currentTrie.AddNextTrie(c, nextTrie);
 22:             }
 23:             currentTrie = nextTrie;
 24:         }
 25:         currentTrie.AddValue(indexStruct);
 26:     }
 27: }

Lines 3-9 are the typical preprocessing of the key: normalizing it and creating a structure for it and the value to live in. The fun stuff happens in lines 11-26. With each substring, we begin at the root and branch out node-by-node, character-by-character, to find the bottom-most trie node in which to place our indexed value. If the next node doesn’t exist, we create it. Once we get to the bottom of the tree for this subkey, we add our value to that node.

Lookup

Now let’s turn our attention to the filtering part. It’s almost like adding new values:

 

public IList<T> Lookup(string filterText)
{
    Trie currentTrie = _rootTrie;

    int maxTrieLength = Math.Min(filterText.Length, _maxKeyLength);
    for (int i = 0; i < maxTrieLength; i++)
    {
        Trie nextTrie = null;
        if (currentTrie.Next.TryGetValue(filterText[i], out nextTrie))
        {
            currentTrie = nextTrie;
        }
        else
        {
            //no results
            return new List<T>();
        }
    }
    Debug.Assert(currentTrie != null);

    return GetResults(filterText, currentTrie);
}

We set our current node to the root. We then figure out the length of the path we need to traverse. If our filter text is longer than our maximum key length, we only want to go as far as the key length (and vice versa).

We do the same type of lookups as in AddItem, but this time if the next node isn’t present, we just return an empty list–there were no results for that filter text.

Once we find the target node, we call another function to actually compile the results for us:

 

 1: private IList<T> GetResults(string filterText, Trie trie)
 2: {
 3:     List<T> results = new List<T>();
 4:
 5:     IList<IndexStruct> preResults = trie.GetCombinedResults();
 6:     if (preResults.Count <= 0)
 7:     {
 8:         return results;
 9:     }
 10:     results.Capacity = preResults.Count;
 11:     uint prevSortOrder = 0;
 12:     foreach (IndexStruct item in preResults)
 13:     {
 14:         Debug.Assert(item.sortOrder > prevSortOrder);
 15:         prevSortOrder = item.sortOrder;
 16:
 17:         if (filterText.Length <= _maxKeyLength ||
 18:             item.key.IndexOf(filterText,
 19:                 StringComparison.InvariantCultureIgnoreCase) >= 0)
 20:         {
 21:             results.Add(item.val);
 22:         }
 23:     }
 24:     return results;
 25: }

In line 5, we call GetCombinedResults() and store the result in the variable preResults. Why preResults? Why aren’t they final? For one, they are the entire IndexStruct objects, and we still need to extract the values. Secondly, it’s possible the user entered a longer filter text than we indexed, so we still have to do a linear walk of the list and do string searches to make sure each preResult is really valid. Thankfully, we can avoid this if the filterText is short.

Summary

And now…we’re done! This one was a beast! There are still some optimizations to be found in here, but it’s pretty good. OK, so what about performance of this thing?

Let’s run it on lesmis.txt first:

[Screenshot: testsearch_lesmis_trie benchmark results]

OK, for raw speed it’s actually slower than naive overall. But look closer. Nearly ALL the time penalty is coming from the lookup of “h”. The next results are over ten times faster than naive. Now, let’s compare to the substring method. The trie method is still slower overall, and the search times are comparable (except for the initial search). But look at memory usage: the trie method uses less than HALF what the substring method used. Also, the index creation time is 4 times faster. A mixed bag, but impressive nonetheless.

Now let’s run it on huge.txt:

[Screenshot: testsearch_huge_trie benchmark results]

Ouch, over a second–but again, it’s all because of that initial search for “h”. All the other times are about the same. Comparing to the substring method, it uses almost a third of the memory, and takes a third of the time to create the index.

So what can we say about this method?

Pros:

  • Finding 0 results can be VERY fast (3-4 lookups in hash tables to determine that a short filter text isn’t present).
  • Memory use is much better than the enormous hash tables of the substring method.
  • Index creation time is better than the substring method’s.
  • Searching for filter texts longer than a single character can be very fast.

Cons:

  • Short filter texts are bad: they require combining results from lots of trie nodes.

(Note: in the course of writing this article, I changed my implementation to no longer need the Finish() function, which was part of the IIndexer<T> interface. I know I promised we’d use it, but I don’t need it!)
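To tie it all together, here’s roughly how the indexer gets used; the items are made up (note that the sort orders start at 1, since GetResults asserts they are strictly increasing from 0):

TrieIndexer<string> indexer = new TrieIndexer<string>(3);

indexer.AddItem("Hello World", "Hello World", 1);
indexer.AddItem("Help Wanted", "Help Wanted", 2);
indexer.AddItem("Goodbye", "Goodbye", 3);

IList<string> matches = indexer.Lookup("hel");
// matches now contains "Hello World" and "Help Wanted"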

Download project files

Next time…

So where do we go from here? I have some notes about implementing this paired with a ListView control. Stay tuned for part 4!

 

Instant Searching and Filtering in .Net – Part 2

This is part two of my series on fast searching/filtering of text using C#.

In the previous article, we developed the filtering interface, built up a testing framework and implemented a naive indexer. For many purposes, that indexer performs more than adequately. Still, there are other possible implementations that might work better (or not…let’s wait and see).

SubString Indexer

In this installment, let’s develop something based on hash tables. With O(1) look-up time, they could be the ticket to blazing fast lookups.

We reuse the same internal structure as the NaiveIndexer, except I’ve changed it to a class. It needs to be reused many, many times for the same value, so it’s much more efficient to share the object around instead of making copies:

private class IndexStruct
{
    public string key;
    public T val;

    public IndexStruct(string key, T val)
    {
        this.key = key;
        this.val = val;
    }
};

The fields are thus:

private int _maxKeyLength = 999;
private Dictionary<int, List<IndexStruct>> _hashes;

Notice again that we still don’t have to keep track of sort order in the IndexStruct. This is because, down at the bottom of the data structure, all the data is still stored in List<> objects. I’m still breaking the law of abstraction by tying sort order to the order in which I add items, but…hey, it’s just an example.

Constructor

Let’s take a look at the constructor and figure out what the maximum key length is for, and a few other issues we have to handle:

public SubStringIndexer(int numItems, int maxKeyLength)
{
    _maxKeyLength = maxKeyLength;
    _hashes = new Dictionary<int, List<IndexStruct>>(numItems);
}

As you can see, the constructor takes two arguments.

The number of items is used to initialize the Dictionary<> (a hash table class) to the number of items we can expect. If you know your hash table theory, you know that the size of a hash table should ideally be a prime number much larger than the number of items you want to insert. I have not bothered to do that here, and experimentation did not show a significant benefit. Also, the hash table will automatically expand itself anyway past a certain load factor.

The maxKeyLength parameter is crucial. Since this indexer works by calculating substrings, it’s important to specify just how big those substrings can be. There’s a tradeoff here. The longer the maximum, the more substrings can be precalculated, and the faster the searches will be. However, you pay an enormous price in memory usage. We’ll see that price below when we run this example. I’ve chosen 3 as a fairly good balance between speed and space.

Helper Functions

Before we override our interface methods, let’s define some helper functions we’ll need. The first is something we’re familiar with:

private string RemoveUnneededCharacters(string original)
{
    char[] array = new char[original.Length];
    int destIndex = 0;
    for (int i = 0; i < original.Length; i++)
    {
        char c = original[i];
        if (char.IsLetterOrDigit(c))
        {
            array[destIndex] = c;
            destIndex++;
        }
    }
    return new string(array, 0, destIndex);
}

Just as with the naive indexer (in fact, with all the indexers), we need to strip out unimportant characters.

One of the most important functions we’ll need is something to generate substrings given a key.

private List<string> GetSubStrings(string key)
{
    List<string> results = new List<string>();

    for (int start = 0; start < key.Length; start++)
    {
        /* get maximum length of substring based on current
         * character position (constrain it to within the string
         * and less than or equal to the maximum key length specified)
         */
        int lastLength = Math.Min(key.Length - start, _maxKeyLength);

        /* Get each substring from length 1 to lastLength
         */
        for (int length = 1; length <= lastLength; length++)
        {
            string sub = key.Substring(start, length);
            if (!results.Contains(sub))
            {
                results.Add(sub);
            }
        }
    }
    return results;
}

GetSubStrings returns a list of all substrings of length 1 to _maxKeyLength. If you think about it, you can see why limiting this to a small number is a good idea. If you have thousands of different keys, each of a fairly sizable length, you will generate thousands and thousands of unique substrings, not to mention how long generating them will take (a very long time).
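For example, with a maximum key length of 3, a 10-character key generates up to 27 substrings (3 from each of the first 8 starting positions, then 2 and 1 from the last two); with a maximum of 10, the same key would generate up to 55.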

Now let’s look at how this indexer works.

Overview

Here’s the way it works. Each substring of a key is converted to a hash number, which is the index into the hash table. The hash of the substring is used instead of the substring itself to avoid storing the substrings in the hash table’s list of keys–just for memory reasons.

The value of each slot in the table is a list of items. Lookup works by first narrowing down the list by doing a hash lookup on the filter text, then doing a linear search through all the items returned from the hash table. We’ll see the details below.

AddItem

public void AddItem(string key, T value, UInt32 sortOrder)
{
    string toAdd = key.ToLower();

    toAdd = RemoveUnneededCharacters(toAdd);

    List<string> subStrings = GetSubStrings(toAdd);
    IndexStruct indexStruct = new IndexStruct(toAdd, value);
    foreach (string str in subStrings)
    {
        List<IndexStruct> items = null;
        int hash = str.GetHashCode();

        bool alreadyExists = _hashes.TryGetValue(hash, out items);
        if (!alreadyExists)
        {
            items = new List<IndexStruct>();
            _hashes[hash] = items;
        }
        items.Add(indexStruct);
    }
}

After normalizing the key (lower-case, alphanumeric), we get the valid substrings. For each of those, we calculate its hash and try to look it up in our hash table. If it doesn’t exist, we create a new list for that substring. Then we add the new entry to that list.

Lookup

Lookup is a little more complicated, but still straightforward enough.

public IList<T> Lookup(string subKey)
{
    string toLookup = subKey.ToLower();
    List<IndexStruct> items = null;
    List<T> results = new List<T>();
    int hash = 0;

    if (subKey.Length > _maxKeyLength)
    {
        /*
         * If the substring is too long, get the longest substring
         * we've indexed and use that for the initial search
         */
        toLookup = toLookup.Substring(0, _maxKeyLength);
        hash = toLookup.GetHashCode();
    }
    else
    {
        hash = toLookup.GetHashCode();
    }

    bool found = _hashes.TryGetValue(hash, out items);
    if (found)
    {
        results.Capacity = items.Count;
        foreach (IndexStruct s in items)
        {
            /*
             * Have to check each item in this bucket's list
             * because the substring might be longer than the indexed
             * keys
             */
            if (s.key.IndexOf(subKey, StringComparison.InvariantCultureIgnoreCase) >= 0)
            {
                results.Add(s.val);
            }
        }
    }

    return results;
}

We first convert the subKey parameter (what we’re searching on) to lower case to normalize it. If that text is longer than the maximum subkey length we’ve indexed, we trim it down to the maximum size. Then we calculate the hash code and see if it’s in our table. If it isn’t found, there are no results and we return the empty list.

If a list was returned, we still have to go search through the entire list and do string searches to make sure our entire subkey is present in the key before adding it to the result set.

(If we needed to be concerned about sort order, we would do another post-processing step and sort the results list by sortorder.)
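Here’s roughly how this indexer gets used end to end; the capacity and items are made up:

// 1000 is the expected item count; 3 is the maximum substring length.
SubStringIndexer<string> indexer = new SubStringIndexer<string>(1000, 3);

indexer.AddItem("Hello World", "Hello World", 1);
indexer.AddItem("Help Wanted", "Help Wanted", 2);

// "hell" is longer than the indexed substrings, so the lookup hashes
// "hel", then the linear scan filters out "Help Wanted".
IList<string> matches = indexer.Lookup("hell");
// matches contains only "Hello World"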

Testing

So let’s see how this works in practice by running it against the same lesmis.txt file.

[Screenshot: testsearch_capture2 benchmark results]

It’s nearly 3 times faster! But at what cost? It now takes about 8 seconds to create the index in the first place, and it uses 31 MB of memory. Ouch!

But I wonder….what if I run this against truly huge data sets…

I created a million-line file out of various books available from Project Gutenberg. First, let’s run the naive indexer:

[Screenshot: testsearch_huge_naive benchmark results]

Now the naive way is taking over a second to do the search. What about our new substring indexer?

[Screenshot: testsearch_huge_substring benchmark results]

It takes nearly a minute to create the index, but lookups are now almost 5 times faster than the naive version–it definitely scales better. Well, except for that 270 MB index size!

Summary

So, this is an improvement in some ways, but it has some big costs (index creation time, index size). Next time, I’ll show yet another way that has some advantages.

Download sample project.

 

Don’t ignore naive or "stupid" algorithms — hardware is cheap and fast

I just had a nice reality check. Sort of pleasant in that I realized I could save a LOT of memory usage (like from 35MB down to 9 MB), but also aggravating because I have spent probably 10-20 hours developing a clever algorithm designed for speed.

Lesson learned. I should have built the naive version first. Instead, I wrote two successively more “brilliant” versions that jumped through all sorts of hoops to get the most speed out of it. Of course, to do this, they took up all sorts of memory with indexes, and the index creation was starting to take 10 seconds or longer.

I just wrote the naive version and realized I could have done that in about 5 minutes and saved many hours of tweaking. The component is a type of indexing component, so there were three metrics: index creation time, lookup time, index size. Here’s a rough comparison just to give an idea:

                      Clever Algorithm    Naive Algorithm
Index Creation Time   10s                 0.3s
Lookup Time           0.0001s             0.005s
Index Size            35MB                9MB
# items               ~27,000

Pretty impressive speed numbers, aren’t they? That clever algorithm really rocks. And it would be awesome if I were doing a lot of consecutive searches, but the searching in my app is tied to the UI, and thus to the user, so in reality 0.005 seconds is not that much different from 0.0001 seconds. <sigh>

The numbers above are from my main machine, which is a Core 2 Duo. Just to be safe, I tested the naive algorithm on my 4-year-old Pentium 4 laptop to validate that it still has acceptable performance on an older machine. Index creation takes 0.05 seconds, and lookup time isn’t much slower, if at all.

And 9MB index is MUCH better than a 35MB index.

In summary, lessons learned:

  1. Hardware is cheap and fast. Don’t waste time optimizing for speed if you don’t have to. While there are signs that the raw speed of a single processor is plateauing as multiple cores become more important, in general, speed is always increasing.
  2. If you’re running something in response to user input, speed isn’t critical (as long as it’s faster than human response time).
  3. Every application is different, so measure and think critically. If my app needed to run the search 100 times per second, the clever algorithm would definitely be better.
  4. There is almost always a tradeoff between speed and size. Which is more important depends on the app.
  5. Write the dumb algorithm first. It might be good enough, and you’ll save yourself hours of development and debugging time.


2-D Arrays versus Structs

I had a situation the other day where I needed an array of pairs of values, and the thought occurred to me: which is better, a 2-D array or a 1-D array of structs? I decided to come up with a quick test to see.

Here are my results:

2-D arrays: 55.9376 cycles/row
Structs:    31.9937 cycles/row

That’s the number of cycles, on average, to add up the numbers in a row. Cycles makes more sense to me than microseconds when we’re talking about this level of code.

It turns out that the struct version is about 74% faster, which confused me slightly. Isn’t the memory layout for both options the same, and shouldn’t they therefore have the same assembly code?

It turns out…no. Before proceeding, download the test code here. It’s not pretty, but it works. Some of the mess is designed to keep the compiler from optimizing out my test cases. I tested with Visual Studio 2005 beta 2 with default release mode compiler options.

So let’s look at two things I discovered:

Here are the addresses of the first two rows (of two elements each) of the two array styles:

sum1: 0x00354860 0x00354868 0x00354878 0x00354880
sum2: 0x026F0020 0x026F0028 0x026F0030 0x026F0038

Take a look at sum2 (the array of structs); each element starts exactly at 8-byte intervals–what we would expect because memory is often aligned on 8-byte boundaries on x86.

But look at sum1; the elements in the first row are 8 bytes apart, but the next row starts 16 bytes later! That means 8 additional bytes are wasted between each row, so fewer rows will fit in the CPU’s L1 cache, causing more cache misses and slowing down the program. (Why did this layout occur? I don’t know the answer to that yet.)

Second of all, let’s look at the assembly code. This is just for adding the two elements of a row together.

;array addition
mov   eax, DWORD PTR _arr$[esp+4]  ; 1st operand addr
mov   eax, DWORD PTR [eax+ebp*4]   ; 2nd operand addr
fld   QWORD PTR [eax+8]            ; 1st operand
fadd  QWORD PTR [eax]              ; add 2nd operand
faddp ST(1), ST(0)                 ; add result to answer

;struct addition
fld   QWORD PTR [eax+8]            ; put first operand in register
fadd  QWORD PTR [eax-16]           ; add other operand
faddp ST(1), ST(0)                 ; add result to answer

So…the array version has to do two movs to fix up the correct addresses.

It seems now that the struct version is slightly more efficient here–both in space and in speed. However, this is really a microcosm of a problem. This kind of thing probably wouldn’t matter unless you were doing a lot of it. Personally, I’d go with the struct more often than not because it would be much easier to update the code to use a 3rd field, for example.

List<> vs. ArrayList

I read Rico Mariani’s latest quiz, and decided to check out the results for myself in BRayTracer.

I already have some simple performance benchmark tests in my NUnit tests, so I ran some before and after. I changed only the ArrayLists used in the scene object to hold shapes, materials, and lights.
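The change itself is mechanical; for example (the Shape type and field name here are illustrative):

// Before: every access casts from object, and value types would box.
private ArrayList _shapes = new ArrayList();
Shape s = (Shape)_shapes[i];

// After: strongly typed, no cast needed on access.
private List<Shape> _shapes = new List<Shape>();
Shape s = _shapes[i];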

Before:

25.197 opaque spheres/sec (time: 3.969 )

After:

31.841 opaque spheres/sec (time: 3.141 )

Pretty impressive savings with minimal work! Almost a full second! In an enormous scene, that will add up to a LOT of time saved. I still have to change the implementation for polygons and polygonal meshes–this will greatly speed up those operations, which are currently fairly slow. In fact, the way I have polygonal meshes written at the moment, they’re nearly impossible to render in a decent amount of time.

More exciting things coming to BRayTracer soon…