Tracking changes in database tables is incredibly useful, especially for operational data that changes often. Having recently implemented this feature, I thought I’d share some of the techniques I learned.
First, let’s picture a very simple database with two tables: UserData, which holds user information (name, date of birth), and UserEmails, which holds e-mail addresses. A user can have more than one e-mail.
We want to track all changes to the FirstName, LastName, and birthdate fields. In addition, we want to track when e-mails are added to or removed from a user. As we’ll see, these two aims call for two different methods.
My implementation is done in SQL Server 2000 and C#, but any database that supports triggers can be used.
Changes in a Single Table
With this method we want to track the changes to all fields of a table. In our example, we want to know when FirstName, LastName, and birthdate change values in the UserData table.
To accomplish this we need another table to track the history. This table is going to have the exact same fields as UserData, plus a few extra for the change tracking.
| ChangeID | int (PK, identity) |
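A minimal sketch of what such a history table could look like. Only ChangeID, FirstName, LastName, and the birthdate field come from the text above. The UserID key, the column types, and the two tracking columns (ChangeTime, ChangedBy) are assumptions made for illustration.

```sql
-- Hypothetical history table: mirrors the tracked UserData columns
-- and adds a few columns for change tracking.
CREATE TABLE UserDataChanges
(
    ChangeID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    UserID     int           NOT NULL,  -- assumed key of the UserData row that changed
    FirstName  nvarchar(50)  NULL,
    LastName   nvarchar(50)  NULL,
    Birthdate  datetime      NULL,
    ChangeTime datetime      NOT NULL,  -- assumed: UTC time of the change
    ChangedBy  nvarchar(128) NOT NULL   -- assumed: database user that made the change
)
```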
Now for the automated part, a trigger that populates this table for us:
This trigger inserts a new row into the UserDataChanges table whenever a row in the UserData table is inserted or updated. The IF (UPDATE(FirstName)…) check is not strictly required in this scenario, but in other cases I did not want a change recorded when only certain fields were updated (e.g., a field that tracks the last change time of the row, the number of orders, or anything else that changes frequently and isn’t important to track). Too much noise makes the history useless. GETUTCDATE() and USER are SQL Server functions that return the current UTC time and the database user name of the session that caused the change, which is very useful for tracking responsibility. The inserted table is created by the server for use by the trigger and contains the new values of every affected row.
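A trigger along these lines might look like the sketch below. The trigger name, the UserID key, and the ChangeTime and ChangedBy columns are assumptions, so adjust them to the real schema.

```sql
-- Sketch only: trigger name and the UserID, ChangeTime, and ChangedBy
-- columns are hypothetical.
CREATE TRIGGER trg_UserData_TrackChanges
ON UserData
FOR INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON

    -- UPDATE(column) is true when the statement set that column;
    -- it is also true for every column on an INSERT.
    IF (UPDATE(FirstName) OR UPDATE(LastName) OR UPDATE(Birthdate))
    BEGIN
        INSERT INTO UserDataChanges
            (UserID, FirstName, LastName, Birthdate, ChangeTime, ChangedBy)
        SELECT UserID, FirstName, LastName, Birthdate, GETUTCDATE(), USER
        FROM inserted  -- one row per inserted or updated UserData row
    END
END
```

Selecting from inserted as a set, rather than reading single values into variables, keeps the trigger correct when one statement touches many rows.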
Changes in a Foreign Key Table
The UserEmails table has to be handled differently because each user can have multiple e-mails, and we can assume they are added or removed at will (remove + add = update, so I won’t consider direct updates here).
The solution I landed on was to have a generic event log table that stores manual log entries as well as “special” entries denoting adding or removing e-mails.
| EventID | int (PK, identity) |
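As an illustration, the event log table could be shaped like this. Only EventID and EventType are named in the text. The table name UserEvents, the other columns, and all types are assumptions.

```sql
-- Hypothetical event log table for notes and special events.
CREATE TABLE UserEvents
(
    EventID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    UserID    int            NOT NULL,  -- assumed key of the user the event belongs to
    EventTime datetime       NOT NULL,  -- assumed: UTC time of the event
    EventType int            NOT NULL,  -- 1 = EmailAdded, 2 = EmailRemoved; 0 = plain note (assumed)
    EventText nvarchar(1000) NULL,      -- note text, or the e-mail address in braces
    CreatedBy nvarchar(128)  NOT NULL   -- assumed: database user that caused the event
)
```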
This table can hold both free-text notes about a user and, via the EventType field, special events. In our example, there are two events we need to track: EmailAdded (1) and EmailRemoved (2).
(In code, I’ve made these an enumeration.)
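Such an enumeration might be declared as follows. The type name and the Note member are assumptions. The values 1 and 2 match the ones the triggers write.

```csharp
// Hypothetical enumeration mirroring the EventType column.
public enum UserEventType
{
    Note = 0,         // assumed value for plain text notes
    EmailAdded = 1,   // written by the INSERT trigger on UserEmails
    EmailRemoved = 2  // written by the DELETE trigger on UserEmails
}
```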
Next we add a trigger on the UserEmails table:
The value 1 stands for EmailAdded. I’ve added braces around the actual e-mail address to set it apart from regular notes (we’ll see how to integrate everything later).
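A sketch of such a trigger, assuming a hypothetical UserEvents table with UserID, EventTime, EventType, EventText, and CreatedBy columns, and a UserEmails table with UserID and Email columns:

```sql
-- Sketch only: trigger, table, and column names are assumptions.
CREATE TRIGGER trg_UserEmails_Added
ON UserEmails
FOR INSERT
AS
BEGIN
    SET NOCOUNT ON

    INSERT INTO UserEvents (UserID, EventTime, EventType, EventText, CreatedBy)
    SELECT UserID,
           GETUTCDATE(),
           1,                  -- EmailAdded
           '{' + Email + '}',  -- braces set the address apart from regular notes
           USER
    FROM inserted
END
```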
To handle the deletion of e-mails add another trigger:
The only differences are FOR DELETE (instead of FOR INSERT), an EventType of 2 (EmailRemoved), and values taken from the SQL Server-supplied deleted table.
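Under the same assumptions (a hypothetical UserEvents table and an Email column on UserEmails), the deletion trigger could be sketched as:

```sql
-- Sketch only: trigger, table, and column names are assumptions.
CREATE TRIGGER trg_UserEmails_Removed
ON UserEmails
FOR DELETE
AS
BEGIN
    SET NOCOUNT ON

    INSERT INTO UserEvents (UserID, EventTime, EventType, EventText, CreatedBy)
    SELECT UserID,
           GETUTCDATE(),
           2,                  -- EmailRemoved
           '{' + Email + '}',
           USER
    FROM deleted  -- holds the rows that were just removed
END
```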
That’s enough to get a pretty good change-tracking system in place, but you’ll still have to build a UI to display it effectively.
Displaying the Changes in the UI
With the above work done, you end up with two types of entities: changes and events. While it would be possible to integrate all functionality into a single event/change table using a lot more logic in the SQL Trigger code, I’m personally more comfortable with the change logic being in my application code. I think this way the database is kept more “pure” and open to changes down the line.
That means we will need to integrate these two types of entities into a single list, ordered by date/time. I’m going to assume the existence of two classes or structs that represent each of these entities. They’ll be called UserChange and UserEvent. I’ll also assume that the lists of each of these are already sorted by time, since that’s trivial to do in a SQL query.
Given that, we need a function that takes both of these lists and produces a single sorted, combined list in an easy-to-understand format.
How the function works:
- Go through both lists, and pick whichever one is next, time-wise.
- Translate the object into a string/list-view representation of that object.
- If it’s a UserChange object, compare it to the previous one to figure out what changed.
- Sort the list in reverse order to put newer items at the top.
Here’s the C# code which I’ve adapted from our production system. Don’t get hung up on the details:
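As a rough sketch of the steps above (not the production code itself), the merge could look like this. The field names on UserChange and UserEvent, the HistoryEntry type, and the describeChange delegate are all assumptions, since the article only names the two classes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shapes for the two entity types.
public class UserChange
{
    public DateTime ChangeTime;
    public string FirstName;
    public string LastName;
    public DateTime? Birthdate;
}

public class UserEvent
{
    public DateTime EventTime;
    public int EventType;  // 1 = EmailAdded, 2 = EmailRemoved, anything else = note (assumed)
    public string Text;
}

// Hypothetical display row for the combined list.
public class HistoryEntry
{
    public DateTime Time;
    public string Description;
}

public static class HistoryMerger
{
    // Both inputs must already be sorted oldest-first.
    // describeChange receives (previous, current); previous is null for the first change.
    public static List<HistoryEntry> Merge(
        IList<UserChange> changes,
        IList<UserEvent> events,
        Func<UserChange, UserChange, string> describeChange)
    {
        var result = new List<HistoryEntry>();
        int c = 0, e = 0;
        UserChange previous = null;

        while (c < changes.Count || e < events.Count)
        {
            // Pick whichever item comes next in time; ties go to the change.
            bool takeChange = e >= events.Count ||
                (c < changes.Count && changes[c].ChangeTime <= events[e].EventTime);

            if (takeChange)
            {
                UserChange current = changes[c++];
                result.Add(new HistoryEntry
                {
                    Time = current.ChangeTime,
                    Description = describeChange(previous, current)
                });
                previous = current;
            }
            else
            {
                UserEvent ev = events[e++];
                result.Add(new HistoryEntry
                {
                    Time = ev.EventTime,
                    Description = DescribeEvent(ev)
                });
            }
        }

        result.Reverse();  // newest items first
        return result;
    }

    private static string DescribeEvent(UserEvent ev)
    {
        switch (ev.EventType)
        {
            case 1: return "E-mail added: " + ev.Text;
            case 2: return "E-mail removed: " + ev.Text;
            default: return ev.Text;
        }
    }
}
```

Because both inputs arrive sorted, a single pass with two indexes is enough, and the final Reverse call stands in for the reverse sort described in the last step.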
Now we need to define GetChangeString, which figures out the differences in successive UserChange objects and displays only pertinent information.
And one last helper function, which compares two objects and, if they differ, appends the change to a StringBuilder object.
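The two functions might be sketched as follows. GetChangeString is the name used above. The helper name AppendIfChanged, the UserChange fields, and the output wording are assumptions.

```csharp
using System;
using System.Text;

// Hypothetical shape of a row from the history table.
public class UserChange
{
    public string FirstName;
    public string LastName;
    public DateTime? Birthdate;
}

public static class ChangeFormatter
{
    // Appends "field: 'old' -> 'new'" to sb only when the two values differ.
    public static void AppendIfChanged(
        StringBuilder sb, string fieldName, object oldValue, object newValue)
    {
        if (object.Equals(oldValue, newValue))
        {
            return;  // unchanged (this also covers both values being null)
        }
        if (sb.Length > 0)
        {
            sb.Append("; ");
        }
        sb.AppendFormat("{0}: '{1}' -> '{2}'", fieldName, oldValue, newValue);
    }

    // Describes what changed between two successive history rows.
    // previous is null for the first row, i.e. the original insert.
    public static string GetChangeString(UserChange previous, UserChange current)
    {
        if (previous == null)
        {
            return "User created: " + current.FirstName + " " + current.LastName;
        }

        var sb = new StringBuilder();
        AppendIfChanged(sb, "First name", previous.FirstName, current.FirstName);
        AppendIfChanged(sb, "Last name", previous.LastName, current.LastName);
        AppendIfChanged(sb, "Birthdate", previous.Birthdate, current.Birthdate);

        return sb.Length > 0 ? sb.ToString() : "No tracked fields changed";
    }
}
```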
In this way you can end up with an automated system that displays all changes in an easy-to-understand format.
Here’s a sample of what our system looks like:
Other ways to accomplish this? Better ways? Please leave a comment!