Tag Archives: database

Tracking database changes using triggers

Tracking changes in database tables is an incredibly useful feature–especially for operational data that can change often. Having recently had to implement this feature, I thought I’d share some of the techniques I learned.

Sample Database

First, let’s conceptualize a very simple database consisting of user information (name, date of birth), and e-mails. A user can have more than one e-mail.

 

Table: UserData

Field Type
ID int (PK, identitiy)
FirstName varchar
LastName varchar
birthdate date

Table: UserEmails

Field Type
UserID int (FK)
email varchar

 

 

We want to track all changes to the FirstName, LastName, and birthdate fields. In addition we want to track when e-mails are added or removed from a user. As we’ll see, these aims are accomplished using two different methods.

My implementation is done in SQL Server 2000 and C#, but any database that supports triggers can be used.

Changes in a Single Table

With this method we want to track the changes to all fields of a table. In our example, we want to know when FirstName, LastName, and birthdate change values in the UserData table.

To accomplish this we need another table to track the history. This table is going to have the exact same fields as UserData, plus a few extra for the change tracking.

Table: UserDataChanges

Field Type
ChangeID int (PK, identity)
ChangeTime datetime
ChangeUser varchar
ID int (FK)
FirstName varchar
LastName varchar
birthdate date

Now the automated part–adding a trigger to populate this automatically:

CREATE TRIGGER UserDataChangeTrigger ON UserData FOR UPDATE, INSERT
AS    
    IF (UPDATE (FirstName) OR UPDATE(LastName) OR UPDATE(birthdate))    
    BEGIN     
        INSERT UserDataChanges 
            (ChangeTime, ChangeUser, ID, FirstName, LastName, birthdate)
            (SELECT GetUtcDate(), user, ID, FirstName,LastName,birthdate 
                FROM inserted)     
    END     

This trigger will insert a new row into the UserDataChanges table whenever a row in the UserData table is updated or inserted. The IF (UPDATE(FirstName)…. ) is not strictly required in this scenario, but in other cases I did not want a change recorded when certain fields were updated (i.e., you have a field that tracks the last change time of that row, or the number of orders, or any other field that can change frequently and isn’t important to track–you don’t want to create too much noise in this or it will not be useful). The GetUtcDate() and user are SQL Server functions that retrieve the current UTC time and the username of the process that caused the change–very useful for tracking responsibility. The inserted table is created by the server for use by the trigger and contains all the new values.

Changes in a Foreign Key Table

The UserEmails has to be handled differently because there can be multiple e-mails for each user and we can assume they can be added, or removed at will (Remove + Add = Update, so I won’t consider direct updates here).

The solution I landed on was to have a generic event log table that stores manual log entries as well as “special” entries denoting adding or removing e-mails.

Table: UserEventLog

Field Type
EventID int (PK, identity)
ID int (FK)
EventTime datetime
EventType int
ChangeUser varchar
Notes varchar

This table can be used for both adding text notes to a user and, by using the EventType field, special events. In our example, we have two events we need to track:

 

Event Value
EmailAdded 1
EmailRemoved 2

(In code, I’ve made these enumerations)

Next we add a trigger on the UserEmails table:

CREATE TRIGGER UserEmails_EmailAddedTrigger
ON UserEmails
FOR INSERT
AS
 BEGIN
     INSERT UserEventLog(ID, EventTime, EventType, ChangeUser, Notes)
        (SELECT ID, GetUtcDate(), 1, user, '{'+email+'}' FROM inserted)
 END

The value 1 stands for EmailAdded. I’ve added braces around the actual e-mail address to set it apart from regular notes (we’ll see how to integrate everything later).

To handle the deletion of e-mails add another trigger:

CREATE TRIGGER UserEmails_EmailRemovedTrigger
ON UserEmails
FOR DELETE
AS
 BEGIN
     INSERT UserEventLog(ID, EventTime, EventType, ChangeUser, Notes )
        (SELECT ID, GetUtcDate(), 2, user, '{'+email+'}' FROM deleted)
 END

The only things different: FOR DELETE (instead of INSERT), changed the EventType to 2 (EmailRemoved), and the values are taken from the SQL Server-supplied deleted table.

That’s enough to get a pretty good change-tracking system in place, but you’ll still have to build a UI to display it effectively.

Displaying the Changes in the UI

With the above work done, you end up with two types of entities: changes and events. While it would be possible to integrate all functionality into a single event/change table using a lot more logic in the SQL Trigger code, I’m personally more comfortable with the change logic being in my application code. I think this way the database is kept more “pure” and open to changes down the line.

That means we will need to integrate these two types of entities into a single list, ordered by date/time. I’m going to assume the existence of two classes or structs that represent each of these entities. They’ll be called UserChange and UserEvent. I’ll also assume that the lists of each of these are already sorted by time, since that’s trivial to do in a SQL query.

Given that, we need a function that takes both of these lists and produces a sorted, combined list with an easy-to-understand list.

How the function works:

  1. Go through both lists, and pick whichever one is next, time-wise.
  2. Translate the object into a string/list-view representation of that object.
  3. If it’s a UserChange object, compare it to the previous one to figure out what changed.
  4. Sort the list in reverse order to put newer items at the top.

Here’s the C# code which I’ve adapted from our production system. Don’t get hung up on the details:

 

private void FillLog(IList<UserEvent> events, IList<UserChange> changes)
{
    List<ListViewItem> tempItems = new List<ListViewItem>();
 
    int currentEventIdx = 0;
    int currentChangeIdx = 0;
    eventLogListView1.Items.Clear();
 
    while (currentEventIdx < events.Count
    || currentChangeIdx < changes.Count)
    {
    UserChange currentChange = null;
    UserChange prevChange = null;
    UserEvent currentEvent = null;
 
    DateTime changeTime = DateTime.MaxValue;
    DateTime eventTime = DateTime.MaxValue;
 
    if (currentChangeIdx < changes.Count)
    {
        currentChange = changes[currentChangeIdx];
        changeTime = currentChange.ChangeDate;
        if (currentChangeIdx > 0)
        {
        prevChange = changes[currentChangeIdx - 1];
        }
 
    }
 
    if (currentEventIdx < events.Count)
    {
        currentEvent = events[currentEventIdx];
        eventTime = currentEvent.EventDate;
    }
    string dateStr;
    string userStr;
    string eventTypeStr="";
    string notesStr;
 
    if (changeTime < eventTime)
    {
        dateStr = Utils.FormatDateTime(changeTime);
        userStr = currentChange.UserName;
        notesStr = GetChangeString(currentChange, prevChange);
        currentChangeIdx++;
    }
    else
    {
        dateStr = Utils.FormatDate(eventTime);
        userStr = currentEvent.UserName;
        notesStr = currentEvent.Notes;
        eventTypeStr = currentEvent.EventType.ToString();
        currentEventIdx++;
    }
 
    if (notesStr.Length > 0)
    {
        ListViewItem item = new ListViewItem(dateStr);
        item.SubItems.Add(userStr);
        item.SubItems.Add(eventTypeStr);
        item.SubItems.Add(notesStr);
        item.ToolTipText = notesStr;
        item.BackColor = (tempItems.Count % 2 == 0) ? 
            Color.Wheat : Color.White;
        tempItems.Add(item);
 
    }
 
    }//end while
    eventLogListView1.BeginUpdate();
    for (int i = tempItems.Count - 1; i >= 0; i--)
    {
    eventLogListView1.Items.Add(tempItems[i]);
    }
 
    eventLogListView1.AutoResizeColumn(0, 
        ColumnHeaderAutoResizeStyle.ColumnContent);
    eventLogListView1.AutoResizeColumn(1, 
        ColumnHeaderAutoResizeStyle.ColumnContent);
    eventLogListView1.AutoResizeColumn(2, 
        ColumnHeaderAutoResizeStyle.ColumnContent);
    eventLogListView1.Columns[3].Width = eventLogListView1.Width - 
    (eventLogListView1.Columns[0].Width +
    eventLogListView1.Columns[1].Width +
    eventLogListView1.Columns[2].Width +10);
 
    eventLogListView1.EndUpdate();
}

Now we need to define GetChangeString, which figures out the differences in successive UserChange objects and displays only pertinent information.

 

private string GetChangeString(
    BuoyDataChange currentChange, 
    BuoyDataChange prevChange)
{
    StringBuilder sb = new StringBuilder();
 
    if (prevChange == null)
    {
        CompareAndAdd(sb, "First Name", 
            null, currentChange.FirstName);
        CompareAndAdd(sb, "Last Name", 
            null, currentChange.LastName);
        CompareAndAdd(sb, "Birth Date", 
            null, currentChange.BirthDate);
    }
    else
    {
        CompareAndAdd(sb, "First Name", 
            prevChange.FirstName, currentChange.FirstName);
        CompareAndAdd(sb, "Last Name", 
            prevChange.LastName, currentChange.LastName);
        CompareAndAdd(sb, "Birth Date", 
            prevChange.BirthDate, currentChange.BirthDate);
    }
    return sb.ToString();
}

And one last helper function which compares two objects and if different appends the change to a StringBuilder object.

 

private void CompareAndAdd(StringBuilder sb, string field, 
    object oldVal, object newVal)
   {
       if (oldVal == null && newVal == null)
           return;
 
       if (oldVal == null || !oldVal.Equals(newVal))
       {
           if (sb.Length > 0)
           {
               sb.Append(", ");
           }
           sb.AppendFormat("{0}:{1} -> {2}", field, oldVal, newVal);
       }
   }

In this way you can end up with an automated system that displays all changes in an easy-to-understand format.

Here’s a sample of what our system looks like (click to enlarge):

Change log screenshot

Other ways to accomplish this? Better ways? Please leave a comment!

kick it on DotNetKicks.com


Check out my latest book, the essential, in-depth guide to performance for all .NET developers:

Writing High-Performance.NET Code, 2nd Edition by Ben Watson. Available for pre-order:

Factory Design Pattern to the Rescue: Practical Example

Design patterns really are quite useful. I have a situation in the code I’m working on where I was obviously repeating a lot of the same patterns and code (functions that were 90% the same–the only thing different was the specific class being instantianted): perfect candidate for factory techniques.

Let’s say we have the following set of classes representing a data access layer meant to abstract some database information from the client code. We have a BaseDBObject class that defines all of the common. We derive from that for each table we want to access.

class BaseDBObject
{
    protected BaseDBObject(Database database) {...}
    public void SetProperty(string name, object value) {...}
    //...more common functionality

};

Derived from this base are lots of classes that implement table-specific database objects. To control object creation, constructors are declared protected and static member functions are used. To wit:

class MyTableObject : BaseDBObject
{
    protected MyTableObject(Database database) : base(database) { }
    public static void Create(Database database, int param1, string param2)
    {
        string query = "INSERTO INTO MyTable (param1, param2) VALUES (@PARAM1, @PARAM2)";
        SqlCommand cmd = new SqlCommand(query, database.GetConnection());
        //paramterize query
        try {
            //exeute query
            //error check
            MyTableObject mto = new MyTableObject();
            //set object properties to match what's inserted
            return mto;
        }
        catch (SqlException ex)
        {
            //handle exception
        }
        finally
        {
            //close connection
        }
    }
    //...
    public static IList<MyTableObject> LookupById(Database database, int id)
    {
        string query = "SELECT * FROM MyTable WHERE ID = @ID";
        SqlCommand cmd = new SqlCommand(query, database.GetConnection());
        //parameterize query
        try
        {
            //execute query
            SqlDataReader reader = cmd.ExecuteReader(...);
            List<MyTableObject> list = new List<MyTableObject>();
            while (reader.Read())
            {
                MyTableObject mto = new MyTableObject();
                //set properties in mto
                list.Add(mto);
               
            }
            return mto;
        }
        catch (SqlException ex)
        {
            //handle exceptions
        }
        finally
        {
            //close connections
        }
    }
};

There are two functions here that must be created for every single table object derivation. That can be a LOT of code, and most of it is doing the same thing. There are a number of simple ways to handle some of the repetition:

  1. There will be multiple LookupByXXXXX functions. They can all specify the query they will use and pass it to a common function that executes and returns a list the class’s objects.
  2. Paramterizing queries can be accomplished by a function that takes a query string, a list of parameters (say, in a struct that describes each parameter), and produces a paramterized SqlCommand, ready for execution.
  3. Other helper functions that do the actual execution and checking of errors.

In the end, however, you are still left with two things that can’t be relegated to helper functions: MyTableObject mto = new MyTableObject(); and List<MyTableObject> list = new List<MyTableObject>(); One possible solution is to use reflection to dynamically generate the required objects. From a performance and understandability perspective, I don’t think this is a first choice.

Which leaves a factory method. My first attempt involved using templates to simplify this (you will see why shortly). Something like this:

class DatabaseObjectFactory<T> where T : BaseDBObject, new()
{
    public T Create(Database database) { return new T(database); } 
    public IList<T> CreateList() { return new List<T>(); }
};

This way, I could simply define a function in the base class BaseDBObject, which I could call like this:

Lookup(database, query, arguments, new DatabaseObjectFactory<MyTableObject>());

and that would automagically return a list of the correct objects. The problem with this approach, however, lies in the Create function. .Net can’t pass arguments to a constructor of T. It can only return new T() with no parameters. Nor can you access properties of BaseDBObject through T after creation. Back to the drawing board…

Now I had to face the problem of creating a duplicate inheritance hierarchy of object factories. This is what I had hoped to avoid by using generics. I designed an interface like this:

interface IDatabaseObjectFactory
{
    BaseDBObject Create(Database database);
    IList<BaseDBObject> CreateList();
};

And in each table object derivation I declare a private class and static member like this:

private class MyTableObject : IDatabaseObjectFactory
{
    public BaseDBObject Create(Database database) { return new MyTableObject(database); }
    public IList<BaseDBObject> CreateList() { return new List<MyTableObject>(); }
};
private static IDatabaseObjectFactory s_factory = new MyTableObjectFactory();

Now, I can have a Lookup function in BaseDBObject that accepts an IDatabaseObjectFactory parameter. At the expense of creating a small, simple factory class for each table object that needs it, I can remove roughly 50 lines of code from each of those classes. Smaller code = fewer bugs and easier maintenance.

The base lookup function would look something like this:

protected Lookup(Database database, string query, ICollection<QueryArgument>, IDatabaseObjectFactory factory)
{
    //paramterize query
    //execute query
    //ignoring error-handling for sake of brevity
    SqlDataReader reader = cmd.ExecuteReader(...);
    IList<BaseDBObject> list = factory.CreateList();
    while (reader.Read())
    {
        BaseDBObject obj = factory.Create(database);
        obj.Fill(reader);    //another common function that
                             // automatically fills in all properties
                             //of object from SqlDataReader
        list.Add(obj);
    }
    return list;
}

But what about theMyTableObject.Create()? It’s possible to do something like this, but in a different way. In order to handle inserting rows in a table that uses identity fields (that you don’t know until after creation), I created a utility function that inserted the data using database, query string, and QueryArgument objects. Then, instead of creating the object directly, I do a Lookup based on values that I know are unique to the row I just inserted. This ensures I get the most complete object (at the expense of an extra database trip).


Check out my latest book, the essential, in-depth guide to performance for all .NET developers:

Writing High-Performance.NET Code, 2nd Edition by Ben Watson. Available for pre-order:

Detecting database connection

I have a service that’s absolutely critical to business functions that relies on both Exchange and SQL Server. The problem is that the service’s existing code has a number of places where a failure to connect to the database will result in a dummy, “bad” value being returned from the data access layer. The business layer doesn’t know if the bad values came from bad input data or no database connection, and it assumes the former. This leads to data loss for the customer.

I see three possible solutions for this:

  1. Have an exception for when the database is inaccessible. This is complicated because it changes the code flow, which would not be easy. Every place it tries to access the database would have to handle the exception. I would also have to develop a mechanism to reprocess the data later.
  2. Have a known “bad” value that means database couldn’t be accessed. This is tricky (if not impossible) and would significantly decrease the readability of the code.
  3. Detect whether the database is alive or not before attempting any processing. This avoids the problem of reattempting processing at a later time. It allows more of the code to stay as is (thus fewer chances of new bugs).

I went with 3 because it is also conceptually closer to what should happen. If the database is unavailable, the service can’t possibly do any meaningful work so it should just stop until the database comes back up.

The code to detect a database disconnect is very simple:

static bool IsDatabaseAlive()
{
    SqlConnection connection = null;
    try
    {
        
string strConnection = ConfigurationSettings.AppSettings[“ConnectionString”];
        
using(connection = new SqlConnection(strConnection))
         {
             
connection.Open();
             
return (connection.State != ConnectionState.Open);
          }
     }
    
catch
    
{
         
return false;
     }
}

 

I run this code before running through the processing loops. If the database is alive I go ahead and process.  


Check out my latest book, the essential, in-depth guide to performance for all .NET developers:

Writing High-Performance.NET Code, 2nd Edition by Ben Watson. Available for pre-order: