Tuesday, 23 December 2008

Using SQLite for fast database testing – technical gotchas and wider lessons learned

On my current project, we’re successfully using SQLite for in-memory testing with NHibernate / Castle.ActiveRecord. This makes tests that run against the database a lot faster than if we ran them against SQL Server.

NHibernate’s pluggable driver / connection provider model makes this fairly painless, but there are a few gotchas in the technical details that are worth calling out. A couple of these led me to some general observations, neither of which are new but both of which are important.

Database lifetimes

An in-memory SQLite database lives for the lifetime of the connection, so each time you close and re-open the database, you lose your table structure and your data. Usually NHibernate opens and closes connections as it sees fit so you need to provide a custom DriverConnectionProvider for NHibernate which holds the connection open until you explicitly close it. This has been covered pretty well by other bloggers so I won’t go into details here. Just don’t forget to explicitly close the connection otherwise you’ll keep lots of databases hanging around taking up your memory.

The SqlType argument of the Property attribute

When we need to override NHibernate’s default choice of data type for a column, we can use the SqlType argument like so:

        [Property(ColumnType = "BinaryBlob", SqlType = "varbinary(max)")]
        public byte[] Data { get; set; }

There’s a problem here – varbinary(max) is a SQL Server column type that isn’t valid in SQLite. SQLite has the BLOB column type for binary data like this. We’d like to be able to pick a different SqlType for production (SQL Server) and test (SQLite) builds. However because SqlType is in an attribute declaration, it has to be a compile time constant. I found that the simplest way around this was to hook into the ActiveRecord initialization pipeline to modify the ActiveRecord metadata before the schema gets created.

        [TestFixtureSetUp]
        public virtual void TestFixtureSetUp()
        {
            ActiveRecordStarter.ResetInitializationFlag();
            ActiveRecordStarter.ModelCreated += ActiveRecordStarter_ModelCreated;
            ActiveRecordStarter.Initialize(typeof (MyEntity).Assembly, ActiveRecordSectionHandler.Instance);
        }

        private static void ActiveRecordStarter_ModelCreated(ActiveRecordModel model, IConfigurationSource source)
        {
            if (source.GetConfiguration(typeof (ActiveRecordBase))
                .Children
                .Single(configuration => configuration.Name == "connection.driver_class")
                .Value.ToLower().Contains("sqlite"))
                foreach (var property in model.Properties)
                    if (!string.IsNullOrEmpty(property.PropertyAtt.SqlType) && property.PropertyAtt.SqlType.ToLower().Contains("varbinary"))
                        property.PropertyAtt.SqlType = "BLOB";
        }

Column types and the law of leaky abstractions

SQLite and SQL Server offer different sets of column types. For example, SQLite has a TEXT column type for all textual data, whereas SQL Server has fixed and variable length text datatypes like NCHAR(100) and NVARCHAR(MAX). NHibernate and ActiveRecord hide these differences from you. You provide a property declaration and it generates the appropriate DDL for the database which you’ve configured NHibernate to use. For example, here are excerpts from the creation scripts for my current project.

For SQLite

create table ErrorLogs (
  Id INTEGER not null,
  RequestUrl TEXT,
  Message TEXT,
  primary key (Id)
)

For SQL Server

create table ErrorLogs (
  Id INTEGER not null,
  RequestUrl NVARCHAR(255) null,
  Message NVARCHAR(255) null,
  primary key (Id)
)

Notice the different datatypes and nullability declarations between the two databases. This starts to matter when you write a test that you can insert a really long (> 255 characters) string into the Message column. Your test passes fine against SQLite, then you run your app against SQL Server and it falls over horribly.

We can fix this example by making individual tests run against SQL Server, then override the default column types in the ActiveRecord attributes to make them pass.

        [Test]
        public void should_save_error_log_with_long_message()
        {
            using (SqlServer)
            {
                var moreThan255Chars =
                    "a really really long message a really really long message a really really long message a really really long message a really really long message a really really long message a really really long message a really really long message a really really long message a really really long message ";
                var savedErrorLog = new ErrorLog
                                        {
                                            Message = moreThan255Chars
                                        };

                systemUnderTest.Save(savedErrorLog);

                Flush();

                var retrievedErrorLog = systemUnderTest.Load(savedErrorLog.Id);

                Assert.That(retrievedErrorLog.Message, Is.EqualTo(moreThan255Chars));
            }
        }

        // In a base class of your test fixture class
        protected IDisposable SqlServer
        {
            get
            {
                // Reconfigure NH to use SQLServer
                // Rebuild empty schema
                // DisposableAction invokes the delegate you give 
                // it when it's disposed (inspired by Rhino.Commons)
                return new DisposableAction(TestFixtureSetUp);
            }
        }

        // In the affected ActiveRecord class
        [Property(ColumnType = "StringClob", SqlType = "NTEXT")]
        public string Message { get; set; }

But this is another example of a situation where we have to remember each time we need to test a field like this, and if we forget, we may well introduce a bug. As I said yesterday, it’s much better to make it easy to do the right thing, so we should make sure that our automated build will expose significant differences between SQL Server and SQLite automatically. Unfortunately, it seems that the only real option is to run all the tests again against SQL Server. This undoes the reduction in build time that we gained by using SQLite. I think a compromise is to run SQLite tests on the developers’ desktops and run SQL Server tests on the build server.

This scenario serves up a couple of interesting lessons:

Lesson 1: The Law of Leaky Abstractions

NHibernate would like you to think that it’s successfully abstracted away the differences in column types and SQL Syntax between the SQL Server and SQLite, but sometimes you need to know what’s behind the façade. This is a classic example of Joel’s Law of Leaky Abstractions. NHibernate and ActiveRecord have several leaks in their abstractions, as recently my colleague Stuart recently pointed out. They make you feel that you don’t need to worry about the presence of a relational database, but every now and then you need to remember that there is some SQL happening under the hood – in particular around session management and lazy-loading. This isn’t a criticism of ActiveRecord – it’s a great tool and they do explicitly call out some of these leaks on the project homepage.

Lesson 2: Every difference between your test and production environments will trip you up at some point

I think the headline speaks for itself. In this example, it was the difference between SQL Server and SQLite. This can happen in many areas – it’s not only differences in the technical environment, but differences in datasets and usage patterns can hit you too:

  • Testing and developing against tiny datasets, so you don’t notice the fact that your app is dog slow with a realistic dataset.
  • Forgetting that, unlike in development, the users’ browsers aren’t on the same machine, or even the same continent, as the web server, so maybe that really chatty AJAX UI isn’t the best plan.
  • Viewing the UI on the 1920*1080 mega-displays that the developers have, and forgetting that the user with a 1024*768 can’t see the navigation links without scrolling sideways.
  • Testing and developing against tiny datasets, so you don’t realise that you haven’t offered the users any easy way to find their data. Carlos puts it better than I can:
Next time you see a system where there are enormous listings of items with no search, pagination or sorting, ask the developers if they have ever watched a typical user at work; chances are they have only thought about the system as they see it: since the testing dataset is usually small, a loop spitting out a bit of HTML for each element isn’t such a big deal. They might even say there’s a story to implement all that lovely stuff later on, but they just get moved over and over to the bottom of the backlog barrel… until everyone watches a person struggle to find needles in a tabular haystack all day. This is a simple example – almost too trivial actually, but one I’ve seen happen way too many times.

What do we do about this? On the technical side, we can work to make our development and testing environments as close to production as possible. For the other areas, it’s vital that we are vigilant and keep an eye out for these risks as we go along. To paraphrase Thomas Jefferson: “the price of Agile (not having to write stacks of functional and non-functional specifications at the start of the project) is eternal vigilance (having to keep an eye on all these things as we go along)”.

0 comments:

Post a Comment