Agile Testing With Data Anonymisation

In addition to developer-driven unit and integration testing, it is very important to create realistic data sets for acceptance, GUI, performance and other “-ility” testing. Creating such a realistic data set is challenging. Realistic data is not ideal data; it frequently includes “invalid” data that should never have been in the system in the first place. This is especially true of legacy systems with poor data controls, or systems that take input data from several sources. The most frequently encountered data repository for McKenna Consultants is a classic relational database.

With this in mind, I recently decided to create a piece of software that could anonymise a real production relational database so that it could safely be used for agile testing. My solution was to generate an EDMX file using Visual Studio and create a small piece of software that would traverse the entire database structure, anonymising as it went.

For inspiration, I looked to the Moq source code. For my solution, we need to set up an object graph (e.g. the one generated by the Entity Framework in Visual Studio) and a root object (or set of objects) to begin the traversal from. The client code ends up looking something like this:

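// Build the engine and register one rule per field to anonymise.
// usernameAnonymiser and shortWordAnonymiser are IAnonymise<string> instances created elsewhere.
AnonymiseIt anonymise = new AnonymiseIt();

anonymise.AddAnonymisation(new Anonymisation<aspnet_Users, string>().Anonymise(u => u.UserName).Using(a => usernameAnonymiser.Anonymise()));
anonymise.AddAnonymisation(new Anonymisation<aspnet_Membership, string>().Anonymise(u => u.Comment).Using(a => shortWordAnonymiser.Anonymise()));

// "users" is the set of root objects; related objects are reached by traversal.
foreach (aspnet_Users user in users)
{
    anonymise.Execute(user);
}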

The solution ends up being type-safe and simple from a client point of view, although underneath there is a bit of moderately complex Generic and Reflection programming.
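
To give a feel for the generic and reflection work involved, here is a minimal sketch of how an engine with this client API could be put together. It is not the original implementation. The names Anonymisation, AnonymiseIt, AddAnonymisation and Execute are reused only so that it lines up with the client code above; the IRule interface, the method bodies and the User and Membership stand-in classes are all assumptions, and the sketch assumes plain object graphs rather than real Entity Framework entities.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;
using System.Runtime.CompilerServices;

// Stand-in POCOs for the generated entity classes (hypothetical).
public class User { public string UserName { get; set; } public Membership Membership { get; set; } }
public class Membership { public string Comment { get; set; } public User User { get; set; } }

// Hypothetical non-generic view of a rule so that rules of any type fit in one list.
public interface IRule { void Apply(object entity); }

public class Anonymisation<TEntity, TProperty> : IRule where TEntity : class
{
    private PropertyInfo property;                 // chosen by Anonymise(...)
    private Func<TProperty, TProperty> generator;  // supplied by Using(...)

    public Anonymisation<TEntity, TProperty> Anonymise(Expression<Func<TEntity, TProperty>> selector)
    {
        // Only a plain property access such as u => u.UserName is handled here.
        var member = selector.Body as MemberExpression;
        property = member == null ? null : member.Member as PropertyInfo;
        if (property == null) throw new ArgumentException("Expected a property access.", "selector");
        return this;
    }

    public Anonymisation<TEntity, TProperty> Using(Func<TProperty, TProperty> generator)
    {
        // Assumption: the lambda receives the current value and returns the replacement.
        this.generator = generator;
        return this;
    }

    public void Apply(object entity)
    {
        // Rules are matched on runtime type, so unrelated objects are skipped.
        if (!(entity is TEntity)) return;
        var current = (TProperty)property.GetValue(entity);
        property.SetValue(entity, generator(current));
    }
}

public class AnonymiseIt
{
    private readonly List<IRule> rules = new List<IRule>();

    public void AddAnonymisation(IRule rule) { rules.Add(rule); }

    public void Execute(object root) { Walk(root, new HashSet<object>(new ByReference())); }

    private void Walk(object node, HashSet<object> seen)
    {
        if (node == null || node is string || node.GetType().IsValueType) return;
        if (!seen.Add(node)) return;               // already visited: guards against cycles
        if (node is IEnumerable)
        {
            foreach (var item in (IEnumerable)node) Walk(item, seen);
            return;
        }
        foreach (var rule in rules) rule.Apply(node);
        foreach (var p in node.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (p.CanRead && p.GetIndexParameters().Length == 0) Walk(p.GetValue(node), seen);
        }
    }

    // Compares visited objects by identity rather than by any overridden Equals.
    private sealed class ByReference : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object x, object y) { return ReferenceEquals(x, y); }
        int IEqualityComparer<object>.GetHashCode(object obj) { return RuntimeHelpers.GetHashCode(obj); }
    }
}
```

The expression tree is what keeps the client code type-safe: the compiler checks that u.UserName exists and is a string, while the engine only extracts the PropertyInfo from it. A real version working against Entity Framework classes would almost certainly need to restrict the walk to entity types and navigation properties instead of following every public property.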

The code above will anonymise the UserName of each aspnet_Users object. The aspnet_Users relationship to the aspnet_Membership table will be traversed automatically and the Comment field will be anonymised too. The “usernameAnonymiser” and “shortWordAnonymiser” objects are instances of very short classes that implement this interface:

    /// <summary>
    /// Interface for anonymising data.
    /// </summary>
    /// <typeparam name="T">Anonymise and return data of this type.</typeparam>
    public interface IAnonymise<out T>
    {
        /// <summary>
        /// Create anonymous data of this type.
        /// </summary>
        /// <returns>Random anonymous data.</returns>
        T Anonymise();
    }
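
As an illustration of how short an implementing class can be, here is a hypothetical username anonymiser. The class name UsernameAnonymiser, the "user_" prefix, the default length and the optional seed are all assumptions made for this sketch; the interface is repeated only so that the snippet compiles on its own.

```csharp
using System;
using System.Text;

public interface IAnonymise<out T>
{
    T Anonymise();
}

// Hypothetical implementation: "user_" followed by random lowercase letters.
public class UsernameAnonymiser : IAnonymise<string>
{
    private const string Letters = "abcdefghijklmnopqrstuvwxyz";
    private readonly Random random;
    private readonly int length;

    // A seed can be passed in to make test runs repeatable.
    public UsernameAnonymiser(int length = 8, int? seed = null)
    {
        this.length = length;
        random = seed.HasValue ? new Random(seed.Value) : new Random();
    }

    public string Anonymise()
    {
        var builder = new StringBuilder("user_");
        for (int i = 0; i < length; i++)
        {
            builder.Append(Letters[random.Next(Letters.Length)]);
        }
        return builder.ToString();
    }
}
```

Note that this sketch makes no attempt to keep the generated names unique, which a column with a unique constraint would require.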

All in all, this was a good, fun, small weekend project which will lead to much easier testing using realistic data.

My next step is to add further capabilities to the code to allow it to anonymise whole objects at once rather than individual fields. This would be useful, for example, to allow Username and LoweredUsername to be set at the same time to similar values.
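
One possible shape for this, sketched purely as an assumption about a feature that does not exist yet, is a rule that receives the whole object and can therefore set several fields from one generated value. ObjectAnonymisation and UserRecord are hypothetical names, with UserRecord standing in for the generated aspnet_Users class.

```csharp
using System;

// Stand-in for the generated aspnet_Users class (hypothetical).
public class UserRecord
{
    public string UserName { get; set; }
    public string LoweredUserName { get; set; }
}

// Hypothetical whole-object rule: the action sees the entire entity at once.
public class ObjectAnonymisation<TEntity>
{
    private readonly Action<TEntity> apply;

    public ObjectAnonymisation(Action<TEntity> apply)
    {
        if (apply == null) throw new ArgumentNullException("apply");
        this.apply = apply;
    }

    public void Execute(TEntity entity)
    {
        apply(entity);
    }
}

public static class ObjectAnonymisationExample
{
    public static UserRecord Run()
    {
        var rule = new ObjectAnonymisation<UserRecord>(u =>
        {
            // Generate one value, then derive the related field from it.
            string name = "User_" + Guid.NewGuid().ToString("N").Substring(0, 8);
            u.UserName = name;
            u.LoweredUserName = name.ToLowerInvariant();
        });

        var user = new UserRecord { UserName = "RealName", LoweredUserName = "realname" };
        rule.Execute(user);
        return user;
    }
}
```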

I may well release the code under an open source license once it is nearer completion (and I am less embarrassed about the odd hack or two that I have implemented to speed things up…)
