To create functional tests for your web applications, you need data in your database. The crucial thing about this test data is that it should be stable. Otherwise, your tests are not deterministic and that makes them useless. For example, if your test uses an Employee record with ID = 100 and Name = “Bob Smith”, you better make sure that Bob Smith with ID 100 is available each time you run the test.

Because test data is such a critical part of the app and because it needs to be stable, it should really be treated as any other source artifact (like code or database schema). It should be under source control and you should have a strategy on how to manage it properly.

Choosing a Format

Let’s first consider a simple, but important aspect of test data management: the format you store it in. Since databases can be populated in multiple ways (restore from a single backup file, INSERT statements, etc), you have a choice as to the format you use to store your test data. You can store the binary representation (backup) or text files corresponding to each table.

I prefer text files because it’s easier to see what changed. You can simply compare multiple versions and find exactly which records were added on a given check-in.

Creating Test Data

Next, let’s discuss creating and updating your test data. In theory, creating test data is simply a matter of adding records to the right CSV files and checking them in. In practice, this is tricky for a few reasons:

Because of this, it’s probably better to generate data directly in the database. Often, the easiest way to do that is by going through the app. For example, if you need an Employee record, just use the application to create one.

There is another caveat to be aware of. Because of the key issues I mentioned earlier, data generation should be atomic. In other words, multiple people cannot be generating test data at the same time because they may inadvertently generate different records with the same key.

To avoid problems, you should have a process to ensure that only one person at a time can generate test data. If your test data is in source control, you can check out the CSV files with an exclusive lock, update them, and check back in.

Another strategy is to use some kind of token to ensure exclusivity. It could be as simple as a plush toy or as fancy as an online app which shows who has the token. All team members know and agree that only the person who has the token is allowed to generate test data.


There are some tools available to manage test data, Jailer probably being the most popular. Our team uses a great open source project called nDump (gitHub link). nDump can both export a database into CSV files as well as update from them. It also plays well with CI (Jenkins in our case) because it can run from the command line.

You may also like:

Did you love / hate / were unmoved by this post?
Then show your support / disgust / indifference by following me on Twitter!

Comments are closed.