Continuous database migration for MongoDB using Spring

In this post I introduce a continuous database migration mechanism for MongoDB using Java and Spring Boot.

Intro

In a development process with Continuous Deployment, the database is continuously adapted as well. Typical database adaptations are, for example, initialising new attributes with proper values for all documents, or performing cleanups. It is advantageous to store these change scripts in the version control system. This way the database stays in sync with the source code, as long as the scripts are executed in the correct order.

For many SQL databases you can use one of the powerful Java libraries Liquibase or Flyway to manage database changes (such as adding columns or modifying existing table rows).
With MongoDB, however, things get more difficult. MongoDB is a non-relational database, so Liquibase and Flyway cannot be used for it.
A MongoDB database consists of collections of JSON-like documents (comparable to tables and rows in SQL). To add a new attribute to a document you only need to change the document-related POJO in the source code. However, the new attribute is only written to the database when a new document is inserted or an existing document is updated; until then it will default to null, false or 0. You may want to set your own default values or apply other changes directly to the database. The common way is to connect to the database manually and execute JavaScript code to apply the changes.

Once you work on a larger project, though, you cannot rely on manual changes anymore; you need managed database migrations. For this missing migration mechanism I have worked out a solution: the idea is to scan a folder containing update scripts and to apply them to the database if needed. The database stores the executed scripts in order to distinguish them from unexecuted ones. This check is hooked into the application launch, so whenever the application starts, it looks for new scripts. The mechanism is fully integrated into the Spring Boot environment and uses key features such as the @Service and @Autowired annotations.
You can find the full project at GitHub.

Prerequisites

This feature requires certain prerequisites. Aside from a working Spring Boot application, these are the dependencies in detail:

  • Java 1.7 or later
  • Spring Boot 1.3.0 or later
  • Spring Data MongoDB 1.7.0 or later (Spring Boot 1.3.0 ships with Spring Data MongoDB version 1.8.1)
  • Apache Commons IO (optional, but helpful)
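
If you use Maven, the dependencies above map roughly to these coordinates (the Commons IO version is just an example; use whatever matches your project):

```xml
<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-data-mongodb</artifactId>
</dependency>
<dependency>
	<groupId>commons-io</groupId>
	<artifactId>commons-io</artifactId>
	<version>2.4</version>
</dependency>
```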

For the Spring Boot Application you need a mongo configuration similar to this one:

MongoConfig.java
@Configuration
@EnableMongoRepositories
public class MongoConfig extends AbstractMongoConfiguration {

	@Override
	protected String getDatabaseName() {
		return "mydatabase";
	}

	@Override
	public Mongo mongo() throws Exception {
		return new MongoClient("127.0.0.1", 27017);
	}

	@Override
	public MongoTemplate mongoTemplate() throws Exception {
		return new MongoTemplate(mongo(), getDatabaseName());
	}
}

For later use we will need the database name ("mydatabase") and an instance of MongoTemplate.

Continuous database migration

In this section I will go through the necessary steps to build the continuous database migration feature.
As described in the Intro section, the migration feature triggers upon application launch and applies scanned scripts to the database if needed.

Step 1: Create a POJO for database scripts:

DatabaseScript.java
@Document
public class DatabaseScript {

	private String name;
	private String body;

	public DatabaseScript(String scriptName, String scriptBody) {
		this.name = scriptName;
		this.body = scriptBody;
	}

	// getter and setter
}

The purpose of this class is to create objects for the scanned scripts and save them to the database. This way the database knows which scripts have already been executed. It also works for multiple developers, each with their own local database: on application start, each database is brought up to date with the scripts.
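
For illustration, an executed script ends up stored in the databaseScript collection (the default collection name Spring Data derives from the class name) roughly like this when viewed in the mongo shell; the values are made up:

```js
{
	"_id" : ObjectId("..."),
	"name" : "001_anonymize_customer.js",
	"body" : "function() { ... }"
}
```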

Step 2: Create a class which handles the scripts. Note that the runScripts method receives the scripts as a parameter.

MongoDBMigrator.java
@Service
public class MongoDBMigrator {

	@Autowired
	private MongoTemplate mongoTemplate;

	public void runScripts(Resource[] resources) throws IOException {
		
		List<String> executedScriptNames = getExecutedScriptsFromDatabase();
		Map<String, String> allScripts = readAllScripts(resources);

		for(String scriptName : sortScriptNames(allScripts.keySet())) {
			if (!scriptInDatabase(executedScriptNames, scriptName)) {
				execute(scriptName, allScripts.get(scriptName));
			}
		}
	}

	private List<String> getExecutedScriptsFromDatabase() {
		List<DatabaseScript> databaseScripts = mongoTemplate.findAll(DatabaseScript.class);
		
		List<String> executedScriptNames = new ArrayList<>();
		for(DatabaseScript s : databaseScripts) {
			executedScriptNames.add(s.getName());
		}
		return executedScriptNames;
	}

	private Map<String, String> readAllScripts(Resource[] resources) throws IOException {
		Map<String, String> scripts = new HashMap<>();

		for(Resource r : resources) {
			String scriptName = r.getFilename();
			String script = getResourceContent(r);
			script = appendFunctionBody(script);
			scripts.put(scriptName, script);
		}
		return scripts;
	}

	protected String getResourceContent(Resource r) throws IOException {
		try(InputStream is = r.getInputStream()) {
			return IOUtils.toString(is, Charsets.UTF_8);
		}
	}

	private String appendFunctionBody(String string) {
		return "function() {" + string + "}";
	}

	private TreeSet<String> sortScriptNames(Set<String> allScriptNames) {
		return new TreeSet<>(allScriptNames);
	}

	private boolean scriptInDatabase(List<String> executedScriptNames, String scriptName) {
		return executedScriptNames.contains(scriptName);
	}

	private void execute(String scriptName, String scriptBody) {
		ScriptOperations scriptOps = mongoTemplate.scriptOps();
		scriptOps.execute(new ExecutableMongoScript(scriptBody));
		mongoTemplate.save(new DatabaseScript(scriptName, scriptBody));
	}
}
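
The selection logic in runScripts (sort all names, skip those already recorded) can be exercised in isolation. A minimal standalone sketch with made-up script names, no database required:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SelectionDemo {

	// Mirrors the loop in runScripts: iterate names in sorted order,
	// keep only those not yet recorded in the database
	static List<String> pending(Set<String> allNames, List<String> executedNames) {
		List<String> result = new ArrayList<>();
		for (String name : new TreeSet<>(allNames)) {
			if (!executedNames.contains(name)) {
				result.add(name);
			}
		}
		return result;
	}

	public static void main(String[] args) {
		Set<String> all = new HashSet<>(Arrays.asList(
				"002_cleanup.js", "001_anonymize_customer.js", "003_set_defaults.js"));
		List<String> executed = Arrays.asList("001_anonymize_customer.js");
		System.out.println(pending(all, executed));
		// prints [002_cleanup.js, 003_set_defaults.js]
	}
}
```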

Step 3: Next, create a folder at src/main/resources/db where the database scripts will be stored. Later this folder will be scanned by the application for .js files.
I have added a test script which modifies each customer by anonymizing the last name. While this example isn’t necessarily the common use case, it shows what a script can look like. A more common update script would set a default value for a new attribute.

001_anonymize_customer.js
var process = function(collection) {
	return function(customer) {
		var oldname = customer.lastname;
		if(oldname != null && oldname.length > 0) {
			customer.lastname = oldname.charAt(0) + '.';
		}
		collection.save(customer);
	}
};

db = db.getSiblingDB('mydatabase');
db.customer.find().forEach(process(db.customer));

I have added an incrementing prefix (001_) to the script name. It ensures that scripts will be executed in the correct order.
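
The ordering comes from TreeSet, so it is lexicographic rather than numeric, and the zero-padding is what keeps it correct. A quick standalone check (the file names are made up):

```java
import java.util.Arrays;
import java.util.TreeSet;

public class PrefixOrderDemo {
	public static void main(String[] args) {
		// Zero-padded prefixes sort in the intended order ...
		TreeSet<String> padded = new TreeSet<>(Arrays.asList("010_c.js", "001_a.js", "002_b.js"));
		System.out.println(padded); // [001_a.js, 002_b.js, 010_c.js]

		// ... while unpadded ones do not: "10" sorts before "2" lexicographically
		TreeSet<String> unpadded = new TreeSet<>(Arrays.asList("10_c.js", "1_a.js", "2_b.js"));
		System.out.println(unpadded); // [10_c.js, 1_a.js, 2_b.js]
	}
}
```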

Now you can add database scripts to this folder. They are written in JavaScript. My usual approach in writing a new script is:

  1. Make a backup of your local database.
  2. Use the mongo shell to modify data by hand. You can use standard JavaScript and mongo shell commands in the console.
  3. Write a script to perform the modifications that you tested by hand. Be sure to replace mongo shell commands with their JavaScript equivalents.
  4. Test your script with the backup you made earlier.

Step 4: Finally call the MongoDBMigrator class from your main application class:

Application.java
@SpringBootApplication
public class Application implements CommandLineRunner {

	@Autowired
	private MongoDBMigrator migrator;

	@Autowired
	private ApplicationContext ctx;

	public static void main(String[] args) {
		SpringApplication.run(Application.class, args);
	}

	@Override
	public void run(String... args) throws Exception {
		migrator.runScripts(ctx.getResources("classpath:db/*.js"));
	}
}

The application scans the scripts located at src/main/resources/db and passes them to the method call. A developer just needs to add a script file to the script folder, and the script will automatically be executed upon the next application start.

Conclusion

With this setup it is really easy to perform database changes.
I think this concept has a lot of potential; there are still some improvements to be made, and at some point I would like to extend this project into a standalone open-source library. Feel free to send your feedback on this topic.
