cronn GmbH - Using OpenRewrite for large-scale refactoring

Our Starting Position

What makes OpenRewrite so compelling is its automated nature. Migrating your code base between Java versions or upgrading a framework becomes a more relaxed task: You add the corresponding so-called “recipe”, execute rewriteRun, verify the code with your automated tests and then you’re done. Instead of replacing imports by hand or fighting with Gradle because of a rogue transitive dependency, you can take a coffee break while OpenRewrite works in the background.

An OpenRewrite recipe contains the logic to do a specific task, such as replacing org.junit imports with their org.assertj equivalents. Due to the large user base and the open-source nature of most recipes, you can find recipes for everything from Spring Boot upgrades to switching from JUnit to AssertJ in minutes. In some cases, it might also be useful for enforcing code standards – much like an auto-formatter – where OpenRewrite can be integrated into the normal development pipeline, for example as a pre-commit hook.

How Does It Work?

There are “declarative” and “imperative” recipes which have different purposes. You can imagine declarative recipes like Lego. They are defined in a simple YAML file and typically consist of a list of existing recipes that should be executed together. Many of these recipes are available in OpenRewrite’s public repositories¹ and are designed for common tasks, such as dependency upgrades or framework migrations. For example, the AssertJ² recipe I mentioned earlier shows how an entire framework change can be automated with just a single declarative recipe.

Imperative recipes, on the other hand, are implemented in code. They define the actual logic that transforms your source code; in many cases by replacing old methods with new ones or changing an import. While there are many of these already available, OpenRewrite also provides a comprehensive Java API for writing your own recipes which we’ll explore in more detail next.

Lossless Semantic Tree and Visitor Pattern

OpenRewrite builds a Lossless Semantic Tree or LST³ when it is invoked. An LST, as its name suggests, is a much more detailed version of an AST (Abstract Syntax Tree). While the AST only contains the information necessary for evaluating the logical structure of the program, the LST includes whitespace information as well as a complete representation of the type relations. This means that once OpenRewrite has parsed a source file into an LST it can generate an exact replica from that LST alone. Because of this, local design abnormalities like an unusual indentation will be preserved as OpenRewrite doesn’t assume anything about your code styles. Additionally, because of the extensive type information, it can correctly identify the type of any given field. This is incredibly helpful if a recipe only wants to act on a very specific set of statements, for example for fixing a known vulnerability in a specific method from a package. OpenRewrite also uses this to verify that the new code uses existing types and doesn’t reference unavailable classes.
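To make the “exact replica” idea more concrete, here is a minimal sketch in plain Java. It is a toy model and not OpenRewrite’s LST: the Token record and the parse and print helpers are names made up for this illustration, and real LST nodes carry far more information, such as types. The only point it shows is that if every element remembers the whitespace in front of it, printing reproduces the input character for character.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a lossless token stream; NOT OpenRewrite's actual LST classes. */
class LosslessSketch {

    /** Each token remembers the whitespace that preceded it in the source. */
    record Token(String prefix, String text) {
        Token withText(String newText) {
            return new Token(prefix, newText);
        }
    }

    /** Splits the source into runs of non-whitespace, keeping the whitespace as prefixes. */
    static List<Token> parse(String source) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            int prefixStart = i;
            while (i < source.length() && Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            String prefix = source.substring(prefixStart, i);
            int textStart = i;
            while (i < source.length() && !Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            // Trailing whitespace ends up as a token with empty text
            tokens.add(new Token(prefix, source.substring(textStart, i)));
        }
        return tokens;
    }

    /** Prints the tokens back; prefix + text for every token restores the input. */
    static String print(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token token : tokens) {
            sb.append(token.prefix()).append(token.text());
        }
        return sb.toString();
    }
}
```

Parsing a line with unusual spacing and printing it again yields the identical string. Swapping the text of a single token changes only that token, while the spacing around it stays exactly as the author wrote it.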

Once the LST is built, we get a chance to modify it. OpenRewrite is designed around the visitor pattern⁴, which lets us define the behavior of a “visitor” that moves along the LST. Different visitor types exist to balance how much you’re able to change against what OpenRewrite can validate. For example, a JavaIsoVisitor isn’t allowed to replace a method declaration with a field; this is only possible when using a JavaVisitor. We define a visitor’s behavior by overriding visitX methods, which exist for all kinds of elements of a source file, such as class declarations, method declarations and invocations, or conditionals. Each of these methods receives a representation of the corresponding LST node: an immutable object that contains the information present in the source file. We can use it when we want to change something about the current element, such as renaming only those methods that start with “test”:

@Override
public J.MethodDeclaration visitMethodDeclaration(J.MethodDeclaration method, ExecutionContext executionContext) {
   if (method.getSimpleName().startsWith("test")) {
       // TODO: Rename this method
   }
   return super.visitMethodDeclaration(method, executionContext);
}

To give us more control over how the LST is traversed, OpenRewrite leaves it up to us to decide if and where we call super.visitX. OpenRewrite generally recommends starting any visitX method with the call to super. Omitting this call entirely means that the sub-tree is not traversed at all. This can improve performance; however, it isn’t needed in most cases. To expand on our example from above, let’s now change the method name. In OpenRewrite, the LST itself should not be mutated. Instead, we build a new “method object” that we then return from our method.

@Override
public J.MethodDeclaration visitMethodDeclaration(J.MethodDeclaration method, ExecutionContext executionContext) {
   // Call super first so that the sub-tree of this method is still traversed
   J.MethodDeclaration m = super.visitMethodDeclaration(method, executionContext);
   String methodName = m.getSimpleName();

   if (methodName.startsWith("test")) {
       // Build a new node instead of mutating the existing one
       String newName = "check" + methodName.substring("test".length());
       return m.withName(m.getName().withSimpleName(newName));
   }
   return m;
}

OpenRewrite detects that we returned an object different to what was passed into the method. It concludes that we must have changed something about the code and will store this new object in place of the old node in the LST. If you want to instead completely remove a statement, simply return null. In cases where you don’t want to do anything you should return super.visitX.
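The three possible outcomes of a visit (keep, replace, remove) can be sketched with a small model in plain Java. This is not OpenRewrite code: Method, ClassNode and visitMethods are hypothetical names for this illustration, and the comparison by object identity is an assumption of the toy model about how a change can be detected on immutable nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Toy model of immutable tree nodes; these are NOT OpenRewrite's J.* classes. */
class ImmutableTreeSketch {

    record Method(String name) {
        /** Returns the same instance when nothing changes, a new one otherwise. */
        Method withName(String newName) {
            return newName.equals(name) ? this : new Method(newName);
        }
    }

    record ClassNode(List<Method> methods) {
        /** Applies the visitor to every method and rebuilds the class only if needed. */
        ClassNode visitMethods(UnaryOperator<Method> visitor) {
            List<Method> result = new ArrayList<>();
            boolean changed = false;
            for (Method before : methods) {
                Method after = visitor.apply(before);
                if (after != before) {
                    changed = true;    // a different object (or null) signals a change
                }
                if (after != null) {
                    result.add(after); // null means: remove this element
                }
            }
            return changed ? new ClassNode(List.copyOf(result)) : this;
        }
    }
}
```

A visitor that returns its argument unchanged gets the very same ClassNode back, while a visitor that renames or drops a method produces a new tree and leaves the original untouched.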

After the visitor has traversed the whole LST, OpenRewrite runs our recipe’s visitor over it again. If it detects any further changes, it repeats this step until no more changes are made. To make sure that changes from our recipe did not cause a “regression” in another active recipe, it then re-runs all other recipes in a similar pattern. Once that finishes, it can confidently assert that all recipes have applied their logic to every single piece of code in the code base and every possible change has been made.
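The “repeat until nothing changes” idea can be sketched as a simple loop. The following is a simplified model in plain Java, not OpenRewrite’s scheduler: recipes are reduced to functions on a string, and runUntilStable and its maxCycles safety cap are assumptions made for this illustration.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/** Toy model of repeated recipe cycles; NOT OpenRewrite's actual scheduler. */
class CycleSketch {

    /** Runs all recipes in order, again and again, until a full pass changes nothing. */
    static String runUntilStable(String source, List<UnaryOperator<String>> recipes, int maxCycles) {
        String current = source;
        for (int cycle = 0; cycle < maxCycles; cycle++) {
            String before = current;
            for (UnaryOperator<String> recipe : recipes) {
                current = recipe.apply(current);
            }
            if (current.equals(before)) {
                break; // stable: no recipe had anything left to do
            }
        }
        return current;
    }
}
```

The loop matters when one recipe produces code that another recipe wants to change. If the first recipe rewrites oldApi() to newApi() and the second rewrites legacyApi() to oldApi(), a single pass over legacyApi() stops at oldApi(), and only a second pass reaches newApi().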

Lessons Learned

Because of the inherent complexity of this type of metaprogramming, a test-driven development approach is highly advisable. It allows you to effectively cover the many possible edge cases.

Something that OpenRewrite already warns about in its documentation is recipe state. Recipe state increases the risk that leftover data unexpectedly changes the behavior of your recipe. This not only introduces bugs that are difficult to find and fix, it also massively increases the complexity of your recipe. In our example above this can’t be avoided entirely, since we not only need to rename method declarations but also adjust any calls to those methods. This means we need to pass the information about our new names to visitMethodInvocation so that we can adjust the method calls accordingly.

The first option we have is the cursor. While the Java API of OpenRewrite itself doesn’t expose explicit methods like enterClass and exitClass, the cursor keeps track of exactly where we currently are in a stack-like structure, hence the name. It is cleared between every single cycle of a recipe and is best suited for communicating between two methods inside a visitor that run one after the other. This wouldn’t be suitable for our scenario since a method call may come from a completely different place in the code base.

Another possible solution is to put our information into the execution context. It is only ever cleared after all recipes have run, so it is a much more persistent storage location. There are some limitations that you need to keep track of, however. The execution context does not allow mutating stored data, to avoid hard-to-debug problems that occur due to state conflicts. You also need to make sure that you don’t overwrite data from other recipes.

The optimal way would be a ScanningRecipe⁵, where we first get the opportunity to scan the whole code base and collect information, after which a second visitor can apply changes.
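The scan-first, edit-second idea can be illustrated with a small model in plain Java. It deliberately does not use the OpenRewrite API: SourceFile, scan and edit are hypothetical names for this sketch, and a source file is reduced to the names it declares and the names it calls. The first phase collects all renames across every file; the second phase applies them to declarations and invocations alike.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a two-phase (scan, then edit) refactoring; NOT the OpenRewrite API. */
class ScanThenEditSketch {

    /** A source file reduced to the method names it declares and the ones it calls. */
    record SourceFile(List<String> declarations, List<String> invocations) {}

    /** Phase 1: look at ALL files and collect old name -> new name for "test" methods. */
    static Map<String, String> scan(List<SourceFile> files) {
        Map<String, String> renames = new HashMap<>();
        for (SourceFile file : files) {
            for (String name : file.declarations()) {
                if (name.startsWith("test")) {
                    renames.put(name, "check" + name.substring("test".length()));
                }
            }
        }
        return renames;
    }

    /** Phase 2: apply the collected renames to one file, returning a new SourceFile. */
    static SourceFile edit(SourceFile file, Map<String, String> renames) {
        return new SourceFile(
                rename(file.declarations(), renames),
                rename(file.invocations(), renames));
    }

    private static List<String> rename(List<String> names, Map<String, String> renames) {
        return names.stream().map(n -> renames.getOrDefault(n, n)).toList();
    }
}
```

Because the scan sees every file before any edit happens, a call to testLogin in one file is adjusted even though testLogin is declared in a different file, and no state has to be shared between unrelated visits.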

Final Thoughts

With an extensive collection of open-source recipes and a fleshed-out Java API, OpenRewrite is a great way to approach code refactoring at a large scale. While the in-memory nature of the LST will naturally become a bottleneck for bigger projects, Moderne’s custom solution addresses this problem by making it possible to split the tree generation and store the tree more permanently. While OpenRewrite is primarily focused on Java and the surrounding ecosystem, it also offers recipes for YAML, XML, JSON and even a few other languages like C# or Scala (although in a much more limited capacity). Further code examples can be found in the cronn GitHub⁶.

Share your thoughts! We look forward to your comments and questions at blog@cronn.de