Guardian of data history

Auditing plays a significant role in data management. It serves as the guardian of data history, ensuring that every alteration to data is not only tracked but also logged comprehensively. Essentially, auditing offers us a time-traveling mechanism: we can traverse the history of our data, discover who made which changes, and understand the when and why behind those alterations. As developers, we need to consider which auditing method fits a specific application and use case. In this article, I will lay out the differences between three auditing approaches for Java applications: Hibernate Envers, Spring Data JPA auditing, and database trigger-based auditing. Using a simple Java application built with Spring Boot, Spring Security, Spring Data JPA and Hibernate ORM as an example, we will explore which approach is most suitable for various situations and requirements.

First, the basics: Change Data Capture

Change Data Capture (CDC) describes a process in which changes made to data in a database are captured and propagated, typically in near real time, to a downstream system. It forms the basis of any auditing system: every auditing process needs to be notified that the audited data has changed. The change could be an insert, an update, or a delete operation. It is then either persisted in some way or used to trigger an event.
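To make the idea more tangible, here is a minimal in-memory sketch of the CDC principle in plain Java. It is not how any of the tools discussed below are implemented; the names ChangeEvent, ChangeType and ObservedTable are hypothetical and exist only for illustration. Every write to the "table" produces an event that carries the type of change and the values before and after it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Kind of change that was captured. */
enum ChangeType { INSERT, UPDATE, DELETE }

/** One captured change: what happened to which row, with the values before and after. */
final class ChangeEvent {
    final ChangeType type;
    final long id;
    final String before; // null for INSERT
    final String after;  // null for DELETE

    ChangeEvent(ChangeType type, long id, String before, String after) {
        this.type = type;
        this.id = id;
        this.before = before;
        this.after = after;
    }
}

/** Hypothetical in-memory "table" that notifies a listener about every change. */
final class ObservedTable {
    private final Map<Long, String> rows = new HashMap<>();
    private final Consumer<ChangeEvent> listener;

    ObservedTable(Consumer<ChangeEvent> listener) {
        this.listener = listener;
    }

    /** Inserts or updates a row and emits the matching event. */
    void put(long id, String value) {
        String before = rows.put(id, value);
        ChangeType type = (before == null) ? ChangeType.INSERT : ChangeType.UPDATE;
        listener.accept(new ChangeEvent(type, id, before, value));
    }

    /** Deletes a row and emits a DELETE event if the row existed. */
    void delete(long id) {
        String before = rows.remove(id);
        if (before != null) {
            listener.accept(new ChangeEvent(ChangeType.DELETE, id, before, null));
        }
    }
}
```

A listener could append each event to a list (an audit log) or forward it to another system. The approaches in the rest of the article differ mainly in where this capturing happens: in the ORM layer, in the database, or directly in the entity.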

Stay in the Java world: Hibernate Envers

Hibernate Envers presents a CDC solution that simplifies auditing within Java applications. What sets it apart is its seamless integration with Hibernate ORM, ensuring that you can stay firmly within the Java ecosystem. It works with one global REVINFO table as well as one audit table for each audited entity. To audit an entity, it is sufficient to annotate it with @Audited. Because Envers uses a deny-listing instead of an allow-listing approach, keep in mind that every property is audited automatically unless it is explicitly excluded, for example with @NotAudited. You will remember this once your database runs out of space because you forgot to exclude big files from auditing. Let us look at a simple example: imagine we have an entity called Dog with a few simple properties like name and height, and we mark it to be audited.

@Audited
@Entity
public class Dog {

	@Id
	private Long id;

	private String name;

	private Integer height;

	//Getters and setters

}

From this, Hibernate will generate three tables: the known entity table, the REVINFO table, and the entity audit table (see picture). The REVINFO table consists of a revision ID and a timestamp. Each row in the entity audit table references a revision ID in the REVINFO table. The audit table additionally includes the entity properties and a REVTYPE column that states the type of modification that occurred (insert, update, or delete). That means that the audit table holds a history of every revision of an entity. That comes in handy if one wants to fetch the state of an entity at any given point in time, for example to diff it with the current state. Envers provides numerous ways to fulfill your auditing desires by offering a ton of configuration possibilities: from excluding single properties from being audited, to customizing the auditing tables themselves. Read more about that in its documentation.

Three tables generated by Hibernate
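The following plain-Java sketch illustrates how an audit table with a revision number and a REVTYPE column makes it possible to reconstruct an entity's state at a given revision. It is a simplified model, not Envers code: the classes DogAuditRow and DogHistory are hypothetical, and only the REVTYPE values (0 = ADD, 1 = MOD, 2 = DEL) are taken from Envers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** One row of a simplified audit table for Dog (hypothetical model of the table described above). */
final class DogAuditRow {
    final long id;        // primary key of the audited Dog
    final int rev;        // revision number, references REVINFO
    final int revType;    // 0 = ADD, 1 = MOD, 2 = DEL
    final String name;
    final Integer height;

    DogAuditRow(long id, int rev, int revType, String name, Integer height) {
        this.id = id;
        this.rev = rev;
        this.revType = revType;
        this.name = name;
        this.height = height;
    }
}

/** Hypothetical in-memory stand-in for the audit table. */
final class DogHistory {
    private final List<DogAuditRow> rows = new ArrayList<>();

    void record(DogAuditRow row) {
        rows.add(row);
    }

    /** Returns the state of the dog as of the given revision, or empty if it did not exist then. */
    Optional<DogAuditRow> stateAt(long id, int revision) {
        DogAuditRow latest = null;
        for (DogAuditRow row : rows) {
            // Pick the newest audit row that is not newer than the requested revision.
            if (row.id == id && row.rev <= revision && (latest == null || row.rev > latest.rev)) {
                latest = row;
            }
        }
        if (latest == null || latest.revType == 2) {
            return Optional.empty(); // never created yet, or already deleted
        }
        return Optional.of(latest);
    }
}
```

If a dog is added in revision 1, modified in revision 3 and deleted in revision 5, asking for its state at revision 4 yields the row written in revision 3, while revision 5 yields nothing.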

To retrieve the audit data, Envers offers the AuditReaderFactory, which provides an AuditReader for querying entity state snapshots. For our example Dog entity and the instance with ID 1, fetching all snapshots looks like this:

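// selectEntitiesOnly = true: return Dog instances instead of [entity, revision info, revision type] arrays
// selectDeletedEntities = true: also include revisions in which the dog was deleted
List<Dog> snapshots = AuditReaderFactory.get(entityManager)
	.createQuery()
	.forRevisionsOfEntity(Dog.class, true, true)
	.add(AuditEntity.id().eq(1L))
	.getResultList();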

Pros:

  • Stays in the Java World: Hibernate Envers offers a smooth transition into the auditing world, keeping your entire process within the Java ecosystem. This is particularly advantageous if your application primarily revolves around Java.

  • Automatic Table Generation: Envers streamlines the auditing setup because the auditing tables can be generated automatically, including for databases like PostgreSQL. Tools like Liquibase can further simplify this process, making it a breeze to configure.

  • Database-Agnostic: One of the standout features of Envers is its database-agnostic nature. It doesn’t confine you to a particular type of database, ensuring flexibility across database systems such as PostgreSQL or MariaDB.

  • Revision at Point in Time: Envers empowers you to access data as it existed at a specific revision, offering granular insights for historical data analysis or diffing.

Cons:

  • Performance Overhead: Envers introduces additional SQL operations for each transaction that inserts, updates, or deletes audited entities. While these operations are essential for comprehensive auditing, they can impact overall performance because they run synchronously as part of the same transaction.

  • Limited Data Source: Envers primarily captures changes initiated within your application. This means that changes made through SQL consoles or other external applications might go unnoticed.

  • Lack of Bulk Support: Envers does not support bulk updates or deletions. Changes made through bulk statements (for example, JPQL UPDATE or DELETE queries) bypass Envers and are not audited, which can be a limitation in use cases where such operations are common.

Break out of your Java comfort zone: Database triggers

If you are not afraid of leaving the cozy safety provided by Java abstraction layers, want to improve the performance of your auditing solution, or desire more customization, trigger-based auditing could be the right choice for you. Let’s look at how we can transform our dog example to use trigger-based auditing. As a first step, we want to record any type of change in a separate dog_audit_log table which contains a new_value and an old_value column of type jsonb. We use jsonb as the column type here for simplicity and durability reasons: if a column is added to or removed from the dog table, the structure of our audit log table is not affected.

Two tables, the left one depicts the entity 'dog', the right one the audit log

To achieve this, we need to implement a function in the PostgreSQL database which writes the changes to the audit log table. Additionally, we need to implement a trigger which executes this function whenever an INSERT, UPDATE or DELETE statement is executed on the dog table.

CREATE TRIGGER dog_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON dog
FOR EACH ROW EXECUTE FUNCTION dog_audit_trigger_func();

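There are also more automated CDC solutions available that do not rely on database triggers, e.g., Debezium. It is typically used together with Kafka and extracts the CDC events from the transaction log of the database (for example the MySQL binary log or the PostgreSQL write-ahead log).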

To retrieve the audit data, one must “manually” query the dog_audit_log and join it with the dog table.
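Once an audit row has been loaded, the old_value and new_value documents still have to be interpreted by hand. The sketch below shows one possible way to do that in plain Java. It assumes the two jsonb documents have already been parsed into maps (for example by a JSON library); the class AuditDiff and its method are hypothetical helpers, not part of PostgreSQL or any framework.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that compares the old_value and new_value of one audit log row. */
final class AuditDiff {

    /**
     * Returns, for every field whose value differs, a description "old -> new".
     * A null map stands for a missing document (old_value on INSERT, new_value on DELETE).
     */
    static Map<String, String> changedFields(Map<String, Object> oldValue,
                                             Map<String, Object> newValue) {
        Map<String, Object> before = (oldValue == null) ? Collections.<String, Object>emptyMap() : oldValue;
        Map<String, Object> after = (newValue == null) ? Collections.<String, Object>emptyMap() : newValue;

        // Look at the union of keys so that added and removed columns show up as well.
        Set<String> keys = new TreeSet<>(before.keySet());
        keys.addAll(after.keySet());

        Map<String, String> changes = new LinkedHashMap<>();
        for (String key : keys) {
            Object oldField = before.get(key);
            Object newField = after.get(key);
            if (!Objects.equals(oldField, newField)) {
                changes.put(key, oldField + " -> " + newField);
            }
        }
        return changes;
    }
}
```

For an update that only changes the height from 40 to 45, the result contains a single entry for height. Because the comparison works on whatever keys are present, it keeps working when columns are added to or removed from the dog table, which matches the reason for choosing jsonb in the first place.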

Pros:

  • Decoupled Architecture: One of the most significant advantages of trigger-based auditing is its decoupled architecture. It operates independently of your application, providing a non-intrusive auditing solution.

  • Comprehensive Auditing: It captures all changes, not just those initiated within your application. This breadth makes it suitable for auditing data from various sources, ensuring that no change goes unnoticed.

Cons:

  • Not Coupled with Hibernate: Trigger-based auditing is not tightly integrated with Hibernate, which may make it a less convenient choice if your application heavily relies on Hibernate for data management.

  • Cascading Trigger Hell: Be aware of possible side effects of other database operations that could initiate cascading triggers.

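  • Metadata Retrieval Challenges: Extracting metadata, such as which application user made a change, can be more challenging, because the database only knows the database user of the connection. This may require additional effort and customization to achieve the desired results.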

Keep it simple: Spring Data JPA

For basic auditing requirements, Spring Data JPA offers a straightforward solution. It enables you to track when and by whom an entity was created or modified, providing basic auditing capabilities without the intricacies associated with Hibernate Envers or database triggers. To achieve this, we can use simple annotations provided by Spring Data. Additionally, we need to add the @EnableJpaAuditing annotation to our configuration class and, for the @CreatedBy and @LastModifiedBy fields, provide an AuditorAware bean that supplies the current user (for example from Spring Security). Let’s look at our example Dog entity again:

@Entity
@EntityListeners(AuditingEntityListener.class)
public class Dog {

	@Id
	private Long id;

	private String name;

	private Integer height;

	@CreatedDate
	private Instant createdAt;

	@CreatedBy
	private String createdBy;

	@LastModifiedDate
	private Instant modifiedAt;

	@LastModifiedBy
	private String modifiedBy;

	//Getters and setters

}

We simply add four more attributes to our entity which track when and by whom it was created or modified. In contrast to Envers and trigger-based solutions, these attributes are not saved in a separate table and do not include a snapshot of the previous state. That means that Spring Data JPA does not keep a history of revisions of your entity anywhere, only when and by whom the current version was created and last edited. While this solution does not offer the level of customization provided by Hibernate Envers, it is possible to extend or modify its capabilities by, e.g., modifying the EntityListener that is used. Since we do not save snapshots of an entity, we can only query the current entity object, including when and by whom it was created or last modified.
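The consequence of storing the audit information in the entity itself can be sketched in plain Java. The following is a simplified stand-in, not Spring's AuditingEntityListener: the classes AuditedDog and SimpleAuditingListener are hypothetical, and setting the modification fields on creation is an assumption meant to mirror the default behavior.

```java
import java.time.Instant;

/** Simplified Dog with the four audit attributes from the entity above. */
final class AuditedDog {
    String name;
    Instant createdAt;
    String createdBy;
    Instant modifiedAt;
    String modifiedBy;
}

/** Hypothetical stand-in for an auditing listener that fills the audit attributes. */
final class SimpleAuditingListener {

    /** Called before the first save: sets creation and modification fields. */
    static void onCreate(AuditedDog dog, String user, Instant now) {
        dog.createdAt = now;
        dog.createdBy = user;
        dog.modifiedAt = now;
        dog.modifiedBy = user;
    }

    /** Called before every later save: overwrites only the modification fields. */
    static void onUpdate(AuditedDog dog, String user, Instant now) {
        dog.modifiedAt = now;
        dog.modifiedBy = user;
    }
}
```

After a creation by one user and two later updates by others, the entity only knows its creator and the most recent editor. Everything in between is overwritten, which is exactly the missing revision history described above.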

Pros:

  • Simplicity: Spring Data JPA shines in its simplicity. It’s easy to set up and use, making it an excellent fit for projects with basic auditing requirements.

  • Performance: Due to its lightweight nature, Spring Data JPA auditing introduces minimal performance overhead, ensuring that your application’s performance remains largely unaffected.

Cons:

  • Limited Auditing Features: Spring Data JPA, while simple and lightweight, offers only basic auditing capabilities. It is suitable for straightforward use cases but falls short when it comes to more advanced auditing requirements.

  • Inability to Capture Delete Operations: Spring Data JPA auditing does not capture delete operations: once an entity is deleted, its audit attributes are gone with it. This limits its effectiveness in comprehensive auditing scenarios where tracking deletions is essential.

Conclusion

Selecting the right auditing approach depends, as always, on your project’s specific requirements and priorities.

Hibernate Envers offers seamless integration, keeping you firmly in the Java world, but it comes with potential performance overhead.

Trigger-based auditing is the go-to choice for capturing changes from various sources, but its setup may be more complex, and it lacks tight integration with Hibernate.

Spring Data JPA is ideal for basic auditing needs, prioritizing simplicity and minimal performance impact. However, it may not suffice for advanced auditing scenarios.