Project context

GA-Lotse is a modular web application for health authorities that is intended to simplify internal documentation and external communication with citizens. Different departments are mapped to modules, which can then be configured by the individual health authorities. To ensure that the application meets the highest security standards, data is stored separately for each module. This and other security features – such as the Zero Trust principle – lead to intrinsic performance losses, which is why performance testing was an important part of the project.

Selecting the load testing tool

You often don't have to implement everything yourself, so we looked for an existing tool that supports performance testing. Since we wanted to test a web application, the tool had to allow browser testing. Our additional requirements were as follows:

  • The ability to write the test code in TypeScript, as we also use TypeScript for the frontend of the application and the end-to-end tests

  • Open-source availability of the tool

  • Executability on a self-hosted server (not a pure cloud solution)

  • Good reporting to visualize the results of the tests for us and the developers

After evaluating several tools, we decided on k6. k6 supports browser tests, enables development in TypeScript, and, in combination with Grafana and individually definable metrics, offers comprehensive reporting.

Our setup

k6 runs the performance tests and generates a number of metrics out of the box, such as the time to first byte (TTFB) or the duration of individual requests. However, to visualize these and other test results, we needed additional tools. We chose InfluxDB as the database, as it is optimized for storing time-series data. To visualize the results, we used Grafana dashboards: k6 is part of the Grafana ecosystem, and Grafana provides an interface to InfluxDB. To query the data from InfluxDB, we used its own query language, Flux. However, this is not a long-term solution, as Flux will probably no longer be supported – or only to a limited extent – in the next major InfluxDB version. We decided to run the tools locally and package them in Docker containers in order to run the tests independently of specific hardware and without depending on cloud providers. Alternatively, Grafana Cloud k6 can be used to avoid installing the tools locally.

Performance testing with k6

A test with k6 can be executed with a JavaScript or TypeScript file (see the example script below).

import { Options, Scenario } from "k6/options";
import { schoolEntryBrowserTest } from "@/modules/browser/schoolEntryBrowserTest";
import { schoolEntryApiTest } from "@/modules/api/schoolEntryApiTest";

const scenarios: Record<string, Scenario> = {
  // Browser scenario: 3 virtual users run the journey in parallel for 15 minutes
  schoolEntryBrowser: {
    exec: 'schoolEntryBrowserTestFunction',
    executor: 'constant-vus',
    vus: 3,
    duration: '15m',
    options: {
      browser: {
        type: 'chromium',
      }
    }
  },
  // API scenario: ramp from 1 VU up to 5 and back down again
  schoolEntryApi: {
    exec: 'schoolEntryApiTestFunction',
    executor: 'ramping-vus',
    startVUs: 1,
    stages: [
      { target: 3, duration: '5m' },
      { target: 5, duration: '5m' },
      { target: 3, duration: '5m' },
    ]
  }
};

export const options: Options = {
  discardResponseBodies: true,
  scenarios: scenarios,
  systemTags: ['status', 'url', 'check', 'scenario'],
  setupTimeout: '5m',
};

export async function schoolEntryBrowserTestFunction() {
  await schoolEntryBrowserTest();
}

export async function schoolEntryApiTestFunction() {
  await schoolEntryApiTest();
}

This script defines the options for the test and the test functions to be executed. The options are defined as a plain JavaScript object. An important option that determines the course of the test is scenarios. This is where executable scenarios are defined, which map the actual test.

To define a scenario, one must specify a function to be executed, as well as the number of parallel users, which in k6 are called Virtual Users (VUs). The total duration of the scenario is determined by specifying time periods. In addition, ramps can be defined to increase or decrease the number of parallel users during the test. Another way to influence the course of the test is to set a time interval in which a specific number of VUs should run through the scenario.
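For example, the total runtime of a ramping scenario is simply the sum of its stage durations. The following helper is our own standalone sketch (not part of k6) that makes this explicit, e.g. for sanity-checking a configuration before a long run:

```typescript
// Standalone sketch (not part of k6): compute the total runtime of a
// 'ramping-vus' scenario from its stages.

interface Stage {
  target: number;   // VU count to ramp towards
  duration: string; // k6-style duration string, e.g. '5m', '30s', '1h'
}

// Parse a k6-style duration string into seconds.
function durationToSeconds(duration: string): number {
  const match = /^(\d+)(s|m|h)$/.exec(duration);
  if (!match) throw new Error(`Unsupported duration: ${duration}`);
  const factors = { s: 1, m: 60, h: 3600 };
  return Number(match[1]) * factors[match[2] as 's' | 'm' | 'h'];
}

// Sum the stage durations to get the scenario's total runtime.
function totalDurationSeconds(stages: Stage[]): number {
  return stages.reduce((sum, stage) => sum + durationToSeconds(stage.duration), 0);
}

// The ramping stages from the example script above:
const stages: Stage[] = [
  { target: 3, duration: '5m' },
  { target: 5, duration: '5m' },
  { target: 3, duration: '5m' },
];

console.log(totalDurationSeconds(stages)); // 900 seconds = 15 minutes
```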

Several such scenarios can be defined for a test, which are then run with different configurations. To make defining scenarios easier and faster than editing a long configuration object by hand, we developed a builder that dynamically creates the scenario configuration; it is available on GitHub: https://github.com/cronn/k6-scenario-builder.
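The core idea can be sketched in a few lines: small factory functions assemble scenario objects, which compose more readably than one long literal. This is a simplified illustration of the approach, not the actual API of k6-scenario-builder:

```typescript
// Simplified sketch of the builder idea (not the actual API of
// k6-scenario-builder): assemble k6 scenario objects programmatically.

type Scenario = Record<string, unknown>;

// A scenario with a constant number of VUs for a fixed duration.
function constantVus(exec: string, vus: number, duration: string): Scenario {
  return { exec, executor: 'constant-vus', vus, duration };
}

// A scenario that ramps the VU count through the given stages.
function rampingVus(
  exec: string,
  startVUs: number,
  stages: { target: number; duration: string }[]
): Scenario {
  return { exec, executor: 'ramping-vus', startVUs, stages };
}

// Decorate a scenario with the browser options k6 requires for browser tests.
function withBrowser(scenario: Scenario): Scenario {
  return { ...scenario, options: { browser: { type: 'chromium' } } };
}

// The same configuration as the example script, built programmatically:
const scenarios: Record<string, Scenario> = {
  schoolEntryBrowser: withBrowser(
    constantVus('schoolEntryBrowserTestFunction', 3, '15m')
  ),
  schoolEntryApi: rampingVus('schoolEntryApiTestFunction', 1, [
    { target: 3, duration: '5m' },
    { target: 5, duration: '5m' },
    { target: 3, duration: '5m' },
  ]),
};

console.log(Object.keys(scenarios)); // [ 'schoolEntryBrowser', 'schoolEntryApi' ]
```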

Our findings

During testing, we noticed a few things that need to be taken into account. First of all, it makes sense to have a dedicated machine available to run the tests. Performance is affected not only by the load of many simultaneous users, but also by the amount of data in the database. We therefore created both short spike tests and test scenarios with a runtime of several hours, in order to constantly increase the amount of data and simulate a kind of time-lapse of the application's actual use. Such tests are carried out much more comfortably on a dedicated machine than on your own laptop.

In addition, the execution of a test requires sufficient resources on the executing machine. Care should therefore be taken that free resources are always available during a test run, so as not to unintentionally influence the results. We noticed this when running browser tests with several VUs: too many simultaneously open browsers turned the machine into a bottleneck. Our solution is to define, alongside the browser tests, API scenarios that depict the same user journey but send the necessary requests directly to the backend. This increases the load on the backend without launching a browser. Such API scenarios are also well suited for quickly assembling a scenario and getting an overview of the backend's performance.
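One way to keep the browser and API variants of a journey from drifting apart is to describe the journey's steps as data and derive both scenario types from the same definition. The following is our own minimal sketch of that idea; the step names and endpoints are invented for illustration:

```typescript
// Sketch (step names and endpoints invented for illustration): describe a
// user journey once, then derive the API variant from it. The browser
// variant would walk the same steps through the UI.

interface JourneyStep {
  name: string;
  endpoint: string; // backend endpoint the step ultimately hits
}

const schoolEntryJourney: JourneyStep[] = [
  { name: 'open appointment list', endpoint: '/api/appointments' },
  { name: 'create appointment', endpoint: '/api/appointments' },
  { name: 'confirm booking', endpoint: '/api/bookings' },
];

// The API scenario sends these requests directly to the backend; here we
// only collect the calls, so the sketch stays runnable without k6.
function apiCallsFor(journey: JourneyStep[]): string[] {
  return journey.map((step) => step.endpoint);
}

console.log(apiCallsFor(schoolEntryJourney));
// [ '/api/appointments', '/api/appointments', '/api/bookings' ]
```

Keeping the journey definition in one place means a change to the flow is picked up by both the browser scenario and the lighter-weight API scenario.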

Another insight was to test in an environment as close to production as possible. After all, the configuration of an environment, especially a complex microservice cluster, can have a significant impact on performance. In addition to running the tests from another machine and testing in a production-like environment, it was still important to us that tests could also run entirely on a developer's laptop. This allows developers to develop new scenarios independently and gives them easy access to databases and logs.

It also happened that our scenario configuration exceeded realistic domain limits, especially during long tests. For example, we created an unrealistic number of appointments for a single day or user, or had too many users with the same permissions. Many different parameters can influence performance and should therefore be defined as early as possible, which avoids unnecessary test runs. Nevertheless, it was also important for us to deliberately exceed the known limits in order to test the limits of the application and then improve it where necessary. After all, the customer may not know their domain limits, or those limits might be exceeded through technical errors. The application should not become unusable because a user booked one appointment too many. One lesson learned was therefore to clarify domain limits at an early stage and to observe them in the tests.

Pros and Cons of k6

We ran into problems from time to time during testing with k6. A significant limitation when developing performance tests with k6 is the lack of a debugger: k6 uses its own JavaScript engine to execute the test code, and no debugger is built in. This JavaScript engine has other weaknesses you should be aware of; for example, it does not support the popular Fetch API. In the context of browser tests, methods such as goto() are a weak point, as they do not always work reliably in combination with Chromium, which occasionally leads to timing problems. In addition, we had to identify locators via XPath expressions, which are brittle with regard to regressions, as well as often unsightly and long. Finally, the k6 documentation is often rather sparse.

However, k6 also has many advantages. The reporting in combination with InfluxDB and Grafana works very well: meaningful plots can be created quickly without much prior knowledge and displayed in a dashboard, so that test results can be analyzed and communicated. The parallel execution of different scenarios, each of which in turn runs with parallel virtual users, also works very well. It allows you to create complex scenarios that map different types of performance tests, such as load tests, spike tests, and soak tests. The fact that the test options (and especially the scenarios) are described as plain objects is an advantage, as it provides a smooth transition to the TypeScript code. You also have the option of running the browser tests in headful mode, so that problems can be detected and fixed during execution.

Summary

Since we constantly developed both our tests and our setup during the test phase, an iterative approach paid off for us. We started with two simple scenarios for application-critical modules. With these initial scenarios, we realized that we needed more metrics and plots in our reports to analyze the results. We then iteratively added metrics to our tests and visualized them in the Grafana dashboard. These metrics included information such as the duration of requests, the loading times of certain pages, and the CPU and RAM usage of the executing machine. The duration of individual requests was particularly important for us, but which information is relevant depends on the application. The metric types built into k6 allow the collection of such information to be designed flexibly. Working with k6 has shown us both strengths and weaknesses of the tool. Whether k6 is the best choice certainly depends on the use case, but for us it was a suitable tool despite some significant weaknesses.
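k6's built-in metric types are Counter, Gauge, Rate, and Trend; a Trend, for example, aggregates recorded values such as request durations into minimum, maximum, average, and percentiles. To illustrate what such a metric reports, here is a standalone percentile computation (our own sketch using the nearest-rank method, not k6 code):

```typescript
// Standalone sketch (not k6 code): what a Trend-style metric reports.
// A Trend aggregates recorded values into min/max/avg/percentiles.

// Nearest-rank percentile: the smallest value such that at least
// p percent of all values are less than or equal to it.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Example: recorded request durations in milliseconds.
const durations = [120, 95, 310, 101, 250, 98, 400, 130, 110, 105];

console.log(percentile(durations, 95)); // 400
console.log(percentile(durations, 50)); // 110
```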