Tech Talk: "Application Load Testing with k6" mit Daniel Knittl-Frank von eurofunk Kappacher

Hi, I'm Daniel, and for the next ten minutes or so I will talk about application load testing with k6. I am a GNU/Linux enthusiast, a part-time lecturer at the University of Applied Sciences Upper Austria, and during my day job I am an expert developer at Eurofunk with a strong focus on application performance.

Eurofunk is a solution provider for emergency centers and command and control centers, with its headquarters in Sankt Johann and further offices in Hagenberg and Salzburg. Eurofunk has more than 500 employees from more than 23 nations, and our customers are the public sector – police, fire brigades, and ambulances – as well as industry and larger airports. Eurofunk is a full solution provider for command centers: in a command center or emergency center, everything you see in this picture is provided by Eurofunk, from the desks to the network cables, the video walls, the computers, and of course the software.

So, when you call the emergency hotline 112, a dispatcher will pick up, you can talk to him or her, and they will enter your information into our system, the Eurofunk operation center suite. It's a large-scale web application that can handle a multitude of data and has a streamlined UI for incident management, so your emergency is handled quickly. It can be used to dispatch units to incident locations, and once the units are dispatched you can see real-time information about their locations on a map, as well as the status of the units and the status of the incidents.

Directly in eOCS – the operation center suite – you have voice communication in your browser via WebRTC, so clicking the take-call button will connect the caller to a dispatcher. Once units are dispatched, the dispatchers can also use the browser to communicate with the radio devices that the units carry in the field. External event sources can also be easily integrated into eOCS: we have integrations for alarm systems, fire detectors, smoke detectors, CCTV and video surveillance, and external web services that provide additional information for incidents.

eOCS is mission critical. So it's important that it has high availability and low latency, that response times are consistent, and that information is always up to date. If you ask a dispatcher, they will tell you that in an emergency situation every single second counts, and that's why we need to test our performance. We do that with k6, an open-source load testing tool. It's AGPL-licensed, and k6 was acquired by Grafana Labs this summer. It's built with Go, and you write the tests in JavaScript. There is some support for newer JavaScript features, with more on the way, but at the moment it's limited to ES5.1 – although there is module support, which makes writing tests a lot easier.

When you write a test, it usually follows the same structure. You have a setup method and a teardown method, which create data in your system, delete data again, or just handle login and logout of users. You have a summary, which you can customize, generated at the end of the test. And most importantly, you have the default function, which is called by the virtual users of the testing framework. Each virtual user executes the default function in a loop and performs HTTP requests, gRPC requests, or whatever you like to implement.

k6 is a command-line tool, so you start a test with k6 run on the command line, and you then get an interactive view of the test while it's running: it shows logs as well as the elapsed and remaining duration.

Inside a test you will usually want to apply checks to the requests that you make. You can apply arbitrary rules to any object in your test; usually these are the HTTP responses, so you might check: is the HTTP status code 200 OK, is the content type JSON, does the body contain an ID that matches a specific pattern?
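The structure described above can be sketched as a minimal k6 script. This is an illustrative example, not one of our actual tests; the URL is a public k6 demo endpoint, and the script only runs under the k6 runtime, not plain Node.js.

```javascript
// Minimal k6 test: setup, default function with checks, teardown.
import http from 'k6/http';
import { check, sleep } from 'k6';

export function setup() {
  // Prepare test data or log in. Whatever is returned here is
  // passed to the default function and to teardown.
  return { baseUrl: 'https://test.k6.io' };
}

export default function (data) {
  // Each virtual user runs this function in a loop.
  var res = http.get(data.baseUrl + '/');

  // Checks: arbitrary rules applied to the response, rendered
  // as green check marks or red crosses in the final summary.
  check(res, {
    'status is 200': function (r) { return r.status === 200; },
    'body is not empty': function (r) { return r.body.length > 0; },
  });

  sleep(1); // think time between iterations
}

export function teardown(data) {
  // Clean up test data or log users out again here.
}
```

Running `k6 run script.js` starts the interactive view and prints the summary at the end.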
And these checks are automatically rendered into the final summary, where you get green check marks if all checks succeeded, but also red crosses if at least one of the checks failed. If you have failed checks, you get the percentage of failed vs. succeeded checks and also the absolute numbers of failed and succeeded checks. You can also define custom metrics. There are four types of metrics – counters, gauges, rates, and trends – which can be used for different use cases. So you might just be interested in how many calls you have, or in how many failures you have compared to successes for one specific thing that's not automatically captured by k6. Or you might be interested in a time-series trend: what is my maximum response time, what is my average response time? Those metrics are again automatically rendered in the summary, differently depending on the type of the metric. For a counter, it's the total with the per-second rate over the test; for a trend, it's the average, minimum, percentiles, median, and maximum response time – you can configure this. For a gauge, it's simply the latest known value.
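The four metric types can be sketched as follows. The metric names are invented for illustration, and the gauge simply tracks the latest response size as an example of a "last known value" metric.

```javascript
// Sketch of the four custom metric types in k6.
import http from 'k6/http';
import { Counter, Gauge, Rate, Trend } from 'k6/metrics';

var callsMade = new Counter('calls_made');          // monotonically increasing count
var bodySize = new Gauge('body_size');              // latest known value
var lookupFailRate = new Rate('lookup_fail_rate');  // ratio of true samples
var lookupDuration = new Trend('lookup_duration');  // min/avg/percentiles/max

export default function () {
  var res = http.get('https://test.k6.io/');
  callsMade.add(1);
  lookupFailRate.add(res.status !== 200);       // true counts as a failure
  lookupDuration.add(res.timings.duration);     // response time in ms
  bodySize.add(res.body.length);                // gauge keeps the last value
}
```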

And once you have those metrics, you usually want to apply thresholds. Thresholds work on a global level for your complete test, so you can say: "I want my test to fail if the average response time is above 100 milliseconds", or "I want my test to fail if a rate drops below 5% of something." The thresholds are again rendered in the summary, where you get red crosses for failed or exceeded thresholds and green check marks if a threshold stayed within the defined limits. And that's not all: you can also use groups and tags to categorize requests, checks, and metrics, or to split recorded metrics. You might be interested in the response times of one specific endpoint of your application, and you can then define thresholds for that subset of the metric. Tags can be applied per request or check, or globally for the full test. Here you see a test summary with several metrics and thresholds. Most of them are automatically captured and recorded by k6: you get the number of checks you have performed, the data you have received from the web service, the data you have sent to your web application, the average duration per group, and the response time distribution – how many requests were performed, what was the average duration, what was the median duration, and so on.
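A sketch of global thresholds plus a tag-filtered threshold for one endpoint; the `endpoint:incidents` tag value is an invented example, not one of our real endpoints.

```javascript
// Thresholds: fail the whole test when limits are exceeded.
import http from 'k6/http';

export var options = {
  thresholds: {
    // Fail the test if the average response time exceeds 100 ms.
    http_req_duration: ['avg<100'],
    // Fail if more than 5% of checks fail.
    checks: ['rate>0.95'],
    // Threshold on a tagged subset of the built-in duration metric:
    // only requests tagged endpoint=incidents count here.
    'http_req_duration{endpoint:incidents}': ['p(95)<200'],
  },
};

export default function () {
  // This request's samples feed the tagged threshold above.
  http.get('https://test.k6.io/', { tags: { endpoint: 'incidents' } });
}
```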

We can also see custom metrics at the bottom of the screen – sock_frm_received and stomp_cmd_received – where you see the total count and then a breakdown of what that means as a per-second rate for your test.

Furthermore, you can define different workload models, depending on your application's needs. There are two different models: the open workload model and the closed workload model. For the open workload model you control the arrival rate of new requests, and for the closed workload model you control the number of concurrent users interacting with your system. In k6 you define this with scenarios. You say: "I want a scenario with 16 users who perform the default function in a loop", or: "I want 100 requests per second on my application", and k6 will automatically start new users to meet the required load.

You're not limited to a single static base load; you can also define ramps. You can say: "Start with 0 users, and after one minute I want to have 50 users. Stay at 50 users for two minutes, then increase to 100 users over one minute." And you can do the same for the arrival rate: start at 0 requests per second, reach 50 requests per second after one minute, stay at 50 for two minutes, then ramp up again. You can also ramp down, of course. If you visualize this, you see that for ramping virtual users the number of users matches what you have defined, while for the ramping arrival rate the users are automatically injected by k6. You can also see it in the request durations – in the bottom part of the screenshots – where the number of requests (the green background area) matches the defined request rate.
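The two workload models and the ramps described above can be sketched with k6 scenarios. The scenario names, stage values, and the `startTime` offset are illustrative choices, not our production configuration.

```javascript
// Closed model (ramping-vus) and open model (ramping-arrival-rate).
import http from 'k6/http';

export var options = {
  scenarios: {
    // Closed model: control the number of concurrent virtual users.
    closed_model: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '1m', target: 50 },  // ramp 0 -> 50 users
        { duration: '2m', target: 50 },  // hold at 50
        { duration: '1m', target: 100 }, // ramp 50 -> 100
      ],
    },
    // Open model: control the arrival rate of new iterations;
    // k6 injects users as needed from the pre-allocated pool.
    open_model: {
      executor: 'ramping-arrival-rate',
      startRate: 0,
      timeUnit: '1s',
      preAllocatedVUs: 100,
      startTime: '4m', // run after the closed-model scenario finishes
      stages: [
        { duration: '1m', target: 50 }, // ramp to 50 iterations per second
        { duration: '2m', target: 50 }, // hold
      ],
    },
  },
};

export default function () {
  http.get('https://test.k6.io/');
}
```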

You're not limited to HTTP. Our application uses WebSockets, and you can also do gRPC requests with k6. For WebSockets, you can connect to secure and plain endpoints, send WebSocket messages, receive messages of course, and react to them. You might want to trigger an HTTP request in response, or you can record the duration it takes for a WebSocket request to be made and for the response to be received.
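A WebSocket sketch along these lines: connect, send a message, and record the round-trip time in a custom trend. The echo endpoint URL is a placeholder, assuming a server that echoes messages back.

```javascript
// WebSocket round-trip measurement with k6/ws.
import ws from 'k6/ws';
import { check } from 'k6';
import { Trend } from 'k6/metrics';

var wsRoundTrip = new Trend('ws_round_trip');

export default function () {
  var start;
  var res = ws.connect('wss://example.com/echo', null, function (socket) {
    socket.on('open', function () {
      start = Date.now();
      socket.send('ping');
    });
    socket.on('message', function () {
      wsRoundTrip.add(Date.now() - start); // time until the echo arrived
      socket.close();
    });
    socket.setTimeout(function () {
      socket.close(); // safety timeout so the VU never hangs
    }, 5000);
  });

  check(res, {
    'ws handshake succeeded (101)': function (r) { return r && r.status === 101; },
  });
}
```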

There's one caveat, though, that we noticed when implementing this: cookie support for WebSockets is kind of broken. The k6 team is currently working on it, but a workaround is available, so you can use cookies with WebSockets too.
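One possible shape of such a workaround, assuming a cookie-based session: log in over HTTP, read the session cookie back out of the virtual user's cookie jar, and forward it explicitly as a header when opening the WebSocket. The URLs, credentials, and the `SESSION` cookie name are all invented for illustration.

```javascript
// Workaround sketch: forward the session cookie manually to ws.connect,
// since the cookie jar is not applied to WebSocket connections automatically.
import http from 'k6/http';
import ws from 'k6/ws';

export default function () {
  // Log in; k6 stores the session cookie in this VU's cookie jar.
  http.post('https://example.com/login', { user: 'test', password: 'test' });

  // Read the cookie out of the jar and pass it as an explicit header.
  var jar = http.cookieJar();
  var cookies = jar.cookiesForURL('https://example.com/');
  var params = { headers: { Cookie: 'SESSION=' + cookies.SESSION[0] } };

  ws.connect('wss://example.com/updates', params, function (socket) {
    socket.on('open', function () { socket.close(); });
  });
}
```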

And of course, our application is mission critical, so we want to test the performance daily, nightly, or even hourly. We run these tests in a GitLab CI pipeline with a Docker image that executes a predefined test on an hourly basis, for example. We have defined thresholds for our application, and the pipeline fails if there is a performance regression, so we can react. As an added benefit, we get notified immediately by email if the pipeline fails.
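A scheduled pipeline like this could look roughly as follows in GitLab CI. This is a sketch, not our actual pipeline: the job name, script file, and the `grafana/k6` image (formerly published as `loadimpact/k6`) are examples.

```yaml
# .gitlab-ci.yml sketch: run the load test on a pipeline schedule.
load-test:
  image:
    name: grafana/k6
    entrypoint: ['']   # the image's default entrypoint is k6 itself
  script:
    # Thresholds in the script make k6 exit non-zero on a
    # performance regression, which fails the pipeline.
    - k6 run loadtest.js
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
```

The hourly cadence itself is configured as a pipeline schedule in the GitLab UI.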

And to analyze your performance tests in detail, you can also use outputs with k6, which you can then integrate with Grafana. There's InfluxDB, a time-series database, where your test simply writes all the metrics it captures while it's running. With Grafana you can create panels that visualize this data, and you can watch live as the test happens: how many users are there, what are the response times, are there any unexpected errors, is there a slowdown over the duration of the test, or is the slowdown related to the number of users?
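Enabling the InfluxDB output is a single flag; the host, database name, and script file below are placeholder values.

```shell
# Stream all metrics into InfluxDB while the test runs;
# Grafana panels can then query the database live.
k6 run --out influxdb=http://localhost:8086/k6 loadtest.js
```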

And with that, we can guarantee that our application runs smoothly, with low latency and consistent response times.

If you're interested in learning more, visit our website, contact us via social media, and read up on k6, InfluxDB, and Grafana.

Thank you very much!