# Comparing response times from different tools
In general, it's not a good idea to compare response times between [[Load Testing Tool|load test tool]]s, because different tools can measure and report them in significantly different ways. The best practice is to redo your benchmarks when switching to a new tool.
However, when evaluating a new tool, you may need to gain confidence that the response times it reports are accurate. It's common to see response times deviate wildly from tool to tool.
Here are a few reasons why this deviation might have occurred, and what you can do to rule each one out:
### Resource utilization
[[k6 (tool)|k6]] is written in [[Go]], and [[JMeter]] is written in Java. Beyond the difference in language, the two tools fundamentally differ in how they handle virtual users. In our testing, we have found k6 to be significantly more performant than JMeter: it requires fewer resources to generate the same level of load. [Here is a blog post](https://k6.io/blog/comparing-best-open-source-load-testing-tools/#memory-usage) on this topic, along with links to the script so that you can confirm the results for yourself.
Higher resource utilization can lead to inaccurate load testing results as it can make the load generator the performance bottleneck, instead of the application under test.
Here are some ways to rule this out as the cause of the discrepancy:
- Monitor your load generator's resource utilization while you run both tests.
- Verify the JVM settings. JMeter requires you to tune the [[JVM]] it runs on (for example, the heap size set via the `HEAP` variable in JMeter's startup script), both on the controller and on any remote load generators.
- Run the JMeter test in CLI mode (e.g., `jmeter -n -t test.jmx -l results.jtl`). JMeter's GUI adds overhead that is not present with k6, so it's best to run JMeter headlessly during the test.
### Throughput
20 threads in JMeter != 20 VUs in k6. As you can see in the blog post linked above, a k6 VU can execute more requests than a JMeter thread.
Even if the number of "users" is the same, the load each test generates AND the resources each test requires may not be the same.
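One way to make the comparison fairer is to hold the request rate constant instead of the user count. As a minimal sketch, k6's `constant-arrival-rate` executor drives the test by iterations per second rather than by VUs (the URL, rate, and durations below are placeholders):

```javascript
import http from 'k6/http';

export const options = {
  scenarios: {
    fixed_rate: {
      executor: 'constant-arrival-rate',
      rate: 100,           // target iterations per timeUnit
      timeUnit: '1s',      // i.e., 100 iterations per second
      duration: '10m',
      preAllocatedVUs: 50, // k6 scales VUs up to maxVUs to hold the rate
      maxVUs: 200,
    },
  },
};

export default function () {
  http.get('https://test.k6.io/'); // placeholder URL
}
```

Comparing the measured request rates (JMeter's reported throughput vs. k6's `http_reqs` rate) is more meaningful than comparing thread and VU counts directly.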
### Test duration
Test duration may be a bigger contributor than is apparent. Very short tests increase the likelihood of outliers skewing the results heavily.
Summary metrics are a great starting point for judging a test, but the distribution of the underlying results may differ from tool to tool.
To rule this out:
- run both tests over a longer period of time
- log the raw results and graph them in a scatterplot to get a complete view of the shape of the load
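As a sketch of both points, the k6 script below runs a longer, steady test (the duration is a placeholder). Running it with `k6 run --out csv=results.csv script.js` logs every raw data point to a CSV file you can graph, and JMeter's `-l results.jtl` flag does the same on its side:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

// A longer, steady run reduces the impact of outliers on summary statistics.
export const options = {
  vus: 20,
  duration: '30m', // placeholder; long enough for the distribution to stabilize
};

export default function () {
  http.get('https://test.k6.io/'); // placeholder URL
  sleep(1);
}
```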
### Scripting differences
What's in the script can also affect how long it takes to execute, such as:
- [[Dynamic think time and pacing|Think Time]]: Did you use any timers (JMeter) or sleep (k6)? If yes, were they the same type (Gaussian, uniform random, constant)? JMeter applies timers to every sample within the scope of the timer.
If you're not using think time, I'd suggest you try adding it to both. Sending requests repeatedly, without think time, can do more harm than good for your load testing results, since it's very resource-intensive. The sketch after this list shows one way to add a uniform random think time in k6.
- [[Download embedded resources|Embedded resources]]: Did you record embedded resources in one script but not the other?
- JMeter log configuration: Check your JMeter configuration to see what's being logged; you can click the Configure button on the listener you're using to verify this. JMeter's default log settings record more than k6's do, which adds overhead of its own.
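To illustrate the think time and embedded resources points above, here is a minimal k6 sketch; the URLs and timings are placeholders. The randomized `sleep()` roughly mirrors JMeter's Uniform Random Timer, and `http.batch()` fetches embedded resources explicitly, since k6 does not download them automatically:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export default function () {
  // Main page request
  http.get('https://test.k6.io/'); // placeholder URL

  // k6 does not fetch embedded resources on its own; request them
  // explicitly. http.batch() issues the requests in parallel,
  // similar to how a browser loads a page's assets.
  http.batch([
    ['GET', 'https://test.k6.io/static/css/site.css'], // placeholder
    ['GET', 'https://test.k6.io/static/img/logo.png'], // placeholder
  ]);

  // Uniform random think time between 1 and 3 seconds, roughly
  // equivalent to JMeter's Uniform Random Timer.
  sleep(1 + Math.random() * 2);
}
```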