Problems in results analysis

%% date:: [[2022-10-27]] %%

# [[Problems in results analysis]]

Below are some common problems in results analysis, in particular [[Analyzing load testing results|load testing results analysis]].

## Amount of data

Load tests generate a lot of data. Even when you reduce the amount of logging that occurs during a test, which you should be doing to minimize resource utilization anyway, every successful or failed request creates a data point. Depending on the duration of your test and how frequently it's executed, you may end up with gigabytes of data but no tool that can actually read it, much less compare runs in meaningful ways.

One way to solve this is to aggregate the data: consider sacrificing some granularity and processing data beforehand using something like [[Logstash]].

Another way to solve this is by putting load testing data into a database and using a data visualization tool like [[Tableau]] to pull out information from it. This is the approach of [[Stijn Schepers]]'s [[Robotic Analytical Framework]].

## Data is imperfect

Data is imperfect (because data gathering methods are imperfect). [^artofstats]

A load test is a bit like an experiment; unless we're testing production, it is a recreation or simulation of the load that we expect to see, not a record of what actually happened.

We __could__ just replay production traffic that we've recorded. An advantage of this approach is that it stays truer to what actually occurred. A disadvantage is that some user flows cannot simply be replayed because they depend on dynamic data. Scripting virtual users allows us to program more complex user behavior.

Some elements of data are always shaped by judgment and emotion, so data cannot be fully trusted.

Data is a snapshot of a given time and place, so it is difficult to make inferences about issues that affect other times and places.

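
Returning to the amount-of-data problem: the aggregation idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not how [[Logstash]] or any other tool named here works. The record layout `(timestamp_s, duration_ms, ok)`, the function names `aggregate` and `percentile`, and the 10-second bucket size are all hypothetical.

```python
import math
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def aggregate(records, bucket_seconds=10):
    """Collapse per-request records into one summary row per time bucket.

    Each record is a hypothetical (timestamp_s, duration_ms, ok) tuple.
    """
    buckets = defaultdict(list)
    for timestamp_s, duration_ms, ok in records:
        # Round the timestamp down to the start of its bucket.
        bucket_start = int(timestamp_s // bucket_seconds) * bucket_seconds
        buckets[bucket_start].append((duration_ms, ok))

    summary = []
    for bucket_start in sorted(buckets):
        durations = sorted(d for d, _ in buckets[bucket_start])
        errors = sum(1 for _, ok in buckets[bucket_start] if not ok)
        summary.append({
            "bucket_start": bucket_start,
            "requests": len(durations),
            "errors": errors,
            "mean_ms": sum(durations) / len(durations),
            "p95_ms": percentile(durations, 95),
        })
    return summary


# Made-up sample data: five requests spread over two 10-second buckets.
records = [
    (0.5, 120, True), (3.2, 180, True), (9.9, 900, False),
    (10.1, 110, True), (14.0, 130, True),
]
for row in aggregate(records):
    print(row)
```

The trade-off is the one described above: each bucket keeps only a handful of numbers, so the output is small enough to compare across runs, but individual requests can no longer be inspected.
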
## Correlation is easy to mistake for causation

"X is usually the 'independent variable' that is the causer, and Y is the 'dependent variable' we are examining to see how strongly X affects."

"However, this convention also means we'd have to decide in advance which variable affects the other, which is easy to mistake."

Even a strong correlation can be difficult to interpret meaningfully.

## [[Cognitive Biases]] are at play at all stages of the process