%%
date:: [[2020-12-20]], [[2021-02-23]], [[2022-10-27]]
%%
# [[Server metrics]]
Server metrics, unlike [[Load testing metrics]], are not generated by the load testing tool, although some [[Load Testing Tool|Load Testing Tools]] also report them. They are measured by the operating system of each machine involved: both the load generators, which produce the load, and your application servers, which receive it.
What you’re looking for in resource utilization metrics from the load generators is a way to determine whether the load generators themselves were a performance bottleneck. Ideally, all the metrics will show that the load generators were not overutilized, which means the results of your test are valid. If they were overutilized, you’ll want to fix the issue and re-run the test.
The resource utilization metrics of your application servers, however, will show you how easily your application handled the load and will give you an idea of how much more it can handle. If you’ve identified a bottleneck, these metrics will also give you a clue as to where to start your investigation.
## Processor
**CPU utilization** is how much of the machine’s processing power was being used during the test. This indicates whether a server was struggling with the tasks it was carrying out at the time. In a load generator, you’ll want to make sure you stay below 80% utilization for most of the test. In an application server, consistently high utilization may suggest that you need to allocate more CPU towards the server.
**Processor Time** - the percentage of time the processor spends executing non-idle threads
**Processor Interrupt Time** - the time the processor spends handling hardware interrupts
**Processor Privileged Time** - the time the processor spends in kernel mode, handling operating system overhead
- [[Queue Time]]
**Processor Queue Length** - the number of threads that are waiting to be executed
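As a sketch of how CPU utilization is derived from cumulative counters like the ones above (assuming Linux-style tick counts, such as the first line of `/proc/stat`; the sample numbers are made up):

```python
def cpu_utilization(idle1, total1, idle2, total2):
    """Fraction of CPU time spent busy between two samples of
    cumulative tick counters (idle ticks and total ticks)."""
    delta_total = total2 - total1
    delta_idle = idle2 - idle1
    return (delta_total - delta_idle) / delta_total

# Hypothetical samples taken one second apart: 400 of 500
# elapsed ticks were busy, i.e. 80% utilization -- right at
# the guideline for a load generator.
print(cpu_utilization(idle1=1000, total1=5000, idle2=1100, total2=5500))  # 0.8
```

Sampling twice and taking deltas matters: the raw counters only ever grow, so a single reading says nothing about utilization during the test window.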
## Memory
**Memory utilization** is how much of the machine’s memory (RAM) is in use. It can be expressed as a percentage (80% memory utilization means 20% of the memory was not used) or in terms of available bytes (the amount of memory not in use). Consistently high utilization (again, >80%) could point to a memory leak within the server. Memory leaks are often only spotted in longer tests, which is why it may be worthwhile to extend the duration of your tests. High memory utilization could also be a good reason to consider allocating more memory to the server.
**Memory (Available Bytes)** - unused memory available to process new requests
**Memory Cache Bytes** - the size of the data stored in memory for quick retrieval
**Process - Private Bytes**
**Process - Working Sets**
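To make the percentage/available-bytes relationship above concrete, here is a minimal sketch (the byte counts are hypothetical):

```python
def memory_utilization(available_bytes, total_bytes):
    """Fraction of RAM in use, given an 'Available Bytes' style counter."""
    return 1 - available_bytes / total_bytes

# 4 GiB available out of 16 GiB total -> 75% utilization
print(memory_utilization(available_bytes=4 * 1024**3,
                         total_bytes=16 * 1024**3))  # 0.75
```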
![[Performance Testing Basics What Is Resource Utilization#^17cce0]]
## Disk
Disk bottlenecks are usually related to the time it takes to read and write to a disk rather than the disk _space_.
**Disk I/O utilization** metrics are useful because they measure how quickly data is transferred between memory (RAM) and the physical disk. This can be expressed as a rate (reads/sec or writes/sec), a percentage (busy time is the percentage of time the disk was actively being used), or a number of requests (queue length). Requests that cannot be processed immediately are placed in a queue, and a persistently long queue on the application server indicates a disk utilization issue: the disk can’t keep up with the rate of read/write requests. These metrics are more useful for the application server than for the load generators.
**Disk I/O** - reads and writes to the disk during the test
**Disk Idle Time** - time that disks are not doing work
**Disk Transfers/sec** - the rate of read and write operations on the disk
**Disk Writes/sec** - the rate of write operations on the disk
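The queue-length idea above can be sketched with Little’s law: the average number of outstanding requests equals the request rate multiplied by the average time each request takes to complete (the figures here are illustrative, not from any real counter):

```python
def avg_disk_queue_length(transfers_per_sec, avg_sec_per_transfer):
    """Little's law: average outstanding requests =
    arrival rate x average time in the system."""
    return transfers_per_sec * avg_sec_per_transfer

# 200 transfers/sec, each taking 25 ms on average, means about
# 5 requests outstanding at any moment; a queue that stays this
# long suggests the disk can't keep up with the workload.
print(avg_disk_queue_length(200, 0.025))  # 5.0
```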
![[Performance Testing Basics What Is Resource Utilization#^83460173]]
## Network
**Network throughput** is similar to the transaction rate in the sense that it tries to measure how much load is being put through the system; however, it does this by measuring the amount of data, in bytes, delivered by your application servers to the load generators. High network throughput is only a concern if it is approaching or hitting the maximum bandwidth of the connection.
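A quick way to check the “hitting the maximum bandwidth” condition is to compute the remaining headroom on the link (the throughput and capacity figures are hypothetical):

```python
def bandwidth_headroom(throughput_mbps, link_capacity_mbps):
    """Fraction of the link's capacity still unused; near zero
    means throughput is hitting the connection's maximum."""
    return 1 - throughput_mbps / link_capacity_mbps

# 750 Mbps of traffic on a 1 Gbps link leaves 25% headroom
print(bandwidth_headroom(throughput_mbps=750, link_capacity_mbps=1000))  # 0.25
```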
**Latency** is the portion of the response time that accounts for the “travel time” of information between the load generator and the application server. It can be influenced by factors such as network congestion and geographical location and is notoriously difficult to measure. While it’s ideal to have low latency, having high latency does not necessarily render a test void; it’s still possible to determine the actual server processing time by subtracting latency from the overall response time.
**Network I/O or Data Transfer** - bytes sent and received
**Network latency** - the travel time of data between the load generator and the application server
**Network round trip** - time taken for a request to be sent by a client and a response to be returned by the server
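The subtraction described above, isolating server processing time even when latency is high, looks like this (the timings are invented for illustration):

```python
def server_processing_time(response_time_ms, latency_ms):
    """Approximate time the server spent processing, once the
    network 'travel time' is subtracted from the response time."""
    return response_time_ms - latency_ms

# A 250 ms response over a link with 70 ms of latency still
# means only 180 ms of actual server processing time.
print(server_processing_time(response_time_ms=250, latency_ms=70))  # 180
```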
## For load testing
If you’re not sure where to start: at a minimum, you’ll need the CPU and memory utilization of every major component involved in processing requests. These two metrics are vital; if either is consistently maxing out at (or close to) 100%, that’s a sign the component is struggling with the number of requests. CPU and memory over-utilization is a very common cause of less-than-ideal response times.
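One way to operationalize “consistently maxing out” is to flag a component whose utilization samples sit above a threshold most of the time. This is a rough sketch; the 95% threshold, 80% sample fraction, and sample values are assumptions for illustration, not anything prescribed above:

```python
def is_overutilized(samples, threshold=0.95, min_fraction=0.8):
    """True if at least `min_fraction` of the utilization samples
    are at or above `threshold` (i.e. consistently near 100%)."""
    high = sum(1 for s in samples if s >= threshold)
    return high / len(samples) >= min_fraction

cpu_samples = [0.97, 0.99, 0.96, 0.98, 0.60]  # hypothetical per-interval CPU
print(is_overutilized(cpu_samples))  # True: 4 of 5 samples are >= 95%
```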
## References
- [[Performance Testing Basics What Is Resource Utilization]]
- _Performance Testing Microsoft .NET Applications_, by [[Microsoft]]