How IO Transfer Size Affects Latency and IOPS
Latency can be very difficult to pin down since it can show up anywhere in an environment. This article focuses on IO latency as it is affected by different IO transfer sizes (some call this storage block size, but that is a bit misleading. Read here).
Across the millions of servers recorded in the Live Optics program, the read ratio is 69% and the average IO transfer size is 34.4K. For simplicity's sake, let's round to 32K.
Most environments will not have a single IO transfer size, so saying "I want X number of IOPS at a 4K IO size" is unlikely to reflect your environment.
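Since real environments mix many IO sizes, the meaningful figure is a weighted average across the whole distribution. A minimal sketch, using a purely hypothetical IO size mix (not measured Live Optics data):

```python
# Hypothetical IO size mix: size in KB mapped to its share of total IOPS.
# These fractions are illustrative assumptions, not measured data.
io_mix = {4: 0.30, 8: 0.25, 32: 0.20, 64: 0.15, 256: 0.10}

# Weighted average IO transfer size across the mix.
weighted_avg_kb = sum(size * share for size, share in io_mix.items())
print(f"Weighted average IO transfer size: {weighted_avg_kb:.1f}K")
```

Even though most IOs in this mix are small, the handful of large 256K transfers pulls the average well above 4K, which is why a single "4K IOPS" number rarely matches reality.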
When it comes to IOPS in a converged system, standard storage array, blade chassis, or similar platform, IO can be visualized like a highway system viewed from the air. IO sizes vary based on the task the application developer asked for, and when IOs hit a shared data transport (like an FC or iSCSI link) or a storage controller, the effect of merging traffic should be a familiar analogy.
The two dominant patterns for IO are random and sequential. As a rule of thumb, random IOs tend to be smaller, while larger IOs are often sequential. However, once this merging of traffic occurs, even sequential workloads get mixed in, and all traffic eventually starts to look highly random with varying IO sizes.
IO transfer size is not the disk formatting size; it is the choice of the application developer. One application might be very sensitive to IO completion time (like processing money) and therefore send very small sets of data, e.g., 4K or 8K in size, while others send larger IOs of 256K or even 512K.
Think of the curbside at the airport. If you are traveling by yourself, you are the highest-priority person to reach your destination: you order up your Uber and go straight there as fast as you can. This is very reflective of random IO: small payloads, with the focus on completion time.
On the other hand, if you were traveling in a group, you might charter a bus. The bus is efficient since you and your 40 friends can all ride together, but you have to load the entire bus before you can leave. So there is more latency… or delay…
By the way, this is why you often hear that "if you have large IOs, then disks will do fewer IOPS," usually in a negative tone. However, if I had to get 40 people from the airport to the hotel and did it in one charter bus trip instead of 40 Uber trips… was I less efficient in any way by taking a little more time and doing 1/40th the activity?
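The bus-versus-Uber trade-off above can be put in storage terms: large IOs produce fewer IOPS but can move more data per unit of time. A minimal sketch with illustrative latency numbers (assumptions, not benchmark results):

```python
# One small "Uber trip" vs. one large "charter bus" IO.
# Latencies below are illustrative assumptions, not measurements.
small_io_kb, small_latency_ms = 4, 1.0     # small random IO
large_io_kb, large_latency_ms = 256, 8.0   # large sequential IO

# With a single outstanding IO issued serially:
small_iops = 1000 / small_latency_ms       # IOs per second
large_iops = 1000 / large_latency_ms

small_tput_kb_ms = small_io_kb / small_latency_ms   # data moved per ms
large_tput_kb_ms = large_io_kb / large_latency_ms

print(f"Small IO: {small_iops:.0f} IOPS, {small_tput_kb_ms:.0f} KB/ms")
print(f"Large IO: {large_iops:.0f} IOPS, {large_tput_kb_ms:.0f} KB/ms")
```

In this sketch the large IO delivers 8x fewer IOPS but 8x the throughput, so "fewer IOPS" alone says nothing about efficiency.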
Neither approach is wrong, but the application developers will decide, and some applications, like SQL, can dynamically change IO size to suit the workload at the time.
The confusion comes in when we try to standardize what is considered "good" in a highly randomized environment.
Let’s say we declared that all application IO must be 10ms or less.
IOs larger than 64K often come with higher acceptable latency. You might try rationalizing latency down to 64K elements; in other words, break up a larger IO's time to complete and see what it would have looked like as a series of 64K IOs.
A 128K IO that completes in 20ms is like 2 x 64K IOs that each completed in 10ms.
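This normalization is straightforward to express in code. A minimal sketch of the rationalization described above (the function name is mine, not from any tool):

```python
def latency_per_64k(io_size_kb: float, latency_ms: float) -> float:
    """Return the effective latency per 64K element of a larger IO.

    Breaks a large IO into 64K-sized chunks and spreads its completion
    time evenly across them, per the rationalization described above.
    """
    chunks = io_size_kb / 64
    return latency_ms / chunks

# A 128K IO that completes in 20ms looks like two 64K IOs at 10ms each.
print(latency_per_64k(128, 20))  # prints 10.0
```

Against a 10ms-or-less target, that 128K IO is effectively in bounds even though its raw latency of 20ms appears to violate the rule.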
Understanding latency at the disk level benefits from understanding its intertwined relationship with IO size. In the following graph, one can see that although latency is higher and does correlate with a high disk queue, the environment's sudden jump to a well-above-average IO transfer size is probably the leading contributor to one or both of these elevated symptoms.