Since the inception of server and storage performance assessments in the early 2000s, industry experts have debated a common question:
How long should a performance assessment run to provide enough data to properly describe an existing workload in a production environment?
One week? Several days? 24 hours?
Many hold differing opinions on the answer to this question. However, on the Live Optics team, we have concluded that the correct answer in most cases is 24 hours. Let’s explain why.
Since 2008, the Live Optics team has observed millions of server performance runs. Additionally, many team members took part in other server and storage monitoring projects before joining Live Optics. This collective experience has conclusively shown that, in the vast majority of performance runs, variability from one business day to another is not significant.
The chart below shows IOPS over a one-week period.
The variability among the business days in this chart is within ±10%.
Our experience has shown that the larger the environment, the less variability exists day to day.
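To make the day-to-day comparison concrete, here is a minimal Python sketch of one way to check whether business days stay within a ±10% band. It assumes that variability is measured as each day's average IOPS relative to the mean of all business days; that metric, the function name daily_variability, and the sample numbers are illustrative assumptions, not the method or data behind the chart.

```python
from statistics import mean

def daily_variability(daily_iops):
    """Return each day's percent deviation from the mean of all days.

    daily_iops maps a day label to that day's average IOPS.
    """
    baseline = mean(daily_iops.values())
    return {
        day: (value - baseline) / baseline * 100.0
        for day, value in daily_iops.items()
    }

# Hypothetical average IOPS per business day (illustrative numbers only).
sample = {"Mon": 10200, "Tue": 9800, "Wed": 10500, "Thu": 9600, "Fri": 9900}

deviations = daily_variability(sample)

# True when every business day is within +/- 10% of the weekly baseline.
within_band = all(abs(pct) <= 10.0 for pct in deviations.values())

for day, pct in deviations.items():
    print(f"{day}: {pct:+.1f}%")
print("All days within +/-10%:", within_band)
```

If every day falls inside the band, a single 24-hour capture on any of those days would describe the workload about as well as the full week.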
Outliers do occur, where the workload changes dramatically on a single day for some number of hours. The most common outlier is a backup operation.
On the Friday night in this example, a backup operation takes place.
In this particular case, the backup does not significantly alter the overall characteristics of the workload, the read mix, the average IO sizes, or the 95th percentile IOPS for the week.
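The effect of a short backup window on the 95th percentile can be illustrated with a small Python sketch. It uses the nearest-rank percentile definition, which is an assumption here; Optical Prime may compute percentiles differently. The function name percentile_nearest_rank and all sample values are hypothetical.

```python
import math

def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical IOPS samples (illustrative numbers only).
business_hours = [1000 + 10 * i for i in range(100)]  # 1000, 1010, ..., 1990
backup_window = [2500] * 4                            # a short, intense burst

p95_without = percentile_nearest_rank(business_hours, 95)
p95_with = percentile_nearest_rank(business_hours + backup_window, 95)

print("95th percentile without backup:", p95_without)
print("95th percentile with backup:   ", p95_with)
```

Because the backup samples are a small fraction of the total, they sit above the 95th percentile rank and shift the result only slightly in this example. A longer or more frequent backup window would move it more, which is one reason to capture an outlier day separately.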
However, in some cases, the backup operation can be considerably different from the nominal operation during the business week. In most cases, service during backup windows, which often occur outside business hours, has different service level agreements than service during nominal operation hours. In other words, sizing the workload for backup has a different set of requirements than sizing for nominal operation.
In all cases, our best practice advice is to ask the admin if there are known outliers, like a backup operation. We recommend running Optical Prime on a standard business day for 24 hours, and then again later on an outlier day, such as a weekend day, also for 24 hours.
Now, one could ask: why not run for a full week, even if the information is redundant? The reasons against this are straightforward:
Multiday Optical Prime collector runs, especially in large environments, generate a large amount of data, potentially gigabytes' worth of information that must be processed. The resulting Live Optics projects are difficult to load and manipulate.
Additionally, the assessment process gets dragged out unnecessarily for seven days.
Optical Prime allows multiday runs only when the live streaming option is selected. This guarantees that users can observe the data days before the collection has completed, kicking off the sizing conversation without the unnecessary delay.