
Recently I was running some small tests on AWS EC2 instances, and I noticed that execution time and performance depended heavily on the time of day I ran my scripts. I was using the t3.xlarge instance type, as I didn’t need much CPU or memory for my tests, but from time to time I planned to use all the resources for a short period (a few minutes), and this is when I noticed the difference.
First, let’s see what AWS says about T3 instances:
T3 instances start in Unlimited mode by default, giving users the ability to sustain high CPU performance over any desired time frame while keeping cost as low as possible.
In theory, I should not see any issues or performance differences. I also monitored the CPU credit balance, and there was no correlation at all between the balance and the performance. Since these were Unlimited instances, the balance should not have had any impact anyway.
I decided to run a longer sysbench test with 3 threads to see how the QPS changes over the course of the day.
As you can see, the queries per second (QPS) could drop by almost 90%, which is a lot. It’s important to highlight that the sysbench script should have generated a very steady workload. So what causes this big difference? After checking all the graphs, I found this:
Stealing! A lot of stealing! Here is a good article that explains CPU steal time very well. So I probably had a noisy neighbor. This instance was running in the N. California region. I stopped it and started new instances to repeat the test, but I always got very similar results. There was a lot of steal, which hurt performance significantly, probably because that region is very popular and its resources are limited.
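If you want to check steal on your own instances, a minimal sketch like the one below may help. It assumes a Linux guest, where the eighth counter on the aggregate `cpu` line of `/proc/stat` is steal time (documented in proc(5)); the helper names `parse_cpu_counters`, `steal_percent`, and `read_proc_stat` are hypothetical and were not part of the tests in this post. Tools such as `top` and `vmstat` report the same figure in their `st` column.

```python
import os
import time
from typing import List

# Order of the counters on the aggregate "cpu" line of /proc/stat (proc(5)):
# user nice system idle iowait irq softirq steal guest guest_nice
STEAL = 7  # zero-based index of the steal counter


def parse_cpu_counters(stat_text: str) -> List[int]:
    """Return the jiffy counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line in input")


def steal_percent(before: str, after: str) -> float:
    """Percentage of elapsed CPU time that was stolen between two snapshots."""
    b, a = parse_cpu_counters(before), parse_cpu_counters(after)
    # guest/guest_nice are already included in user/nice, so only the first
    # eight counters make up the total (avoids double counting).
    total = sum(a[:8]) - sum(b[:8])
    if total <= 0:
        return 0.0
    return 100.0 * (a[STEAL] - b[STEAL]) / total


def read_proc_stat() -> str:
    with open("/proc/stat") as f:
        return f.read()


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_proc_stat()
    time.sleep(1)  # sample interval; adjust as needed
    second = read_proc_stat()
    print(f"steal over the last second: {steal_percent(first, second):.1f}%")
```

Working on two text snapshots instead of reading the file inside the calculation keeps the arithmetic easy to test and lets you reuse it on snapshots collected elsewhere.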
Out of curiosity, I started two similar instances in the Stockholm region, repeated the same test, and got very steady performance, as you can see here:
I guess this region is not as popular or as full yet, and we can see that where you start your instance can make a huge difference.
I also repeated the tests with the m5.xlarge instance type to see whether it shows the same behavior.
[Graphs: N. California, Stockholm]
After changing the instance type, both regions give very similar, steady performance, but let’s take a closer look:
[Graphs, closer look: N. California, Stockholm]
The instance in Stockholm still delivers almost 5% more QPS than the one in N. California, and it uses more CPU as well.
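To put numbers like "almost 5% more QPS" or "down by almost 90%" on your own runs, a small sketch such as the following could be used. It assumes you have already parsed per-interval QPS values into lists (for example from sysbench's periodic `--report-interval` output); the function names and the sample values are hypothetical and are not the measurements behind the graphs in this post.

```python
from statistics import mean
from typing import Sequence


def relative_diff_percent(baseline: float, other: float) -> float:
    """How much higher (positive) or lower (negative) `other` is than `baseline`, in %."""
    return 100.0 * (other - baseline) / baseline


def peak_to_trough_percent(samples: Sequence[float]) -> float:
    """Gap between the highest and lowest sample, as a % of the highest."""
    peak = max(samples)
    return 100.0 * (peak - min(samples)) / peak


if __name__ == "__main__":
    # Hypothetical per-interval QPS values, NOT the measurements from this post.
    n_california = [1000.0, 980.0, 1010.0, 990.0]
    stockholm = [1045.0, 1040.0, 1050.0, 1045.0]
    diff = relative_diff_percent(mean(n_california), mean(stockholm))
    print(f"Stockholm vs. N. California: {diff:+.1f}% QPS")
    print(f"N. California spread: {peak_to_trough_percent(n_california):.1f}%")
```

Comparing means smooths out per-interval noise, while the peak-to-trough figure shows how unsteady a single run was.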
Conclusion
If you are using T2 or T3 instance types, you should monitor CPU usage very closely, because noisy neighbors can hurt your performance a lot. If you need stable performance, T2 and T3 are not recommended. If you only need a short burst they might work, but you still have to monitor the steal. Other instance types can give you much more stable performance, but you could still see some difference between regions.
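One way to act on "monitor the steal" is to alert only when steal stays high for several consecutive readings, so that a single spike does not trigger anything. The sketch below assumes you already collect per-interval steal percentages (for example from the `st` column of `vmstat`); the function name `sustained_steal`, the 10% threshold, and the 3-sample minimum are hypothetical defaults, not recommendations from these tests.

```python
from typing import List, Optional, Sequence, Tuple


def sustained_steal(
    steal_pct: Sequence[float], threshold: float = 10.0, min_samples: int = 3
) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges, end exclusive, where steal stays
    above `threshold` for at least `min_samples` consecutive readings."""
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(steal_pct):
        if value > threshold:
            if start is None:
                start = i  # a high-steal run begins here
        else:
            if start is not None and i - start >= min_samples:
                ranges.append((start, i))
            start = None
    # Close a run that lasts until the end of the series.
    if start is not None and len(steal_pct) - start >= min_samples:
        ranges.append((start, len(steal_pct)))
    return ranges
```

For instance, `sustained_steal([1, 12, 15, 20, 2, 30, 40, 50, 60])` reports two runs: indices 1 to 3 and 5 to 8.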