Ken Muse
ALM | DevOps Ranger & Azure MVP

Correctly Sizing Azure Virtual Machines


If you're used to high-control virtualization environments, the fixed nature of the options for sizing virtual machines in Azure can be a bit confusing. One of the benefits of cloud environments like Azure is that it's possible to find the right size for your loads and thereby save money. In today's post, I'm going to show a practical example to help you better understand how to size an Azure VM.

As an example, let's take a legacy database server with 16 cores, 64 GiB of RAM, and a need for moderate-to-high disk throughput.

It's All About Location

Location is everything ... even in Azure. Each location has different machines available and different pricing for those machines. As a result, the location you pick could limit your options. If you have a workload that is not sensitive to the location or has limited egress/ingress, you can take a moment to shop around and find the best price for your virtual machines. The pricing can be substantially different. At the time I'm writing this article, I'm looking at the prices in West US. An A2M_v2 in that region costs 34% more than one in West US 2, and getting a D12_v2 in East US 2 will save you nearly 20%!

Selecting the Disk

It's important to understand what you need from the disk. Generally speaking, you will want to always choose managed disks. There are a few reasons for this, but one of the most important is that they have guaranteed performance characteristics, high availability, and high durability (approaching a 0% failure rate). This really leaves us with one choice: Premium (SSD) or Standard (HDD) Managed Disks. Understanding how to select the right performance characteristics is a subject for a future article. For now, there are a few important things to know:

  1. In addition to larger storage amounts, the higher Premium tiers support increasing amounts of IOPS and throughput per disk. Consequently, selecting the right size of disk is not just about the amount of storage -- it's about understanding the maximum sustained performance. This ranges from P4 (120 IOPS and 25 MBps per disk) to P50 (7,500 IOPS and 250 MBps).

  2. The IOPS and throughput of Standard disks are not provisioned. The performance varies according to the size of the VM, with the bandwidth generally capped at 60 MBps and the IOPS limited to 500 IOPS per disk.

  3. The VM selection itself can constrain the maximum throughput and IOPS.
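To make the first and third points concrete, here is a rough sketch of how the per-disk limits and the VM's own limits interact when you stripe several Premium disks together. The function name `Get-StripedDiskEstimate` and its simple linear model are my own illustration, not an Azure API, and it ignores caching and striping overhead. The P50 figures come from the list above, and the VM limits are the uncached values listed for the D16S_V3 in the sizing table later in this post.

```powershell
# Hypothetical helper (not part of any Azure module): estimates the combined
# IOPS and throughput of N identical striped disks, capped by the VM's limits.
function Get-StripedDiskEstimate {
    param(
        [int]$DiskCount,    # number of identical disks in the stripe
        [int]$IopsPerDisk,  # provisioned IOPS of one disk
        [int]$MBpsPerDisk,  # provisioned throughput of one disk
        [int]$VmMaxIops,    # uncached disk IOPS limit of the VM size
        [int]$VmMaxMBps     # uncached disk throughput limit of the VM size
    )

    # Assumption: striping scales linearly with the number of disks.
    $rawIops = $DiskCount * $IopsPerDisk
    $rawMBps = $DiskCount * $MBpsPerDisk

    [pscustomobject]@{
        DiskCount = $DiskCount
        Iops      = [Math]::Min($rawIops, $VmMaxIops)
        MBps      = [Math]::Min($rawMBps, $VmMaxMBps)
        VmLimited = ($rawIops -gt $VmMaxIops) -or ($rawMBps -gt $VmMaxMBps)
    }
}

# Four P50 disks (7,500 IOPS / 250 MBps each) on a D16S_V3 (25,600 IOPS / 384 MBps).
$estimate = Get-StripedDiskEstimate -DiskCount 4 -IopsPerDisk 7500 -MBpsPerDisk 250 `
    -VmMaxIops 25600 -VmMaxMBps 384
$estimate | Format-List
```

In this example the four disks could deliver 30,000 IOPS and 1,000 MBps on paper, but the estimate comes back as 25,600 IOPS and 384 MBps because the VM's limits win. Adding more disks past that point buys capacity, not speed.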

For most databases, you will typically want to choose a Premium disk in order to have provisioned and guaranteed performance characteristics. This limits you to only the series of VMs that have an "S" in the name ... DS, DSv2, DSv3, GS, Ls, Fs, and ESv3. Let's narrow that down a bit further.

Selecting Your Series

So we've narrowed our search to machines that accept premium disks. What next? We need to look at each series and select the right one. There are several things to consider. First, you may need to examine the maximum number of disks if you need storage over 4 TiB or if you need high IOPS (via RAID). Second, examine the network performance associated with the machine. Different sizes and series of machines will support varying amounts of network performance. Let's examine each series in our list of candidates.

| Series | ACU per vCPU | vCPU : Core | Purpose |
|--------|--------------|-------------|---------|
| D | 160 | 1:1 | General compute, legacy tier. Supports up to 16 cores and 112 GiB RAM. |
| DSv2 | 210 - 250 | 1:1 | General compute. Ideal for most OLAP database workloads. Supports up to 20 cores and 140 GiB RAM. |
| DSv3 | 160 - 190 | 2:1 | General purpose workloads. Reduced cost using hyper-threaded vCPUs instead of cores running on customized Xeon. Supports up to 64 vCPUs (32 cores with hyper-threading) and 256 GiB RAM. |
| Es | 160 - 190 | 2:1 | Extended memory. Ev3 was formerly the D11-D14 tiers. Higher available memory than the D-series, supporting up to 64 vCPUs and 432 GiB RAM. |
| GS | 180 - 240 | 1:1 | Goliath. More RAM per core and faster disk speeds than D-series. G5 is dedicated hardware, supporting 32 cores and 448 GiB RAM. |
| Ls | 180 - 240 | 1:1 | Low latency storage (uses local SSD drives, which can out-perform Premium SSD storage). Supports up to 32 cores, 256 GiB RAM, and 5,630 GiB temporary drives (SSD). Max temp drive throughput is 160K IOPS / 1.6 GBps. |

For a basic database workload, the D-series is usually a good starting point. For a long time, this has been the main workhorse for general purpose computing loads. Starting with the DSv3 series, the virtual CPU (vCPU) is actually separated from the underlying physical cores. In this case, 1 vCPU in a DSv3 represents half of a core (one hyper-thread) and has a correspondingly lower price point. The V3 series has the ability to incrementally increase the computing power, reaching higher performance levels than the previous V2 machines.

The GS-series ("Goliath") is specifically optimized for large database workloads. Consequently, these machines have significantly larger RAM quantities and temporary disk sizes. Strictly on ACUs, the CPU is not as powerful as the DSv2 series, and the prices are significantly higher. The tradeoff is consistency in performance. Additionally, a G5 (32 cores) is dedicated and isolated to a single customer. As a result, these are more expensive units, and they do not get a discount for Reserved Instances. If you need more RAM (or higher MBps) than the D-series offers, then you should also consider these machines.

The Ls-series ("low latency") are optimized for high disk throughput and I/O for processing large amounts of data. They are ideal for large databases and Big Data solutions. This series utilizes local SSD drives for temporary storage, increasing the maximum throughput significantly compared to the normal Premium SSD solutions. It is optimized for loads requiring low-latency disk access.

Since we don't have a need for low-latency or semi-dedicated hardware, we'll look at the D-series.

Understanding ACU

Finally, it's time to review the Azure Compute Unit (ACU) rating for any series we are considering. The ACU metric provides a quick way of gauging the relative performance of different virtual machine sizes without knowing the underlying hardware details. For example, a rating of 200 is twice as fast on a typical load as a rating of 100. You can see from the ACU ratings above that each vCPU of a DSv2 is around 35% faster than the original D series. Let's look at the options we selected above in a bit more detail.

The maximum ACU performance is determined by multiplying the number of vCPUs by the ACU rating for the series. Consequently, we can easily see how a few of these systems will compare if we place these values into a chart:

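| Instance | vCPU | ACU per vCPU | Total ACU | RAM (GiB) | Temp storage (GiB) | Max data disks | Max cached and temp IOPS | Max cached and temp MBps | Cache (GiB) | Max uncached disk IOPS | Max uncached disk MBps |
|----------|------|--------------|-----------|-----------|--------------------|----------------|--------------------------|--------------------------|-------------|------------------------|------------------------|
| DS13_V2 | 8 | 210 - 250 | 1,680 - 2,000 | 56 | 112 | 32 | 32K | 256 | 288 | 25.6K | 284 |
| D16S_V3 | 16 | 160 - 190 | 2,560 - 3,040 | 64 | 128 | 32 | 32K | 256 | 400 | 25.6K | 384 |
| DS14_V2 | 16 | 210 - 250 | 3,360 - 4,000 | 112 | 224 | 64 | 64K | 512 | 576 | 51.2K | 768 |
| DS15_V2 | 20 | 210 - 250 | 4,200 - 5,000 | 140 | 280 | 64 | 80K | 640 | 720 | 64K | 960 |
| GS4 | 16 | 180 - 240 | 2,880 - 3,840 | 224 | 448 | 64 | 40K | 800 | 2112 | 32K (64x500) | 1000 |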

Placed into a chart like this, it becomes easier to see our options -- and how we can scale the machine up or down to get the right level of CPU utilization. Notice that although the D16S_V3 has 8 physical cores (and 16 vCPUs), it has noticeably higher ACUs than the 8-core DS13_V2. By reviewing the ACU range, we can determine a good starting point and how to scale up or down as needed.
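If you'd rather not do the multiplication by hand, the arithmetic is easy to sketch in PowerShell. The snippet below is only an illustration: the numbers are copied from the two tables in this post, and the property names (`VCpuPerCore`, `TotalAcuMin`, and so on) are ones I made up for the example rather than anything Azure returns.

```powershell
# Values copied from the tables in this post. VCpuPerCore is the "vCPU : Core"
# ratio (2 = hyper-threaded vCPUs, 1 = one vCPU per physical core).
$sizes = @(
    [pscustomobject]@{ Name = 'DS13_V2'; VCpu = 8;  VCpuPerCore = 1; AcuMin = 210; AcuMax = 250 }
    [pscustomobject]@{ Name = 'D16S_V3'; VCpu = 16; VCpuPerCore = 2; AcuMin = 160; AcuMax = 190 }
    [pscustomobject]@{ Name = 'DS14_V2'; VCpu = 16; VCpuPerCore = 1; AcuMin = 210; AcuMax = 250 }
    [pscustomobject]@{ Name = 'DS15_V2'; VCpu = 20; VCpuPerCore = 1; AcuMin = 210; AcuMax = 250 }
    [pscustomobject]@{ Name = 'GS4';     VCpu = 16; VCpuPerCore = 1; AcuMin = 180; AcuMax = 240 }
)

# Total ACU = vCPU count x ACU per vCPU; physical cores = vCPUs / ratio.
$summary = $sizes | ForEach-Object {
    [pscustomobject]@{
        Name          = $_.Name
        PhysicalCores = $_.VCpu / $_.VCpuPerCore
        TotalAcuMin   = $_.VCpu * $_.AcuMin
        TotalAcuMax   = $_.VCpu * $_.AcuMax
    }
}

# Sort from least to most total compute to see the scale-up path.
$summary | Sort-Object TotalAcuMin | Format-Table
```

Sorting by total ACU makes the scale-up path obvious, and the `PhysicalCores` column is a reminder that two sizes with the same vCPU count are not necessarily backed by the same number of cores.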

For our legacy database, a D16S_V3 is a good starting point, with 16 vCPUs and just the right amount of memory. If we need the parallel processing power of 16 physical cores, then we would need to step up to the DS14_V2.

Selecting the RAM

Each VM size in Azure supports a specific amount of RAM. This means that as we choose a specific VM size, we're also choosing the associated memory. If you're coming from a VMware world, this is quite a change. Since we only need 64 GiB, the D16S_V3 remains a good starting point.

A Shortcut

To save some time, let's use PowerShell to help us in our search. Assume we've picked Central US for our region. We can use a simple script to find machines that match specific sizing needs:

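# Sign in (with the newer Az module: Connect-AzAccount and Get-AzVMSize).
Login-AzureRmAccount
$region = 'CentralUS'
# Keep sizes with 16+ vCPUs, 64+ GiB RAM (65536 MB), and a premium-storage ("s") name.
Get-AzureRmVMSize -Location $region |
    Where-Object {
        $_.NumberOfCores -ge 16 -and
        $_.MemoryInMB -ge 65536 -and
        $_.Name -match '_[DEGFL][^_]*s'
    } |
    Format-Table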

This will provide you a formatted table of virtual machines that meet your requirements. Notice that unfortunately you cannot query for machine sizes that support premium disks except by name. As a result, this search uses a very simple regular expression to match the naming conventions for the sizes that support premium storage.
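To see what that regular expression actually does, you can try it against a handful of size names without touching Azure at all. This is just a quick sanity check; the list of names is a small sample I picked for illustration, not the full output of the cmdlet.

```powershell
# Same pattern as in the search above: an underscore, a series letter
# (D, E, G, F, or L), any characters other than an underscore, then an "s".
$pattern = '_[DEGFL][^_]*s'

# A small sample of size names chosen for illustration.
$sampleNames = @(
    'Standard_DS14_v2'
    'Standard_D16s_v3'
    'Standard_D16_v3'
    'Standard_GS4'
    'Standard_L16s'
    'Standard_A8m_v2'
)

# -match is case-insensitive by default, so the "S" in DS14 and GS4 also counts.
$results = $sampleNames | ForEach-Object {
    [pscustomobject]@{ Name = $_; MatchesPattern = ($_ -match $pattern) }
}
$results | Format-Table
```

The names with an "s" in the series portion (DS14_v2, D16s_v3, GS4, L16s) match, while D16_v3 and A8m_v2 do not. Because the check relies purely on naming conventions, it's worth spot-checking the results against the documentation for any series that isn't in the pattern.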

Go Forth and Scale!

Finding the right VM size on Azure can be challenging, but with a little bit of research it's easy to identify the series and size that will be right for your specific load. The great thing about Azure is that if you find you've created a machine that is too large or too small for your needs, you can simply change the VM size. This is part of the power of elastic compute resources -- they can quickly be scaled up or down to meet your needs. This makes it incredibly easy to right-size your virtual machine to match its load.

Until next time, Happy DevOp'ing!