If the cartoonists are right, heaven is located in a cloud where
everyone wears white robes, every machine is lightning quick, everything
you do works perfectly, and every action is accompanied by angels
playing lyres. The current sales pitch for the enterprise cloud isn't
much different, except for the robes and the music. The cloud providers
have an infinite number of machines, and they're just waiting to run
your code perfectly.
The sales pitch is seductive because the cloud offers many
advantages. There are no utility bills to pay, no server room staff who
want the night off, and no crazy tax issues for amortizing the cost of
the machines over N years. You give them your credit card, and you get
root on a machine, often within minutes.
To test out the options available to anyone looking for a server, I rented some machines on Amazon EC2, Google Compute Engine, and Microsoft Windows Azure
and took them out for a spin. The good news is that many of the
promises have been fulfilled. If you click the right buttons and fill
out the right Web forms, you can have root on a machine in a few
minutes, sometimes even faster. All of them make it dead simple to get
the basic goods: a Linux distro running what you need.
At first
glance, the options seem close to identical. You can choose from many of
the same distributions, and from a wide range of machine configuration
options. But if you start poking around, you'll find differences --
including differences in performance and cost. The machines may seem
like commodities, but they're not. This became more and more evident
once the machines started churning through my benchmarks.
Fast cloud, slow cloud

I tested small, medium, and large machine instances on Amazon EC2, Google
Compute Engine, and Microsoft Windows Azure using the open source DaCapo benchmarks,
a collection of 14 common Java programs bundled into one easy-to-start
JAR. It's a diverse set of real-world applications that will exercise a
machine in a variety of different ways. Some of the tests will stress CPU,
others will stress RAM, and still others will stress both. Some of the
tests will take advantage of multiple threads. No machine configuration
will be ideal for all of them.
Some of the benchmarks in the collection will be very familiar to
server users. The Tomcat test, for instance, starts up the popular Web
server and asks it to assemble some Web pages. The Luindex and Lusearch
tests will put Lucene, the common indexing and search tool, through its
paces. Another test, Avrora, will simulate some microcontrollers.
Although this task may be useful only for chip designers, it still tests
the raw CPU capacity of the machine.
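If you want to put the same suite through its paces, driving it is straightforward. Here is a minimal sketch of a timing harness: it launches each of the 14 benchmarks in a fresh JVM from a dead stop and records the wall-clock time. The jar name assumes the 9.12 "bach" release of DaCapo; adjust it to match whatever you download.

```java
import java.util.Arrays;
import java.util.List;

// A minimal timing harness, assuming the DaCapo 9.12 "bach" jar sits in
// the working directory: launch each benchmark in a fresh JVM from a
// dead stop and record the wall-clock time.
public class DaCapoTimer {
    static final List<String> BENCHMARKS = Arrays.asList(
            "avrora", "batik", "eclipse", "fop", "h2", "jython",
            "luindex", "lusearch", "pmd", "sunflow", "tomcat",
            "tradebeans", "tradesoap", "xalan");

    public static void main(String[] args) throws Exception {
        for (String bench : BENCHMARKS) {
            long start = System.nanoTime();
            Process p = new ProcessBuilder(
                    "java", "-jar", "dacapo-9.12-bach.jar", bench)
                    .inheritIO()
                    .start();
            p.waitFor();
            System.out.printf("%-10s %6.1f s%n",
                    bench, (System.nanoTime() - start) / 1e9);
        }
    }
}
```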
I ran the 14 DaCapo tests on three different Linux machine
configurations on each cloud, using the default JVM. The instances
aren't perfect apples-to-apples matches, but they are roughly
comparable in terms of size and price. The configurations and cost per
hour are broken out in the table below.
Cloud machines under test

Instance                        Virtual CPUs or cores   RAM       Cost per hour
Amazon m1.medium                1                       3.75GB    12 cents
Amazon c3.large                 2                       3.75GB    15 cents
Amazon m3.2xlarge               8                       30.00GB   90 cents
Google n1-standard-1            1                       3.75GB    10.4 cents
Google n1-highcpu-2             2                       1.80GB    13.1 cents
Google n1-standard-8            8                       30.00GB   82.9 cents
Windows Azure Small VM          1                       1.75GB    6 cents
Windows Azure Medium VM         2                       3.50GB    12 cents
Windows Azure Extra Large VM    8                       14.00GB   48 cents
I gathered two sets of numbers for each machine. The first set shows the amount of time the
instance took to run the benchmark from a dead stop. It fired up the
JVM, loaded the code, and started to work. This isn't a bad simulation
because many servers start up Java code from command lines in scripts.
To
add another dimension, the second set reports the times using the
"converge" option. This runs the benchmark repeatedly until consistent
results appear. This sometimes happens after just a few runs, but in a
few cases, the results failed to converge after 20 iterations. This
option often resulted in dramatically faster times, but sometimes it
only produced marginally faster times.
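The converge option can be passed straight through the same kind of harness. Here's a sketch of invoking that mode; the flag spelling is an assumption based on the 9.12 "bach" release, so check the jar's help output before leaning on it.

```java
// A sketch of invoking DaCapo's converge mode, which reruns a benchmark
// until the timings stabilize (warm JIT, hot caches). The "--converge"
// flag spelling assumes the 9.12 "bach" harness; verify it with
// `java -jar dacapo-9.12-bach.jar -h` on your copy.
public class ConvergedRun {
    public static void main(String[] args) throws Exception {
        String bench = args.length > 0 ? args[0] : "eclipse";
        Process p = new ProcessBuilder(
                "java", "-jar", "dacapo-9.12-bach.jar", "--converge", bench)
                .inheritIO()
                .start();
        System.exit(p.waitFor());
    }
}
```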
The results (see charts and tables below) may look like a mind-numbing sea of numbers, but a few patterns stood out:
Google was the fastest overall. The three Google instances
completed the benchmarks in a total of 575 seconds, compared with 719
seconds for Amazon and 834 seconds for Windows Azure. A Google machine
had the fastest time in 13 of the 14 tests. A Windows Azure machine had
the fastest time in only one of the benchmarks. Amazon was never the
fastest.
Google was also the cheapest overall, though Windows
Azure was close behind. Executing the DaCapo suite on the trio of
machines cost 3.78 cents on Google, 3.8 cents on Windows Azure, and 5
cents on Amazon. A Google machine was the cheapest option in eight of
the 14 tests. A Windows Azure instance was cheapest in five tests. An
Amazon machine was the cheapest in only one of the tests.
The
best option for misers was Windows Azure's Small VM (one CPU, 6 cents
per hour), which completed the benchmarks at a cost of 0.67 cents.
However, this was also one of the slowest options, taking 404 seconds to
complete the suite. The next cheapest option, Google's n1-highcpu-2
instance (two CPUs, 13.1 cents per hour), completed the benchmarks in
half the time (193 seconds) at a cost of 0.70 cents. (The arithmetic
behind these cost figures is spelled out in the sketch following this
list of findings.)
If you
cared more about speed than money, Google's n1-standard-8 machine (eight
CPUs, 82.9 cents per hour) was the best option. It turned in the
fastest time in 11 of the 14 benchmarks, completing the entire DaCapo
suite in 101 seconds at a cost of 2.32 cents. The closest rival,
Amazon's m3.2xlarge instance (eight CPUs, $0.90 per hour), completed the
suite in 118 seconds at a cost of 2.96 cents.
Amazon was rarely
a bargain. Amazon's m1.medium (one CPU, 12 cents per hour) was both
the slowest and the most expensive of the one-CPU instances. Amazon's
m3.2xlarge (eight CPUs, 90 cents per hour) was the second fastest
instance overall, but also the most expensive. However, Amazon's
c3.large (two CPUs, 15 cents per hour) was truly competitive -- nearly
as fast overall as Google's two-CPU instance, and faster and cheaper
than Windows Azure's two-CPU machine.
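The per-run costs in these findings come from simple arithmetic: seconds of runtime divided by 3,600, times the hourly rate. A quick sketch using two of the figures above:

```java
// Pro-rated cost of a benchmark run: seconds / 3600 * hourly rate.
// The figures are the ones quoted above; real bills may round up to
// whole hours, so treat this as a comparison metric, not an invoice.
public class RunCost {
    static double costCents(double seconds, double centsPerHour) {
        return seconds / 3600.0 * centsPerHour;
    }

    public static void main(String[] args) {
        // Windows Azure Small VM: 404 s at 6 cents/hour, about 0.67 cents
        System.out.printf("Azure Small VM:   %.2f cents%n", costCents(404, 6.0));
        // Google n1-highcpu-2: 193 s at 13.1 cents/hour, about 0.70 cents
        System.out.printf("Google highcpu-2: %.2f cents%n", costCents(193, 13.1));
    }
}
```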
These
general observations, which I drew from the "standing start" tests, are
also borne out by the results of the "converged" runs. But a close look
at the individual numbers will leave you wondering about consistency.
Some
of this may be due to the randomness hidden in the cloud. While the
companies make it seem like you're renting a real machine that sits in a
box in some secret, undisclosed bunker, the reality is that you're
probably getting assigned a thin slice of a box. You're sharing the
machine, and that means the other users may or may not affect you. Or
maybe it's the hypervisor that's behaving differently. It's hard to
know. Your speed can change from minute to minute and from machine to
machine, something that usually doesn't happen with the server boxes
rolling off the assembly line.
So while there seem to be clear performance differences among the
cloud machines, your results could vary. These patterns also emerged:
Bigger, more expensive machines can be slower. You can pay
more and get worse performance. The three Windows Azure machines
started with one, two, and eight CPUs and cost 6, 12, and 48 cents per
hour, but the more expensive they were, the slower they ran the Avrora
test. The same pattern appeared with Google's one-CPU and two-CPU
machines.
Sometimes bigger pays off. The same Windows Azure
machines that ran the Avrora jobs slower sped through the Eclipse
benchmark. On the first runs, the eight-CPU machine was more than twice
as fast as the one-CPU machine.
Comparisons can be troublesome.
The results table has some holes produced when a particular test failed,
some of which are easy to explain. The Windows Azure machines lacked a
codec that the Batik test requires; it doesn't come installed with
the default version of Java. I probably could have fixed that with a bit
of work, but the machines from Amazon and Google didn't need it. (Note:
Because Azure balked at the Batik test, the comparative times and costs
cited above omit the Batik results for Amazon and Google.)
Other
failures seemed odd. The Tradesoap routine would generate an exception
occasionally. This was probably caused by some network failure deep in
the OS layer. Or maybe it was something else. The same test would run
successfully at other times.
Adding more CPUs often
isn't worth the cost. While Windows Azure's eight-CPU machine was often
dramatically faster than its one-CPU machine, it was rarely eight
times faster -- disappointing given that it costs eight times as much.
This was true even of the tests that recognize the multiple
CPUs and set up multiple threads. In most of the tests, the eight-CPU
machine was just two to four times faster. The one test that stood out
was the Sunflow ray-tracing test, which was able to use all of the
compute power given to it. (A sketch for probing this kind of thread
scaling appears after this list.)
The CPU numbers don't always tell the
story. While the companies usually double the price when you get a
machine with two CPUs and multiply by eight when you get eight CPUs, you
can often save money by not increasing the RAM along with the CPUs. But
if you skimp on the RAM, don't expect performance to double. The Google two-CPU machine in
these tests was a so-called "highcpu" machine with less RAM than the
standard machine. It was often slower than the one-CPU machine. When it
was faster, it was often only about 30 percent faster.
Thread
count can also be misleading. While the performance of the Windows Azure
machines on the Sunflow benchmark tracks the number of threads, the same
can't be said for the Amazon and Google machines. Amazon's two-CPU
instance often went more than twice as fast as the one-CPU machine. On
one test, it was almost three times faster. Google's two-CPU machine, on
the other hand, went only 20 to 25 percent faster on Sunflow.
The
pricing table can be a good indicator of performance. Google's
n1-highcpu-2 machine is about 30 percent more expensive than the
n1-standard-1 machine even though it offers twice as much theoretical
CPU power. Google probably used performance benchmarks to come up with
the prices.
Burst effects can distort behavior. Some of the
cloud machines will speed up for short "bursts." This is sort of a free
gift of the extra cycles lying around. If the cloud providers can offer
you a temporary speedup, they often do. But beware that the gift will
appear and disappear in odd ways. Thus, some of these results may be
faster because the machine was bursting.
The bursting behavior
varies. On the Amazon and Google machines, the Eclipse benchmark would
speed up by a factor of more than three when using the "converge" option
of the benchmark. Windows Azure's eight-CPU machine, on the other hand,
wouldn't even double.
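If you want to probe this thread behavior on your own instances, the harness sketch from earlier can be extended to sweep thread counts. The "-t" (thread count) option is an assumption based on the 9.12 harness's documented flags; verify it against your jar's help output before trusting the numbers.

```java
// A sketch for probing thread scaling: run the Sunflow benchmark with
// 1, 2, 4, and 8 threads and compare wall-clock times. The "-t" flag
// is assumed from the DaCapo 9.12 harness's help text; confirm it on
// your copy.
public class ThreadScaling {
    public static void main(String[] args) throws Exception {
        for (int threads : new int[] {1, 2, 4, 8}) {
            long start = System.nanoTime();
            Process p = new ProcessBuilder(
                    "java", "-jar", "dacapo-9.12-bach.jar",
                    "-t", String.valueOf(threads), "sunflow")
                    .inheritIO()
                    .start();
            p.waitFor();
            System.out.printf("%d threads: %6.1f s%n",
                    threads, (System.nanoTime() - start) / 1e9);
        }
    }
}
```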
If all of these factors leave you confused, you're not alone. I
tested only a small fraction of the configurations available from each
cloud and found that performance was only partially related to the
amount of compute power I was renting. The big differences in
performance across the benchmarks mean that the various
platforms could run your code at radically different speeds. In the
past, my tests have shown that cloud performance can vary at different times or days of the week.
This test matrix may be large, but it doesn't even come close
to exploring all of the variations that the platforms can
offer. All of the companies offer multiple combinations of CPUs,
RAM, and storage. These can have subtle and not-so-subtle effects on
performance. At best, these tests can only expose some of the ways that
performance varies.
This means that if you're interested in
getting the best performance for the lowest price, your only solution is
to create your own benchmarks and test out the platforms. You'll need
to decide which options are delivering the computation you need at the
best price.
Calculating cloud costs

Working with the matrix of prices for the cloud machines is surprisingly complex
given that one of the selling points of the clouds is the ease of
purchase. You're not buying machines, real estate, air conditioners, and
whatnot. You're just renting a machine by the hour. But even when you
look at the price lists, you can't simply choose the cheapest machine
and feel secure in your decision.
The tricky issue for the bean
counters is that the performance observed in the benchmarks rarely
increased with the price. If you're intent upon getting the most
computation cycles for your dollar, you'll need to do the math yourself.
The
simplest option is Windows Azure, which sells machines in sizes that
range from extra small to extra large. The amounts of CPU power and RAM
generally increase in lockstep, roughly doubling at each step up the
size chart. Microsoft also offers a few loaded machines with an extra
large amount of RAM included. The smallest machines with 768MB of RAM
start at 2 cents per hour, and the biggest machines with 56GB of RAM can
top off at $1.60 per hour. The Windows Azure pricing calculator makes it straightforward.
One
of the interesting details is that Microsoft charges more for a machine
running Microsoft's operating system. While Windows Azure sometimes
sold Linux instances for the same price, at this writing, it's charging
exactly 50 percent more if the machine runs Windows. The marketing
department probably went back and forth trying to decide whether to
price Windows as if it's an equal or a premium product before deciding
that, duh, of course Windows is a premium.
Google follows
the same basic scheme of doubling the size of the machine and then
doubling the price. The standard machines start at 10.4 cents per hour
for one CPU and 3.75GB of RAM and then double in capacity and price
until they reach $1.66 per hour for 16 CPUs and 60GB of RAM. Google also
offers options with higher and lower amounts of RAM per CPU, and the
prices move along a different scale.
The most interesting options come from Amazon, which has an even
larger number of machines and a larger set of complex pricing options.
Amazon charges roughly double for twice as much RAM and CPU capacity,
but it also varies the price based upon the amount of disk storage. The
newest machines include SSD options, but the older instances without
flash storage are still available.
Amazon also offers the chance to create "reserved instances" by
pre-purchasing some of the CPU capacity for one or three years. If you
do this, the machines sport lower per-hour prices. You're locking in
some of the capacity but maintaining the freedom to turn the machines on
and off as you need them. All of this means it pays to estimate how much
you intend to use Amazon's cloud over the next few years, because
committing up front can save real money.
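Whether a reservation pays off is a break-even calculation. The sketch below shows the shape of the math with placeholder rates; Amazon's actual prices vary by instance type and term.

```java
// Reserved-instance break-even: the upfront fee pays for itself once
// (onDemandRate - reservedRate) * hoursUsed exceeds it. All three
// numbers here are hypothetical placeholders for illustration.
public class ReservedBreakEven {
    public static void main(String[] args) {
        double onDemandRate = 0.15;   // on-demand $/hour (hypothetical)
        double reservedRate = 0.09;   // reserved $/hour (hypothetical)
        double upfrontFee = 300.0;    // one-time reservation fee (hypothetical)

        double hours = upfrontFee / (onDemandRate - reservedRate);
        System.out.printf("Break-even after %.0f hours, about %.0f%% of a year%n",
                hours, 100 * hours / 8760);
    }
}
```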
In an effort to
simplify things, Google created the GCEU (Google Compute Engine Unit) to
measure CPU power and "chose 2.75 GCEUs to represent the minimum power
of one logical core (a hardware hyper-thread) on our Sandy Bridge
platform." Similarly, Amazon measures its machines with Elastic Compute
Units, or ECUs. Its big fat eight-CPU machine, known as the m3.2xlarge,
is rated at 26 ECUs while the basic one-core version, the m3.medium, is
rated at three ECUs. That's a difference of more than a factor of eight.
This
is a laudable effort to bring some light to the subject, but the
benchmark performance doesn't track the GCEUs or ECUs too closely. RAM
is often a big part of the equation that's overlooked, and the
algorithms can't always use all of the CPU cores they're given. Amazon's
m3.2xlarge machine, for instance, was often only two to four times
faster than the m3.medium, although it did get close to being eight
times faster on a few of the benchmarks.
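One way to put those ratings to work is to normalize price by rated unit. Here's a sketch using the figures quoted above, keeping in mind that ECUs and GCEUs are different yardsticks, so the numbers only compare machines within one vendor's price list:

```java
// Cost per rated compute unit, from figures quoted in the text:
// Amazon's m3.2xlarge is 26 ECUs at 90 cents/hour; Google's
// n1-standard-1 is one core, which Google rates at 2.75 GCEUs, at
// 10.4 cents/hour. ECUs and GCEUs are not comparable across vendors.
public class CostPerUnit {
    public static void main(String[] args) {
        System.out.printf("m3.2xlarge:    %.2f cents per ECU-hour%n", 90.0 / 26);
        System.out.printf("n1-standard-1: %.2f cents per GCEU-hour%n", 10.4 / 2.75);
    }
}
```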
Caveat cloudster
The
good news is that the cloud computing business is competitive and
efficient. You put in your credit card number, and a server pops out. If
you're just looking for a machine and don't have hard and fast
performance numbers in mind, you can't go wrong with any of these
providers.
Is one cheaper or faster? The accompanying tables show
the fastest and cheapest results in green and the slowest and priciest
results in red. There's plenty of green in Google's table and plenty of
red in Amazon's. Depending on how much you emphasize cost, the winners
shift. Microsoft's Windows Azure machines start running green when you
take the cost into account.
The
freaky thing is that these results are far from consistent, even across
the same architecture. Some of Microsoft's machines show both green and
red numbers. Google's one-CPU machine is full
of green but runs red with the Tradesoap test. Is this a problem with
the test or Google's handling of it? Who knows? Google's two-CPU machine
is slowest on the Fop test -- and Google's one-CPU machine is fastest.
Go figure.
All of these results mean that doing your own testing
is crucial. If you're intent on squeezing the most performance out of
your nickel, you'll have to do some comparison testing and be ready to
churn some numbers. The performance varies, and the price is only
roughly correlated with usable power. There are a number of tasks where
it would just be a waste of money to buy a fancier machine with extra
cores because your algorithm can't use them. If you don't test these
things, you could be wasting your budget.
It's also important to recognize that there can be quite a bit of
markup hidden in these prices. For comparison, I also ran the benchmarks
on a basic eight-core (AMD FX-8350) machine with 16GB of RAM on my
desk. It was generally faster than Windows Azure's eight-core machine,
just a bit slower than Google's eight-core machine, and about the same
speed as Amazon's eight-core box. Yet the price was markedly different.
The desktop machine cost about $600, and you should be able to put
together a server in the same ballpark. The Google machine costs 82.9
cents per hour, or about $617 for a 31-day month. You could start saving
money after the first month if you build the machine yourself.
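The break-even arithmetic is worth writing down explicitly; a sketch using the figures above:

```java
// Rent vs. buy, using the numbers quoted above: a $600 desktop versus
// Google's n1-standard-8 at 82.9 cents per hour. This ignores power,
// cooling, and admin time, which the next paragraph takes up.
public class RentVsBuy {
    public static void main(String[] args) {
        double hourlyRate = 0.829;  // n1-standard-8, $/hour
        double boxCost = 600.0;     // comparable desktop build

        double hours = boxCost / hourlyRate;
        System.out.printf("Break-even after %.0f hours, about %.0f days of 24/7 use%n",
                hours, hours / 24);
    }
}
```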
The
price of the machine, though, is just part of the equation. Hosting the
computer costs money, or more to the point, hosting lots of computers
costs lots of money. The cloud services will be most attractive to
companies that need big blocks of compute power for short sessions. If
they pay by the hour and run the machines for only a short block of
time, they can cut the costs dramatically. If your workload appears in
short bursts, the markup isn't a problem because any machine you own
will just sit there most of the day waiting, wasting cycles and driving
up the air conditioning bills.
All of these facts make choosing a
cloud service dramatically more complicated and difficult than it might
appear. The marketing is glossy and the imagery makes it all look comfy,
but hidden underneath is plenty of complexity. The only way you can
tell if you're getting what you're paying for is to test and test some
more. Only then can you make a decision about whether the light, airy
simplicity of a cloud machine is for you.