The Only Good Computer is a Busy Computer
In case you need the toaster and the kettle at the same time
To take a leaf from Toyota's book - overproduction is the worst kind of waste.
What's a computer for, anyway?
A computer's job in life is to compute. Although there are a thousand fancy instruction sets and special supplementary hardware processors, we're talking about crunching numbers. Is your computer busy computing, or is it sitting around waiting to compute?
Your overproduction of computing cycles costs money, whether you're getting your numbers crunched or not. Power, cooling, space. Network connections, storage allocations. Software licences, client licences, port licences. Systems administrators, security analysts - even Business Consultants. Resources consumed whether that machine is doing its job, or just waiting for another job to come in.
We should have a spare computer, for sure.
The Curmudgeon sees headroom laid on top of headroom laid on top of headroom. Well, we have to account for user growth over the next 3 years. We need to prepare for new software versions that could come out. We should make sure there's enough capacity for peak loading. If a machine in the cluster goes down, the remaining machines need to have enough capacity to handle all the load. There have to be enough resources to prevent conflict between guests. The hypervisor needs its own breathing room. No matter how many angels can dance on the head of a pin, it's probably safest to have a spare pin in case more angels show up.
The ridiculous upshot? Waste. Muda. Idle, idle computers. Machines built to crunch numbers just sitting around eating resources. The solution? Some risk-taking on the business side.
What if the computer was busy?
Imagine, for a moment, what you would do if you sent a request to your server, and you could actually perceive the time it took to compute the response. It depends on how long it takes, and how often that occurs. If it took 1 second, you might twitch and carry on. If it took 1 second 100 times a day, it might irritate you, or you might just accept it as part of your workflow. If it took 10 minutes, you might call the helpdesk and ask if there was a problem. If it always took 10 minutes, you might pop by the watercooler.
Life before laptops
Back in the day, The Curmudgeon used to work on minicomputers to do scientific research. Everybody got a batch schedule. You'd spend your day analyzing last night's results and lining up the next night's run. You expected to be able to write code and save files during the day, but real computing wasn't done while humans were interacting with the terminals - the hard work was done while no-one was watching. If you did run a query during the day, you might stop by your colleagues' offices and apologize for taking cycles. And you could expect a big bill at the end of the month for daytime processing. Computational resources were scarce, expensive and carefully-shared.
Nowadays, cycles are cheap. Mind-bendingly cheap. But we buy so many of them that we spend more money on computing than we did yesterday. Every computer needs someone to look after it. Needs someone writing software for it. Needs a security guard at the door to its facility. To take a Marxist view of costs, we spend more human effort on the computers than ever before. But that is ONLY because we're buying so much overcapacity. If we purchased only the computing power we really needed, it'd be cheaper than ever before.
So I recommend that you consider a life where you might have to share resources, and you might have to wait a moment or two for your query. Accept that possibility on the business side, and you can actually hope to make your computers work for a living. IT cannot accept that on your behalf - if they just unilaterally decided you were going to have to wait your turn, you'd complain bitterly. IT is forced to always suggest surplus capacity, because you'll make their life hell otherwise. But the business CAN make that decision.
So, what does busy mean? It is a question fundamental to capacity planning. In a broader sense, you can't isolate one functional axis of your infrastructure and scale it without considering the other axes. OK, your RAM is not overallocated, but your CPU is working hard. Or your CPU is idle because your disk is choked. Or nobody gets a good response because the network latency is long. To really keep your infrastructure busy, you need it in delightful balance - every piece working together, lifting and carrying.
But let's look at defining a busy CPU, for a key example.
The 'busy' CPU
To start with, if your CPU has any time at which it is not reading 100% busy, it's a waste. Stop it! Stop idling that hotrod and get it back to work! Is 100% therefore your target? No, that's shooting too low! This team should give me 110%!
But really, there's a theoretical, perfect level of balance where requests are coming in precisely as fast as the CPU can handle them, no request waits, and each request gets a full processor core to run it to completion. That would be 100% utilization, and every request would be perfectly met - as perfectly as if the entire CPU were idle when it occurred.
But that kind of 100% is not really making the most of your infrastructure. If you don't have a queue waiting for the CPU, the CPU is going to be idle at some point. You should have a queue depth. So here's the critical question - how deep a queue?
How long can you wait?
This, in fact, is not an IT question, it's a business question. Back in my minicomputer days, the accepted answer was "I can wait overnight." A queue depth of 16 hours was perfectly reasonable. To be honest, we rarely actually managed to keep the machines quite that busy. But batch runs of 8 to 10 hours weren't uncommon. Very occasionally, some group would fire off a job that would actually last longer than their batch schedule, and the job would have to be paused to wait for the next night to complete.
But nowadays, when you're not doing anything as trivial as calculating radiation doses for cancer patients or projecting an intercept position of Soviet submarines under the Arctic ice, you certainly don't expect to wait overnight to get your spreadsheet to refresh that pivot table of EI contributions or to transcode the corporate promotional video from MJPEG to H.264.
So, set your queue depth at something modern. Since every little iota of work in our 'modern' bloatware takes the computational equivalent of tallying the Domesday Book, you probably don't want to wait too long for each little 'atom' of computing to finish. I'd recommend 100 milliseconds as a responsiveness guide. Somewhere around 100 milliseconds of queue depth, the wait for your computing stops being imperceptible and starts feeling 'laggy'.
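To put a rough number on "how busy can I run for a 100 millisecond wait?", here is a minimal sketch. It assumes a textbook single-server queue with random arrivals and exponentially distributed service times (the M/M/1 model), which is my simplification and not something your real workload will match exactly. The function names and the 20 millisecond service time are hypothetical, chosen only for illustration.

```python
def mean_queue_wait(arrival_rate, service_time):
    """Mean time a request spends waiting in the queue, in seconds.

    Assumes an M/M/1 queue (an illustrative model, not a measurement).
    arrival_rate is requests per second, service_time is seconds per request.
    """
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        # Work arrives at least as fast as it can be served: the queue never drains.
        return float("inf")
    return utilization * service_time / (1.0 - utilization)


def max_utilization_for_wait(target_wait, service_time):
    """Highest utilization that keeps the mean queue wait at or below target_wait."""
    return target_wait / (target_wait + service_time)


# Hypothetical workload: each request needs 20 ms of CPU, and the business
# has agreed that 100 ms of waiting is acceptable.
target = max_utilization_for_wait(target_wait=0.100, service_time=0.020)
print(f"Run the CPU at up to {target:.0%} busy")
```

Under these assumptions, the longer the wait the business agrees to, the closer to 100% the machine can run. A 1 second budget with the same 20 ms requests would allow roughly 98% utilization.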
Waiting is not a hardware spec
You might note we've not talked about core count or clock speed or context switching or any of the other arcane considerations of CPU performance.
Is a quad-core CPU twice as fast as a dual-core? What if the dual-core is running at twice the clock speed of the quad-core? What if there was a single-core CPU that had dedicated hardware subprocessors for your workload, while the dual- and quad-core machines had to run that computation in software?
If there are 4 processes waiting in a queue for that first quad-core machine and the exact same 4 processes waiting in a queue for that second double-speed dual-core machine, is the dual-core machine twice as heavily loaded? You should definitely be running at 100%, and you should definitely have a nonzero queue depth, but what depth is 'heavy' loading and what depth is 'light' loading is gauged better by your wait time than by some simple characteristic of your machine.
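The four-process thought experiment can be sketched in a few lines. This is a toy model under loud assumptions: every job is single-threaded and the same size, jobs are served first-come first-served, context switching is free, and doubling the clock speed exactly doubles per-core throughput. None of that is guaranteed on real hardware, and the function name is hypothetical.

```python
import heapq


def completion_times(job_sizes, cores, speed):
    """Finish time of each job when served in order on identical cores.

    job_sizes are in 'seconds on a baseline core'; speed is the per-core
    speed relative to that baseline. Toy model: no overheads of any kind.
    """
    free_at = [0.0] * cores          # time at which each core next becomes free
    heapq.heapify(free_at)
    finished = []
    for size in job_sizes:
        start = heapq.heappop(free_at)   # earliest available core takes the job
        end = start + size / speed
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished


jobs = [1.0, 1.0, 1.0, 1.0]                        # four identical processes
quad = completion_times(jobs, cores=4, speed=1.0)  # baseline quad-core
dual = completion_times(jobs, cores=2, speed=2.0)  # double-speed dual-core

print("quad-core mean finish:", sum(quad) / len(quad))  # 1.0
print("dual-core mean finish:", sum(dual) / len(dual))  # 0.75
```

In this toy case both machines clear the queue at the same moment, yet the dual-core gets the average job done sooner, even though it has twice as many processes per core. Queue depth alone does not tell you which machine is more heavily loaded. The wait time does.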
Benchmarks are great value
I love SPEC.org's ethos: an ounce of data is worth a pound of hype. SPEC's test suite runs a huge variety of different processing tasks and amalgamates them into a single result by which to get some idea of the real world performance of your machine. Now, granted, their test suite is not going to resemble your environment's computational workload. But the measured performance of a system under a variety of conditions is a far, far better metric than e.g. core count or clock speed.
So, if you're trying to predict how much capacity you'll need in the next generation of equipment, don't look at the numbers in the brochure - use the independent test results.
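As a back-of-envelope illustration of sizing from test results rather than brochure numbers, here is a small sketch. The scores below are made-up placeholders, not real SPEC results, and the function name is hypothetical. It also assumes your workload scales in proportion to the benchmark score, which, as noted above, it will only roughly do.

```python
import math


def machines_needed(current_machines, current_score, candidate_score, peak_utilization):
    """Estimate how many candidate machines cover today's measured peak load.

    Scores are per-machine benchmark results in the same units.
    peak_utilization is the busiest fraction (0.0 to 1.0) observed on the
    current fleet. Assumes throughput is proportional to benchmark score.
    """
    work_at_peak = current_machines * current_score * peak_utilization
    return max(1, math.ceil(work_at_peak / candidate_score))


# Hypothetical fleet: 10 machines scoring 40 each, never more than 50% busy,
# replaced by machines that score 100 on the same benchmark.
print(machines_needed(10, 40.0, 100.0, 0.5))  # 2
```

The point of the sketch is the inputs: a measured score and a measured peak, rather than a core count and a growth guess.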
Delete your headroom
Chances are, you're already swimming in overcapacity on some axis of your infrastructure's performance. Really, don't be surprised if you find you've got hundreds of times as much power as you need - with capacity doubling every 2 years or so, roughly in step with Moore's Law, some kinds of capacity are sure to have long outstripped your requirements. Find those places where you're overprovisioned and strip them all down. Tear out the redundant capacity and the backup capacity and the futureproofing capacity and the contingency capacity. Remember, IT can't volunteer to do this for you - you would complain if you ran into one of those contingency scenarios and it hadn't been your decision to risk the slowdown. Stick your neck out and take a little chance. Accept that if everyone logs in at precisely 08h30 on a Monday morning, some people are going to have to wait 100 milliseconds. Let the Comms department know they might have to get a coffee if they're going to upload a half-hour talking-head video of the CEO to the website.
The Curmudgeon Guarantee
If you pare your system down to the essentials for getting your work done, you will, I guarantee it, find that you need fewer computers, fewer licences, fewer administrators, and fewer dollars. If you insist your computers keep busy, if you have an ongoing policy of reducing the muda that is IT overproduction, your IT costs will converge towards $0. And perhaps you could spend that huge pile of money on improving your business.
If nobody in your IT organization has already suggested this (and who would want their own budget cut, honestly?) - you should come talk to MYRA. We're so certain of finding savings that we can make the venture self-funding. Ask us how!