404 Tech Support

The ‘find the bottleneck’ approach to systems analysis

When it comes to Information Technology, I think understanding the concept of systems analysis is important for all IT professionals. Systems analysis can be done at small scales, large scales, and anywhere in between. Whether you are optimizing a single computer or speeding up a whole manufacturing process, systems analysis can give you an understanding of how the system works, its desired outcomes, and the costs along the way. Formally, systems analysis is the process of studying an organization, process, or product to determine its purposes, workflows, and goals so that you can create a system that achieves those results efficiently.

One approach to system improvement that I have long adhered to is what I call ‘find the bottleneck’. It’s a pretty simple, straightforward concept that scales with the problem you are trying to tackle. Back in the day, if somebody complained about their computer being slow, the automatic response seemed to be ‘add more RAM’. RAM is a fairly cheap and very easy upgrade to do. It often addressed the first bottleneck in the system: the computer did not have enough free memory available, so it was using the swap file on the hard drive. Giving the computer more RAM meant swaps were less frequent, and the slow process of writing pages to disk and reading them back was avoided.
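To put rough numbers on why swapping hurts, here is a tiny sketch in Python. The figures are order-of-magnitude illustrations, not measurements from any particular machine:

```python
# Rough orders of magnitude (illustrative, not benchmarks) for why
# swapping to a spinning disk feels so slow: a page that has to come
# from disk costs milliseconds instead of nanoseconds.

ram_access = 100e-9   # ~100 ns for a RAM access
hdd_seek = 10e-3      # ~10 ms for a spinning-disk seek

print(f"A swapped-out page costs roughly {hdd_seek / ram_access:,.0f}x a RAM access")
```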

Many people took the common solution ‘add more RAM’ and held onto it for dear life. Any time the computer was slow, they thought they just needed to add more RAM. It became an upgrade of diminishing returns: each addition made less of a difference to the system, eventually all the slots were full, and RAM was no longer the bottleneck.

‘Find the bottleneck’ is my simplification of a systems analysis process. It is more formally known as the Theory of Constraints, which has many more concepts, applications, and management history behind it. It holds that if you do not address the constraint of a system, or the bottleneck, your investments are wasted and the system will remain at the same operating level it was at before.

Example A: A factory churns out widgets. The process of making a widget moves across five different stations. Station #2 is the slowest station and is holding the process up. Even if you put substantial investments into the other four stations, you will not see any significant improvement because Station #2 is your constraint. If you incrementally improve Station #2, the entire process will improve, with a throughput matching Station #2’s speed, until you come to the next bottleneck.
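A quick way to see the arithmetic: under the simplifying assumption that the line runs only as fast as its slowest station, a few hypothetical rates show why upgrading the other four stations buys nothing. All numbers below are made up for illustration:

```python
# Hypothetical widgets-per-hour rates for the five stations.
stations = {
    "Station #1": 120,
    "Station #2": 45,   # the constraint
    "Station #3": 100,
    "Station #4": 90,
    "Station #5": 110,
}

constraint = min(stations, key=stations.get)
print(f"Throughput: {min(stations.values())}/hr, limited by {constraint}")

# Doubling every station EXCEPT the constraint changes nothing:
upgraded = {name: rate if name == constraint else rate * 2
            for name, rate in stations.items()}
print(f"After upgrading the other four: {min(upgraded.values())}/hr")

# Improving the constraint itself moves the whole line:
upgraded[constraint] = 80
print(f"After speeding up {constraint}: {min(upgraded.values())}/hr")
```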

Example B: A group of people are hiking from a campground parking lot to the campsite. The average person walks at 2.0 MPH and the campsite is 4 miles away, so people should arrive in 2 hours. The ranger, who went ahead of the group to prepare the site, expected people to arrive 2 hours after they reached the parking lot but found himself waiting even longer. It turned out one of the walkers, Tim, was overloaded with equipment and walked to the campsite at a slower pace than 2.0 MPH. Even though the others could walk faster and kept the average up, the whole group arrived late along with Tim. Adding faster walkers to the group might raise the average, but it would not address the problem. If the other walkers helped Tim with some of the equipment and raised his walking speed, the whole group would arrive closer to on time.
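The same arithmetic in code, with hypothetical hiker speeds chosen to average 2.0 MPH: the group arrives together, so the arrival time is set by the slowest walker, not by the average.

```python
distance_miles = 4.0
speeds_mph = {"Alice": 2.5, "Bob": 2.3, "Carol": 2.2, "Tim": 1.0}

print(f"Average speed: {sum(speeds_mph.values()) / len(speeds_mph):.1f} MPH")
print(f"Group arrives in: {distance_miles / min(speeds_mph.values()):.1f} hours")

# Adding a fast walker raises the average but not the arrival time:
speeds_mph["Dave"] = 3.5
print(f"New average: {sum(speeds_mph.values()) / len(speeds_mph):.1f} MPH")
print(f"Group still arrives in: {distance_miles / min(speeds_mph.values()):.1f} hours")

# Redistributing Tim's load (raising his speed) is what actually helps:
speeds_mph["Tim"] = 1.6
print(f"After helping Tim: {distance_miles / min(speeds_mph.values()):.1f} hours")
```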

The Theory of Constraints was introduced by Eliyahu M. Goldratt in his book The Goal, which is actually as old as I am. The Phoenix Project, which I have reviewed previously, draws on the same model, pairing it with ITIL as a novelization approach to implementing process improvement in an organization.

If you are trying to improve the performance of a single system, you won’t see a substantial improvement or get much out of your investment unless you address the bottleneck. For a ‘slow’ computer, the bottleneck might be the storage, the startup processes, the available RAM, or the processor. Switching from a traditional spinning hard drive to a solid state drive might net you the greatest improvement if the storage was the constraint in the system. However, if the computer is starving for RAM and has an old, slow processor, the switch to an SSD might not net you as much return on your investment.
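As a starting point for that kind of diagnosis, here is a rough first-pass triage sketch using the third-party psutil library. The thresholds are illustrative guesses, not tuned values, and a real analysis would watch these numbers over time rather than take one snapshot:

```python
import psutil  # pip install psutil

cpu = psutil.cpu_percent(interval=1)   # sample CPU usage over one second
mem = psutil.virtual_memory()
swap = psutil.swap_memory()

print(f"CPU: {cpu:.0f}% | RAM used: {mem.percent:.0f}% | Swap used: {swap.percent:.0f}%")

# Crude heuristics for which constraint to investigate first:
if mem.percent > 90 and swap.percent > 25:
    print("Likely RAM-starved: the system is leaning on the swap file.")
elif cpu > 90:
    print("Likely CPU-bound: look at running processes and startup items.")
else:
    print("RAM and CPU look fine; suspect storage speed or startup processes.")
```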

If you were trying to increase the capacity of your deployment server ahead of a bulk deployment of computers, you might analyze it and come away with a few ways to upgrade the system: processor, RAM, network speed, or storage. Given today’s virtualized infrastructures, it might be fairly painless to temporarily allocate additional RAM and CPUs to the system. The quantity of storage would not be a factor in the system’s performance, but the speed of the storage would be. Given that deployment is mostly reading from storage and transferring across the network, switching to faster storage or faster network equipment for this bulk deployment would likely net the biggest increase in capacity, as would looking into multicasting or minimizing the size of your images. Sometimes an optimization isn’t throwing more resources at a problem but using them most efficiently.
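A back-of-envelope sketch of that reasoning, with all throughput numbers invented for illustration: estimate each stage in the chain and take the minimum, since the slowest stage caps the per-client imaging speed.

```python
image_size_gb = 25
stages_mb_per_s = {          # hypothetical sustained throughput of each stage
    "storage read": 550,     # e.g. a single SATA SSD
    "server NIC": 117,       # roughly 1 Gb/s
    "client disk write": 180,
}

bottleneck = min(stages_mb_per_s, key=stages_mb_per_s.get)
rate = stages_mb_per_s[bottleneck]
print(f"Bottleneck: {bottleneck} at {rate} MB/s "
      f"-> ~{image_size_gb * 1024 / rate / 60:.0f} min per image")

# Unicasting to 20 clients splits the NIC's bandwidth 20 ways;
# multicasting sends the image stream across the network once.
unicast_rate = stages_mb_per_s["server NIC"] / 20
print(f"Unicast to 20 clients: ~{image_size_gb * 1024 / unicast_rate / 60:.0f} min each")
```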

I would never claim to be an expert at systems analysis or at applying the Theory of Constraints. I wrote this article to share a more formal framework for thinking problems like this through. After all, these sorts of issues seem to come up frequently in all departments of IT, like the Help Desk and Operations. From the single desktop system to the entire networked infrastructure and even the factory line, effectively improving the process requires finding the bottleneck first. Sometimes it might even be easier to improve some parts of the process and measure the differences than to measure each individual component’s performance. Once you start making improvements to the bottleneck, the whole process should begin to improve.