404 Tech Support

The ‘What’s under the box?’ way of troubleshooting

A visualization that helps me think through troubleshooting problems, or explain my thought process to others, is what I call ‘What’s under the box?’. This approach to troubleshooting harkens back to solving math problems in grade school. You might be given a series of values for X and Y and be told to complete the equation. You then have to figure out what the formula does.

X is the input to the equation and Y is the output. What operation turns X into Y? To keep the math simple, consider this example:

X Y
1 5
2 7
3 9
4 11

From this table, we can see that Y is always double the value of X plus 3, or:

Y = 2X + 3
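To make the grade-school version concrete, here is a small Python sketch that recovers the formula from the table. It assumes the box follows a straight-line rule of the form Y = aX + b and that the first two observations have different X values; infer_linear_rule is a hypothetical name used only for illustration.

```python
from fractions import Fraction

def infer_linear_rule(pairs):
    """Guess Y = a*X + b from observed (X, Y) pairs.

    Solves for a and b using the first two pairs, then checks that every
    pair fits. Returns (a, b), or None if no single straight line fits.
    Assumes at least two pairs and that the first two have different X.
    """
    (x1, y1), (x2, y2) = pairs[0], pairs[1]
    a = Fraction(y2 - y1, x2 - x1)  # slope: how much Y changes per unit of X
    b = y1 - a * x1                 # intercept: what is added after scaling
    for x, y in pairs:
        if a * x + b != y:
            return None             # this pair breaks the guessed rule
    return a, b

observations = [(1, 5), (2, 7), (3, 9), (4, 11)]
rule = infer_linear_rule(observations)
if rule is None:
    print("No single straight-line rule fits these observations.")
else:
    a, b = rule
    print(f"Y = {a}X + {b}")  # → Y = 2X + 3
```

Using exact fractions avoids floating-point rounding when checking whether every pair fits.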

I like to visualize this type of problem solving as a factory assembly line. One conveyor belt leads into a box and another leads out of it. The problem is that you can’t see inside the box at all, so you have to work out what it does from what goes in and what comes out. In the math example above, we see 1 enter the box and 5 exit, 2 enter and 7 exit, and so on. One pair alone could be explained many ways (1 plus 4 is also 5), but every pair in the table fits the same rule: the box doubles the input and adds three to it.

To me, this is very similar to troubleshooting, or even just understanding, complex IT systems. There might be a whole series of boxes, but you can usually control or monitor the input and see the output. By watching enough inputs and outputs, you can often make a good guess at what goes on under the box.
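The same idea can be sketched as ruling out guesses. In this hypothetical Python example, mystery_box stands in for a system you can’t see inside, and each entry in hypotheses is one guess about what it does. A single observation leaves every guess standing; a few more narrow it down.

```python
def mystery_box(x):
    # Stand-in for a system we can't see inside (hypothetical).
    # In real troubleshooting this would be the application or workflow.
    return 2 * x + 3

# Candidate explanations for what the box might be doing (hypothetical).
hypotheses = {
    "add 4": lambda x: x + 4,
    "triple, then add 2": lambda x: 3 * x + 2,
    "double, then add 3": lambda x: 2 * x + 3,
}

def surviving_hypotheses(box, hypotheses, inputs):
    """Feed each input to the box and drop any hypothesis whose
    prediction differs from the output actually observed."""
    remaining = dict(hypotheses)
    for x in inputs:
        observed = box(x)
        remaining = {name: f for name, f in remaining.items() if f(x) == observed}
    return sorted(remaining)

print(surviving_hypotheses(mystery_box, hypotheses, [1]))
# → ['add 4', 'double, then add 3', 'triple, then add 2']
print(surviving_hypotheses(mystery_box, hypotheses, [1, 2, 3]))
# → ['double, then add 3']
```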

Bringing it back to IT, say there is a poorly documented setting in a complex application. You’re having a problem with the software, and this setting, a simple checkbox, sounds like exactly what you want to change. Your input is normal day-to-day use of the software, and the output is how it behaves. You check the box and watch whether it fixes the problem, how the system behaves, and what results it produces. If it doesn’t fix the problem or changes the output for the worse, you uncheck the box and monitor the output again.
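The checkbox experiment is essentially a one-change-at-a-time test. The sketch below assumes a toy system whose settings live in a dictionary; try_setting, run_export, and the setting names are all hypothetical stand-ins for a real application.

```python
def try_setting(settings, name, run, is_fixed):
    """Flip one boolean setting, observe the output, and revert if it didn't help.

    settings: dict of current setting values (modified in place)
    run:      callable that exercises the system and returns its output
    is_fixed: callable that decides whether an output is the desired one
    Returns (kept_change, output_before, output_after).
    """
    baseline = run(settings)
    settings[name] = not settings[name]      # check (or uncheck) the box
    after = run(settings)
    if is_fixed(after):
        return True, baseline, after         # keep the change
    settings[name] = not settings[name]      # put the box back the way it was
    return False, baseline, after

# Toy system (hypothetical): exports are garbled unless "use_utf8" is on.
def run_export(settings):
    return "ok" if settings["use_utf8"] else "garbled"

settings = {"use_utf8": False, "compress": False}
print(try_setting(settings, "compress", run_export, lambda out: out == "ok"))
# → (False, 'garbled', 'garbled'), and "compress" is switched back off
print(try_setting(settings, "use_utf8", run_export, lambda out: out == "ok"))
# → (True, 'garbled', 'ok'), and "use_utf8" stays on
```

Changing only one setting per attempt, and reverting it when it doesn’t help, keeps each output tied to a single input change.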

The nice thing about this approach is that it is simple and you don’t have to be the one making the changes. Someone who is rather shy about technology could call you and explain their problem. You ask them what result they’re getting and what they do to get it. You can then change the input by having them do something differently and describe the output they get. By doing this, you get a clear picture of the system’s possible behaviors and a better description of the behavior they actually want.

“Changing the input” might sound difficult, but it’s actually simple. It could be as easy as having the person on the other end of the phone try the same task on a different computer or in a different browser. If the output is the same, you can rule out the computer or browser as the cause and move on to other possible sources of the problem. If the output is the desired behavior, you can then look for a setting within the browser or computer that explains the difference.
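Switching computers or browsers works because it changes exactly one factor at a time. Here is a Python sketch of that reasoning, assuming each observation is recorded as the environment used plus the output seen; classify_factors and the factor names are hypothetical.

```python
from itertools import combinations

def classify_factors(observations):
    """For each factor, compare observations that differ ONLY in that factor.

    observations: list of (environment, output) pairs, where environment is
    a dict such as {"computer": "office PC", "browser": "Chrome"}.
    Returns a dict mapping factor -> "matters", "ruled out", or "untested".
    """
    factors = set()
    for env, _ in observations:
        factors.update(env)
    verdict = {f: "untested" for f in sorted(factors)}
    for (env_a, out_a), (env_b, out_b) in combinations(observations, 2):
        differing = [f for f in factors if env_a.get(f) != env_b.get(f)]
        if len(differing) != 1:
            continue  # zero or several things changed: not a clean comparison
        factor = differing[0]
        if out_a != out_b:
            verdict[factor] = "matters"        # changing it changed the output
        elif verdict[factor] == "untested":
            verdict[factor] = "ruled out"      # changing it made no difference
    return verdict

observations = [
    ({"computer": "office PC", "browser": "Chrome"}, "error"),
    ({"computer": "office PC", "browser": "Firefox"}, "error"),
    ({"computer": "laptop", "browser": "Chrome"}, "works"),
]
print(classify_factors(observations))
# → {'browser': 'ruled out', 'computer': 'matters'}
```

“Ruled out” here only means the factor made no difference under the conditions actually tested. A pair of observations that changed two things at once is skipped because it can’t tell you which change mattered.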

The “What’s under the box?” way of troubleshooting is simple, but it also helps you explain the process to non-technical people and lets multiple technicians collaborate on a problem. To explain where things stand, you can share the inputs and outputs as you work through the issue.

You should also take good notes on the inputs and outputs as you troubleshoot. They document troubleshooting steps for the system in the future, along with the outputs to expect from given inputs. If it’s a particularly complex issue, you might be working through the problem for hours, if not days. When somebody comes up with a sudden solution, you can check it against your notes to see whether you have already tried it or whether it won’t work for some other reason. This saves you from wasting time modifying the system unnecessarily or barking up the wrong tree.
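Notes like these don’t need special tooling, but as a sketch of the idea, here is a minimal Python log that records each change with its input and output and lets you check a proposed fix against what has already been tried. The class and field names are hypothetical, and matching suggestions by exact text is a deliberate simplification.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class TroubleshootingLog:
    """A minimal notebook of what was changed and what came out of it."""
    entries: list = field(default_factory=list)

    def record(self, change, input_used, output, fixed=False):
        # Timestamp each attempt so a multi-day investigation stays in order.
        self.entries.append({
            "when": datetime.now().isoformat(timespec="seconds"),
            "change": change,
            "input": input_used,
            "output": output,
            "fixed": fixed,
        })

    def already_tried(self, change):
        """Return earlier entries for this exact change description."""
        return [e for e in self.entries if e["change"] == change]

log = TroubleshootingLog()
log.record("cleared browser cache", "open report page", "same error")
log.record("tried Firefox instead of Chrome", "open report page", "same error")

suggestion = "cleared browser cache"
previous = log.already_tried(suggestion)
if previous:
    print(f"Already tried '{suggestion}': output was {previous[0]['output']!r}")
else:
    print(f"'{suggestion}' is new; worth testing.")
```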

Experienced technicians likely have their own process and their own way of visualizing problems. This one works well for me, and I have even used it to “reverse engineer” systems that I was supposed to be responsible for but had no real access to.