IT Troubleshooting – Part 1.

Introduction

 


This is the first installment of a three-part series intended to teach you how to become an expert at technology troubleshooting. The first part covers the mindset you need to achieve long-term success. Here you will also find some helpful troubleshooting tools and website recommendations.

This training document originates from a technical course we teach at Snowball. We use the techniques in this document to troubleshoot complicated IT and Internet related problems. Many of the techniques here can also be applied to other disciplines.

To troubleshoot efficiently, you need skill, tenacity, and experience. You also need information about the problem and background knowledge of the systems involved.

In technology troubleshooting you will often come across these questions:

  • When did the problem occur?
  • What is the actual error?
  • Can you easily reproduce the problem?
  • What steps have you already taken to try and resolve the problem?
  • Is the problem consistent or intermittent?

These questions aim to start the discovery process. The more precise the information given, the better. Never be afraid to ask as many questions as possible!

Expert troubleshooters also know how to reverse engineer. To reverse engineer a system, you need to know how it is built up in the first place. You can teach yourself this by installing, troubleshooting, and breaking down systems.

Attributes Required

 

Here is some more information about the personality attributes required to become an expert troubleshooter.

Experience

The best way to acquire experience is by trial and error. Expert engineers gain their knowledge by spending late nights (often at home) working with every single possible configuration. As you ascend in the troubleshooting hierarchy, you might be the sole person responsible for solving complex problems. This increased pressure often helps, as it accelerates the learning process. Being able to work under pressure is a must-have.

Tip: Never experiment on a live system. Experiment on your own router or VM (Virtual Machine), or create your own lab. If you have no choice but to experiment on a live network or system, first brief at least one senior colleague and get a second opinion.

Never be scared of computers. People who are intimidated by the intricacy of how computers work never progress very far in discovering the cause of an issue. When I was young, a friend of mine, Uli Deutschlander, taught me the most valuable lesson I could ever learn about computers. Uli said: “Don’t be afraid of the computer. If you make a mistake, you can just switch it off and on again and it will forget.” Adopting this attitude from an early age meant that I was never scared to experiment and always learnt as much as I could.

Skill

The best way to acquire skill is by studying. Don’t think that you will become an expert just by Googling. Google will help you in a tight situation and feed you quick bites. However, teaching yourself the theory from start to finish will lay a solid foundation for most complex subjects. Books and courses are written to be constructive from beginning to end, laying out the topics in an incremental fashion. You need this solid foundation to get deep insight into complex subjects. You can’t do this by only using Google. Becoming very skillful is like building with Lego blocks: you have to put all the pieces together in order to see the complete picture. Google just gives you insight into pieces of the entire puzzle.

I once worked on the same problem for almost a year. After about 10 months I still had not made any progress. So instead of trying everything I had already tried all over again, I found a really good book on the topic. Even though the book never covered my exact problem, the solid theoretical foundation meant that I was able to solve it. Without the book, I would probably have struggled much longer.

Tenacity

Tenacity means “persistent determination”. Some people are born with it and others learn to adopt this mindset. Great engineers have excellent tenacity. They are also realists and understand when a problem needs to be handed over. Bear in mind that handing over the problem doesn’t hand over the responsibility. You are probably still the first point of contact for the client. Tenacity goes hand-in-hand with patience, which goes a very long way towards solving complex problems.

How to Troubleshoot

 

“Always try to view the problem from the source.” – Eugene van der Merwe
“Do not underestimate the power of Double Checking.” – Eugene van der Merwe

Problems are best caught as they occur. If you are unable to catch the problem as it happens, try to access a historical log file. If no log file exists, try to simulate the problem. If you cannot simulate the problem, try to establish a pattern. If you cannot establish a pattern, the problem is probably intermittent and infrequent, which means that you have a complex problem at hand. At this point, the only remaining options are to recreate the problem in a lab environment or to find a workaround.
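
One simple way to look for a pattern is to count how often the problem occurs in each hour of the day. The sketch below is only an illustration: it assumes you have already collected the time of each occurrence, and the function name and sample times are made up.

```python
from collections import Counter
from datetime import datetime


def occurrences_by_hour(timestamps):
    """Count how many times the problem occurred in each hour of the day."""
    return Counter(ts.hour for ts in timestamps)


# Hypothetical occurrence times, for illustration only.
seen = [
    datetime(2012, 12, 18, 21, 16, 13),
    datetime(2012, 12, 19, 21, 3, 40),
    datetime(2012, 12, 20, 9, 45, 2),
    datetime(2012, 12, 21, 21, 30, 55),
]

# List the busiest hours first.
for hour, count in occurrences_by_hour(seen).most_common():
    print(f"{hour:02d}:00  {count} occurrence(s)")
```

If most occurrences cluster in one hour, that is a hint to look at what else runs at that time, such as backups or scheduled jobs.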

In software and operating systems, a valid workaround is often to reinstall the entire system, simply because of the complex interaction between multiple programs and services.

In summary

 

  • Catch problems as they happen
  • Always find out the exact date and time the problem first occurred
  • Check the log file
  • Simulate the problem
  • Establish a pattern
  • Establish if this pattern is intermittent
  • Recreate the problem in a lab environment
  • Think about workarounds

Troubleshooting goes hand-in-hand with escalation. If you don’t know how to escalate properly, you will have a poor turnaround time for troubleshooting problems.

Tools and Tips for Troubleshooting

 

Tip: If you can simulate the problem, your chances of solving it are much better.

One of the key skills is the process of isolation. You need to be able to break a problem down into all its parts and start isolating them one by one.
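
The isolation process can be sketched in code. This is a minimal illustration, not a real diagnostic tool: it assumes you can switch each part of a system off on its own and then re-run a test. The component names and the problem_occurs check are hypothetical.

```python
def isolate(components, problem_occurs):
    """Disable one component at a time and report which ones make the problem go away.

    `problem_occurs` is a caller-supplied check. It receives the set of
    components that are still enabled and returns True if the problem persists.
    """
    suspects = []
    for component in components:
        enabled = set(components) - {component}
        if not problem_occurs(enabled):
            suspects.append(component)
    return suspects


# Hypothetical system in which the problem only appears while the proxy is enabled.
parts = ["firewall", "proxy", "dns", "antivirus"]


def problem_occurs(enabled):
    return "proxy" in enabled


print(isolate(parts, problem_occurs))
```

In practice the check is usually a manual step, such as disabling a service and trying to reproduce the error, but the loop is the same: change one thing at a time and test again.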

The Log File

The log file is the most important tool that you can use. Almost every software system has some kind of log file. When using the log file, make sure the verbosity is set high enough to capture the problem. Conversely, if the verbosity produces too much output, lower it or filter the output. Memorize the commands used to filter log files, such as tail and egrep (equivalent to grep -E).
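
If you prefer to filter a log in a script, the same idea as piping egrep into tail can be sketched in Python. This is a minimal example under stated assumptions: the function name, the log levels, and the sample lines are invented for illustration.

```python
import re
from collections import deque


def tail_matching(lines, pattern, count=10):
    """Return the last `count` lines that match `pattern`, like `egrep pattern | tail`."""
    regex = re.compile(pattern)
    # A deque with maxlen keeps only the most recent matches.
    return list(deque((line for line in lines if regex.search(line)), maxlen=count))


# Hypothetical log lines, for illustration only.
sample_log = [
    "18 Dec 2012 21:16:11 INFO service started",
    "18 Dec 2012 21:16:13 ERROR exception from hresult 0xf004f006",
    "18 Dec 2012 21:16:14 DEBUG heartbeat",
    "18 Dec 2012 21:16:20 WARN retrying connection",
]

for line in tail_matching(sample_log, r"ERROR|WARN", count=2):
    print(line)
```

To use this on a real log, pass an open file object instead of the sample list. Each line will then keep its trailing newline.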

The Running Log File (Your Own Notes)

Keep track of what you are doing when working on a complex problem, especially in case you have to revert to a previous point. At the end of your session, the running log file you have created can also serve as documentation if you have to solve this problem again in the future.
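
A plain text file is enough for a running log, but timestamping every entry makes it easier to match your notes against the system's own log file later. The helper below is a small sketch; the function names and the note text are only examples.

```python
from datetime import datetime


def format_note(text, when=None):
    """Return one timestamped line for the running log."""
    when = when or datetime.now()
    return f"{when:%Y-%m-%d %H:%M:%S}  {text}"


def add_note(path, text):
    """Append a timestamped note to the notes file at `path`."""
    with open(path, "a", encoding="utf-8") as notes:
        notes.write(format_note(text) + "\n")


# A fixed time is used here so that the printed line is predictable.
print(format_note("Raised log verbosity on the mail server",
                  datetime(2012, 12, 18, 21, 16, 13)))
```

Calling add_note with a file name of your choice after every change gives you a chronological record of what you tried.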

Google

Google works best if you can use exact pattern matching. For example, if the log file contains something specific like “18 Dec 2012 21:16:13 exception from hresult 0xf004f006”, Google has probably indexed pages that mention the same error.
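
The date and time in a log line are specific to your own system, so it usually helps to strip them before searching. The sketch below assumes log lines start with a timestamp in the format of the sample line above. The pattern and function name are illustrative, and other log formats will need a different pattern.

```python
import re

# Assumed timestamp format, taken from the sample line: "18 Dec 2012 21:16:13".
TIMESTAMP = re.compile(r"^\d{1,2} [A-Za-z]{3} \d{4} \d{2}:\d{2}:\d{2}\s+")


def search_phrase(log_line):
    """Strip the leading timestamp and wrap the rest in quotes for an exact-phrase search."""
    message = TIMESTAMP.sub("", log_line.strip())
    return f'"{message}"'


print(search_phrase("18 Dec 2012 21:16:13 exception from hresult 0xf004f006"))
```

The result can be pasted straight into the search box.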

Google Tip: Know and understand the difference between the following Google searches:

  1. exception from hresult 0xf004f006
  2. “hresult exception 0xf004f006”
  3. “hresult 0xf004f006” exception

In our next installment we’ll explain how these three searches differ.

Forums

Forums provide an interactive way of engaging with a community that uses similar software and / or technology. This is a great way to reach out to others who might be experiencing the same problem. Some well known forums are:

  • MikroTik
  • Parallels
  • VMware
  • WHMCS

Most major software platforms have forums, and certainly most of the software Snowball uses has one. As an engineer you are judged by how many forum posts you have made.

Tips for Forums

  • Do not use a generic or shared username; use your own identity
  • Build up forum reputation by interacting with the forum
  • You can only improve your forum logging skills by spending more time on a forum. It is literally just a case of practice makes perfect.
  • Don’t just read forums – participate!
  • Be aware of forum etiquette, which is normally clearly spelt out somewhere on the platform
  • Make sure you have explained in detail how to reproduce the problem. Don’t be lazy!

Final word on forums: Great engineers have logged many forum posts.

Stack Exchange

Stack Exchange is a more challenging type of forum, because you get scored and rated on your activity. All engineers are required to log at least one Stack Exchange site question per week and e-mail this to their manager.

Snowball judges engineers by the score they have obtained on the Stack Exchange family of sites.

Internal Escalation

 

Teamwork is essential in hardcore technical environments. Consult with the members of your team – perhaps someone else has had the same problem.

Great managers demand that problems are logged on forums or Stack Exchange before escalation. If time is limited, whether you do this first depends on the urgency of the problem.

External Escalation

 

Never hesitate to contact external parties. Examples of these would be:

  • External Contractors
  • Paid-for support
  • Vendor, e.g. Microsoft / VMware support
  • Supplier technical contacts

Put yourself in the client’s shoes. Would the client want this problem resolved? The answer is obviously yes, so sometimes you have to do, and pay, whatever it takes to get the problem solved.

Final Escalation Tip: Never let the lack of escalation affect your performance in your workplace.

Books

 

The best engineers are those who have a deep understanding of the subject at hand. There are various technical books on MikroTik, TCP/IP and networking, Linux, Postfix, CCNA, etc. All the best troubleshooters I have met have a deep theoretical understanding of the topic at hand because they studied it.

Comment below if you would like us to contact you or if you need more advice on IT Troubleshooting.

Eugene