Good reference for Troubleshooting speed tips?

Does anyone have any good references they could share for increasing your speed in Troubleshooting?  Mainly looking for references on strategies.  

I watched all of the INE's VoD series for Troubleshooting v5 with Dave Smith, and he offers great strategies/tools in how to pick up key clues if something is wrong.

E.g., start with a trace: if every hop shows ***, the device most likely has no route; if you get a few hops and then ***, it is most likely a return-path problem (except with MPLS).  Also Divide and Conquer vs. Top to Bottom or Bottom to Top, etc.
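That trace rule of thumb can be written down as a tiny classifier. This is only a sketch: the `classify_trace` helper and the sample hop lines are made up for illustration, and real traceroute output varies by platform.

```python
import re

# A hop line starts with the hop number; anything else (banners) is ignored.
HOP_LINE = re.compile(r"^\s*\d+\s+(.+)$")


def classify_trace(lines):
    """Apply the rule of thumb: all *** vs. a few hops and then ***."""
    silent = []
    for line in lines:
        match = HOP_LINE.match(line)
        if not match:
            continue  # skip lines such as "Tracing the route to ..."
        probes = match.group(1).split()
        # A hop is "silent" when every probe on that line timed out.
        silent.append(bool(probes) and all(p == "*" for p in probes))
    if not silent:
        return "no hop lines found"
    if all(silent):
        return "all hops silent: suspect no route on the local device"
    if silent[-1]:
        return "hops then silence: suspect the return path (unless MPLS)"
    return "last hop answered: the trace itself gives no clue"


sample = [
    "Tracing the route to 10.0.0.5",
    "  1 10.1.12.2 4 msec 4 msec 4 msec",
    "  2 10.1.23.3 8 msec 8 msec 8 msec",
    "  3  *  *  *",
    "  4  *  *  *",
]
print(classify_trace(sample))
```

The point is not to script the lab, just to make the decision explicit: look at where the silence starts before touching any configuration.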

Anyone have any pointers?  To me this is the trickiest part of the lab: being able to narrow down problems quickly and efficiently in the TS section.  I would love more hands-on/broken labs, but I've been through INE's troubleshooting labs multiple times.

Comments

  • JoeM ✭✭✭

    Hi DoyleSean,

    Great question.   In my opinion, TS is where everything comes together.   If you can be very fast at troubleshooting, then there is a high probability that you can also configure it correctly.

    It sounds like you are on the right track.   For me, the key is learning SHOW and DEBUG commands.   They are the tools in your toolbox that make the difference.

    The issue with trying to give a strategy is that every technology has a different strategy -- which means every technology has different show/debug commands.    Have a toolkit for each technology -- remember the commands and match them with exactly what you are trying to find.     This may sound over-simplistic, but it is the difference between focusing in on a TS issue and repeating commands in confusion (anxiety with a clock ticking down).

    • Learn the show/debug commands -- for each technology (improve understanding of technology and how it can break)
    • Know which command(s) to use -- match it with the precise information you want to see. (less repeating of wrong TS commands)
    • Practice these commands slowly, looking for efficiency over finger typing speed.
    • .....and of course practice making a checklist of TS issues/points  (see videos)
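One way to picture the per-technology toolkit described above is a lookup table keyed by what you want to find out. This is a minimal sketch: the `TOOLKIT` table and `commands_for` helper are hypothetical names, and the handful of IOS show commands is only an illustrative starting point, not a list taken from this thread.

```python
# Hypothetical toolkit: technology -> {what you want to see: command}.
TOOLKIT = {
    "ospf": {
        "adjacency state": "show ip ospf neighbor",
        "interface area, cost and timers": "show ip ospf interface",
        "link-state database": "show ip ospf database",
    },
    "bgp": {
        "session state": "show ip bgp summary",
        "paths for a prefix": "show ip bgp",
    },
    "mpls": {
        "ldp session state": "show mpls ldp neighbor",
        "label forwarding entries": "show mpls forwarding-table",
    },
    "multicast": {
        "pim neighbor state": "show ip pim neighbor",
        "rp mappings": "show ip pim rp mapping",
        "multicast routing state": "show ip mroute",
    },
}


def commands_for(technology, keyword=""):
    """Return the commands whose purpose mentions the keyword."""
    entries = TOOLKIT.get(technology.lower(), {})
    return [cmd for purpose, cmd in entries.items() if keyword.lower() in purpose]


print(commands_for("ospf", "adjacency"))
print(commands_for("multicast"))
```

Writing the table out by hand for each technology is the real exercise; matching a command to the precise information it shows is what cuts down on repeating the wrong commands.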

     

    I have not seen Dave Smith's series.  I may watch them as a refresher for myself.   The videos I watched were by BrianD and BrianM.  Brian Dennis' videos changed my whole way of thinking for TS:      "Do you just want to pass the exam, or do you want to be a Network Engineer?"

     

    As far as the methodology of TS (Divide-Conquer or other), I still say learn the technology(ies). Sometimes the issue is obvious, sometimes it does need to be narrowed down.   But what are the fundamental pieces for that given technology?  Are they configured as they should be?  etc...etc...etc.     

     

    Hope this Helps!

  • peety ✭✭✭

    I'd say half of it depends on your style - are you a bottom-up kind of person (start at one end, validate everything on the first hop, move to the next hop, etc.), or a divide-and-conquer kind of person (ping the far end, work your way back, iterate from there, etc.)? Next, based on your style, build a flowchart of what you'd try, and see how well you can optimize that flowchart: "name that solution in eleven commands! I can name it in 10, no I can name it in 9!"

    The other half of it is perhaps teaching yourself to grade (verify) your work in config. Go through a mock lab sometime, but INSTEAD of solving it, write out a detailed grading methodology/commands you'd use to grade the config section start-to-finish. Assume that you have baseline configs archived in a manner that you can pull up, etc., and figure out every verification command and technique you'd use. Then solve it, and grade yourself. The exercise should help you analyze questions and confirm solutions, rather than just throwing darts at the wall to see what sticks.

  • Excellent pointers JoeM and peety!

    EDIT (most of my original reply):

    I think my main problem is a) learning how to compose myself when I start to think "uh oh, I'm not coming up with a solution here" and b) a structured approach to "pulling" myself back in once I start spiraling.

    I think the "what are the fundamental pieces for that given technology" is a great suggestion.  Not asking that actually made me miss a question on one of INE's TS tickets for MPLS/L3VPN because one of the routers had a /24 subnet.  I was so busy looking elsewhere I didn't think to step back and say, OK, what must be done for this to work?  It turned out to be something pretty simple (it seems like most of the time, when it's not coming together, the solution turns out to be something fairly simple).

    I really like the idea of "grading" myself with various show commands, and going through my lab again looking at various outputs in different manners. 

    Appreciate the input!  By the way, JoeM, what troubleshooting videos are you referring to with Brian Dennis?  Were those the vSeminar videos?  It's funny, I wrote an original reply, then started watching the vSeminar, and Brian started harping on things that I'm trying to find answers to here; basically a structured approach to troubleshooting.  I had to laugh :).  But it's not that I'm looking for "at zero seconds I will launch these show commands, and at 1 minute I will run a trace," etc. I understand I need to know the technologies above all else; I'm just looking for methodologies/strategies to "anchor" myself should I start to get too far down the troubleshooting rabbit hole.  He had a great one too that I will keep in mind: work from the outside (source/destination) inwards.

  • JoeM ✭✭✭

    It really is a good question, but it is tough to give a general methodology for everything.  Each technology is different.  Each scenario is different.

    As with most things in life, I think it is correct to go back to the basics of each, rather than looking for trick questions.  How does it work?  Multicast is a great example. It is one technology where a candidate could spin his wheels (while the clock is ticking).   Or how about DMVPN with IPsec, with/without NAT?   A 5-minute job can become a 60-minute job if one is chasing their tail.

    As for myself now, I am learning Juniper, and I find myself immediately trying to learn show and traceoptions (a different style of debug).  We can use TS as a way to reinforce what we think we know about a technology on the given platform.   Efficiency will follow.

    Apologies if I am repeating myself.   It sounds like you are on the right track.  No doubt, your configuration skills are also improving with your TS practice.

     

    By the way, do you have your lab checklist strategy down?  I remember you mentioning this.   This is a way to stay on track. Keeping organized also adds to efficiency.

     

  • If your current role does not involve banging on live gear every day, it may take some time to master troubleshooting.  Obviously the more practice you do the better.

    Think back to subnetting: remember how long it used to take to answer subnetting questions?  I see it every day with CCNA guys; they take a good 5 minutes trying to figure out a problem that I can do in my head in about 2 milliseconds, because I have seen it a million times over.  Same with troubleshooting: the more you see the same problem, the more you remember what to do.

    Troubleshooting your home lab is one thing, but doing it live on a 700,000 user global network is another. 

    You need to have a solid understanding of the technology first before you can master troubleshooting.  How can you truly troubleshoot OSPF if you don't have an understanding of how OSPF works fully per the RFC?

  • I do think though that home labs provide more flexibility for the CCIE student than live production networks.  Do home labs give you more real-world experience?  Of course not; unless you are independently wealthy you probably don't have Nexus TOR/EOR devices connecting to ASRs connecting to other vendor Firewalls connecting to other vendor LBs, etc. in your house (and what a terrible power bill that would be lol).  But for a test that focuses solely on Routers and Switches, virtual/home labs do pretty well for all of the technologies involved.

    I have spent countless hours, nights, weekends, early mornings, etc. labbing like a mad scientist on real "cheap" gear, virtual gear, and rack rentals.  Doing so has afforded me an opportunity that little short of working at a NOC would: being able to directly destroy configurations (and not get fired!) and build them back in various ways to fully understand a technology.

    I used to work for a mid-size employer who had very bad network regulations.  No process, no templates, and constant firefighting.  I was very proficient at troubleshooting quickly there :), although we had good NMS systems and monitoring tools (Asynga, Cacti, others).  Having moved on from that situation recently, I work for a customer that is VERY process orientated; you can't sneeze in their environment without a change window, and every single command entered into the CLI must be highly scrutinized.  Needless to say, this has minimized the requirement to troubleshoot.  Sure, I get called in if another engineer has an implementation plan and they have an issue they can't figure out on their own, and most of the time I can identify the issue quickly and offer them the solution verbally.

    Short form, I know that I "know" enough to pass the CCIE.  My issue is, besides being my own worst test-taking enemy, identifying speed tips for the troubleshooting section (e.g., as soon as I plop down in my seat I'm doing X, Y, Z because this will tell me immediately A, B, C or save me 10 minutes later), or strategies to bring myself back quickly if troubleshooting starts spiraling.

    I knew when posting this initially that this is not an easy question to answer because it's sort of an individual thing, but I was just curious if anyone had any solid strategies (outside of what has already been mentioned and what INE has discussed in their videos).  I really appreciate all the feedback though and have gathered a list of pointers for my notes.

    To sum:

    1.  Most important! Know the technologies inside and out

    2.  Create a checklist at the beginning of the lab tracking tickets, points, time at the start of each ticket and time used once the ticket is complete.

    3.  Decide whether the ticket calls for a Divide and Conquer | Top->Bottom | Bottom->Top examination

    4.  Improve verification commands; use of debug, use of show commands with regex

    5.  Work outside (source/destination) inwards.  E.g., start with Source/Destination, check L2/L3, then look towards "weird" things in the transit path (filters, route/next hop manipulations)

    6.  Don't spend more than 10 minutes on a ticket unless you have identified THE problem and are fixing it.  Move on and circle back.  If you are attempting a fix, the drop-dead time should be ~15 minutes before moving on and circling back.

    7.  If the output has you drawing blanks, check whether the core technology is set up in a basic manner (what are the underlying things that need to be in place for this technology to work properly?); then see if there is anything funny going on.

    8.  It goes without saying: don't break the guidelines of the lab or, specifically, of the ticket!
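Items 2 and 6 can be sketched as a small ticket tracker. This is an illustration only: the `Ticket` class and helper names are hypothetical, and it assumes the ~15 minute drop-dead time counts total minutes on the ticket rather than minutes since the fix began, which the list above leaves open.

```python
from dataclasses import dataclass

SEARCH_LIMIT_MIN = 10  # item 6: budget while the problem is still unidentified
FIX_LIMIT_MIN = 15     # item 6: drop-dead time while a fix is in progress


@dataclass
class Ticket:
    number: int
    points: int
    minutes_used: float = 0.0
    problem_identified: bool = False
    solved: bool = False


def should_move_on(ticket):
    """True once the ticket has used up its time budget without being solved."""
    if ticket.solved:
        return False  # nothing left to abandon
    limit = FIX_LIMIT_MIN if ticket.problem_identified else SEARCH_LIMIT_MIN
    return ticket.minutes_used >= limit


def circle_back_order(tickets):
    """Unsolved tickets, highest point value first, for the second pass."""
    unsolved = [t for t in tickets if not t.solved]
    return sorted(unsolved, key=lambda t: t.points, reverse=True)


checklist = [
    Ticket(1, points=2, minutes_used=6, solved=True),
    Ticket(2, points=3, minutes_used=11),
    Ticket(3, points=2, minutes_used=12, problem_identified=True),
]
for t in checklist:
    print(t.number, "move on" if should_move_on(t) else "keep going / done")
print([t.number for t in circle_back_order(checklist)])
```

On paper this is just a table with a column per field; the value is in forcing the "move on" decision at a fixed time instead of by feel.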

    Thanks to everyone for your inputs!

  • Thanks for sharing this information!!! Great points

    I would only add two comments (I think I got them from the TS cram video):

    - Introduce small changes and use the least intrusive way: this goes for metric changes, where it would be better to increase the delay by 10 rather than by 10000, for instance.

    - Turn off any debugging/extra logging at the end, otherwise it might interfere with the grading scripts
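To put numbers on the delay example, here is a rough sketch. It assumes the metric in question is the classic EIGRP composite metric with default K values, i.e. 256 * (10^7 / minimum bandwidth in kbps + total delay in tens of microseconds); the thread does not name the protocol, and the function name and sample path are made up.

```python
def eigrp_classic_metric(min_bandwidth_kbps, total_delay_tens_usec):
    """Classic EIGRP composite metric, assuming default K values (K1=K3=1)."""
    scaled_bandwidth = 10**7 // min_bandwidth_kbps  # integer division
    return 256 * (scaled_bandwidth + total_delay_tens_usec)


# Assumed path: one 100 Mbps link with 100 usec of delay (10 tens of usec).
baseline = eigrp_classic_metric(100_000, 10)
small_change = eigrp_classic_metric(100_000, 10 + 10)
large_change = eigrp_classic_metric(100_000, 10 + 10_000)

print("baseline:    ", baseline)
print("delay +10:   ", small_change, "difference", small_change - baseline)
print("delay +10000:", large_change, "difference", large_change - baseline)
```

Under these assumptions both changes make the path less preferred than an otherwise equal one, but the small change shifts the metric by 2560 while the large one shifts it by 2,560,000, which is far more likely to disturb path selection elsewhere.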
