if-state nhrp

Hi Folks,

I'm trying to find out more about the if-state nhrp option for DMVPN, and more specifically how it interacts with front-door VRF (FVRF).  Searching cisco.com only turns up minimal info: it tracks the NHRP NHS and, if the NHS is unavailable, it brings the tunnel interface down.  I'd like a little more meat.  Specifically, does it matter which routing table the NHS is reachable via?

I'm troubleshooting a spoke with this config on it that occasionally shuts its tunnel interface down, and it *appears* that simply pinging the NHS (via the global table) wakes it back up.  If that's the case, the kludge fix would be to set up an IP SLA pinger.  I doubt that would fly as a true solution, though.

Anyone have info or pointers to docs?

Comments

  • JoeM ✭✭✭

    I found this doc, which explains it most clearly.  Below are a couple of tests I ran (with and without the option).


    http://www.cisco.com/c/en/us/td/docs/ios-xml/ios/security/d1/sec-d1-cr-book/sec-cr-i1.html#wp1383303258

    if-state nhrp
    Usage Guidelines

    If the system detects that one or more of the Next Hop Servers (NHSs) configured on the interface is up, then the tunnel interface state is also declared as ‘up’. If all NHSs configured on the interface are down, then the tunnel interface state is also declared as ‘down’.

    The system does not consider NHSs configured with ‘no-reply’ when determining the interface state.



    Hi Po8, I am wondering whether it is beneficial to use this command when there are multiple NHSs on the tunnel, or whether it just gives us a way to health-check the line-protocol status of the tunnel.

    I am not clear on what you mean by "whether it matters which routing table the NHS is reachable via".  Isn't the NHS always in the internal (back-door) VRF?  That would mean your tunnel is in the global table, since you pinged the NHS in the global table.




    I also don't understand HOW your NHS is going to sleep.  Is it possible that you are catching it between NHRP registration timeouts, before the spoke has had time to try to re-register?  In my tests below, the up/down state depends on the registration timeout (maybe your ping is re-triggering this too).

    I finally figured out a way to test it, and I found a small difference with and without the option.

    Hope this helps.  Let me know if you learn something new about this.  I would like to know more.

    Thanks.

     

    TEST SETUP:

    1. Lower the NHRP registration timeout (the interface state depends on NHRP registration).

    2. Use two example spokes: one with if-state nhrp and one without.

    3. Shut down the hub tunnel, then check the spoke tunnels.
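
    For reference, the R1 side of a setup like this could look roughly like the sketch below. The addresses, the VRF name TEST and the 30-second registration timer are taken from the R1 output in the results; the subnet mask, the network-id and the multicast mapping are assumptions for illustration only, not the actual lab config.

    ```
    interface Tunnel0
     ! mask is an assumption
     ip address 10.0.0.1 255.255.255.0
     ! network-id value is an assumption
     ip nhrp network-id 1
     ! hub tunnel address mapped to hub NBMA address
     ip nhrp map 10.0.0.6 169.254.60.6
     ip nhrp map multicast 169.254.60.6
     ip nhrp nhs 10.0.0.6
     ! lowered so that a lost registration is noticed sooner
     ip nhrp registration timeout 30
     ! tie line protocol to NHS registration state
     if-state nhrp
     tunnel source 169.254.10.1
     tunnel mode gre multipoint
     tunnel vrf TEST
    ```

    The comparison spoke would be the same idea with its own addresses and without the if-state nhrp line.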

     

     

    RESULTS:

    ========WITH the option (if-state nhrp on R1)========

    R1#show dmvpn int tun 0 DETAIL
    ==========================================================================
    Interface Tunnel0 is up/down, Addr. is 10.0.0.1, VRF ""
       Tunnel Src./Dest. addr: 169.254.10.1/MGRE, Tunnel VRF "TEST"
       Protocol/Transport: "multi-GRE/IP", Protect ""
       Interface State Control: Enabled
       nhrp event-publisher : Disabled
    IPv4 Registration Timer: 30 seconds
    <SNIP>

    # Ent  Peer NBMA Addr Peer Tunnel Add State  UpDn Tm Attrb    Target Network
    ----- --------------- --------------- ----- -------- ----- -----------------
        1 169.254.60.6           10.0.0.6  NHRP 00:00:59    S       10.0.0.6/32

    R1#sh ip int br
    Interface                  IP-Address      OK? Method Status    Protocol
     Tunnel0                    10.0.0.1        YES NVRAM   up         down

     

    ========WITHOUT the option (no if-state nhrp on R2)========


    R2#  sh dmvpn int tun 0 DETAIL
    ==========================================================================
    Interface Tunnel0 is up/up, Addr. is 10.0.0.2, VRF ""
       Tunnel Src./Dest. addr: 169.254.20.2/MGRE, Tunnel VRF ""
       Protocol/Transport: "multi-GRE/IP", Protect ""
       Interface State Control: Disabled
       nhrp event-publisher : Disabled
    IPv4 Registration Timer: 30 seconds
    <SNIP>
    # Ent  Peer NBMA Addr Peer Tunnel Add State  UpDn Tm Attrb    Target Network
    ----- --------------- --------------- ----- -------- ----- -----------------
        1 169.254.60.6           10.0.0.6  NHRP 00:04:16    S       10.0.0.6/32

    R2#sh ip int br

    Interface                 IP-Address      OK? Method Status   Protocol
     Tunnel0                   10.0.0.2        YES NVRAM    up        up

     

     

  • Hi JoeM - thanks for the reply.  This is a scenario where there are two routers and two uplinks at multiple remote sites homed back to the main site (where the DMVPN hub lives).  The primary uplink is a dedicated P2P circuit, usually bonded T1 or Metro Ethernet.  The secondary is usually DSL or cable internet with DMVPN/IPsec.  The idea is to eventually have HSRP running on the routers so that if the primary goes down, the DMVPN backup takes over automatically.

    Everything uses a single EIGRP AS.  I believe that the primary router gets all of the routes, and the backup router only gets a default route from EIGRP.  The primary and the backup also swap routes.  Because of this, for many things the backup router actually uses the primary router's routes (TACACS, for example).  I'm wondering if it's also using the primary router's routes to reach the public IP for the DMVPN hub.  I can trace to it via either the FVRF or the global table, since both have a default route and both have an exit to the internet.

    The NHS is awake for several spokes, just one drops off and stays down until it's shut/no shut or fiddled with somehow.  At least this is my understanding.  This was handed to me with "I think we've had this problem before" so it might or might not actually be an ongoing issue.

    Basically, I need to find some best practices for DMVPN and some more info about specific commands and how they really work, but I can't get to the CVD doc for DMVPN: it says my access level isn't high enough or something, which is strange since I have contracts at this point.

  • JoeM ✭✭✭

    ... I'm wondering if it's also using the primary router's routes to reach the public IP for the DMVPN hub.  I can trace to it via either the FVRF or the global table since both have a default route, and both have an exit to the internet.

    This does not make sense to me.  The tables are completely separate: FVRF and internal VRF.  Even the IGP setups are in different VRFs (separated completely).

    But let's say this problem existed without VRFs.  Filter it.  This is important.  We DO NOT want NBMA addresses coming over the tunnels, nor do we want tunnel addresses (NHS) being advertised into the underlay space.  Huge problem.

    EDIT: I understand that you are saying there are two default routes (in two different VRFs).  This does not matter, because the tunnel underlay is tied to the FVRF (completely separate).  If that breaks, then the tunnel should not be functional.

     

    The NHS is awake for several spokes, just one drops off and stays down until it's shut/no shut or fiddled with somehow.  At least this is my understanding.  This was handed to me with "I think we've had this problem before" so it might or might not actually be an ongoing issue.

    Did you check the NHRP registration timeout?  What does it look like on this setup?  From my research just now, it seems that this command (if-state nhrp) depends on the NHRP registration timeout for faster response times (up or down).

    The IGP timers are separate.  When the neighbor goes down, the IGP goes down quickly (hello or update timers).

     

    If anyone else has some information, feel free to chime in.  

    Po8,  Let me know if you want me to test something else with my lab.

     

  • This does not make sense to me.  The tables are completely separate: FVRF and internal VRF.  Even the IGP setups are in different VRFs (separated completely).

    But let's say this problem existed without VRFs.  Filter it.  This is important.  We DO NOT want NBMA addresses coming over the tunnels, nor do we want tunnel addresses (NHS) being advertised into the underlay space.  Huge problem.

    EDIT: I understand that you are saying there are two default routes (in two different VRFs).  This does not matter, because the tunnel underlay is tied to the FVRF (completely separate).  If that breaks, then the tunnel should not be functional.

    It doesn't make sense to me, either, which is why I was surprised when, after pinging via the global table, the tunnel just woke up.  I had already tried all kinds of pinging and tracing inside the FVRF with different source interfaces, etc.  I wasn't even meaning to bring the tunnel up; I was trying to figure out why it was down in the first place.

    Anyhow, the major differences between this router and all of the other spokes are that it's running IOS 15, and that it has that single line in it.  While I understand the basic implications, I was hoping to find something more concrete about what methods it was using to determine the UPness of the NHS, and whether it was only using the VRF, or whether it could use the global table as a last resort, etc.  I'd assume, like you say, that it wouldn't make a lot of sense for it to use the global table, but it seems like my results disagree with this assumption.  I guess I'm going to have to build a lab and debug/wireshark it.

  • JoeM ✭✭✭

    Sounds good.  Let me know what you learn.

     

    While I understand the basic implications, I was hoping to find something more concrete about what methods it was using to determine the UPness of the NHS, and whether it was only using the VRF, or whether it could use the global table as a last resort, etc.  I'd assume, like you say, that it wouldn't make a lot of sense for it to use the global table, but it seems like my results disagree with this assumption.  I guess I'm going to have to build a lab and debug/wireshark it.

     

    EDIT: Apologies if I am repeating myself here.  I just want to be sure that we are testing the same thing (with the same results).

    I did a packet capture, and nothing changed for NHRP.   Always two REQUESTS and two REPLIES (4 packets).   It was not until I lowered the registration timeout, that I started receiving quicker NHRP responses.  There was no heartbeat. Also, the if-state just changes the interface line-protocol state.  This should probably be a default behavior.

     

    As far as the VRF goes, I have been assuming that your setup is using TUNNEL VRF <VRFNAME> on the tunnel.  I have worked with a lot of different labs with this, and it is almost a blessing to have the FVRF separated like this.  There is no confusion.  This command, TUNNEL VRF XXX, tells DMVPN to use that VRF, period.  It cannot come back up into the global table for its tunnel source/destination.
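
    A minimal sketch of that kind of FVRF separation, in case it helps to compare against the spoke in question. The VRF name TEST and the 169.254.10.1 source address are from the test output earlier in the thread; the interface name, the RD, the mask and the next hop are assumptions made up for illustration.

    ```
    ip vrf TEST
     ! RD value is an assumption
     rd 1:1
    !
    interface GigabitEthernet0/0
     ! underlay interface sits in the front-door VRF (interface name assumed)
     ip vrf forwarding TEST
     ip address 169.254.10.1 255.255.255.0
    !
    ! default route that exists only inside the FVRF (next hop assumed)
    ip route vrf TEST 0.0.0.0 0.0.0.0 169.254.10.254
    !
    interface Tunnel0
     tunnel source GigabitEthernet0/0
     ! the encapsulated GRE packets are routed using VRF TEST
     tunnel vrf TEST
    ```

    With something like this in place, comparing show ip route vrf TEST 169.254.60.6 against show ip route 169.254.60.6 should show which table actually has a path to the hub's NBMA address.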

    I have found that the IGP recursive problems happen ONLY without the separation. This becomes obvious with logging errors from the IGP and the tunnel collapses up/down/up/down.

     

    Speaking without the details of your setup, I think the issue is WHY the tunnel is going to sleep.  Or is it just breaking?  Any logs?  Is the NHRP mapped destination always reachable?  Are there other NHRP options configured?

    I would love to know the final answer to your issue.  Please keep us posted.  Maybe a DMVPN expert can help us with this. ;-)

     

     

  • The if-state command governs the protocol state of the interface, which relies on NHRP.  If it is configured, then for the interface to remain in the up state it needs a successful registration to at least one configured NHS, which means the spoke needs to receive an NHRP Registration Reply acknowledgment from at least one NHS.
    Behavior is the same regardless of how you deploy DMVPN, with FVRF, with IVRF or both.
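
    As a rough pointer, these are commands that can be used on the spoke to watch the registration state this depends on. Tunnel 0 is an assumed interface number, and no output is shown here because it varies by platform and release.

    ```
    ! NHS state as seen by the spoke
    show ip nhrp nhs detail
    ! per-tunnel summary, including the Interface State Control line
    show dmvpn interface tunnel 0 detail
    ! watch registration requests and replies while the tunnel changes state
    debug nhrp packet
    ```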




