BGP maximum paths versus static routes

Hi Folks,

I labbed up a few maximum-paths examples yesterday, since I know almost nothing about the feature.  I discovered that I appeared to be getting a full table for every path (in this case just a few loopbacks, so NBD). Looking at the docs, this seems to be the correct behaviour, but most of the docs I found included MPLS in the mix as well.  When I had two BGP sessions between a single pair of routers, using static routes to the far-side source loopback instead seemed to overcome this issue.  Obviously this doesn't work when multihomed to the same provider with a single router connected to two or more provider routers.

So, the questions:

Is multiple full tables when doing maximum paths the correct behaviour, and can it be overcome?

I'm assuming it's primarily a tool designed to keep people from announcing more specific routes - is this correct?

Basically, if you do have to take multiple tables, other than as a last resort, where do people actually use it?  Where their provider refuses to accept more specific routes, but is willing to do multipath?

Comments

  • Also, I meant to ask, what's the purpose of "best" in a multipath scenario.  It's confusing to look at a bgp table with > on only a fraction of the actual in use links.  Yes, sho ip bgp <prefix> shows that the routes are multipath, but even then, one of my routes always was multipath, best.

  • I don't know the answer to your first post, but "best path" is also used by BGP to decide which prefixes it should send to its neighbors.

    (You can check this with: #show ip bgp neighbors a.b.c.d advertised-routes)

  • I don't know the answer to your first post, but "best path" is also used by BGP to decide which prefixes it should send to its neighbors.

    (You can check this with: #show ip bgp neighbors a.b.c.d advertised-routes)

     

    Right - I assume this (or something similar) is why it has to be done the way it is.  The thing that came to mind in the example you give is what would "next-hop" be, and would next-hop-self have to be set to avoid weirdness.  I had torn down my lab before I looked, and haven't taken time to check since.

    In any event, the "best" path choice when using multiple paths was mostly just something I noticed when I was poking around.

  • peetypeety ✭✭

    It depends on what behavior you want to overcome.  Multiple parallel sessions (one per link) is probably best, as it allows BGP to reconverge on a link failure.  Static routes mean you could/will blackhole traffic if a link goes down.

    In my previous life as an ISP engineer, I'd always do a BGP session over each link.  It allows the other side to control whatever they want, i.e. MED if they want to push traffic to one link or the other.

    I think you might also be confusing this with max-prefix - that's to keep customers from making major mistakes.

  • Thanks peety.  I'm not confusing max-prefixes, though - I'm familiar with that option.

    I'm not talking about adding static routes in lieu of the bgp table, but using static routes to the far side bgp source loopback to get those routes into the igp/rib/fib.  I'll post the diff configs I'm asking about in a while.

    I realize I asked this question poorly.  I seem to have a habit of doing this recently, which is not at all normal for me.  I'll try to refrain from posting half baked thoughts while I'm sipping coffee - sorry for that, and thanks for tolerating it :-)

  • peetypeety ✭✭

    Also, I meant to ask, what's the purpose of "best" in a multipath scenario.  It's confusing to look at a bgp table with > on only a fraction of the actual in use links.  Yes, sho ip bgp <prefix> shows that the routes are multipath, but even then, one of my routes always was multipath, best.

    I think the purpose of "best" is "this is THE path that I'm going to advertise to my BGP friends as best".  At least you know which of the ECMP paths would get announced onward.

  • peetypeety ✭✭

    I'm pretty sure I understand what you're saying.  R1 in AS 100 has two links to R2 in AS 200.  With EBGP over link 1 and EBGP over link 2, you were getting "two full tables", but if you set up static routes on R1 to R2's Lo0 and EBGPed to R2's Lo0, you only had one session and "one full table".  In doing so, you're using BGP as a last-hop protocol, and the static routes are multipathing to the next-hop(s).  Nothing wrong with that, as long as you can detect link failures.  Metro Ethernet would be bad, unless you can do BFD.

  • [code]

    interface Loopback0
     ip address 1.1.1.1 255.255.255.255
    !
    interface Loopback100
     ip address 100.0.0.1 255.0.0.0
    !
    interface Serial1/0
     ip address 10.1.1.1 255.255.255.252
     serial restart-delay 0
    !
    interface Serial1/1
     ip address 10.2.1.1 255.255.255.252
     serial restart-delay 0
    !
    router bgp 1
     no synchronization
     bgp log-neighbor-changes
     network 100.0.0.0
     neighbor 2.2.2.2 remote-as 2
     neighbor 2.2.2.2 disable-connected-check
     neighbor 2.2.2.2 update-source Loopback0

     no auto-summary

    !
    ip route 2.2.2.2 255.255.255.255 10.1.1.2
    ip route 2.2.2.2 255.255.255.255 10.2.1.2

    [/code]

    This was what I was trying to describe by "static routes."  Obviously it's fairly clunky config, but it does result in cef load sharing.

    [code]

    R1#show ip cef 200.0.0.0
    200.0.0.0/8, version 31, epoch 0, per-destination sharing
    0 packets, 0 bytes
      via 2.2.2.2, 0 dependencies, recursive
        next hop 10.1.1.2, Serial1/0 via 2.2.2.2/32
        valid adjacency
      Recursive load sharing using 2.2.2.2/32.

    [/code]

    IME, the reasons someone chooses to multihome with a single provider are PA space, cost, or lack of other options.  Most of these scenarios don't really require a full view at all, let alone two copies of the same full view.  Obviously there's some redundancy, but I'd personally rather have redundancy with two providers.  There's also the scenario where the customer wants more bandwidth but doesn't have a reasonable way to upgrade their single port, so chooses two same-speed ports instead - say 2x DS3 rather than 100M Fast Ethernet.  You do make a good point about metro-e - I hadn't considered that until you made the point.  I need to learn more about Ethernet OAM...

    As usual, thanks, and congrats on your number!

  • Apparently this leads me to a new question.  I labbed up a true multipath scenario - the same two B2B routers with serial links in the example above, with 2 BGP sessions and multipath.  To see how "best" works in this scenario for ibgp neighbors, I added a second router in AS1, but used loopbacks for peering.  R1 in this case is the router with an EBGP session with R2 - R1 and R3 are in AS1, and R2 is in AS2.  Here's what I'm showing on R1 for the 200.0.0.0 network:

    [code]

    R1#show ip bgp sum
    BGP router identifier 100.0.0.1, local AS number 1
    BGP table version is 4, main routing table version 4
    2 network entries using 234 bytes of memory
    3 path entries using 156 bytes of memory
    1 multipath network entries and 2 multipath paths
    3/2 BGP path/bestpath attribute entries using 372 bytes of memory
    1 BGP AS-PATH entries using 24 bytes of memory
    0 BGP route-map cache entries using 0 bytes of memory
    0 BGP filter-list cache entries using 0 bytes of memory
    BGP using 786 total bytes of memory
    BGP activity 2/0 prefixes, 3/0 paths, scan interval 60 secs

    Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
    3.3.3.3         4     1      20      22        4    0    0 00:16:58        0
    10.1.1.2        4     2      33      33        4    0    0 00:28:29        1
    10.2.1.2        4     2      33      33        4    0    0 00:28:44        1

    R1#show ip bgp 200.0.0.0
    BGP routing table entry for 200.0.0.0/8, version 4
    Paths: (2 available, best #2, table Default-IP-Routing-Table)
    Multipath: eBGP
      Advertised to update-groups:
         1          2
      2
        10.2.1.2 from 10.2.1.2 (200.0.0.1)
          Origin IGP, metric 0, localpref 100, valid, external, multipath
      2
        10.1.1.2 from 10.1.1.2 (200.0.0.1)
          Origin IGP, metric 0, localpref 100, valid, external, multipath, best

     

    R1#show ip bgp neighbors 3.3.3.3 advertised-routes
    BGP table version is 4, local router ID is 100.0.0.1
    Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
                  r RIB-failure, S Stale
    Origin codes: i - IGP, e - EGP, ? - incomplete

       Network          Next Hop            Metric LocPrf Weight Path
    *> 100.0.0.0        0.0.0.0                  0         32768 i
    *> 200.0.0.0/8      10.1.1.2                 0             0 2 i

    Total number of prefixes 2

    [/code]

    OK, makes sense, next-hop is the one marked as best.  However, on R3:

    [code]

    R3#show ip bgp sum
    BGP router identifier 11.1.1.2, local AS number 1
    BGP table version is 4, main routing table version 4
    2 network entries using 234 bytes of memory
    2 path entries using 104 bytes of memory
    3/2 BGP path/bestpath attribute entries using 372 bytes of memory
    1 BGP AS-PATH entries using 24 bytes of memory
    0 BGP route-map cache entries using 0 bytes of memory
    0 BGP filter-list cache entries using 0 bytes of memory
    BGP using 734 total bytes of memory
    BGP activity 2/0 prefixes, 2/0 paths, scan interval 60 secs

    Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
    1.1.1.1         4     1      25      23        4    0    0 00:19:10        2

    R3#show ip bgp 200.0.0.0
    BGP routing table entry for 200.0.0.0/8, version 3
    Paths: (1 available, best #1, table Default-IP-Routing-Table)
      Not advertised to any peer
      2
        1.1.1.1 (metric 156160) from 1.1.1.1 (100.0.0.1)
          Origin IGP, metric 0, localpref 100, valid, internal, best

    [/code]

    I'm trying to figure out why the next-hop was modified.  Next-hop-self isn't configured, and the advertisement would seem to indicate that it's sending 10.1.1.2 as next-hop.  Debugs show it obviously isn't:

    [code]

    R1#
    *Mar 13 11:35:31.899: BGP(0): 3.3.3.3 send UPDATE (format) 100.0.0.0/8, next 1.1.1.1, metric 0, path Local
    *Mar 13 11:35:31.903: BGP(0): 3.3.3.3 send UPDATE (format) 200.0.0.0/8, next 1.1.1.1, metric 0, path 2
    [/code]

  • Well, thinking it might have something to do with bgp on multiaccess, I swapped out the eth for a serial connection, and same deal.  Weird.  I apparently need to go back and review next-hop processing.

  • Apparently this is the default behavior for multipath BGP, but I'll be damned if I could find much info on it.  Internet Routing Architectures touches on this quite briefly, saying that it advertises the bestpath to iBGP neighbors via an implied next-hop-self.  Of course, it also touches on multipath versus multihop CEF load sharing via loopbacks as update source and statics (again quite briefly).  Considering that, I'm going to stop worrying about multipath - I'm satisfied that I can answer whatever might be on the BGP exam, and if not, I'm still burning study time that could be better spent elsewhere.  If I ever need to use it, I'll pick up on it then.

  • OK, I'm a fatty fat fat liar.  To satisfy my own curiosity, I decided to see how things looked with a single session via loopbacks and statics:

    [code]

    R3#sho ip bgp
    BGP table version is 4, local router ID is 3.3.3.3
    Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
                  r RIB-failure, S Stale
    Origin codes: i - IGP, e - EGP, ? - incomplete

       Network          Next Hop            Metric LocPrf Weight Path
    r>i11.1.1.0/24      1.1.1.1                  0    100      0 i
    *>i200.0.0.0/8      2.2.2.2                  0    100      0 2 i
    [/code]

    So now the next-hop isn't even on our network (2.2.2.2 is the loopback source in AS2) so to reach it, I have to redistribute statics into IGP.
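
    A minimal sketch of that redistribution, assuming R1 still has the two statics to 2.2.2.2 shown earlier and runs EIGRP AS 1 toward R3 (as in the later lab config). The seed metric values are illustrative only:

    [code]

    router eigrp 1
     ! Leak the statics for R2's loopback (2.2.2.2/32) into the IGP so that
     ! R3 can resolve the BGP next-hop.
     ! Metric order: bandwidth delay reliability load MTU (illustrative values)
     redistribute static metric 1544 2000 255 1 1500

    [/code]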

    Finally, over a single link, what I expected to see all along:

    [code]

    R3#sho ip bgp
    BGP table version is 6, local router ID is 3.3.3.3
    Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
                  r RIB-failure, S Stale
    Origin codes: i - IGP, e - EGP, ? - incomplete

       Network          Next Hop            Metric LocPrf Weight Path
    r>i11.1.1.0/24      1.1.1.1                  0    100      0 i
    *>i200.0.0.0/8      10.1.1.2                 0    100      0 2 i
    [/code]

    The next hop is the far side of the connected link for EBGP, requiring either redistributing connected into the IGP or adding the network to the IGP and setting that interface to passive.  Seems like, all around, next-hop-self is the easiest solution to maintain unless there's a reason it shouldn't be set in a particular scenario.
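
    For reference, a sketch of the next-hop-self option on R1, assuming the same iBGP session to R3 (3.3.3.3) used in the earlier outputs:

    [code]

    router bgp 1
     ! Rewrite the next-hop to R1's own session address on routes sent to R3
     neighbor 3.3.3.3 next-hop-self

    [/code]

    With this in place R3 only needs an IGP route to R1's loopback, not to the EBGP link subnets.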

  • peetypeety ✭✭

    Excellent research. Now you have the tools necessary to know how to work around a "don't change next-hop anywhere" restriction.

    Side note: it's OK for the next-hop to be learned by BGP (as long as it's not 0/0), so it'd be OK to redist conn/stat into BGP if you wanted to keep the IGP small.

    In a production network that's considering MPLS forwarding, I would say that redistributing connected/static into BGP would be a better choice than changing next hop, as you can extend the forwarding a hop further by doing so.  But we aren't talking about production networks here... *wink*
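
    A rough sketch of what that could look like on R1 in the lab above. The prefix-list and route-map names are hypothetical, and the filter is only there so the sketch doesn't pull every connected and static route into BGP. You'd probably also want to keep these prefixes from leaking to the EBGP peer:

    [code]

    ! Hypothetical filter: match only the EBGP link subnets and R2's loopback
    ip prefix-list NH-ONLY seq 5 permit 10.1.1.0/30
    ip prefix-list NH-ONLY seq 10 permit 10.2.1.0/30
    ip prefix-list NH-ONLY seq 15 permit 2.2.2.2/32
    !
    route-map NH-INTO-BGP permit 10
     match ip address prefix-list NH-ONLY
    !
    router bgp 1
     ! Carry the next-hop prefixes in BGP instead of the IGP
     redistribute connected route-map NH-INTO-BGP
     redistribute static route-map NH-INTO-BGP

    [/code]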

  • I'm trying to figure out why the next-hop was modified.  Next-hop-self isn't configured, and the advertisement would seem to indicate that it's sending 10.1.1.2 as next-hop.  Debugs show it obviously isn't:

     

    1. Next-hop modification, eBGP peer always updates the next-hop but iBGP peer doesn't do this.

    2. You can do multi-hop with static routes, but do you want load-sharing or failover?

    If you want load-sharing, configure static routes with the same AD and configure eBGP:

    R7(config-router)#do show ip bgp sum | be Ne
    Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
    8.8.8.8         4            8       5       5        3    0    0 00:00:28        1
    R7(config-router)#do show run | in ip route
    ip route 8.8.8.8 255.255.255.255 1.1.1.8
    ip route 8.8.8.8 255.255.255.255 2.2.2.8
    R7(config-router)#do show ip bgp 88.88.88.88
    BGP routing table entry for 88.88.88.88/32, version 3
    Paths: (1 available, best #1, table default)
      Not advertised to any peer
      8
        8.8.8.8 from 8.8.8.8 (88.88.88.88)
          Origin IGP, metric 0, localpref 100, valid, external, best

    R7#traceroute 88.88.88.88 source lo0

    Type escape sequence to abort.
    Tracing the route to 88.88.88.88

      1 2.2.2.8 512 msec
        1.1.1.8 140 msec
        2.2.2.8 640 msec
    R7#traceroute 88.88.88.88 source lo0

    Type escape sequence to abort.
    Tracing the route to 88.88.88.88

      1 1.1.1.8 208 msec
        2.2.2.8 460 msec
        1.1.1.8 428 msec

    If you want BGP peering with failover, just configure the static routes with different ADs and configure the BGP peering.
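
    A sketch of the failover variant, reusing the addresses from the output above. The administrative distance of 10 on the second route is an arbitrary illustrative value (anything higher than the default of 1 works):

    [code]

    ! Primary path to the peer loopback (default AD 1)
    ip route 8.8.8.8 255.255.255.255 1.1.1.8
    ! Backup path, installed only if the primary route is removed
    ip route 8.8.8.8 255.255.255.255 2.2.2.8 10

    [/code]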

    If you have more questions, just let us know.

    Good Luck

  • 1. Next-hop modification, eBGP peer always updates the next-hop but iBGP peer doesn't do this.

    Right, but in this case, iBGP *does* change next-hop - when using multipath (eBGP?) apparently there's an implicit next-hop-self added to the announcement.  You can see that R1 claims that it's advertising a different next hop than its iBGP neighbor receives.  Even R1's own debugs show that it's actually advertising its own loopback as the next-hop.  My question at that point was "why?" - I still don't know, other than maybe it has to be done that way, or iBGP peers won't be able to take advantage of multipath.  The only place I've found any mention of this at all is in Internet Routing Architectures, and even there, it's a single sentence saying that it happens.

    My own setups with BGP have all involved links to multiple neighbors/different ASes, and no load sharing configured.  Balancing of inbound and outbound traffic has been done via route-maps, prepending, and communities.  This is one of the reasons I was fooling around with multipath - I've never used it, and have only seen it mentioned in passing most places.  I suspect that, as Peety was discussing, it may be more common in the MPLS/SP world than the R&S world.

  • Excellent research. Now you have the tools necessary to know how to work around a "don't change next-hop anywhere" restriction.

    Side note: it's OK for the next-hop to be learned by BGP (as long as it's not 0/0), so it'd be OK to redist conn/stat into BGP if you wanted to keep the IGP small.

    In a production network that's considering MPLS forwarding, I would say that redistributing connected/static into BGP would be a better choice than changing next hop, as you can extend the forwarding a hop further by doing so.  But we aren't talking about production networks here... *wink*

    Thanks - I obviously have a ton to learn about using BGP for actual traffic engineering.  I've never touched MPLS at all, and watching the vids in the CCIP section of INE's stuff, it's obvious I have a lot to learn there as well.

  • 1. Next-hop modification, eBGP peer always updates the next-hop but iBGP peer doesn't do this.

    Right, but in this case, iBGP *does* change next-hop - when using multipath (eBGP?) apparently there's an implicit next-hop-self added to the announcement.  You can see that R1 claims that it's advertising a different next hop than its iBGP neighbor receives.  Even R1's own debugs show that it's actually advertising its own loopback as the next-hop.  My question at that point was "why?" - I still don't know, other than maybe it has to be done that way, or iBGP peers won't be able to take advantage of multipath.  The only place I've found any mention of this at all is in Internet Routing Architectures, and even there, it's a single sentence saying that it happens.

    My own setups with BGP have all involved links to multiple neighbors/different ASes, and no load sharing configured.  Balancing of inbound and outbound traffic has been done via route-maps, prepending, and communities.  This is one of the reasons I was fooling around with multipath - I've never used it, and have only seen it mentioned in passing most places.  I suspect that, as Peety was discussing, it may be more common in the MPLS/SP world than the R&S world.

    Let me clarify what I mean: iBGP sessions preserve the next-hop attribute learned from eBGP peers but update the next-hop attribute learned from iBGP peers, and this is the case for either single path or multipath. If you are thinking that iBGP never updates the next-hop attribute, I think you have misunderstood; that is wrong. iBGP preserves the next-hop attribute learned from eBGP but updates the next-hop of all routes learned from iBGP:

    OK, see here for both cases. a. Single path:

    I have R4(AS 4)- R5(AS 100) - R6(AS100)

    R4#show ip bgp | be Ne
       Network          Next Hop            Metric LocPrf Weight Path
    *> 44.44.44.44/32   0.0.0.0                  0         32768 i
    *> 55.55.55.55/32   5.5.5.5                  0             0 100 i
    *> 66.66.66.66/32   5.5.5.5                                0 100 i  --> Next-hop has been updated and advertised to eBGP neighbor

    R6#show ip bgp | be Ne
       Network          Next Hop            Metric LocPrf Weight Path
    *>i44.44.44.44/32   4.4.4.4                  0    100      0 4 i --> No Next-hop update because R5 learned this route from eBGP neighbor, either we need next-hop-self command on R5 to R6 or we need route-map to manually change next-hop.
    *>i55.55.55.55/32   5.5.5.5                  0    100      0 i
    *> 66.66.66.66/32   0.0.0.0                  0         32768 i
    R6#

    b. Let's see on multipath case: R3(AS 3)-R1(AS 100)- R2(AS 100)

    R3#show ip bgp | be Ne
       Network          Next Hop            Metric LocPrf Weight Path
    *> 11.11.11.11/32   1.1.1.1                  0             0 100 i
    *> 22.22.22.22/32   1.1.1.1                                0 100 i --> Next-hop has been changed because route is learned from iBGP neighbor
    *> 33.33.33.33/32   0.0.0.0                  0         32768 i
    R3#show ip route 1.1.1.1
    Routing entry for 1.1.1.1/32
      Known via "ospf 1", distance 110, metric 65, type intra area
      Last update from 13.13.13.1 on Serial1/0, 00:22:14 ago
      Routing Descriptor Blocks:
        31.31.31.1, from 1.1.1.1, 00:22:14 ago, via Serial1/2
          Route metric is 65, traffic share count is 1
      * 13.13.13.1, from 1.1.1.1, 00:22:14 ago, via Serial1/0
    R2#show ip bgp | be Ne
       Network          Next Hop            Metric LocPrf Weight Path
    *>i11.11.11.11/32   1.1.1.1                  0    100      0 i
    *> 22.22.22.22/32   0.0.0.0                  0         32768 i
    *>i33.33.33.33/32   3.3.3.3                  0    100      0 3 i -> Next-hop hasn't been changed because route is learned from eBGP neighbor.

    I hope it clears your doubt about how and where BGP changes next-hop.

    Good Luck

  • peetypeety ✭✭

    Thanks - I obviously have a ton to learn about using BGP for actual traffic engineering.  I've never touched MPLS at all, and watching the vids in the CCIP section of INE's stuff, it's obvious I have a lot to learn there as well.

    In my head, the optimizations I speak of aren't related to traffic engineering.

    When a packet travels through an MPLS-enabled network, it'll eventually get to the end of the MPLS network and have to be forwarded based on normal layer-3 IP information. The next-to-last hop (known as the penultimate hop) recognizes that the next hop (i.e. the last hop) is the end of the line, and that in most cases that next hop will need to do normal L3 routing.  As a result, it pops the label off the packet before sending it to the last-hop router.  This is called penultimate hop popping, or PHP, and appears as 'pop tag' when you look under the hood.

    If the IBGP mesh sees the BGP next-hop as router loopbacks, the hop before the listed loopback has to perform PHP.

    If the IBGP mesh sees the egress link's far side as the next-hop, the hop at the listed loopback pops the tag, as the next hop is outside "this" network.  By extending the next-hop information out by a hop, MPLS forwarding now reaches out by a hop.

     

  • I'm afraid I still don't understand what you're saying.  I've labbed this up several times at this point, and have checked several resources, and all I can find is that across EBGP links, next-hop is updated, and by default, across iBGP it is not.  I've verified this with several different topologies.

     

    Let me clarify what I mean: iBGP sessions preserve the next-hop attribute learned from eBGP peers but update the next-hop attribute learned from iBGP peers, and this is the case for either single path or multipath. If you are thinking that iBGP never updates the next-hop attribute, I think you have misunderstood; that is wrong. iBGP preserves the next-hop attribute learned from eBGP but updates the next-hop of all routes learned from iBGP:

    OK, see here for both cases. a. Single path:

    I have R4(AS 4)- R5(AS 100) - R6(AS100)

    R4#show ip bgp | be Ne
       Network          Next Hop            Metric LocPrf Weight Path
    *> 44.44.44.44/32   0.0.0.0                  0         32768 i
    *> 55.55.55.55/32   5.5.5.5                  0             0 100 i
    *> 66.66.66.66/32   5.5.5.5                                0 100 i  --> Next-hop has been updated and advertised to eBGP neighbor

    R6#show ip bgp | be Ne
       Network          Next Hop            Metric LocPrf Weight Path
    *>i44.44.44.44/32   4.4.4.4                  0    100      0 4 i --> No Next-hop update because R5 learned this route from eBGP neighbor, either we need next-hop-self command on R5 to R6 or we need route-map to manually change next-hop.
    *>i55.55.55.55/32   5.5.5.5                  0    100      0 i
    *> 66.66.66.66/32   0.0.0.0                  0         32768 i
    R6#

    OK, I'm with you so far.

    b. Let's see on multipath case: R3(AS 3)-R1(AS 100)- R2(AS 100)

    R3#show ip bgp | be Ne
       Network          Next Hop            Metric LocPrf Weight Path
    *> 11.11.11.11/32   1.1.1.1                  0             0 100 i
    *> 22.22.22.22/32   1.1.1.1                                0 100 i --> Next-hop has been changed because route is learned from iBGP neighbor
    *> 33.33.33.33/32   0.0.0.0                  0         32768 i

    Except, at least according to your diagram, R3 is in AS3 and is learning this route via EBGP with R1 in AS100.  The next-hop is being changed because it's EBGP, correct?

    R3#show ip route 1.1.1.1
    Routing entry for 1.1.1.1/32
      Known via "ospf 1", distance 110, metric 65, type intra area
      Last update from 13.13.13.1 on Serial1/0, 00:22:14 ago
      Routing Descriptor Blocks:
        31.31.31.1, from 1.1.1.1, 00:22:14 ago, via Serial1/2
          Route metric is 65, traffic share count is 1
      * 13.13.13.1, from 1.1.1.1, 00:22:14 ago, via Serial1/0
    R2#show ip bgp | be Ne
       Network          Next Hop            Metric LocPrf Weight Path
    *>i11.11.11.11/32   1.1.1.1                  0    100      0 i
    *> 22.22.22.22/32   0.0.0.0                  0         32768 i
    *>i33.33.33.33/32   3.3.3.3                  0    100      0 3 i -> Next-hop hasn't been changed because route is learned from eBGP neighbor.

    Unless R2 is peering with R3, it's learning the route from R1, correct?  R1 does not change the next-hop for iBGP peer R2 because it's iBGP, correct?

  • Unless R2 is peering with R3, it's learning the route from R1, correct?  R1 does not change the next-hop for iBGP peer R2 because it's iBGP, correct?

    There are no different methods of operation in BGP for updating the next-hop in the single-path or multipath case. Multipath is for either equal or unequal load sharing. If you have to use unequal load sharing, use the BGP DMZ link bandwidth (dmzlink-bw) option.

    I have this topology: R3(AS 3)-R1(AS 100)-R2(AS 100). R3 and R1 have eBGP peering, and R1 and R2 have iBGP peering.

    Now, what happens to the next-hop in the default configuration? iBGP sessions preserve the next-hop attribute learned from eBGP peers:

    R1 learned the 33.33.33.33/32 network from R3 over eBGP and sends it to R2 without updating the next-hop, which stays 3.3.3.3, because the route was learned from eBGP (not from iBGP).

    R2#show ip bgp | be Ne
       Network          Next Hop            Metric LocPrf Weight Path
    *>i11.11.11.11/32   1.1.1.1                  0    100      0 i
    *> 22.22.22.22/32   0.0.0.0                  0         32768 i
    *>i33.33.33.33/32   3.3.3.3                  0    100      0 3 i -> Next-hop hasn't been changed because route is learned from eBGP neighbor.

    R1 learned the 22.22.22.22/32 network from R2 over iBGP and sends it to R3 because R3 is an eBGP neighbor. (Without a full mesh, an iBGP router doesn't pass routes learned from one iBGP neighbor on to another iBGP neighbor, which is the reason for route reflectors and confederations.) R1 automatically updates the next-hop before advertising to eBGP peer R3: if you look at R3, the next-hop of the 22.22.22.22/32 network is 1.1.1.1, which is R1.

     

    R3#show ip bgp | be Ne
       Network          Next Hop            Metric LocPrf Weight Path
    *> 11.11.11.11/32   1.1.1.1                  0             0 100 i
    *> 22.22.22.22/32   1.1.1.1                                0 100 i --> Next-hop has been changed because route is learned from iBGP neighbor
    *> 33.33.33.33/32   0.0.0.0                  0         32768 i

    So iBGP doesn't update the next-hop if route is from eBGP peer but it does if route is from iBGP peer to eBGP.

    Tell me now, what confusion do you still have?

     

  • I have this topology: R3(AS 3)-R1(AS 100)- R2(AS 100), R3 and R1 has eBGP peering nd R1 and R2 has iBGP peering.

    OK - I understand what you're saying here.

    R3#show ip bgp | be Ne
       Network          Next Hop            Metric LocPrf Weight Path
    *> 11.11.11.11/32   1.1.1.1                  0             0 100 i
    *> 22.22.22.22/32   1.1.1.1                                0 100 i --> Next-hop has been changed because route is learned from iBGP neighbor
    *> 33.33.33.33/32   0.0.0.0                  0         32768 i

    So iBGP doesn't update the next-hop if route is from eBGP peer but it does if route is from iBGP peer. eBGP always does it.

    Here's where I lose you.  What iBGP neighbor?  R3 is learning 11.11.11.11 and 22.22.22.22 from an EBGP neighbor, according to your description as well as the AS path info, and it's sourcing 33.33.33.33 as indicated by the default weight of a locally sourced route, the next hop of 0.0.0.0 and the lack of an AS path.  Thank you for your help, but I think I must be completely misunderstanding you.

  • Here's where I lose you.  What iBGP neighbor?  R3 is learning 11.11.11.11 and 22.22.22.22 from an EBGP neighbor, according to your description as well as the AS path info, and it's sourcing 33.33.33.33 as indicated by the default weight of a locally sourced route, the next hop of 0.0.0.0 and the lack of an AS path.  Thank you for your help, but I think I must be completely misunderstanding you.

    You are correct; I was going in circles while writing, sorry about that. iBGP always preserves the next-hop and eBGP updates it, whether single path or multipath. But I tried your scenario as well and never got the same result as yours. Can you tell me the IOS and platform you are using?

     

  • I honestly can't *not* make it happen - I've now tried multiple 7200 IOSes from 12.2 to 12.4 and even the default INE dynamips IOS for the 3700s (some 12.4(T) IOS, IIRC).

    R3(AS1)--R1(AS1)==R2(AS2)

    R1:

    [code]

    interface Loopback0
     ip address 1.1.1.1 255.255.255.255
    !
    interface Loopback100
     ip address 100.0.0.1 255.0.0.0
    !
    interface Serial1/0
     ip address 10.1.1.1 255.255.255.252
     serial restart-delay 0
    !
    interface Serial1/1
     ip address 10.2.1.1 255.255.255.252
     serial restart-delay 0
    !
    interface Serial1/2
     ip address 11.1.1.1 255.255.255.0
     serial restart-delay 0
    !
    !
    !
    !
    router eigrp 1
     redistribute connected
     network 11.0.0.0
     auto-summary
    !
    router bgp 1
     no synchronization
     bgp log-neighbor-changes
     network 11.1.1.0 mask 255.255.255.0
     network 100.0.0.0
     neighbor 3.3.3.3 remote-as 1
     neighbor 3.3.3.3 update-source Loopback0
     neighbor 10.1.1.2 remote-as 2
     neighbor 10.2.1.2 remote-as 2
     maximum-paths 16
     no auto-summary





    R1#show ip bgp
    BGP table version is 7, local router ID is 100.0.0.1
    Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
                  r RIB-failure, S Stale
    Origin codes: i - IGP, e - EGP, ? - incomplete

       Network          Next Hop            Metric LocPrf Weight Path
    *> 11.1.1.0/24      0.0.0.0                  0         32768 i
    *> 100.0.0.0        0.0.0.0                  0         32768 i
    *  200.0.0.0/8      10.2.1.2                 0             0 2 i
    *>                  10.1.1.2                 0             0 2 i


    R1#show ip bgp sum
    BGP router identifier 100.0.0.1, local AS number 1
    BGP table version is 7, main routing table version 7
    3 network entries using 351 bytes of memory
    4 path entries using 208 bytes of memory
    1 multipath network entries and 2 multipath paths
    3/2 BGP path/bestpath attribute entries using 372 bytes of memory
    1 BGP AS-PATH entries using 24 bytes of memory
    0 BGP route-map cache entries using 0 bytes of memory
    0 BGP filter-list cache entries using 0 bytes of memory
    BGP using 955 total bytes of memory
    BGP activity 7/4 prefixes, 9/5 paths, scan interval 60 secs

    Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
    3.3.3.3         4     1      22      32        7    0    0 00:09:18        0
    10.1.1.2        4     2      29      29        7    0    0 00:09:18        1
    10.2.1.2        4     2      27      28        7    0    0 00:09:18        1


    R1#show ip bgp nei 3.3.3.3 adv
    BGP table version is 7, local router ID is 100.0.0.1
    Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
                  r RIB-failure, S Stale
    Origin codes: i - IGP, e - EGP, ? - incomplete

       Network          Next Hop            Metric LocPrf Weight Path
    *> 11.1.1.0/24      0.0.0.0                  0         32768 i
    *> 100.0.0.0        0.0.0.0                  0         32768 i
    *> 200.0.0.0/8      10.1.1.2                 0             0 2 i

    Total number of prefixes 3


    R1#show ip bgp 200.0.0.1
    BGP routing table entry for 200.0.0.0/8, version 3
    Paths: (2 available, best #2, table Default-IP-Routing-Table)
    Multipath: eBGP
      Advertised to update-groups:
         1          2
      2
        10.2.1.2 from 10.2.1.2 (200.0.0.1)
          Origin IGP, metric 0, localpref 100, valid, external, multipath
      2
        10.1.1.2 from 10.1.1.2 (200.0.0.1)
          Origin IGP, metric 0, localpref 100, valid, external, multipath, best


    R1#show ip cef 200.0.0.1
    200.0.0.0/8, version 32, epoch 0, per-destination sharing
    0 packets, 0 bytes
      via 10.2.1.2, 0 dependencies, recursive
        traffic share 1
        next hop 10.2.1.2, Serial1/1 via 10.2.1.0/30
        valid adjacency
      via 10.1.1.2, 0 dependencies, recursive
        traffic share 1
        next hop 10.1.1.2, Serial1/0 via 10.1.1.0/30
        valid adjacency
      0 packets, 0 bytes switched through the prefix
      tmstats: external 0 packets, 0 bytes
               internal 0 packets, 0 bytes

    [/code]

     

    R2:

    [code]


    interface Loopback0
     ip address 2.2.2.2 255.255.255.255
    !
    interface Loopback200
     ip address 200.0.0.1 255.0.0.0
    !
    interface Serial1/0
     ip address 10.1.1.2 255.255.255.252
     serial restart-delay 0
    !
    interface Serial1/1
     ip address 10.2.1.2 255.255.255.252
     serial restart-delay 0
    !
    router bgp 2
     no synchronization
     bgp log-neighbor-changes
     network 200.0.0.0 mask 255.0.0.0
     neighbor 10.1.1.1 remote-as 1
     neighbor 10.2.1.1 remote-as 1
     maximum-paths 16
     no auto-summary



    R2#show ip bgp
    BGP table version is 4, local router ID is 200.0.0.1
    Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
                  r RIB-failure, S Stale
    Origin codes: i - IGP, e - EGP, ? - incomplete

       Network          Next Hop            Metric LocPrf Weight Path
    *> 11.1.1.0/24      10.1.1.1                 0             0 1 i
    *                   10.2.1.1                 0             0 1 i
    *> 100.0.0.0        10.1.1.1                 0             0 1 i
    *                   10.2.1.1                 0             0 1 i
    *> 200.0.0.0/8      0.0.0.0                  0         32768 i


    R2#show ip bgp sum
    BGP router identifier 200.0.0.1, local AS number 2
    BGP table version is 4, main routing table version 4
    3 network entries using 351 bytes of memory
    5 path entries using 260 bytes of memory
    2 multipath network entries and 4 multipath paths
    3/2 BGP path/bestpath attribute entries using 372 bytes of memory
    1 BGP AS-PATH entries using 24 bytes of memory
    0 BGP route-map cache entries using 0 bytes of memory
    0 BGP filter-list cache entries using 0 bytes of memory
    BGP using 1007 total bytes of memory
    BGP activity 9/6 prefixes, 15/10 paths, scan interval 60 secs

    Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
    10.1.1.1        4     1      38      38        4    0    0 00:01:19        2
    10.2.1.1        4     1      38      37        4    0    0 00:01:19        2


    R2#show ip route 100.0.0.1
    Routing entry for 100.0.0.0/8
      Known via "bgp 2", distance 20, metric 0
      Tag 1, type external
      Last update from 10.2.1.1 00:01:00 ago
      Routing Descriptor Blocks:
        10.2.1.1, from 10.2.1.1, 00:01:00 ago
          Route metric is 0, traffic share count is 1
          AS Hops 1
          Route tag 1
      * 10.1.1.1, from 10.1.1.1, 00:01:00 ago
          Route metric is 0, traffic share count is 1
          AS Hops 1
          Route tag 1


    R2#show ip bgp 100.0.0.1
    BGP routing table entry for 100.0.0.0/8, version 3
    Paths: (2 available, best #1, table Default-IP-Routing-Table)
    Multipath: eBGP
    Flag: 0x820
      Advertised to update-groups:
         1
      1
        10.1.1.1 from 10.1.1.1 (100.0.0.1)
          Origin IGP, metric 0, localpref 100, valid, external, multipath, best
      1
        10.2.1.1 from 10.2.1.1 (100.0.0.1)
          Origin IGP, metric 0, localpref 100, valid, external, multipath

    R2#show ip cef 100.0.0.1
    100.0.0.0/8, version 31, epoch 0, per-destination sharing
    0 packets, 0 bytes
      via 10.2.1.1, 0 dependencies, recursive
        traffic share 1
        next hop 10.2.1.1, Serial1/1 via 10.2.1.0/30
        valid adjacency
      via 10.1.1.1, 0 dependencies, recursive
        traffic share 1
        next hop 10.1.1.1, Serial1/0 via 10.1.1.0/30
        valid adjacency
      0 packets, 0 bytes switched through the prefix
      tmstats: external 0 packets, 0 bytes
               internal 0 packets, 0 bytes


    R2#show ip bgp neigh 10.2.1.1 advertised-routes
    BGP table version is 4, local router ID is 200.0.0.1
    Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
                  r RIB-failure, S Stale
    Origin codes: i - IGP, e - EGP, ? - incomplete

       Network          Next Hop            Metric LocPrf Weight Path
    *> 11.1.1.0/24      10.1.1.1                 0             0 1 i
    *> 100.0.0.0        10.1.1.1                 0             0 1 i
    *> 200.0.0.0/8      0.0.0.0                  0         32768 i


    Total number of prefixes 3
    R2#show ip bgp neigh 10.1.1.1 advertised-routes
    BGP table version is 4, local router ID is 200.0.0.1
    Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
                  r RIB-failure, S Stale
    Origin codes: i - IGP, e - EGP, ? - incomplete

       Network          Next Hop            Metric LocPrf Weight Path
    *> 11.1.1.0/24      10.1.1.1                 0             0 1 i
    *> 100.0.0.0        10.1.1.1                 0             0 1 i
    *> 200.0.0.0/8      0.0.0.0                  0         32768 i

    Total number of prefixes 3

    [/code]

     

    R3:

    [code]



    interface Loopback0
     ip address 3.3.3.3 255.255.255.255
    !
    interface Serial1/0
     ip address 11.1.1.2 255.255.255.0
     serial restart-delay 0
    !
    !
    !
    !
    router eigrp 1
     redistribute connected
     redistribute static
     network 11.0.0.0
     auto-summary
    !
    router bgp 1
     no synchronization
     bgp log-neighbor-changes
     neighbor 1.1.1.1 remote-as 1
     neighbor 1.1.1.1 update-source Loopback0
     no auto-summary



    R3# show ip bgp
    BGP table version is 19, local router ID is 3.3.3.3
    Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
                  r RIB-failure, S Stale
    Origin codes: i - IGP, e - EGP, ? - incomplete

       Network          Next Hop            Metric LocPrf Weight Path
    r>i11.1.1.0/24      1.1.1.1                  0    100      0 i
    r>i100.0.0.0        1.1.1.1                  0    100      0 i
    *>i200.0.0.0/8      1.1.1.1                  0    100      0 2 i


    R3# show ip bgp sum
    BGP router identifier 3.3.3.3, local AS number 1
    BGP table version is 19, main routing table version 19
    3 network entries using 351 bytes of memory
    3 path entries using 156 bytes of memory
    3/2 BGP path/bestpath attribute entries using 372 bytes of memory
    1 BGP AS-PATH entries using 24 bytes of memory
    0 BGP route-map cache entries using 0 bytes of memory
    0 BGP filter-list cache entries using 0 bytes of memory
    BGP using 903 total bytes of memory
    BGP activity 7/4 prefixes, 8/5 paths, scan interval 60 secs

    Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
    1.1.1.1         4     1      40      28       19    0    0 00:15:39        3


    R3#show ip route 200.0.0.1
    Routing entry for 200.0.0.0/8, supernet
      Known via "bgp 1", distance 200, metric 0
      Tag 2, type internal
      Last update from 1.1.1.1 00:02:59 ago
      Routing Descriptor Blocks:
      * 1.1.1.1, from 1.1.1.1, 00:02:59 ago
          Route metric is 0, traffic share count is 1
          AS Hops 1
          Route tag 2

    R3#show ip bgp 200.0.0.1
    BGP routing table entry for 200.0.0.0/8, version 19
    Paths: (1 available, best #1, table Default-IP-Routing-Table)
    Flag: 0x820
      Not advertised to any peer
      2
        1.1.1.1 (metric 2297856) from 1.1.1.1 (100.0.0.1)
          Origin IGP, metric 0, localpref 100, valid, internal, best


    R3#show ip cef 200.0.0.1
    200.0.0.0/8, version 24, epoch 0, cached adjacency to Serial1/0
    0 packets, 0 bytes
      via 1.1.1.1, 0 dependencies, recursive
        next hop 11.1.1.1, Serial1/0 via 1.1.1.1/32
        valid cached adjacency
    [/code]

     

    Have I done something strange with my multipath setup?

  • It would be swell if the code tags changed font to something fixed-width...

  • Have I done something strange with my multipath setup?

    I can't see any fault in your configuration. The only strange part is the next-hop modification in the R3 output. This is the first time I have noticed this behavior in next-hop processing within an iBGP domain (R1 shouldn't update the next hop toward R3 for routes learned from R2, its eBGP neighbor). I will test on my side and update you.

     

  • FWIW, the Internet Routing Arch entry I referenced earlier is on pg 334 and goes:

    When dealing with iBGP peers, RTA will advertise only a single BGP entry out of the
    multiple identical entries that have a next-hop-self.

    I can't really parse this sentence, but it *sounds like* he's saying that this is the expected behaviour.  The example routing table (which I'm not transcribing) also seems to reflect this.
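
    One way to read that sentence, and the outputs above, is that multipath only changes what gets installed for forwarding, while exactly one path per prefix is still marked best and advertised. Here is a minimal Python sketch of that reading. It is a toy model, not how IOS implements it: the `Path` class, the `select` and `advertise` functions, and the tie-break (AS-path length, then lowest neighbor address) are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """One BGP path for a prefix (hypothetical, heavily simplified)."""
    neighbor: str    # peer address the path was learned from
    next_hop: str    # next hop carried in the update
    as_path: tuple   # e.g. (2,) for "2 i"


def ip_key(addr):
    """Turn a dotted-quad string into a tuple of ints for numeric comparison."""
    return tuple(int(octet) for octet in addr.split("."))


def select(paths, maximum_paths=1):
    """Return (best, installed).

    Assumed tie-break: shortest AS path, then lowest neighbor address.
    The real decision process has many more steps.
    """
    ranked = sorted(paths, key=lambda p: (len(p.as_path), ip_key(p.neighbor)))
    best = ranked[0]
    # Paths with the same AS path as the best one are multipath candidates,
    # capped at maximum_paths.
    installed = [p for p in ranked if p.as_path == best.as_path][:maximum_paths]
    return best, installed


def advertise(best):
    """Only the single best path is sent to peers, multipath or not."""
    return [best]


# The two eBGP paths R1 holds for 200.0.0.0/8 in the output above.
paths = [
    Path(neighbor="10.2.1.2", next_hop="10.2.1.2", as_path=(2,)),
    Path(neighbor="10.1.1.2", next_hop="10.1.1.2", as_path=(2,)),
]
best, installed = select(paths, maximum_paths=16)
print("best:", best.next_hop)
print("installed:", [p.next_hop for p in installed])
print("advertised:", [p.next_hop for p in advertise(best)])
```

    In this model both next hops are installed, matching the two entries in `show ip cef 200.0.0.1` on R1, while the advertised list contains only the 10.1.1.2 path, matching the single `>` entry in `show ip bgp nei 3.3.3.3 adv`.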
