OSPF reconvergence -- neighbor down but interface up
I probably need to lab this out, but we experienced an outage in production last week that I'm trying to understand and handle better. I could use some validation and any tips on fixing it.
The setup is pretty straightforward: two routers with two circuits between them for redundancy, and a loopback interface on each router. One circuit is Metro E (or VPLS), and we've set a lower OSPF cost on it because it's the higher-bandwidth link. The other is a point-to-point circuit, and its type doesn't matter here. The goal is to keep each router's loopback reachable from the other router at all times, even when one circuit goes down. Both routers run OSPF in the same area.
If our Metro E provider has an internal data plane failure, the OSPF adjacency between my routers over that path goes down once the dead interval expires. Both interfaces stay up, though, and the routers stay adjacent over the point-to-point circuit.
Here's the problem: will the loopback addresses now be routed via the point-to-point circuit? It seems like both routers would still advertise the same LSAs -- "here's my router ID, and I'm connected to these three subnets -- the loopback, Metro E, and point-to-point." What's to stop the other router from building a topology that assumes (since both routers are still advertising the Metro E subnet) that the Metro E circuit is still the best path?
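A toy model may help reason through the question. This is not OSPF itself. It assumes that a router only lists a link to its neighbor while the adjacency over that circuit is up. It also assumes the two-way check described in RFC 2328 section 16.1: SPF only uses a link between two routers if each router's LSA points back at the other. Router names, prefix names, and costs (10 for Metro E, 100 for point-to-point, 1 for the loopbacks) are hypothetical. The subnets are modeled as plain stub prefixes in both states, which simplifies how OSPF represents a transit network.

```python
import heapq
import itertools

# Hypothetical costs: lower cost on the higher-bandwidth Metro E circuit.
METRO_E_COST = 10
P2P_COST = 100
LOOPBACK_COST = 1


def build_lsdb(metro_e_adjacency_up):
    """Toy router LSAs for R1 and R2 (not real OSPF encoding).

    A router lists a link to its peer on a circuit only while that adjacency
    is up. The subnets are kept as stub prefixes (leaves) in both states.
    """
    lsdb = {}
    for me, peer in (("R1", "R2"), ("R2", "R1")):
        links = [(peer, P2P_COST, "p2p")]
        if metro_e_adjacency_up:
            links.append((peer, METRO_E_COST, "metro-e"))
        stubs = {
            f"{me}-loopback": LOOPBACK_COST,
            "metro-e-subnet": METRO_E_COST,
            "p2p-subnet": P2P_COST,
        }
        lsdb[me] = {"links": links, "stubs": stubs}
    return lsdb


def links_back(lsdb, frm, to, circuit):
    """Two-way check: use frm->to only if to's LSA also lists frm on that circuit."""
    return any(nbr == frm and c == circuit for nbr, _, c in lsdb[to]["links"])


def route_to_prefix(lsdb, src, prefix):
    """Dijkstra over routers, then hang stub prefixes off them as leaves.

    Returns (total cost, first circuit used); the circuit is None for a prefix
    on src itself. Returns None if the prefix is unreachable.
    """
    dist = {}
    tiebreak = itertools.count()  # avoids comparing None with str on cost ties
    heap = [(0, next(tiebreak), src, None)]
    while heap:
        cost, _, node, first = heapq.heappop(heap)
        if node in dist:
            continue
        dist[node] = (cost, first)
        for nbr, link_cost, circuit in lsdb[node]["links"]:
            if nbr not in dist and links_back(lsdb, node, nbr, circuit):
                heapq.heappush(
                    heap, (cost + link_cost, next(tiebreak), nbr, first or circuit)
                )

    best = None
    for router, (cost, first) in dist.items():
        stub_cost = lsdb[router]["stubs"].get(prefix)
        if stub_cost is not None and (best is None or cost + stub_cost < best[0]):
            best = (cost + stub_cost, first)
    return best


print(route_to_prefix(build_lsdb(True), "R1", "R2-loopback"))   # (11, 'metro-e')
print(route_to_prefix(build_lsdb(False), "R1", "R2-loopback"))  # (101, 'p2p')

# Even if R1 alone kept claiming the Metro E link, the two-way check ignores it.
one_sided = build_lsdb(False)
one_sided["R1"]["links"].append(("R2", METRO_E_COST, "metro-e"))
print(route_to_prefix(one_sided, "R1", "R2-loopback"))          # (101, 'p2p')
```

In this model, both routers still advertise the Metro E subnet after the failure. It only appears as a leaf prefix, though, so it never becomes a hop toward R2's loopback. Path selection depends only on which router-to-router links both sides still list.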