BFD – Sub-second Failure Detection
If there’s no BFD
If you have two routers directly connected, like here:
In this case, it is expected that when one router goes down completely, the other one removes the routes it learned from it. The link goes to the down state, so the routing protocol adjacency disappears right away.
If two routers are connected through an L2 device (a switch), like here:
In this case, when one of them goes down, it will not take down the interface of its L3 neighbour (the other router), because the switch still works fine and keeps the other half of the link up:
If that’s the case, you depend on routing protocol timers, which are the failure detection mechanism built into the routing protocol itself. Those timers need to expire before the adjacency is brought down and convergence to some other path towards the destinations can start.
Routing protocol timers are not a bad mechanism, and they can be tuned to detect the failure faster.
EIGRP hello and hold timers can be tuned to get failure detection, and the start of convergence, down to somewhere around 1 second. With IS-IS and OSPF you can enable the fast hello option, which also gets failure detection down to about 1 second.
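To see why timer-based detection is bounded by the hold (dead) timer, here is a minimal Python sketch. It assumes the neighbour sends hellos at exact multiples of the hello interval and that the hold timer restarts on every hello received. The function name `detection_delay` and the timer values are illustrative assumptions, not taken from any particular router implementation.

```python
import math


def detection_delay(hello_interval: float, hold_time: float, fail_time: float) -> float:
    """Seconds between the neighbour failing and the local hold timer expiring.

    Simplified model (assumption): hellos leave at t = 0, hello_interval,
    2 * hello_interval, ... and arrive instantly; a hello scheduled exactly
    at fail_time is treated as sent.
    """
    # Last hello the neighbour managed to send before it failed.
    last_hello = math.floor(fail_time / hello_interval) * hello_interval
    # The hold timer was restarted at last_hello and expires hold_time later.
    return last_hello + hold_time - fail_time


# Illustrative values: 10 s hello / 40 s hold versus tuned 1 s hello / 3 s hold.
print(detection_delay(10, 40, 25))   # neighbour fails 5 s after its last hello
print(detection_delay(1, 3, 0.5))    # same idea with aggressively tuned timers
```

In this model the delay always falls between `hold_time - hello_interval` and `hold_time`, which is why shrinking the timers is the only way to speed up detection without a separate mechanism.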
You can probably guess by now that, to speed things up further, BFD from the title is the best solution.
What is BFD?
To make failure detection really fast, as in sub-second fast, you should use BFD (Bidirectional Forwarding Detection). BFD is a separate protocol for detecting communication failures. It uses small, low-overhead probe packets (think of them as very small hello packets) that are sent many times per second, and that is what gets you to sub-second detection of a communication failure.
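The probing idea can be sketched in a few lines of Python. This is a simplified model, not the real protocol state machine: it assumes a fixed probe interval and a detection timer equal to the interval times a multiplier, restarted on every probe received (real BFD negotiates these values between the peers). The function name `session_down_at` and the 50 ms / multiplier 3 values are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Optional


def session_down_at(arrivals_ms: Iterable[int], interval_ms: int, multiplier: int) -> Optional[int]:
    """Return the time (ms) at which the detection timer first expires.

    arrivals_ms: timestamps at which probe packets were received.
    Returns None if no probe was ever received (timer never started).
    """
    detect_ms = interval_ms * multiplier  # how long we tolerate silence
    deadline = None
    for t in sorted(arrivals_ms):
        if deadline is not None and t > deadline:
            # The timer expired before this probe arrived.
            return deadline
        deadline = t + detect_ms  # probe received in time: restart the timer
    return deadline  # expiry after the last probe ever received


# Probes every 50 ms, peer dies right after the probe at t = 150 ms.
print(session_down_at([0, 50, 100, 150], interval_ms=50, multiplier=3))
```

With these example values the peer is declared down 150 ms after its last probe, which shows how sending small packets many times per second brings detection well under one second.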