While trying to enable IGMP snooping on our EVPN fabric, we ran into a “fun” issue.
We updated the firmware to enable the new route types (EVPN Type-7/8). Unfortunately,
because we run Arista spines and Juniper leafs, we hit a compatibility issue.
The Juniper leafs received a “malformed” route update from the spines and decided to
reset the session. This, of course, broke the network for a short period.
Intro
We use multicast in our network for streaming distribution. This lets us use the bandwidth
within the network efficiently. While this worked perfectly in our “old” network, our “new”
EVPN fabric didn’t support IGMP snooping yet.
IGMP-snooping in EVPN
EVPN uses a couple of new route types to exchange IGMP snooping information in the network. Because we use EVPN multihoming, the IGMP information also has to be exchanged between the ESI LAG leafs. For this, EVPN uses type-7 and type-8 routes.
- type-7 is the IGMP join sync: it syncs the joins sent by a client to both ESI LAG nodes.
- type-8 is the IGMP leave sync: it syncs the leaves sent by a client to both ESI LAG nodes.
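As a quick reference, the two route types from the list above can be written down as a small lookup. This is only an illustrative sketch: the long names follow RFC 9251 (IGMP and MLD Proxies for EVPN), while the `IgmpSyncRoute` enum and the `sync_role` helper are hypothetical names made up for this example.

```python
from enum import IntEnum
from typing import Optional


class IgmpSyncRoute(IntEnum):
    """EVPN route types used to sync IGMP state between ESI LAG peers."""
    MULTICAST_MEMBERSHIP_REPORT_SYNCH = 7  # the "IGMP join sync" route
    MULTICAST_LEAVE_SYNCH = 8              # the "IGMP leave sync" route


def sync_role(route_type: int) -> Optional[str]:
    """Return 'join' or 'leave' for type-7/8, or None for any other type."""
    if route_type == IgmpSyncRoute.MULTICAST_MEMBERSHIP_REPORT_SYNCH:
        return "join"
    if route_type == IgmpSyncRoute.MULTICAST_LEAVE_SYNCH:
        return "leave"
    return None


if __name__ == "__main__":
    for rt in (7, 8):
        print(rt, IgmpSyncRoute(rt).name, sync_role(rt))
```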
Spines as iBGP route-reflectors
As explained in the intro, the Junipers didn’t like the new route updates and decided to kill the sessions. What made the problem worse for us is that we use the spines as route reflectors. The spines reflected the “faulty” update to all leafs, which in turn caused every leaf to kill its sessions to the spines, effectively isolating the leafs from the network and causing an outage.
Spines
The spines report that the BGP session is closed by the leaf, and even supply the reason as reported by the leafs.
Nov 14 12:31:24 spine01 Bgp: %BGP-5-ADJCHANGE: peer 10.10.10.17 (VRF default AS 65003) old state Established event RecvNotify new state Idle
Nov 14 12:31:24 spine01 Bgp: %BGP-3-NOTIFICATION: received from neighbor 10.10.10.9 (VRF default AS 65003) 3/10 (Update Message Error/bad address/prefix field) 0 bytes
Nov 14 12:31:24 spine01 Bgp: %BGP-5-ADJCHANGE: peer 10.10.10.9 (VRF default AS 65003) old state Established event RecvNotify new state Idle
Nov 14 12:31:24 spine01 Bgp: %BGP-3-NOTIFICATION: received from neighbor 10.10.10.13 (VRF default AS 65003) 3/10 (Update Message Error/bad address/prefix field) 0 bytes
Nov 14 12:31:24 spine01 Bgp: %BGP-5-ADJCHANGE: peer 10.10.10.13 (VRF default AS 65003) old state Established event RecvNotify new state Idle
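To see at a glance how many leafs dropped their session for the same reason, the spine log can be summarised with a few lines of Python. This is a rough sketch that assumes the log format is exactly as in the lines above; the regular expression and the `peers_sending_notification` helper are our own illustration, not an Arista tool.

```python
import re

# Matches the spine's "%BGP-3-NOTIFICATION: received from neighbor ..." lines
# in the format shown above (assumed to be stable for this sketch).
NOTIFY_RE = re.compile(
    r"%BGP-3-NOTIFICATION: received from neighbor (?P<peer>\S+) "
    r"\(VRF (?P<vrf>\S+) AS (?P<asn>\d+)\) "
    r"(?P<code>\d+)/(?P<subcode>\d+) \((?P<text>.*)\) \d+ bytes"
)


def peers_sending_notification(lines, code=3, subcode=10):
    """Return the set of neighbors that sent a NOTIFICATION with code/subcode."""
    peers = set()
    for line in lines:
        match = NOTIFY_RE.search(line)
        if match is None:
            continue  # not a NOTIFICATION line, e.g. an ADJCHANGE message
        if int(match["code"]) == code and int(match["subcode"]) == subcode:
            peers.add(match["peer"])
    return peers


if __name__ == "__main__":
    sample = [
        "Nov 14 12:31:24 spine01 Bgp: %BGP-3-NOTIFICATION: received from neighbor 10.10.10.9 (VRF default AS 65003) 3/10 (Update Message Error/bad address/prefix field) 0 bytes",
        "Nov 14 12:31:24 spine01 Bgp: %BGP-5-ADJCHANGE: peer 10.10.10.9 (VRF default AS 65003) old state Established event RecvNotify new state Idle",
        "Nov 14 12:31:24 spine01 Bgp: %BGP-3-NOTIFICATION: received from neighbor 10.10.10.13 (VRF default AS 65003) 3/10 (Update Message Error/bad address/prefix field) 0 bytes",
    ]
    print(sorted(peers_sending_notification(sample)))
```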
Leafs
The leafs report a little more detail about the cause of the outage:
Nov 14 11:29:00 leaf1 rpd[18511]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer 10.10.10.1 (Internal AS 65003) changed state from Established to Idle (event RecvUpdate) (instance master)
Nov 14 11:29:00 leaf1 rpd[18511]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer 10.10.10.0 (Internal AS 65003) changed state from Established to Idle (event RecvUpdate) (instance master)
Nov 14 11:29:00 leaf1 rpd[18511]: EVPN_CORE_ISOLATED: EVPN core is isolated
Nov 14 11:29:01 leaf1 rpd[18511]: bgp_rcv_nlri:11842: NOTIFICATION sent to 10.10.10.1 (Internal AS 65003): code 3 (Update Message Error) subcode 10 (bad address/prefix field), Reason: peer 10.10.10.1 (Internal AS 65003) update included invalid route 8:0:0::0::0/2008 (35 of 37)
Nov 14 11:29:01 leaf1 rpd[18511]: Received malformed update from 10.10.10.1 (Internal AS 65003)
Nov 14 11:29:01 leaf1 rpd[18511]: Family inet-unicast, prefix 0.0.0.0/0
Nov 14 11:29:01 leaf1 rpd[18511]: bgp_rcv_nlri:11842: NOTIFICATION sent to 10.10.10.0 (Internal AS 65003): code 3 (Update Message Error) subcode 10 (bad address/prefix field), Reason: peer 10.10.10.0 (Internal AS 65003) update included invalid route 8:0:0::0::0/2008 (35 of 37)
Nov 14 11:29:01 leaf1 rpd[18511]: Received malformed update from 10.10.10.0 (Internal AS 65003)
Nov 14 11:29:01 leaf1 rpd[18511]: Family inet-unicast, prefix 0.0.0.0/0
In other words, the leaf rejects the update from the spine as malformed and resets the BGP session.
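The interesting fields in the leaf log are the NOTIFICATION code and subcode (3/10, which RFC 4271 names UPDATE Message Error / Invalid Network Field) and the route the leaf rejected. The sketch below pulls them out of a log line. It assumes the rpd message format shown above, and it assumes that the number before the first colon of the route string is the EVPN route type, which matches the type-8 leave sync route here. The `parse_leaf_notification` helper is hypothetical.

```python
import re

# Matches rpd's "NOTIFICATION sent to ..." lines in the format shown above.
SENT_RE = re.compile(
    r"NOTIFICATION sent to (?P<peer>\S+) .*?"
    r"code (?P<code>\d+) \([^)]*\) subcode (?P<subcode>\d+) \([^)]*\)"
    r".*?invalid route (?P<route>\S+)"
)


def parse_leaf_notification(line):
    """Extract peer, code, subcode and rejected route from one log line."""
    match = SENT_RE.search(line)
    if match is None:
        return None
    route = match["route"]
    # Assumption: the text before the first ':' is the EVPN route type.
    leading = route.split(":", 1)[0]
    return {
        "peer": match["peer"],
        "code": int(match["code"]),
        "subcode": int(match["subcode"]),
        "route": route,
        "route_type": int(leading) if leading.isdigit() else None,
    }


if __name__ == "__main__":
    line = (
        "Nov 14 11:29:01 leaf1 rpd[18511]: bgp_rcv_nlri:11842: NOTIFICATION sent to "
        "10.10.10.1 (Internal AS 65003): code 3 (Update Message Error) subcode 10 "
        "(bad address/prefix field), Reason: peer 10.10.10.1 (Internal AS 65003) "
        "update included invalid route 8:0:0::0::0/2008 (35 of 37)"
    )
    print(parse_leaf_notification(line))
```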
Packet capture
Workaround
To make sure that an IGMP join or leave would not cause the issue again, we removed all IGMP snooping config from the EVPN fabric. This is of course not a permanent solution, as it causes multicast traffic to be effectively broadcast across the fabric.
LAB testing
We built a test setup in our virtual lab, which allowed us to test some settings. Conclusions so far:
- enabling bgp-error-tolerance does not fix the issue
- Juniper support suggested enabling the following, which did nothing because of our multivendor setup:
set protocols evpn leave-sync-route-oldstyle
Conclusion
Unfortunately we don’t have a permanent solution at this moment, so the only conclusion for now is: don’t run your multivendor fabric with spines from vendor A and leafs from vendor B. If you want to run multivendor, it’s better to run both vendor A and vendor B in the spine layer as well as in the leaf layer. This only makes sense when you have a large network, though, as maintenance becomes much more complicated for your admins.
We are in contact with both Juniper and Arista support to try and solve this issue. We have not received a working solution at this time.
