SCinet reconfig to fix multicast

Thu Nov 18 01:52:32 CST 1999

Today I worked with Zaid Albanna and Kevin Thompson of vBNS, as well as
David Meyer of Cisco, to work through the problems we have been having with
SC99 multicast.  Thanks to their help, I believe the SC99 SCINet IP
multicast service is now operational.

The rest of this note has all the gritty details, which you may feel free
to ignore if you don't care about them.

There were two major problems that got solved today.  

The first was that RPF checking was doing things that I did not initially
understand.  The symptom was that doing

   show ip rpf 140.221.128.185

on the core router returned an M-BGP route in the wrong direction, even
though there was an OSPF route going the right direction.  (More than one,
actually.)

If you refer to the SCINet architecture map, you will see that the design
has two separate Ethernet switches (Switch-3 and Switch-4) and that the
core Cisco GSR routers are both attached to each of those.  I had turned
off PIM on some interfaces on the Cisco GSRs in an attempt to debug
problems.  It turns out that the interfaces without PIM carried the best
OSPF route.  Since the best OSPF route was not available to the RPF check, it
fell back on the M-BGP route.  My faulty assumption was that RPF would fall
back on the next best OSPF route available, instead of falling back to the
next routing protocol available (based on administrative distance).  The
fix for this was to ensure that PIM was enabled on the appropriate interfaces.
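
As a rough illustration of the fallback described above, here is a minimal
Python sketch.  It is a toy model of the behavior I observed, not Cisco's
actual RPF implementation.  The interface names are hypothetical, and the
administrative distances (110 for OSPF, 200 for M-BGP) are assumptions
chosen only so that OSPF is preferred when it is usable.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Route:
    protocol: str        # e.g. "ospf" or "mbgp"
    admin_distance: int  # lower is preferred between protocols
    metric: int          # lower is preferred within a protocol
    interface: str       # hypothetical interface name


def rpf_select(routes: Iterable[Route], pim_interfaces: Set[str]) -> Optional[Route]:
    """Toy model: each protocol offers only its single best route.

    If that route's interface has no PIM, the whole protocol is skipped
    and the check moves on to the protocol with the next-lowest distance.
    It does NOT try the same protocol's second-best route.
    """
    best_per_protocol = {}
    for route in routes:
        current = best_per_protocol.get(route.protocol)
        if current is None or route.metric < current.metric:
            best_per_protocol[route.protocol] = route

    for route in sorted(best_per_protocol.values(), key=lambda r: r.admin_distance):
        if route.interface in pim_interfaces:
            return route
    return None


# Assumed example: two OSPF paths in the right direction, one M-BGP path.
routes = [
    Route("ospf", 110, 10, "if-a"),  # best OSPF route
    Route("ospf", 110, 20, "if-b"),  # next-best OSPF route
    Route("mbgp", 200, 0, "if-c"),   # M-BGP route in the wrong direction
]

# PIM disabled on if-a: the model falls back to M-BGP, not to if-b.
broken = rpf_select(routes, {"if-b", "if-c"})
print(broken.protocol, broken.interface)

# PIM enabled everywhere: the best OSPF route wins again.
fixed = rpf_select(routes, {"if-a", "if-b", "if-c"})
print(fixed.protocol, fixed.interface)
```

In this model, re-enabling PIM on the interface that carries the best OSPF
route is enough to move the RPF answer back to OSPF.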

The second problem appears to have been orthogonal to the first.  The
symptom was that the (S,G) entry for a given source, off the SC99 show
floor, would regularly come and go.  We were watching the (source,group)
(140.221.9.163,224.2.177.155) entry on the SCINet core-rtr-1 get created,
work for about 4-5 minutes, and then disappear after 6.5 minutes.  The
cycle would repeat continuously and predictably.

Zaid Albanna and Kevin Thompson reported to me that this behavior was due
to the SCINet core-rtr-1 not sending PIM Join Refresh messages towards
vBNS.  Since the vBNS router was not getting Join Refreshes, it would time
out the (source,group) routing entry and stop passing traffic.  Apparently,
the SCINet router would detect the loss of traffic and eventually time out
its (source,group) entry.  At that point the SCINet router would get an
MSDP Source Active message, send a PIM Join towards vBNS, and the cycle
would repeat.
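
The cycle described above can be laid out as a simple timeline.  The sketch
below is only arithmetic on the observed numbers.  Both timer values are
assumptions picked to match what we saw (traffic for roughly 4.5 minutes,
entry gone at 6.5 minutes); they were not read from either router's
configuration.

```python
def sg_cycle(upstream_expiry_s, local_idle_timeout_s):
    """Return (seconds, event) pairs for one create/expire cycle.

    upstream_expiry_s:    assumed time until the upstream (vBNS) router
                          expires its (S,G) state without a Join Refresh.
    local_idle_timeout_s: assumed time core-rtr-1 keeps an idle (S,G)
                          entry once traffic has stopped arriving.
    """
    traffic_stops = upstream_expiry_s
    entry_removed = traffic_stops + local_idle_timeout_s
    return [
        (0, "core-rtr-1 sends PIM Join; (S,G) entry created"),
        (traffic_stops, "upstream (S,G) expires without Join Refresh; traffic stops"),
        (entry_removed, "core-rtr-1 times out idle (S,G); next MSDP SA restarts the cycle"),
    ]


# Assumed values: 270 s of traffic, then 120 s of idle time before removal.
events = sg_cycle(upstream_expiry_s=270, local_idle_timeout_s=120)
for seconds, what in events:
    print(f"{seconds / 60:4.1f} min  {what}")
```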

Late in the day, a direct ATM permanent virtual circuit was brought up
between the SC99 show floor and Argonne National Laboratory.  Temporarily
bypassing the vBNS, I brought up a direct M-BGP/PIM/MSDP peering between
SC99 and Argonne.  This did not eliminate the periodic symptoms, which
indicates to me that the fundamental problem was not within vBNS.

Since we don't see this problem at Argonne, and Argonne's border router is
running the same code and the same basic configuration as SC99, the only
variable unaccounted for was the use of PIM Dense Mode within the SC99 show
floor.  My working hypothesis then became that the PIM Dense Mode we were
running was not operating correctly with M-BGP/PIM/MSDP peers.

I have partitioned the SC99 network into two halves: a PIM Sparse Mode region
and a PIM Dense Mode region.  The PIM Sparse Mode region contains cm-rtr-1,
core-rtr-1, and sw-rtr-5.  The PIM Sparse Mode Rendezvous Point address is
140.221.128.30 (the loopback address of core-rtr-1).  The PIM Dense Mode
region contains sw-rtr-7 and sw-rtr-8.  core-rtr-2 acts as the border
between the PIM Dense Mode and PIM Sparse Mode regions.  All traffic
between the two regions (even unicast) goes through the 140.221.128.0/28
LIS, centered on cm-sw-1.  That is, unicast traffic from sw-rtr-5 to
sw-rtr-8 will traverse core-rtr-1, cm-sw-1, and core-rtr-2.
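
To make the partition easier to picture, here is a small Python sketch of
the region membership and the one inter-region path named above.  The link
list is an assumption: it contains only the adjacencies implied by the
sw-rtr-5 to sw-rtr-8 example, and is not a full SCINet topology.

```python
from collections import deque

# Region membership as described in this note.
REGION = {
    "cm-rtr-1": "sparse",
    "core-rtr-1": "sparse",   # also the Rendezvous Point (140.221.128.30)
    "sw-rtr-5": "sparse",
    "sw-rtr-7": "dense",
    "sw-rtr-8": "dense",
    "core-rtr-2": "border",
    "cm-sw-1": "switch",      # center of the 140.221.128.0/28 LIS
}

# Assumed adjacencies, limited to those implied by the example path.
LINKS = [
    ("sw-rtr-5", "core-rtr-1"),
    ("core-rtr-1", "cm-sw-1"),
    ("cm-sw-1", "core-rtr-2"),
    ("core-rtr-2", "sw-rtr-8"),
]


def neighbors(node):
    """Yield every node directly linked to `node`."""
    for a, b in LINKS:
        if a == node:
            yield b
        elif b == node:
            yield a


def shortest_path(src, dst):
    """Breadth-first search over LINKS; returns a list of hops or None."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in neighbors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = shortest_path("sw-rtr-5", "sw-rtr-8")
print(" -> ".join(path))
print([REGION[hop] for hop in path])
```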

Once I made this change, the periodic loss observed with the
(140.221.9.163,224.2.177.155) source disappeared.  I'm pleased to report
that (S,G) has been active on core-rtr-1 for over 2.5 hours at this point.
This (S,G) source is being delivered to SC99 by vBNS, as I have shut down
the direct ATM peering between SC99 and ANL.

Two notes for next year:

 - Where possible, avoid multiple redundant paths between the core/border
   routers and the edge/distribution switch-routers.  These tend to
   complicate the Reverse Path Forwarding checks inherent in multicast
   routing.

 - If PIM Sparse Mode is not available on all edge/distribution switch-
   routers, test the PIM Dense Mode <--> M-BGP/PIM/MSDP interoperability
   thoroughly.  In retrospect, this is something that could and should
   have been done at the SCINet staging at the Capital Center.

I look forward to watching the multicast traffic tomorrow, the last day of
the SC99 show, to see if we've solved the problems or if additional ones
crop up.
===
Bill Nickless    http://www.mcs.anl.gov/people/nickless      +1 630 252 7390
PGP:0E 0F 16 80 C5 B1 69 52 E1 44 1A A5 0E 1B 74 F7     nickless at mcs.anl.gov