FWSM Question - Lost connectivity to shared mgmt interface


Background:
We have Cisco 6500 Firewall Service Modules with multiple contexts and a shared management interface. The FWSM has a secondary module, and every context has its own management IP address along with a secondary FWSM management IP. The modules are in two 6500s bound by VSS and acting as a single switch. We are running software version 4.1(10) and are upgrading to 4.1(13) soon.

Problem:
We randomly lose all TCP connectivity to the management interface on every context. SSH fails, and so does telnet (turned on to test, then turned back off). UDP, however, appears to work fine: we use LMS and are still able to do an SNMP walk on all contexts as well as TFTP files to the system for upgrades.
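For anyone who wants to catch the moment TCP management access drops, here is a minimal sketch of a reachability probe. It assumes Python 3 on a host that can normally reach the management VLAN. The function name `tcp_port_open` is made up for illustration, and the two addresses are the context management IPs that appear in the log lines in this post. It only attempts a TCP handshake on port 22 and does not log in.

```python
import socket

def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        # create_connection performs the full three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False

if __name__ == "__main__":
    # Context management addresses taken from the log lines in this post.
    for addr in ("10.110.9.7", "10.110.9.17"):
        state = "open" if tcp_port_open(addr, 22) else "unreachable"
        print(f"{addr} tcp/22 {state}")
```

Run from cron or a loop with timestamps, this would at least narrow down when the failure starts relative to other events on the chassis.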

Some of our findings (nothing conclusive, obviously, or I wouldn't be posting for advice :) ):

1. admin %FWSM-6-106025: Failed to determine security context for packet: vlan89 icmp src 10.110.8.153/2048 dest 10.110.9.17/52819 <---9.x are context mgmt addresses. 8.153 is something that polls, opmanager or something of that nature (we see this with all connections, not just 8.153)

2. admin %FWSM-6-106025: Failed to determine security context for packet: vlan89 udp src 10.110.8.153/49472 dest 10.110.9.7/161 <-- same type of thing except UDP, although we know that UDP traffic passes fine.

3. We are able to SSH to the BVI, which rules out SSH being completely hosed on the module. However, we are unable to SSH to the management address even from the 6500 itself, which rules out all the other network devices downstream between the switch and our hosts.

4. Unable to connect to secondary module mgmt addresses for any context. For instance, primary might be 9.7, the secondary would be 9.8 and we get the same results. 

5. This is the weird part: this occurred on both our revenue (last Thursday) and non-revenue (Monday) 6500s, which are both VSS pairs (four 6500s in total) with one FWSM card in each. We were able to fail over and reload revenue on Saturday, which fixed the issue, but it then occurred again this morning (Thursday).
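To see whether the 106025 messages line up with the unreachable management addresses, the syslog could be tallied by protocol and destination. The following is a rough sketch only: the regular expression is inferred from the two log lines quoted in findings 1 and 2, so other software releases may format the message differently, and `tally_unclassified` is a made-up helper name.

```python
import re
from collections import Counter

# Pattern inferred from the two 106025 lines quoted in this post (an assumption).
PATTERN = re.compile(
    r"%FWSM-6-106025: Failed to determine security context for packet:\s*"
    r"(?P<vlan>\S+)\s+(?P<proto>\S+)\s+"
    r"src\s+(?P<src>[\d.]+)/(?P<sport>\d+)\s+"
    r"dest\s+(?P<dst>[\d.]+)/(?P<dport>\d+)"
)

def tally_unclassified(lines):
    """Count 106025 messages per (protocol, destination IP) pair."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("proto"), match.group("dst"))] += 1
    return counts

# The two sample lines from findings 1 and 2, without the trailing notes.
sample = [
    "admin %FWSM-6-106025: Failed to determine security context for packet: "
    "vlan89 icmp src 10.110.8.153/2048 dest 10.110.9.17/52819",
    "admin %FWSM-6-106025: Failed to determine security context for packet: "
    "vlan89 udp src 10.110.8.153/49472 dest 10.110.9.7/161",
]

for (proto, dst), count in sorted(tally_unclassified(sample).items()):
    print(proto, dst, count)
```

Fed a full syslog export instead of `sample`, the tally would show whether TCP destinations on the 9.x management range dominate the unclassified packets.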

TAC is stumped and is re-queuing the case in the morning. I'm trying to find more information before then, if anyone has any ideas. I've read that it's possible to have stuck sessions when the FWSM loses connectivity to TACACS (we use ACS); however, this is not the case here.


Comments

  • peetypeety ✭✭✭

    Have you tried to reach the FWSM out-of-band, i.e. get onto the 6500, then 'session slot # process 1'?  If so, anything seem out of the ordinary there?

  • Yes, we are able to manage the contexts via a session. There are some issues with that approach that we obviously want to avoid, though; for instance, there is no ACS logging, and there are various other problems, like someone sitting in a show command that never times out.

    To answer your question though, yes, we are able to session into the module from the 6500 itself. There doesn't seem to be anything out of the ordinary, and we are able to SSH to the BVI, so it's not really an issue with the protocol itself. It's very strange. TAC recommended rebooting the entire chassis. While this may "fix" the issue, it's not a workaround we're ready to implement every time we lose management connectivity; we need to figure out the root cause, since we have already seen that the issue can easily return after a random amount of time.

    We are on a somewhat outdated software version, 4.1(10) I believe. The newest is 4.1(13), which we may upgrade to, but I haven't seen any caveats that describe this problem in the first place.

  • Is there any impact from cycling the module? Could you allow the failover to take primary whilst you reseat or reboot the standby?


    Definitely an interesting issue you have there and I am interested in the result! 
