CUCM High CPU Spikes and Utilization

In my two previous INE rack lab sessions (8-25 and 9-8), I was assigned Vorack2. Both times I experienced heavy latency when accessing the UC servers: all of them were extremely slow and intermittently unresponsive.

During my last lab session, this past Sunday, I noticed that the network became extremely slow and unresponsive specifically while I was trying to access Enterprise Parameters and Service Parameters. Normal MACs in CUCM were fine.

Troubleshooting the problem through RTMT, I noticed that whenever I accessed the Enterprise Parameters and Service Parameters pages, both the Pub and the Sub spiked to 99% CPU utilization for 6 to 10 minutes on average. During that time, I would also lose connectivity to the Unity and CUPS servers. When the CUCM parameter pages finally loaded, CPU utilization returned to 48%. Once CPU utilization on the CUCMs dropped back below the 50% mark, the Unity and Presence servers were responsive again.

During the 99% CPU spikes, looking at both RTMT and "show proc load cpu", I noticed that only two processes were active: RISDB and one other process. Neither one took up more than 6% of CPU time, so there is no clear explanation as to why the CPUs were showing 99% utilization.
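To make that gap concrete, here is a rough sketch of the arithmetic in Python. The function name is hypothetical, and the figures are just the ones quoted above (99% total, two processes at no more than 6% each). One possible reading, which is an assumption on my part and not something confirmed here, is that the remainder is time not charged to any single process, such as I/O wait or hypervisor steal on a virtualized host.

```python
def unaccounted_cpu(total_pct, per_process_pcts):
    """Return the share of total CPU not explained by the listed processes."""
    # Clamp at zero in case the per-process figures overshoot the total.
    return max(0.0, total_pct - sum(per_process_pcts))


# Figures from the post: 99% overall, two active processes at <= 6% each.
gap = unaccounted_cpu(99.0, [6.0, 6.0])
print(f"Unaccounted CPU: at least {gap:.0f}%")
```

With these numbers, at least 87% of the reported load is not attributable to any process visible in the process list.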

Regardless, I turned off ALL services except WebAXL and restarted both the Pub and the Sub. This did not resolve the issue. After the reboots, I stopped and started (not restarted) all network and feature services individually, including RISDB, and then rebooted the Pub/Sub servers again. This still did not resolve the issue. I performed "utils snmp hardw restart"; this did not resolve the issue either. I took a long shot and performed a "utils dbreplication stop" and a "utils dbreplication reset all" on the Pub. Needless to say, this did not resolve the issue.

I opened emergency tickets [#JYO-512382] and [#RLH-683594], working with Christopher Drain from tech support, and received my token refund both times. However, I think this issue remains open on VORACK2.

My question for the experts: am I missing any steps?

Please help

Thanks

Steven

Comments

  • Hi Steven,

    Sometimes vCPU processes leak, and a simple VM restore solves such issues. I remember having the same issue with CUPS once, and it was resolved just by restoring the snapshot. I hope UCM has nothing to do with this.

    Let's wait and see whether anyone else has faced a similar issue on the pods and can offer other workaround ideas.
    INE - The Industry Leader in CCIE Preparation

    http://www.INE.com



    Subscription information may be found at:

    http://www.ieoc.com/forums/ForumSubscriptions.aspx
  • I was issued vorack2 on Monday 9/9 and experienced similar issues. Everything was very slow to load. I didn't take the time to look into anything during my session because I got a late start and my time was limited, but I have had this happen several times. I know there was a hardware failure with the VM servers a couple of weeks ago. I wonder if it is related and something else is beginning to fail on them.

  • I experienced this on vorack1 yesterday.

    ALL other pages loaded quickly. The service parameter pages took upwards of 10 minutes to load.



    On Wed, Oct 2, 2013 at 7:11 AM, DanatShekhe wrote:

    Same here at vorack2 today

  • This post should be under Rack Rentals > CCIE Voice section. Thanks.
